Zoltán Nagy

57188789790

Publications - 9

Accelerating unstructured finite volume computations on field-programmable gate arrays

Publication Name: Concurrency and Computation Practice and Experience

Publication Date: 2014-03-10

Volume: 26

Issue: 3

Page Range: 615-643

Description:

In the paper, a field-programmable gate array (FPGA)-based framework is described to efficiently accelerate unstructured finite volume computations in which the same mathematical expression has to be evaluated at every point of the mesh. The irregular memory access patterns caused by the unstructured spatial discretization are eliminated by a novel mesh node reordering technique, and a special architecture is designed to fully exploit the resulting predictable memory access patterns. In the proposed architecture, a fixed-size moving window of the input stream of the reordered state variables is cached in on-chip memory, and a pipelined chain of processing elements, which receives input only from the fast on-chip memory, carries out the computations. The arithmetic unit (AU) of the processing elements is generated from the data flow graph extracted from the given mathematical expression. The data flow graph is partitioned with a novel graph partitioning algorithm to break up the AU into smaller, locally controlled parts, which can be implemented more efficiently in an FPGA than a globally controlled AU. The proposed architecture and algorithms are presented via a case study solving the Euler equations on an unstructured mesh. On the largest FPGA available at the time, the generated architecture contains three processing elements working in a pipelined fashion, providing a speedup of one order of magnitude compared with a high-performance microprocessor and a threefold speedup compared with a high-performance graphics processing unit. Copyright © 2013 John Wiley & Sons, Ltd.
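The moving-window idea in this abstract can be illustrated with a short Python sketch. It assumes the cells have already been reordered so that every neighbor j of cell i satisfies |i - j| <= half_width; the names `windowed_sweep`, `half_width` and `update` are hypothetical and do not come from the paper, and the deque merely stands in for the on-chip buffer. It shows why a window of 2*half_width + 1 values suffices when the state variables are read strictly in order.

```python
from collections import deque


def windowed_sweep(state, neighbors, half_width, update):
    """Evaluate update(i, own_value, neighbor_values) for every cell while
    holding at most 2*half_width + 1 streamed values (a stand-in for the
    on-chip buffer). Assumes every neighbor j of cell i satisfies
    |i - j| <= half_width after reordering (hypothetical precondition)."""
    n = len(state)
    window = deque()  # holds state[lo], state[lo + 1], ...
    lo = 0            # global index of window[0]
    next_in = 0       # next index to read from the sequential input stream
    out = [None] * n
    for i in range(n):
        # Stream values in until the window reaches i + half_width.
        while next_in < n and next_in <= i + half_width:
            window.append(state[next_in])
            next_in += 1
        # Evict values that no remaining cell can reference.
        while lo < i - half_width:
            window.popleft()
            lo += 1
        for j in neighbors[i]:
            if abs(i - j) > half_width:
                raise ValueError(f"neighbor {j} of cell {i} lies outside the window")
        neighbor_values = [window[j - lo] for j in neighbors[i]]
        out[i] = update(i, window[i - lo], neighbor_values)
    return out
```

For a 1D chain of cells with half_width = 1, for example, `update` could evaluate a discrete Laplacian; the window then never holds more than three values, no matter how long the stream is.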

Open Access: Yes

DOI: 10.1002/cpe.3022

Examining the accuracy and the precision of PDEs for FPGA computations

Publication Name: International Workshop on Cellular Nanoscale Networks and their Applications

Publication Date: 2012-12-13

Volume: Unknown

Issue: Unknown

Page Range: Unknown

Description:

There are a large number of problems that can be accelerated by architectures implemented on Field Programmable Gate Arrays (FPGAs). However, sometimes the complexity of a problem does not allow it to be mapped onto a specific FPGA. In that case, analyzing the precision of the arithmetic unit that solves the computational problem can be a good way to make the architecture fit and to accelerate the computation. Numerical algorithms can be implemented using fixed-point or floating-point arithmetic (or a mix of both) with different precisions. The aim of the article is not to optimize the numerical algorithm but to find a smaller arithmetic unit precision that still gives sufficient accuracy and fits into smaller FPGAs. In the paper, one particular problem type is investigated, namely the accuracy of the solution of a simple Partial Differential Equation (PDE). The accuracy is measured on an FPGA with different bit widths. The solution of the advection equation is analyzed using first- and second-order discretization methods. As a result, we found an optimal bit width for the solution on a specific FPGA. © 2012 IEEE.
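A small Python experiment in the spirit of this abstract, assuming a 1D periodic grid, the first-order upwind scheme and round-to-nearest fixed-point storage with a given number of fractional bits (overflow and the width of the integer part are not modelled). The function names are hypothetical; the paper's own discretizations and FPGA arithmetic are not reproduced. The maximum deviation from a double-precision run indicates how many bits a given accuracy requires.

```python
import math


def quantize(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits (Python's round
    breaks ties to even). Overflow is not modelled."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def upwind_advection(u0, courant, steps, frac_bits=None):
    """First-order upwind scheme for u_t + a*u_x = 0 with a > 0 on a periodic
    grid; courant = a*dt/dx (stable for 0 <= courant <= 1). If frac_bits is
    given, every stored value and the Courant number are quantized."""
    if frac_bits is None:
        q = lambda x: x
    else:
        q = lambda x: quantize(x, frac_bits)
    c = q(courant)
    u = [q(v) for v in u0]
    n = len(u)
    for _ in range(steps):
        # u[i - 1] wraps to u[-1] for i = 0, giving periodic boundaries.
        u = [q(u[i] - q(c * (u[i] - u[i - 1]))) for i in range(n)]
    return u


def max_error(a, b):
    """Maximum pointwise deviation between two solutions."""
    return max(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    n = 64
    u0 = [math.sin(2 * math.pi * i / n) for i in range(n)]
    ref = upwind_advection(u0, 0.5, 100)  # double-precision reference
    for bits in (8, 12, 16, 20):
        print(bits, max_error(ref, upwind_advection(u0, 0.5, 100, bits)))
```

Because the upwind update with 0 <= courant <= 1 is a convex combination of neighboring values, rounding errors are not amplified from step to step, which keeps the error growth with the number of steps easy to reason about in this toy setting.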

Open Access: Yes

DOI: 10.1109/CNNA.2012.6331439

FPGA based acceleration of computational fluid flow simulation on unstructured mesh geometry

Publication Name: Proceedings 22nd International Conference on Field Programmable Logic and Applications FPL 2012

Publication Date: 2012-12-12

Volume: Unknown

Issue: Unknown

Page Range: 128-135

Description:

Numerical simulation of complex computational fluid dynamics problems evolving in time plays an important role in scientific and engineering applications. The behavior of dynamical systems can be understood accurately through large-scale simulations, which traditionally require expensive supercomputing facilities. In the paper, a Field Programmable Gate Array (FPGA) based framework is described to accelerate the simulation of complex physical spatio-temporal phenomena. Simulating complicated geometries requires unstructured spatial discretization, which results in irregular memory access patterns that severely limit computing performance. Data locality is improved by a mesh node renumbering technique that results in a sequential memory access pattern. Additionally, storing a small window of cell-centered state values in the on-chip memory of the FPGA can increase data reuse and decrease memory bandwidth requirements. Generating the floating-point data path and control structure of an arithmetic unit containing dozens of operators is a very challenging task when the goal is a high operating frequency. The efficiency and use of the framework are demonstrated by a case study solving the Euler equations on an unstructured mesh using the finite volume technique. On the largest FPGA available at the time, the generated architecture contains three processing elements working in parallel, providing a 75 times speedup compared with a high-performance microprocessor. © 2012 IEEE.
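The abstract does not specify the renumbering technique in detail, so the Python sketch below substitutes the classic reverse Cuthill-McKee ordering, a standard bandwidth-reducing heuristic, purely for illustration; it is not claimed to be the paper's method. The mesh is assumed to be given as adjacency lists of cell indices, and the function names are hypothetical. After renumbering, the bandwidth (the largest index distance between neighboring cells) bounds how many cell values must be kept in the on-chip window while cells are streamed sequentially.

```python
from collections import deque


def bandwidth(neighbors, order=None):
    """Largest |pos(i) - pos(j)| over all mesh edges (i, j), where pos is the
    position of each cell in `order` (identity ordering if order is None)."""
    n = len(neighbors)
    pos = list(range(n))
    if order is not None:
        for p, v in enumerate(order):
            pos[v] = p
    return max((abs(pos[i] - pos[j]) for i in range(n) for j in neighbors[i]),
               default=0)


def reverse_cuthill_mckee(neighbors):
    """Return a permutation (list of original cell ids) that tends to place
    adjacent cells close together, reducing the bandwidth."""
    n = len(neighbors)
    degree = [len(nb) for nb in neighbors]
    visited = [False] * n
    order = []
    # Start each connected component from a minimum-degree cell.
    for start in sorted(range(n), key=lambda v: degree[v]):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Breadth-first visit of neighbors in increasing degree (Cuthill-McKee).
            for w in sorted(neighbors[v], key=lambda x: degree[x]):
                if not visited[w]:
                    visited[w] = True
                    queue.append(w)
    order.reverse()  # the "reverse" step
    return order
```

For a chain of cells whose labels have been scrambled, for example, the ordering recovers a numbering in which every cell's neighbors are adjacent in memory (bandwidth 1).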

Open Access: Yes

DOI: 10.1109/FPL.2012.6339276

Computational fluid flow simulation on body fitted mesh geometry with IBM cell broadband engine architecture

Publication Name: ECCTD 2009 European Conference on Circuit Theory and Design Conference Program

Publication Date: 2009-12-10

Volume: Unknown

Issue: Unknown

Page Range: 827-830

Description:

The solutions of partial differential equations (PDEs) play a key role in today's real-world simulations. Computational Fluid Dynamics (CFD) is an important part of this area; it involves problems of gas or fluid flow over different obstacles, e.g., air flow around vehicles and buildings, or the flow of water in the oceans. In engineering applications, the temporal evolution of non-ideal, compressible fluids is quite often modeled by the system of Navier-Stokes equations. They are a coupled set of nonlinear hyperbolic partial differential equations and form a relatively simple, yet efficient model of compressible fluid dynamics. In the paper, the implementation of a CFD solver on body-fitted mesh geometry on the Cell Broadband Engine is described. An arbitrary surface can be simulated more easily on a body-fitted mesh than on a rectangular computational domain. ©2009 IEEE.

Open Access: Yes

DOI: 10.1109/ECCTD.2009.5275111

Supersonic flow simulation on IBM cell processor based emulated digital cellular neural networks

Publication Name: Proceedings IEEE International Symposium on Circuits and Systems

Publication Date: 2009-10-26

Volume: Unknown

Issue: Unknown

Page Range: 1225-1228

Description:

In the areas of mechanical, aerospace, chemical and civil engineering, the solution of partial differential equations (PDEs) has long been one of the most important problems of mathematics. In this field, one of the most exciting areas is the simulation of fluid flow, which involves, for example, problems of air, sea and land vehicle motion. In engineering applications, the temporal evolution of non-ideal, compressible fluids is quite often modeled by the system of Navier-Stokes equations. They are a coupled set of nonlinear hyperbolic partial differential equations and form a relatively simple, yet efficient model of compressible fluid dynamics. Unfortunately, the need for a coupled, multi-layered computational structure with nonlinear, space-variant templates makes it impossible to utilize the huge computing power of the analog Cellular Neural Network Universal Machine (CNN-UM) chips. To improve the performance of our solution, an emulated digital CNN-UM implemented on the IBM Cell Broadband Engine has been used. The goal is to perform the operations with the highest possible parallelism. ©2009 IEEE.

Open Access: Yes

DOI: 10.1109/ISCAS.2009.5117983

Simulation of 2D inviscid, adiabatic, compressible flows on emulated digital CNN-UM

Publication Name: International Journal of Circuit Theory and Applications

Publication Date: 2009-05-01

Volume: 37

Issue: 4

Page Range: 569-585

Description:

In the areas of mechanical, aerospace, chemical and civil engineering, the solution of partial differential equations has long been one of the most important problems in mathematics. In this field, one of the most exciting areas is the simulation of fluid flow, which involves, for example, problems of air, sea and land vehicle motion. In this paper, a CNN-UM based solver of 2D inviscid, adiabatic, compressible fluids is presented. The governing equations are solved using first- and second-order numerical methods. Unfortunately, the need for a coupled, multi-layered computational structure with nonlinear, space-variant templates makes it impossible to utilize the huge computing power of the analog CNN-UM chips. To improve the performance of our solution, an emulated digital CNN-UM implemented on an FPGA has been used. The properties of the implemented specialized architecture are examined in terms of area, speed and accuracy. Copyright © 2008 John Wiley and Sons, Ltd.
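For readers unfamiliar with the emulated digital CNN-UM, the Python sketch below shows the basic operation such an emulator performs: one forward-Euler step of the standard cellular neural network state equation dx/dt = -x + A*y + B*u + z with 3x3 space-invariant templates A and B, bias z and the piecewise-linear output y = 0.5(|x + 1| - |x - 1|). It is a generic illustration with hypothetical function names and zero-valued cells assumed outside the grid; the flow solvers in these abstracts need coupled, nonlinear, space-variant templates, which this sketch does not attempt.

```python
def output(x):
    """Standard piecewise-linear CNN output: 0.5 * (|x + 1| - |x - 1|)."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def cnn_step(x, u, A, B, z, h):
    """One forward-Euler step of dx/dt = -x + A*y + B*u + z on a 2D grid.
    A and B are 3x3 templates indexed [di + 1][dj + 1]; cells outside the
    grid are treated as zero (assumed boundary condition)."""
    rows, cols = len(x), len(x[0])

    def template_sum(field, T, i, j):
        # Weighted sum of the 3x3 neighborhood of cell (i, j).
        s = 0.0
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ii, jj = i + di, j + dj
                if 0 <= ii < rows and 0 <= jj < cols:
                    s += T[di + 1][dj + 1] * field[ii][jj]
        return s

    y = [[output(v) for v in row] for row in x]
    return [[x[i][j] + h * (-x[i][j] + template_sum(y, A, i, j)
                            + template_sum(u, B, i, j) + z)
             for j in range(cols)]
            for i in range(rows)]
```

Every cell's update depends only on its 3x3 neighborhood, which is what makes the operation attractive for pipelined or highly parallel digital emulation.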

Open Access: Yes

DOI: 10.1002/cta.565

Simulation of Two-Dimensional supersonic flows on emulated-digital CNN-UM

Publication Name: EURASIP Journal on Advances in Signal Processing

Publication Date: 2009-04-09

Volume: 2009

Issue: Unknown

Page Range: Unknown

Description:

Computational fluid dynamics (CFD) is the scientific modeling of the temporal evolution of gas and fluid flows by exploiting the enormous processing power of computer technology. Simulation of fluid flow over complex-shaped objects currently requires several weeks of computing time on high-performance supercomputers. A CNN-UM-based solver of 2D inviscid, adiabatic, and compressible fluids is presented. The governing partial differential equations (PDEs) are solved using first- and second-order numerical methods. Unfortunately, the need for a coupled, multilayered computational structure with nonlinear, space-variant templates makes it impossible to utilize the huge computing power of the analog CNN-UM chips. To improve the performance of our solution, an emulated digital CNN-UM implemented on an FPGA has been used. The properties of the implemented specialized architecture are examined in terms of area, speed, and accuracy.

Open Access: Yes

DOI: 10.1155/2009/923404

Experimental result on supersonic flow simulation on emulated digital CNN-UM

Publication Name: Proceedings of the IEEE International Workshop on Cellular Neural Networks and their Applications

Publication Date: 2008-09-23

Volume: Unknown

Issue: Unknown

Page Range: 5

Description:

In the areas of mechanical, aerospace, chemical and civil engineering, the solution of partial differential equations (PDEs) has long been one of the most important problems of mathematics. In this field, one of the most exciting areas is the simulation of fluid flow, which involves, for example, problems of air, sea and land vehicle motion. ©2008 IEEE.

Open Access: Yes

DOI: 10.1109/CNNA.2008.4588636

Two-dimensional compressible flow simulation on emulated digital CNN-UM

Publication Name: Proceedings of the IEEE International Workshop on Cellular Neural Networks and their Applications

Publication Date: 2008-09-23

Volume: Unknown

Issue: Unknown

Page Range: 169-174

Description:

In the areas of mechanical, aerospace, chemical and civil engineering, the solution of partial differential equations (PDEs) has long been one of the most important problems of mathematics. In this field, one of the most exciting areas is the simulation of fluid flow, which involves, for example, problems of air, sea and land vehicle motion. In this paper, a CNN-UM based solver of 2D inviscid, adiabatic, compressible fluids is presented. The governing equations are solved using first- and second-order numerical methods. Unfortunately, the need for a coupled, multi-layered computational structure with nonlinear, space-variant templates makes it impossible to utilize the huge computing power of the analog CNN-UM chips. To improve the performance of our solution, an emulated digital CNN-UM implemented on an FPGA has been used. The properties of the implemented specialized architecture are examined in terms of area, speed and accuracy. ©2008 IEEE.

Open Access: Yes

DOI: 10.1109/CNNA.2008.4588672