P. Szolgay

7003329702

Publications - 9

Accelerating unstructured finite volume computations on field-programmable gate arrays

Publication Name: Concurrency and Computation: Practice and Experience

Publication Date: 2014-03-10

Volume: 26

Issue: 3

Page Range: 615-643

Description:

In the paper, a field-programmable gate array (FPGA)-based framework is described to efficiently accelerate unstructured finite volume computations where the same mathematical expression has to be evaluated at every point of the mesh. The irregular memory access patterns caused by the unstructured spatial discretization are eliminated by a novel mesh node reordering technique, and a special architecture is designed to fully utilize the benefits of the predictable memory access patterns. In the proposed architecture, a fixed-size moving window of the input stream of the reordered state variables is cached into the on-chip memory, and a pipelined chain of processing elements, which gets input only from the fast on-chip memory, is used to carry out the computations. The arithmetic unit (AU) of the processing elements is generated from the data flow graph extracted from the given mathematical expression. The data flow graph is partitioned with a novel graph partitioning algorithm to break up the AU into smaller locally controlled parts, which can be implemented more efficiently in an FPGA than the globally controlled AU. The proposed architecture and algorithms are presented via a case study solving the Euler equations on an unstructured mesh. On the largest FPGA currently available, the generated architecture contains three processing elements working in a pipelined fashion to provide one order of magnitude speedup compared with a high-performance microprocessor and a threefold speedup compared with a high-performance graphics processing unit. Copyright © 2013 John Wiley & Sons, Ltd.

Open Access: Yes

DOI: 10.1002/cpe.3022

Examining the accuracy and the precision of PDEs for FPGA computations

Publication Name: International Workshop on Cellular Nanoscale Networks and their Applications

Publication Date: 2012-12-13

Volume: Unknown

Issue: Unknown

Page Range: Unknown

Description:

A large number of problems can be accelerated by using architectures on Field Programmable Gate Arrays (FPGAs). However, the complexity of a problem sometimes does not allow it to be mapped onto a specific FPGA. In that case, analyzing the precision of the arithmetic unit that solves the computational problem can be a good way to fit the architecture onto the device and to accelerate the computation. A numerical algorithm can be implemented using fixed-point or floating-point arithmetic, or a mix of both, at different precisions. The aim of the article is not to optimize the numerical algorithm but to find a smaller arithmetic unit precision that yields sufficient accuracy and fits onto smaller FPGAs. In the paper, one particular problem type is investigated, namely the accuracy of the solution of a simple Partial Differential Equation (PDE). The accuracy is measured on an FPGA with different bit widths. The solution of the advection equation is analyzed using first- and second-order discretization methods. As a result, we managed to find an optimal bit width for the solution on a specific FPGA. © 2012 IEEE.
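
The bit-width study described above can be illustrated in software. The following is only a minimal sketch, not the paper's FPGA setup: it assumes a 1D periodic domain, a first-order upwind scheme, a sine-wave initial condition, and a simple fixed-point model in which every intermediate result is rounded to a chosen number of fractional bits. The function names (`quantize`, `upwind_advection`, `max_error`) and all parameter values are hypothetical.

```python
import math

def quantize(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits (ties go to even)."""
    scale = 1 << frac_bits
    return round(x * scale) / scale

def upwind_advection(n_cells, n_steps, cfl, frac_bits=None):
    """First-order upwind scheme for u_t + c*u_x = 0 (c > 0), periodic domain.

    If frac_bits is given, stored values and intermediate results are rounded
    to that many fractional bits; otherwise plain float64 is used.
    """
    if frac_bits is None:
        q = lambda v: v
    else:
        q = lambda v: quantize(v, frac_bits)
    u = [q(math.sin(2.0 * math.pi * i / n_cells)) for i in range(n_cells)]
    cfl = q(cfl)
    for _ in range(n_steps):
        # For i == 0, u[i - 1] is the last cell, which gives the periodic boundary.
        u = [q(u[i] - q(cfl * q(u[i] - u[i - 1]))) for i in range(n_cells)]
    return u

def max_error(frac_bits, n_cells=64, n_steps=64, cfl=0.5):
    """Largest deviation of the fixed-point run from the float64 run."""
    ref = upwind_advection(n_cells, n_steps, cfl)
    fix = upwind_advection(n_cells, n_steps, cfl, frac_bits)
    return max(abs(a - b) for a, b in zip(ref, fix))

if __name__ == "__main__":
    for bits in (8, 12, 16, 24):
        print(bits, max_error(bits))
```

Sweeping `frac_bits` in this way shows where the rounding error falls below a chosen tolerance, which is the kind of trade-off between accuracy and arithmetic unit size that the abstract refers to.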

Open Access: Yes

DOI: 10.1109/CNNA.2012.6331439

FPGA based acceleration of computational fluid flow simulation on unstructured mesh geometry

Publication Name: Proceedings - 22nd International Conference on Field Programmable Logic and Applications, FPL 2012

Publication Date: 2012-12-12

Volume: Unknown

Issue: Unknown

Page Range: 128-135

Description:

Numerical simulation of complex computational fluid dynamics problems evolving in time plays an important role in scientific and engineering applications. Accurate behavior of dynamical systems can be understood using large-scale simulations, which traditionally require expensive supercomputing facilities. In the paper, a Field Programmable Gate Array (FPGA) based framework is described to accelerate simulation of complex physical spatio-temporal phenomena. Simulating complicated geometries requires unstructured spatial discretization, which results in irregular memory access patterns that severely limit computing performance. Data locality is improved by a mesh node renumbering technique, which results in a sequential memory access pattern. Additionally, storing a small window of cell-centered state values in the on-chip memory of the FPGA can increase data reuse and decrease memory bandwidth requirements. Generating the floating-point data path and control structure of an arithmetic unit containing dozens of operators is a very challenging task when the goal is a high operating frequency. The efficiency and use of the framework are described by a case study solving the Euler equations on an unstructured mesh using the finite volume technique. On the largest FPGA currently available, the generated architecture contains three processing elements working in parallel, providing a 75 times speedup compared to a high-performance microprocessor. © 2012 IEEE.
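
The effect of node renumbering on data locality can be sketched as follows. The abstract does not specify the renumbering technique, so this example substitutes a classic Cuthill-McKee-style breadth-first ordering as a stand-in; it is not the authors' algorithm. The mesh is assumed to be given as an adjacency list, and the largest index distance between neighbouring nodes (the bandwidth) is used here as a rough proxy for how large an on-chip window would have to be. All function names are hypothetical.

```python
from collections import deque

def bandwidth(adj):
    """Largest index distance between two neighbouring nodes."""
    return max((abs(i - j) for i, nbrs in enumerate(adj) for j in nbrs), default=0)

def bfs_renumber(adj):
    """Return new_index[old] from a Cuthill-McKee-style breadth-first ordering."""
    n = len(adj)
    order = []
    seen = [False] * n
    # Start each connected component from its lowest-degree unvisited node.
    for start in sorted(range(n), key=lambda v: len(adj[v])):
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Visit low-degree neighbours first, as in Cuthill-McKee.
            for w in sorted(adj[v], key=lambda x: len(adj[x])):
                if not seen[w]:
                    seen[w] = True
                    queue.append(w)
    new_index = [0] * n
    for new, old in enumerate(order):
        new_index[old] = new
    return new_index

def apply_renumbering(adj, new_index):
    """Rebuild the adjacency list under the new node numbering."""
    out = [[] for _ in adj]
    for old, nbrs in enumerate(adj):
        out[new_index[old]] = sorted(new_index[w] for w in nbrs)
    return out

if __name__ == "__main__":
    # A chain of six nodes stored in scrambled order: 0-3-1-5-2-4.
    mesh = [[3], [3, 5], [5, 4], [0, 1], [2], [1, 2]]
    renumbered = apply_renumbering(mesh, bfs_renumber(mesh))
    print(bandwidth(mesh), bandwidth(renumbered))
```

After renumbering, neighbouring nodes sit close together in memory, so a sequential stream plus a small window of recent values is enough to supply every neighbour access.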

Open Access: Yes

DOI: 10.1109/FPL.2012.6339276

Simulation of 2D inviscid, adiabatic, compressible flows on emulated digital CNN-UM

Publication Name: International Journal of Circuit Theory and Applications

Publication Date: 2009-05-01

Volume: 37

Issue: 4

Page Range: 569-585

Description:

In mechanical, aerospace, chemical and civil engineering, the solution of partial differential equations has long been one of the most important problems in mathematics. In this field, one of the most exciting areas is the simulation of fluid flow, which involves, for example, problems of air, sea and land vehicle motion. In this paper, a CNN-UM based solver of 2D inviscid, adiabatic, compressible fluids is presented. The governing equations are solved by using first- and second-order numerical methods. Unfortunately, the need for a coupled multi-layered computational structure with nonlinear, space-variant templates makes it impossible to utilize the huge computing power of the analog CNN-UM chips. To improve the performance of our solution, an emulated digital CNN-UM implemented on FPGA has been used. Properties of the implemented specialized architecture are examined in terms of area, speed and accuracy. Copyright © 2008 John Wiley and Sons, Ltd.

Open Access: Yes

DOI: 10.1002/cta.565

Simulation of Two-Dimensional supersonic flows on emulated-digital CNN-UM

Publication Name: EURASIP Journal on Advances in Signal Processing

Publication Date: 2009-04-09

Volume: 2009

Issue: Unknown

Page Range: Unknown

Description:

Computational fluid dynamics (CFD) is the scientific modeling of the temporal evolution of gas and fluid flows by exploiting the enormous processing power of computer technology. Simulation of fluid flow over complex-shaped objects currently requires several weeks of computing time on high-performance supercomputers. A CNN-UM-based solver of 2D inviscid, adiabatic, and compressible fluids is presented. The governing partial differential equations (PDEs) are solved by using first- and second-order numerical methods. Unfortunately, the need for a coupled multilayered computational structure with nonlinear, space-variant templates makes it impossible to utilize the huge computing power of the analog CNN-UM chips. To improve the performance of our solution, an emulated digital CNN-UM implemented on FPGA has been used. Properties of the implemented specialized architecture are examined in terms of area, speed, and accuracy.

Open Access: Yes

DOI: 10.1155/2009/923404

Experimental result on supersonic flow simulation on emulated digital CNN-UM

Publication Name: Proceedings of the IEEE International Workshop on Cellular Neural Networks and their Applications

Publication Date: 2008-09-23

Volume: Unknown

Issue: Unknown

Page Range: 5

Description:

In mechanical, aerospace, chemical and civil engineering, the solution of partial differential equations (PDEs) has long been one of the most important problems of mathematics. In this field, one of the most exciting areas is the simulation of fluid flow, which involves, for example, problems of air, sea and land vehicle motion. © 2008 IEEE.

Open Access: Yes

DOI: 10.1109/CNNA.2008.4588636

Two-dimensional compressible flow simulation on emulated digital CNN-UM

Publication Name: Proceedings of the IEEE International Workshop on Cellular Neural Networks and their Applications

Publication Date: 2008-09-23

Volume: Unknown

Issue: Unknown

Page Range: 169-174

Description:

In mechanical, aerospace, chemical and civil engineering, the solution of partial differential equations (PDEs) has long been one of the most important problems of mathematics. In this field, one of the most exciting areas is the simulation of fluid flow, which involves, for example, problems of air, sea and land vehicle motion. In this paper, a CNN-UM based solver of 2D inviscid, adiabatic, compressible fluids is presented. The governing equations are solved by using first- and second-order numerical methods. Unfortunately, the need for a coupled multi-layered computational structure with nonlinear, space-variant templates makes it impossible to utilize the huge computing power of the analog CNN-UM chips. To improve the performance of our solution, an emulated digital CNN-UM implemented on FPGA has been used. Properties of the implemented specialized architecture are examined in terms of area, speed and accuracy. © 2008 IEEE.

Open Access: Yes

DOI: 10.1109/CNNA.2008.4588672

Enhanced emulated digital CNN-UM (CASTLE) arithmetic cores

Publication Name: Journal of Circuits Systems and Computers

Publication Date: 2003-12-01

Volume: 12

Issue: 6

Page Range: 711-738

Description:

An emulated digital CNN-UM (CASTLE) architecture was published a few years ago. Different emulated digital CNN-UM architectures are analyzed in the paper. These new modified architectures are optimized for silicon area, operating speed or dissipated power. A reconfigurable arithmetic core, with which the neighborhood size can be changed, is also shown in the paper. An advanced CASTLE with pipelining is presented. This solution increases the operating frequency approximately tenfold.

Open Access: Yes

DOI: 10.1142/S0218126603001136

An accelerated digital CNN-UM (CASTLE) architecture by using the pipe-line technique

Publication Name: Proceedings of the IEEE International Workshop on Cellular Neural Networks and their Applications

Publication Date: 2002-01-01

Volume: 2002-January

Issue: Unknown

Page Range: 355-362

Description:

Different CNN-UM architecture implementations, analog and emulated digital, have been developed. The emulated digital architecture (CASTLE) is accurate but slower than the analog CNN-UMs. This is generally a disadvantage, especially if transient computing is critical. The operating speed of the emulated digital implementations, namely CASTLE, can be increased significantly using the pipeline technique. This solution is analyzed with respect to area, time, etc. These arithmetic cores were tested and simulated using a VIRTEX FPGA development system.

Open Access: Yes

DOI: 10.1109/CNNA.2002.1035070