András Kiss

Author ID: 57195959115

Publications - 5

Accelerating unstructured finite volume computations on field-programmable gate arrays

Publication Name: Concurrency and Computation: Practice and Experience

Publication Date: 2014-03-10

Volume: 26

Issue: 3

Page Range: 615-643

Description:

In the paper, a field-programmable gate array (FPGA)-based framework is described to efficiently accelerate unstructured finite volume computations in which the same mathematical expression has to be evaluated at every point of the mesh. The irregular memory access patterns caused by the unstructured spatial discretization are eliminated by a novel mesh node reordering technique, and a special architecture is designed to fully exploit the resulting predictable memory access patterns. In the proposed architecture, a fixed-size moving window of the input stream of reordered state variables is cached in on-chip memory, and a pipelined chain of processing elements, which receives input only from the fast on-chip memory, carries out the computations. The arithmetic unit (AU) of the processing elements is generated from the data flow graph extracted from the given mathematical expression. The data flow graph is partitioned with a novel graph partitioning algorithm that breaks up the AU into smaller, locally controlled parts, which can be implemented more efficiently on an FPGA than a globally controlled AU. The proposed architecture and algorithms are presented through a case study solving the Euler equations on an unstructured mesh. On the largest currently available FPGA, the generated architecture contains three processing elements working in a pipelined fashion and provides a speedup of one order of magnitude compared with a high-performance microprocessor and a threefold speedup compared with a high-performance graphics processing unit. Copyright © 2013 John Wiley & Sons, Ltd.

Open Access: Yes

DOI: 10.1002/cpe.3022
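
The abstract does not spell out the paper's novel reordering technique, so the following Python sketch substitutes the standard reverse Cuthill-McKee heuristic to illustrate the underlying idea: once neighboring cells receive nearby indices, the largest index distance b between neighbors bounds how far back or ahead any flux evaluation reaches, so a moving on-chip window of 2b + 1 state values suffices. The mesh is represented as hypothetical face-adjacency lists, and the function names are illustrative, not taken from the paper.

```python
from collections import deque

def reverse_cuthill_mckee(adj):
    """Return a new cell ordering (list of old indices) using reverse Cuthill-McKee.

    adj: list of neighbor lists; adj[i] holds the cells sharing a face with cell i.
    """
    n = len(adj)
    visited = [False] * n
    order = []
    # Handle every connected component, starting each one from a minimum-degree cell.
    for start in sorted(range(n), key=lambda v: len(adj[v])):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Breadth-first search, visiting unvisited neighbors by increasing degree.
            for w in sorted(adj[v], key=lambda u: len(adj[u])):
                if not visited[w]:
                    visited[w] = True
                    queue.append(w)
    order.reverse()
    return order

def bandwidth(adj, order):
    """Largest index distance between two neighboring cells under the given ordering."""
    new_index = {old: new for new, old in enumerate(order)}
    return max((abs(new_index[v] - new_index[w])
                for v in range(len(adj)) for w in adj[v]), default=0)

# Hypothetical strip of 8 cells whose labels were assigned in a scrambled order.
edges = [(0, 7), (7, 1), (1, 6), (6, 2), (2, 5), (5, 3), (3, 4)]
adj = [[] for _ in range(8)]
for a, b in edges:
    adj[a].append(b)
    adj[b].append(a)
order = reverse_cuthill_mckee(adj)
print(bandwidth(adj, list(range(8))), bandwidth(adj, order))  # 7 1
```

With the original labels a cell may need a neighbor 7 positions away; after reordering every neighbor is adjacent in memory, so a window of 3 values would cover all accesses in this toy case.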

Examining the accuracy and the precision of PDEs for FPGA computations

Publication Name: International Workshop on Cellular Nanoscale Networks and their Applications

Publication Date: 2012-12-13

Volume: Unknown

Issue: Unknown

Page Range: Unknown

Description:

A large number of problems can be accelerated by architectures implemented on Field Programmable Gate Arrays (FPGAs). However, the complexity of a problem sometimes prevents it from being mapped onto a specific FPGA. In that case, analyzing the precision of the arithmetic unit that solves the computational problem can be a good way to fit the architecture onto the device and accelerate the computation. Numerical algorithms can be implemented using fixed-point arithmetic, floating-point arithmetic, or a mix of both, at different precisions. The aim of the article is not to optimize the numerical algorithm but to find a reduced arithmetic unit precision that still yields sufficient accuracy and fits on smaller FPGAs. In the paper, one particular problem type is investigated, namely the accuracy of the solution of a simple Partial Differential Equation (PDE). The accuracy is measured on an FPGA for different bit widths. The solution of the advection equation is analyzed using first- and second-order discretization methods. As a result, we found an optimal bit width for the solution on a specific FPGA. © 2012 IEEE.

Open Access: Yes

DOI: 10.1109/CNNA.2012.6331439
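
As a rough software analogue of the precision study, the sketch below emulates a fixed-point first-order upwind scheme for the advection equation u_t + a u_x = 0 (with a > 0) using Python integers and compares it against a floating-point reference. The grid size (64 periodic cells), Courant number (0.5), sine initial condition, step count, and bit widths are assumptions chosen for illustration, not values from the paper, and the second-order scheme the abstract mentions is not covered.

```python
import math

def upwind_float(u0, c, steps):
    """First-order upwind scheme (a > 0) in double precision on a periodic grid."""
    u = list(u0)
    n = len(u)
    for _ in range(steps):
        # u[i - 1] with i = 0 is u[-1], which gives the periodic boundary.
        u = [u[i] - c * (u[i] - u[i - 1]) for i in range(n)]
    return u

def upwind_fixed(u0, c, steps, frac_bits):
    """Same scheme with values stored as integers scaled by 2**frac_bits (emulated fixed point)."""
    scale = 1 << frac_bits
    q = [round(v * scale) for v in u0]  # quantize the initial condition
    cq = round(c * scale)               # quantize the Courant number
    n = len(q)
    for _ in range(steps):
        # The product carries 2 * frac_bits fraction bits; the arithmetic right shift
        # drops frac_bits of them, rounding toward negative infinity.
        q = [q[i] - ((cq * (q[i] - q[i - 1])) >> frac_bits) for i in range(n)]
    return [v / scale for v in q]

def max_error(frac_bits, n=64, c=0.5, steps=100):
    """Largest deviation of the fixed-point result from the floating-point reference."""
    u0 = [math.sin(2 * math.pi * i / n) for i in range(n)]
    ref = upwind_float(u0, c, steps)
    fix = upwind_fixed(u0, c, steps, frac_bits)
    return max(abs(a - b) for a, b in zip(ref, fix))

for bits in (4, 8, 12, 16, 20):
    print(f"{bits:2d} fraction bits: max error {max_error(bits):.2e}")
```

Because the right shift always rounds toward negative infinity, the rounding error has a consistent sign and accumulates over the time steps; round-to-nearest would reduce this bias, which is one of the trade-offs such a precision study has to weigh against hardware cost.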

FPGA based acceleration of computational fluid flow simulation on unstructured mesh geometry

Publication Name: Proceedings of the 22nd International Conference on Field Programmable Logic and Applications (FPL 2012)

Publication Date: 2012-12-12

Volume: Unknown

Issue: Unknown

Page Range: 128-135

Description:

Numerical simulation of complex, time-evolving computational fluid dynamics problems plays an important role in scientific and engineering applications. The behavior of dynamical systems can be understood accurately through large-scale simulations, which traditionally require expensive supercomputing facilities. In the paper, a Field Programmable Gate Array (FPGA)-based framework is described to accelerate the simulation of complex physical spatio-temporal phenomena. Simulating complicated geometries requires unstructured spatial discretization, which results in irregular memory access patterns that severely limit computing performance. Data locality is improved by a mesh node renumbering technique that results in a sequential memory access pattern. Additionally, storing a small window of cell-centered state values in the on-chip memory of the FPGA can increase data reuse and decrease memory bandwidth requirements. Generating the floating-point data path and the control structure of an arithmetic unit containing dozens of operators is a very challenging task when the goal is a high operating frequency. The efficiency and use of the framework are illustrated by a case study solving the Euler equations on an unstructured mesh using the finite volume technique. On the largest currently available FPGA, the generated architecture contains three processing elements working in parallel, providing a 75-fold speedup compared with a high-performance microprocessor. © 2012 IEEE.

Open Access: Yes

DOI: 10.1109/FPL.2012.6339276
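
To illustrate how a small on-chip window reduces memory traffic after renumbering, the Python sketch below streams renumbered cell values through a bounded buffer (a deque standing in for FPGA on-chip memory) and evaluates each cell only once all of its neighbors are buffered. It assumes the renumbering has already limited every neighbor index to within b of the cell's own index; `stream_with_window` and the `kernel` callback are hypothetical stand-ins for the paper's hardware and its Euler-flux arithmetic unit.

```python
from collections import deque

def stream_with_window(state, adj, b, kernel):
    """Process cells in order, reading each state value from 'off-chip' memory exactly once.

    state:  per-cell values in the renumbered order.
    adj:    neighbor lists in the same numbering; every neighbor j of i satisfies |i - j| <= b.
    kernel: function(own_value, neighbor_values) -> result (hypothetical arithmetic unit).
    """
    n = len(state)
    window = deque(maxlen=2 * b + 1)  # models the on-chip buffer
    first = 0                         # global index of window[0]
    out, reads = [], 0
    for k in range(n + b):
        if k < n:
            if len(window) == window.maxlen:
                first += 1            # the oldest value is evicted by the append below
            window.append(state[k])   # one sequential off-chip read per cell
            reads += 1
        i = k - b                     # cell whose neighbors are now all on chip
        if i >= 0:
            own = window[i - first]
            nbrs = [window[j - first] for j in adj[i]]
            out.append(kernel(own, nbrs))
    return out, reads

# Hypothetical 1-D chain of 10 cells, each coupled to its left and right neighbor (b = 1).
n, b = 10, 1
state = [float(i * i) for i in range(n)]
adj = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]
out, reads = stream_with_window(state, adj, b, lambda own, nbrs: own + sum(nbrs))
direct = [state[i] + sum(state[j] for j in adj[i]) for i in range(n)]
print(out == direct, reads)  # True 10
```

Each state value is fetched once (10 reads here), whereas fetching every neighbor directly from off-chip memory would take 1 + deg(i) reads per cell, 28 reads for this chain.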

Computational fluid flow simulation on body fitted mesh geometry with IBM cell broadband engine architecture

Publication Name: ECCTD 2009: European Conference on Circuit Theory and Design, Conference Program

Publication Date: 2009-12-10

Volume: Unknown

Issue: Unknown

Page Range: 827-830

Description:

The solutions of partial differential equations (PDEs) play a key role in today's real-world simulations. Computational Fluid Dynamics (CFD) is an important part of this area; it involves the problem of gas or fluid flow around various obstacles, e.g., air flow around vehicles or buildings, or the flow of water in the oceans. In engineering applications, the temporal evolution of non-ideal, compressible fluids is quite often modeled by the system of Navier-Stokes equations. They are a coupled set of nonlinear hyperbolic partial differential equations and form a relatively simple, yet efficient model of compressible fluid dynamics. In the paper, the implementation of a CFD solver on body-fitted mesh geometry on the Cell Broadband Engine is described. An arbitrary surface can be simulated more easily on a body-fitted mesh than on a rectangular computational domain. ©2009 IEEE.

Open Access: Yes

DOI: 10.1109/ECCTD.2009.5275111
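
A body-fitted mesh places grid nodes directly on the obstacle's surface. The Python sketch below, a hypothetical example around a circular body rather than the paper's geometry or Cell implementation, builds an O-type grid whose innermost ring of nodes lies on the body and computes, with the shoelace formula, the quadrilateral cell areas a finite volume solver on such a mesh would need. The radii and resolution are illustrative assumptions.

```python
import math

def o_grid(r_body, r_outer, n_r, n_theta):
    """Nodes of an O-type body-fitted grid: nodes[j][i] = (x, y), ring j = 0 on the body."""
    nodes = []
    for j in range(n_r + 1):
        r = r_body + (r_outer - r_body) * j / n_r
        ring = []
        for i in range(n_theta):
            t = 2 * math.pi * i / n_theta
            ring.append((r * math.cos(t), r * math.sin(t)))
        nodes.append(ring)
    return nodes

def polygon_area(p):
    """Shoelace formula; counter-clockwise vertices give a positive area."""
    s = 0.0
    for k in range(len(p)):
        x0, y0 = p[k]
        x1, y1 = p[(k + 1) % len(p)]
        s += x0 * y1 - x1 * y0
    return 0.5 * s

def cell_areas(nodes):
    """Area of every quadrilateral cell; the angular direction wraps around the body."""
    n_theta = len(nodes[0])
    areas = []
    for j in range(len(nodes) - 1):
        for i in range(n_theta):
            i2 = (i + 1) % n_theta
            quad = [nodes[j][i], nodes[j + 1][i], nodes[j + 1][i2], nodes[j][i2]]
            areas.append(polygon_area(quad))
    return areas

nodes = o_grid(r_body=1.0, r_outer=3.0, n_r=8, n_theta=64)
areas = cell_areas(nodes)
print(len(areas), sum(areas), math.pi * (3.0**2 - 1.0**2))
```

With 64 angular divisions the cells cover slightly less than the exact annulus area 8π, because each cell edge is a straight chord; the shortfall shrinks as n_theta grows.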

Supersonic flow simulation on IBM cell processor based emulated digital cellular neural networks

Publication Name: Proceedings of the IEEE International Symposium on Circuits and Systems

Publication Date: 2009-10-26

Volume: Unknown

Issue: Unknown

Page Range: 1225-1228

Description:

In mechanical, aerospace, chemical, and civil engineering, the solution of partial differential equations (PDEs) has long been one of the most important problems of mathematics. In this field, one of the most exciting areas is the simulation of fluid flow, which involves, for example, problems of air, sea, and land vehicle motion. In engineering applications, the temporal evolution of non-ideal, compressible fluids is quite often modeled by the system of Navier-Stokes equations. They are a coupled set of nonlinear hyperbolic partial differential equations and form a relatively simple, yet efficient model of compressible fluid dynamics. Unfortunately, the need for a coupled, multi-layered computational structure with nonlinear, space-variant templates makes it impossible to utilize the huge computing power of analog Cellular Neural Network Universal Machine (CNN-UM) chips. To improve the performance of our solution, an emulated digital CNN-UM implemented on the IBM Cell Broadband Engine has been used. The goal is to perform the operations with the highest possible parallelism. ©2009 IEEE.

Open Access: Yes

DOI: 10.1109/ISCAS.2009.5117983
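
For readers unfamiliar with emulated digital CNNs, the following Python sketch shows the standard single-layer Chua-Yang CNN state equation, dx/dt = -x + A*y + B*u + z with output y = 0.5(|x + 1| - |x - 1|), discretized with a forward-Euler step as a digital emulator might evaluate it. It is a simplified, assumption-based illustration: the flow solver in the paper needs coupled multi-layer, nonlinear, space-variant templates, which this single-layer, linear, space-invariant sketch does not model, and the template values in the usage example are hypothetical.

```python
def cnn_output(x):
    """Standard piecewise-linear CNN output, saturating at -1 and +1."""
    return 0.5 * (abs(x + 1) - abs(x - 1))

def cnn_step(x, u, A, B, z, h):
    """One forward-Euler step of a single-layer CNN with 3x3 templates A (feedback) and B (control).

    x, u: 2-D lists (state and input); cells outside the grid contribute zero (fixed boundary).
    """
    rows, cols = len(x), len(x[0])
    y = [[cnn_output(v) for v in row] for row in x]

    def conv(t, img, i, j):
        # Weighted sum of the 3x3 neighborhood of (i, j), skipping out-of-grid cells.
        s = 0.0
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ii, jj = i + di, j + dj
                if 0 <= ii < rows and 0 <= jj < cols:
                    s += t[di + 1][dj + 1] * img[ii][jj]
        return s

    return [[x[i][j] + h * (-x[i][j] + conv(A, y, i, j) + conv(B, u, i, j) + z)
             for j in range(cols)] for i in range(rows)]

# Hypothetical bistable example: self-feedback of 2 drives each cell toward +2 or -2.
A = [[0, 0, 0], [0, 2, 0], [0, 0, 0]]
B = [[0] * 3 for _ in range(3)]
x = [[0.1, -0.1], [0.3, -0.2]]
u = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(200):
    x = cnn_step(x, u, A, B, z=0.0, h=0.1)
print([[round(v, 3) for v in row] for row in x])  # cells settle near +2 or -2
```

Every cell update depends only on its 3x3 neighborhood from the previous step, which is what makes the emulation amenable to the highly parallel evaluation the abstract aims for.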