We use Par4All to parallelize programs for customers and collaborative research projects but we have also used Par4All on some benchmarks. Here are a few results that we are allowed to communicate.
Stars-PM is a particle-mesh N-body cosmological simulation written in C, from the Observatoire Astronomique de Strasbourg. It uses 3D FFTs, among other things.
The goal is to model gravitational interactions between particles in space. The 3D resolution of these interactions is obtained by interpolation on space discretization.
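To make the idea of interpolating particles onto a discretized space concrete, here is a minimal sketch of nearest-grid-point mass assignment. This is an assumption chosen for illustration: the scheme Stars-PM actually uses is not described here, and the names `cell_index` and `deposit_ngp` are hypothetical. The sketch also assumes a cubic box of side `box` with particle coordinates already in [0, box).

```c
#include <stddef.h>

/* Map a coordinate in [0, box) to a cell index in [0, n). */
static int cell_index(double x, double box, int n)
{
    int i = (int)(x / box * n);
    if (i < 0) i = 0;        /* guard against slightly out-of-range input */
    if (i >= n) i = n - 1;
    return i;
}

/* Hypothetical helper: add the mass of each particle to the grid cell
 * that contains it (nearest-grid-point assignment).
 * pos holds np triples (x, y, z); grid holds n*n*n cells and must be
 * zeroed by the caller. */
void deposit_ngp(const double *pos, size_t np, double mass,
                 double box, int n, double *grid)
{
    for (size_t p = 0; p < np; p++) {
        int ix = cell_index(pos[3 * p + 0], box, n);
        int iy = cell_index(pos[3 * p + 1], box, n);
        int iz = cell_index(pos[3 * p + 2], box, n);
        grid[((size_t)ix * n + iy) * n + iz] += mass;
    }
}
```

In a particle-mesh code, a grid filled this way is typically what the FFT-based solver then operates on. Higher-order schemes spread each particle over several neighbouring cells instead of one.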
We apply the Par4All automatic transformation process to Stars-PM. The sequential version was written in C and has been the subject of a study on manually optimizing it and adapting it to GPU.
This graph shows our results on 20 benchmarks from the Polybench suite, 3 from Rodinia, and the Stars-PM application. Measurements were performed on a machine with 2 Xeon Westmere X5670 processors (12 cores at 2.93 GHz) and an nVidia Tesla C2050 GPU.
The OpenMP versions used for the experiments are generated automatically by the parallelizer and are not manually optimized.
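For readers unfamiliar with OpenMP, the following sketch shows the kind of annotation involved: a loop with independent iterations marked for parallel execution. It is written by hand for illustration and is not actual Par4All output; the function name `scale_add` is hypothetical.

```c
/* Illustrative only: a loop of the kind a parallelizer can annotate.
 * Computes y[i] = a * x[i] + y[i] for i in [0, n). */
void scale_add(int n, float a, const float *x, float *y)
{
    int i;
    /* Each iteration touches a distinct y[i], so iterations can be
     * shared among threads. If the compiler is not run with OpenMP
     * enabled, the pragma is ignored and the loop runs sequentially. */
#pragma omp parallel for
    for (i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

With gcc, OpenMP is enabled with the `-fopenmp` option. The result is the same with or without it; only the execution time differs.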
For more details:
Hyantes is a library to compute neighbourhood population potential with scale control. It is developed by the MESCAL team from the Laboratoire Informatique de Grenoble (France), as part of the HyperCarte project. The HyperCarte project aims to develop new methods for the cartographic representation of human distributions (population density, population increase, etc.) with various smoothing functions and opportunities for time-scale animations of maps. Hyantes provides one of the smoothing methods related to multiscalar neighbourhood density estimation. It is a C library that takes sets of geographic data as input and computes a smoothed representation of this data, taking the neighbourhood's influence into account.
For more information: http://hyantes.gforge.inria.fr
We measure the wall-clock time, which includes startup time, data load time and output write time, that is, the real time experienced by users. Measuring kernel time only would give better speed-ups, but they would be less representative of the real application (Amdahl's law).
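The gap between kernel speed-up and whole-application speed-up follows from Amdahl's law. As a small sketch, the function below computes the overall speed-up from the fraction of sequential run time spent in the accelerated kernel and the speed-up obtained on that kernel alone. The function name and its parameters are illustrative and not part of Par4All.

```c
/* Amdahl's law: overall speed-up when only part of the run time is accelerated.
 * parallel_fraction: share of the sequential wall-clock time spent in the
 *                    accelerated kernel, between 0 and 1.
 * kernel_speedup:    speed-up measured on the kernel alone, greater than 0. */
double amdahl_speedup(double parallel_fraction, double kernel_speedup)
{
    double serial_fraction = 1.0 - parallel_fraction;
    return 1.0 / (serial_fraction + parallel_fraction / kernel_speedup);
}
```

With made-up numbers (not measurements from this page), a kernel that accounts for 90% of the sequential time and runs 10 times faster gives `amdahl_speedup(0.9, 10.0)`, about 5.3 overall: startup, data loading and output writing bound what the user actually sees.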
On a Wild Node with 2 Intel Xeon X5670 @ 2.93GHz (12 cores) and a Tesla C2050 (Fermi), Linux/Ubuntu 10.04, gcc 4.4.3, CUDA 3.1, we measure in double precision:
To test it yourself on the main computational part, go to the examples/P4A/Hyantes directory of Par4All or browse the git repository at https://github.com/Par4All/par4all/tree/p4a/examples/P4A/Hyantes
For mobile users, it is also interesting to show results on a laptop. In single precision (compiled with make USE_FLOAT=1) on an HP EliteBook 8730w laptop (Intel Core2 Extreme Q9300 @ 2.53GHz with 4 cores, and an nVidia Quadro FX 3700M GPU with 16 multiprocessors, 128 cores, architecture 1.1) with Linux/Debian/sid, gcc 4.4.3, CUDA 3.1:
Fresnel is a C program from the Holotetrix company that simulates Fresnel diffraction in holograms before manufacturing.
The application has been tested on various architectures:
Architecture | Cores |
---|---|
Intel Core 2 duo E6600 2.4 GHz | 2 |
Intel Core 2 duo T9400 2.5 GHz | 2 |
2 Intel Core 2 X5472 3 GHz | 8 |
2 Intel Nehalem X5570 2.9 GHz | 8 |
nVidia GTX 200 | 192 |
nVidia Tesla C1060 | 240 |
nVidia Quadro FX 3700M (G92GL) | 128 |
The speed-ups are normalized with one E6600 core as the reference, and the computations are done for different hologram sizes. The results are as follows:
bwaves is a computational fluid dynamics Fortran program that simulates blast waves in three-dimensional transonic transient laminar viscous flow.
More information on the program itself is available at http://www.spec.org/cpu2006/Docs/410.bwaves.html
On a Wild Node with 2 Intel Xeon X5670 @ 2.93GHz (12 cores), we get a speed-up of 4.5.
The classical Hello World in Fortran can be found in the examples/F77_matmul_OpenMP directory of Par4All or in the git repository at https://github.com/Par4All/par4all/tree/p4a/examples/F77_matmul_OpenMP so you can try it yourself.
On a Wild Node with 2 Intel Xeon X5670 @ 2.93GHz (12 cores), we get a speed-up of 12.1 (thanks to cache effects).