Barracuda VR and GPU Supercomputing

Does Barracuda VR come in a GPU version?

In Q4 2013, CPFD Software released Series 16-GPU for both the Barracuda® and Barracuda VR products on Linux. These GPU-parallel codes are new products, redesigned from the fundamental data structures up to exploit GPU acceleration, and they use NVIDIA’s CUDA® language for GPU execution. Customers may purchase Series 16-GPU outright or upgrade existing serial licenses. Windows® GPU and serial (i.e., CPU-only) codes with Series 16 features are still in development, with releases anticipated in the first half of 2014.

What sort of Barracuda VR calculations can be run with Series 16-GPU?

Every type of simulation that can currently be run can also be run on Series 16-GPU. These range from isothermal, non-reacting simulations to thermal, reacting cases with discrete, multi-component particle reactions. However, to take full advantage of the GPU supercomputing capabilities, the entire calculation must fit within the memory on the GPU itself. Because GPUs have less memory than the workstation RAM in any recommended specification, users must monitor project sizes. Once the available GPU memory is exceeded, much of the GPU performance gain is forfeited, and calculations slow significantly.

For an example non-reacting, isothermal simulation, a cell requires approximately 1855 bytes of memory, while a particle requires 490 bytes. Based on this, sample memory usage can be estimated if the cell and computational particle counts are known. Sample memory usage data for an isothermal, non-reacting test case are shown in Table 1. Note that memory usage is higher for thermal, reacting cases. In particular, the more species (gases or particle components) are used, the greater the memory requirements. 
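As a rough illustration of the estimate described above, the following Python sketch multiplies the approximate per-cell and per-particle figures by the counts for a project. The function names, the example cell and particle counts, and the 6 GiB GPU capacity are assumptions made for illustration only; they are not part of Barracuda VR. The figures apply only to the isothermal, non-reacting example, so thermal or reacting cases would need more memory than this estimate suggests.

```python
# Approximate per-item memory for the isothermal, non-reacting example
# quoted in the text. Thermal, reacting cases use more memory.
BYTES_PER_CELL = 1855
BYTES_PER_PARTICLE = 490


def estimate_memory_bytes(num_cells, num_particles):
    """Return the estimated memory footprint in bytes (hypothetical helper)."""
    return num_cells * BYTES_PER_CELL + num_particles * BYTES_PER_PARTICLE


def fits_on_gpu(num_cells, num_particles, gpu_memory_bytes):
    """Return True if the estimate does not exceed the given GPU memory."""
    return estimate_memory_bytes(num_cells, num_particles) <= gpu_memory_bytes


if __name__ == "__main__":
    # Hypothetical project size and GPU capacity, chosen only as an example.
    cells = 500_000
    particles = 5_000_000
    gpu_bytes = 6 * 1024**3  # assumed 6 GiB card

    needed = estimate_memory_bytes(cells, particles)
    print(f"Estimated memory: {needed / 1024**3:.2f} GiB")
    print("Fits on GPU:", fits_on_gpu(cells, particles, gpu_bytes))
```

An estimate like this is only a first check before building a project; it ignores any additional memory the solver or the GPU driver may need.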

What speed-up [iv] is reasonable to expect with Series 16-GPU?

Speed-ups obtained with Series 16-GPU depend on the characteristics of the calculation being run. Isothermal, non-reacting calculations see a greater speed-up than cases with extensive discrete-particle chemistry. Projects with large numbers of computational particles see a greater speed-up than equivalent projects with fewer particles. Computations with a significant number of null cells accelerate more than those with fewer null cells. Very small tests, with few cells and particles, may not experience any speed-up. An up-front guarantee of speed-up is therefore impossible to give for all cases. We have seen speed-ups as low as 2.0x and as high as 10x or greater for the sample industrial cases tested.

However, our testing has revealed general trends for typical calculations. Sample speed-ups are shown in Figure 1 for ten different projects tested. The plot includes a brief description of each project, and the legend indicates whether the calculation was run as isothermal or thermal, and as reacting or non-reacting. In all cases the same computer was used for both the Series 16-GPU and current-release calculations, with no other calculations running concurrently. Figure 1 shows that most calculations experience a speed-up between 3.0x and 5.0x, with 4.0x fairly typical [v].

What does this mean? Practically speaking, an average calculation run with Series 16-GPU will take about 25% of the runtime of the current Barracuda or Barracuda VR solver. A calculation that currently takes one month to run can now be run in about one week, and a calculation that takes a week will be done in a couple of days. Alternatively, for a case that now runs in a day, Series 16-GPU users can run a calculation with a finer resolution of cells and computational particles without waiting longer for the answer.
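The arithmetic behind these figures can be sketched in a few lines of Python. The function names and the 30-day month are assumptions made for illustration; the 4.0x value is the typical speed-up quoted above, and actual speed-ups vary by project.

```python
def accelerated_runtime(baseline_runtime, speedup):
    """Return the expected runtime after applying a speed-up factor.

    baseline_runtime may be in any time unit; the result uses the same unit.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_runtime / speedup


def runtime_fraction(speedup):
    """Return the accelerated runtime as a fraction of the baseline runtime."""
    return accelerated_runtime(1.0, speedup)


if __name__ == "__main__":
    typical_speedup = 4.0  # typical value quoted in the text

    # A month is taken as 30 days here (an assumption for the example).
    for label, days in [("1 month", 30), ("1 week", 7), ("1 day", 1)]:
        new_days = accelerated_runtime(days, typical_speedup)
        print(f"{label}: {days} days -> {new_days} days")
```

At 4.0x, a 30-day run comes out at 7.5 days and a 7-day run at 1.75 days, which matches the "about one week" and "a couple of days" figures in the text.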

Are similar speed-ups expected in future releases?

Efficiency improvements and state-of-the-art GPU acceleration techniques have already produced significant calculation speed-ups, and further speed-ups are expected in future releases. The gains outlined here are for the first release of a GPU-enabled solver. Additional gains are expected as we continue to move more portions of the software onto the GPU. Users who purchase Series 16-GPU are entitled to future enhanced versions as long as their lease or perpetual license maintenance is in force.


[iv] Speed-up is defined as the number of equivalent calculations that could be run with Series 16-GPU in the same time it takes to run one calculation with the current release code (15.2.2). For example, if the calculation takes half the time to run, the speed-up is quoted as 200% or 2.0x.
[v] The speed-ups reported herein are typical based on what is known to CPFD Software staff at the time of writing.