Barracuda VR and GPU Supercomputing
When did Barracuda VR come in a GPU version?
In Q4 2013, CPFD Software released Series 16-GPU for both the Barracuda and Barracuda VR products for Linux operating systems. These GPU-parallel codes were new products, redesigned from the fundamental data structures up to exploit GPU acceleration, and they use NVIDIA’s CUDA® language for GPU execution. Customers could purchase Series 16-GPU outright or upgrade existing serial licenses. Series 16-GPU came out in early 2014 for both Barracuda® and Barracuda VR on Linux or Windows® 64-bit.
What sort of Barracuda VR calculations could be run with Series 16-GPU?
All types of simulations that could previously be run were compatible with Series 16-GPU. These ranged from isothermal, non-reacting simulations to thermal, reacting cases with discrete, multi-component particle reactions. However, to take full advantage of the GPU supercomputing capabilities, the entire calculation must fit within the memory on the GPU itself. Because a GPU has less memory than the RAM recommended for the workstation, users must monitor project sizes to keep the benefit of GPU-enabled performance. Once the available GPU memory is exceeded, much of the GPU performance gain is forfeited and calculations slow significantly.
For an example non-reacting, isothermal simulation on Series 16-GPU, a cell requires approximately 1855 bytes of memory, while a particle requires 490 bytes. Based on this, sample memory usage can be estimated if the cell and computational particle counts are known. Sample memory usage data for an isothermal, non-reacting test case are shown in Table 1. Note that memory usage is higher for thermal, reacting cases. In particular, the more species (gases or particle components) are used, the greater the memory requirements.
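The per-cell and per-particle figures above can be turned into a quick sizing check. The following is a minimal sketch, not part of the Barracuda software: the function names, the example project size, and the 6 GB card capacity are hypothetical and chosen only for illustration. The byte costs apply to isothermal, non-reacting cases, so thermal or reacting projects will need more memory than this estimate suggests.

```python
# Approximate per-item memory costs quoted in the text for an isothermal,
# non-reacting Series 16-GPU simulation.
BYTES_PER_CELL = 1855
BYTES_PER_PARTICLE = 490


def estimate_gpu_memory_bytes(n_cells: int, n_particles: int) -> int:
    """Rough memory estimate in bytes for an isothermal, non-reacting case.

    Thermal or reacting cases need more memory, so treat this as a lower bound.
    """
    if n_cells < 0 or n_particles < 0:
        raise ValueError("cell and particle counts must be non-negative")
    return n_cells * BYTES_PER_CELL + n_particles * BYTES_PER_PARTICLE


def fits_on_gpu(n_cells: int, n_particles: int, gpu_memory_bytes: int) -> bool:
    """Return True if the estimated usage does not exceed the given GPU memory."""
    return estimate_gpu_memory_bytes(n_cells, n_particles) <= gpu_memory_bytes


if __name__ == "__main__":
    # Hypothetical project size and GPU capacity (assumptions for illustration).
    cells, particles = 500_000, 5_000_000
    needed = estimate_gpu_memory_bytes(cells, particles)
    print(f"Estimated usage: {needed / 1e9:.2f} GB")
    print("Fits on a 6 GB card:", fits_on_gpu(cells, particles, 6 * 10**9))
```

Because the estimate ignores any fixed overhead and the extra cost of additional species, it is sensible to leave headroom rather than size a project right up to the card's limit.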
What speed-up [iv] is reasonable to expect with Series 16-GPU?
Speed-ups obtained with Series 16-GPU were dependent upon the characteristics of the calculation being run. Isothermal, non-reacting calculations resulted in a greater speed-up than cases with extensive discrete-particle chemistry. Projects with large numbers of computational particles resulted in a greater speed-up than equivalent projects with fewer particles. Computations with a significant number of null cells accelerated more than those with fewer null cells. Very small tests, with few cells and particles, still may not experience any speed-up. Thus, an up-front guarantee of speed-up is impossible to give for all cases. We have seen speed-ups as low as 2.0x and as high as 10x or greater for sample industrial cases tested.
However, our testing has revealed general trends for typical calculations. Sample speed-ups are shown in Figure 1 for ten different projects tested. The plot includes a brief project description, and the legend can be used to determine whether the calculation was run as isothermal or thermal, and as reacting or non-reacting. In all cases the same computer was used for both calculations, with no other calculations running concurrently. Figure 1 shows that most calculations experience a speed-up between 3.0x and 5.0x, with 4.0x fairly typical [v].
What does this mean? Practically speaking, an average calculation run with Series 16-GPU took 25% of the runtime of the Series 15 Barracuda or Barracuda VR solver. That means a calculation that previously took 1 month to run could be run in about 1 week with Series 16-GPU. A calculation that took a week was done in a couple of days. Alternatively, for a case that previously ran in a day, users with GPU acceleration can now run a calculation with a finer resolution of cells and computational particles without waiting longer for the answer.
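The arithmetic behind these examples follows from the speed-up definition in footnote iv: the GPU runtime is the serial runtime divided by the speed-up factor. The sketch below illustrates that relationship only; the function names are hypothetical, and the 30-day month is an assumption made for the example.

```python
def speedup_factor(serial_runtime: float, gpu_runtime: float) -> float:
    """Speed-up as a multiplier, e.g. half the runtime gives 2.0 (quoted as 200%)."""
    if serial_runtime <= 0 or gpu_runtime <= 0:
        raise ValueError("runtimes must be positive")
    return serial_runtime / gpu_runtime


def gpu_runtime_from_speedup(serial_runtime: float, speedup: float) -> float:
    """Expected GPU runtime, in the same units as serial_runtime."""
    if serial_runtime < 0 or speedup <= 0:
        raise ValueError("runtime must be non-negative and speed-up positive")
    return serial_runtime / speedup


if __name__ == "__main__":
    typical = 4.0  # the "fairly typical" speed-up quoted in the text
    # A month is taken as 30 days here (assumption for illustration).
    for label, days in [("1 month", 30.0), ("1 week", 7.0), ("1 day", 1.0)]:
        gpu_days = gpu_runtime_from_speedup(days, typical)
        print(f"{label}: {days:g} days serial, about {gpu_days:g} days on GPU")
```

At 4.0x, 30 days becomes 7.5 days and 7 days becomes 1.75 days, which matches the "about 1 week" and "couple of days" figures above. Actual results depend on the characteristics of the calculation, as described earlier.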
What about future releases?
The efficiency improvements and the use of state-of-the-art GPU acceleration techniques have resulted in significant calculation speed-ups. However, additional speed-ups are still expected in future releases. The efficiency gains outlined herein are for a first release of a GPU-enabled solver. Additional gains were realized with the GPU version of Series 17 (released Q2 2015), which averaged an additional 50% speed-up over Series 16-GPU. Further gains are still expected as we continue to use the GPU for additional portions of the software. Users who purchase GPU solvers are entitled to future enhanced versions as long as their lease or perpetual license maintenance is in force.
[iv] Speed-up is defined as the number of equivalent calculations that could be run with Series 16-GPU in the same time it takes to run one calculation with the current release code (15.2.2). For example, if the calculation takes half the time to run, the speed-up is quoted as 200% or 2.0x.
[v] The speed-ups reported herein are typical based on what is known to CPFD Software staff at the time of writing.