Conway's Game of Life on GPU using CUDA
Chapter 6: Conclusion
This project presented three different conventional evaluation algorithms for Conway's Game of Life: basic algorithm, bit counting algorithm, and lookup table algorithm. All three algorithms were carefully implemented on both CPU and GPU and thorough benchmark was performed.
The best algorithm for the CPU turned out to be the lookup table that yields speedups up to 45× over the serial CPU. The best algorithm for the GPG was the bit counting with speedup of 480× over the serial CPU. Following section shows detailed graphs and comparison of all algorithms.
Hardware
CPU: Xeon E5530 @ 2.40GHz with 4 cores (8 threads)
GPU: GeForce GTX 580 with 1.5 GB of graphics memory and 512 CUDA cores
Overall performance evaluation
Figure 26 presents absolute evaluation speeds in millions of evaluated cells per second. The GPU is able to evaluate amazing 28 trillion cells per second. This number is hard to imagine. I like to compare fast things with fast things, say with the speed of light – nearly 300,000 km/s, that's 1,000,000,000 km/h (671,000,000 mph).
The distance per evaluated cell dpc
can be computed as:
dpc = c/es = 299,792,458/24,688,686,207 = 0.01214 m
, where dpc
is distance per evaluated cell in meters,c
is speed of light in m/s, andes
is evaluation speed in cells/s.
One cell is evaluated for every 0.01214 meters passed by light! That's 12 mm or 0.48 in! That is quite mind-blowing. Let me know if you have any conventional (no hash-life) algorithm that performs significantly better on comparable hardware.
Last two charts presents speedup over the serial CPU (Figure 27) and over the parallel CPU (Figure 28). The GPU is reaching speedup of 480× over the serial CPU or 235× over the parallel CPU algorithms. I consider those speedups as satisfying results.
Code and supplementary material
You can find full source code published under Public Domain on the GitHub.