OpenCV Based Julia Set by C++, OpenMP, OpenCL and CUDA

When I study CUDA with the book CUDA by example, I found an interesting small program, using computer to generate Julia set image, a kind of fractal image. I have some experience on fractal geometry when I was an undergraduate student and I still have interests on it. I tried the example and found that the CUDA example on that book has some syntax errors.
Based on that example, I created the OpenCL version and OpenMP version.
Allow me to show you the result of this program.


Julia set image


All the related source code was uploaded to my github repository OpenCV_based_JuliaSet_Implementation.
So far, I haven’t tried whether it can be directly compiled on other people’s machine. You know, it takes time to set up OpenCV environment, let alone CUDA or OpenCL.
Anyway, this blog will try to provide necessary support for these programs, click here to view the source code.
I’m considering to give a Pthread version, maybe later…

C++, OpenCV and algorithm

I won’t talk too much about C++. It’s personal but I don’t like C++, if you are into it, it’s fine. I can give you a long list to convince you that C++ is a bad language and trigger the holy war again, but I’m not going to do it now.
Also, it’s personal, C language is the best language in the world!
The C++ version Julia set program defines a class to represent complex variable, and uses C++’s operator overloading character in order to realize the complex variable’s addition and multiplication operation.
Variable “ppercentage” and “percentage” relate to the program progress, DIM defines the size of the output image, if DIM is too large, then it will take some time for the C++ version to complete.
OpenCV is an popular open source computer vision library dominated by Intel. My program only uses it’s data structure “Mat”, which represents a frame of image. The “data” field of “Mat” is a pointer to pixel value array. “Mat” stores the data row-wise,so access the data row-wise consistently will make the most use of cache space in your computer’s memory system.
About the Julia set, I presume that your already have some basic knowledge of it, otherwise, I may wonder what brings you here, for those who don’t, I list the following key word if you are interested in digging into this amazing universe.
Julia set, Mandelbrot set, chaos theory, nonlinear dynamics system
Algorithm of generating Julia set is quite intuitive.
Given the complex formula: Z <= Z^2 + C, where Z and C belongs to set of complex number.
We set C as a fix number, in the program, for example C = (-0.8, 0.156).

1
complex c(-0.8, 0.156);

Then in a image, we move the origin of coordinate to the center of the image, and limit the range of the computation space from -1.5 to 1.5. For example, if we have a image of size 100 x 100, the first pixel (top-left corner of the image) corresponding to the point (-1.5, 1.5) in the computation space, and the last pixel (bottom-right corner of the image) corresponding to the point (1.5, -1.5).
We assume the whole image is a set of points in the complex plane. Let the pixel position represent a complex variable. For example, Re-ranged position (-1.5 ,1,5) represent the complex number Z, say Z = -1.5 + 1.5i.
Now, we know where Z and C comes from, we can start the iteration.
At the first point, Z = (-1.5, 1.5), C = (-0.8, 0.156), feed the number to equation Z <= Z^2 + C. I assume that you know the rule of complex variable addition and multiplication. After the first time’s iteration, we get the new Z = (-0.8, -2.344), and the Euclidean distance of the new Z to the origin point is 6.134336, the distance is less than 5000, so we repeat the above process until the distance is larger than 5000 or the iteration repeated for 200 times.
The amazing part is that for some “Z”s, after a certain times of iteration, the distance can be larger than a definite value, but for some value, they are proved that no matter how many time’s iteration, the distance will never be larger than a definite value.
We set the pixel value of those “Z”s who can escape from the origin point to grayscale 255(white), and those that who cannot to grayscale 0(black).
You can modify the program to generate color image, set the color of a certain position according to its escape time(iteration number).
I should warn you that you had better not set the DIM image size too large, otherwise, it will take a quite long time for the CPU to process the image, maybe tens of minutes or hours or days. The time complexity of this program is quite high.

OpenMP

OpenMP version is quite similar to the C++ version. I should say that OpenMP is a great multi-thread programming frame work. Quite a lot of programs like Julia set generator can be migrate from single core to multi-core without too much extra effort. Thread safe is always a hot topic in multi-thread programming because different thread may share the same memory space. And we don’t need to worry about it in this program because the process on the pixels of this program is independent from other pixels.

1
#pragma omp parallel for private(j) num_threads(6)

This compiler directive tells the compiler that the following for loop will be executed by 6 threads.
If you are using Visual studio IDE, remember to open the multi-thread option in the properity language page. You can search on the google how to enable OpenMP. Otherwise it will not work even though the directive is given.
My way is: go to project property, select C/C++ -> Language, and change “OpenMP Support” to ‘Yes’, Click ok.
Under Linux, remember to apply -fopenmp to g++ when compiling.
The processing speed of OpenMP version should be about 5 to 6 times faster than the C++ version. I remember the processing time for a 20000x20000 image is about 20 minutes, my CPU is Intel(R) Core(TM) i7-6700 @3.4GHz 3.41GHz, 4 cores with 8 threads.

CUDA

As I mentioned above, the main body of CUDA Julia set program comes from the example code of the book CUDA by example, the difference is that my version can run! I don’t know why but quite a lot of GPU programming book’s demo cannot run, or have some low level bugs.
For the CUDA version, the GPU execution parameters such as cuda_thread_num and cuda_block_num things also comes from the above book, which means I didn’t do any optimization.
My desktop has a Nvidia GTX-970 graphic card, from the Wikipedia, the throughput for single precision floating point manipulation is about 3400 GFLOPS.
If you want to run the CUDA version, make sure your environment can support CUDA. I created this program by using an template provided by CUDA developer SDK, it’s a third party template of Visual studio. CUDA program can only be compiled by nvcc compiler.
The processing time for CUDA version is around a few minutes, still, for 20000x20000 image.

OpenCL

I like OpenCL more than CUDA, maybe it’s just because of the feeling of having the ability to control over everything.
OpenCL has the different feature from other framework that is called runtime compiling, the main program read the GPU kernel file and create shader program during the running the process.
Unlike CUDA, OpenCL support GPUs from different vendors such as Intel, AMD, Qualcomm-Adreno, ARM-Mali and of course Nvidia(but Nvidia seems do not want to optimize OpenCL for it’s own good, so Nvidia, fxxk you! Please refer to appendix to figure out the source.).
For OpenCL, the parameters (such as global_size and local size) for GPU execution is highly optimized for my GPU. So, the time for OpenCL version is just about seconds for generating a 20000x20000 image.

Appendix

Linus Torvalds is my idol, not only because his accomplishment, but also because of his appearance! Watch this Youtube video, drag to 48:10 then you will know why he is so charming!


License


The content of this blog itself is licensed under the Creative Commons Attribution 4.0 International License.
CC-BY-SA LICENCES

The containing source code (if applicable) and the source code used to format and display that content is licensed under the Apache License 2.0.
Copyright [2016] [yeephycho]
Licensed under the Apache License, Version 2.0 (the “License”);
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
Apache License 2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an “AS IS” BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
express or implied. See the License for the specific language
governing permissions and limitations under the License.
APACHE LICENCES