    #ifndef __Common_H
    #define __Common_H
    void getvalue(float **, int *);
    void DeviceFunc(float *, int, float *);
    #endif

The code is relatively straightforward. Because it is written in C++, the main function is compiled separately from the CUDA code. This portion of the program runs on the host's standard processor like a regular C++ program and reads in the data from an external file.
The host code then transfers control to the device code in DeviceFunc.cu. Here, memory is allocated on the graphics processing unit and the data is copied to it from the host. It then prepares an execution configuration and launches the kernel, which is the actual Gaussian elimination code run in parallel by the GPU.

Although the 'typical' Gaussian elimination can be done in CUDA, several studies have found that there are more efficient ways of parallelizing the algorithm by making some adjustments. In one study by Xinggao Xia and Jong Chul Lee, rather than clearing only the rows below the pivot, the pivot-column entries of all other rows are reduced to zero. Once every column has been processed, no back-substitution is required. Partial pivoting is also used to improve the accuracy of the answer.
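
To make the modified algorithm concrete, here is a minimal sequential sketch in plain C++ of elimination that clears the pivot column in every other row and uses partial pivoting. It is a CPU reference only, not the kernel from the study or from DeviceFunc.cu. The function name `gaussJordanSolve`, the row-major augmented-matrix layout, and the singularity threshold are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, not part of the original program.
// `a` is an n x (n+1) augmented matrix [A | b] stored in row-major order.
// Returns true and writes the solution to `x`, or returns false if a pivot
// is numerically zero (the matrix looks singular).
bool gaussJordanSolve(std::vector<float> a, std::size_t n, std::vector<float>& x) {
    const std::size_t cols = n + 1;
    assert(a.size() == n * cols);

    for (std::size_t k = 0; k < n; ++k) {
        // Partial pivoting: choose the row at or below k with the largest
        // absolute value in column k.
        std::size_t pivotRow = k;
        for (std::size_t r = k + 1; r < n; ++r) {
            if (std::fabs(a[r * cols + k]) > std::fabs(a[pivotRow * cols + k])) {
                pivotRow = r;
            }
        }
        if (std::fabs(a[pivotRow * cols + k]) < 1e-12f) {
            return false;  // assumed threshold for "singular"
        }
        if (pivotRow != k) {
            for (std::size_t c = 0; c < cols; ++c) {
                std::swap(a[k * cols + c], a[pivotRow * cols + c]);
            }
        }

        // Clear column k in every other row, above and below the pivot.
        // The row updates are independent of each other, which is the work
        // a GPU kernel could distribute across threads.
        for (std::size_t r = 0; r < n; ++r) {
            if (r == k) continue;
            const float factor = a[r * cols + k] / a[k * cols + k];
            for (std::size_t c = k; c < cols; ++c) {
                a[r * cols + c] -= factor * a[k * cols + c];
            }
        }
    }

    // The coefficient part is now diagonal, so no back-substitution is
    // needed: each unknown is a single division.
    x.assign(n, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        x[i] = a[i * cols + n] / a[i * cols + i];
    }
    return true;
}
```

For example, passing the augmented matrix for 2x + y = 5 and x + 3y = 10 with n = 2 should yield x close to 1 and y close to 3, up to float rounding. The inner loop over rows is where a CUDA version would typically assign one thread (or one block) per row, although the exact mapping used in the study is not given here.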