CUDA is a parallel computing architecture developed by NVIDIA. NVIDIA GPUs have a parallel "many-core" architecture, and the GPU as a whole can keep thousands of threads running concurrently across its cores. I installed CUDA 2.1 on my Sony Vaio FZ21M notebook, which has a GeForce 8400M, by using the modded drivers available at http://www.laptopvideo2go.com/
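To make the "thousands of threads" idea concrete, here is a minimal sketch of a CUDA program in which every GPU thread updates one element of a vector (y = a*x + y). The kernel name `saxpy`, the helper `saxpy_on_gpu`, the block size of 256 and the test values are my own choices for illustration, and the code assumes a CUDA-capable GPU with a working driver and the `nvcc` compiler.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Kernel: each thread handles exactly one element, y[i] = a * x[i] + y[i].
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // the last block may be only partly used
        y[i] = a * x[i] + y[i];
}

// Hypothetical host helper (not part of CUDA): copies x and y to the GPU,
// launches the kernel, and copies y back. Returns 0 on success, 1 on error.
int saxpy_on_gpu(int n, float a, const float *x, float *y)
{
    float *d_x = NULL, *d_y = NULL;
    size_t bytes = (size_t)n * sizeof(float);
    int rc = 1;

    if (cudaMalloc((void **)&d_x, bytes) == cudaSuccess &&
        cudaMalloc((void **)&d_y, bytes) == cudaSuccess &&
        cudaMemcpy(d_x, x, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
        cudaMemcpy(d_y, y, bytes, cudaMemcpyHostToDevice) == cudaSuccess) {
        int threads = 256;                          // threads per block (an arbitrary choice)
        int blocks = (n + threads - 1) / threads;   // enough blocks to cover all n elements
        saxpy<<<blocks, threads>>>(n, a, d_x, d_y);

        // The blocking copy back also waits for the kernel to finish.
        if (cudaGetLastError() == cudaSuccess &&
            cudaMemcpy(y, d_y, bytes, cudaMemcpyDeviceToHost) == cudaSuccess)
            rc = 0;
    }
    cudaFree(d_x);                                  // cudaFree(NULL) is a no-op
    cudaFree(d_y);
    return rc;
}

int main()
{
    const int n = 1 << 20;                          // about one million elements, one thread each
    float *x = (float *)malloc(n * sizeof(float));
    float *y = (float *)malloc(n * sizeof(float));
    if (!x || !y) return 1;
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    int rc = saxpy_on_gpu(n, 2.0f, x, y);
    if (rc == 0)
        printf("y[0] = %.1f, y[n-1] = %.1f\n", y[0], y[n - 1]);
    else
        printf("CUDA error: is a CUDA-capable GPU and driver present?\n");

    free(x);
    free(y);
    return rc;
}
```

Saved as `saxpy.cu`, this should build with `nvcc saxpy.cu -o saxpy`. The launch asks for 4096 blocks of 256 threads, so the hardware schedules about a million lightweight threads over however many cores the GPU has.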
CUDA is programmed mainly in C and C++, and there are bindings and third-party compilers for Python and Fortran. The paper Fast N-body Simulation with CUDA describes a CUDA parallel program that runs 50 times as fast as a highly tuned serial implementation of the N-body simulation. Wow, this is amazing! I still remember my parallel implementation on a Cray T3E with 128 Alpha processors, more than 10 years ago.
Well, now I have thousands of threads running on a many-core GPU at my fingertips (plus the dual-core CPU). The CUDA documentation is pretty extensive. Right now I am studying the CUBLAS library for linear algebra and starting to code.
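As a first taste of CUBLAS, here is a minimal sketch that computes y = alpha*x + y on the GPU with the library's SAXPY routine. It assumes the current handle-based API from `cublas_v2.h`, not the older interface that shipped with CUDA 2.1, and the helper name `cublas_saxpy` and the sample values are hypothetical, chosen only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

// Hypothetical helper (not part of CUBLAS): computes y = alpha * x + y on the GPU.
// Returns 0 on success, 1 on any CUDA or CUBLAS error.
int cublas_saxpy(int n, float alpha, const float *x, float *y)
{
    float *d_x = NULL, *d_y = NULL;
    cublasHandle_t handle;
    int rc = 1;

    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS)
        return 1;

    // Allocate device vectors, upload x and y, run SAXPY, download y.
    if (cudaMalloc((void **)&d_x, n * sizeof(float)) == cudaSuccess &&
        cudaMalloc((void **)&d_y, n * sizeof(float)) == cudaSuccess &&
        cublasSetVector(n, sizeof(float), x, 1, d_x, 1) == CUBLAS_STATUS_SUCCESS &&
        cublasSetVector(n, sizeof(float), y, 1, d_y, 1) == CUBLAS_STATUS_SUCCESS &&
        cublasSaxpy(handle, n, &alpha, d_x, 1, d_y, 1) == CUBLAS_STATUS_SUCCESS &&
        cublasGetVector(n, sizeof(float), d_y, 1, y, 1) == CUBLAS_STATUS_SUCCESS)
        rc = 0;

    cudaFree(d_x);              // cudaFree(NULL) is a no-op
    cudaFree(d_y);
    cublasDestroy(handle);
    return rc;
}

int main()
{
    float x[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float y[4] = {10.0f, 20.0f, 30.0f, 40.0f};

    int rc = cublas_saxpy(4, 0.5f, x, y);
    if (rc == 0)
        printf("y = %.1f %.1f %.1f %.1f\n", y[0], y[1], y[2], y[3]);
    else
        printf("CUBLAS error: is a CUDA-capable GPU and driver present?\n");
    return rc;
}
```

Saved as `blas_saxpy.cu`, this should build with `nvcc blas_saxpy.cu -lcublas -o blas_saxpy`. The nice part is that there is no kernel to write: the library picks the launch configuration, and the host code only moves the vectors to and from the device.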
Here is an introduction to CUDA, courtesy of Dr. Dobb's.