GPUs are specialized processors designed for graphics processing. CUDA (Compute Unified Device Architecture) allows general purpose programming on NVIDIA GPUs. CUDA programs launch kernels across a grid of blocks, with each block containing multiple threads that can cooperate. Threads have unique IDs and can access different memory types including shared, global, and constant memory. Applications that map well to this architecture include physics simulations, image processing, and other data-parallel workloads. The future of CUDA includes more general purpose uses through GPGPU and improvements in virtual memory, size, and cooling.