CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model developed by NVIDIA for general-purpose computing on its graphics processing units (GPUs). CUDA allows developers to speed up compute-intensive applications by offloading the parallelizable parts of a computation to the GPU. The CUDA platform includes extensions to C/C++ (CUDA C++), libraries for deep learning (cuDNN), linear algebra (cuBLAS), and other domains, as well as a runtime API and a driver API. This comprehensive ecosystem is a major reason for NVIDIA's dominance in deep learning, scientific computing, and data science.
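The programming model described above can be sketched with the canonical vector-addition example: a `__global__` kernel runs once per thread on the GPU, while host code uses the runtime API to allocate device memory, copy data, and launch the kernel. This is a minimal illustration (compiled with `nvcc`, requiring an NVIDIA GPU), not a complete program with error checking.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Kernel: each thread handles one element of the output.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                 // one million elements
    const size_t bytes = n * sizeof(float);

    // Host buffers
    float *ha = (float *)malloc(bytes);
    float *hb = (float *)malloc(bytes);
    float *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Device buffers and host-to-device copies via the runtime API
    float *da, *db, *dc;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(da, db, dc, n);

    // Copy the result back and spot-check one element
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %.1f\n", hc[0]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```

The `<<<blocks, threads>>>` launch syntax is the core of the model: the developer specifies how the computation maps onto a grid of thread blocks, and the hardware schedules those blocks across the GPU's multiprocessors.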
NVIDIA released the first version of CUDA in 2007, making GPU programming accessible to developers beyond the graphics community. The project grew out of the recognition that GPUs, with their many parallel cores, were well suited to computational tasks beyond rendering graphics. The introduction of CUDA, alongside the G80 architecture that powered the first generation of CUDA-capable GPUs, marked a turning point for the use of GPUs in high-performance computing.
CUDA has become the de facto standard for GPU computing. The major deep learning frameworks, including TensorFlow, PyTorch, and JAX, use CUDA as their primary GPU backend, and their performance is heavily optimized for NVIDIA hardware. This has created a significant "moat" for NVIDIA, as a vast ecosystem of software, libraries, and developer expertise is centered on CUDA. This software advantage is a primary reason competitors have struggled to gain traction in the AI hardware market, despite producing capable hardware.