NVIDIA Corporation

595 repositories

Megatron-LM
Public
Ongoing research training transformer models at scale
transformers model-para large-language-models
Python
•
Other
•3.1k•14k•301•124•Updated Sep 16, 2025
NeMo-Agent-Toolkit
Public
The NVIDIA NeMo Agent toolkit is an open-source library for efficiently connecting and optimizing teams of AI agents.
Python
•
Apache License 2.0
•362•1.4k•53•19•Updated Sep 16, 2025
cccl
Public
CUDA Core Compute Libraries
cpp hpc gpu modern-cpp parallel-computing cuda nvidia gpu-acceleration cuda-kernels gpu-computing
C++
•
Other
•267•1.9k•1.1k•157•Updated Sep 16, 2025
TensorRT-LLM
Public
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
cuda pytorch moe blackwell llm-serving
C++
•
Apache License 2.0
•1.7k•12k•748•388•Updated Sep 16, 2025
NeMo-Guardrails
Public
NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.
Python
•
Other
•538•5.1k•127•59•Updated Sep 16, 2025
cuda-python
Public
CUDA Python: Performance meets Productivity
Python
•
Other
•205•3k•149•13•Updated Sep 16, 2025
Fuser
Public
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
C++
•
Other
•67•354•174•173•Updated Sep 16, 2025
k8s-nim-operator
Public
An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment.
Go
•
Apache License 2.0
•32•126•6•28•Updated Sep 16, 2025
gpu-operator
Public
NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes
kubernetes gpu cuda nvidia
Go
•
Apache License 2.0
•383•2.3k•385•73•Updated Sep 16, 2025
cuda-quantum
Public
C++ and Python support for the CUDA Quantum programming model for heterogeneous quantum-classical workflows
python cpp quantum quantum-computing hacktoberfest quantum-programming-language quantum-algorithms quantum-machine-learning unitaryhack
C++
•
Other
•283•789•389•87•Updated Sep 16, 2025
VisRTX
Public
NVIDIA OptiX based implementation of ANARI
C++
•
Other
•34•263•8•0•Updated Sep 16, 2025
digital-biology-examples
Public
NVIDIA Digital Biology examples for optimized inference and training at scale
48•174•2•2•Updated Sep 16, 2025
QEMU
Public
NVIDIA fork of QEMU
C
•
Other
•3•6•0•0•Updated Sep 16, 2025
TensorRT-Model-Optimizer
Public
A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed.
Python
•
Apache License 2.0
•156•1.4k•109•24•Updated Sep 16, 2025
cutlass
Public
CUDA Templates for Linear Algebra Subroutines
deep-learning cpp nvidia deep-learning-library gpu cuda
C++
•
Other
•1.4k•8.4k•363•53•Updated Sep 15, 2025
NeMo-Skills
Public
A project to improve skills of large language models
Python
•
Apache License 2.0
•98•559•40•8•Updated Sep 15, 2025
Isaac-GR00T
Public
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model for generalized humanoid robot reasoning and skills.
Jupyter Notebook
•
Apache License 2.0
•739•4.9k•89•19•Updated Sep 15, 2025
cloud-native-docs
Public
Documentation repository for NVIDIA Cloud Native Technologies
kubernetes containers kubernetes-operator
PowerShell
•
Apache License 2.0
•28•29•4•15•Updated Sep 15, 2025
nv-ingest
Public
NeMo Retriever extraction is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extraction uses specialized NVIDIA NIM microservices to find, contextualize, and extract text, tables, charts and images that you can use in downstream generative applications.
Python
•
Apache License 2.0
•262•2.7k•90•31•Updated Sep 15, 2025
cuEquivariance
Public
cuEquivariance is a math library that is a collection of low-level primitives and tensor ops to accelerate widely-used models, like DiffDock, MACE, Allegro and NEQUIP, based on equivariant neural networks. Also includes kernels for accelerated structure prediction.
Python
•19•299•4•5•Updated Sep 15, 2025
MatX
Public
An efficient C++17 GPU numerical computing library with Python-like syntax
hpc gpu cuda gpgpu gpu-computing
C++
•
BSD 3-Clause "New" or "Revised" License
•106•1.4k•39•8•Updated Sep 15, 2025
topograph
Public
A toolkit for discovering cluster network topology.
Go
•
Apache License 2.0
•6•68•1•1•Updated Sep 15, 2025
TransformerEngine
Public
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference.
python machine-learning deep-learning gpu cuda pytorch jax fp8
Python
•
Apache License 2.0
•502•2.7k•216•86•Updated Sep 15, 2025
cuda-checkpoint
Public
CUDA checkpoint and restore utility
cuda checkpoint
C
•
Other
•21•367•24•0•Updated Sep 15, 2025
cuopt
Public
GPU accelerated decision optimization
gpu optimization cuda linear-programming
Cuda
•
Apache License 2.0
•71•420•93•25•Updated Sep 15, 2025
cuDecomp
Public
An Adaptive Pencil Decomposition Library for NVIDIA GPUs
fft pencil-decomposition
C++
•
Other
•11•69•0•1•Updated Sep 15, 2025
jitify
Public
A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC).
cuda runtime-compilation single-header jit-compilation cpp nvrtc
C++
•
BSD 3-Clause "New" or "Revised" License
•72•554•20•7•Updated Sep 15, 2025
garak
Public
the LLM vulnerability scanner
ai vulnerability-assessment security-scanners llm-security llm-evaluation
Python
•
Apache License 2.0
•612•5.8k•286•34•Updated Sep 15, 2025
nvshmem
Public
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmers to perform one-sided communication from within CUDA kernels and on CUDA streams.
C++
•
Other
•22•306•3•2•Updated Sep 15, 2025
numba-cuda
Public
The CUDA target for Numba
Python
•
BSD 2-Clause "Simplified" License
•38•185•98•24•Updated Sep 15, 2025