I am a Senior Backend & Distributed Systems Engineer with over 11 years of industry experience. I specialize in building highly scalable, cloud-native infrastructure, with a deep focus on Go, Kubernetes, and LLM inference optimization.
Lately, I've been working at the intersection of MLOps and cloud infrastructure, specifically optimizing distributed LLM serving, prefix-cache-aware scheduling, and Kubernetes accelerator orchestration.
- Languages: Go (Golang), Python, Shell scripting
- Cloud & DevOps: Kubernetes (K8s), Helm, Docker, Microservices Architecture
- Systems & ML Infra: LLM Inference Optimization (vLLM, HMA scheduling), Distributed Systems, Auth (JWT, OAuth)
- LLM Inference Scaling: Actively working on state-of-the-art inference performance using modern accelerators on Kubernetes (llm-d).
- Kubernetes Ecosystem: Designing robust scheduling mechanisms (like Hybrid Memory Attention awareness) and upstreaming features/contributions.
- Open Source: Deeply passionate about giving back to the cloud-native and Kubernetes communities.
- 💼 LinkedIn: linkedin.com/in/kapiljain1989
- 📧 Email: kapiljain1989@gmail.com

