I am a Senior Software Engineer with over 7 years of experience in designing and building large-scale, high-concurrency distributed systems using Java and microservices.
Currently, I'm expanding my expertise into AI Infrastructure and Large Language Model (LLM) Applications, applying my deep engineering background to build robust and efficient systems that power intelligent solutions. I am passionate about creating value at the intersection of solid software architecture and cutting-edge AI.
Currently based in Singapore, I am seeking roles in AI Infrastructure.
My skills cover the full spectrum from foundational backend architecture to modern AI/ML infrastructure.
| Core Java & Distributed Systems | AI/ML & Big Data | 
|---|---|
| Java, Spring Boot, Spring Cloud, MyBatis-Plus | Python, PyTorch, Transformers, DeepSpeed, vLLM, Ray, aiter |
| Microservices, SaaS, Domain-Driven Design (DDD) | LLM Fine-tuning (LoRA), RAG |
| Docker, Kubernetes, DevOps, gRPC, OpenFeign | Vector DB (Milvus, FAISS, Elasticsearch) |
| Kafka, ZooKeeper, Alibaba Nacos, WebSocket, Netty | Delta Lake, Apache Flink, Apache Hudi, Apache Iceberg, Flink-CDC, Prometheus + Grafana |
| MySQL, MongoDB, Neo4j, PostgreSQL, Elasticsearch, Redis, MinIO, SolrCloud, HBase | ELK, Flume, ClickHouse, Drios |
| System Design & Scalable Architecture | MLOps & Inference Optimization | 
Here are some projects that highlight my capabilities across both domains.
- RAG System for Domain-Specific Q&A
  - Engineered a Retrieval-Augmented Generation (RAG) pipeline using Mistral-7B, ChatGPT-o4-mini, and Elasticsearch for a specialized knowledge domain.
  - Optimized the system for real-time interaction through efficient data processing and a high-throughput inference server deployment.
- LLM Inference Acceleration & Fine-tuning
  - Customized open-source LLMs (Mistral, Qwen, Llama) using LoRA fine-tuning techniques on specific datasets.
  - Accelerated model inference significantly using vLLM and FlashAttention, deploying the models as scalable API endpoints on cloud platforms like GCP and Azure.
- Open-source contributions to vLLM, Transformers, DeepSpeed, Triton, Ray, FlashInfer, and PyTorch (ongoing)
- Group-Level Multi-functional Payment Platform
  - Architected and developed a highly available, enterprise-grade payment center using Domain-Driven Design (DDD) and a robust microservices architecture.
  - The system reliably handles millions of transactions over a given period and smoothly processes tens of thousands or more every day, ensuring data consistency and security across payment channels.
- High-Concurrency Instant Messaging (IM) System
  - Built a distributed IM system from the ground up to support millions of concurrent users.
  - Leveraged a tech stack including Spring Boot, WebSocket, Kafka for message queuing, and ZooKeeper for service coordination, achieving high throughput and low latency.
- Enterprise Search & Real-Time Data Platform
  - Designed a high-performance search engine using Elasticsearch and Flink-CDC, capable of indexing and searching billions of records with sub-second latency.
  - Built the underlying real-time data synchronization pipeline, providing a unified data backbone for multiple business units.
- Enterprise Flink Computing Platform
  - Built a unified, enterprise-level Flink cluster that serves as the computing center for the real-time data lake, stats center, and ETL pipelines.
...and more.
I believe in continuous learning and sharing knowledge. I write about my journey in software architecture, distributed systems, and AI on my Medium blog.

