Repositories list
34 repositories
- A highly optimized LLM inference acceleration engine for Llama and its variants.
- TLLM_QMM extracts the quantized kernel implementations from NVIDIA's TensorRT-LLM, removes the NVInfer dependency, and exposes an easy-to-use PyTorch module. We modified the dequantization and weight preprocessing to align with popular quantization algorithms such as AWQ and GPTQ, and combined them with new FP8 quantization.
- norm (Public)
- A React-based web video player
- 🎆 A well-designed local image and video selector for Android
- SERank (Public): An efficient and effective learning to rank algorithm by mining information across ranking candidates. This repository contains the tensorflow implementation of SERank model. The code is developed based on TF-Ranking.
- chaika (Public)
- promate (Public): Graphite On VictoriaMetrics
- cuBERT (Public): Fast implementation of BERT inference directly on NVIDIA (CUDA, CUBLAS) and Intel MKL
- mirror (Public)
- kids (Public)
- zetta-client-java (Public)
- presto-connectors (Public)
- protobuf (Public)
- phabricator (Public)
- arcanist (Public)