quantize

quantize: add imatrix and dataset metadata in GGUF (ggml-org#6658)

0c4d489 · Apr 26, 2024

This branch is 3 commits ahead of, 2357 commits behind ggml-org/llama.cpp:master.

Name	Last commit message	Last commit date
CMakeLists.txt	quantize: add imatrix and dataset metadata in GGUF (ggml-org#6658)	Apr 26, 2024
README.md	chore: Fix markdown warnings (ggml-org#6625)	Apr 12, 2024
quantize.cpp	quantize: add imatrix and dataset metadata in GGUF (ggml-org#6658)	Apr 26, 2024
tests.sh	tests : minor bash stuff (ggml-org#6902)	Apr 25, 2024

README.md

quantize

TODO

Llama 2 7B

Quantization	Bits per Weight (BPW)
Q2_K	3.35
Q3_K_S	3.50
Q3_K_M	3.91
Q3_K_L	4.27
Q4_K_S	4.58
Q4_K_M	4.84
Q5_K_S	5.52
Q5_K_M	5.68
Q6_K	6.56

Llama 2 13B

Quantization	Bits per Weight (BPW)
Q2_K	3.34
Q3_K_S	3.48
Q3_K_M	3.89
Q3_K_L	4.26
Q4_K_S	4.56
Q4_K_M	4.83
Q5_K_S	5.51
Q5_K_M	5.67
Q6_K	6.56

Llama 2 70B

Quantization	Bits per Weight (BPW)
Q2_K	3.40
Q3_K_S	3.47
Q3_K_M	3.85
Q3_K_L	4.19
Q4_K_S	4.53
Q4_K_M	4.80
Q5_K_S	5.50
Q5_K_M	5.65
Q6_K	6.56
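
The bits-per-weight figures in the tables above can be turned into a rough on-disk size estimate with simple arithmetic: size in bytes is approximately the parameter count times BPW divided by 8. The following is a minimal sketch of that calculation, not part of the quantize tool. The helper names and the nominal parameter count of 7e9 are assumptions for illustration; actual model files also carry metadata and may keep some tensors at a different precision, so real sizes will differ somewhat.

```python
def estimated_size_gib(n_params: float, bpw: float) -> float:
    """Approximate weight storage in GiB at `bpw` bits per weight."""
    return n_params * bpw / 8 / 2**30


def ratio_vs_fp16(bpw: float) -> float:
    """Size reduction factor relative to 16-bit weights."""
    return 16.0 / bpw


# A few bits-per-weight values copied from the Llama 2 7B table.
BPW_7B = {"Q2_K": 3.35, "Q4_K_M": 4.84, "Q6_K": 6.56}

# Assumption: a nominal parameter count used only for illustration.
N_PARAMS_7B = 7e9

if __name__ == "__main__":
    for name, bpw in BPW_7B.items():
        size = estimated_size_gib(N_PARAMS_7B, bpw)
        print(f"{name}: ~{size:.2f} GiB, ~{ratio_vs_fp16(bpw):.1f}x smaller than 16-bit")
```

The estimate scales linearly in both inputs, so the same helper applies to the 13B and 70B tables once a parameter count for those models is supplied.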
