OpenVINO™ Model Server

Model Server hosts models and makes them accessible to software components over standard network protocols: a client sends a request to the model server, which performs model inference and sends a response back to the client. Model Server offers many advantages for efficient model deployment:

Remote inference enables using lightweight clients with only the necessary functions to perform API calls to edge or cloud deployments.
Applications are independent of the model framework, hardware device, and infrastructure.
Client applications in any programming language that supports REST or gRPC calls can be used to run inference remotely on the model server.
Clients require fewer updates since client libraries change very rarely.
Model topology and weights are not exposed directly to client applications, making it easier to control access to the model.
Ideal architecture for microservices-based applications and deployments in cloud environments – including Kubernetes and OpenShift clusters.
Efficient resource utilization with horizontal and vertical inference scaling.

OpenVINO™ Model Server (OVMS) is a high-performance system for serving models. Implemented in C++ for scalability and optimized for deployment on Intel architectures. It uses the same API as TensorFlow Serving and KServe while applying OpenVINO for inference execution. Inference service is provided via gRPC or REST API, making deploying new algorithms and AI experiments easy.

In addition, there are included endpoints for generative use cases compatible with OpenAI API and Cohere API.

The models used by the server need to be stored locally or hosted remotely by object storage services. For more details, refer to Preparing Model Repository documentation. Model server works inside Docker containers, on Bare Metal, and in Kubernetes environment. Start using OpenVINO Model Server with a fast-forward serving example from the QuickStart guide or LLM QuickStart guide.

Read release notes to find out what’s new.

Key features:

[NEW] Native Windows support. Check updated deployment guide
[NEW] Text Embeddings compatible with OpenAI API
[NEW] Reranking compatible with Cohere API
[NEW] Efficient Text Generation via OpenAI API
Python code execution
gRPC streaming
MediaPipe graphs serving
Model management - including model versioning and model updates in runtime
Dynamic model inputs
Directed Acyclic Graph Scheduler along with custom nodes in DAG pipelines
Metrics - metrics compatible with Prometheus standard
Support for multiple frameworks, such as TensorFlow, PaddlePaddle and ONNX
Support for AI accelerators

Check full list of features

Note: OVMS has been tested on RedHat, Ubuntu and Windows. Public docker images are stored in:

Dockerhub
RedHat Ecosystem Catalog Binary packages for Linux and Windows are on Github

Run OpenVINO Model Server

A demonstration on how to use OpenVINO Model Server can be found in our quick-start guide for vision use case and LLM text generation.

Check also other instructions:

Preparing model repository

Deployment

Writing client code

Demos

References

Contact

If you have a question, a feature request, or a bug report, feel free to submit a Github issue.

* Other names and brands may be claimed as the property of others.

Name		Name	Last commit message	Last commit date
Latest commit History 2,733 Commits
.github		.github
ci		ci
client		client
demos		demos
docs		docs
external		external
extras		extras
release_files		release_files
src		src
tests		tests
third_party		third_party
tools		tools
.bazelrc		.bazelrc
.bazelversion		.bazelversion
.clang-format		.clang-format
.dockerignore		.dockerignore
.gitignore		.gitignore
BUILD.bazel		BUILD.bazel
Dockerfile.redhat		Dockerfile.redhat
Dockerfile.ubuntu		Dockerfile.ubuntu
Doxyfile		Doxyfile
LICENSE		LICENSE
Makefile		Makefile
MakefileCapi		MakefileCapi
README.md		README.md
WORKSPACE		WORKSPACE
common_settings.bzl		common_settings.bzl
create_package.sh		create_package.sh
distro.bzl		distro.bzl
install_redhat_gpu_drivers.sh		install_redhat_gpu_drivers.sh
install_ubuntu_gpu_drivers.sh		install_ubuntu_gpu_drivers.sh
install_va.sh		install_va.sh
package.json		package.json
prepare_gpu_models.sh		prepare_gpu_models.sh
prepare_llm_models.sh		prepare_llm_models.sh
run_unit_tests.sh		run_unit_tests.sh
security.md		security.md
setupvars.bat		setupvars.bat
setupvars.ps1		setupvars.ps1
spelling-whitelist.txt		spelling-whitelist.txt
windows_build.bat		windows_build.bat
windows_change_test_configs.py		windows_change_test_configs.py
windows_clean_build.bat		windows_clean_build.bat
windows_create_package.bat		windows_create_package.bat
windows_install_build_dependencies.bat		windows_install_build_dependencies.bat
windows_prepare_llm_models.bat		windows_prepare_llm_models.bat
windows_prepare_python.bat		windows_prepare_python.bat
windows_set_ovms_version.py		windows_set_ovms_version.py
windows_setupvars.bat		windows_setupvars.bat
windows_test.bat		windows_test.bat
yarn.lock		yarn.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

OpenVINO™ Model Server

Key features:

Run OpenVINO Model Server

References

Contact

About

Releases 34

Packages

Contributors 54

Languages

License

openvinotoolkit/model_server

Folders and files

Latest commit

History

Repository files navigation

OpenVINO™ Model Server

Key features:

Run OpenVINO Model Server

References

Contact

About

Topics

Resources

License

Security policy

Stars

Watchers

Forks

Releases 34

Packages 0

Contributors 54

Languages

Packages