This project is no longer actively maintained. While existing releases remain available, there are no planned updates, bug fixes, new features, or security patches. Users should be aware that vulnerabilities may not be addressed.
In this example, we show how to serve a Huggingface Transformers model with TorchServe locally using KServe.
Clone the pytorch/serve repository.
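For example:
git clone https://github.com/pytorch/serve.git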
Copy the Transformer_kserve_handler.py handler file to the examples/Huggingface_Transformers folder.
Navigate to examples/Huggingface_Transformers
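From the repository root, these two steps look roughly like the following; the source location of Transformer_kserve_handler.py is an assumption here, so adjust the path to wherever the handler file lives in your checkout:
cp /path/to/Transformer_kserve_handler.py examples/Huggingface_Transformers/
cd examples/Huggingface_Transformers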
Run the following command to download the model
python Download_Transformer_models.py
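Download_Transformer_models.py reads setup_config.json (the same file packaged into the .mar below) to decide which pretrained model and task to download. A minimal sketch of a sequence-classification configuration is shown here; the field values are illustrative assumptions, so check the setup_config.json shipped with the example:
{
  "model_name": "bert-base-uncased",
  "mode": "sequence_classification",
  "do_lower_case": true,
  "num_labels": "2",
  "save_mode": "pretrained",
  "max_length": "150"
}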
torch-model-archiver --model-name BERTSeqClassification --version 1.0 \
--serialized-file Transformer_model/pytorch_model.bin \
--handler ./Transformer_kserve_handler.py \
--extra-files "Transformer_model/config.json,./setup_config.json,./Seq_classification_artifacts/index_to_name.json,./Transformer_handler_generalized.py"
The command will create a BERTSeqClassification.mar file in the current directory.
Move the mar file to model-store
sudo mv BERTSeqClassification.mar /mnt/models/model-store
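If /mnt/models/model-store does not exist yet, create it (along with the config directory used in the next step; both paths are assumed from this example's layout) before running the move above:
sudo mkdir -p /mnt/models/model-store /mnt/models/config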
and use the following config.properties file, placed in /mnt/models/config:
inference_address=http://127.0.0.1:8085
management_address=http://127.0.0.1:8085
metrics_address=http://127.0.0.1:8082
enable_envvars_config=true
install_py_dep_per_model=true
enable_metrics_api=true
service_envelope=kservev2
metrics_mode=prometheus
NUM_WORKERS=1
number_of_netty_threads=4
job_queue_size=10
model_store=/mnt/models/model-store
model_snapshot={"name":"startup.cfg","modelCount":1,"models":{"BERTSeqClassification":{"1.0":{"defaultVersion":true,"marName":"BERTSeqClassification.mar","minWorkers":1,"maxWorkers":5,"batchSize":1,"maxBatchDelay":5000,"responseTimeout":120}}}}
Use bert_bytes_v2.json or bert_tensor_v2.json as the sample request payload.
For new sample text, follow the instructions below
For bytes input, use the tobytes utility:
python tobytes.py --input_text "this year business is good"
For tensor input, use the bert_tokenizer utility:
python bert_tokenizer.py --input_text "this year business is good"
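Both utilities emit a request that follows the KServe v2 (Open Inference Protocol) envelope. A minimal sketch of a bytes-type request is shown here; the input name and id are illustrative, and the exact fields written by tobytes.py may differ slightly from this sketch:
{
  "id": "example-request-id",
  "inputs": [
    {
      "name": "input-0",
      "shape": [1],
      "datatype": "BYTES",
      "data": ["this year business is good"]
    }
  ]
}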
Start TorchServe
torchserve --start --ts-config /mnt/models/config/config.properties --ncs
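To shut TorchServe down again later, run:
torchserve --stop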
To test locally, clone the TorchServe repository and navigate to the kubernetes/kserve/kserve_wrapper folder.
Start the KServe wrapper
python __main__.py
Navigate to kubernetes/kserve/kf_request_json/v2/bert
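For example:
cd kubernetes/kserve/kf_request_json/v2/bert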
Run the following command
curl -v -H "Content-Type: application/json" http://localhost:8080/v2/models/BERTSeqClassification/infer -d @./bert_bytes_v2.json
Expected Output
{"id": "d3b15cad-50a2-4eaf-80ce-8b0a428bd298", "model_name": "BERTSeqClassification", "model_version": "1.0", "outputs": [{"name": "predict", "shape": [], "datatype": "BYTES", "data": ["Not Accepted"]}]}
Run the following command
curl -v -H "Content-Type: application/json" http://localhost:8080/v2/models/BERTSeqClassification/infer -d @./bert_tensor_v2.json
Expected output
{"id": "33abc661-7265-42fc-b7d9-44e5f79a7a67", "model_name": "BERTSeqClassification", "model_version": "1.0", "outputs": [{"name": "predict", "shape": [], "datatype": "BYTES", "data": ["Not Accepted"]}]}