
Commit abcaf1a

Authored by: abishekchiffon, shivamshriwas, harshbafna, dhaniram kshirsagar, dhaniram-kshirsagar
Add request envelope to support Multiserving frameworks (#749)
* Updated image classifier default handler; updated custom resnet and mnist handlers; updated docs
* Documentation update: removed stale Transformer_readme.md
* Updated docs
* Doc restructure and code fixes
* Updated example as per fix in default handlers
* Enhanced base and custom handler examples
* Added missing check for manifest
* Fixed some typos
* Removed commented code
* Refactored BaseHandler
* Added unit tests
* Fixed gitignore in this branch
* Fixed a bug with Image Segmenter
* Updated Object Detector to reuse functionality; consistency
* Fixed pylint errors
* Backwards compat for index_names.json
* Fixed Image Segmenter again
* Made the compat layer in text actually compatible
* Removed batching from text classifier
* Added comments per review
* Addressed doc feedback
* Updated docs about batching
* Initial commit of envelopes
* Got the end-to-end working
* Undid a change for local stuff
* Fixed a few broken tests
* Fixed error introduced due to conflict resolution via web-based merge tool
* Corrected code comment
* Updated object detection & text classification handlers; updated docs
* Fixed python linting errors
* Updated index-to-name JSON for text classifier
* Fixed object detector handler for batch support
* Fixed the batch inference output
* Updated expected output as per new handler changes
* Updated text classification mar name in sanity suite
* Updated text classifier mar name and removed bert scripted models
* Updated model zoo with new text classification url
* Added model_name while registering model in sanity suite
* Updated text classification model name
* Added upgrade option for installing python dependencies in install utils
* Added upgrade option for installing python dependencies and extra numpy package in regression suite
* Refactored pytests in regression suite for better performance and reporting
* Merge upstream
* Got the end-to-end working
* Merge upstream (2)
* Fixed a few broken tests
* Undid a bad merge
* Minor fix in torch-archiver command
* Reverted postprocess removal
* Updated mar files in model zoo to use updated handlers
* Updated regression suite to use updated mar files
* Suppressed pylint warning in UT
* Fixed resnet-152 mar name and expected output
* Updated inference tests data; added tolerance value for resnet152 models
* Added custom handler in vgg11 example (#559)
* Added custom handler for vgg11
* Added readme for vgg11 example
* Fixed typo in readme
* Updated model zoo
* Reverted changes for scripted vgg11 mar file
* Added vgg11 model to regression test suite
* Disabled pylint check in UT
* Updated expected response for vgg11 inference in regression suite
* Updated expected output for densenet scripted
* Fixed bad file format
* Fixed the 'no newman npm' issue in regression test suite, solution suggested in PR 757
* Fixed the metrics bug #772 for test_envelopes

Co-authored-by: Shivam Shriwas <[email protected]>
Co-authored-by: Harsh Bafna <[email protected]>
Co-authored-by: dhaniram kshirsagar <[email protected]>
Co-authored-by: dhaniram-kshirsagar <[email protected]>
Co-authored-by: Henry Tappen <[email protected]>
Co-authored-by: harshbafna <[email protected]>
Co-authored-by: Aaqib <[email protected]>
Co-authored-by: Henry Tappen <[email protected]>
Co-authored-by: Geeta Chauhan <[email protected]>
1 parent 40f4258 commit abcaf1a

File tree

20 files changed: +351 additions, -48 deletions

.gitignore

Lines changed: 3 additions & 1 deletion
@@ -4,6 +4,8 @@ dist/
 *__pycache__*
 *.egg-info/
 .idea
+*htmlcov*
+.coverage
 .github/actions/
 .github/.DS_Store
-.DS_Store
+.DS_Store

docs/default_handlers.md

Lines changed: 4 additions & 1 deletion
@@ -46,4 +46,7 @@ For more details see [examples](https://github.com/pytorch/serve/tree/master/exa
 - [object_detector](https://github.com/pytorch/serve/tree/master/examples/object_detector/index_to_name.json)

 # Contributing
-If you'd like to edit or create a new default_handler class, make sure to run and update the unit tests in [unit_tests](https://github.com/pytorch/serve/tree/master/ts/torch_handler/unit_tests). As always, make sure to run [torchserve_sanity.py](https://github.com/pytorch/serve/tree/master/torchserve_sanity.py) before submitting.
+If you'd like to edit or create a new default_handler class, you need to take the following steps:
+1. Write a new class derived from BaseHandler. Add it as a separate file in `ts/torch_handler/`
+1. Update `model-archiver/model_packaging.py` to add in your class's name
+1. Run and update the unit tests in [unit_tests](https://github.com/pytorch/serve/tree/master/ts/torch_handler/unit_tests). As always, make sure to run [torchserve_sanity.py](https://github.com/pytorch/serve/tree/master/torchserve_sanity.py) before submitting.
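The doc change above describes deriving a custom handler from BaseHandler. A minimal standalone sketch of that pattern follows; the base class here is a simplified stand-in that mirrors the preprocess/inference/postprocess convention, not the real `ts.torch_handler.base_handler.BaseHandler`, and `UpperCaseHandler` is a hypothetical example:

```python
# Standalone sketch of the handler pattern; BaseHandler here is a
# simplified stand-in, not the actual TorchServe class.

class BaseHandler:
    """Stand-in base: wires preprocess -> inference -> postprocess."""
    def initialize(self, context):
        self.initialized = True

    def preprocess(self, data):
        return data

    def inference(self, data):
        return data

    def postprocess(self, data):
        return data

    def handle(self, data, context):
        data = self.preprocess(data)
        data = self.inference(data)
        return self.postprocess(data)


class UpperCaseHandler(BaseHandler):
    """Hypothetical custom handler: upper-cases each text input."""
    def inference(self, data):
        return [s.upper() for s in data]


handler = UpperCaseHandler()
print(handler.handle(["hello", "world"], context=None))  # ['HELLO', 'WORLD']
```

A real handler would override `inference` to run the loaded model; the point of the sketch is that subclasses only replace the stages they need.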

docs/request_envelopes.md

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+# Introduction
+
+Many model serving systems provide a signature for request bodies. Examples include:
+
+- [Seldon](https://docs.seldon.io/projects/seldon-core/en/v1.1.0/graph/protocols.html)
+- [KFServing](https://github.com/kubeflow/kfserving/tree/master/docs)
+- [Google Cloud AI Platform](https://cloud.google.com/ai-platform/prediction/docs/online-predict)
+
+Data scientists use these multi-framework systems to manage deployments of many different models, possibly written in different languages and frameworks. The platforms offer additional analytics on top of model serving, including skew detection, explanations and A/B testing. These platforms need a well-structured signature both to standardize calls across different frameworks and to understand the input data. To simplify support for many frameworks, though, these platforms will simply pass the request body along to the underlying model server.
+
+TorchServe currently has no fixed request body signature. Envelopes allow you to automatically translate from the fixed signature required by your model orchestrator to a flat Python list.
+
+# Usage
+1. When you write a handler, always expect a plain Python list containing data ready to go into `preprocess`. Crucially, you should assume that your handler code looks the same locally and in your model orchestrator.
+1. When you deploy TorchServe behind a model orchestrator, make sure to set the corresponding `service_envelope` in your `config.properties` file. For example, if you're using Google Cloud AI Platform, which has a JSON format, you'd add `service_envelope=json` to your `config.properties` file.
+
+# Contributing
+Add new files under `ts/torch_handler/request_envelope`. Only include one class per file. The key used in `config.properties` will be the name of the .py file you write your class in.
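The new doc above says an envelope translates an orchestrator's fixed signature into a flat Python list for the handler. An illustrative sketch of that idea, using the Google Cloud AI Platform-style `{"instances": [...]}` / `{"predictions": [...]}` body shape; the `JSONEnvelope` class and its method names here are hypothetical, not the actual TorchServe envelope API:

```python
import json

# Sketch of the envelope idea: unwrap an orchestrator's JSON signature
# into a flat list for the handler, then rewrap the handler's outputs.

class JSONEnvelope:
    def __init__(self, handle_fn):
        self._handle_fn = handle_fn  # the wrapped handler entry point

    def handle(self, data, context):
        body = json.loads(data)              # e.g. '{"instances": [1, 2, 3]}'
        flat_inputs = body["instances"]      # flat Python list for the handler
        outputs = self._handle_fn(flat_inputs, context)
        return json.dumps({"predictions": outputs})


def double_handler(inputs, context):
    """Hypothetical handler: plain list in, plain list out."""
    return [x * 2 for x in inputs]


envelope = JSONEnvelope(double_handler)
print(envelope.handle('{"instances": [1, 2, 3]}', None))
# {"predictions": [2, 4, 6]}
```

The handler itself never sees the JSON wrapper, which is what lets the same handler code run both locally and behind an orchestrator.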

frontend/modelarchive/src/main/java/org/pytorch/serve/archive/Manifest.java

Lines changed: 9 additions & 0 deletions
@@ -62,6 +62,7 @@ public static final class Model {
         private String description;
         private String modelVersion;
         private String handler;
+        private String envelope;
         private String requirementsFile;

         public Model() {}
@@ -113,6 +114,14 @@ public String getHandler() {
         public void setHandler(String handler) {
             this.handler = handler;
         }
+
+        public String getEnvelope() {
+            return envelope;
+        }
+
+        public void setEnvelope(String envelope) {
+            this.envelope = envelope;
+        }
     }

     public enum RuntimeType {

frontend/server/src/main/java/org/pytorch/serve/util/ConfigManager.java

Lines changed: 5 additions & 0 deletions
@@ -74,6 +74,7 @@ public final class ConfigManager {
     private static final String TS_MAX_REQUEST_SIZE = "max_request_size";
     private static final String TS_MAX_RESPONSE_SIZE = "max_response_size";
     private static final String TS_DEFAULT_SERVICE_HANDLER = "default_service_handler";
+    private static final String TS_SERVICE_ENVELOPE = "service_envelope";
     private static final String TS_MODEL_SERVER_HOME = "model_server_home";
     private static final String TS_MODEL_STORE = "model_store";
     private static final String TS_SNAPSHOT_STORE = "snapshot_store";
@@ -314,6 +315,10 @@ public String getTsDefaultServiceHandler() {
         return getProperty(TS_DEFAULT_SERVICE_HANDLER, null);
     }

+    public String getTsServiceEnvelope() {
+        return getProperty(TS_SERVICE_ENVELOPE, null);
+    }
+
     public Properties getConfiguration() {
         return (Properties) prop.clone();
     }

frontend/server/src/main/java/org/pytorch/serve/util/codec/ModelRequestEncoder.java

Lines changed: 13 additions & 0 deletions
@@ -43,10 +43,23 @@ protected void encode(ChannelHandlerContext ctx, BaseModelRequest msg, ByteBuf o
                 buf = handler.getBytes(StandardCharsets.UTF_8);
             }

+            // TODO: this might be a bug. If handler isn't specified, this
+            // will repeat the model path
             out.writeInt(buf.length);
             out.writeBytes(buf);

             out.writeInt(request.getGpuId());
+
+            String envelope = request.getEnvelope();
+            if (envelope != null) {
+                buf = envelope.getBytes(StandardCharsets.UTF_8);
+            } else {
+                buf = new byte[0];
+            }
+
+            out.writeInt(buf.length);
+            out.writeBytes(buf);
+
         } else if (msg instanceof ModelInferenceRequest) {
             out.writeByte('I');
             ModelInferenceRequest request = (ModelInferenceRequest) msg;
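The encoder change above writes the envelope name as a length-prefixed UTF-8 field (a 4-byte length, then the bytes, with length 0 when no envelope is set). A small Python sketch of that wire convention, assuming big-endian integers as Netty's `ByteBuf.writeInt` uses by default; the function names are illustrative, not part of either codebase:

```python
import struct

# Sketch of the length-prefixed string field used by the encoder above:
# a 4-byte big-endian length followed by the UTF-8 bytes (length 0 if absent).

def write_string_field(value):
    data = value.encode("utf-8") if value is not None else b""
    return struct.pack(">i", len(data)) + data


def read_string_field(buf, offset=0):
    (length,) = struct.unpack_from(">i", buf, offset)
    start = offset + 4
    return buf[start:start + length].decode("utf-8"), start + length


wire = write_string_field("json")
value, _ = read_string_field(wire)
print(value)  # json
print(write_string_field(None))  # b'\x00\x00\x00\x00'
```

Writing a zero length for the missing envelope keeps the frame layout fixed, so the Python backend can always read the field without negotiating protocol versions.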

frontend/server/src/main/java/org/pytorch/serve/util/messages/ModelLoadModelRequest.java

Lines changed: 6 additions & 0 deletions
@@ -11,6 +11,7 @@ public class ModelLoadModelRequest extends BaseModelRequest {
     private String modelPath;

     private String handler;
+    private String envelope;
     private int batchSize;
     private int gpuId;

@@ -19,6 +20,7 @@ public ModelLoadModelRequest(Model model, int gpuId) {
         this.gpuId = gpuId;
         modelPath = model.getModelDir().getAbsolutePath();
         handler = model.getModelArchive().getManifest().getModel().getHandler();
+        envelope = model.getModelArchive().getManifest().getModel().getEnvelope();
         batchSize = model.getBatchSize();
     }

@@ -30,6 +32,10 @@ public String getHandler() {
         return handler;
     }

+    public String getEnvelope() {
+        return envelope;
+    }
+
     public int getBatchSize() {
         return batchSize;
     }

frontend/server/src/main/java/org/pytorch/serve/wlm/ModelManager.java

Lines changed: 2 additions & 0 deletions
@@ -155,6 +155,8 @@ private ModelArchive createModelArchive(
             archive.getManifest().getModel().setHandler(configManager.getTsDefaultServiceHandler());
         }

+        archive.getManifest().getModel().setEnvelope(configManager.getTsServiceEnvelope());
+
         archive.validate();

         return archive;

test/postman/increased_timeout_inference.json

Lines changed: 1 addition & 1 deletion
@@ -16,4 +16,4 @@
   },
   "tolerance":5
  }
-]
+]

ts/model_loader.py

Lines changed: 64 additions & 39 deletions
@@ -34,7 +34,7 @@ class ModelLoader(object):
     __metaclass__ = ABCMeta

     @abstractmethod
-    def load(self, model_name, model_dir, handler, gpu_id, batch_size):
+    def load(self, model_name, model_dir, handler, gpu_id, batch_size, envelope=None):
         """
         Load model from file.

@@ -43,6 +43,7 @@ def load(self, model_name, model_dir, handler, gpu_id, batch_size):
         :param handler:
         :param gpu_id:
         :param batch_size:
+        :param envelope:
         :return: Model
         """
         # pylint: disable=unnecessary-pass
@@ -54,7 +55,7 @@ class TsModelLoader(ModelLoader):
     TorchServe 1.0 Model Loader
     """

-    def load(self, model_name, model_dir, handler, gpu_id, batch_size):
+    def load(self, model_name, model_dir, handler, gpu_id, batch_size, envelope=None):
         """
         Load TorchServe 1.0 model from file.

@@ -63,6 +64,7 @@ def load(self, model_name, model_dir, handler, gpu_id, batch_size):
         :param handler:
         :param gpu_id:
         :param batch_size:
+        :param envelope:
         :return:
         """
         logging.debug("Loading model - working dir: %s", os.getcwd())
@@ -74,48 +76,71 @@ def load(self, model_name, model_dir, handler, gpu_id, batch_size):
         with open(manifest_file) as f:
             manifest = json.load(f)

+        function_name = None
         try:
-            temp = handler.split(":", 1)
-            module_name = temp[0]
-            function_name = None if len(temp) == 1 else temp[1]
-            if module_name.endswith(".py"):
-                module_name = module_name[:-3]
-            module_name = module_name.split("/")[-1]
-            module = importlib.import_module(module_name)
-        # pylint: disable=unused-variable
-        except ImportError as e:
-            module_name = ".{0}".format(handler)
-            module = importlib.import_module(module_name, 'ts.torch_handler')
-            function_name = None
+            module, function_name = self._load_handler_file(handler)
+        except ImportError:
+            module = self._load_default_handler(handler)

         if module is None:
             raise ValueError("Unable to load module {}, make sure it is added to python path".format(module_name))
-        if function_name is None:
-            function_name = "handle"
-        if hasattr(module, function_name):
-            entry_point = getattr(module, function_name)
-            service = Service(model_name, model_dir, manifest, entry_point, gpu_id, batch_size)

-            service.context.metrics = metrics
-            # initialize model at load time
-            entry_point(None, service.context)
+        envelope_class = None
+        if envelope is not None:
+            envelope_class = self._load_default_envelope(envelope)
+
+        function_name = function_name or "handle"
+        if hasattr(module, function_name):
+            entry_point, initialize_fn = self._get_function_entry_point(module, function_name)
         else:
-            model_class_definitions = list_classes_from_module(module)
-            if len(model_class_definitions) != 1:
-                raise ValueError("Expected only one class in custom service code or a function entry point {}".format(
-                    model_class_definitions))
-
-            model_class = model_class_definitions[0]
-            model_service = model_class()
-            handle = getattr(model_service, "handle")
-            if handle is None:
-                raise ValueError("Expect handle method in class {}".format(str(model_class)))
-
-            service = Service(model_name, model_dir, manifest, model_service.handle, gpu_id, batch_size)
-            initialize = getattr(model_service, "initialize")
-            if initialize is not None:
-                model_service.initialize(service.context)
-            else:
-                raise ValueError("Expect initialize method in class {}".format(str(model_class)))
+            entry_point, initialize_fn = self._get_class_entry_point(module)
+
+        if envelope_class is not None:
+            envelope_instance = envelope_class(entry_point)
+            entry_point = envelope_instance.handle
+
+        service = Service(model_name, model_dir, manifest, entry_point, gpu_id, batch_size)
+        service.context.metrics = metrics
+        initialize_fn(service.context)

         return service
+
+    def _load_handler_file(self, handler):
+        temp = handler.split(":", 1)
+        module_name = temp[0]
+        function_name = None if len(temp) == 1 else temp[1]
+        if module_name.endswith(".py"):
+            module_name = module_name[:-3]
+        module_name = module_name.split("/")[-1]
+        module = importlib.import_module(module_name)
+        return module, function_name
+
+    def _load_default_handler(self, handler):
+        module_name = ".{0}".format(handler)
+        module = importlib.import_module(module_name, 'ts.torch_handler')
+        return module
+
+    def _load_default_envelope(self, envelope):
+        module_name = ".{0}".format(envelope)
+        module = importlib.import_module(module_name, 'ts.torch_handler.request_envelope')
+        envelope_class = list_classes_from_module(module)[0]
+        return envelope_class
+
+    def _get_function_entry_point(self, module, function_name):
+        entry_point = getattr(module, function_name)
+        initialize_fn = lambda ctx: entry_point(None, ctx)
+        return entry_point, initialize_fn
+
+    def _get_class_entry_point(self, module):
+        model_class_definitions = list_classes_from_module(module)
+        if len(model_class_definitions) != 1:
+            raise ValueError("Expected only one class in custom service code or a function entry point {}".format(
+                model_class_definitions))
+
+        model_class = model_class_definitions[0]
+        model_service = model_class()
+
+        if not hasattr(model_service, "handle"):
+            raise ValueError("Expect handle method in class {}".format(str(model_class)))
+
+        return model_service.handle, model_service.initialize
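The refactored `load()` above resolves a handler entry point and then, when an envelope is configured, replaces it with the envelope instance's bound `handle`. A standalone sketch of that wiring, with no TorchServe imports; `EchoEnvelope` and `model_handle` are hypothetical stand-ins for what `_load_default_envelope` and the handler module would provide:

```python
# Standalone sketch of the wrapping flow in the refactored load():
# resolve an entry point, then optionally wrap it in an envelope's handle.

class EchoEnvelope:
    """Stand-in envelope: tags inputs/outputs so the wrapping is visible."""
    def __init__(self, handle_fn):
        self._handle_fn = handle_fn

    def handle(self, data, context):
        unwrapped = [item["body"] for item in data]   # orchestrator -> flat list
        results = self._handle_fn(unwrapped, context)
        return [{"prediction": r} for r in results]   # flat list -> orchestrator


def model_handle(data, context):
    """Hypothetical handler entry point: plain list in, plain list out."""
    return [x + 1 for x in data]


# Mirrors load(): the entry point is replaced by the envelope's bound handle,
# so Service sees a single callable either way.
entry_point = model_handle
envelope_class = EchoEnvelope  # as if _load_default_envelope returned it
if envelope_class is not None:
    entry_point = envelope_class(entry_point).handle

print(entry_point([{"body": 1}, {"body": 2}], None))
# [{'prediction': 2}, {'prediction': 3}]
```

Because the wrapping happens before `Service` is constructed, the rest of the serving path is unchanged whether or not an envelope is in play.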
