Model Deployers

Clipper provides a collection of model deployer modules that simplify deploying trained models to Clipper. For many common use cases, these modules eliminate the need to figure out how to save models or to build custom Docker containers capable of serving the saved models. With these modules, you can deploy models directly from Python to Clipper.

Currently, Clipper provides the following deployer modules:

  1. Arbitrary Python functions
  2. PySpark Models
  3. PyTorch Models
  4. TensorFlow Models
  5. MXNet Models
  6. PyTorch Models exported as an ONNX file with the Caffe2 serving backend (experimental)
  7. Keras Models

These deployers support only functions that can be pickled using Cloudpickle and dependencies that are either pure Python or installable via pip. For reference, use the following flowchart to decide which deployer to use.

digraph foo {
   "Pure Python?" -> "Use python deployer & pkgs_to_install arg" [ label="Yes" ];
   "Pure Python?" -> "Does Clipper provide a deployer?" [ label="No" ];
   "Does Clipper provide a deployer?" -> "Use {PyTorch | TensorFlow | PySpark | ...} deployers" [ label="Yes" ];
   "Does Clipper provide a deployer?" -> "Build your own container" [ label="No" ];
}

Note

You can find additional examples of using model deployers in Clipper’s integration tests.

Pure Python functions

This module supports deploying pure Python function closures to Clipper. A function deployed with this module must take a list of inputs as its sole argument and return a list of strings of exactly the same length. The prediction function takes a list of inputs rather than a single input to give models the opportunity to compute multiple predictions in parallel, which can improve model performance. For example, many models that run on a GPU can significantly improve throughput by batching predictions to better utilize the many parallel cores of the GPU.
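For example, a minimal function satisfying this contract (the echo behavior is purely illustrative) looks like:

def predict(inputs):
    # `inputs` is a list of queries; return exactly one string per query
    return [str(x) for x in inputs]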

In addition, the function must use only pure Python code. More specifically, all of the state captured by the function will be pickled using Cloudpickle, so any state captured by the function must be picklable. Most Python libraries that use C extensions create objects that cannot be pickled. This includes many common machine-learning frameworks such as PySpark, TensorFlow, PyTorch, and Caffe. To deploy models from these frameworks, you will have to use the Clipper-provided containers or create your own Docker containers and call the frameworks' native serialization libraries.

While this deployer will serialize your function, any Python libraries the function depends on must be installed in the container so that the function can be loaded inside the model container. You can specify these libraries using the pkgs_to_install argument. All the packages specified by that argument will be installed in the container with pip before the container starts.
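For example, suppose your prediction function depends on the pip-installable unidecode package (a hypothetical dependency chosen purely for illustration). Given a connected clipper_conn, a sketch of such a deployment:

from clipper_admin.deployers.python import deploy_python_closure

def normalize(inputs):
    # Imported inside the function so the dependency resolves in the
    # container, where pip will have installed it before the model starts
    from unidecode import unidecode
    return [unidecode(x) for x in inputs]

deploy_python_closure(
    clipper_conn,
    name="normalizer",
    version=1,
    input_type="strings",
    func=normalize,
    pkgs_to_install=["unidecode"])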

If your function has dependencies that cannot be installed directly with pip, you will need to build your own container.

clipper_admin.deployers.python.deploy_python_closure(clipper_conn, name, version, input_type, func, base_image='default', labels=None, registry=None, num_replicas=1, batch_size=-1, pkgs_to_install=None)[source]

Deploy an arbitrary Python function to Clipper.

The function should take a list of inputs of the type specified by input_type and return a Python list or numpy array of predictions as strings.

Parameters:
  • clipper_conn (clipper_admin.ClipperConnection()) – A ClipperConnection object connected to a running Clipper cluster.
  • name (str) – The name to be assigned to both the registered application and deployed model.
  • version (str) – The version to assign this model. Versions must be unique on a per-model basis, but may be re-used across different models.
  • input_type (str) – The input_type to be associated with the registered app and deployed model. One of “integers”, “floats”, “doubles”, “bytes”, or “strings”.
  • func (function) – The prediction function. Any state associated with the function will be captured via closure capture and pickled with Cloudpickle.
  • base_image (str, optional) – The base Docker image to build the new model image from. This image should contain all code necessary to run a Clipper model container RPC client.
  • labels (list(str), optional) – A list of strings annotating the model. These are ignored by Clipper and used purely for user annotations.
  • registry (str, optional) – The Docker container registry to push the freshly built model to. Note that if you are running Clipper on Kubernetes, this registry must be accessible to the Kubernetes cluster in order to fetch the container from the registry.
  • num_replicas (int, optional) – The number of replicas of the model to create. The number of replicas for a model can be changed at any time with clipper.ClipperConnection.set_num_replicas().
  • batch_size (int, optional) – The user-defined query batch size for the model. Replicas of the model will attempt to process at most batch_size queries simultaneously. They may process smaller batches if batch_size queries are not immediately available. If the default value of -1 is used, Clipper will adaptively calculate the batch size for individual replicas of this model.
  • pkgs_to_install (list (of strings), optional) – A list of the names of packages to install, using pip, in the container. The names must be strings.

Example

Define a pre-processing function center() and train a model on the pre-processed input:

from clipper_admin import ClipperConnection, DockerContainerManager
from clipper_admin.deployers.python import deploy_python_closure
import numpy as np
import sklearn.linear_model

clipper_conn = ClipperConnection(DockerContainerManager())

# Connect to an already-running Clipper cluster
clipper_conn.connect()

def center(xs):
    means = np.mean(xs, axis=0)
    return xs - means

# Loading the training data (xs, ys) omitted...
centered_xs = center(xs)
model = sklearn.linear_model.LogisticRegression()
model.fit(centered_xs, ys)

# Note that this function accesses the trained model via closure capture,
# rather than having the model passed in as an explicit argument.
def centered_predict(inputs):
    centered_inputs = center(inputs)
    # model.predict returns a list of predictions
    preds = model.predict(centered_inputs)
    return [str(p) for p in preds]

deploy_python_closure(
    clipper_conn,
    name="example",
    version=1,
    input_type="doubles",
    func=centered_predict)
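Deploying a model does not by itself create a queryable REST endpoint; an application must also be registered and linked to the model. A minimal sketch of that follow-up step (the application name, default output, and SLO below are illustrative):

import json
import requests

clipper_conn.register_application(
    name="example",
    input_type="doubles",
    default_output="-1.0",
    slo_micros=100000)
clipper_conn.link_model_to_app(app_name="example", model_name="example")

# Query the application's REST endpoint
addr = clipper_conn.get_query_addr()
response = requests.post(
    "http://%s/example/predict" % addr,
    headers={"Content-Type": "application/json"},
    data=json.dumps({"input": [1.1, 2.2, 3.3]}))
print(response.json())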
clipper_admin.deployers.python.create_endpoint(clipper_conn, name, input_type, func, default_output='None', version=1, slo_micros=3000000, labels=None, registry=None, base_image='default', num_replicas=1, batch_size=-1, pkgs_to_install=None)[source]

Registers an application and deploys the provided predict function as a model.

Parameters:
  • clipper_conn (clipper_admin.ClipperConnection()) – A ClipperConnection object connected to a running Clipper cluster.
  • name (str) – The name to be assigned to both the registered application and deployed model.
  • input_type (str) – The input_type to be associated with the registered app and deployed model. One of “integers”, “floats”, “doubles”, “bytes”, or “strings”.
  • func (function) – The prediction function. Any state associated with the function will be captured via closure capture and pickled with Cloudpickle.
  • default_output (str, optional) – The default output for the application. The default output will be returned whenever an application is unable to receive a response from a model within the specified query latency SLO (service level objective). The reason the default output was returned is always provided as part of the prediction response object. Defaults to “None”.
  • version (str, optional) – The version to assign this model. Versions must be unique on a per-model basis, but may be re-used across different models.
  • slo_micros (int, optional) – The query latency objective for the application in microseconds. This is the processing latency between Clipper receiving a request and sending a response. It does not account for network latencies before a request is received or after a response is sent. If Clipper cannot process a query within the latency objective, the default output is returned. Therefore, it is recommended that the SLO not be set aggressively low unless absolutely necessary. 100000 (100ms) is a good starting value, but the optimal latency objective will vary depending on the application.
  • labels (list(str), optional) – A list of strings annotating the model. These are ignored by Clipper and used purely for user annotations.
  • registry (str, optional) – The Docker container registry to push the freshly built model to. Note that if you are running Clipper on Kubernetes, this registry must be accessible to the Kubernetes cluster in order to fetch the container from the registry.
  • base_image (str, optional) – The base Docker image to build the new model image from. This image should contain all code necessary to run a Clipper model container RPC client.
  • num_replicas (int, optional) – The number of replicas of the model to create. The number of replicas for a model can be changed at any time with clipper.ClipperConnection.set_num_replicas().
  • batch_size (int, optional) – The user-defined query batch size for the model. Replicas of the model will attempt to process at most batch_size queries simultaneously. They may process smaller batches if batch_size queries are not immediately available. If the default value of -1 is used, Clipper will adaptively calculate the batch size for individual replicas of this model.
  • pkgs_to_install (list (of strings), optional) – A list of the names of packages to install, using pip, in the container. The names must be strings.
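Example

create_endpoint combines application registration, model deployment, and model linking into a single call. A minimal sketch (the application name and prediction logic are illustrative):

from clipper_admin.deployers.python import create_endpoint

def predict(inputs):
    # Each input is an array of doubles; return one string per input
    return [str(sum(x)) for x in inputs]

create_endpoint(
    clipper_conn,
    name="sum-app",
    input_type="doubles",
    func=predict,
    default_output="-1.0",
    slo_micros=100000)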

PySpark Models

The PySpark model deployer module provides a small extension to the Python closure deployer to allow you to deploy Python functions that include PySpark models as part of the state. PySpark models cannot be pickled and so they break the Python closure deployer. Instead, they must be saved using the native PySpark save and load APIs. To get around this limitation, the PySpark model deployer introduces two changes to the Python closure deployer discussed above.

First, a function deployed with this module takes two additional arguments: a PySpark SparkSession object and a PySpark model object, along with a list of inputs as provided to the Python closures in the deployers.python module. It must still return a list of strings of the same length as the list of inputs.

Second, the pyspark.deploy_pyspark_model and pyspark.create_endpoint deployment methods introduce two additional arguments:

  • pyspark_model: A PySpark model object. This model will be serialized using the native PySpark serialization API and loaded into the deployed model container. The model container creates a long-lived SparkSession when it is first initialized and uses that to load this model once at initialization time. The long-lived SparkSession and loaded model are provided by the container as arguments to the prediction function each time the model container receives a new prediction request.
  • sc: The current SparkContext. The PySpark model serialization API requires the SparkContext as an argument.

The effect of these two changes is to allow the deployed prediction function to capture all pure Python state through closure capture but explicitly declare the additional PySpark state which must be saved and loaded through a separate process.

clipper_admin.deployers.pyspark.deploy_pyspark_model(clipper_conn, name, version, input_type, func, pyspark_model, sc, base_image='default', labels=None, registry=None, num_replicas=1, batch_size=-1, pkgs_to_install=None)[source]

Deploy a Python function with a PySpark model.

The function must take 3 arguments (in order): a SparkSession, the PySpark model, and a list of inputs. It must return a list of strings of the same length as the list of inputs.

Parameters:
  • clipper_conn (clipper_admin.ClipperConnection()) – A ClipperConnection object connected to a running Clipper cluster.
  • name (str) – The name to be assigned to both the registered application and deployed model.
  • version (str) – The version to assign this model. Versions must be unique on a per-model basis, but may be re-used across different models.
  • input_type (str) – The input_type to be associated with the registered app and deployed model. One of “integers”, “floats”, “doubles”, “bytes”, or “strings”.
  • func (function) – The prediction function. Any state associated with the function will be captured via closure capture and pickled with Cloudpickle.
  • pyspark_model (pyspark.mllib.* or pyspark.ml.pipeline.PipelineModel object) – The PySpark model to save.
  • sc (SparkContext) – The current SparkContext. This is needed to save the PySpark model.
  • base_image (str, optional) – The base Docker image to build the new model image from. This image should contain all code necessary to run a Clipper model container RPC client.
  • labels (list(str), optional) – A list of strings annotating the model. These are ignored by Clipper and used purely for user annotations.
  • registry (str, optional) – The Docker container registry to push the freshly built model to. Note that if you are running Clipper on Kubernetes, this registry must be accessible to the Kubernetes cluster in order to fetch the container from the registry.
  • num_replicas (int, optional) – The number of replicas of the model to create. The number of replicas for a model can be changed at any time with clipper.ClipperConnection.set_num_replicas().
  • batch_size (int, optional) – The user-defined query batch size for the model. Replicas of the model will attempt to process at most batch_size queries simultaneously. They may process smaller batches if batch_size queries are not immediately available. If the default value of -1 is used, Clipper will adaptively calculate the batch size for individual replicas of this model.
  • pkgs_to_install (list (of strings), optional) – A list of the names of packages to install, using pip, in the container. The names must be strings.

Example

Define a pre-processing function shift() to normalize prediction inputs:

from clipper_admin import ClipperConnection, DockerContainerManager
from clipper_admin.deployers.pyspark import deploy_pyspark_model
from pyspark.mllib.classification import LogisticRegressionWithSGD
from pyspark.sql import SparkSession
import numpy as np

spark = SparkSession.builder.appName("example").getOrCreate()

sc = spark.sparkContext

clipper_conn = ClipperConnection(DockerContainerManager())

# Connect to an already-running Clipper cluster
clipper_conn.connect()

# Loading a training dataset omitted...
model = LogisticRegressionWithSGD.train(trainRDD, iterations=10)

def shift(x):
    return x - np.mean(x)

# Note that this function accesses the trained PySpark model via an explicit
# argument, but other state can be captured via closure capture if necessary.
def predict(spark, model, inputs):
    return [str(model.predict(shift(x))) for x in inputs]

deploy_pyspark_model(
    clipper_conn,
    name="example",
    version=1,
    input_type="doubles",
    func=predict,
    pyspark_model=model,
    sc=sc)
clipper_admin.deployers.pyspark.create_endpoint(clipper_conn, name, input_type, func, pyspark_model, sc, default_output='None', version=1, slo_micros=3000000, labels=None, registry=None, base_image='default', num_replicas=1, batch_size=-1, pkgs_to_install=None)[source]

Registers an app and deploys the provided predict function with a PySpark model as a Clipper model.

Parameters:
  • clipper_conn (clipper_admin.ClipperConnection()) – A ClipperConnection object connected to a running Clipper cluster.
  • name (str) – The name to be assigned to both the registered application and deployed model.
  • input_type (str) – The input_type to be associated with the registered app and deployed model. One of “integers”, “floats”, “doubles”, “bytes”, or “strings”.
  • func (function) – The prediction function. Any state associated with the function will be captured via closure capture and pickled with Cloudpickle.
  • pyspark_model (pyspark.mllib.* or pyspark.ml.pipeline.PipelineModel object) – The PySpark model to save.
  • sc (SparkContext) – The current SparkContext. This is needed to save the PySpark model.
  • default_output (str, optional) – The default output for the application. The default output will be returned whenever an application is unable to receive a response from a model within the specified query latency SLO (service level objective). The reason the default output was returned is always provided as part of the prediction response object. Defaults to “None”.
  • version (str, optional) – The version to assign this model. Versions must be unique on a per-model basis, but may be re-used across different models.
  • slo_micros (int, optional) – The query latency objective for the application in microseconds. This is the processing latency between Clipper receiving a request and sending a response. It does not account for network latencies before a request is received or after a response is sent. If Clipper cannot process a query within the latency objective, the default output is returned. Therefore, it is recommended that the SLO not be set aggressively low unless absolutely necessary. 100000 (100ms) is a good starting value, but the optimal latency objective will vary depending on the application.
  • labels (list(str), optional) – A list of strings annotating the model. These are ignored by Clipper and used purely for user annotations.
  • registry (str, optional) – The Docker container registry to push the freshly built model to. Note that if you are running Clipper on Kubernetes, this registry must be accessible to the Kubernetes cluster in order to fetch the container from the registry.
  • base_image (str, optional) – The base Docker image to build the new model image from. This image should contain all code necessary to run a Clipper model container RPC client.
  • num_replicas (int, optional) – The number of replicas of the model to create. The number of replicas for a model can be changed at any time with clipper.ClipperConnection.set_num_replicas().
  • batch_size (int, optional) – The user-defined query batch size for the model. Replicas of the model will attempt to process at most batch_size queries simultaneously. They may process smaller batches if batch_size queries are not immediately available. If the default value of -1 is used, Clipper will adaptively calculate the batch size for individual replicas of this model.
  • pkgs_to_install (list (of strings), optional) – A list of the names of packages to install, using pip, in the container. The names must be strings.

PyTorch Models

Similar to the PySpark deployer, the PyTorch deployer provides a small extension to the Python closure deployer to allow you to deploy Python functions that include PyTorch models.

For PyTorch, Clipper will serialize the model using torch.save and load it with torch.load. The model is expected to have a forward method so that it can be called as model(input) to compute predictions.
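For instance, any standard nn.Module satisfies this contract. The toy module below is purely illustrative; the key point is that it defines forward and is therefore callable as model(input):

import torch
from torch import nn

class TwoLayerNet(nn.Module):
    def __init__(self):
        super(TwoLayerNet, self).__init__()
        self.fc1 = nn.Linear(4, 8)
        self.fc2 = nn.Linear(8, 1)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = TwoLayerNet()
output = model(torch.randn(3, 4))  # invokes forward()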

clipper_admin.deployers.pytorch.deploy_pytorch_model(clipper_conn, name, version, input_type, func, pytorch_model, base_image='default', labels=None, registry=None, num_replicas=1, batch_size=-1, pkgs_to_install=None)[source]

Deploy a Python function with a PyTorch model.

Parameters:
  • clipper_conn (clipper_admin.ClipperConnection()) – A ClipperConnection object connected to a running Clipper cluster.
  • name (str) – The name to be assigned to both the registered application and deployed model.
  • version (str) – The version to assign this model. Versions must be unique on a per-model basis, but may be re-used across different models.
  • input_type (str) – The input_type to be associated with the registered app and deployed model. One of “integers”, “floats”, “doubles”, “bytes”, or “strings”.
  • func (function) – The prediction function. Any state associated with the function will be captured via closure capture and pickled with Cloudpickle.
  • pytorch_model (PyTorch model object) – The PyTorch model to save.
  • base_image (str, optional) – The base Docker image to build the new model image from. This image should contain all code necessary to run a Clipper model container RPC client.
  • labels (list(str), optional) – A list of strings annotating the model. These are ignored by Clipper and used purely for user annotations.
  • registry (str, optional) – The Docker container registry to push the freshly built model to. Note that if you are running Clipper on Kubernetes, this registry must be accessible to the Kubernetes cluster in order to fetch the container from the registry.
  • num_replicas (int, optional) – The number of replicas of the model to create. The number of replicas for a model can be changed at any time with clipper.ClipperConnection.set_num_replicas().
  • batch_size (int, optional) – The user-defined query batch size for the model. Replicas of the model will attempt to process at most batch_size queries simultaneously. They may process smaller batches if batch_size queries are not immediately available. If the default value of -1 is used, Clipper will adaptively calculate the batch size for individual replicas of this model.
  • pkgs_to_install (list (of strings), optional) – A list of the names of packages to install, using pip, in the container. The names must be strings.

Example

Define a PyTorch nn module and deploy the model:

from clipper_admin import ClipperConnection, DockerContainerManager
from clipper_admin.deployers.pytorch import deploy_pytorch_model
import numpy as np
import torch
from torch import nn

clipper_conn = ClipperConnection(DockerContainerManager())

# Connect to an already-running Clipper cluster
clipper_conn.connect()
model = nn.Linear(1, 1)

# Define a shift function to normalize prediction inputs
def shift(x):
    x = np.array(x, dtype=np.float32)
    return x - np.mean(x)

def predict(model, inputs):
    # Convert the raw inputs to a (batch, 1) tensor before calling the model
    tensor = torch.from_numpy(shift(inputs)).view(-1, 1)
    pred = model(tensor)
    return [str(x) for x in pred.data.numpy().flatten()]

deploy_pytorch_model(
    clipper_conn,
    name="example",
    version=1,
    input_type="doubles",
    func=predict,
    pytorch_model=model)
clipper_admin.deployers.pytorch.create_endpoint(clipper_conn, name, input_type, func, pytorch_model, default_output='None', version=1, slo_micros=3000000, labels=None, registry=None, base_image='default', num_replicas=1, batch_size=-1, pkgs_to_install=None)[source]

Registers an app and deploys the provided predict function with a PyTorch model as a Clipper model.

Parameters:
  • clipper_conn (clipper_admin.ClipperConnection()) – A ClipperConnection object connected to a running Clipper cluster.
  • name (str) – The name to be assigned to both the registered application and deployed model.
  • input_type (str) – The input_type to be associated with the registered app and deployed model. One of “integers”, “floats”, “doubles”, “bytes”, or “strings”.
  • func (function) – The prediction function. Any state associated with the function will be captured via closure capture and pickled with Cloudpickle.
  • pytorch_model (PyTorch model object) – The PyTorch model to save.
  • default_output (str, optional) – The default output for the application. The default output will be returned whenever an application is unable to receive a response from a model within the specified query latency SLO (service level objective). The reason the default output was returned is always provided as part of the prediction response object. Defaults to “None”.
  • version (str, optional) – The version to assign this model. Versions must be unique on a per-model basis, but may be re-used across different models.
  • slo_micros (int, optional) – The query latency objective for the application in microseconds. This is the processing latency between Clipper receiving a request and sending a response. It does not account for network latencies before a request is received or after a response is sent. If Clipper cannot process a query within the latency objective, the default output is returned. Therefore, it is recommended that the SLO not be set aggressively low unless absolutely necessary. 100000 (100ms) is a good starting value, but the optimal latency objective will vary depending on the application.
  • labels (list(str), optional) – A list of strings annotating the model. These are ignored by Clipper and used purely for user annotations.
  • registry (str, optional) – The Docker container registry to push the freshly built model to. Note that if you are running Clipper on Kubernetes, this registry must be accessible to the Kubernetes cluster in order to fetch the container from the registry.
  • base_image (str, optional) – The base Docker image to build the new model image from. This image should contain all code necessary to run a Clipper model container RPC client.
  • num_replicas (int, optional) – The number of replicas of the model to create. The number of replicas for a model can be changed at any time with clipper.ClipperConnection.set_num_replicas().
  • batch_size (int, optional) – The user-defined query batch size for the model. Replicas of the model will attempt to process at most batch_size queries simultaneously. They may process smaller batches if batch_size queries are not immediately available. If the default value of -1 is used, Clipper will adaptively calculate the batch size for individual replicas of this model.
  • pkgs_to_install (list (of strings), optional) – A list of the names of packages to install, using pip, in the container. The names must be strings.

TensorFlow Models

Similar to the PySpark deployer, the TensorFlow deployer provides a small extension to the Python closure deployer to allow you to deploy Python functions that include TensorFlow models.

For TensorFlow, Clipper will save the TensorFlow session. Alternatively, you can provide the path to an existing saved model instead of a session.

clipper_admin.deployers.tensorflow.deploy_tensorflow_model(clipper_conn, name, version, input_type, func, tf_sess_or_saved_model_path, base_image='default', labels=None, registry=None, num_replicas=1, batch_size=-1, pkgs_to_install=None)[source]

Deploy a Python prediction function with a TensorFlow session or a saved TensorFlow model.

Parameters:
  • clipper_conn (clipper_admin.ClipperConnection()) – A ClipperConnection object connected to a running Clipper cluster.
  • name (str) – The name to be assigned to both the registered application and deployed model.
  • version (str) – The version to assign this model. Versions must be unique on a per-model basis, but may be re-used across different models.
  • input_type (str) – The input_type to be associated with the registered app and deployed model. One of “integers”, “floats”, “doubles”, “bytes”, or “strings”.
  • func (function) – The prediction function. Any state associated with the function will be captured via closure capture and pickled with Cloudpickle.
  • tf_sess_or_saved_model_path (tensorflow.python.client.session.Session or str) – The TensorFlow session to save, or the path to an existing saved model.
  • base_image (str, optional) – The base Docker image to build the new model image from. This image should contain all code necessary to run a Clipper model container RPC client.
  • labels (list(str), optional) – A list of strings annotating the model. These are ignored by Clipper and used purely for user annotations.
  • registry (str, optional) – The Docker container registry to push the freshly built model to. Note that if you are running Clipper on Kubernetes, this registry must be accessible to the Kubernetes cluster in order to fetch the container from the registry.
  • num_replicas (int, optional) – The number of replicas of the model to create. The number of replicas for a model can be changed at any time with clipper.ClipperConnection.set_num_replicas().
  • batch_size (int, optional) – The user-defined query batch size for the model. Replicas of the model will attempt to process at most batch_size queries simultaneously. They may process smaller batches if batch_size queries are not immediately available. If the default value of -1 is used, Clipper will adaptively calculate the batch size for individual replicas of this model.
  • pkgs_to_install (list (of strings), optional) – A list of the names of packages to install, using pip, in the container. The names must be strings.

Example

Save and deploy a TensorFlow session:

from clipper_admin import ClipperConnection, DockerContainerManager
from clipper_admin.deployers.tensorflow import deploy_tensorflow_model

clipper_conn = ClipperConnection(DockerContainerManager())

# Connect to an already-running Clipper cluster
clipper_conn.connect()

# `sess` is assumed to be an existing tf.Session containing the trained
# graph, with tensors named 'predict_class:0' and 'pixels:0'
def predict(sess, inputs):
    preds = sess.run('predict_class:0', feed_dict={'pixels:0': inputs})
    return [str(p) for p in preds]

deploy_tensorflow_model(
    clipper_conn,
    name="example",
    version=1,
    input_type="doubles",
    func=predict,
    tf_sess_or_saved_model_path=sess)
clipper_admin.deployers.tensorflow.create_endpoint(clipper_conn, name, input_type, func, tf_sess_or_saved_model_path, default_output='None', version=1, slo_micros=3000000, labels=None, registry=None, base_image='default', num_replicas=1, batch_size=-1, pkgs_to_install=None)[source]

Registers an app and deploys the provided predict function with a TensorFlow model as a Clipper model.

Parameters:
  • clipper_conn (clipper_admin.ClipperConnection()) – A ClipperConnection object connected to a running Clipper cluster.
  • name (str) – The name to be assigned to both the registered application and deployed model.
  • input_type (str) – The input_type to be associated with the registered app and deployed model. One of “integers”, “floats”, “doubles”, “bytes”, or “strings”.
  • func (function) – The prediction function. Any state associated with the function will be captured via closure capture and pickled with Cloudpickle.
  • tf_sess_or_saved_model_path (tensorflow.python.client.session.Session or str) – The TensorFlow session to save, or the path to an existing saved model.
  • default_output (str, optional) – The default output for the application. The default output will be returned whenever an application is unable to receive a response from a model within the specified query latency SLO (service level objective). The reason the default output was returned is always provided as part of the prediction response object. Defaults to “None”.
  • version (str, optional) – The version to assign this model. Versions must be unique on a per-model basis, but may be re-used across different models.
  • slo_micros (int, optional) – The query latency objective for the application in microseconds. This is the processing latency between Clipper receiving a request and sending a response. It does not account for network latencies before a request is received or after a response is sent. If Clipper cannot process a query within the latency objective, the default output is returned. Therefore, it is recommended that the SLO not be set aggressively low unless absolutely necessary. 100000 (100ms) is a good starting value, but the optimal latency objective will vary depending on the application.
  • labels (list(str), optional) – A list of strings annotating the model. These are ignored by Clipper and used purely for user annotations.
  • registry (str, optional) – The Docker container registry to push the freshly built model to. Note that if you are running Clipper on Kubernetes, this registry must be accessible to the Kubernetes cluster in order to fetch the container from the registry.
  • base_image (str, optional) – The base Docker image to build the new model image from. This image should contain all code necessary to run a Clipper model container RPC client.
  • num_replicas (int, optional) – The number of replicas of the model to create. The number of replicas for a model can be changed at any time with clipper.ClipperConnection.set_num_replicas().
  • batch_size (int, optional) – The user-defined query batch size for the model. Replicas of the model will attempt to process at most batch_size queries simultaneously. They may process smaller batches if batch_size queries are not immediately available. If the default value of -1 is used, Clipper will adaptively calculate the batch size for individual replicas of this model.
  • pkgs_to_install (list (of strings), optional) – A list of the names of packages to install, using pip, in the container. The names must be strings.

MXNet Models

Similar to the PySpark deployer, the MXNet deployer provides a small extension to the Python closure deployer to allow you to deploy Python functions that include MXNet models.

For MXNet, Clipper will serialize the model using mxnet_model.save_checkpoint(..., epoch=0).

clipper_admin.deployers.mxnet.deploy_mxnet_model(clipper_conn, name, version, input_type, func, mxnet_model, mxnet_data_shapes, base_image='default', labels=None, registry=None, num_replicas=1, batch_size=-1, pkgs_to_install=None)[source]

Deploy a Python function with an MXNet model.

Parameters:
  • clipper_conn (clipper_admin.ClipperConnection()) – A ClipperConnection object connected to a running Clipper cluster.
  • name (str) – The name to be assigned to both the registered application and deployed model.
  • version (str) – The version to assign this model. Versions must be unique on a per-model basis, but may be re-used across different models.
  • input_type (str) – The input_type to be associated with the registered app and deployed model. One of “integers”, “floats”, “doubles”, “bytes”, or “strings”.
  • func (function) – The prediction function. Any state associated with the function will be captured via closure capture and pickled with Cloudpickle.
  • mxnet_model (MXNet model object) – The MXNet model to save.
  • mxnet_data_shapes (list of DataDesc objects) – List of DataDesc objects representing the name, shape, type and layout information of data used for model prediction. Required because loading serialized MXNet models involves binding, which requires the shape of the data used to train the model. https://mxnet.incubator.apache.org/api/python/module.html#mxnet.module.BaseModule.bind
  • base_image (str, optional) – The base Docker image to build the new model image from. This image should contain all code necessary to run a Clipper model container RPC client.
  • labels (list(str), optional) – A list of strings annotating the model. These are ignored by Clipper and used purely for user annotations.
  • registry (str, optional) – The Docker container registry to push the freshly built model to. Note that if you are running Clipper on Kubernetes, this registry must be accessible to the Kubernetes cluster in order to fetch the container from the registry.
  • num_replicas (int, optional) – The number of replicas of the model to create. The number of replicas for a model can be changed at any time with clipper.ClipperConnection.set_num_replicas().
  • batch_size (int, optional) – The user-defined query batch size for the model. Replicas of the model will attempt to process at most batch_size queries simultaneously. They may process smaller batches if batch_size queries are not immediately available. If the default value of -1 is used, Clipper will adaptively calculate the batch size for individual replicas of this model.
  • pkgs_to_install (list (of strings), optional) – A list of the names of packages to install, using pip, in the container. The names must be strings.

Note

Regarding mxnet_data_shapes parameter: Clipper may provide the model with variable size input batches. Because MXNet can’t handle variable size input batches, we recommend setting batch size for input data to 1, or dynamically reshaping the model with every prediction based on the current input batch size. More information regarding a DataDesc object can be found here: https://mxnet.incubator.apache.org/versions/0.11.0/api/python/io.html#mxnet.io.DataDesc
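One way to follow the second recommendation is to rebind the module to the incoming batch shape inside the prediction function. A minimal sketch, assuming model is a bound mx.mod.Module and each query is a flat feature vector:

import mxnet as mx
import numpy as np

def predict(model, inputs):
    batch = np.array(inputs)
    # Rebind the module to the current batch size before predicting
    model.reshape(data_shapes=[mx.io.DataDesc('data', batch.shape)])
    preds = model.predict(mx.io.NDArrayIter(batch, batch_size=batch.shape[0]))
    return [str(p) for p in preds.asnumpy()]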

Example

Create an MXNet model and then deploy it:

from clipper_admin import ClipperConnection, DockerContainerManager
from clipper_admin.deployers.mxnet import deploy_mxnet_model
import mxnet as mx
import numpy as np

clipper_conn = ClipperConnection(DockerContainerManager())

# Connect to an already-running Clipper cluster
clipper_conn.connect()

# Create an MXNet model
# Configure a two-layer neural network
data = mx.symbol.Variable('data')
fc1 = mx.symbol.FullyConnected(data, name='fc1', num_hidden=128)
act1 = mx.symbol.Activation(fc1, name='relu1', act_type='relu')
fc2 = mx.symbol.FullyConnected(act1, name='fc2', num_hidden=10)
softmax = mx.symbol.SoftmaxOutput(fc2, name='softmax')

# Load some training data
data_iter = mx.io.CSVIter(
    data_csv="/path/to/train_data.csv", data_shape=(785, ), batch_size=1)

# Initialize the module and fit it
mxnet_model = mx.mod.Module(softmax)
mxnet_model.fit(data_iter, num_epoch=1)

data_shape = data_iter.provide_data

# The prediction function takes the loaded MXNet model and a list of inputs
def predict(model, inputs):
    batch = mx.io.NDArrayIter(np.array(inputs), batch_size=1)
    preds = model.predict(batch)
    return [str(p) for p in preds.asnumpy()]

deploy_mxnet_model(
    clipper_conn,
    name="example",
    version=1,
    input_type="doubles",
    func=predict,
    mxnet_model=mxnet_model,
    mxnet_data_shapes=data_shape)
clipper_admin.deployers.mxnet.create_endpoint(clipper_conn, name, input_type, func, mxnet_model, mxnet_data_shapes, default_output='None', version=1, slo_micros=3000000, labels=None, registry=None, base_image='default', num_replicas=1, batch_size=-1, pkgs_to_install=None)[source]

Registers an app and deploys the provided predict function with an MXNet model as a Clipper model.

Parameters:
  • clipper_conn (clipper_admin.ClipperConnection()) – A ClipperConnection object connected to a running Clipper cluster.
  • name (str) – The name to be assigned to both the registered application and deployed model.
  • input_type (str) – The input_type to be associated with the registered app and deployed model. One of “integers”, “floats”, “doubles”, “bytes”, or “strings”.
  • func (function) – The prediction function. Any state associated with the function will be captured via closure capture and pickled with Cloudpickle.
  • mxnet_model (MXNet model object) – The MXNet model to save.
  • mxnet_data_shapes (list of DataDesc objects) – List of DataDesc objects representing the name, shape, type and layout information of data used for model prediction. Required because loading serialized MXNet models involves binding, which requires the shape of the data used to train the model. https://mxnet.incubator.apache.org/api/python/module.html#mxnet.module.BaseModule.bind
  • default_output (str, optional) – The default output for the application. The default output will be returned whenever an application is unable to receive a response from a model within the specified query latency SLO (service level objective). The reason the default output was returned is always provided as part of the prediction response object. Defaults to “None”.
  • version (str, optional) – The version to assign this model. Versions must be unique on a per-model basis, but may be re-used across different models.
  • slo_micros (int, optional) – The query latency objective for the application in microseconds. This is the processing latency between Clipper receiving a request and sending a response. It does not account for network latencies before a request is received or after a response is sent. If Clipper cannot process a query within the latency objective, the default output is returned. Therefore, it is recommended that the SLO not be set aggressively low unless absolutely necessary. 100000 (100ms) is a good starting value, but the optimal latency objective will vary depending on the application.
  • labels (list(str), optional) – A list of strings annotating the model. These are ignored by Clipper and used purely for user annotations.
  • registry (str, optional) – The Docker container registry to push the freshly built model to. Note that if you are running Clipper on Kubernetes, this registry must be accessible to the Kubernetes cluster in order to fetch the container from the registry.
  • base_image (str, optional) – The base Docker image to build the new model image from. This image should contain all code necessary to run a Clipper model container RPC client.
  • num_replicas (int, optional) – The number of replicas of the model to create. The number of replicas for a model can be changed at any time with clipper.ClipperConnection.set_num_replicas().
  • batch_size (int, optional) – The user-defined query batch size for the model. Replicas of the model will attempt to process at most batch_size queries simultaneously. They may process smaller batches if batch_size queries are not immediately available. If the default value of -1 is used, Clipper will adaptively calculate the batch size for individual replicas of this model.
  • pkgs_to_install (list (of strings), optional) – A list of the names of packages to install, using pip, in the container. The names must be strings.

Note

Regarding mxnet_data_shapes parameter: Clipper may provide the model with variable size input batches. Because MXNet can’t handle variable size input batches, we recommend setting batch size for input data to 1, or dynamically reshaping the model with every prediction based on the current input batch size. More information regarding a DataDesc object can be found here: https://mxnet.incubator.apache.org/versions/0.11.0/api/python/io.html#mxnet.io.DataDesc

Keras Models

Similar to the PySpark deployer, the Keras deployer provides a small extension to the Python closure deployer to allow you to deploy Python functions that include Keras models.

For Keras, Clipper will serialize the model using keras_model.save('keras_model.h5').
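Example

A minimal deployment sketch, assuming the Keras deployer mirrors the PyTorch deployer's pattern (a deploy_keras_model function taking a keras_model argument and a two-argument prediction function); check the module's documentation for the exact signature:

from clipper_admin import ClipperConnection, DockerContainerManager
from clipper_admin.deployers.keras import deploy_keras_model
from keras.models import Sequential
from keras.layers import Dense
import numpy as np

clipper_conn = ClipperConnection(DockerContainerManager())
clipper_conn.connect()

# Build a Keras model (training omitted...)
model = Sequential([Dense(1, input_dim=3)])
model.compile(optimizer='sgd', loss='mse')

def predict(model, inputs):
    preds = model.predict(np.array(inputs))
    return [str(p) for p in preds.flatten()]

deploy_keras_model(
    clipper_conn,
    name="example",
    version=1,
    input_type="doubles",
    func=predict,
    keras_model=model)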

Create Your Own Container

If none of the provided model deployers will meet your needs, you will need to create your own model container.

See the tutorial on building your own model container.