Train and host a custom-built Scikit-Learn model container in Amazon SageMaker - 1CloudHub

Amazon SageMaker is a managed machine learning (ML) service for developing, training, and deploying models. It aims to lower the cost of building ML solutions and to increase the productivity of data science teams.

SageMaker uses Docker containers for training and deploying machine learning algorithms. Packaging all the code and runtime libraries needed by the algorithm inside the container gives a consistent experience across environments.

There are three ways to train and run inference with Amazon SageMaker.

Prebuilt SageMaker Docker images

SageMaker comes with a few common machine learning frameworks packaged in containers. We can use these images from a SageMaker notebook instance or from SageMaker Studio.

Modifying an existing Docker container and deploying it in SageMaker

We can modify an existing Docker image to be compatible with SageMaker. This method can be used if the features or requirements are not currently supported by a prebuilt SageMaker image.

Creating a container with our own algorithms and models

If none of the existing SageMaker containers meet our needs and we do not have an existing container of our own, we may need to create a new Docker container with our training and inference algorithms for use with SageMaker.

Amazon SageMaker comes with many predefined algorithms, but some requirements call for custom algorithms that are still trained and hosted on a managed solution. In Amazon SageMaker, we can provide custom container images in ECR (Elastic Container Registry) for the training code and the inference code separately, or we can combine them into a single Docker image. In our use case, we wanted to build a single image that supports both training and hosting in Amazon SageMaker.

In this blog, we create our own container, bring our custom Scikit-Learn model into it, and use it to train, host, and run inference in Amazon SageMaker.

Creating a custom container

Amazon SageMaker runs the container with the argument ‘train’ for training and ‘serve’ for hosting, so the image needs an executable script named ‘train’ and another named ‘serve’. In our container, both scripts live under the ‘/opt/program’ directory along with the configuration files. The training script can be written in any language that runs inside a Docker container; we chose Python. For inference, we used Nginx and Flask to build a RESTful microservice that serves HTTP inference requests. Below is the directory structure of our container.
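To make the serving side concrete: SageMaker hosting sends health checks to GET /ping and inference requests to POST /invocations on port 8080. The sketch below shows those two routes as a plain WSGI callable using only the standard library, not the Flask application from our container. The function predict_rows is a hypothetical placeholder for the real model call, and the CSV-only input is an assumption.

```python
def predict_rows(rows):
    """Hypothetical stand-in for model.predict(): returns the sum of each row."""
    return [sum(row) for row in rows]


def parse_csv(body):
    """Turn a CSV payload into a list of float rows, skipping blank lines."""
    return [[float(value) for value in line.split(",")]
            for line in body.splitlines() if line.strip()]


def app(environ, start_response):
    """WSGI entry point exposing the two routes that SageMaker hosting calls."""
    method = environ.get("REQUEST_METHOD", "GET")
    path = environ.get("PATH_INFO", "/")

    if method == "GET" and path == "/ping":
        # Health check: a 200 response tells SageMaker the container is up.
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"\n"]

    if method == "POST" and path == "/invocations":
        content_type = environ.get("CONTENT_TYPE", "").split(";")[0].strip()
        if content_type != "text/csv":
            start_response("415 Unsupported Media Type",
                           [("Content-Type", "text/plain")])
            return [b"This predictor only supports text/csv\n"]
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(length).decode("utf-8")
        predictions = predict_rows(parse_csv(body))
        payload = "".join(f"{p}\n" for p in predictions)
        start_response("200 OK", [("Content-Type", "text/csv")])
        return [payload.encode("utf-8")]

    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found\n"]
```

A Flask application is itself a WSGI callable, so the same two routes map directly onto Flask view functions served behind Nginx.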

In the Dockerfile, we specify the requirements and dependencies to install in the container, such as Python and scikit-learn. The ‘/opt/program’ directory must be added to “$PATH” and set as the “WORKDIR” so that the train and serve programs are found when the container is invoked.

Pushing Docker container to ECR

Once the necessary program scripts and configuration files are copied into the image with the appropriate file permissions (the train and serve scripts must be executable), the image can be pushed to ECR as shown below. Amazon SageMaker then pulls the image from ECR.
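As a rough sketch of the push step, the snippet below assembles the fully qualified ECR image name and the usual AWS CLI and Docker commands for logging in, building, tagging, and pushing. The account ID, region, and repository name are hypothetical placeholders, and the sketch assumes the ECR repository already exists.

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Build the fully qualified ECR image name that SageMaker will pull."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def push_commands(account_id, region, repository, tag="latest"):
    """Return the shell commands that log in, build, tag, and push the image."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    image = ecr_image_uri(account_id, region, repository, tag)
    return [
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}",
        f"docker build -t {repository} .",
        f"docker tag {repository} {image}",
        f"docker push {image}",
    ]


# Placeholder values for illustration only.
for command in push_commands("123456789012", "us-east-1", "sklearn-custom"):
    print(command)
```

The image name returned by ecr_image_uri is the value later supplied to the training job and to the model definition.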

Model Training job

A training job in Amazon SageMaker is created pointing to the custom algorithm image pushed to ECR, as shown below. The training dataset is stored in S3; when configuring the job, we select the S3 path that contains the training dataset.

The “train” script processes the data in the path ‘/opt/ml/input/data/train/’ and produces a model artifact (a pickle file) in the path ‘/opt/ml/model/’. SageMaker creates the input path for the training job by copying the training dataset from the provided S3 path into the container.
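The same job can also be created programmatically. The sketch below builds the request for the boto3 SageMaker client's create_training_job call; the instance type, volume size, runtime limit, and the "text/csv" content type are assumptions chosen for illustration. The channel name "train" is what makes the data appear under /opt/ml/input/data/train/ inside the container.

```python
def training_job_request(job_name, image_uri, role_arn, train_s3_uri,
                         output_s3_uri, instance_type="ml.m5.large"):
    """Build keyword arguments for the SageMaker create_training_job call."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,      # custom image pushed to ECR
            "TrainingInputMode": "File",     # data is copied into the container
        },
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "train",          # becomes /opt/ml/input/data/train/
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
            "ContentType": "text/csv",
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }
```

With boto3 installed and credentials configured, the dictionary can be passed as boto3.client("sagemaker").create_training_job(**request).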
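A minimal sketch of such a train script follows. It assumes the CSV files have no header row and carry the label in the first column, and it writes the artifact under the hypothetical file name model.pkl. To stay free of third-party imports, the default fit function is a trivial majority-label baseline rather than our actual scikit-learn model.

```python
import csv
import pickle
from collections import Counter
from pathlib import Path


def load_rows(input_dir):
    """Read every CSV file in the channel directory into (label, features) pairs."""
    rows = []
    for csv_path in sorted(Path(input_dir).glob("*.csv")):
        with csv_path.open(newline="") as handle:
            for record in csv.reader(handle):
                if record:  # skip blank lines
                    rows.append((record[0], [float(v) for v in record[1:]]))
    if not rows:
        raise ValueError(f"no training data found in {input_dir}")
    return rows


def fit_majority_baseline(features, labels):
    """Placeholder 'model' that only remembers the most frequent label."""
    return {"majority_label": Counter(labels).most_common(1)[0][0]}


def train(input_dir="/opt/ml/input/data/train", model_dir="/opt/ml/model",
          fit=fit_majority_baseline):
    """Fit a model on the channel data and pickle it into the model directory."""
    rows = load_rows(input_dir)
    labels = [label for label, _ in rows]
    features = [feats for _, feats in rows]
    model = fit(features, labels)
    out_path = Path(model_dir) / "model.pkl"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("wb") as handle:
        pickle.dump(model, handle)
    return out_path
```

Inside the container, fit would instead wrap a scikit-learn estimator, for example a function that returns DecisionTreeClassifier().fit(features, labels).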

Model Export as tar file

When the training job completes, the model is exported as a compressed tar file to the S3 output path provided in the training job.
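SageMaker packs the contents of /opt/ml/model into an archive named model.tar.gz, typically stored under the job's output prefix as <S3 output path>/<training job name>/output/model.tar.gz. As a sketch of what the archive holds, the snippet below reads a pickled model back out of a locally downloaded copy; the member name model.pkl is an assumption and should match whatever the train script wrote.

```python
import pickle
import tarfile


def load_model_from_archive(archive_path, member="model.pkl"):
    """Load a pickled model directly from a model.tar.gz archive.

    Only unpickle archives produced by your own training jobs.
    """
    with tarfile.open(archive_path, "r:gz") as archive:
        extracted = archive.extractfile(member)  # KeyError if the member is absent
        if extracted is None:
            raise FileNotFoundError(f"{member} is not a regular file in {archive_path}")
        return pickle.load(extracted)
```

At hosting time SageMaker unpacks the same archive into /opt/ml/model inside the serving container, so the serve code can load the pickle from that directory.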

Model Hosting with endpoint

To host the trained model, we create a model in SageMaker, then an endpoint configuration, and finally an endpoint that serves the model. The steps are described below.

Model Creation

A model is then created in Amazon SageMaker pointing to the exported compressed tar file.
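In API terms, this step corresponds to the boto3 SageMaker client's create_model call, which ties the container image in ECR to the model archive in S3. The helper below is a thin illustrative wrapper; its name and parameters are our own, and the client is passed in so that the sketch does not depend on boto3 being importable.

```python
def create_model(client, model_name, image_uri, model_data_url, role_arn):
    """Register a SageMaker model that pairs the image with the model archive."""
    return client.create_model(
        ModelName=model_name,
        PrimaryContainer={
            "Image": image_uri,              # same custom image used for training
            "ModelDataUrl": model_data_url,  # S3 URI of model.tar.gz
        },
        ExecutionRoleArn=role_arn,
    )
```

A typical call would pass boto3.client("sagemaker") as the client, along with the S3 URI of the exported model.tar.gz.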

Endpoint Configuration

An endpoint configuration is created for the newly created model, specifying the instance type, the number of instances, and so on.
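The equivalent request for the boto3 create_endpoint_config call might look like the sketch below. The variant name "AllTraffic" and the default instance type are illustrative assumptions, not values taken from our setup.

```python
def endpoint_config_request(config_name, model_name,
                            instance_type="ml.m5.large", instance_count=1):
    """Build keyword arguments for the SageMaker create_endpoint_config call."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",            # single variant takes all requests
            "ModelName": model_name,                # model registered in the previous step
            "InstanceType": instance_type,
            "InitialInstanceCount": instance_count,
        }],
    }
```

The dictionary can be passed as boto3.client("sagemaker").create_endpoint_config(**request).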

Endpoint

The endpoint configuration created above is used to create an endpoint which, once it reaches the “InService” status, can be used for inference.
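Endpoint creation is asynchronous, so a script has to wait for the status change. The sketch below polls the boto3 describe_endpoint call until the endpoint reports InService. The polling interval and retry limit are arbitrary assumptions, and the endpoint is assumed to have been created already with create_endpoint(EndpointName=..., EndpointConfigName=...).

```python
import time


def wait_until_in_service(client, endpoint_name, poll_seconds=30,
                          max_polls=60, sleep=time.sleep):
    """Poll describe_endpoint until the endpoint is InService or has failed."""
    for _ in range(max_polls):
        description = client.describe_endpoint(EndpointName=endpoint_name)
        status = description["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            reason = description.get("FailureReason", "unknown reason")
            raise RuntimeError(f"endpoint {endpoint_name} failed: {reason}")
        sleep(poll_seconds)  # still Creating or Updating, so try again later
    raise TimeoutError(f"endpoint {endpoint_name} not InService after {max_polls} polls")
```

boto3 also ships a built-in waiter for this, available through client.get_waiter("endpoint_in_service"), which may be preferable when custom error handling is not needed.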

Inference

The testing dataset, in CSV format with all the attributes, can be provided as input to the Amazon SageMaker endpoint. The endpoint can be invoked with the SageMaker SDK; we used the invoke_endpoint function of the Python boto3 SDK, as shown below.
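A minimal sketch of such a call is given here. It assumes the endpoint accepts text/csv and returns one prediction per line; the helper names are our own, and the runtime client is passed in so that the sketch does not depend on boto3 being importable.

```python
def rows_to_csv(rows):
    """Serialize rows of feature values into a CSV payload, one record per line."""
    return "\n".join(",".join(str(value) for value in row) for row in rows)


def invoke_csv(runtime_client, endpoint_name, rows):
    """Send CSV rows to a SageMaker endpoint and return one prediction per line."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=rows_to_csv(rows).encode("utf-8"),
    )
    # The response body is a stream that must be read before decoding.
    return response["Body"].read().decode("utf-8").splitlines()
```

The client for this call is boto3.client("sagemaker-runtime"), which is separate from the "sagemaker" client used to create the model and endpoint.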