Hello, I don’t know if I should ask this question here, but any help is appreciated.
I have an ecommerce website with a product recommendation engine built in TensorFlow. The recommendation engine endpoint is hosted by Amazon SageMaker. Three compute-optimized instances support the expected peak load of the website.
Response times on the product recommendation page are increasing at the beginning of each month. Some users are encountering errors. The website receives the majority of its traffic between 8 AM and 6 PM on weekdays in a single time zone.
What would be the best solution to resolve the issue while keeping costs to a minimum?
It seems like you’re encountering performance degradation and errors on your ecommerce website’s product recommendation page, particularly at the beginning of each month and during weekday business hours. To resolve this issue while keeping costs in check, I would recommend considering the following approach:
Create a new endpoint configuration with two production variants.
By creating multiple production variants, you can deploy different versions of your model and split the incoming traffic between them. This can help in testing new models without impacting the entire user base and also provide a way to roll back quickly if issues arise.
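To make the two-variant idea concrete, here is a minimal sketch in Python of what such an endpoint configuration could look like when sent through boto3's `create_endpoint_config`. The config name, model names, variant names, instance type, and weights are all placeholders I made up for illustration, so substitute your own. SageMaker routes traffic to each variant in proportion to its weight divided by the sum of all weights, which the small helper below computes.

```python
def build_endpoint_config(config_name, model_a, model_b,
                          weight_a=9.0, weight_b=1.0,
                          instance_type="ml.c5.xlarge", instance_count=1):
    """Build the request for an endpoint config with two production variants.

    All names and sizes here are hypothetical placeholders, not values
    taken from the original question.
    """
    def variant(name, model, weight):
        return {
            "VariantName": name,
            "ModelName": model,
            "InstanceType": instance_type,
            "InitialInstanceCount": instance_count,
            "InitialVariantWeight": weight,
        }

    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            variant("variant-a", model_a, weight_a),
            variant("variant-b", model_b, weight_b),
        ],
    }


def traffic_share(request):
    """Return each variant's share of traffic: weight / sum of weights."""
    variants = request["ProductionVariants"]
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}


def create_config(request):
    """Send the request to SageMaker (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without it
    return boto3.client("sagemaker").create_endpoint_config(**request)


if __name__ == "__main__":
    req = build_endpoint_config("recs-config-v2", "recs-model-v1", "recs-model-v2")
    print(traffic_share(req))
```

Starting the new variant with a small weight keeps most users on the known-good model; the weights can later be shifted without redeploying.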
Deploy a second instance pool to support a blue/green deployment of models.
Deploying a second instance pool to support a blue/green deployment of models can also be an effective solution to address the issue while maintaining cost efficiency. A blue/green deployment strategy involves running two separate environments (blue and green) simultaneously, allowing you to deploy and test new versions (green) without affecting the current production version (blue). In the context of Amazon SageMaker, this means deploying a second instance pool with the updated models (green) alongside the existing instance pool (blue).
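As a rough illustration of the blue/green approach, the Python sketch below builds an `update_endpoint` request that uses SageMaker's `BlueGreenUpdatePolicy` deployment configuration. It assumes you have already created a new endpoint configuration for the green fleet and a CloudWatch alarm to trigger rollback; the endpoint name, config name, alarm name, canary size, and wait times are hypothetical placeholders, not recommendations for your workload.

```python
def build_blue_green_update(endpoint_name, new_config_name, alarm_name,
                            canary_percent=10, bake_seconds=300,
                            termination_wait_seconds=300):
    """Build an update_endpoint request that shifts traffic blue -> green.

    A canary slice of capacity is moved to the green fleet first; if the
    named CloudWatch alarm fires during the bake period, SageMaker rolls
    back to blue. All argument values are illustrative assumptions.
    """
    return {
        "EndpointName": endpoint_name,
        "EndpointConfigName": new_config_name,
        "DeploymentConfig": {
            "BlueGreenUpdatePolicy": {
                "TrafficRoutingConfiguration": {
                    "Type": "CANARY",
                    "CanarySize": {
                        "Type": "CAPACITY_PERCENT",
                        "Value": canary_percent,
                    },
                    # How long to watch the canary before shifting the rest.
                    "WaitIntervalInSeconds": bake_seconds,
                },
                # How long the old (blue) fleet stays up after the shift.
                "TerminationWaitInSeconds": termination_wait_seconds,
            },
            "AutoRollbackConfiguration": {
                "Alarms": [{"AlarmName": alarm_name}],
            },
        },
    }


def apply_update(request):
    """Send the request to SageMaker (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder works without it
    return boto3.client("sagemaker").update_endpoint(**request)


if __name__ == "__main__":
    req = build_blue_green_update(
        "recs-endpoint", "recs-config-green", "recs-endpoint-5xx-alarm"
    )
    print(req["DeploymentConfig"]["BlueGreenUpdatePolicy"])
```

Keep in mind that both fleets run at the same time during the shift, so you pay for the extra instances until the blue fleet is terminated.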
For more in-depth guidance, check out the AWS Blue/Green Deployments documentation, which provides a comprehensive guide to implementing blue/green deployments on Amazon Web Services. It covers strategies, best practices, and step-by-step instructions to ensure a smooth deployment process. Because your scenario relates to AWS Machine Learning, you may also want to look at MLS-C01 practice questions, which will help you develop a better understanding of how to approach and solve similar challenges.