Large Language Models with Ray Serve
Prepare your environment for this section:
This will make the following changes to your lab environment:
- Installs Karpenter in the Amazon EKS cluster
- Creates an IAM Role for the Pods to use
You can view the Terraform that applies these changes here.
Mistral 7B is a 7.3B-parameter language model. It combines capabilities such as text generation and completion, information extraction, data analysis, API interaction, and complex reasoning with practical efficiency.
This section focuses on how to deploy LLMs efficiently on Amazon EKS.
To deploy and scale the model, this lab uses AWS Trainium accelerators through the Amazon EC2 Trn1 instance family. Inference is served with Ray Serve, a library for building online inference APIs that streamlines the deployment of machine learning models.
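To make the idea of an online inference API concrete, the following is a minimal client-side sketch in Python, not the lab's actual code. It assumes the model is exposed over HTTP on port 8000 (the Ray Serve default); the host name, the `/infer` route, and the `sentence` query parameter are hypothetical and will likely differ in your deployment.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Ray Serve's HTTP proxy listens on port 8000 by default.
# The host, the "/infer" route, and the "sentence" parameter are
# hypothetical placeholders, not values taken from this lab.
SERVE_URL = "http://localhost:8000"


def build_inference_url(prompt: str, base_url: str = SERVE_URL, route: str = "/infer") -> str:
    """Return the GET URL for a single prompt, with the prompt URL-encoded."""
    return f"{base_url.rstrip('/')}{route}?{urlencode({'sentence': prompt})}"


def query_model(prompt: str, timeout: float = 60.0) -> str:
    """Send the prompt to the endpoint and return the raw response body."""
    with urlopen(build_inference_url(prompt), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only builds and prints the URL; call query_model() once an endpoint is reachable.
    print(build_inference_url("What is Amazon EKS?"))
```

Building the URL separately from sending the request keeps the sketch easy to check without a running cluster. A generous timeout is used because LLM responses can take several seconds.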