Today, we are excited to announce that Meta Llama 3 foundation models are available through I-Amazon SageMaker JumpStart to deploy and run inference. The Llama 3 models are a collection of pre-trained and fine-tuned generative text models.
In this post, we walk through how to discover and deploy Llama 3 models via SageMaker JumpStart.
What is Meta Llama 3
Llama 3 comes in two parameter sizes — 8B and 70B with 8k context length — that can support a broad range of use cases with improvements in reasoning, code generation, and instruction following. Llama 3 uses a decoder-only transformer architecture and new tokenizer that provides improved model performance with 128k size. In addition, Meta improved post-training procedures that substantially reduced false refusal rates, improved alignment, and increased diversity in model responses. You can now derive the combined advantages of Llama 3 performance and MLOps controls with Amazon SageMaker features such as SageMaker Pipelines, SageMaker Debugger, or container logs. In addition, the model will be deployed in an AWS secure environment under your VPC controls, helping provide data security.
Iyini i-SageMaker JumpStart
With SageMaker JumpStart, you can choose from a broad selection of publicly available foundation models. ML practitioners can deploy foundation models to dedicated SageMaker instances from a network isolated environment and customize models using SageMaker for model training and deployment. You can now discover and deploy Llama 3 models with a few clicks in I-Amazon SageMaker Studio noma ngokohlelo nge-SageMaker Python SDK, ekuvumela ukuthi uthole ukusebenza kwemodeli nezilawuli ze-MLOps ngezici ze-SageMaker ezifana I-SageMaker Pipelines, SageMaker Debugger, or container logs. The model is deployed in an AWS secure environment and under your VPC controls, helping provide data security. Llama 3 models are available today for deployment and inferencing in Amazon SageMaker Studio in us-east-1
(N. Virginia), us-east-2
(Ohio), us-west-2
(Oregon), eu-west-1
(Ireland) and ap-northeast-1
(Tokyo) AWS Regions.
Zitholele amamodeli
Ungafinyelela amamodeli ayisisekelo nge-SageMaker JumpStart ku-SageMaker Studio UI kanye ne-SageMaker Python SDK. Kulesi sigaba, sibheka ukuthi ungawathola kanjani amamodeli ku-SageMaker Studio.
I-SageMaker Studio iyindawo yokuthuthukisa edidiyelwe (IDE) enikeza isixhumi esibonakalayo esisodwa esisekelwe kuwebhu lapho ungafinyelela khona amathuluzi akhelwe inhloso ukwenza zonke izinyathelo zokuthuthukisa i-ML, kusukela ekulungiseleleni idatha kuye ekwakheni, ekuqeqesheni, nasekukhipheni amamodeli akho e-ML. Ukuze uthole imininingwane eyengeziwe yokuthi ungaqala kanjani futhi usethe i-SageMaker Studio, bheka I-Amazon SageMaker Studio.
Ku-SageMaker Studio, ungafinyelela ku-SageMaker JumpStart, equkethe amamodeli aqeqeshwe ngaphambilini, amabhuku okufunda, nezisombululo ezakhelwe ngaphambilini, ngaphansi. Izixazululo ezakhelwe ngaphambili nezizenzekelayo.
From the SageMaker JumpStart landing page, you can easily discover various models by browsing through different hubs which are named after model providers. You can find Llama 3 models in Meta hub. If you do not see Llama 3 models, please update your SageMaker Studio version by shutting down and restarting. For more information, refer to Shut down and Update Studio Classic Apps.
You can find Llama 3 models by searching for “Meta-llama-3“ from the search box located at top left.
You can discover all Meta models available in SageMaker JumpStart by clicking on Meta hub.
Clicking on a model card opens the corresponding model detail page, from which you can easily Deploy the model.
Hambisa imodeli
Lapho ukhetha Sebenzisa and acknowledge the EULA terms, deployment will start.
You can monitor progress of the deployment on the page that shows up after clicking the Deploy button.
Ngenye indlela, ungakhetha Vula incwadi yokubhalela to deploy through the example notebook. The example notebook provides end-to-end guidance on how to deploy the model for inference and clean up resources.
To deploy using the notebook, you start by selecting an appropriate model, specified by the model_id
. You can deploy any of the selected models on SageMaker with the following code.
Ngephutha accept_eula
isethelwe False
. You need to manually accept the EULA to deploy the endpoint successfully, By doing so, you accept the user license agreement and acceptable use policy. You can also find the license agreement Llama website. This deploys the model on SageMaker with default configurations including the default instance type and default VPC configurations. You can change these configuration by specifying non-default values in JumpStartModel
. To learn more, please refer to the following imibhalo.
The following table lists all the Llama 3 models available in SageMaker JumpStart along with the model_ids
, default instance types and maximum number of total tokens (sum of the number of input tokens and number of generated tokens) supported for each of these models.
Model Igama | Imodeli ye-ID | Amathokheni Aphelele Aphelele | Default instance type |
Meta-Llama-3-8B | meta-textgeneration-llama-3-8B | 8192 | ml.g5.12xlarge |
Meta-Llama-3-8B-Instruct | meta-textgeneration-llama-3-8B-instruct | 8192 | ml.g5.12xlarge |
Meta-Llama-3-70B | i-meta-textgeneration-llama-3-70b | 8192 | ml.p4d.24xlarge |
Meta-Llama-3-70B-Instruct | meta-textgeneration-llama-3-70b-instruct | 8192 | ml.p4d.24xlarge |
Qalisa i-inference
After you deploy the model, you can run inference against the deployed endpoint through SageMaker predictor. Fine-tuned instruct models (Llama 3: 8B Instruct and 70B Instruct) accept a history of chats between the user and the chat assistant, and generate the subsequent chat. The pre-trained models (Llama 3: 8B and 70B) require a string prompt and perform text completion on the provided prompt.
Inference parameters control the text generation process at the endpoint. The Max new tokens control the size of the output generated by the model. This is not same as the number of words because the vocabulary of the model is not the same as the English language vocabulary, and each token may not be an English language word. The temperature parameter controls the randomness in the output. Higher temperature results in more creative and hallucinated outputs. All the inference parameters are optional.
Example prompts for the 70B model
You can use Llama 3 models for text completion for any piece of text. Through text generation, you can perform a variety of tasks such as question answering, language translation, and sentiment analysis, and more. The input payload to the endpoint looks like the following code:
Okulandelayo ezinye zezibonelo zezibonelo nombhalo okhiqizwe imodeli. Yonke imiphumela ikhiqizwa ngamapharamitha we-inference {"max_new_tokens":64, "top_p":0.9, "temperature":0.6}
.
In the next example, we show how to use Llama 3 models with few shot in-context learning where we provide training samples available to the model. We only run inference on the deployed model and during this process, and model weights do not change.
Example prompts for the 70B-Instruct model
With Llama 3 instruct models which are optimized for dialogue use cases, the input to the instruct model endpoints is the previous history between the chat assistant and the user. You can ask questions contextual to the conversation that has happened so far. You can also provide the system configuration, such as personas, which define the chat assistant’s behavior. While the input payload format is the same as the base pre-trained model, the input text should be formatted in the following manner:
In this instruction template, you can optionally start with a system
role and include as many alternating roles as desired in the turn-based history. The final role should always be assistant
and end with two new line feeds.
Next, consider a few example prompts and responses from the model. In the following example, the user is asking a simple question to the assistant.
In the following example, the user has a conversation with the assistant about tourist sites in Paris. Then the user inquires about the first option recommended by the chat assistant.
In the following examples, we set the system’s configuration.
Hlanza
Ngemuva kokuthi usuqedile ukusebenzisa incwadi yokubhalela, qiniseka ukuthi ususa zonke izinsiza ozidalile ohlelweni ukuze ukukhokha kwakho kumiswe. Sebenzisa ikhodi elandelayo:
Isiphetho
In this post, we showed you how to get started with Llama 3 models in SageMaker Studio. You now have access to four Llama 3 foundation models that contain billions of parameters. Because foundation models are pretrained, they can also help lower training and infrastructure costs and enable customization for your use case. Check out I-SageMaker JumpStart in SageMaker Studio now to get started.
Mayelana Nababhali
Kyle Ulrich is an Applied Scientist II at AWS
Xin Huang is a Senior Applied Scientist at AWS
Qing Lan is a Senior Software Developer Engineer at AWS
Haotian An is a Software Developer Engineer II at AWS
Christopher Whitten is a Software Development Engineer II at AWS
Tyler Osterberg is a Software Development Engineer I at AWS
Manan Shah is a Software Development Manager at AWS
Jonathan Guinegagne is a Senior Software Developer Engineer at AWS
Adriana Simmons is a Senior Product Marketing Manager at AWS
UJuni Won is a Senior Product Manager at AWS
Ashish Khetan is a Senior Applied Scientist at AWS
Rachna Chadha is a Principal Solution Architect – AI/ML at AWS
Deepak Rupakula is a Principal GTM Specialist at AWS
- I-SEO Powered Content & PR Distribution. Khuliswa Namuhla.
- I-PlatoData.Network Vertical Generative Ai. Zinike Amandla. Finyelela Lapha.
- I-PlatoAiStream. I-Web3 Intelligence. Ulwazi Lukhulisiwe. Finyelela Lapha.
- I-PlatoESG. Ikhabhoni, I-CleanTech, Amandla, Environment, Ilanga, Ukuphathwa Kwemfucuza. Finyelela Lapha.
- I-PlatoHealth. I-Biotech kanye ne-Clinical Trials Intelligence. Finyelela Lapha.
- Source: https://aws.amazon.com/blogs/machine-learning/meta-llama-3-models-are-now-available-in-amazon-sagemaker-jumpstart/