Tag: AWS Inferentia
Talk to your slide deck using multimodal foundation models hosted on Amazon Bedrock – Part 2 | Amazon Web Services
In Part 1 of this series, we presented a solution that used the Amazon Titan Multimodal Embeddings model to convert individual slides from a...
Generative AI roadshow in North America with AWS and Hugging Face | Amazon Web Services
In 2023, AWS announced an expanded collaboration with Hugging Face to accelerate our customers’ generative artificial intelligence (AI) journey. Hugging Face, founded in 2016,...
Gradient makes LLM benchmarking cost-effective and effortless with AWS Inferentia | Amazon Web Services
This is a guest post co-written with Michael Feil at Gradient.
Evaluating the performance of large language...
Best practices to build generative AI applications on AWS | Amazon Web Services
Generative AI applications driven by foundation models (FMs) are delivering significant business value to organizations in customer experience, productivity, process optimization, and innovation. However,...
Run ML inference on unplanned and spiky traffic using Amazon SageMaker multi-model endpoints | Amazon Web Services
Amazon SageMaker multi-model endpoints (MMEs) are a fully managed capability of SageMaker inference that allows you to deploy thousands of models on a single...
Talk to your slide deck using multimodal foundation models hosted on Amazon Bedrock and Amazon SageMaker – Part 1 | Amazon Web Services
With the advent of generative AI, today’s foundation models (FMs), such as the large language models (LLMs) Claude 2 and Llama 2, can perform...
Fine-tune and deploy Llama 2 models cost-effectively in Amazon SageMaker JumpStart with AWS Inferentia and AWS Trainium | Amazon Web Services
Today, we’re excited to announce the availability of Llama 2 inference and fine-tuning support on AWS Trainium and AWS Inferentia instances in Amazon SageMaker...
Fine-tune Llama 2 using QLoRA and Deploy it on Amazon SageMaker with AWS Inferentia2 | Amazon Web Services
In this post, we showcase fine-tuning a Llama 2 model using a Parameter-Efficient Fine-Tuning (PEFT) method and deploy the fine-tuned model on AWS Inferentia2....
Welcome to a New Era of Building in the Cloud with Generative AI on AWS | Amazon Web Services
We believe generative AI has the potential over time to transform virtually every customer experience we know. The number of companies launching generative AI...
Scale foundation model inference to hundreds of models with Amazon SageMaker – Part 1 | Amazon Web Services
As democratization of foundation models (FMs) becomes more prevalent and demand for AI-augmented services increases, software as a service (SaaS) providers are looking to...
Reduce model deployment costs by 50% on average using the latest features of Amazon SageMaker | Amazon Web Services
As organizations deploy models to production, they are constantly looking for ways to optimize the performance of their foundation models (FMs) running on the...
Minimize real-time inference latency by using Amazon SageMaker routing strategies | Amazon Web Services
Amazon SageMaker makes it straightforward to deploy machine learning (ML) models for real-time inference and offers a broad selection of ML instances spanning CPUs...
How Amazon Search M5 saved 30% for LLM training cost by using AWS Trainium | Amazon Web Services
For decades, Amazon has pioneered and innovated machine learning (ML), bringing delightful experiences to its customers. From the earliest days, Amazon has used ML...