Generative Data Intelligence

Tag: AWS Inferentia

Scale foundation model inference to hundreds of models with Amazon SageMaker – Part 1 | Amazon Web Services

As democratization of foundation models (FMs) becomes more prevalent and demand for AI-augmented services increases, software as a service (SaaS) providers are looking to...

Reduce model deployment costs by 50% on average using the latest features of Amazon SageMaker | Amazon Web Services

As organizations deploy models to production, they are constantly looking for ways to optimize the performance of their foundation models (FMs) running on the...

Minimize real-time inference latency by using Amazon SageMaker routing strategies | Amazon Web Services

Amazon SageMaker makes it straightforward to deploy machine learning (ML) models for real-time inference and offers a broad selection of ML instances spanning CPUs...

How Amazon Search M5 saved 30% on LLM training costs by using AWS Trainium | Amazon Web Services

For decades, Amazon has pioneered and innovated machine learning (ML), bringing delightful experiences to its customers. From the earliest days, Amazon has used ML...

Intuitivo achieves higher throughput while saving on AI/ML costs using AWS Inferentia and PyTorch | Amazon Web Services

This is a guest post by Jose Benitez, Founder and Director of AI and Mattias Ponchon, Head of Infrastructure at Intuitivo. Intuitivo, a pioneer...

Retrieval-Augmented Generation & RAG Workflows

Retrieval Augmented Generation, or RAG, is a mechanism that helps large language models (LLMs) like GPT become more useful and knowledgeable by pulling in...

Optimize generative AI workloads for environmental sustainability | Amazon Web Services

The adoption of generative AI is rapidly expanding, reaching an ever-growing number of industries and users worldwide. With the increasing complexity and scale of...

Train and deploy ML models in a multicloud environment using Amazon SageMaker | Amazon Web Services

As customers accelerate their migrations to the cloud and transform their business, some find themselves in situations where they have to manage IT operations...

Machine learning with decentralized training data using federated learning on Amazon SageMaker | Amazon Web Services

Machine learning (ML) is revolutionizing solutions across industries and driving new forms of insights and intelligence from data. Many ML algorithms train over large...

Optimize AWS Inferentia utilization with FastAPI and PyTorch models on Amazon EC2 Inf1 & Inf2 instances | Amazon Web Services

When deploying Deep Learning models at scale, it is crucial to effectively utilize the underlying hardware to maximize performance and cost benefits. For production...

Reduce energy consumption of your machine learning workloads by up to 90% with AWS purpose-built accelerators | Amazon Web Services

Machine learning (ML) engineers have traditionally focused on striking a balance between model training and deployment cost vs. performance. Increasingly, sustainability (energy efficiency) is...

AWS Inferentia2 builds on AWS Inferentia1 by delivering 4x higher throughput and 10x lower latency | Amazon Web Services

The size of machine learning (ML) models, including large language models (LLMs) and foundation models (FMs), is growing fast year over year, and these models need faster and...


