Tag: AWS Inferentia

AWS and Hugging Face collaborate to make generative AI more accessible and cost efficient

AI February 21, 2023

We’re thrilled to announce an expanded collaboration between AWS and Hugging Face to accelerate the training, fine-tuning, and deployment of large language and vision...

Scaling distributed training with AWS Trainium and Amazon EKS

AI February 1, 2023

Recent developments in deep learning have led to increasingly large models such as GPT-3, BLOOM, and OPT, some of which are already in excess...

Who Owns the Generative AI Platform?

Blockchain January 19, 2023

We’re starting to see the very early stages of a tech stack emerge in generative artificial intelligence (AI). Hundreds of new startups are rushing...

Exafunction supports AWS Inferentia to unlock best price performance for machine learning inference

AI December 8, 2022

Across all industries, machine learning (ML) models are getting deeper, workflows are getting more complex, and workloads are operating at larger scales. Significant effort...

ByteDance saves up to 60% on inference costs while reducing latency and increasing throughput using AWS Inferentia

AI November 22, 2022

This is a guest blog post co-written with Minghui Yu and Jianzhe Xiao from ByteDance. ByteDance is a technology company that operates a range...

Brain tumor segmentation at scale using AWS Inferentia

AI November 9, 2022

Medical imaging is an important tool for the diagnosis and localization of disease. Over the past decade, collections of medical images have grown rapidly,...

How Amazon Search reduced ML inference costs by 85% with AWS Inferentia

AI September 22, 2022

Amazon’s product search engine indexes billions of products, serves hundreds of millions of customers worldwide, and is one of the most heavily used services...

How InfoJobs (Adevinta) improves NLP model prediction performance with AWS Inferentia and Amazon SageMaker

AI June 7, 2022

This is a guest post co-written by Juan Francisco Fernandez, ML Engineer in Adevinta Spain, and AWS AI/ML Specialist Solutions Architects Antonio Rodriguez and...

How Amazon Search achieves low-latency, high-throughput T5 inference with NVIDIA Triton on AWS

AI March 22, 2022

Amazon Search’s vision is to enable customers to search effortlessly. Our spelling correction helps you find what you want even if you don’t know the exact spelling of the intended words. In the past, we used classical machine learning (ML) algorithms with manual feature engineering for spelling correction. To make the next generational leap in […]

Latest Intelligence

Scale foundation model inference to hundreds of models with Amazon SageMaker – Part 1 | Amazon Web Services

AI November 30, 2023

Reduce model deployment costs by 50% on average using the latest features of Amazon SageMaker | Amazon Web Services

AI November 30, 2023

Minimize real-time inference latency by using Amazon SageMaker routing strategies | Amazon Web Services

AI November 30, 2023

How Amazon Search M5 saved 30% for LLM training cost by using AWS Trainium | Amazon Web Services

AI November 22, 2023

Intuitivo achieves higher throughput while saving on AI/ML costs using AWS Inferentia and PyTorch | Amazon Web Services

AI October 26, 2023

Retrieval-Augmented Generation & RAG Workflows

AI October 24, 2023

Optimize generative AI workloads for environmental sustainability | Amazon Web Services

AI September 21, 2023

Generative Data Intelligence
