I-Generative Data Intelligence

Izisekelo Zolwazi ze-Amazon Bedrock manje zisekela ukuhlunga imethadatha ukuze kuthuthukiswe ukunemba kokubuyisa | Izinsizakalo Zewebhu ze-Amazon

Usuku:

At AWS re:Invent 2023, we announced the general availability of Izisekelo Zolwazi ze-Amazon Bedrock. With Knowledge Bases for Amazon Bedrock, you can securely connect foundation models (FMs) in I-Amazon Bedrock to your company data using a fully managed Retrieval Augmented Generation (RAG) model.

For RAG-based applications, the accuracy of the generated responses from FMs depend on the context provided to the model. Contexts are retrieved from vector stores based on user queries. In the recently released feature for Knowledge Bases for Amazon Bedrock, hybrid search, you can combine semantic search with keyword search. However, in many situations, you may need to retrieve documents created in a defined period or tagged with certain categories. To refine the search results, you can filter based on document metadata to improve retrieval accuracy, which in turn leads to more relevant FM generations aligned with your interests.

In this post, we discuss the new custom metadata filtering feature in Knowledge Bases for Amazon Bedrock, which you can use to improve search results by pre-filtering your retrievals from vector stores.

Metadata filtering overview

Prior to the release of metadata filtering, all semantically relevant chunks up to the pre-set maximum would be returned as context for the FM to use to generate a response. Now, with metadata filters, you can retrieve not only semantically relevant chunks but a well-defined subset of those relevant chucks based on applied metadata filters and associated values.

With this feature, you can now supply a custom metadata file (each up to 10 KB) for each document in the knowledge base. You can apply filters to your retrievals, instructing the vector store to pre-filter based on document metadata and then search for relevant documents. This way, you have control over the retrieved documents, especially if your queries are ambiguous. For example, you can use legal documents with similar terms for different contexts, or movies that have a similar plot released in different years. In addition, by reducing the number of chunks that are being searched over, you achieve performance advantages like a reduction in CPU cycles and cost of querying the vector store, in addition to improvement in accuracy.

To use the metadata filtering feature, you need to provide metadata files alongside the source data files with the same name as the source data file and .metadata.json suffix. Metadata can be string, number, or Boolean. The following is an example of the metadata file content:

{
    "metadataAttributes" : { 
        "tag" : "project EVE",
        "year" :  2016,
        "team": "ninjas"
    }
}

The metadata filtering feature of Knowledge Bases for Amazon Bedrock is available in AWS Regions US East (N. Virginia) and US West (Oregon).

The following are common use cases for metadata filtering:

  • Document chatbot for a software company – This allows users to find product information and troubleshooting guides. Filters on the operating system or application version, for example, can help avoid retrieving obsolete or irrelevant documents.
  • Conversational search of an organization’s application – This allows users to search through documents, kanbans, meeting recording transcripts, and other assets. Using metadata filters on work groups, business units, or project IDs, you can personalize the chat experience and improve collaboration. An example would be, “What is the status of project Sphinx and risks raised,” where users can filter documents for a specific project or source type (such as email or meeting documents).
  • Intelligent search for software developers – This allows developers to look for information of a specific release. Filters on the release version, document type (such as code, API reference, or issue) can help pinpoint relevant documents.

Ukubukwa kwesisombululo

In the following sections, we demonstrate how to prepare a dataset to use as a knowledge base, and then query with metadata filtering. You can query using either the I-AWS Management Console noma i-SDK.

Prepare a dataset for Knowledge Bases for Amazon Bedrock

For this post, we use a isampula yedathasethi about fictional video games to illustrate how to ingest and retrieve metadata using Knowledge Bases for Amazon Bedrock. If you want to follow along in your own AWS account, download the file.

If you want to add metadata to your documents in an existing knowledge base, create the metadata files with the expected filename and schema, then skip to the step to sync your data with the knowledge base to start the incremental ingestion.

In our sample dataset, each game’s document is a separate CSV file (for example, s3://$bucket_name/video_game/$game_id.csv) with the following columns:

title, description, genres, year, publisher, score

Each game’s metadata has the suffix .metadata.json (Ngokwesibonelo, s3://$bucket_name/video_game/$game_id.csv.metadata.json) with the following schema:

{
  "metadataAttributes": {
    "id": number, 
    "genres": string,
    "year": number,
    "publisher": string,
    "score": number
  }
}

Create a knowledge base for Amazon Bedrock

For instructions to create a new knowledge base, see Dala isisekelo solwazi. For this example, we use the following settings:

  • Use Set up data source ikhasi, ngaphansi Isu le-Chunking, khetha Akukho ukugoqa, because you’ve already preprocessed the documents in the previous step.
  • In the Imodeli yokushumeka isigaba, khetha Titan G1 Embeddings – Text.
  • In the Idatha yeVector isigaba, khetha Dala ngokushesha isitolo se-vector esisha. The metadata filtering feature is available for all supported vector stores.

Synchronize the dataset with the knowledge base

After you create the knowledge base, and your data files and metadata files are in an Isevisi ye-Amazon Simple Storage (Amazon S3) bucket, you can start the incremental ingestion. For instructions, see Sync to ingest your data sources into the knowledge base.

Query with metadata filtering on the Amazon Bedrock console

To use the metadata filtering options on the Amazon Bedrock console, complete the following steps:

  1. Ku-console ye-Amazon Bedrock, khetha Izisekelo zolwazi kufasitela lokuhambisa.
  2. Choose the knowledge base you created.
  3. Khetha Isisekelo solwazi lokuhlola.
  4. Khetha Ukulungiselelwa icon, then expand Izihlungi.
  5. Enter a condition using the format: key = value (for example, genres = Strategy) and press Faka.
  6. To change the key, value, or operator, choose the condition.
  7. Continue with the remaining conditions (for example, (genres = Strategy AND year >= 2023) OR (rating >= 9))
  8. When finished, enter your query in the message box, then choose Qalisa.

For this post, we enter the query “A strategy game with cool graphic released after 2023.”

Query with metadata filtering using the SDK

To use the SDK, first create the client for the Ama-ejenti we-Amazon Bedrock isikhathi sokusebenza:

import boto3

bedrock_agent_runtime = boto3.client(
    service_name = "bedrock-agent-runtime"
)

Then construct the filter (the following are some examples):

# genres = Strategy
single_filter= {
    "equals": {
        "key": "genres",
        "value": "Strategy"
    }
}

# genres = Strategy AND year >= 2023
one_group_filter= {
    "andAll": [
        {
            "equals": {
                "key": "genres",
                "value": "Strategy"
            }
        },
        {
            "GreaterThanOrEquals": {
                "key": "year",
                "value": 2023
            }
        }
    ]
}

# (genres = Strategy AND year >=2023) OR score >= 9
two_group_filter = {
    "orAll": [
        {
            "andAll": [
                {
                    "equals": {
                        "key": "genres",
                        "value": "Strategy"
                    }
                },
                {
                    "GreaterThanOrEquals": {
                        "key": "year",
                        "value": 2023
                    }
                }
            ]
        },
        {
            "GreaterThanOrEquals": {
                "key": "score",
                "value": "9"
            }
        }
    ]
}

Pass the filter to retrievalConfiguration we Retrieval API or RetrieveAndGenerate I-API:

retrievalConfiguration={
        "vectorSearchConfiguration": {
            "filter": metadata_filter
        }
    }

The following table lists a few responses with different metadata filtering conditions.

Umbuzo Metadata Filtering Retrieved Documents kokuma
“A strategy game with cool graphic released after 2023” Off

* Viking Saga: The Sea Raider, year:2023, genres: Strategy

* Medieval Castle: Siege and Conquest, year:2022, genres: Strategy
* Fantasy Kingdoms: Chronicles of Eldoria, year:2023, genres: Strategy

* Cybernetic Revolution: Rise of the Machines, year:2022, genres: Strategy
* Steampunk Chronicles: Clockwork Empires, year:2021, genres: City-Building

2/5 games meet the condition (genres = Strategy and year >= 2023)
On * Viking Saga: The Sea Raider, year:2023, genres: Strategy
* Fantasy Kingdoms: Chronicles of Eldoria, year:2023, genres: Strategy
2/2 games meet the condition (genres = Strategy and year >= 2023)

In addition to custom metadata, you can also filter using S3 prefixes (which is a built-in metadata, so you don’t need to provide any metadata files). For example, if you organize the game documents into prefixes by publisher (for example, s3://$bucket_name/video_game/$publisher/$game_id.csv), you can filter with the specific publisher (for example, neo_tokyo_games) using the following syntax:

publisher_filter = {
    "startsWith": {
                    "key": "x-amz-bedrock-kb-source-uri",
                    "value": "s3://$bucket_name/video_game/neo_tokyo_games/"
                }
}

Hlanza

Ukuze uhlanze izinsiza zakho, qedela lezi zinyathelo ezilandelayo:

  1. Delete the knowledge base:
    1. Ku-console ye-Amazon Bedrock, khetha Izisekelo zolwazi ngaphansi I-Orchestration kufasitela lokuhambisa.
    2. Choose the knowledge base you created.
    3. Qaphela Ubunikazi be-AWS Nokuphathwa Kokufinyelela (IAM) service role name in the Knowledge base overview ingxenye.
    4. In the Idatha yeVector section, take note of the collection ARN.
    5. Khetha Susa, then enter delete to confirm.
  2. Delete the vector database:
    1. Use Isevisi ye-Amazon OpenSearch console, khetha Amaqoqo ngaphansi Iseva kufasitela lokuhambisa.
    2. Enter the collection ARN you saved in the search bar.
    3. Select the collection and chose Susa.
    4. Enter confirm in the confirmation prompt, then choose Susa.
  3. Delete the IAM service role:
    1. Kukhonsoli ye-IAM, khetha Izindima kufasitela lokuhambisa.
    2. Search for the role name you noted earlier.
    3. Select the role and choose Susa.
    4. Enter the role name in the confirmation prompt and delete the role.
  4. Delete the sample dataset:
    1. On the Amazon S3 console, navigate to the S3 bucket you used.
    2. Select the prefix and files, then choose Susa.
    3. Enter permanently delete in the confirmation prompt to delete.

Isiphetho

In this post, we covered the metadata filtering feature in Knowledge Bases for Amazon Bedrock. You learned how to add custom metadata to documents and use them as filters while retrieving and querying the documents using the Amazon Bedrock console and the SDK. This helps improve context accuracy, making query responses even more relevant while achieving a reduction in cost of querying the vector database.

For additional resources, refer to the following:


Mayelana Ababhali

Corvus Lee is a Senior GenAI Labs Solutions Architect based in London. He is passionate about designing and developing prototypes that use generative AI to solve customer problems. He also keeps up with the latest developments in generative AI and retrieval techniques by applying them to real-world scenarios.

Ahmed Ewis is a Senior Solutions Architect at AWS GenAI Labs, helping customers build generative AI prototypes to solve business problems. When not collaborating with customers, he enjoys playing with his kids and cooking.

Chris Pecora is a Generative AI Data Scientist at Amazon Web Services. He is passionate about building innovative products and solutions while also focusing on customer-obsessed science. When not running experiments and keeping up with the latest developments in GenAI, he loves spending time with his kids.

indawo_img

Latest Intelligence

indawo_img