
Building My First AI Project Using AWS Bedrock and LangChain

Building a RAG model using Anthropic Claude, Titan Embeddings & Streamlit


Published: Aug 30, 2024 by Sai Jahnavi Bachu, AI/ML Developer


Retrieval-Augmented Generation (RAG) is a cutting-edge approach in the field of natural language processing that leverages both neural text generation and information retrieval to enhance the quality and relevance of model outputs. RAG models can produce more accurate and contextually appropriate responses, especially in scenarios where external knowledge is crucial.

When I set out to create my personal assistant bot, I wanted it to be both powerful and intuitive. To make that happen, I harnessed some seriously advanced tech: the AWS Bedrock runtime environment, FAISS vector storage, and the Titan embedding model. Here’s how each piece plays a critical role in making my bot smart and responsive:

Together, the LangChain components below create a remarkably capable and reliable assistant, demonstrating the power and versatility of integrating advanced AI tools. First, the imports:

import boto3

from langchain_core.prompts import PromptTemplate
from langchain_aws import ChatBedrock
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
from langchain_community.embeddings import BedrockEmbeddings

First up, the Bedrock runtime from AWS. This isn’t just any platform; it’s AWS’s managed service for calling foundation models, and it’s the backbone that lets my bot process and generate responses dynamically. It’s all about scalability and power. Here’s how I hooked it up:

# Low-level Bedrock runtime client used to invoke models.
bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-east-1'
)

# Wrap Claude 3.5 Sonnet in LangChain's chat interface.
llm = ChatBedrock(
    model_id="anthropic.claude-3-5-sonnet-20240620-v1:0",
    client=bedrock_runtime
)
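
Before building anything else on top of it, a one-off call is an easy way to confirm the client and model ID work. A quick, optional sanity check (the prompt text is just an example):

# One-off call straight to Claude to verify the Bedrock setup.
response = llm.invoke("In one sentence, what is retrieval-augmented generation?")
print(response.content)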

Next up, I configure the Titan embedding model. This part of the setup is crucial because it transforms regular text into high-quality embeddings. These embeddings are vector representations of text, capturing the subtle semantic meanings, which are key for the retrieval tasks later on.

# Titan turns text into dense vectors for semantic search.
embeddings = BedrockEmbeddings(
    model_id="amazon.titan-embed-text-v1",
    client=bedrock_runtime
)

Here, I load the documents from a text file. These documents could be anything I need my bot to know about, like FAQs or detailed guides on specific topics. After loading, I split the documents into smaller chunks, which makes them easier to handle and quicker to search through. You can load as many documents as you want, and they can be in almost any format: PDF, TXT, CSV, or even a web page (see the loader sketch after the code below).

# Load the knowledge base and split it into retrievable chunks.
loader = TextLoader('./QA.txt')
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=4)
docs = text_splitter.split_documents(documents)
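
Swapping formats is mostly a matter of swapping loaders. A hedged sketch of the alternatives mentioned above, assuming the extra dependencies (such as pypdf and beautifulsoup4) are installed; the file names and URL are placeholders:

from langchain_community.document_loaders import PyPDFLoader, CSVLoader, WebBaseLoader

pdf_docs = PyPDFLoader('./guide.pdf').load()             # PDF, via the pypdf package
csv_docs = CSVLoader('./faq.csv').load()                 # one document per CSV row
web_docs = WebBaseLoader('https://example.com').load()   # scrapes a web page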

Setting Up FAISS for Efficient Retrieval

Once I have all my text chunked up, I move on to setting up FAISS. This is where all those chunks get turned into a searchable database. I use the embeddings from Titan to create a vector for each chunk, and FAISS makes these vectors searchable.

# Embed every chunk and build the FAISS similarity index.
vectorstore = FAISS.from_documents(
    documents=docs, embedding=embeddings
)
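
Embedding every chunk costs a Bedrock call, so it’s worth persisting the index between runs rather than rebuilding it each time. A small sketch using FAISS’s built-in save/load (the folder name is arbitrary, and recent LangChain versions require the opt-in flag because the index is pickled):

# Save the index to disk, then reload it instead of re-embedding.
vectorstore.save_local("faiss_index")
vectorstore = FAISS.load_local(
    "faiss_index", embeddings, allow_dangerous_deserialization=True)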

With FAISS ready, I create a retriever. This component is what lets my bot pull up the most relevant chunks of text based on the user’s query. It’s like having a super-fast librarian who knows exactly where everything is stored.

retriever = vectorstore.as_retriever()
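
Out of the box, the retriever returns the top few chunks per query; both the chunk count and the results are easy to inspect. An optional sketch (the sample query is made up):

# Optional: ask for the 4 most similar chunks per query.
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

# Sanity check: preview what comes back for a sample query.
for doc in retriever.invoke("What are your support hours?"):
    print(doc.page_content[:80])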

Setting Up the Prompt Template

To make sure my bot understands exactly how to format its responses, I set up a prompt template. This template guides the model on how to integrate the retrieved information with the user’s query to provide a cohesive and contextually relevant answer.

template = """
you are a brilliant assistant. answer the question in a polite way. Use following piece of context to answer the question.
Combine the chat history and follow up question into a standalone question. Chat History: {chat_history}
Follow up question: {question}
Context: {context}
Answer: 
"""
prompt = PromptTemplate(template=template, input_variables=["context", "question"])

Conversation Memory and the RAG Chain

To enhance the conversation, I use a memory buffer. This buffer keeps track of the chat history, which helps in maintaining context over a session. The retrieval-augmented generation (RAG) chain combines this memory with the prompt and the retriever to provide responses.

# Buffer the full chat history under the key the prompt expects.
memory = ConversationBufferMemory(
    memory_key="chat_history", output_key='answer', return_messages=True)

# Tie the LLM, retriever, memory, and prompt into one RAG chain.
rag_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory,
    combine_docs_chain_kwargs={'prompt': prompt},
    response_if_no_docs_found="I don't know")
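
Using the chain is a single call; the memory injects the chat history, so each request only needs the new question. For example (the questions are placeholders):

result = rag_chain.invoke({"question": "What services do you offer?"})
print(result["answer"])

# Follow-ups automatically lean on the buffered chat history.
result = rag_chain.invoke({"question": "Which of those is the cheapest?"})
print(result["answer"])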

Bringing It All to Life with Streamlit

Finally, I use Streamlit to create an interactive web interface. This lets me (and anyone I share it with) chat with the bot in a user-friendly environment.
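
For reference, here’s a minimal sketch of what the Streamlit wiring could look like using its chat elements; treat it as an illustrative skeleton rather than my exact app (the title and placeholder text are made up):

import streamlit as st

st.title("Personal Assistant Bot")

# Streamlit reruns the script on every interaction, so persist the
# transcript in session state.
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the conversation so far.
for role, text in st.session_state.messages:
    st.chat_message(role).write(text)

# Accept a new question and answer it with the RAG chain.
if question := st.chat_input("Ask me anything"):
    st.chat_message("user").write(question)
    answer = rag_chain.invoke({"question": question})["answer"]
    st.chat_message("assistant").write(answer)
    st.session_state.messages += [("user", question), ("assistant", answer)]

In a real app you’d also wrap the chain construction in @st.cache_resource so it isn’t rebuilt on every rerun.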

Here is my output: 🤖

Thank you for reading!
