Beginner’s Guide To Retrieval Chain From LangChain
In the previous article, we saw how LangChain can be used to create a simple LLM chain with a prompt template. In this article, we will see how LangChain can be used to create a Retrieval Chain when there is too much data to pass to the LLM as context. The source of the data can be anything: a SQL table, the Internet or documents.
The logic is as follows.
Instead of sending the complete data along with the user question to the LLM, we convert the data into vector embeddings and store them in a special kind of database called a vector store.
Every time the user asks a question, a retriever fetches only the relevant pieces of information from the vector store and passes them, along with the user question, as a prompt to the LLM.
Vector Store and Vector Embeddings
Let’s look closely at what a vector store and vector embeddings are. A vector store is used to store data in the form of vector embeddings. Vector embeddings are numerical representations of source data that capture the meaning and relationships of words, phrases and other data types.
For example, the vector embeddings for “dog” and “puppy” would be close together because they share a similar meaning and often appear in similar contexts.
Source: LangChain
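To make the “dog” and “puppy” example concrete, here is a minimal sketch, assuming an OpenAI API key is set in the OPENAI_API_KEY environment variable. It uses the OpenAIEmbeddings class (introduced later in this article) to embed the words and compare them with cosine similarity:
# Minimal sketch: compare the embeddings of two related words.
# Assumes the OPENAI_API_KEY environment variable is set.
import numpy as np
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
dog_vec, puppy_vec, invoice_vec = embeddings.embed_documents(["dog", "puppy", "invoice"])

def cosine_similarity(a, b):
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# "dog" vs "puppy" should score noticeably higher than "dog" vs "invoice"
print(cosine_similarity(dog_vec, puppy_vec))
print(cosine_similarity(dog_vec, invoice_vec))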
When the user asks a question, the retriever creates a vector embedding of the question and retrieves only those vector embeddings from the vector store that are ‘most similar’ to it. This avoids having to send the entire source data to the LLM each time the user asks a question.
For our example, let’s say we want a website to be the context for the user question to the LLM.
Loading Data From Website
First, we need to obtain the data from the website. To do this, we will use the WebBaseLoader.
This requires installing the BeautifulSoup library, which is used to parse the HTML retrieved from web pages.
pip install beautifulsoup4
After that, we use WebBaseLoader to extract all the data from the website.
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://www.dasa.org")
docs = loader.load()
Now, we have the data from the website https://www.dasa.org as unstructured textual data in “docs”.
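If you want to verify what was loaded, “docs” is a list of LangChain Document objects that you can inspect, for example:
# Optional check: each Document has page_content (the text) and metadata
print(len(docs))
print(docs[0].metadata)
print(docs[0].page_content[:200])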
Splitting Data Into Fragments
In order to create vector embeddings, we first need to split the textual data into smaller fragments. For this, we will use RecursiveCharacterTextSplitter, which splits text based on the number of characters; the fragment size and overlap can be adjusted through its chunk_size and chunk_overlap parameters.
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
Now, we have the split text stored in “documents”.
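If the default fragment size does not suit your data, you can pass chunk_size and chunk_overlap explicitly; the values below are only an illustration:
# Illustrative values only; tune chunk_size and chunk_overlap for your data
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = text_splitter.split_documents(docs)
print(len(documents))  # number of fragments produced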
Next, we need to convert the split text into vector embeddings.
Converting Text Into Vector Embeddings
To create vector embeddings, we can use an embedding model from OpenAI through the OpenAIEmbeddings class.
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
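If you are curious what an embedding looks like, you can embed a single string and inspect the result (the vector length depends on the embedding model used):
# Optional check: embed a single string and look at the resulting vector
sample_vector = embeddings.embed_query("What is DASA?")
print(len(sample_vector))   # dimensionality of the embedding
print(sample_vector[:5])    # first few values of the vector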
Storing in a Vector Store
Next, we need a vector store to store the vector embeddings. We can use FAISS, a similarity-search library developed by Facebook that LangChain supports as a vector store.
First, from the terminal, we need to install the required package to work with the vector store.
pip install faiss-cpu
Then, we import the vector store.
from langchain_community.vectorstores import FAISS
Now, we convert the split text in “documents” into vector embeddings using the “embeddings” model and put them into the FAISS vector store.
vector = FAISS.from_documents(documents, embeddings)
The process we have done till now is visually represented below:
Process of creating vector embeddings and inserting into vector store
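Before building the retrieval chain, we can sanity-check the vector store with a direct similarity search; the query string here is just an example:
# Optional sanity check: fetch the fragments most similar to a query
results = vector.similarity_search("What does DASA do?", k=2)
for doc in results:
    print(doc.page_content[:200])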
Creating a Retrieval Chain
Now that we have the data in the vector store, let’s create a retrieval chain. For the retrieval chain, we need a prompt.
The prompt will have the retrieved data and the user question. To do this, we use a prompt template.
# Import ChatOpenAI and create an llm with the OpenAI API key
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(api_key="<Your OpenAI API Key here>")
# Create a prompt template
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_template("""Answer the following question based only on the provided context:
<context>
{context}
</context>
Question: {input}""")
Here, “context” is the retrieved data, and “input” is the user question. Remember to insert your OpenAI API Key in the code.
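To see how the placeholders get filled in, you can format the template with dummy values (purely illustrative):
# Illustrative only: fill the template with dummy values to see the final prompt
messages = prompt.format_messages(
    context="(retrieved website text would go here)",
    input="What does DASA provide?",
)
print(messages[0].content)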
We use create_stuff_documents_chain to create a document chain that “stuffs” the retrieved documents into the prompt and sends it to the llm.
from langchain.chains.combine_documents import create_stuff_documents_chain
document_chain = create_stuff_documents_chain(llm, prompt)
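If you want to test the document chain on its own, you can pass it a hand-made Document as the context (the text below is hypothetical, just for illustration):
from langchain_core.documents import Document

# Hypothetical context document, just to test the document chain in isolation
test_doc = Document(page_content="DASA delivers talent products for DevOps teams and organizations.")
print(document_chain.invoke({
    "input": "What does DASA deliver?",
    "context": [test_doc],
}))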
All that is left now is to retrieve the data that is closely related to the user question from the vector store and provide it to the LLM as part of the prompt. This is illustrated in the figure below.
Sending the prompt with retrieved data
This is done using a retriever and a retrieval chain.
retriever = vector.as_retriever()
from langchain.chains import create_retrieval_chain
retrieval_chain = create_retrieval_chain(retriever, document_chain)
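By default the retriever returns the top few most similar fragments; if you want explicit control over how many, you could create the retriever like this instead (k=3 is just an example value):
# Optional: restrict the retriever to the 3 most similar fragments
retriever = vector.as_retriever(search_kwargs={"k": 3})
retrieval_chain = create_retrieval_chain(retriever, document_chain)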
Finally, we can now invoke this chain.
response = retrieval_chain.invoke({"input":"What are the talent products delivered by DASA"})
print(response["answer"])
The answer given is highly accurate due to the context provided.
The talent products delivered by DASA focus on enhancing individual and team capabilities in organizations, preparing them for high-performance environments that boost enterprise agility, customer centricity, and innovation.
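Besides the answer, the response dictionary also contains the retrieved documents under the “context” key, which is handy for checking what was actually sent to the LLM:
# Inspect which fragments were retrieved and passed as context
for doc in response["context"]:
    print(doc.metadata.get("source"), "-", doc.page_content[:100])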
To sum it up, the Retrieval Chain from LangChain uses the input question to retrieve the relevant data from a vector store, and sends only the relevant data as context along with the user question to the LLM. This enables a large amount of data to be used as context for the LLM.
How the Retrieval Chain from LangChain works
In the next articles, we will look at how to create:
Conversational Retrieval LLM Chain — When we want the LLM to remember the conversation history
Agent — Where the LLM decides what step to take