Next-Level Chatbots: Using Pinecone and LangChain to Understand and Leverage User Documents
Enhancing Chatbot Intelligence: Step-by-Step Integration of Pinecone and LangChain for Smarter Interactions
Introduction
Chatbots are getting smarter, handling more of our questions on websites and apps. However, many chatbots still struggle to fully grasp what we need, especially when the answer lives in specific documents we work with. This often makes chatting with them feel less helpful and personal.
That’s where Pinecone and LangChain come in. Pinecone helps sort and find information quickly, while LangChain helps turn the words in our documents into meaningful chatbot conversations.
By using these technologies together, you can build chatbots that understand user documents and give answers that make more sense. This article will show you how to use Pinecone and LangChain to build smarter chatbots, with easy steps and examples of how they make a difference.
Pre-requisites
You are going to need a few things to get started:
- A Pinecone account and API key
- An OpenAI account and API key
The nice thing about starting with cloud providers is that everything in this system is modular: if a new embedding model or LLM comes out, we can easily swap in the new piece to update the tech stack.
Pinecone stores the document embeddings so they can be recalled and searched at any point using LangChain.
OpenAI is used both to calculate the embeddings and to generate the chatbot responses.
Getting Started
The main flow is going to follow this path:
- Initialize embeddings & Create Index
- Create the Pinecone vector stores
- Load and split documents before adding them to Pinecone
- Build a retriever with LangChain & Load LLM
- Create a memory buffer
- Query the conversation chain with our question
- Get the response
Make sure you install all the dependencies:
# In your terminal/shell
pip install \
langchain \
langchain-community \
langchain-pinecone \
langchain-openai \
pinecone-client \
pinecone-datasets \
python-dotenv \
pypdf
You must set environment variables for your Pinecone and OpenAI API keys. The easiest way is to create a .env file with these values and load it in your Python file.
# .env
# Replace these with your own API keys
OPENAI_API_KEY=sk-abcdefghijklmo
PINECONE_API_KEY=1111-1111-11111-1111
# main.py
# Load env variables
from dotenv import load_dotenv
load_dotenv()
1. Initialize the Embeddings & Create Index
Here we create the embeddings object. For this I am using the large embedding model from OpenAI. You can use any embedding model; you just have to set the matching dimension when creating the index.
# main.py
from langchain_openai import OpenAIEmbeddings

# Load embeddings
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-large",
)
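If you switch to a different embedding model and are unsure of its dimension, a quick way to check (just an optional sanity-check sketch, not a required step) is to embed a sample string and measure the vector length:
# main.py
# Optional sanity check: the vector length must match the index dimension
sample_vector = embeddings.embed_query("dimension check")
print(len(sample_vector))  # 3072 for text-embedding-3-large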
The Pinecone index is essentially just a database table. Pinecone lets you have multiple indexes.
Next we load Pinecone and check for the index. If it does not exist, we create one using the dimension of our embedding model.
I am using serverless Pinecone indexes, as they currently come with a $100 free credit promo.
# main.py
import os
import time

from pinecone import Pinecone, ServerlessSpec

index_name = "your-index-name"

# Load Pinecone
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))

if index_name not in pc.list_indexes().names():
    spec = ServerlessSpec(cloud='aws', region='us-east-1')
    # Dimension 3072 matches the large embedding model
    pc.create_index(index_name, dimension=3072, metric="cosine", spec=spec)
    # Wait until the index is ready before using it
    while not pc.describe_index(index_name).status['ready']:
        time.sleep(1)
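To confirm the index is reachable and see its dimension, namespaces, and vector counts, you can optionally describe it. This is just a quick check, not a required part of the flow:
# main.py
# Optional: inspect the index stats
index = pc.Index(index_name)
print(index.describe_index_stats())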
2. Create the Pinecone vector stores
A vector store is the object we use to read from and write to the Pinecone index. If you would like documents to be separated by user, you can also define a namespace, such as a user ID or some other identifier that is passed in.
A namespace tells the vector store to only look at records tagged with that same namespace. This is useful so that user-uploaded documents do not bleed across instances.
# main.py
from langchain_pinecone import PineconeVectorStore
# Initialize vector store
vectorstore = PineconeVectorStore(index_name=index_name, embedding=embeddings)
# OR
# Initialize vector store with a userID passed in
userID = "ABC123"
vectorstore = PineconeVectorStore(index_name=index_name, embedding=embeddings, namespace=userID)
You can also create multiple indexes, or multiple namespaces, and build a system that retrieves from "global" documents as well as user-specific documents.
This is helpful if you want the chatbot to have business context that every instance knows (global context) while also knowing individual user documents that are kept independent from each other.
I will publish a full article about this soon; for now we will keep things simple.
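As a quick preview, a minimal sketch of that pattern could look like the following; the "global" namespace name and the example question are placeholders, not values defined earlier.
# main.py
# Sketch: one store for shared "global" documents, one per user
global_store = PineconeVectorStore(index_name=index_name, embedding=embeddings, namespace="global")
user_store = PineconeVectorStore(index_name=index_name, embedding=embeddings, namespace=userID)

# Pull context from both stores and combine the results
question = "Example question about the business"
combined_docs = global_store.similarity_search(question, k=2) + user_store.similarity_search(question, k=2)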
3. Load and Split Documents & Add Them to Pinecone
This is a 3 step process:
- Load the documents
- Split the documents into chunks
- Add the documents to Pinecone
We will create a helper function for each of these steps to keep things simple.
Load Documents
We will create a load_docs helper function that takes the document directory path.
This function will load every document found in directory_path one by one.
In my next article I will show how to support file types other than PDF, including txt, docx, and HTML.
# Load documents
import os
from langchain_community.document_loaders import PyPDFLoader

def load_docs(directory_path):
    documents = []  # Collect the loaded documents here
    for filename in os.listdir(directory_path):
        file_path = os.path.join(directory_path, filename)
        loader = PyPDFLoader(file_path)  # Load the file with PyPDFLoader
        documents.extend(loader.load())  # Append each page's Document to the list
    return documents
Split Documents
We then need to split the documents into chunks. There are a few reasons we do this:
- Manageable Size: Large documents can be unwieldy and difficult to process in one go due to memory constraints and performance issues.
- Improved Search Precision: When documents are split into chunks, each piece can be individually indexed and searched. This can improve the precision of search results.
- Overlap for Context Preservation: The use of an overlap (like the 20 in the example) ensures that the context around the borders of each chunk is not lost.
- Efficiency in Vector Storage and Retrieval: When using vector databases like Pinecone, each chunk can be independently encoded as a vector and stored. This makes the retrieval process more efficient.
For this function we are passing in the documents list from the last step along with the chunk_size and chunk_overlap. You can play around with these numbers based on the complexity of your documents.
# Split Documents
from langchain.text_splitter import RecursiveCharacterTextSplitter

def split_docs(documents, chunk_size=500, chunk_overlap=20):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
    )
    docs = text_splitter.split_documents(documents)
    return docs
Add Documents
We now have both load_docs and split_docs to pre-process our documents, so it is time to add them to our Pinecone vector database.
At this point we can also add metadata to each chunk, which lets us filter and search more precisely when querying. Tagging with a custom ID is also important if you later want to delete document chunks by that ID.
We pass in our vectorstore from before along with our processed documents object.
def add_documents(vectorstore, documents):
    for i, doc in enumerate(documents):  # Enumerate through each chunk
        existing_metadata = doc.metadata  # Keep the default metadata, like source
        # Add custom metadata to each chunk
        metadata = {
            "document_id": i,
            "any_value": "value",
        }
        metadata.update(existing_metadata)
        doc.metadata = metadata
    # You can omit the ids argument and let Pinecone autogenerate IDs,
    # or set your own IDs as shown here
    vectorstore.add_documents(documents, ids=[str(i) for i in range(len(documents))])
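As a side note, those string IDs make it possible to remove specific chunks later. A minimal sketch of that, using the Pinecone client directly (pass the same namespace if you used one), might look like this:
# Sketch: delete chunks by the IDs we assigned above
index = pc.Index(index_name)
index.delete(ids=["0", "1"], namespace=userID)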
Chaining Together
Now we can chain our three helper functions together to process our documents and add them to Pinecone.
# main.py
directory_path = 'test-dataset'

# Load and split documents
documents = load_docs(directory_path)
docs = split_docs(documents, chunk_size=1000, chunk_overlap=150)

# Add documents to Pinecone
add_documents(vectorstore, docs)
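To verify the chunks actually landed in Pinecone, you can optionally run a quick similarity search against the vector store; the query string here is just a placeholder.
# main.py
# Optional check: confirm the chunks are searchable
results = vectorstore.similarity_search("any test question", k=3)
for r in results:
    print(r.metadata.get("source"), "->", r.page_content[:100])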
4. Build a retriever & Load LLM
A retriever grabs relevant information from the vector store. There are many different retriever types you can read about in the LangChain documentation, but for now we will use the simplest one.
# main.py
# Create a retriever
retriever = vectorstore.as_retriever()
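If you want a bit more control, the retriever can be configured; for example, you can limit how many chunks it returns per query (the value 4 here is arbitrary).
# main.py
# Alternative: return the top 4 chunks instead of the default
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})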
We can also now load the LLM to “answer” our queries when we later pass it our relevant context as well.
For this I am using gpt-3.5-turbo, but you can use any model you like; it doesn't have to be from OpenAI.
The temperature is set at 0.2. Temperature controls how random the LLM's output is: lower values give more focused, deterministic answers, while higher values give more varied ones.
# main.py
from langchain_openai import ChatOpenAI

# Load LLM
llm = ChatOpenAI(
    openai_api_key=os.getenv("OPENAI_API_KEY"),
    model_name='gpt-3.5-turbo',
    temperature=0.2,
)
5. Create a memory for the chatbot
Here we are using a simple window memory. There are multiple types of memory you can use; window memory holds the last k messages, which keeps token usage bounded since we only hold the "recent" history rather than the entire conversation.
You can choose a different type of memory if you prefer.
# main.py
from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(
    memory_key="chat_history",
    return_messages=True,
    output_key="answer",
    k=8,
)
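As one alternative (a sketch, in case you prefer it), a plain buffer memory keeps the entire conversation instead of a sliding window; it takes the same keys as above.
# main.py
# Alternative: keep the full conversation history
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
    output_key="answer",
)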
Behind the scenes, the conversation chain uses this chat history to condense follow-up questions into standalone questions, giving the LLM the full context.
In a future article I will show how to link the memory to a database or external file.
Query the Conversation
Now it is time to ask our chatbot questions.
We are using a ConversationalRetrievalChain with our previously created objects and a chain_type of "stuff", but you can experiment with the other chain types.
We also are setting return_source_documents=True which will provide the chunks used to generate the response.
# main.py
from langchain.chains import ConversationalRetrievalChain

query = "Any query here"

qa = ConversationalRetrievalChain.from_llm(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    memory=memory,
    return_source_documents=True,
)

response = qa(query)
print("Response: ", response)          # The full response, including source documents
print("Answer: ", response["answer"])  # Just the chatbot's answer
Conclusion
Using Pinecone and LangChain, developers can make chatbots smarter and more aware of context. This combination helps chatbots understand and use information from user documents, providing more accurate and relevant answers. By following the steps in this guide, you can build a chatbot that not only meets your users' needs through insights from their documents but also improves how they interact with your service.
Looking ahead, the goal for chatbot development is to make conversations as meaningful and personalized as possible. With Pinecone and LangChain, you're well-equipped to enhance user engagement. As you put these technologies to work, keep refining your chatbot based on user feedback to make sure it remains helpful and effective in every conversation.
Follow me for more postings going further in depth, including:
- Multi global context
- Enhanced chat memory with across-session storage
- Integration with a NextJS frontend to create an interactive, user-friendly experience
- Deletion of documents keyed by ID
and more!