A LangChain LLM RAG application
Introduction to RAG: Making AI Models Work with Your Private Data
Large Language Models (LLMs) are trained on vast amounts of public internet data, which is why they can answer general questions like “What is the capital of France?” or help proofread documents.
But what if you want to use these powerful language capabilities with your organization’s private documents?
There are two main approaches to accomplish this:
- Full Retraining: Train the model from scratch using your private documents. This is typically expensive and unnecessary for most use cases
- Context Injection: Feed your documents into the model’s context window. This is much simpler to implement. However, context windows have size limits that may not fit all your documents
This is where RAG (Retrieval Augmented Generation) comes in: RAG is a technique that allows you to enhance LLMs with your own data without the need for complete retraining.
It efficiently retrieves and uses only the most relevant parts of your documents when needed.
In the following sections, we’ll walk through a practical example of building a RAG application using LangChain.
RAG application with LangChain
LangChain comes with a lot of utilities to build the entire processing chain required for a RAG application.
input content
The input document I’ve chosen is the transcript of one of Lex Fridman’s podcast episodes: Rick Spence: CIA, KGB, Illuminati, Secret Societies, Cults & Conspiracies | Lex Fridman Podcast #451
I copied the transcript into a markdown file. See below for the link to the github repository.
document loader and splitter
LangChain comes with the concept of DocumentLoaders, which can load your documents and return Documents that can then be processed by LLMs.
The first step is to load the markdown document:
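A minimal sketch of that loading step, assuming the UnstructuredMarkdownLoader from langchain_community and a hypothetical transcript.md file name:

```python
from langchain_community.document_loaders import UnstructuredMarkdownLoader

# Load the podcast transcript from a local markdown file
# (the file name is illustrative)
loader = UnstructuredMarkdownLoader("transcript.md")
docs = loader.load()
```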
I had a few errors with the loader…
First I got an error: the loader uses the NLTK (Natural Language Toolkit) dependency, and that package needs some additional data files.
The suggested fix is to run a small Python program to download those data files, but that gave another error, this time an SSL certificate verification failure.
Good old Stack Overflow is still a good source of information, and I found the solution here: https://stackoverflow.com/questions/38916452/nltk-download-ssl-certificate-verify-failed
The trick was to find a shell command that would install some certificates.
After that step, the Python download snippet finally worked. I also had to download two more data files before the LangChain loader code would run: punkt_tab, and then averaged_perceptron_tagger_eng.
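Here is a sketch of what the download snippet ends up looking like with both data files included (my reconstruction, assuming the standard nltk.download calls):

```python
import nltk

# Data files required by the NLTK-based markdown loader
nltk.download("punkt_tab")
nltk.download("averaged_perceptron_tagger_eng")
```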
We can check the length of the document:
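A minimal sketch of that check, assuming the docs list returned by the loader above:

```python
# Number of characters in the loaded document
print(len(docs[0].page_content))
```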
That gives:
Even though the document may fit in some LLM context windows, LangChain still recommends splitting a document of that size, as models can struggle to find information in very long inputs.
We split the document with a “splitter”:
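Here is a sketch of that step, assuming the RecursiveCharacterTextSplitter with the usual chunk_size/chunk_overlap values from the LangChain tutorial:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split the transcript into overlapping chunks that fit comfortably in a prompt
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
print(len(splits))
```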
This gives us 261 splits.
document storage: vector database
These splits must be stored in a format that allows runtime searching.
This is typically accomplished by converting the splits into embeddings and storing them in a vector database.
Here again, the vector database configuration was not as smooth as it could be.
The LangChain example suggested using the Chroma vector store, but when I tried to install langchain_chroma, I got an error. It seems the error was due to some incompatibility with my version of Python, so I decided to use a different vector database.
The nice thing about LangChain is that it comes with lots of different vector stores.
I decided to use a Redis-backed vector store:
yet another error
The error was easy to solve: this module relies on a specific version of the Redis server.
The installation was a no-brainer: I installed the new version, stopped my current Redis server, and started the new Redis from the command line.
To create a vector store, you’ll need an embedding generator. In this case, I chose to use OpenAI’s embedding model to convert text into numerical vectors.
The vector store takes the splits we created after loading and splitting the document.
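Here is a sketch of that step, assuming OpenAI embeddings and the Redis vector store from langchain_community (the Redis URL and index name are illustrative, and OPENAI_API_KEY is expected in the environment):

```python
from langchain_community.vectorstores import Redis
from langchain_openai import OpenAIEmbeddings

# Embed each split and index it in Redis for similarity search
embeddings = OpenAIEmbeddings()
vector_store = Redis.from_documents(
    splits,
    embeddings,
    redis_url="redis://localhost:6379",
    index_name="lex-451-transcript",
)
```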
querying - where the magic happens
That store can now be used as a retriever when we query the document:
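For example, a minimal sketch using the vector store from above:

```python
# Expose the vector store as a retriever that returns the most similar splits
retriever = vector_store.as_retriever()
```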
How RAG Processes Your Questions:
When you ask a question about a document, here’s what happens:
- The retriever takes your question and creates an embedding (a numerical representation) from it
- It then searches the vector store to find matching content from your stored documents
- The matching content becomes the context for your question
- Finally, this context is sent along with your question to the LLM for answering
The best part about LangChain is that all these steps can be connected with a single command.
This rag_chain is the mechanism by which you ask a question based on the document:
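Here is a sketch of such a chain using the LCEL pipe syntax; the prompt wording and the gpt-4o-mini model are illustrative choices:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the following context.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)
llm = ChatOpenAI(model="gpt-4o-mini")

def format_docs(docs):
    # Concatenate the retrieved splits into a single context string
    return "\n\n".join(doc.page_content for doc in docs)

# retrieve splits -> format them -> fill the prompt -> call the LLM -> parse to a string
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
```

The dict at the start of the chain runs the retriever and passes the raw question through in parallel, so the prompt receives both the retrieved context and the original question.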
And now you can query your document:
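For instance, with an example question about the episode’s guest:

```python
# Ask a question; the answer is grounded in the retrieved transcript chunks
answer = rag_chain.invoke("Who is Rick Spence?")
print(answer)
```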
Here are a few questions I asked about that document:
references
- the code for this example is at: https://github.com/pcarion/deeplearning/tree/main/ex-003