A LangChain LLM RAG application

Introduction to RAG: Making AI Models Work with Your Private Data

Large Language Models (LLMs) are trained on vast amounts of public internet data, which is why they can answer general questions like “What is the capital of France?” or help proofread documents.

But what if you want to use these powerful language capabilities with your organization’s private documents?

There are two main approaches to accomplish this:

  • Full Retraining: Train the model from scratch using your private documents. This is typically expensive and unnecessary for most use cases
  • Context Injection: Feed your documents into the model’s context window. This is much simpler to implement. However, context windows have size limits that may not fit all your documents

This is where RAG (Retrieval Augmented Generation) comes in: RAG is a technique that allows you to enhance LLMs with your own data without the need for complete retraining.

It efficiently retrieves and uses only the most relevant parts of your documents when needed.

In the following sections, we’ll walk through a practical example of building a RAG application using LangChain.

RAG application with LangChain

LangChain comes with a lot of utilities covering the entire processing chain required to build a RAG application.

input content

The input document I’ve chosen is the transcript of a Lex Fridman podcast episode: Rick Spence: CIA, KGB, Illuminati, Secret Societies, Cults & Conspiracies | Lex Fridman Podcast #451

I copied the transcript into a markdown file. See below for the link to the GitHub repository.

document loader and splitter

LangChain comes with the concept of DocumentLoaders, which load your documents and return Documents that can then be processed by LLMs.

The first step is to load the markdown document:

from langchain_community.document_loaders import UnstructuredMarkdownLoader

markdown_path = "./lex-fridman-451.md"
loader = UnstructuredMarkdownLoader(markdown_path)
documents = loader.load()

print(documents)

I had a few errors with the loader…

First I got the error:

Traceback (most recent call last):
  File ".../ex-003/rag01.py", line 6, in <module>
    documents = loader.load()

<stack traces>

 Resource punkt_tab not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt_tab')

The loader depends on NLTK (the Natural Language Toolkit), and that package needs some data files.

The error message suggests running a small Python snippet to download those data files, but that gave another error:

[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1020)

Good old Stack Overflow is still a reliable source of information, and I found the solution here: https://stackoverflow.com/questions/38916452/nltk-download-ssl-certificate-verify-failed

The trick was to find and run a shell command that installs the missing certificates:

find /Applications -name "Install*.*"
/Applications/Python 3.13/Install Certificates.command

After that step, the Python snippet worked:

import nltk
nltk.download('punkt')


[nltk_data] Downloading package punkt to $HOME/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.

I had to download two more data files before the LangChain loader code finally worked: punkt_tab, and then averaged_perceptron_tagger_eng.
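
Putting it all together, a small one-time setup snippet along these lines (the package names come straight from the successive error messages) fetches everything the loader needs:

import nltk

# data files required by the loader's NLTK-based tokenization and tagging
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')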

We can check the length of the document:

# print number of documents
print("number of documents: ", len(documents))

# we print the length of the first document's page content
print("length of the first document's page content: ", len(documents[0].page_content))

That gives:

number of documents:  1
length of the first document's page content:  173611

Even though the document might fit in some LLM context windows, LangChain still recommends splitting a document of that size, as models can struggle to find information in very long inputs.

We split the document with a “splitter”:

from langchain_text_splitters import RecursiveCharacterTextSplitter


# we split the documents into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200, add_start_index=True
)
all_splits = text_splitter.split_documents(documents)

print("number of splits: ", len(all_splits))

This gives us 261 splits.
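
Each split is itself a Document, and since we passed add_start_index=True, its metadata records where the chunk starts in the original text:

# peek at the first chunk and its metadata
print(all_splits[0].page_content[:200])
print(all_splits[0].metadata)  # includes 'start_index' because of add_start_index=True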

document storage: vector database

These splits must be stored in a format that can be searched at query time.

This is typically accomplished by converting the splits into embeddings and storing them in a vector database.
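
To make “embeddings” concrete, here is a minimal sketch (assuming an OPENAI_API_KEY in your environment) that embeds a single question and shows that the result is just a long vector of floats:

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

# the model maps text to a fixed-length vector of floats
vector = embeddings.embed_query("Who are the Illuminati?")
print(len(vector))  # dimensionality of the embedding space
print(vector[:5])   # first few components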

Here again, the vector database configuration was not as smooth as it could be.

The LangChain example suggested using the Chroma vector store, but when I tried to install the langchain_chroma package, I got this error:

ERROR: Cannot install langchain_chroma because these package versions have conflicting dependencies.

The conflict is caused by:
    chromadb 0.5.20 depends on onnxruntime>=1.14.1
    chromadb 0.5.18 depends on onnxruntime>=1.14.1
    chromadb 0.5.17 depends on onnxruntime>=1.14.1

It seems the error was due to some incompatibility with my version of Python, so I decided to use a different vector database.

The nice thing about LangChain is that it comes with a lot of different vector stores.

I decided to use a Redis-backed vector store:

pip install langchain_redis

That installed fine, but at runtime I hit yet another error:

 File ".../ex-003/env/lib/python3.13/site-packages/redisvl/redis/connection.py", line 306, in validate_sync_redis
    validate_modules(installed_modules, required_modules)
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../ex-003/env/lib/python3.13/site-packages/redisvl/redis/connection.py", line 179, in validate_modules
    raise RedisModuleVersionError(error_message)
redisvl.exceptions.RedisModuleVersionError: Required Redis db module search >= 20600 OR searchlight >= 20600 not installed. See Redis Stack docs at https://redis.io/docs/latest/operate/oss_and_stack/install/install-stack/.

The error was easy to solve: the package requires a Redis server with the search module, which ships with Redis Stack.

The installation was a no-brainer: I installed Redis Stack, stopped my current Redis server, and started the new one from the command line:

brew tap redis-stack/redis-stack
brew install redis-stack
brew services stop redis

redis-stack-server

To create a vector store, you’ll need an embedding generator. In this case, I chose to use OpenAI’s embedding model to convert text into numerical vectors.

from langchain_openai import OpenAIEmbeddings
from langchain_redis import RedisVectorStore

vectorstore = RedisVectorStore.from_documents(
    documents=all_splits,
    embedding=OpenAIEmbeddings())

The vector store takes the splits we got after loading the document.
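
As a quick sanity check before wiring up the full chain, you can query the store directly through the generic vector-store API:

# find the chunks closest to an arbitrary query
results = vectorstore.similarity_search("Zodiac killer", k=2)
for doc in results:
    print(doc.page_content[:100], "...")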

querying - where the magic happens

That store can now be used as a retriever when we query the document:

retriever = vectorstore.as_retriever(
    search_type="similarity", 
    search_kwargs={"k": 6})

How RAG Processes Your Questions:

When you ask a question about a document, here’s what happens:

  • The retriever takes your question and creates an embedding (a numerical representation) from it
  • It then searches the vector store to find matching content from your stored documents
  • The matching content becomes the context for your question
  • Finally, this context is sent along with your question to the LLM for answering

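You can watch the first two steps in isolation by calling the retriever directly (LangChain retrievers are Runnables, so invoke works):

# the retriever embeds the question and returns the closest chunks
docs = retriever.invoke("Who are the Illuminati?")
print(len(docs))  # 6, as configured with search_kwargs={"k": 6}
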
The best part about LangChain is that all these steps can be wired together in a single chain:

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
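
The snippet above assumes a few pieces that need to be defined first; a minimal version looks roughly like this (the prompt is the standard RAG prompt from the LangChain hub, and the model name here is just my choice):

from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

# the LLM that will answer the question
llm = ChatOpenAI(model="gpt-4o-mini")

# a prompt with {context} and {question} placeholders
prompt = hub.pull("rlm/rag-prompt")

# concatenate the retrieved chunks into one context string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)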

This rag_chain is the mechanism by which you ask questions about the document:

while True:
    try:
        question = input("\n\nEnter your question (or 'quit' to exit): ")
        if question.lower() in ['quit', 'exit', 'q']:
            break
            
        print("\nAnswer:", flush=True)
        for chunk in rag_chain.stream(question):
            print(chunk, end="", flush=True)
            
    except KeyboardInterrupt:
        print("\nExiting...")
        break
    except Exception as e:
        print(f"\nError occurred: {str(e)}")
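
As a side note, streaming is optional: rag_chain.invoke(question) returns the whole answer as a single string, while rag_chain.stream(question) yields it chunk by chunk as used above.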

And now you can query your document:

Here are a few questions I asked about that document:

Enter your question (or 'quit' to exit): Who are the illuminati?

Answer:
The Illuminati, originally founded as the Order Perfectibilists by Adam Weishaupt in 1776 in Germany, aimed to create a one-world order by subverting existing religions and governments. The term "Illuminati" refers to those who are "illuminated" or have seen the light, and various organizations have adopted this name over time. Its legacy continues to fascinate and influence modern secret societies and conspiracy theories.

Enter your question (or 'quit' to exit): what can you tell me about the Zodiac killer?

Answer:
The Zodiac Killer is a notorious unidentified serial killer active in Northern California during the late 1960s and early 1970s, known for his cryptic letters and symbols sent to the press. His victims primarily included couples, and he used both guns and knives in his attacks. There are theories regarding possible occult connections to his crimes, but the definitive identity and motives of the Zodiac Killer remain a mystery.

Enter your question (or 'quit' to exit): what is the best agency?CIA or KGB?

Answer:
Determining the "best" agency between the CIA and KGB is subjective and depends on the criteria used. Historically, some argue that Russian Intelligence Services, including the KGB, have shown consistent performance and effectiveness, while the CIA has faced challenges due to its separation from domestic intelligence operations. Ultimately, both agencies have their strengths and weaknesses, making direct comparison complex.

references