LangChain for easier LLM output parsing

Large Language Models (LLMs) excel at answering questions about given text inputs, but parsing their output in a structured way can be challenging.

For example, when using an LLM to analyze movie reviews, you might want to extract the following information:

  • The overall sentiment (positive or negative)
  • Actors mentioned in the review
  • A one-line summary of the review

While some model providers offer these capabilities natively in their APIs, others don’t.

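The Python LangChain API simplifies this extraction: it generates the formatting instructions for the prompt and parses the model's reply into a typed object.

The key step is creating a Pydantic model (a data class, not an LLM) that describes the structure of the expected output. Each field carries a description that serves as the instruction for the property you want to extract from the LLM response.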

Here’s an example for our movie review scenario:

from pydantic import BaseModel, Field

# Each Field description tells the LLM what to put in that property.
class MovieReview(BaseModel):
    isPositive: bool = Field(description="is the review positive or negative")
    summary: str = Field(description="a summary of the review")
    actors: list[str] = Field(description="a list of actors in the movie")
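To see what this model gives you before any LLM is involved, here is a minimal sketch that assumes Pydantic v2. The JSON strings are hand-written stand-ins for a model reply, not real output.

```python
from pydantic import BaseModel, Field, ValidationError


class MovieReview(BaseModel):
    isPositive: bool = Field(description="is the review positive or negative")
    summary: str = Field(description="a summary of the review")
    actors: list[str] = Field(description="a list of actors in the movie")


# The field descriptions end up in the JSON schema, which is the raw material
# an output parser can turn into formatting instructions for the LLM.
schema = MovieReview.model_json_schema()
print(schema["properties"]["isPositive"]["description"])

# A well-formed reply (hand-written here) validates into a typed object.
good = MovieReview.model_validate_json(
    '{"isPositive": true, "summary": "A moving film.", "actors": ["Diane Kruger"]}'
)
print(good.actors)

# A reply with missing fields is rejected instead of being silently accepted.
try:
    MovieReview.model_validate_json('{"isPositive": true}')
    rejected = False
except ValidationError:
    rejected = True
print(rejected)
```

The point of the sketch is that the same class serves two purposes: it documents the expected structure and it validates whatever comes back.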

As usual with LangChain, you then assemble all those pieces together by chaining them:

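from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

# llmModelName holds the name of the OpenAI chat model to use (not shown here).
model = ChatOpenAI(model=llmModelName)

# Set up a parser and inject its format instructions into the prompt template.
parser = PydanticOutputParser(pydantic_object=MovieReview)

prompt = PromptTemplate(
    template="Analyze the following movie review.\n{format_instructions}\n{review}\n",
    input_variables=["review"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# Chain the prompt and the model; the parser is applied to the model output afterwards.
prompt_and_model = prompt | model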

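Assuming you have four sample reviews on file (review01.txt to review04.txt), you can then process them:

# read_text_file and display_json_with_properties are helper functions not shown here.
# range(1, 5) yields 1 to 4, i.e. review01.txt to review04.txt.
for i in range(1, 5):
    review_file = f"review{i:02d}.txt"
    output = prompt_and_model.invoke({"review": read_text_file(review_file)})
    # Turn the raw model output into a MovieReview instance.
    response = parser.invoke(output)
    print(f"\nResults for {review_file}:")
    print(response)
    display_json_with_properties(response, ['isPositive', 'summary', 'actors'])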

The output is structured as expected:

review01.txt:
{
  "isPositive": true,
  "summary": "Artistically, 'Joyeux Noel' deeply touches its viewers' hearts and souls, with magnificent performances by all actors, especially Diane Krueger.",
  "actors": [
    "Diane Krueger"
  ],
  "type": "MovieReview"
}

review02.txt:
{
  "isPositive": true,
  "summary": "This version is one of the most beautiful versions, taking poetic license, but still capturing the sense of what happened during that sad time yet wonderful day.",
  "actors": [],
  "type": "MovieReview"
}

review03.txt:
{
  "isPositive": false,
  "summary": "The movie depicts a factual event of WWI but is drawn out and boring at times. It struggles with clarity in its storytelling and diverges from historical facts.",
  "actors": [],
  "type": "MovieReview"
}

review04.txt:
{
  "isPositive": true,
  "summary": "'Joyeux Noël' is an inspirational film that beautifully depicts the Christmas truce during World War I, showcasing the joys and pains of its characters while balancing sentimentality with the harsh realities of war.",
  "actors": [
    "Guillaume Canet",
    "Daniel Brühl",
    "Gary Lewis",
    "Diane Kruger",
    "Benno Fürmann",
    "Natalie Dessay"
  ],
  "type": "MovieReview"
}
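The processing loop calls two helpers, read_text_file and display_json_with_properties, whose code is not shown. The sketch below is a hypothetical reconstruction, guessed from the output above (selected properties plus a "type" key holding the class name); the actual helpers may differ. A plain dataclass stands in for the Pydantic model so the sketch needs only the standard library, and to_json_with_properties is an extra name introduced here for illustration.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path


def read_text_file(path):
    """Return the whole content of a text file as a string."""
    return Path(path).read_text(encoding="utf-8")


def to_json_with_properties(obj, properties):
    """Serialize the selected attributes of obj, plus its class name as 'type'."""
    data = {name: getattr(obj, name) for name in properties}
    data["type"] = type(obj).__name__
    # ensure_ascii=False keeps names such as "Daniel Brühl" readable.
    return json.dumps(data, indent=2, ensure_ascii=False)


def display_json_with_properties(obj, properties):
    """Print the JSON rendering produced by to_json_with_properties."""
    print(to_json_with_properties(obj, properties))


# Stand-in for the Pydantic model, with made-up values.
@dataclass
class MovieReview:
    isPositive: bool
    summary: str
    actors: list = field(default_factory=list)


sample = MovieReview(isPositive=True, summary="A moving film.", actors=["Daniel Brühl"])
rendered = to_json_with_properties(sample, ["isPositive", "summary", "actors"])
display_json_with_properties(sample, ["isPositive", "summary", "actors"])
```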

This approach saves you from writing the boilerplate yourself: the formatting instructions in the prompt, the JSON parsing, and the validation of each field.
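For comparison, here is a rough sketch of what doing the parsing by hand might look like, using only the standard library. The function name parse_review_by_hand and the sample reply are made up for illustration, and a real implementation would also need to write the formatting instructions for the prompt.

```python
import json
import re


def parse_review_by_hand(reply: str) -> dict:
    """Extract and validate the review fields from a raw model reply."""
    # Models often surround the JSON with prose, so grab the outermost {...} span.
    match = re.search(r"\{.*\}", reply, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in the model reply")
    data = json.loads(match.group(0))

    # Check each expected field and its type one by one.
    if not isinstance(data.get("isPositive"), bool):
        raise ValueError("isPositive must be a boolean")
    if not isinstance(data.get("summary"), str):
        raise ValueError("summary must be a string")
    actors = data.get("actors")
    if not isinstance(actors, list) or not all(isinstance(a, str) for a in actors):
        raise ValueError("actors must be a list of strings")
    return data


# Made-up reply in which the JSON is wrapped in extra text.
reply = (
    'Here is the analysis:\n'
    '{"isPositive": false, "summary": "Slow at times.", "actors": []}\n'
    'Hope this helps!'
)
parsed = parse_review_by_hand(reply)
print(parsed)
```

Every new field means another hand-written check, which is the part the Pydantic model and the output parser take over.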
