Qiyang Wang

Introduction to AI agents and applications

Large language models (LLMs) are essential for applications that answer complex questions, generate content, and summarize documents. They also enable AI agents that use natural language to coordinate actions and return results.

AI applications and agent systems are complex, but frameworks like LangChain, LangGraph, and LangSmith simplify development by providing modular building blocks.

Building LLM-based applications and agents

LLMs excel at natural language processing, making them valuable for applications like summarization, translation, and chatbots across diverse fields.

LLM apps, despite diverse use cases, generally follow a similar structure: processing natural language input, utilizing unstructured data, and generating prompts for the model. These systems are categorized into LLM-based applications/engines, chatbots, and AI agents, each with distinct functionalities and capabilities.

LLM-based applications: Summarization and Q&A engines

LLM-based applications act as backend tools for specific natural language requests, such as summarization engines condensing lengthy text passages.

A summarization engine efficiently summarizes and stores content from large volumes of text and can be invoked by other systems through a REST API.

Summarization engines are shared services accessed via REST APIs, while Q&A engines answer queries against a knowledge base in two phases: ingestion and query.

The engine builds its knowledge base by converting text into embeddings and storing them in a vector store for efficient retrieval.

Embeddings are vector representations of words or text units that capture semantic and syntactic relationships. They enable language models to understand meaning, context, and similarity, and are typically learned during pretraining.
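The notion of semantic similarity between embeddings can be sketched with cosine similarity over toy vectors. This is a plain-Python illustration: the three-dimensional vectors below are made up for the example, whereas real embedding models produce vectors with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: 1.0 means same direction (very similar meaning),
    # values near 0.0 mean the vectors are unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (real models produce far larger vectors).
emb = {
    "dog":     [0.90, 0.10, 0.00],
    "puppy":   [0.85, 0.15, 0.05],
    "invoice": [0.00, 0.20, 0.95],
}

# Semantically related words end up close together in the vector space.
print(cosine_similarity(emb["dog"], emb["puppy"]))    # high similarity
print(cosine_similarity(emb["dog"], emb["invoice"]))  # low similarity
```

Semantic search in a vector store is essentially this comparison, run between a query embedding and every stored chunk embedding.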

The engine uses semantic search to retrieve relevant information from a vector store, combines it with the user’s question, and sends it to an LLM to generate an answer.

Retrieval-augmented generation (RAG) is a foundational technique for improving LLM outputs: it enhances text generation by incorporating relevant context from a local knowledge base at query time.
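The query-time half of RAG — retrieve relevant chunks, then stuff them into the prompt — can be sketched in plain Python. The retrieval step here is a toy word-overlap ranking rather than real embedding search, and the chunk texts are invented for the example:

```python
def retrieve(query, chunks, k=1):
    # Toy retrieval: rank chunks by word overlap with the query.
    # A real RAG system would compare embeddings in a vector store instead.
    q = set(query.lower().split())
    scored = sorted(chunks, key=lambda c: len(q & set(c.lower().split())), reverse=True)
    return scored[:k]

def build_rag_prompt(query, chunks):
    # Combine the retrieved context with the user's question into one prompt.
    context = "\n".join(retrieve(query, chunks))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

chunks = [
    "LangChain provides document loaders and text splitters.",
    "The Eiffel Tower is in Paris.",
]
prompt = build_rag_prompt("Where is the Eiffel Tower?", chunks)
print(prompt)
```

The resulting prompt, not the bare question, is what gets sent to the LLM, which is how the model's answer ends up grounded in the knowledge base.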

A Q&A engine implemented with the RAG design: an LLM query engine stores domain-specific document information in a vector store. When an external system sends a query, the engine converts the natural language question into its embedding (vector) representation, retrieves the related documents from the vector store, and then gives the LLM the information it needs to craft a natural language response.

Engines automate processes and simplify workflows by handling retrieval, transformation, and orchestration. Unlike engines, AI agents use LLMs to plan and adapt workflows at runtime, offering greater flexibility for open-ended tasks.

LLM-based chatbots

LLM-based chatbots, unlike simple question-answer scripts, enable ongoing, natural conversations while ensuring safety and relevance through prompt design and role-based messaging formats.

Chatbots use local knowledge sources to improve accuracy and provide relevant, reliable answers.

Chatbots use conversation memory to maintain coherent and personalized responses, but this is limited by the model’s context window.
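One common way to live within the context window is to keep only the most recent messages whose combined size fits a budget. The sketch below uses a naive whitespace token estimate; real systems would use the model's actual tokenizer, and the message format is a simplified stand-in:

```python
def trim_history(messages, max_tokens=50,
                 count_tokens=lambda m: len(m["content"].split())):
    # Walk backward from the newest message, keeping messages until the
    # token budget is exhausted, then restore chronological order.
    kept, total = [], 0
    for msg in reversed(messages):
        total += count_tokens(msg)
        if total > max_tokens:
            break
        kept.append(msg)
    return list(reversed(kept))

history = [
    {"role": "user", "content": "word " * 30},        # oldest, 30 tokens
    {"role": "assistant", "content": "short reply"},
    {"role": "user", "content": "latest question"},
]
recent = trim_history(history, max_tokens=10)
print(len(recent))  # the oldest, oversized message is dropped
```

More sophisticated strategies summarize the dropped turns instead of discarding them, trading some fidelity for a longer effective memory.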

LLM-based chatbots are specialized for tasks like summarization, question answering, or translation, and can respond directly or combine input with stored knowledge.

A summarization chatbot shares some similarities with a summarization engine, but it offers an interactive experience where the LLM and the user can work together to fine-tune and improve the results.

Chatbots, unlike summarization engines, allow for real-time refinement of responses, making the summarization process collaborative and producing more tailored answers.

Sequence diagram that outlines how a user interacts with an LLM through a chatbot to create a more concise summary.

AI agents

AI agents, unlike simple pipelines, work with LLMs to independently carry out multi-step tasks involving multiple data sources and adaptive decision-making.

The agent uses an LLM to decide which tools to use, runs them, and processes the results until a complete solution is produced.
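This decide–act–observe loop can be sketched with a stub standing in for the LLM's decision-making. The tool names, fake LLM policy, and return values below are all invented for illustration; a real agent would send the accumulated state to an actual model and parse its chosen action:

```python
def fake_llm(state):
    # Stand-in for a real LLM call: given what the agent has gathered so far,
    # decide the next action — call another tool, or finish with an answer.
    if "weather" not in state:
        return ("call_tool", "get_weather")
    if "flights" not in state:
        return ("call_tool", "find_flights")
    return ("finish", f"Plan: fly ({state['flights']}), expect {state['weather']}.")

TOOLS = {
    "get_weather": lambda: "sunny",
    "find_flights": lambda: "LH123",
}

def run_agent(llm, tools):
    state = {}
    while True:
        action, payload = llm(state)
        if action == "finish":
            return payload
        # Execute the chosen tool and feed the observation back into the state,
        # so the next LLM call can build on it.
        state[payload.split("_", 1)[1]] = tools[payload]()

result = run_agent(fake_llm, TOOLS)
print(result)
```

The essential point is that control flow is chosen at runtime by the model, not hard-coded: adding a new tool changes what the agent *can* do without rewriting the loop.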

Workflow of an AI agent tasked with assembling holiday packages.

A tour operator uses an AI agent to generate holiday packages. The agent prompts an LLM to select relevant tools, execute queries, and summarize the holiday plan.

AI agents are gaining popularity, with companies like OpenAI, Google, and Amazon releasing agent SDKs. These agents can be programmed to include human-in-the-loop steps for critical actions, ensuring oversight and trust.

LangGraph, LangChain’s agent framework, provides prebuilt classes and tool integrations for building advanced AI agents. These agents leverage LLM capabilities for complex, automated workflows across various domains.

The Model Context Protocol (MCP) by Anthropic, introduced in late 2024, standardizes tool exposure for agents, shifting integration burden to services. This de facto standard, adopted by major LLM providers, simplifies agent development by eliminating the need for custom connectors.

Introducing LangChain

LangChain addresses common challenges in building chatbots and search engines by providing a consistent set of building blocks for data ingestion, processing, and retrieval. It helps developers manage context limits, costs, and maintainability while enabling multi-step workflows and API calls.

LangChain, a rapidly evolving framework, enables building LLM-based systems with modularity, composability, and extensibility. Its design allows for easy integration of new components and promotes interoperability, making it a valuable skill for future frameworks.

LangChain architecture

LangChain’s workflow involves ingesting text from various sources, splitting it into chunks, embedding the chunks, and storing both the chunks and embeddings in a vector store for efficient retrieval.

LangChain architecture. The document loader imports data, which the text splitter divides into chunks. These are vectorized by an embedding model, stored in a vector store, and retrieved through a retriever for the LLM. The LLM cache checks for prior requests to return cached responses, while the output parser formats the LLM’s final response.
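The ingestion half of this pipeline — load, split, embed, store — can be sketched in plain Python. The splitter below mimics the character-based chunking with overlap that LangChain's text splitters perform, and the embedding function is a deliberately fake placeholder for a real model:

```python
def split_text(text, chunk_size=40, overlap=10):
    # Character-based splitter with overlap, so context isn't lost at
    # chunk boundaries (the same idea as LangChain's text splitters).
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def toy_embed(chunk):
    # Placeholder "embedding": a real system calls an embedding model here.
    return [len(chunk), sum(map(ord, chunk)) % 100]

vector_store = []
doc = "LangChain loads documents, splits them into chunks, and embeds each chunk for retrieval."
for chunk in split_text(doc):
    # Store both the raw chunk and its vector, as the architecture describes.
    vector_store.append({"text": chunk, "embedding": toy_embed(chunk)})

print(len(vector_store), "chunks stored")
```

Note that both the text and its embedding are kept: the embedding is used to find a chunk, but the original text is what gets placed into the LLM's prompt.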

LLM applications typically use vector stores for context retrieval, but graph databases are increasingly used for representing and reasoning about entity relationships, especially in agent architectures. LangChain integrates with graph databases like Neo4j and supports graph-based memory and planning, enabling advanced agent functionalities.

LangChain introduced the Runnable interface and LCEL to simplify component chaining. LangGraph allows for more complex, branching workflows beyond simple pipelines.
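The flavor of Runnable chaining can be shown with a minimal stand-in class (this is not the real LangChain `Runnable`, just an illustration of the `invoke`/pipe pattern; real LCEL chains compose prompt templates, chat models, and output parsers the same way):

```python
class Runnable:
    # Minimal stand-in for LangChain's Runnable: supports invoke() and
    # composition with the | operator, feeding one step's output to the next.
    def __init__(self, fn):
        self.fn = fn

    def invoke(self, x):
        return self.fn(x)

    def __or__(self, other):
        return Runnable(lambda x: other.invoke(self.invoke(x)))

prompt = Runnable(lambda topic: f"Summarize: {topic}")  # prompt template stand-in
llm = Runnable(lambda p: p.upper())                     # chat model stand-in
parser = Runnable(lambda text: text.strip())            # output parser stand-in

chain = prompt | llm | parser  # LCEL-style composition
print(chain.invoke("vector stores"))
```

The pipe builds a linear pipeline; LangGraph exists precisely because some workflows need branches, loops, and state that a straight pipe cannot express.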

LangChain processes documents by extracting content, splitting large texts, and converting them into LangChain Document objects. It utilizes embedding models, vector stores, and knowledge graph databases for semantic retrieval, and retrievers to query these backends for relevant documents.

LangChain components can be organized into a chain, a customized sequence for specific use cases, or an agent, which manages a dynamic workflow with flexible processing.

LangChain’s core object model

LangChain’s object model, organized into class hierarchies, centers on the Document entity: loaders generate documents, splitters divide them, and vector stores and retrievers store and fetch them.

Object model of classes associated with the Document core entity: document loaders create Document objects, splitters produce lists of Document objects, vector stores persist Document objects, and retrievers retrieve Document objects from vector stores and other sources.

LangChain’s architecture includes primary classes like Document, DocumentLoader, TextSplitter, VectorStore, and Retriever, with a focus on modular workflows enabled by the Runnable interface. The LangChain Hub offers a repository for reusable components.

Typical LLM use cases

LLMs are used in various tasks, including text classification, natural language understanding, semantic search, autonomous reasoning, structured data extraction, code generation, and personalized education. While LLMs are versatile, ensuring they meet user needs in specific domains requires additional considerations.

How to adapt an LLM to your needs

Prompt engineering, retrieval-augmented generation, and fine-tuning are three techniques for improving the quality and relevance of an LLM’s responses.

Prompt engineering

Prompt engineering involves designing inputs for large language models (LLMs) to guide their behavior and improve response accuracy. Techniques like in-context learning and few-shot prompting enable LLMs to generalize from examples without fine-tuning. While powerful, prompt engineering has limitations, particularly when applications require grounding answers in specific data, necessitating techniques like RAG.
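Few-shot prompting can be sketched as simple prompt assembly: labeled examples are placed before the new input so the model can infer the pattern. The sentiment-classification task and example texts below are invented for illustration:

```python
def few_shot_prompt(examples, query):
    # Assemble a few-shot prompt: an instruction, labeled examples,
    # then the new input left for the model to complete.
    lines = ["Classify the sentiment as positive or negative."]
    for text, label in examples:
        lines.append(f"Text: {text}\nSentiment: {label}")
    lines.append(f"Text: {query}\nSentiment:")
    return "\n\n".join(lines)

examples = [
    ("I loved it", "positive"),
    ("Terrible service", "negative"),
]
prompt = few_shot_prompt(examples, "Great value for money")
print(prompt)
```

The model is never retrained; it generalizes from the examples in the prompt alone, which is what in-context learning means.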

RAG

Improving LLM responses involves grounding them in your data using RAG to retrieve relevant context from a local knowledge base. This workflow includes building a knowledge base by ingesting documents, splitting them into chunks, and converting them into vector representations for similarity comparison.

A collection of documents is split into text chunks and transformed into vector-based embeddings. Both the text chunks and their embeddings are then stored in a vector store.

RAG offers efficiency by retrieving key document chunks, accuracy by grounding responses on real data, and flexibility by adapting to different domains.

Grounding an LLM involves using prompts with context from a trusted knowledge source to ensure factual responses.

Hallucinations in LLMs occur when the model generates incorrect or fabricated responses due to poor-quality training data or lack of information.

RAG reliability requires explicit instructions for LLMs to rely on retrieved context. LangChain offers guardrails and validators, but human review is best for high-stakes cases.
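Such explicit instructions typically live in the prompt template itself. The template below is one plausible phrasing, not a prescribed LangChain pattern; the point is the instruction to refuse when the context doesn't contain the answer:

```python
GROUNDED_TEMPLATE = (
    "Use ONLY the context below to answer. "
    "If the answer is not in the context, reply exactly: I don't know.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

def grounded_prompt(context, question):
    # Fill the template with retrieved context and the user's question.
    return GROUNDED_TEMPLATE.format(context=context, question=question)

prompt = grounded_prompt(
    "Returns are accepted within 30 days.",
    "What is the return window?",
)
print(prompt)
```

Giving the model an explicit escape hatch ("I don't know") reduces the pressure to fabricate an answer when retrieval comes back empty or off-topic.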

Fine-tuning

Fine-tuning adapts pretrained LLMs for specific tasks by training on curated datasets, improving efficiency but requiring time, expertise, and potentially costly training. Recent advances like LoRA and instruction tuning have made fine-tuning more accessible.

Fine-tuning is debated, with general-purpose LLMs and RAG showing strong performance. However, fine-tuning remains valuable in specialized domains like medicine, law, and finance.

Fine-tuning customizes LLMs for domain expertise but is time-consuming and complex. This book focuses on building AI agents and applications, not on creating or modifying models.

Which LLMs to choose

LangChain simplifies integration with various LLMs, offering a standardized interface for easy model switching and minimal code changes.
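The value of a standardized interface can be shown with a minimal sketch: application code depends only on a shared base class, so swapping providers touches one line. The classes below are invented stand-ins, not LangChain's actual model classes:

```python
class ChatModel:
    # Minimal common interface, in the spirit of LangChain's chat model
    # abstraction: every provider exposes the same invoke() method.
    def invoke(self, prompt: str) -> str:
        raise NotImplementedError

class EchoModelA(ChatModel):
    def invoke(self, prompt):
        return f"[model-a] {prompt}"

class EchoModelB(ChatModel):
    def invoke(self, prompt):
        return f"[model-b] {prompt}"

def answer(model: ChatModel, question: str) -> str:
    # Application code is written against the interface, not a provider,
    # so switching models requires no changes here.
    return model.invoke(question)

print(answer(EchoModelA(), "hi"))
print(answer(EchoModelB(), "hi"))
```

This is what makes it practical to benchmark a large, accurate model against a smaller, cheaper one without rewriting the application.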

Larger models are more accurate but slower and more expensive, while smaller models are faster and cheaper but less accurate.

The best AI model balances accuracy, speed, and cost according to application requirements. Additional considerations include model purpose, context window size, multilingual support, instruction versus reasoning capabilities, and open source versus proprietary options.

What you’ll learn from this book

This book guides readers through building LLM-powered applications, starting with prompt engineering and progressing to custom engines, chatbots, and AI agents using LangChain and LangGraph. It covers architectural patterns like RAG, explores open-source models, and delves into the full application lifecycle, including debugging, monitoring, and deployment. By the end, readers will have a portfolio of applications and the skills to design and implement LLM-powered systems.
