Introduction to AI agents and applications
Large language models (LLMs) are essential for applications that answer complex questions, generate content, and summarize documents. They also enable AI agents that use natural language to coordinate actions and return results.
AI applications and agent systems are complex, but frameworks like LangChain, LangGraph, and LangSmith simplify development by providing modular building blocks.
Building LLM-based applications and agents
LLMs excel at natural language processing, making them valuable for applications like summarization, translation, and chatbots across diverse fields.
LLM apps, despite diverse use cases, generally follow a similar structure: they process natural language input, draw on unstructured data, and generate prompts for the model. Such systems fall into three categories: LLM-based applications (engines), chatbots, and AI agents, each with distinct functionality and capabilities.
LLM-based applications: Summarization and Q&A engines
LLM-based applications act as backend tools for specific natural language requests, such as summarization engines condensing lengthy text passages.

Summarization engines are shared services accessed via REST APIs, while Q&A engines answer queries against a knowledge base in two phases: ingestion and query.
The engine builds its knowledge base by converting text into embeddings and storing them in a vector store for efficient retrieval.
Embeddings are vector representations of words or text units that capture semantic and syntactic relationships. They enable language models to understand meaning, context, and similarity, and are typically learned during pretraining.
The engine uses semantic search to retrieve relevant information from the vector store, combines it with the user’s question, and sends both to an LLM to generate an answer.
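The query phase can be sketched with a toy in-memory vector store. The embeddings below are hypothetical hand-made vectors; in a real engine they would come from an embedding model, and the store would be a proper vector database:

```python
import math

def cosine_similarity(a, b):
    # Similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "vector store": (text chunk, precomputed embedding) pairs.
# In practice these vectors are produced by an embedding model.
store = [
    ("LangChain provides building blocks for LLM apps.", [0.9, 0.1, 0.0]),
    ("Paris is the capital of France.", [0.0, 0.2, 0.9]),
]

def retrieve(query_embedding, k=1):
    # Semantic search: rank chunks by cosine similarity to the query
    ranked = sorted(store,
                    key=lambda item: cosine_similarity(query_embedding, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

# A query whose (hypothetical) embedding is close to the first chunk
context = retrieve([0.8, 0.2, 0.1])
prompt = f"Answer using this context:\n{context[0]}\n\nQuestion: What is LangChain for?"
```

The retrieved chunk is then combined with the user's question into a single prompt, exactly as the text describes.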
Retrieval-Augmented Generation (RAG) is a foundational technique for improving LLM outputs: it enhances text generation by incorporating context retrieved from a local knowledge base at query time.

Engines automate processes and simplify workflows by handling retrieval, transformation, and orchestration. Unlike engines, AI agents use LLMs to plan and adapt workflows at runtime, offering greater flexibility for open-ended tasks.
LLM-based chatbots
LLM-based chatbots, unlike simple question-answer scripts, enable ongoing, natural conversations while ensuring safety and relevance through prompt design and role-based messaging formats.
Chatbots use local knowledge sources to improve accuracy and provide relevant, reliable answers.
Chatbots use conversation memory to maintain coherent and personalized responses, but this is limited by the model’s context window.
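The role-based message format and the context-window limit described above can be sketched as follows. Counting tokens by whitespace-separated words is a crude stand-in for a real tokenizer, and the message contents are made up:

```python
# Role-based chat history, as used by most chat LLM APIs
history = [
    {"role": "system", "content": "You are a helpful travel assistant."},
    {"role": "user", "content": "Suggest a city break in Italy."},
    {"role": "assistant", "content": "How about Florence?"},
    {"role": "user", "content": "What should I see there?"},
]

def trim_to_budget(messages, max_tokens):
    """Keep the system message plus the most recent turns that fit.
    Word count is a simplification of real token counting."""
    system, rest = messages[0], messages[1:]
    kept, used = [], len(system["content"].split())
    for msg in reversed(rest):
        cost = len(msg["content"].split())
        if used + cost > max_tokens:
            break  # context window full: drop the oldest turns
        kept.append(msg)
        used += cost
    return [system] + list(reversed(kept))

# With a tight budget, the oldest user turn is dropped
trimmed = trim_to_budget(history, max_tokens=15)
```

Dropping the oldest turns while pinning the system message is the simplest memory strategy; real chatbots may instead summarize older turns to preserve their content.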
LLM-based chatbots are specialized for tasks like summarization, question answering, or translation, and can respond directly or combine input with stored knowledge.

Chatbots, unlike summarization engines, allow for real-time refinement of responses, making the summarization process collaborative and producing more tailored answers.

AI agents
AI agents, unlike simple pipelines, work with LLMs to independently carry out multi-step tasks involving multiple data sources and adaptive decision-making.
The agent uses an LLM to decide which tools to use, runs them, and processes the results until a complete solution is produced.
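This decide-execute-repeat loop can be illustrated with a toy agent. Here `fake_llm_plan` is a stub standing in for the LLM's planning step, and both tools are hypothetical; a real agent would delegate the tool choice to a model:

```python
# Hypothetical tools the agent can call
def search_flights(destination):
    return f"Flights to {destination}: from $320"

def search_hotels(destination):
    return f"Hotels in {destination}: from $90/night"

TOOLS = {"search_flights": search_flights, "search_hotels": search_hotels}

def fake_llm_plan(task, results):
    """Stand-in for the LLM's decision step: pick the next tool to run,
    or finish once every tool's result has been gathered."""
    for name in TOOLS:
        if name not in results:
            return {"action": name, "input": task}
    return {"action": "finish", "input": " | ".join(results.values())}

def run_agent(task):
    results = {}
    while True:
        step = fake_llm_plan(task, results)
        if step["action"] == "finish":
            return step["input"]  # final answer assembled from tool results
        tool = TOOLS[step["action"]]
        results[step["action"]] = tool(step["input"])  # execute the chosen tool

answer = run_agent("Rome")
```

The loop terminates only when the planner decides the task is solved, which is what distinguishes an agent from a fixed pipeline.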

A tour operator uses an AI agent to generate holiday packages. The agent prompts an LLM to select relevant tools, execute queries, and summarize the holiday plan.
AI agents are gaining popularity, with companies like OpenAI, Google, and Amazon releasing agent SDKs. These agents can be programmed to include human-in-the-loop steps for critical actions, ensuring oversight and trust.
LangGraph, LangChain’s agent framework, provides prebuilt classes and tool integrations for building advanced AI agents. These agents leverage LLM capabilities for complex, automated workflows across various domains.
The Model Context Protocol (MCP) by Anthropic, introduced in late 2024, standardizes how tools are exposed to agents, shifting the integration burden to services. This de facto standard, now adopted by major LLM providers, simplifies agent development by removing the need for custom connectors.
Introducing LangChain
LangChain addresses common challenges in building chatbots and search engines by providing a consistent set of building blocks for data ingestion, processing, and retrieval. It helps developers manage context limits, costs, and maintainability while enabling multi-step workflows and API calls.
LangChain, a rapidly evolving framework, enables building LLM-based systems with modularity, composability, and extensibility. Its design allows easy integration of new components and promotes interoperability, so familiarity with it transfers readily to future frameworks.
LangChain architecture
LangChain’s workflow involves ingesting text from various sources, splitting it into chunks, embedding the chunks, and storing both the chunks and embeddings in a vector store for efficient retrieval.

LLM applications typically use vector stores for context retrieval, but graph databases are increasingly used for representing and reasoning about entity relationships, especially in agent architectures. LangChain integrates with graph databases like Neo4j and supports graph-based memory and planning, enabling advanced agent functionalities.
LangChain introduced the Runnable interface and LCEL to simplify component chaining. LangGraph allows for more complex, branching workflows beyond simple pipelines.
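The idea behind the Runnable interface and LCEL's pipe operator can be sketched with a minimal pipeline class. This is a deliberate simplification, not LangChain's actual implementation, and the "model" here is a stub function:

```python
class Runnable:
    """Minimal sketch of a composable step: wraps a function and
    supports chaining with the | operator, loosely like LCEL."""
    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # Chain: the output of self feeds the input of other
        return Runnable(lambda value: other.invoke(self.invoke(value)))

# A three-step pipeline: build prompt -> "model" -> parse output
make_prompt = Runnable(lambda q: f"Answer briefly: {q}")
fake_model = Runnable(lambda p: f"MODEL({p})")
parse = Runnable(lambda out: out.removeprefix("MODEL(").removesuffix(")"))

chain = make_prompt | fake_model | parse
result = chain.invoke("What is LangChain?")
```

Because every step shares the same `invoke` contract, steps can be swapped or recombined freely; LangGraph generalizes this linear chaining to branching graphs.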
LangChain processes documents by extracting content, splitting large texts, and converting them into LangChain Document objects. It utilizes embedding models, vector stores, and knowledge graph databases for semantic retrieval, and retrievers to query these backends for relevant documents.
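The splitting step can be sketched as a fixed-size chunker with overlap. Real splitters, such as LangChain's recursive character splitter, additionally try to break at sentence and paragraph boundaries; this version is purely character-based:

```python
def split_text(text, chunk_size=100, overlap=20):
    """Naive character-based splitter. Overlapping chunks help
    preserve context that straddles a chunk boundary."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step back by `overlap` characters
    return chunks

doc = "x" * 250
chunks = split_text(doc, chunk_size=100, overlap=20)
```

Each chunk would then be embedded and stored alongside its text, as described above.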
LangChain components can be organized into a chain, a customized sequence for specific use cases, or an agent, which manages a dynamic workflow with flexible processing.
LangChain’s core object model
LangChain’s object model, organized into class hierarchies, centers on the
Document entity: loaders produce documents, splitters divide them, and
vector stores and retrievers process the results.

LangChain’s architecture includes primary classes like Document,
DocumentLoader, TextSplitter, VectorStore, and Retriever, with a focus
on modular workflows enabled by the Runnable interface. The LangChain Hub offers
a repository for reusable components.

Typical LLM use cases
Large Language Models (LLMs) are used in various tasks, including text classification, natural language understanding, semantic search, autonomous reasoning, structured data extraction, code generation, and personalized education. While LLMs are versatile, ensuring they meet user needs in specific domains requires additional considerations.
How to adapt an LLM to your needs
Prompt engineering, retrieval-augmented generation, and fine-tuning are three techniques for improving the quality and relevance of LLM responses.
Prompt engineering
Prompt engineering involves designing inputs for large language models (LLMs) to guide their behavior and improve response accuracy. Techniques like in-context learning and few-shot prompting enable LLMs to generalize from examples without fine-tuning. While powerful, prompt engineering has limitations, particularly when applications require grounding answers in specific data, necessitating techniques like RAG.
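Few-shot prompting can be illustrated by assembling labeled examples into the prompt so the model can generalize the pattern in-context. The reviews and labels below are made up for illustration:

```python
# Hand-picked demonstrations (invented for this sketch)
examples = [
    ("The delivery was fast and the staff were friendly.", "positive"),
    ("My order arrived broken and support never replied.", "negative"),
]

def few_shot_prompt(examples, query):
    """Build a classification prompt from labeled examples;
    the trailing 'Sentiment:' cues the model to complete the label."""
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)

prompt = few_shot_prompt(examples, "Great value for the price.")
```

No model weights change here: the examples steer behavior purely through the input, which is exactly what in-context learning means.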
RAG
Improving LLM responses involves grounding them in your data using RAG to retrieve relevant context from a local knowledge base. This workflow includes building a knowledge base by ingesting documents, splitting them into chunks, and converting them into vector representations for similarity comparison.

RAG offers efficiency by retrieving key document chunks, accuracy by grounding responses on real data, and flexibility by adapting to different domains.
Grounding an LLM involves using prompts with context from a trusted knowledge source to ensure factual responses.
Hallucinations in LLMs occur when the model generates incorrect or fabricated responses due to poor-quality training data or lack of information.
RAG reliability requires explicit instructions for LLMs to rely on retrieved context. LangChain offers guardrails and validators, but human review is best for high-stakes cases.
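A grounding prompt with such explicit instructions might look like the following sketch; the wording is one plausible template, not a canonical one:

```python
def grounded_prompt(context, question):
    """Prompt template that instructs the model to answer only from
    the retrieved context, reducing the risk of hallucination."""
    return (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, say \"I don't know.\"\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

p = grounded_prompt("LangChain was released in 2022.",
                    "When was LangChain released?")
```

The escape hatch ("I don't know") matters: without it, models tend to answer from parametric memory even when the context is silent.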
Fine-tuning
Fine-tuning adapts pretrained LLMs for specific tasks by training on curated datasets, improving efficiency but requiring time, expertise, and potentially costly training. Recent advances like LoRA and instruction tuning have made fine-tuning more accessible.
Fine-tuning is debated, with general-purpose LLMs and RAG showing strong performance. However, fine-tuning remains valuable in specialized domains like medicine, law, and finance.
Fine-tuning customizes LLMs for domain expertise but is time-consuming and complex. This book focuses on building AI agents and applications, not on creating or modifying models.
Which LLMs to choose
LangChain simplifies integration with various LLMs, offering a standardized interface for easy model switching and minimal code changes.
Larger models are more accurate but slower and more expensive, while smaller models are faster and cheaper but less accurate.
The best AI model balances accuracy, speed, and cost according to application requirements. Additional considerations include model purpose, context window size, multilingual support, instruction versus reasoning capabilities, and open source versus proprietary options.
What you’ll learn from this book
This book guides readers through building LLM-powered applications, starting with prompt engineering and progressing to custom engines, chatbots, and AI agents using LangChain and LangGraph. It covers architectural patterns like RAG, explores open-source models, and delves into the full application lifecycle, including debugging, monitoring, and deployment. By the end, readers will have a portfolio of applications and the skills to design and implement LLM-powered systems.