Qiyang Wang

Prompting Techniques

Advanced prompt engineering techniques improve LLM performance on complex tasks.

Zero-Shot Prompting

Large language models (LLMs) like GPT-3.5 Turbo and Claude 3 are trained on large datasets and can perform tasks in a “zero-shot” manner, meaning they can understand and execute tasks without examples. Instruction tuning and reinforcement learning from human feedback (RLHF) enhance these capabilities, aligning models with human preferences.
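
As a minimal illustration, a zero-shot prompt states the task directly with no demonstrations; the helper below is a hypothetical sketch, not an API from any library:

```python
def zero_shot_prompt(text: str) -> str:
    """Build a zero-shot sentiment prompt: the task is described in
    plain instructions and no examples are given."""
    return (
        "Classify the text into neutral, negative, or positive.\n"
        f"Text: {text}\n"
        "Sentiment:"
    )

# The resulting string is sent to the model as-is.
print(zero_shot_prompt("I think the vacation was okay."))
```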

Few-Shot Prompting

Few-shot prompting enables in-context learning: providing demonstrations in the prompt improves large language model performance on complex tasks. The format and distribution of the demonstrations significantly impact performance; even demonstrations with random labels can help, as long as the format is consistent. Newer GPT models show increased robustness to format variations.
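
The in-context format can be sketched as follows; `few_shot_prompt` is a hypothetical helper that lays out labeled demonstrations before the query:

```python
def few_shot_prompt(demos, query):
    """Prepend labeled demonstrations so the model can infer the
    input/output format in context, then append the unlabeled query."""
    lines = [f"Text: {x}\nSentiment: {y}\n" for x, y in demos]
    lines.append(f"Text: {query}\nSentiment:")
    return "\n".join(lines)

demos = [("This is awesome!", "positive"),
         ("This is bad!", "negative")]
print(few_shot_prompt(demos, "What a horrible show!"))
```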

Limitations of Few-shot Prompting

Few-shot prompting is not always effective for complex reasoning tasks: adding examples to the prompt may fail to improve a model's ability to solve multi-step reasoning problems. More advanced prompting techniques, such as chain-of-thought prompting, may be necessary for these tasks.

Chain-of-Thought (CoT) Prompting

Chain-of-thought (CoT) prompting, introduced in Wei et al. (2022), enables complex reasoning by incorporating intermediate reasoning steps. Combining CoT with few-shot prompting yields better results on complex tasks requiring reasoning.

Zero-Shot CoT Prompting

Zero-shot CoT, proposed by Kojima et al. (2022), involves appending “Let’s think step by step” to the prompt. This simple addition significantly improves performance on reasoning tasks, especially when few or no examples are available.
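
The technique amounts to appending the trigger phrase to the question; a minimal sketch:

```python
COT_TRIGGER = "Let's think step by step."

def zero_shot_cot(question: str) -> str:
    # The trailing trigger phrase elicits intermediate reasoning
    # before the model commits to a final answer.
    return f"{question}\n{COT_TRIGGER}"

print(zero_shot_cot(
    "I bought 10 apples, gave away 2, bought 5 more, and ate 1. "
    "How many apples do I have?"))
```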

Automatic Chain-of-Thought (Auto-CoT)

Auto-CoT, a method for generating demonstrations in chain-of-thought prompting, uses question clustering and demonstration sampling to create diverse and accurate examples. It leverages LLMs to generate reasoning chains, mitigating the effects of mistakes in generated chains.
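
The two stages can be sketched as follows; real Auto-CoT clusters Sentence-BERT embeddings with k-means, so the first-word bucketing and shortest-question heuristic below are toy stand-ins:

```python
from collections import defaultdict

COT_TRIGGER = "Let's think step by step."

def auto_cot_demos(questions, cluster_key=lambda q: q.split()[0]):
    """Toy sketch of Auto-CoT's two stages:
    1) cluster the questions (real Auto-CoT runs k-means over sentence
       embeddings; here we bucket by first word as a stand-in),
    2) sample one representative per cluster and attach the zero-shot
       CoT trigger so an LLM can generate its reasoning chain."""
    clusters = defaultdict(list)
    for q in questions:
        clusters[cluster_key(q)].append(q)
    # Pick the shortest question per cluster: a simple heuristic for a
    # clean demonstration whose generated chain is less error-prone.
    return [f"Q: {min(c, key=len)}\nA: {COT_TRIGGER}"
            for c in clusters.values()]
```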

Meta Prompting

Meta prompting is an advanced prompting technique that focuses on the structure and syntax of tasks rather than specific content. It offers advantages over few-shot prompting, including token efficiency, fair comparison, and zero-shot efficacy, making it applicable across various domains for complex reasoning tasks.

Self-Consistency

Self-consistency, a technique for prompt engineering, improves chain-of-thought prompting by sampling diverse reasoning paths and selecting the most consistent answer. This technique is demonstrated using arithmetic reasoning examples, where the final answer is determined by majority vote among multiple outputs.
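
The aggregation step reduces to a majority vote over the final answers parsed from each sampled chain; a minimal sketch:

```python
from collections import Counter

def self_consistent_answer(answers):
    """Return the most frequent final answer among the sampled
    reasoning paths, i.e. the most consistent one."""
    value, _ = Counter(answers).most_common(1)[0]
    return value

# Five sampled chains of thought might end in these answers:
print(self_consistent_answer(["18", "26", "18", "18", "26"]))  # -> 18
```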

Generated Knowledge Prompting

The paper by Liu et al. (2022) explores using large language models (LLMs) to generate knowledge before making predictions, particularly for commonsense reasoning tasks. The example provided demonstrates how generating knowledge about golf can improve the model’s accuracy in answering questions about the sport. The model’s confidence in its answers varied depending on the generated knowledge, highlighting the importance of knowledge generation in improving LLM performance.
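
The two-stage pattern can be sketched as a pair of prompt templates (both wordings are hypothetical):

```python
def knowledge_prompt(question: str) -> str:
    """Stage 1: ask the model to produce a relevant fact first."""
    return ("Generate a factual statement relevant to the question.\n"
            f"Question: {question}\nKnowledge:")

def answer_prompt(question: str, knowledge: str) -> str:
    """Stage 2: prepend the generated knowledge as context before
    asking for the final answer."""
    return (f"Knowledge: {knowledge}\n"
            f"Question: {question}\nAnswer:")
```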

Prompt Chaining

Introduction to Prompt Chaining

Prompt chaining breaks a task into subtasks, feeding the output of one prompt into the next. This approach improves performance, transparency, and controllability, making it valuable for building conversational assistants and enhancing user experience.

Use Cases for Prompt Chaining

Prompt Chaining for Document QA

Prompt chaining is used to answer questions about large text documents by breaking down the task into smaller steps. The first prompt extracts relevant quotes from the document, while the second prompt uses those quotes to generate a helpful answer. This approach simplifies complex tasks and can be applied to various scenarios.
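
A two-step chain can be sketched as below; `llm` stands in for any prompt-to-text model call, and the template wordings are illustrative:

```python
EXTRACT = ("Extract quotes relevant to the question from the document.\n"
           "Document: {doc}\nQuestion: {question}\nQuotes:")
ANSWER = ("Using only the quotes below, write a helpful answer.\n"
          "Quotes: {quotes}\nQuestion: {question}\nAnswer:")

def doc_qa_chain(llm, doc, question):
    """Step 1 extracts supporting quotes; step 2 answers from them.
    The output of the first prompt feeds the second."""
    quotes = llm(EXTRACT.format(doc=doc, question=question))
    return llm(ANSWER.format(quotes=quotes, question=question))
```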

Tree of Thoughts (ToT)

Tree of Thoughts (ToT) is a framework that enhances language models’ problem-solving capabilities by maintaining a tree of intermediate thoughts. ToT uses search algorithms like DFS and BFS to systematically explore these thoughts, enabling the model to self-evaluate progress and eliminate impossible solutions. This approach outperforms traditional prompting techniques, particularly for complex tasks requiring strategic lookahead.
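
A breadth-first variant can be sketched generically; `expand` proposes next thoughts and `score` is the model's self-evaluation, both supplied by the caller (the names and toy run below are illustrative):

```python
def tree_of_thoughts(root, expand, score, beam=2, depth=3):
    """BFS over a tree of thoughts: expand every frontier thought,
    self-evaluate the candidates with `score`, and keep only the
    `beam` most promising ones, pruning dead-end branches."""
    frontier = [root]
    for _ in range(depth):
        candidates = [t for f in frontier for t in expand(f)]
        if not candidates:
            break
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return max(frontier, key=score)

# Toy run: thoughts are numbers, doubling beats incrementing.
print(tree_of_thoughts(1, lambda n: [n + 1, n * 2], lambda n: n))  # -> 8
```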

Retrieval Augmented Generation (RAG)

Retrieval Augmented Generation (RAG) is a method that combines information retrieval with text generation to improve factual consistency and reliability in language models. RAG retrieves relevant documents from external sources like Wikipedia to provide context for the text generator, allowing for adaptive and up-to-date information. This approach has shown strong performance on various benchmarks and is increasingly being combined with large language models.
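
The retrieve-then-generate pattern can be sketched with a toy word-overlap retriever; a real system uses a dense retriever over a large corpus such as Wikipedia:

```python
def retrieve(corpus, query, k=1):
    """Rank documents by naive word overlap with the query."""
    words = set(query.lower().split())
    return sorted(corpus,
                  key=lambda d: len(words & set(d.lower().split())),
                  reverse=True)[:k]

def rag_prompt(corpus, question):
    """Prepend the retrieved passages as grounding context for the
    text generator."""
    context = "\n".join(retrieve(corpus, question))
    return f"Context: {context}\nQuestion: {question}\nAnswer:"
```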

Automatic Reasoning and Tool-use (ART)

ART uses a frozen LLM to automatically generate intermediate reasoning steps as a program, selecting demonstrations of multi-step reasoning and tool use from a task library. It outperforms few-shot prompting and automatic CoT on unseen tasks in the BIG-Bench and MMLU benchmarks.

Automatic Prompt Engineer (APE)

Zhou et al. (2022) propose the Automatic Prompt Engineer (APE) framework, which uses large language models to automatically generate and select instructions for tasks. APE outperforms human-engineered prompts, discovering a better zero-shot CoT prompt that improves performance on benchmarks.

Active-Prompt

Active-Prompt adapts LLMs to different tasks by selecting the most uncertain questions for human annotation, improving the effectiveness of task-specific CoT exemplars.

Directional Stimulus Prompting

Li et al. (2023) propose Directional Stimulus Prompting, which trains a small, tuneable policy LM to generate directional stimuli (hints such as keywords) that guide a frozen LLM’s generation toward desired outputs.

PAL (Program-Aided Language Models)

Program-aided language models (PAL) use LLMs to generate programs as intermediate reasoning steps, offloading computation to a runtime such as the Python interpreter. This approach is demonstrated using LangChain and OpenAI GPT-3 to answer date-related questions: the model is configured, a prompt is set up, and the generated Python code is executed to obtain the answer.
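
The core mechanism, executing a model-generated program to read off the answer, can be sketched as follows; the program here is hard-coded for illustration, whereas in PAL the LLM writes it:

```python
# Code the LLM might generate for "What date is 7 days after 2023-02-27?"
generated = """
from datetime import date, timedelta
answer = (date(2023, 2, 27) + timedelta(days=7)).isoformat()
"""

def run_pal(program: str) -> str:
    """Execute the model-generated program with the Python interpreter
    and read the `answer` variable, offloading the date arithmetic."""
    scope = {}
    exec(program, scope)
    return scope["answer"]

print(run_pal(generated))  # -> 2023-03-06
```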

ReAct Prompting

ReAct, a framework introduced by Yao et al. (2022), uses LLMs to generate interleaved reasoning traces and task-specific actions, improving reliability and factual grounding. ReAct outperforms state-of-the-art baselines and enhances human interpretability and trustworthiness.

How Does It Work?

ReAct is a paradigm that combines reasoning and acting with LLMs, enabling dynamic reasoning and interaction with external environments. It generates verbal reasoning traces and actions for tasks, allowing the system to perform dynamic reasoning and adjust plans while incorporating additional information.
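
The thought-action-observation loop can be sketched generically; `llm` returns the next step given the transcript so far, and `tools` maps action names to callables (all names and the step format below are illustrative):

```python
def react_loop(llm, tools, question, max_steps=5):
    """Interleave Thought -> Action -> Observation until the model
    emits a Finish action. Each observation is appended to the
    transcript, letting the model adjust its plan dynamically."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)          # "Thought: ...\nAction: tool[input]"
        transcript += step + "\n"
        action = step.split("Action: ")[-1]
        name, _, arg = action.partition("[")
        if name == "Finish":
            return arg.rstrip("]")
        observation = tools[name](arg.rstrip("]"))
        transcript += f"Observation: {observation}\n"
    return None
```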

ReAct Prompting

ReAct prompting involves creating few-shot exemplars from training set cases, like HotpotQA, to demonstrate its functionality. These exemplars, consisting of thought-action-observation steps, are used to achieve various tasks such as question decomposition, information extraction, and answer synthesis. Different prompt setups are employed based on the task type, with more thought steps for reasoning-heavy tasks and sparse thoughts for decision-making tasks.

Results on Knowledge-Intensive Tasks

ReAct outperforms Act on knowledge-intensive reasoning tasks, but lags behind CoT on HotpotQA. Combining ReAct and CoT+Self-Consistency generally yields the best results.

Results on Decision Making Tasks

ReAct outperforms Act on decision-making tasks in complex environments, demonstrating the advantage of combining reasoning and acting. However, prompting-based methods still fall short of expert human performance.

LangChain ReAct Usage

The ReAct prompting approach is demonstrated using OpenAI and LangChain. The example shows how an agent, configured with a search API and LLM math tool, can answer a query about Olivia Wilde’s boyfriend’s age raised to a power. The agent breaks down the task into steps, using search and calculator tools to find the answer.

https://github.com/dair-ai/Prompt-Engineering-Guide/blob/main/notebooks/react.ipynb

Reflexion

Reflexion is a framework that enhances language-based agents through linguistic feedback. It uses an Actor to generate actions, an Evaluator to score outputs, and a Self-Reflection model to provide verbal reinforcement cues for self-improvement. This process helps the agent learn from mistakes and improve performance on various tasks.
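
The Actor / Evaluator / Self-Reflection interplay can be sketched as a loop over an episodic memory of verbal hints; all three components are passed in as callables, and the names are illustrative:

```python
def reflexion(actor, evaluator, reflect, task, max_trials=3):
    """Reflexion loop: the Actor attempts the task, the Evaluator
    scores the attempt, and on failure the Self-Reflection model turns
    the feedback into a verbal hint stored in memory for the next try."""
    memory = []
    for _ in range(max_trials):
        attempt = actor(task, memory)
        ok, feedback = evaluator(attempt)
        if ok:
            return attempt
        memory.append(reflect(attempt, feedback))
    return attempt  # best effort after exhausting trials
```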

Results

Reflexion agents significantly improve performance on decision-making, reasoning, and programming tasks. Reflexion outperforms baseline approaches and previous state-of-the-art methods on various benchmarks.

When to Use Reflexion?

Reflexion is a lightweight alternative to traditional reinforcement learning, enabling agents to learn from trial and error through nuanced verbal feedback and explicit memory. It excels in sequential decision-making, reasoning, and programming tasks, but faces limitations in self-evaluation, long-term memory, and code generation.

Multimodal CoT Prompting

Multimodal chain-of-thought prompting incorporates text and vision in a two-stage framework that first generates rationales and then infers answers; it outperforms GPT-3.5 on the ScienceQA benchmark.
