Chapter 4: Advanced Prompting Techniques

Having mastered the core principles (Chapter 3), we can now explore more sophisticated techniques. These advanced strategies help unlock deeper reasoning, improve accuracy on complex tasks, integrate external knowledge, and control model behavior more finely.

We will cover:

  1. Chain-of-Thought (CoT) Prompting: Encouraging step-by-step reasoning.
  2. Self-Consistency: Improving reliability by sampling multiple reasoning paths.
  3. Retrieval-Augmented Generation (RAG): Grounding responses in external data.
  4. System Messages: Setting persistent context and rules.
  5. Function Calling / Tool Use: Enabling models to interact with external tools.

1. Chain-of-Thought (CoT) Prompting

Concept: CoT prompting encourages the model to break down a complex problem into intermediate reasoning steps before arriving at the final answer. Instead of just asking for the result, you ask the model to “think step-by-step” or demonstrate the reasoning process through examples.

Why it works: Large language models often reason more reliably when they articulate the intermediate steps: writing the steps out gives the model partial results to build on instead of guessing the answer in one jump, and it mimics how humans solve complex problems by breaking them down. This is particularly effective for arithmetic, commonsense reasoning, and symbolic manipulation tasks.

How to implement:

  • Zero-Shot CoT: Simply add phrases like “Let’s think step by step” or “Show your reasoning” to your prompt.
  • Few-Shot CoT: Provide examples where the desired output includes the reasoning steps leading to the final answer.

Example (Zero-Shot CoT for a Math Problem):

Standard Prompt:

Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A:

Output from GPT-4o:

Plaintext
A: 11

(Here the model happens to answer correctly, but with no intermediate reasoning it can just as easily jump to a wrong answer such as 8, especially on harder problems.)

Zero-Shot CoT Prompt:

Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: Let's think step by step.

Output from GPT-4o:

Plaintext
Roger initially has 5 tennis balls.
He buys 2 cans, each containing 3 balls, so that's 2 × 3 = 6 additional balls.
Total balls = initial balls + new balls = 5 + 6 = 11.
Answer: 11 tennis balls.


(Adding “Let’s think step by step.” guides the model to articulate the intermediate calculations, increasing accuracy.)
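
If you call a model programmatically, Zero-Shot CoT amounts to appending the trigger phrase to an otherwise ordinary prompt. The sketch below uses the OpenAI Python SDK as an illustration; the model name is a placeholder, and any chat-style API works the same way.

Python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

question = (
    "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?"
)

# Zero-Shot CoT: append the reasoning trigger to the prompt.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": f"Q: {question}\nA: Let's think step by step."}],
)

print(response.choices[0].message.content)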

Example (Few-Shot CoT for Logic):

Plaintext
Q: The cafeteria had 23 apples. If they used 20 for lunch and bought 6 more, how many apples do they have?
A: They started with 23 apples. They used 20, so 23 - 20 = 3. They bought 6 more, so 3 + 6 = 9. The answer is 9.

Q: John has 4 pencils, and Mary has 3 times as many pencils as John. How many pencils do they have together?
A: John has 4 pencils. Mary has 3 * 4 = 12 pencils. Together they have 4 + 12 = 16 pencils. The answer is 16.

Q: A juggler can juggle 16 balls. Half the balls are golf balls, and half of the golf balls are blue. How many blue golf balls are there?
A:

Output from GPT-4o Mini:

Plaintext
The juggler can juggle 16 balls, and half of them are golf balls, so there are 16 / 2 = 8 golf balls. Half of these golf balls are blue, so 8 / 2 = 4 blue golf balls. The answer is 4.

(The model learns the pattern of showing reasoning steps from the examples.)
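
In an API setting, the same few-shot pattern can be supplied as alternating user/assistant turns, so the worked examples read like earlier conversation rather than one long string. A minimal sketch, again assuming the OpenAI Python SDK and an illustrative model name:

Python
from openai import OpenAI

client = OpenAI()

# Worked examples as prior turns; the model imitates their reasoning style.
few_shot = [
    {"role": "user", "content": "Q: The cafeteria had 23 apples. If they used 20 for lunch and bought 6 more, how many apples do they have?"},
    {"role": "assistant", "content": "They started with 23 apples. They used 20, so 23 - 20 = 3. They bought 6 more, so 3 + 6 = 9. The answer is 9."},
    {"role": "user", "content": "Q: John has 4 pencils, and Mary has 3 times as many pencils as John. How many pencils do they have together?"},
    {"role": "assistant", "content": "John has 4 pencils. Mary has 3 * 4 = 12 pencils. Together they have 4 + 12 = 16 pencils. The answer is 16."},
]

new_question = {
    "role": "user",
    "content": "Q: A juggler can juggle 16 balls. Half the balls are golf balls, and half of the golf balls are blue. How many blue golf balls are there?",
}

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=few_shot + [new_question],
)
print(response.choices[0].message.content)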

%%{ init: { 'theme': 'base', 'themeVariables': { 'primaryColor': '#EDE9FE', 'primaryTextColor': '#5B21B6', 'lineColor': '#A78BFA', 'textColor': '#1F2937', 'fontSize': '18px' }}}%%
graph TD
    A[User Prompt with Complex Question] --> B{CoT Instruction Added?};
    B -- Yes (Zero-Shot CoT) --> C["'Let's think step by step.'"];
    B -- No (Standard Prompt) --> D[Model Attempts Direct Answer];
    B -- Yes (Few-Shot CoT) --> E[Examples Provided with Reasoning Steps];
    C --> F[Model Generates Intermediate Reasoning Steps];
    E --> F;
    F --> G[Model Generates Final Answer Based on Steps];
    D --> H[Potentially Less Accurate Answer];
    G --> I((✅ Accurate Result));
    H --> J((❓ Potentially Inaccurate Result));

    style A fill:#FFFBEB,stroke:#FBBF24,stroke-width:2px
    style C fill:#F3E8FF,stroke:#C4B5FD,stroke-width:1px
    style E fill:#F3E8FF,stroke:#C4B5FD,stroke-width:1px
    style F fill:#DCFCE7,stroke:#86EFAC,stroke-width:1px
    style G fill:#DCFCE7,stroke:#86EFAC,stroke-width:1px
    style D fill:#FEF2F2,stroke:#FCA5A5,stroke-width:1px
    style H fill:#FEF2F2,stroke:#FCA5A5,stroke-width:1px
    style I fill:#10B981,stroke:#059669,color:#fff
    style J fill:#F87171,stroke:#DC2626,color:#fff

2. Self-Consistency

Concept: Self-Consistency builds on CoT. Instead of accepting the first reasoning path the model generates, you sample the same question multiple times (usually with a CoT prompt and a higher temperature setting so the reasoning paths differ), then select the most frequent final answer across the generated paths.

Why it works: Complex problems can sometimes be solved via multiple reasoning paths. If several different lines of reasoning converge on the same answer, that answer is more likely to be correct than an answer derived from a single, potentially flawed path. It leverages the idea that there’s usually one correct answer but many ways to get it wrong.

How to implement:

  1. Use a Few-Shot CoT prompt structure.
  2. Generate multiple responses (e.g., 5-10) to the same prompt, using a higher temperature setting (e.g., 0.7) to encourage diverse reasoning paths.
  3. Extract the final answer from each response.
  4. Choose the answer that appears most frequently (majority vote).

Example (Conceptual):

Prompt: (Using Few-Shot CoT for a complex logic puzzle)

Response 1: Reasoning path A -> Final Answer: X

Response 2: Reasoning path B -> Final Answer: Y

Response 3: Reasoning path C -> Final Answer: X

Response 4: Reasoning path D -> Final Answer: X

Response 5: Reasoning path E -> Final Answer: Z

Self-Consistency Result: The final answer X is chosen because it appeared most often (3 out of 5 times).
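
A minimal sketch of the sampling-and-voting loop, assuming the OpenAI Python SDK; the answer-extraction regex is a simplification that expects the "The answer is N" phrasing used in the few-shot examples above, and the model name is illustrative:

Python
import re
from collections import Counter

from openai import OpenAI

client = OpenAI()

def self_consistent_answer(cot_prompt: str, n: int = 5) -> str:
    # Sample several reasoning paths for the same prompt at a higher temperature.
    response = client.chat.completions.create(
        model="gpt-4o-mini",   # illustrative model name
        messages=[{"role": "user", "content": cot_prompt}],
        temperature=0.7,       # encourages diverse reasoning paths
        n=n,                   # number of samples to draw
    )

    # Extract the final answer from each reasoning path.
    answers = []
    for choice in response.choices:
        match = re.search(r"answer is\s+([\w.$%-]+)", choice.message.content, re.IGNORECASE)
        if match:
            answers.append(match.group(1).rstrip("."))

    # Majority vote over the extracted answers.
    return Counter(answers).most_common(1)[0][0] if answers else ""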

%%{ init: { 'theme': 'base', 'themeVariables': { 'primaryColor': '#EDE9FE', 'primaryTextColor': '#5B21B6', 'lineColor': '#A78BFA', 'textColor': '#1F2937', 'fontSize': '18px' }}}%%
graph TD
    A["User Prompt (often with CoT setup)"] --> B(Generate Multiple Responses <br><i>Higher Temperature</i>);
    B --> C{Response 1: <br> Reasoning A -> Answer X};
    B --> D{Response 2: <br> Reasoning B -> Answer Y};
    B --> E{Response ...};
    B --> F{Response N: <br> Reasoning Z -> Answer X};
    C --> G[Extract Final Answers];
    D --> G;
    E --> G;
    F --> G;
    G --> H{"Aggregate & Count Answers <br><i>(X: 2, Y: 1, ...)</i>"};
    H --> I["Select Most Frequent Answer <br><i>(Majority Vote)</i>"];
    I --> J((🏆 Final Chosen Answer));

    style A fill:#FFFBEB,stroke:#FBBF24,stroke-width:2px
    style B fill:#E0F2FE,stroke:#7DD3FC,stroke-width:1px
    style C fill:#F3E8FF,stroke:#C4B5FD,stroke-width:1px
    style D fill:#F3E8FF,stroke:#C4B5FD,stroke-width:1px
    style E fill:#F3E8FF,stroke:#C4B5FD,stroke-width:1px
    style F fill:#F3E8FF,stroke:#C4B5FD,stroke-width:1px
    style H fill:#FEFCE8,stroke:#FDE047,stroke-width:1px
    style I fill:#DCFCE7,stroke:#86EFAC,stroke-width:1px
    style J fill:#10B981,stroke:#059669,color:#fff

Note: This technique requires more computational resources as you generate multiple outputs for a single prompt, but it can significantly boost accuracy on challenging reasoning tasks.

3. Retrieval-Augmented Generation (RAG)

Concept: RAG addresses the limitation that LLMs only know information present in their training data (which can be outdated or lack specific domain knowledge). RAG connects the LLM to an external, up-to-date knowledge source (like a database, document collection, or API). When a prompt is received, the system first retrieves relevant information from the external source and then augments the original prompt with this retrieved context before sending it to the LLM to generate the final response.

Why it works: It grounds the LLM’s response in specific, current, or proprietary information, reducing hallucinations and allowing it to answer questions beyond its training data.

How it works (Simplified Flow):

  1. User Prompt: User asks a question (e.g., “What were our company’s Q1 sales figures?”).
  2. Retrieval: The system identifies keywords in the prompt (“Q1 sales figures”) and searches an external knowledge base (e.g., internal sales database/reports). It retrieves the most relevant documents or data snippets.
  3. Augmentation: The original prompt is combined with the retrieved information. (e.g., “Based on the following data: [Retrieved Q1 sales data snippet], what were our company’s Q1 sales figures?”).
  4. Generation: This augmented prompt is sent to the LLM, which generates an answer based on the provided context.

Prompt Engineering Role: While much of RAG involves system architecture (vector databases, retrieval algorithms), prompt engineering is crucial for:

  • Formulating the initial query to the retrieval system.
  • Structuring the augmented prompt sent to the LLM, clearly instructing it to use the provided context.
%%{ init: { 'theme': 'base', 'themeVariables': { 'primaryColor': '#EDE9FE', 'primaryTextColor': '#5B21B6', 'lineColor': '#A78BFA', 'textColor': '#1F2937', 'fontSize': '18px' }}}%%
graph TD
    A[User Prompt] --> B(Identify Keywords / Intent);
    B --> C["Query External Knowledge Source <br><i>(e.g., Database, Docs)</i>"];
    C -- Retrieved Relevant Context --> D(Augment Prompt <br><i>Original Prompt + Context</i>);
    D --> E[Send Augmented Prompt to LLM];
    E --> F[LLM Generates Response <br><i>Grounded in Provided Context</i>];
    F --> G((💬 Final Answer));

    subgraph Retrieval System
        C
    end
    subgraph LLM Interaction
        E
        F
    end

    style A fill:#FFFBEB,stroke:#FBBF24,stroke-width:2px
    style C fill:#DBEAFE,stroke:#93C5FD,stroke-width:1px
    style D fill:#FEFCE8,stroke:#FDE047,stroke-width:1px
    style E fill:#F3E8FF,stroke:#C4B5FD,stroke-width:1px
    style F fill:#DCFCE7,stroke:#86EFAC,stroke-width:1px
    style G fill:#10B981,stroke:#059669,color:#fff

Example Prompt (for the Generation step in RAG):

### Context from Knowledge Base
Document: Q1_Sales_Report.pdf
Content: "Total Q1 revenue reached $1.2M, driven primarily by strong performance in the North region ($0.5M) and the new product line Alpha ($0.3M)."

### User Question
What were our company's Q1 sales figures, and what drove them?

### Task
Based *only* on the provided context from the knowledge base, answer the user's question concisely.

Output from GPT-4o Mini:

Plaintext
The company's Q1 sales reached $1.2M, primarily driven by strong performance in the North region ($0.5M) and the new product line Alpha ($0.3M).
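
To make the flow concrete, here is a minimal sketch of the augmentation and generation steps. The retrieve function is a toy keyword match standing in for a real vector-database lookup, and the model name is illustrative:

Python
from openai import OpenAI

client = OpenAI()

# Toy in-memory "knowledge base"; a real system would query a vector store or search index.
DOCUMENTS = [
    {
        "source": "Q1_Sales_Report.pdf",
        "content": "Total Q1 revenue reached $1.2M, driven primarily by strong performance "
                   "in the North region ($0.5M) and the new product line Alpha ($0.3M).",
    },
]

def retrieve(query: str) -> list:
    # Naive keyword overlap; stands in for embedding-based similarity search.
    terms = set(query.lower().split())
    return [d for d in DOCUMENTS if terms & set(d["content"].lower().split())]

def answer_with_rag(question: str) -> str:
    context = "\n".join(
        f"Document: {d['source']}\nContent: \"{d['content']}\"" for d in retrieve(question)
    )
    augmented_prompt = (
        f"### Context from Knowledge Base\n{context}\n\n"
        f"### User Question\n{question}\n\n"
        "### Task\nBased *only* on the provided context from the knowledge base, "
        "answer the user's question concisely."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": augmented_prompt}],
    )
    return response.choices[0].message.content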

4. System Messages

Concept: Many chat-based models (like GPT-4 via API, Claude) allow the use of a special “system” message. This message sets the context, rules, persona, or instructions for the AI that should persist throughout the entire conversation, separate from the user’s turn-by-turn prompts.

Why it works: It provides a persistent set of instructions that the model is designed to adhere to more strongly than regular user prompts. This is useful for defining the AI’s overall behavior, personality, capabilities, or constraints without repeating them in every user message.

How to implement: Provided as a separate input field or message role in the API call or playground interface, distinct from the “user” and “assistant” roles.

Example (API Structure):

JSON
[
  {
    "role": "system",
    "content": "You are 'CodeBot', an expert Python programming assistant. You provide concise code examples and explain them clearly. You refuse to answer questions unrelated to programming. Always format code using Markdown code blocks."
  },
  {
    "role": "user",
    "content": "How do I write a function to calculate a factorial in Python?"
  },
  {
    "role": "assistant",
    "content": "```python\ndef factorial(n):\n  \"\"\"Calculates the factorial of a non-negative integer.\"\"\"\n  if n < 0:\n    return \"Factorial not defined for negative numbers\"\n  elif n == 0:\n    return 1\n  else:\n    res = 1\n    for i in range(1, n + 1):\n      res *= i\n    return res\n\n# Example usage:\n# print(factorial(5))  # Output: 120\n```\nThis function first checks for negative input. If the input is 0, it returns 1. Otherwise, it iteratively multiplies numbers from 1 up to n to compute the factorial."
  },
  {
    "role": "user",
    "content": "What's the weather like today?"
  },
   {
    "role": "assistant",
    "content": "I am CodeBot, designed to assist with programming questions. I cannot provide weather information."
  }
]

(The system message ensures the bot maintains its persona, focuses on programming, uses correct formatting, and handles off-topic requests appropriately throughout the conversation.)
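
In code, the system message is simply the first entry in the message list and is re-sent with every request, so its rules persist across turns. A minimal sketch with the OpenAI Python SDK (equivalent fields exist in other providers' chat APIs; the model name is illustrative):

Python
from openai import OpenAI

client = OpenAI()

SYSTEM_MESSAGE = {
    "role": "system",
    "content": (
        "You are 'CodeBot', an expert Python programming assistant. "
        "You provide concise code examples and explain them clearly. "
        "You refuse to answer questions unrelated to programming. "
        "Always format code using Markdown code blocks."
    ),
}

history = [SYSTEM_MESSAGE]  # the system message stays at the top of every request

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=history,
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

# chat("How do I write a function to calculate a factorial in Python?")
# chat("What's the weather like today?")  # should be declined per the system rules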

5. Function Calling / Tool Use

Concept: This advanced feature allows developers to define external functions or tools (like APIs for weather, calculators, databases, booking systems) that the LLM can choose to “call” during its response generation. The model doesn’t execute the function itself; instead, it generates a structured output (usually JSON) indicating which function to call and what arguments to use. The developer’s code then executes the actual function and sends the result back to the model, which uses that information to formulate its final answer to the user.

Why it works: It allows LLMs to interact with the real world, access real-time data, perform precise calculations, or take actions, overcoming limitations of their static training data.

%%{ init: { 'theme': 'base', 'themeVariables': { 'primaryColor': '#EDE9FE', 'primaryTextColor': '#5B21B6', 'lineColor': '#A78BFA', 'textColor': '#1F2937', 'fontSize': '18px' }}}%%
graph TD
    A["User Prompt <br><i>(e.g., <b>What's weather in London?</b>)</i>"] --> B{LLM Analyzes Prompt & Available Functions};
    B -- Function Needed --> C["Model Generates JSON <br><i>{func: 'get_weather', args: {'city': 'London'}}</i>"];
    B -- No Function Needed --> D[Model Generates Standard Text Response];
    C --> E[Application Code Parses JSON];
    E --> F["Execute External Function/API Call <br><i>(e.g., Call Weather API)</i>"];
    F -- Function Result <br><i>(e.g., <b>15°C, cloudy</b>)</i> --> G[Send Result Back to LLM];
    G --> H[LLM Uses Result to Formulate Final Answer];
    H --> I(("💬 Final Response <br><i><b>The weather in London is 15°C...</b></i>"));
    D --> I;

    subgraph Application / Backend
        E
        F
    end
    subgraph LLM Interaction
        B
        C
        G
        H
    end

    style A fill:#FFFBEB,stroke:#FBBF24,stroke-width:2px
    style C fill:#FEF2F2,stroke:#FCA5A5,stroke-width:1px,color:#B91C1C
    style F fill:#DBEAFE,stroke:#93C5FD,stroke-width:1px
    style G fill:#FEFCE8,stroke:#FDE047,stroke-width:1px
    style H fill:#DCFCE7,stroke:#86EFAC,stroke-width:1px
    style I fill:#10B981,stroke:#059669,color:#fff

How it works (Simplified Flow):

  1. User Prompt: User asks something requiring external data or action (e.g., “What’s the weather in London and book me a flight there for tomorrow?”).
  2. Model Decision: The LLM, based on the prompt and the list of available functions provided by the developer, decides one or more functions are needed. It generates JSON specifying the function calls (e.g., {"function": "get_weather", "arguments": {"city": "London"}} and {"function": "book_flight", "arguments": {"destination": "London", "date": "tomorrow"}}).
  3. Execution: The developer’s application code parses this JSON, calls the actual get_weather API and book_flight system, and gets the results (e.g., “Weather in London is 15°C and cloudy”, “Flight booked successfully, confirmation #123”).
  4. Response Generation: The results are sent back to the LLM.
  5. Final Answer: The LLM uses the function results to generate a natural language response to the user (e.g., “The weather in London is currently 15°C and cloudy. I have successfully booked your flight for tomorrow; your confirmation number is 123.”).

Prompt Engineering Role: Although function calling depends heavily on API integration, the initial prompt and the descriptions of the available functions provided to the model are crucial for enabling it to correctly identify when and how to use the tools.

Example (Function Description provided to the model):

JSON
{
  "name": "get_stock_price",
  "description": "Get the current stock price for a given ticker symbol.",
  "parameters": {
    "type": "object",
    "properties": {
      "ticker_symbol": {
        "type": "string",
        "description": "The stock ticker symbol (e.g., AAPL, GOOG)."
      }
    },
    "required": ["ticker_symbol"]
  }
}

(Clear descriptions help the model understand what the tool does and what information it needs.)
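
Putting the pieces together, here is a minimal sketch of one tool-call round trip with the OpenAI Python SDK. The local get_stock_price function is a placeholder that a real application would replace with an actual market-data API call, and the model name is illustrative:

Python
import json

from openai import OpenAI

client = OpenAI()

def get_stock_price(ticker_symbol: str) -> dict:
    # Placeholder; a real implementation would call a market-data API here.
    return {"ticker": ticker_symbol, "price": 123.45, "currency": "USD"}

tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",
        "description": "Get the current stock price for a given ticker symbol.",
        "parameters": {
            "type": "object",
            "properties": {
                "ticker_symbol": {
                    "type": "string",
                    "description": "The stock ticker symbol (e.g., AAPL, GOOG).",
                },
            },
            "required": ["ticker_symbol"],
        },
    },
}]

messages = [{"role": "user", "content": "What is AAPL trading at right now?"}]

# 1. The model decides whether a tool is needed and, if so, emits a structured call.
response = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
message = response.choices[0].message

if message.tool_calls:
    messages.append(message)  # keep the assistant's tool request in the history
    for call in message.tool_calls:
        args = json.loads(call.function.arguments)
        result = get_stock_price(**args)   # 2. The application executes the real function.
        messages.append({                  # 3. Feed the result back to the model.
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result),
        })
    # 4. The model turns the raw result into a natural-language answer.
    final = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
    print(final.choices[0].message.content)
else:
    print(message.content)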

Summary

| Technique | Concept | Key Implementation / Example Snippet |
| --- | --- | --- |
| Chain-of-Thought (CoT) | Encourage step-by-step reasoning before the final answer to improve accuracy on complex tasks. | Add phrases like "Let's think step by step." (Zero-Shot) or provide examples showing reasoning (Few-Shot). |
| Self-Consistency | Generate multiple reasoning paths (often using CoT) for the same prompt and choose the most frequent answer. | Generate multiple responses (higher temperature), extract final answers, perform majority vote. |
| Retrieval-Augmented Generation (RAG) | Ground LLM responses in external, up-to-date knowledge by retrieving relevant information first, then generating based on it. | User Prompt → Retrieve Info → Augment Prompt with Context → Generate Answer based on Context. |
| System Messages | Set persistent context, rules, or persona for the AI throughout a conversation using a dedicated message role. | Use the "system" role in API calls (e.g., {"role": "system", "content": "You are..."}). |
| Function Calling / Tool Use | Allow the LLM to request calls to external functions/APIs to access real-time data or perform actions. | Define available functions; model outputs JSON requesting a call; application executes; result fed back to model. |

Advanced prompting techniques like Chain-of-Thought, Self-Consistency, RAG, System Messages, and Function Calling allow prompt engineers to elicit more accurate, reliable, context-aware, and capable responses from AI models. CoT and Self-Consistency enhance reasoning, RAG grounds answers in external knowledge, System Messages provide persistent control, and Function Calling enables interaction with external tools and data. Mastering these techniques significantly expands the possibilities for building sophisticated AI applications.

Practical Exercises

  1. CoT Practice: Take a multi-step arithmetic or logic word problem. First, try solving it with a standard prompt. Then, rewrite the prompt using Zero-Shot CoT (“Let’s think step by step.”) and compare the results.
  2. RAG Simulation: Find a short news article online. Write a prompt that asks the AI a specific question answerable only from that article. Structure the prompt as if RAG was used: include a section ### Context from Article where you paste the article text, and then instruct the AI Based only on the provided context, answer the following question: [Your Question].
  3. System Message Design: Define a persona and specific rules for a chatbot (e.g., a Sarcastic Historian who only answers questions about the Roman Empire). Write a system message encapsulating this. Then, write a few user prompts (some on-topic, some off-topic) and predict how the AI should respond based on the system message.
  4. Function Call Idea: Think of a task where an LLM would need external information (e.g., checking product availability, finding nearby restaurants, converting currency). Describe the function the LLM would need to “call” – what would you name it, what information would it need (arguments), and what would it return?

In the next chapter of the course, we will focus on Domain-Specific Prompting, exploring how to tailor these general and advanced techniques for specific applications like code generation, creative writing, and image synthesis.
