
Prompt Engineering

Concepts

  • Temperature:
    • sampling parameter that controls how predictable the responses are
    • low temperature = more deterministic, near-identical answers across runs
    • high temperature = more varied, surprising answers
  • Top P:
    • also known as nucleus sampling
    • samples only from the smallest set of tokens whose cumulative probability reaches P
    • keeps responses focused and coherent by cutting off the unlikely tail
  • Example:

     Temperature  Top P   Completion
     Low          Low     The sky is blue and clear.
     High         Low     The sky is filled with dazzling colors and magic.
     Low          High    The sky is blue and clear, and the clouds drift gently.
     High         High    The sky is an infinite canvas of dreams where stars dance with unicorns.
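
As a rough illustration, here is how the two parameters map onto an API call. This is a minimal sketch assuming the OpenAI Python SDK (v1+) with an OPENAI_API_KEY in the environment; the model name and the four parameter pairs are arbitrary placeholders.

```python
# Minimal sketch: passing temperature and top_p to a chat completion API.
# Assumptions: openai SDK v1+, OPENAI_API_KEY set, placeholder model name.
from openai import OpenAI

client = OpenAI()

def complete(prompt: str, temperature: float, top_p: float) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",        # placeholder: any chat model
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,    # low = deterministic, high = creative
        top_p=top_p,                # nucleus sampling cutoff
    )
    return response.choices[0].message.content

# Roughly reproduce the four rows of the table above:
for temperature, top_p in [(0.2, 0.1), (1.5, 0.1), (0.2, 0.95), (1.5, 0.95)]:
    print(temperature, top_p,
          complete("Complete the sentence: The sky is", temperature, top_p))
```

Note that most providers recommend tuning one of the two parameters at a time rather than both at once.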

My Prompt Structure

  • Structure (bracketed sections are optional):
Question | Task:
Context:
Rules:
[Expected Result:]
[Data Formatting | Plan:]
  • Example:
Task:
Can you explain the fundamental concepts, practical applications, and key differences between quantum computing and classical computing in simple terms?

Context:
You are seeking an understanding of quantum computing. This involves the basic principles, its distinctions from classical computing, and how it is applied in the real world.

Rules:
Focus on a simplified and beginner-friendly explanation of quantum computing.
Highlight key concepts such as qubits, superposition, entanglement, and quantum gates.
Compare it with classical computing in a concise manner.
Include examples of real-world applications.
Ensure the explanation avoids heavy technical jargon while being informative.

Plan:
Introduction to what quantum computing is.
Explanation of core principles.
Differences from classical computing.
Real-world applications.
Wrap-up summary.

Techniques

Application Domains

  • The different prompt engineering techniques can be classified into application domains.
  • Here are the main domains, according to the task we want to accomplish:
    • New Tasks without Extensive Training
    • Reasoning and Logic
    • Reduce Hallucination
    • User Interaction
    • Fine-tuning and Optimization
    • Knowledge-based Reasoning and Generation
    • Improving Consistency and Coherence
    • Managing Emotion and Tone
    • Understanding User Intent
    • Metacognition and Self-Reflection

New Tasks without Extensive Training

  • Goal: Leverage the current capabilities and knowledge of the model as is to accomplish tasks.
  • Techniques:
    • Zero-shot prompting
    • Few-shot prompting

Zero-shot prompting

Description

Zero-shot prompting offers a paradigm shift in leveraging large language models (LLMs). This technique removes the need for extensive training data, instead relying on carefully crafted prompts that guide the model toward novel tasks. Specifically, the model receives a task description in the prompt but lacks labeled data for training on specific input-output mappings. The model then leverages its pre-existing knowledge to generate predictions based on the given prompt for the new task.

Relevance

Zero-Shot Prompting is ideal for tasks where the model is expected to perform a task without any prior examples or specific training on that task. It is especially useful in open-ended tasks like generating text, answering questions, or performing simple tasks such as translation or summarization. In this case, the model uses its general knowledge to complete the task based solely on the instructions provided in the prompt.

Example of problem

Imagine you want to translate the phrase "Hello, world!" into French without providing any example translations in the prompt.

Why it’s a good candidate for this problem

Zero-shot prompting leverages the pre-trained knowledge of large language models (LLMs) without requiring additional context or examples, making it efficient and versatile for broad applications.

Example

Prompt

Classify the text into neutral, negative or positive. 
Text: I think the vacation is okay.
Sentiment: ?

Output

Neutral
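
To make the mechanics concrete, here is a minimal sketch of the same zero-shot classification as an API call: the task description is the whole prompt, with no examples. It assumes the OpenAI Python SDK (v1+) with OPENAI_API_KEY set; the model name is a placeholder.

```python
# Zero-shot sketch: the task is described in the prompt, no labeled examples.
from openai import OpenAI

client = OpenAI()

prompt = (
    "Classify the text into neutral, negative or positive.\n"
    "Text: I think the vacation is okay.\n"
    "Sentiment:"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
    temperature=0,        # deterministic output suits classification
)
print(response.choices[0].message.content)  # expected: "Neutral"
```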

Few-Shot Prompting

Description

Few-shot prompting provides models with a few input-output examples to induce an understanding of a given task, unlike zero-shot prompting, where no examples are supplied. Providing even a few high-quality examples has improved model performance on complex tasks compared to no demonstration. However, few-shot prompting requires additional tokens to include the examples, which may become prohibitive for longer text inputs. Moreover, the selection and composition of prompt examples can significantly influence model behavior, and biases like favoring frequent words may still affect few-shot results. While few-shot prompting enhances capabilities for complex tasks, especially among large pre-trained models like GPT-3, careful prompt engineering is critical to achieve optimal performance and mitigate unintended model biases.

Relevance

Few-Shot Prompting is designed for tasks where the model needs to learn from a few examples to perform a specific task. It is especially useful for situations where providing labeled data for training is impractical. This technique is suited for tasks like text classification, translation, or even question-answering, where a few high-quality examples can demonstrate how to handle the problem.

Example of problem

Suppose you want a model to classify movie reviews as positive or negative. Providing labeled examples of reviews can guide the model.

Why it’s a good candidate for this problem

Few-shot prompting allows the model to observe patterns from the provided examples and generalize to new inputs, enabling better performance on tasks where labeled data is available in small quantities.

Example

Prompt

Translate these English words to French:
- Dog: Chien
- Cat: Chat
- Bird: Oiseau
- Apple: 

Output

Apple: Pomme
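
In code, few-shot prompting simply means embedding the input-output pairs in the prompt before the query. A minimal sketch under the same SDK assumptions as earlier (model name is a placeholder):

```python
# Few-shot sketch: a handful of demonstrations lets the model infer the task.
from openai import OpenAI

client = OpenAI()

examples = [("Dog", "Chien"), ("Cat", "Chat"), ("Bird", "Oiseau")]
query = "Apple"

prompt = "Translate these English words to French:\n"
prompt += "".join(f"- {en}: {fr}\n" for en, fr in examples)
prompt += f"- {query}:"

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)
print(response.choices[0].message.content)  # expected: "Pomme"
```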

Reasoning and Logic

  • Goal: Teach the model logic and examples in order to give it a path for step-by-step reasoning
  • Techniques:
    • Chain-of-Thought (CoT) Prompting
    • Automatic Chain-of-Thought (Auto-CoT)
    • Self-Consistency
    • Logical CoT (LogiCoT) Prompting
    • Chain-of-Symbol (CoS) Prompting
    • Tree-of-Thoughts (ToT) Prompting
    • Graph-of-Thought (GoT) Prompting
    • System 2 Attention Prompting
    • Thread of Thought (ThoT) Prompting
    • Chain of Table Prompting

Chain-of-Thought (CoT) Prompting

Description

LLMs often stumble in the face of complex reasoning, limiting their potential. Aiming to bridge this gap, Wei introduced Chain-of-Thought (CoT) prompting as a technique to prompt LLMs in a way that facilitates coherent and step-by-step reasoning processes. The primary contribution lies in the proposal and exploration of CoT prompting, demonstrating its effectiveness in eliciting more structured and thoughtful responses from LLMs compared to traditional prompts. Through a series of experiments, the authors showcase the distinctive qualities of CoT prompting, emphasizing its ability to guide LLMs through a logical reasoning chain. This results in responses that reflect a deeper understanding of the given prompts. For example, the prompt would show the reasoning process and final answer for a multi-step math word problem and mimic how humans break down problems into logical intermediate steps. The authors achieved state-of-the-art performance in math and commonsense reasoning benchmarks by utilizing CoT prompts for PaLM 540B, achieving an accuracy of 90.2%.

Relevance

Chain-of-Thought (CoT) is designed for problems that require a structured, step-by-step approach to reach a solution. It is particularly useful for complex reasoning tasks, such as multi-step math problems, logical reasoning challenges, and tasks requiring detailed explanations, where breaking down the problem into intermediate steps improves understanding and accuracy.

Example of problem

You want the model to calculate the total cost of items in a shopping cart while including taxes and discounts step by step.

Why it’s a good candidate for this problem

CoT prompting helps the model articulate intermediate steps, reducing errors and making its reasoning more transparent. This aligns well with tasks where step-by-step reasoning is essential to achieve the correct solution.

Example

Prompt

Jack has 3 baskets, and each basket contains 5 apples.
How many apples does Jack have in total?
- First, let's calculate how many apples are in one basket: 5 apples.
- Next, we multiply the number of baskets by the number of apples per basket: 3 * 5 = 15.
- Therefore, Jack has 15 apples in total.

Jones takes a flight at 4 PM to reach Bali, which is 20 flight hours away.
When will Jones land at Denpasar, Bali?

Output

- First, the flight departs at 4 PM.
- The flight duration is 20 hours.
- To calculate the landing time, we add 20 hours to the departure time.
- 4 PM + 12 hours brings us to 4 AM (next day).
- Adding the remaining 8 hours: 4 AM + 8 hours = 12 PM (noon, next day).
- Therefore, Jones will land at 12 PM (noon) the following day.
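
Programmatically, CoT amounts to prepending a worked demonstration (or an instruction to reason step by step) to the new question. A minimal sketch under the same SDK assumptions as the earlier examples:

```python
# CoT sketch: a worked example with explicit intermediate steps is prepended
# so the model imitates the step-by-step format for the new question.
from openai import OpenAI

client = OpenAI()

demonstration = (
    "Jack has 3 baskets, and each basket contains 5 apples.\n"
    "How many apples does Jack have in total?\n"
    "- First, one basket contains 5 apples.\n"
    "- Next, multiply baskets by apples per basket: 3 * 5 = 15.\n"
    "- Therefore, Jack has 15 apples in total.\n\n"
)
question = (
    "Jones takes a flight at 4 PM to reach Bali, which is 20 flight hours away.\n"
    "When will Jones land at Denpasar, Bali?\n"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": demonstration + question}],
)
print(response.choices[0].message.content)
```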

Automatic Chain-of-Thought (Auto-CoT)

Description

Manual creation of high-quality CoT examples is both time-consuming and suboptimal. Zhang introduced Auto-CoT to automatically instruct LLMs with a "Let’s think step-by-step" prompt to generate reasoning chains. Recognizing the possibility of errors in individually generated chains, Auto-CoT enhances robustness through diverse sampling. It samples various questions and generates multiple distinct reasoning chains for each, forming a final set of demonstrations. This automated diverse sampling minimizes errors and enhances few-shot learning, eliminating the need for labor-intensive manual creation of reasoning chains. Auto-CoT demonstrated enhanced performance, surpassing the CoT paradigm with average accuracy improvements of 1.33% and 1.5% on arithmetic and symbolic reasoning tasks, respectively, employing GPT-3.

Relevance

Automatic Chain-of-Thought (Auto-CoT) is suited for tasks requiring step-by-step reasoning but where manually crafting examples is impractical or time-consuming. It is ideal for problems where multiple reasoning paths need to be explored, such as complex arithmetic, logical problems, or decision-making tasks, where the model can benefit from generating reasoning steps automatically.

Example of problem

You need the model to evaluate a student's multi-part math test, ensuring it provides detailed reasoning for partial credit grading.

Why it’s a good candidate for this problem

Auto-CoT automates the generation of reasoning steps using zero-shot techniques like "Let's think step by step" prompts, which minimizes manual effort while leveraging the model's capabilities to create reasoning exemplars dynamically.

Example

Prompt

Let's think step by step. Jones takes a flight at 4 PM, and the flight duration is 20 hours. When will Jones land at Denpasar, Bali?

Output

Let's solve this step by step using only the flight duration, which is 20 hours, and the departure time of 4 PM.

Step 1: Start with the departure time.
Jones departs at 4 PM.

Step 2: Add the flight duration.
The flight duration is 20 hours. Adding 20 hours to 4 PM involves first counting the hours to reach midnight (12 AM), and then adding the remaining hours.

From 4 PM to midnight: 8 hours.
Now, we have 12 hours left to add (since 20 hours - 8 hours = 12 hours).
So, adding the remaining 12 hours to midnight:

12 AM (midnight) + 12 hours = 12 PM (noon) the next day.

Final Answer:
Jones will land at 12 PM (noon) the next day in the local time zone of the departure location, assuming no time zone changes. If there is a time zone difference, we would need to factor that in separately, but with this information alone, Jones lands at noon the next day.
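
A minimal sketch of the Auto-CoT idea: reasoning chains for a pool of sampled questions are generated automatically with the zero-shot trigger, then reused as few-shot demonstrations for the target question. The real pipeline also clusters questions to diversify the demonstrations; that step is omitted here, and all prompts and questions are illustrative.

```python
# Auto-CoT sketch: demonstrations are generated, not hand-written.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Step 1: auto-generate reasoning chains for sampled questions.
sampled_questions = [
    "A train leaves at 9 AM and travels for 5 hours. When does it arrive?",
    "Lisa buys 4 packs of 6 eggs. How many eggs does she have?",
]
demos = []
for q in sampled_questions:
    chain = ask(f"{q}\nLet's think step by step.")
    demos.append(f"Q: {q}\nA: {chain}")

# Step 2: use the generated chains as demonstrations for the target question.
target = ("Jones takes a flight at 4 PM, and the flight duration is 20 hours. "
          "When will Jones land?")
print(ask("\n\n".join(demos) + f"\n\nQ: {target}\nA: Let's think step by step."))
```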

Self-Consistency

Description

Wang introduced self-consistency, a decoding strategy enhancing reasoning performance compared to greedy decoding in CoT prompting. For complex reasoning tasks with multiple valid paths, self-consistency generates diverse reasoning chains by sampling from the language model’s decoder. It then identifies the most consistent final answer by marginalizing these sampled chains. This approach capitalizes on the observation that problems requiring thoughtful analysis often entail greater reasoning diversity, leading to a solution. The combination of self-consistency and chain-of-thought prompting results in significant accuracy improvements across various benchmarks, such as 17.9% on GSM8K, 11.0% on SVAMP, 12.2% on AQuA, 6.4% on StrategyQA, and 3.9% on ARC-challenge compared to the baseline chain-of-thought prompting.

Relevance

Self-Consistency is best suited for tasks that involve complex reasoning with multiple possible valid paths to a solution. This technique is effective in scenarios where diverse reasoning chains can lead to different yet potentially correct answers, such as solving ambiguous questions, mathematical problems, or logical puzzles that allow multiple approaches.

Example of problem

You want the model to determine the most probable explanation for a series of events based on incomplete information.

Why it’s a good candidate for this problem

Self-consistency involves sampling multiple reasoning chains for the same prompt and selecting the most consistent final answer. This reduces the likelihood of errors from a single flawed reasoning path, increasing reliability for tasks where diverse reasoning can occur.

Example

Prompt

I want you to solve the following math problem using step-by-step reasoning and then check your solution by generating multiple different reasoning paths. Finally, provide the most consistent answer across all attempts.

Here is the problem:
Sarah has 15 apples. She gives 3 apples to each of her 4 friends. How many apples does Sarah have left?

Output

Let's break down the problem step by step:

Initial Information: Sarah has 15 apples.

Action: Sarah gives 3 apples to each of her 4 friends.

Calculation of total apples given:
Sarah gives 3 apples to each of 4 friends, so the total apples given is:  3×4=12 apples.

Apples left with Sarah:
Sarah started with 15 apples and gave away 12, so the number of apples she has left is: 15−12=3 apples.

Multiple Reasoning Paths:

Path 1:
Sarah has 15 apples.
She gives 3 apples to 4 friends: 3×4=12 apples.
Apples left: 15−12=3.

Path 2:
Sarah starts with 15 apples.
She gives away 3+3+3+3=12 apples.
Apples left: 15−12=3.

Path 3:
Sarah has 15 apples.
After giving 3 apples to each friend, the total given away is 3×4=12.
Subtract from the total apples: 15−12=3.

Final Answer (Most Consistent):
Across all reasoning paths, the most consistent answer is that Sarah has 3 apples left.
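
Self-consistency is straightforward to implement outside the prompt itself: sample several chains at non-zero temperature and majority-vote the final answers. A minimal sketch under the same SDK assumptions; the "Answer: <number>" convention and the extraction regex are simplifying choices.

```python
# Self-consistency sketch: sample diverse chains, keep the majority answer.
import re
from collections import Counter
from openai import OpenAI

client = OpenAI()

question = (
    "Sarah has 15 apples. She gives 3 apples to each of her 4 friends. "
    "How many apples does Sarah have left? Think step by step, then end "
    "with a line of the form 'Answer: <number>'."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": question}],
    temperature=0.8,      # diversity across chains is the point here
    n=5,                  # sample 5 independent reasoning chains
)

answers = []
for choice in response.choices:
    match = re.search(r"Answer:\s*(-?\d+)", choice.message.content)
    if match:
        answers.append(match.group(1))

print(Counter(answers).most_common(1)[0][0])  # most consistent answer: "3"
```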

Logical CoT (LogiCoT) Prompting

Description

The ability to perform logical reasoning is critical for LLMs to solve complex, multi-step problems across diverse domains. Existing methods, like CoT prompting, encourage step-by-step reasoning but lack effective verification mechanisms. Zhao proposes Logical Chain-of-Thought (LogiCoT) prompting, a neurosymbolic framework that leverages principles from symbolic logic to enhance reasoning in a coherent and structured manner. Specifically, LogiCoT applies the concept of reductio ad absurdum to verify each step of reasoning generated by the model and provide targeted feedback to revise incorrect steps. LogiCoT can reduce logical errors and hallucinations through a think-verify-revise loop. In experiments with Vicuna-33b and GPT-4, the findings underscore LogiCoT’s notable enhancement of reasoning abilities, exhibiting improvements of 0.16% and 1.42% on the GSM8K dataset and 3.15% and 2.75% on the AQuA dataset compared to CoT, respectively.

Relevance

Logical Chain-of-Thought (LogiCoT) is designed for problems requiring rigorous logical reasoning and verification at each step. It excels in scenarios where each intermediate step needs validation for correctness before progressing, making it suitable for solving mathematical proofs, logic puzzles, and problems where errors in reasoning need to be minimized through step-by-step validation.

Example of problem

You need the model to evaluate whether a proposed experiment's setup logically supports its hypothesis.

Why it’s a good candidate for this problem

LogiCoT incorporates a “think-verify-revise” loop that applies principles of symbolic logic to validate each reasoning step. This minimizes logical errors and hallucinations, making it a strong candidate for tasks requiring high reasoning accuracy.

Example

Prompt

Passage: A major health insurance company in Lagolia
pays for special procedures prescribed by physicians only
if the procedure is first approved as medically necessary
by a company-appointed review panel. The rule is
intended to save the company the money it might
otherwise spend on medically unnecessary procedures.
The company has recently announced that in order to
reduce its costs, it will abandon this rule.

Question: Which of the following, if true, provides the
strongest justification for the company's decision?

A.The review process is expensive and practically always
results in approval of the prescribed procedure.
B. Patients often register dissatisfaction with physicians
who prescribe nothing for their ailments.
C. The number of members of the company-appointed
review panel who review a given procedure depends on
the cost of the procedure.
D. Physicians often prescribe special procedures that are
helpful but not altogether necessary for the health of the
patient.

Answer and reason:

Output

The correct answer is A: "The review process is expensive and practically always results in approval of the prescribed procedure."

Reason:
The company's initial rule was to have a review panel decide whether a procedure is medically necessary before agreeing to pay for it. This rule was put in place to avoid paying for unnecessary procedures and thus save costs. However, if the review process is both expensive and almost always approves the procedures anyway, the rule no longer serves its cost-saving purpose. It would then make sense for the company to eliminate this process to reduce the overall expenses, since the rule isn’t significantly reducing unnecessary spending but is still incurring high review costs. Therefore, this option provides the strongest justification for abandoning the rule.
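
A loose sketch of LogiCoT's think-verify-revise loop, approximating the reductio ad absurdum check with a second model call. The prompts, the toy problem, and the "VALID" convention are illustrative assumptions, not the paper's exact protocol.

```python
# LogiCoT-style sketch: draft reasoning, try to refute each step, revise.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content

problem = "If all birds can fly, and penguins are birds, can penguins fly?"

# Think: draft numbered reasoning steps.
steps = ask(f"{problem}\nList your reasoning as short numbered steps.")

# Verify: assume each step is false and look for a contradiction
# (reductio ad absurdum style).
verdict = ask(
    f"Problem: {problem}\nReasoning:\n{steps}\n"
    "For each step, assume it is false and check whether that leads to a "
    "contradiction with the premises. Reply 'VALID' if every step holds, "
    "otherwise name the faulty step."
)

# Revise: regenerate only if verification flagged a faulty step.
if "VALID" not in verdict:
    steps = ask(f"{problem}\nPrevious reasoning:\n{steps}\n"
                f"Critique:\n{verdict}\nRevise the faulty steps and give "
                "corrected reasoning.")
print(steps)
```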

Chain-of-Symbol (CoS) Prompting

Description

LLMs often struggle with tasks involving complex spatial relationships due to their reliance on natural language, which is susceptible to ambiguity and biases. To overcome this limitation, Hu introduced CoS, employing condensed symbols instead of natural language. CoS provides distinct advantages: clear and concise prompts, heightened spatial reasoning for LLMs, and improved human interpretability. CoS suffers from challenges such as scalability, generalizability, integration with other techniques, and interpretability of LLM reasoning based on symbols. Notably, the implementation of CoS significantly elevates ChatGPT’s performance, boosting accuracy from 31.8% to an impressive 92.6% on Brick World tasks. Moreover, CoS achieves up to a 65.8% reduction in prompt tokens, streamlining the process while maintaining high accuracy.

Relevance

The Chain-of-Symbols (CoS) technique is ideal for problems involving spatial relationships, abstract reasoning, or symbolic manipulation. Instead of relying solely on natural language, it uses symbolic representations to express relationships and solve tasks. This method reduces ambiguity and biases that can arise from language-based instructions, making it well-suited for tasks requiring clarity in structure, such as solving geometric problems or logical puzzles.

Example of problem

You need the model to analyze a flowchart and determine the outcome of a series of decisions based on given conditions.

Why it’s a good candidate for this problem

CoS uses symbols or structured representations instead of natural language to improve clarity and reduce ambiguity in reasoning tasks. This ensures precision and interpretability, particularly for spatially complex or highly structured problems.

Example

Prompt

You have three shapes: a square, a cross, and a triangle. Those shapes should be placed on a circle or at the center of the circle.

The square is placed to the left of the cross, and the triangle is above the square so that the square is at the center of the circle:
- The square is S
- The cross is C
- The triangle is T

Here is the spatial representation:
T
|
S-C

Question: With S remaining at the center of the circle, rotate the arrangement above counterclockwise.

Output

Let's break down the problem and rotate the shape arrangement counterclockwise.

Initially, the arrangement is:
T
|
S - C
In this arrangement:

The square (S) is in the center of the circle.
The triangle (T) is above the square.
The cross (C) is to the right of the square.
Now, when rotating the entire setup counterclockwise by 90 degrees, we can expect the following:

The cross (C), which was initially to the right of the square, will move to the top of the square.
The triangle (T), which was initially above the square, will move to the left of the square.
The new arrangement after rotating counterclockwise by 90 degrees will be:

    C
    |
T - S
So, the triangle (T) is to the left, the square (S) is still at the center, and the cross (C) is above the square.
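
In code, CoS mostly means compressing the scene into condensed symbols before prompting. A minimal sketch under the same SDK assumptions; the symbol grammar below is an illustrative choice, not the paper's exact encoding.

```python
# CoS sketch: spatial relations encoded as symbols, not prose, which
# shortens the prompt and removes linguistic ambiguity.
from openai import OpenAI

client = OpenAI()

# Natural-language scene compressed into a symbolic grid:
# T above S, C to the right of S.
symbolic_scene = "T\n|\nS-C"

prompt = (
    "Shapes: S=square, C=cross, T=triangle. Layout:\n"
    f"{symbolic_scene}\n"
    "S stays fixed at the center. Rotate the layout 90 degrees "
    "counterclockwise and draw the new layout in the same symbol format."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```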

Tree-of-Thoughts (ToT) Prompting

Description

Yao and Long proposed the Tree-of-Thoughts (ToT) framework to enhance prompting capabilities for complex tasks requiring exploration and look-ahead reasoning. ToT extends CoT prompting by managing a tree structure of intermediate reasoning steps, known as "thoughts". Each thought represents a coherent language sequence moving toward the final solution. This structure allows language models to deliberately reason by assessing the progress generated by thoughts in solving the problem. ToT integrates the model’s abilities to produce and evaluate thoughts with search algorithms like breadth-first or depth-first search. This enables systematic exploration among reasoning chains, with a look-ahead to expand promising directions and to backtrack when solutions are incorrect. ToT excelled in the Game of 24 tasks, achieving a 74% success rate compared to CoT’s 4%. Additionally, in word-level tasks, ToT outperformed CoT with a 60% success rate versus 16%.

Relevance

The Tree-of-Thoughts (ToT) technique is best suited for problems requiring complex reasoning, exploration of multiple possibilities, and iterative planning. It allows the model to explore a variety of paths, much like a tree branching into different directions, before choosing the optimal one. This is especially useful in tasks like game-solving, logical reasoning, or tasks requiring multiple solution paths before arriving at a final conclusion.

Summary of differences with Self-Consistency:

Structure: Self-Consistency is an aggregation of multiple linear reasonings, while Tree of Thoughts uses a tree structure to explore different branches of reasoning.

Response selection: Self-Consistency is based on the frequency of coherent responses, while ToT prunes less promising branches and deepens the more promising ones.

Use cases: Self-Consistency is suitable for tasks where a coherent response naturally emerges from multiple attempts, whereas ToT is more suited for complex planning tasks requiring flexibility in exploring multiple possible steps.

Example of problem

You want the model to determine the optimal series of moves in a puzzle game where players must arrange tiles to reach a target configuration.

Why it’s a good candidate for this problem

ToT creates a tree structure of possible reasoning paths, evaluating progress at each step and allowing backtracking when needed. This iterative exploration ensures comprehensive evaluation and prioritization of promising paths.

Example

Prompt

You are a brave knight who must reach the castle by facing various challenges.
At each stage of your journey, you must choose one of three possible paths:

Path 1 contains a treasure, but it may be trapped.
Path 2 has a dragon that you can fight or avoid.
Path 3 seems safe but extends your journey.

Use a step-by-step approach to decide which path to take at each point. Develop several ideas for each stage, evaluate the options, and choose the one that gets you closest to the castle most effectively while minimizing risks. Keep exploring the different branches until you find the best solution.

Output

Ah, the life of a brave knight! Each choice along the journey to the castle demands careful consideration. Let’s approach this step by step, thinking through each path at various points in our quest. We will weigh the pros and cons of every option, and at each stage, decide on the most strategic course to take.

Stage 1: Initial Fork in the Road

Path 1: Treasure, possibly trapped
Pros: Treasure could provide helpful tools or resources, such as weapons, armor, or healing potions.
Cons: A trapped treasure could result in injury or worse, which would severely hinder the rest of the journey.
Assessment: A gamble. We would need to decide whether to risk an early setback for the chance of future rewards.

Path 2: Dragon
Pros: Fighting the dragon and winning could earn us honor, treasure, or even a powerful ally. Avoiding the dragon may save resources, but there’s still a risk of being noticed.
Cons: Facing a dragon too soon may be dangerous without better equipment. Avoidance is a safer route but might involve sneaking, which could be risky if detected.
Assessment: Potential for great reward but high risk. We need to evaluate our strength and the situation.

Path 3: Safe but longer
Pros: This is the safest option and allows us to preserve our health and resources for later challenges.
Cons: Time is extended, and while it reduces immediate danger, the longer journey could introduce other, unforeseen challenges.
Assessment: Safe but may delay reaching the castle, potentially exposing us to exhaustion or more encounters.

Decision for Stage 1:
I believe Path 3 (the safe but longer route) would be the wisest choice at this early stage. It allows us to stay strong and conserve our energy for future challenges. We can afford to be cautious while we assess the dangers ahead.


Stage 2: A New Fork after the Safe Route

Path 1: Treasure again, possibly trapped
Pros: By now, we might have picked up clues or tools to help us identify or disarm traps. The treasure could provide critical resources.
Cons: The threat of traps remains, and encountering a trap could hinder progress.
Assessment: We are better prepared for this now but should still be cautious.

Path 2: Another dragon encounter
Pros: We have traveled further, so we may be stronger or have gained knowledge. Defeating or evading a dragon could bring benefits or open a shorter path to the castle.
Cons: Still a high-risk situation. If we’re not prepared to fight or evade, it could end the journey early.
Assessment: Likely risky, but this time we are better equipped to consider it if necessary.

Path 3: Another safe but long road
Pros: It keeps us safe, allowing us to conserve strength for the final push to the castle.
Cons: The journey becomes even longer, and fatigue could start setting in.
Assessment: A good choice to maintain energy, but we should be mindful of diminishing returns.

Decision for Stage 2:
At this point, I would choose Path 1, as we are better prepared to handle traps. The treasure could provide a strategic advantage for the final stages, potentially giving us crucial items. If we fail to disarm a trap, it may be less disastrous now compared to facing a dragon unprepared.


Stage 3: Approaching the Castle

Path 1: Treasure near the castle, again with potential traps
Pros: The treasure here could be especially valuable, containing powerful items for the final confrontation or healing before reaching the castle.
Cons: With the castle in sight, any injury from a trap would be devastating.
Assessment: A big risk this close to the goal. Caution is crucial.

Path 2: A final dragon before the gates
Pros: Defeating the dragon at this stage could grant us safe passage into the castle or even result in the dragon helping us.
Cons: A final dragon fight is risky, especially if we’ve been weakened. Avoiding it may be even harder with the castle so close.
Assessment: Risky, but if we've gathered resources (from treasure earlier), we may be in a better position to fight.

Path 3: The safest but slowest route
Pros: Allows us to rest and approach the castle without immediate threat.
Cons: Time is now becoming a factor. Delaying too long could lead to other unforeseen dangers, or we might miss a crucial event at the castle.
Assessment: Safe, but now time might be more valuable than safety.

Decision for Stage 3:
Given the proximity to the castle and the resources we’ve gathered, I would choose Path 2 and confront the dragon. With the tools and experience from earlier stages, we have a chance to defeat or avoid the dragon and reach the castle in a more timely manner. A direct confrontation might be the most efficient way forward.


Final Outcome
By choosing a safe but steady approach early on, followed by taking calculated risks when we are better equipped, we ensure that we reach the castle in good condition while minimizing potential setbacks.
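
A minimal sketch of ToT as a beam search over thoughts: the model proposes candidate next steps, a scoring call evaluates them, and only the best branches survive each stage. The proposal/scoring prompts, the beam width, and the 1-10 scale are illustrative assumptions, not the paper's exact setup.

```python
# ToT sketch: breadth-first search with pruning over partial plans.
import re
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

task = ("Reach the castle in 3 stages; at each stage choose between a trapped "
        "treasure, a dragon, or a safe but longer road. Minimize risk.")
beam = [""]        # partial plans kept so far
BEAM_WIDTH = 2

for stage in range(3):
    candidates = []
    for plan in beam:
        # Propose: branch each surviving plan into alternative next steps.
        proposals = ask(
            f"Task: {task}\nPlan so far: {plan or '(start)'}\n"
            "Propose 3 alternative next steps, one per line."
        ).splitlines()
        candidates += [f"{plan} -> {p.strip()}" for p in proposals if p.strip()]
    # Evaluate and prune: keep only the most promising branches.
    scored = []
    for candidate in candidates:
        reply = ask(f"Task: {task}\nPlan: {candidate}\nRate this plan from 1 "
                    "(bad) to 10 (good). Reply with the number only.")
        match = re.search(r"\d+", reply)
        scored.append((int(match.group()) if match else 0, candidate))
    beam = [c for _, c in sorted(scored, reverse=True)[:BEAM_WIDTH]]

print(beam[0])  # best complete plan found
```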

Graph-of-Thought (GoT) Prompting

Description

The inherent non-linear nature of human thought processes challenges the conventional sequential approach of CoT prompting. Yao introduced the "Graph of Thoughts" prompting, a graph-based framework advancing traditional sequential methods to better align with the non-linear characteristics of human thinking. This framework permits dynamic interplay, backtracking, and evaluation of ideas, allowing the aggregation and combination of thoughts from various branches, departing from the linear structure of the tree of thoughts. The key contributions encompass modeling the reasoning process as a directed graph, offering a modular architecture with diverse transformation operations. The framework is presented as a versatile and dynamic approach to language model prompting, capturing the intricacies of human thought processes and enhancing model capabilities. The GoT reasoning model demonstrates substantial gains over the CoT baseline, improving accuracy by 3.41% with T5-base and 5.08% with T5-large on GSM8K. It also boosts accuracy over the state-of-the-art Multimodal-CoT by 6.63% using T5-base and 1.09% with T5-large on ScienceQA.

Relevance

The Graph-of-Thought (GoT) technique is particularly useful for solving problems that require managing complex chains of reasoning with dependencies and interactions between multiple ideas or thought paths. It differs from traditional sequential approaches (such as "Chain-of-Thought") by exploring possible solutions more in parallel, like a graph, where the nodes represent ideas or steps, and the edges represent the relationships or transitions between them.

Summary of differences with Tree of Thoughts:

Structure and flexibility: ToT follows a rigid tree structure, while GoT allows for cross-connections in a more flexible graph structure.

Feedback and combination: GoT allows revisiting previous steps and combining multiple paths (enabling nodes or thoughts to connect not only sequentially but also transversally), unlike ToT, which follows a linear progression logic with pruning (each sub-step stems only from the preceding thought without reusing alternative branches).

Use cases: ToT is suitable for sequential or linear problems, while GoT is more powerful for tasks requiring interactions and combinations of multiple reasoning branches.

Example of problem

You need the model to evaluate how multiple factors (e.g., weather, location, and time of day) affect crop growth and suggest an optimal planting strategy.

Why it’s a good candidate for this problem

GoT organizes reasoning as a directed graph, allowing for branching, revisiting, and merging of ideas. This dynamic structure captures the complex interplay between factors and synthesizes insights from various reasoning paths.

Example

Prompt

Context: You are planning a trip to Europe, visiting 5 different cities (Paris, Berlin, Rome, Amsterdam, and Vienna). However, there are several constraints:

You want to minimize the total cost of the trip (plane tickets, hotels, etc.). You want to avoid long journeys between two consecutive cities (less than 3 hours of flight). You want to spend at least 2 days in each city. Some cities have specific events on certain dates (a festival in Berlin, an exhibition in Paris).

Instructions:

Break the problem down into distinct steps:

Start by identifying the costs and durations of possible trips between the cities. Explore several combinations of cities to visit in different orders, comparing costs and flight times. 

Formulate hypotheses for each step:

Node 1: If you start in Paris, what are the flight options to Berlin or Rome, and what are the associated costs?
Node 2: If you choose Berlin before Vienna, what impact does that have on transportation costs and flight times?
Node 3: If you stay 2 days in Amsterdam, can you still attend the festival in Berlin?

Explore the interactions between these hypotheses:

If you find an itinerary that minimizes costs but exceeds 3 hours of flight, explore another combination by adjusting the order of cities. If visiting Paris first allows you to see the exhibition, how does that influence the order of the following cities?

Test alternative routes:

Simultaneously explore itineraries by prioritizing cost, then another by respecting flight time constraints, or slightly adjusting dates for the events.

Adjustment and backtracking:

If a hypothesis leads to a conflict (e.g., budget overrun or date problem), go back and test another itinerary combination, while keeping information from previously explored routes.

Output

To approach this trip planning task, we'll break it down into distinct steps and test different hypotheses, ensuring we meet the outlined constraints (minimizing cost, staying within flight time limits, and ensuring event attendance).

Step 1: Identify Costs and Durations of Possible Trips Between Cities
Research Flights and Transport Costs
Start by researching average flight costs and durations between the five cities: Paris, Berlin, Rome, Amsterdam, and Vienna. We'll need to estimate:

Cost of flights: Use airline websites or aggregate platforms to gather the cheapest flights between these cities.
Flight duration: Ensure the flights between two consecutive cities are under 3 hours.
Estimated Costs and Durations:
From    To  Flight Time (hrs)   Avg Cost (USD)
Paris   Berlin  1h 50m  $90
Paris   Rome    2h 10m  $120
Paris   Amsterdam   1h 20m  $80
Paris   Vienna  2h 15m  $110
Berlin  Rome    2h 20m  $130
Berlin  Vienna  1h 10m  $70
Berlin  Amsterdam   1h 30m  $85
Rome    Vienna  1h 25m  $95
Rome    Amsterdam   2h 25m  $150
Vienna  Amsterdam   1h 40m  $100

Step 2: Event and Duration Constraints
Berlin Festival: Assume the festival in Berlin is on October 15.
Paris Exhibition: Assume the exhibition in Paris is on October 10.
You want to spend at least 2 days in each city.
Key Date Considerations:
You should be in Paris around October 10 for the exhibition.
You should be in Berlin around October 15 for the festival.

Step 3: Hypotheses for Itinerary Combinations

Hypothesis 1: Paris → Berlin → Vienna → Rome → Amsterdam
Pros: Respects the key dates (Paris on October 10 and Berlin on October 15), avoids long flights.
Cons: Potentially higher cost due to routing Vienna before Rome.
Paris to Berlin: Arrive in Paris on October 8, attend the exhibition on October 10. Fly from Paris to Berlin on October 13 ($90, 1h 50m flight).
Berlin to Vienna: After attending the festival in Berlin on October 15, fly to Vienna on October 17 ($70, 1h 10m flight).
Vienna to Rome: Spend 2 days in Vienna, then fly to Rome on October 20 ($95, 1h 25m flight).
Rome to Amsterdam: Stay 2 days in Rome, then fly to Amsterdam on October 23 ($150, 2h 25m flight).
Total Cost: $505
Total Flight Time: 7 hours 30 minutes

Hypothesis 2: Paris → Amsterdam → Berlin → Vienna → Rome
Pros: Cheaper overall costs by flying shorter routes.
Cons: Tighter timing for attending Berlin festival.
Paris to Amsterdam: Arrive in Paris on October 8, attend the exhibition on October 10. Fly from Paris to Amsterdam on October 11 ($80, 1h 20m flight).
Amsterdam to Berlin: Fly from Amsterdam to Berlin on October 14 ($85, 1h 30m flight).
Berlin to Vienna: After attending the festival on October 15, fly to Vienna on October 18 ($70, 1h 10m flight).
Vienna to Rome: Spend 2 days in Vienna, then fly to Rome on October 21 ($95, 1h 25m flight).
Total Cost: $410
Total Flight Time: 6 hours 25 minutes

Hypothesis 3: Paris → Rome → Vienna → Berlin → Amsterdam
Pros: Visits southern cities first, spreads out the itinerary well.
Cons: More expensive routing through Rome, slightly longer flights.
Paris to Rome: Arrive in Paris on October 8, attend the exhibition on October 10. Fly from Paris to Rome on October 12 ($120, 2h 10m flight).
Rome to Vienna: Fly from Rome to Vienna on October 15 ($95, 1h 25m flight).
Vienna to Berlin: Fly from Vienna to Berlin on October 18 ($70, 1h 10m flight).
Berlin to Amsterdam: Spend 2 days in Berlin, then fly to Amsterdam on October 21 ($85, 1h 30m flight).
Total Cost: $470
Total Flight Time: 6 hours 15 minutes

Step 4: Compare Itineraries
Cost Comparison:
Hypothesis 1: $505
Hypothesis 2: $410 (cheapest)
Hypothesis 3: $470
Flight Duration Comparison:
Hypothesis 1: 7 hours 30 minutes
Hypothesis 2: 6 hours 25 minutes (shortest)
Hypothesis 3: 6 hours 15 minutes

Step 5: Adjustments and Backtracking
If your budget is tight, Hypothesis 2 is the best fit, as it’s the cheapest and respects all constraints (Paris exhibition, Berlin festival, and 2 days in each city).
If you want to optimize for flight time, Hypothesis 3 is a close second with only slightly higher costs but less flight time than Hypothesis 1.

Final Itinerary Recommendation
Based on the constraints and comparisons, Hypothesis 2 (Paris → Amsterdam → Berlin → Vienna → Rome) offers the most efficient balance between cost and time, while ensuring you can attend the key events in both Paris and Berlin.
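
Reduced to its distinctive operation, GoT lets separate reasoning branches be merged (aggregated) into a new node, something a strict tree cannot express. A minimal sketch under the same SDK assumptions, with illustrative branch and merge prompts:

```python
# GoT sketch: two branches explore the problem independently, then an
# aggregation edge merges them into a combined thought.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

task = ("Plan a 5-city Europe trip (Paris, Berlin, Rome, Amsterdam, Vienna): "
        "minimize cost, keep every flight under 3 hours, 2 days per city.")

# Two branches explore the problem from different angles (graph nodes).
branch_cost = ask(f"{task}\nFocus only on minimizing total cost. "
                  "Propose an itinerary.")
branch_time = ask(f"{task}\nFocus only on keeping every flight under 3 hours. "
                  "Propose an itinerary.")

# Aggregation edge: merge the two branches into a new combined node.
merged = ask(
    f"Task: {task}\nItinerary A (cost-optimized):\n{branch_cost}\n"
    f"Itinerary B (flight-time-optimized):\n{branch_time}\n"
    "Combine the strengths of both into a single itinerary and justify it."
)
print(merged)
```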

System 2 Attention Prompting

Description

The soft attention mechanism in Transformer-based LLMs is prone to incorporating irrelevant context information, impacting token generation adversely. To address this, Weston and Sukhbaatar proposed System 2 Attention (S2A), utilizing the reasoning abilities of LLMs to selectively attend to relevant portions by regenerating the input context. S2A employs a two-step process to enhance attention and response quality by employing context regeneration and response generation with refined context. The effectiveness of S2A is evaluated across various tasks, including factual QA, long-form generation, and math word problems. In factual QA, S2A attains an accuracy of 80.3%, demonstrating a substantial enhancement in factuality. In long-form generation, it improves objectivity and receives a score of 3.82 out of 5.

Relevance

S2A prompting is particularly helpful in information-dense tasks where attention needs to be strictly managed to yield clear, factual outputs without extraneous context. S2A is designed for scenarios where a language model may incorporate too much irrelevant information from its input, which could lead to errors or inconsistencies in response quality. It is best suited for complex tasks that require precise reasoning and factual correctness, such as factual question answering, long-form generation, or detailed multi-step problems (e.g., math word problems).

Example of problem

You want the model to extract key points from a legal document and provide a concise summary while ignoring extraneous details.

Why it’s a good candidate for this problem

S2A employs a two-step process: first refining the input context to focus on relevant parts and then generating a response based on the cleaned context. This selective attention mechanism ensures higher accuracy and relevance in outputs.

Example

Prompt

Marine ecosystems are very diverse, including corals, seaweed, fish, and many other species. Coral reefs play a crucial role in providing habitat for numerous fish. They supply essential nutrients and shelter endangered species. Furthermore, although seaweed is important, it does not have the same ecological impact as coral reefs.

Focus in the above text solely on information about coral reefs. Filter out irrelevant details and summarize only the elements related to coral reefs.

Output

Coral reefs are crucial in marine ecosystems, providing habitat, essential nutrients, and shelter for numerous fish, including endangered species. They play a vital role in supporting marine biodiversity.
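
The two-step S2A process translates directly into two model calls: one to regenerate the context, one to answer from the cleaned context. A minimal sketch under the same SDK assumptions:

```python
# S2A sketch: (1) regenerate the context keeping only relevant material,
# (2) answer using the cleaned context alone.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content

context = (
    "Marine ecosystems are very diverse, including corals, seaweed, fish, "
    "and many other species. Coral reefs play a crucial role in providing "
    "habitat for numerous fish. They supply essential nutrients and shelter "
    "endangered species. Although seaweed is important, it does not have "
    "the same ecological impact as coral reefs."
)
question = "Summarize the role of coral reefs."

# Step 1: regenerate the context, dropping everything irrelevant.
cleaned = ask(
    f"Context: {context}\nQuestion: {question}\n"
    "Rewrite the context keeping only the sentences needed to answer "
    "the question."
)
# Step 2: answer from the cleaned context only.
print(ask(f"Context: {cleaned}\nQuestion: {question}"))
```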

Thread of Thought (ThoT) Prompting

Description

Zhou presented Thread of Thought (ThoT), a prompting technique designed to enhance the reasoning abilities of LLMs within chaotic contexts. ThoT, inspired by human cognition, systematically breaks extensive contexts into manageable segments for incremental analysis, employing a two-phase approach where the LLM first summarizes and examines each segment before refining the information for a final response. ThoT’s flexibility shines as a versatile "plug-and-play" module, enhancing reasoning across different models and prompting methods. Evaluations on question answering and conversation datasets reveal substantial performance improvements of 47.20% and 17.8%, respectively, especially in chaotic contexts.

Relevance

Thread of Thought (ThoT) prompting is designed for tasks where the problem requires analyzing large or complex pieces of information that might cause cognitive overload. This technique is suited for extensive question-answering, summarizing lengthy documents, and situations where incremental comprehension is needed.

Example of problem

You want the model to analyze a lengthy debate transcript and summarize the key points made by each participant.

Why it’s a good candidate for this problem

ThoT systematically processes chaotic or lengthy contexts by dividing them into segments for incremental summarization or analysis. This ensures coherence and completeness while handling complex or sprawling inputs.

Example

Prompt

[prerequisite: attach or point to a financial report]
Based on the attached financial annual report:
The annual financial report is complex and contains several key sections. Approach each section independently and thoroughly to avoid information overload.

Start with the executive summary: read this section in full and create a summary capturing the general points. Focus only on the main elements without analyzing other sections yet.

Next, move on to the financial results: focus solely on financial indicators (revenue, profits, margins, etc.). Summarize this section in detail, then note your conclusions without linking to previous sections.

Then, address the financial risks section: identify the main risks and summarize the actions mentioned to mitigate them, without yet integrating information from other parts of the report.

Finally, review the strategic outlook: summarize this section to understand the company’s future goals.

Once each section is analyzed independently, compile all conclusions into a general synthesis on the company’s performance, risks, and outlook. This step-by-step analysis prevents loss of crucial information in a complex context.

Output

Executive Summary
Capgemini’s 2023 annual report emphasizes its ongoing transformation into a premier business and technology transformation partner. Key achievements include solid revenue growth, extensive advancements in sustainable practices, and a strategic focus on generative AI and digitalization. Capgemini's role in sustainable digital transformation is marked by its partnerships with major global entities and participation in initiatives like the World Economic Forum, showcasing its commitment to intelligent industry solutions and sustainability. The company has achieved significant milestones in gender equality and climate initiatives, underscoring its emphasis on responsible growth.

Financial Results
In 2023, Capgemini achieved a revenue of €22.5 billion, reflecting a 2.4% increase from the previous year and a 4.4% growth at constant exchange rates, aligning with their target range of 4%-7%. Key financial highlights include:

Operating Margin: The operating margin increased to 13.3%, representing a 30 basis point improvement, surpassing their target of 0-20 basis points for the year.
Earnings per Share: Normalized earnings per share reached €12.44, indicating solid shareholder returns and a commitment to maintaining positive financial momentum.
Cash Flow: Organic free cash flow totaled €1.963 billion, exceeding the goal of around €1.8 billion for 2023, demonstrating robust cash generation capacity.
These indicators reflect Capgemini’s ability to maintain growth and financial stability amid a challenging economic environment.
[...]
[...]
[...]
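
A minimal sketch of the ThoT pattern: segment the document, analyze each segment independently, then synthesize the partial conclusions. The file name and the blank-line segmentation are simplifying assumptions.

```python
# ThoT sketch: per-segment analysis followed by a final synthesis pass.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content

report = open("annual_report.txt").read()  # assumption: a local text dump
sections = [s for s in report.split("\n\n") if s.strip()]

partials = []
for i, section in enumerate(sections):
    # Analyze each segment independently to avoid information overload.
    partials.append(ask(
        f"Section {i + 1} of a financial report:\n{section}\n"
        "Summarize only this section; ignore everything else."
    ))

# Final pass: compile the per-section conclusions into one synthesis.
print(ask("Combine these section summaries into a general synthesis on "
          "performance, risks, and outlook:\n\n" + "\n\n".join(partials)))
```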

Chain of Table Prompting

Description

Approaches like CoT, PoT, and ToT represent reasoning steps through free-form text or code, which face challenges when dealing with intricate table scenarios. The study by Wang introduced a pioneering prompting technique named Chain-of-Table. This method uses step-by-step tabular reasoning by dynamically generating and executing common SQL/DataFrame operations on tables. The iterative nature of this process enhances intermediate results, empowering LLMs to make predictions through logically visualized reasoning chains. Significantly, Chain-of-Table consistently improves the performance of two benchmark tabular datasets by 8.69% on TabFact and 6.72% on WikiTQ, respectively.

Relevance

Chain of Table is ideal for tasks that require iterative and structured analysis of tabular data, such as fact-checking or identifying trends in organized datasets. The technique is especially effective when the model needs to perform calculations or direct manipulations within a clear and coherent format, like SQL operations or dataframes.

Example of problem

You want the model to analyze a table of sales data and calculate the average revenue for a specific region while identifying the top-performing product.

Why it’s a good candidate for this problem

Chain-of-Table leverages intermediate tabular reasoning, dynamically performing operations like calculations or filtering data step by step. This ensures clarity and correctness in handling structured datasets and complex queries.

Example

Prompt

According to the following table:
Product_ID  Category    Product_Name    Units_Sold  Sale_Date   Region  Price_per_Unit
101 Electronics Laptop  5   2024-10-01  North   800
102 Home    Vacuum Cleaner  8   2024-10-02  South   150
103 Electronics Smartphone  10  2024-10-03  East    500
104 Beauty  Shampoo 15  2024-10-04  West    10
105 Home    Coffee Maker    3   2024-10-05  North   75
106 Beauty  Perfume 6   2024-10-06  South   60
107 Home    Microwave   4   2024-10-07  East    120
108 Electronics Tablet  12  2024-10-08  West    300
109 Sports  Yoga Mat    20  2024-10-09  North   25
110 Sports  Dumbbells   7   2024-10-10  South   50

Let's analyze the sales data using a step-by-step approach: for each product category, count the total units sold and display the results in a clear table with columns [Category, Units Sold].

Output

Category-Wise Units Sold Analysis
Category    Units Sold
Beauty  21
Electronics 27
Home    15
Sports  27

I have displayed the table showing total units sold for each product category. If you need further analysis or insights based on this data, feel free to ask!
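
Because Chain-of-Table reasons through explicit table operations, the chain can be mirrored directly as DataFrame code. A self-contained pandas sketch of the example above (Product_Name and Sale_Date omitted for brevity; the second step extends the chain with a hypothetical revenue question):

```python
# Chain-of-Table sketch: the reasoning chain is a sequence of explicit
# DataFrame operations, so every intermediate table stays inspectable.
import pandas as pd

sales = pd.DataFrame({
    "Product_ID": [101, 102, 103, 104, 105, 106, 107, 108, 109, 110],
    "Category": ["Electronics", "Home", "Electronics", "Beauty", "Home",
                 "Beauty", "Home", "Electronics", "Sports", "Sports"],
    "Units_Sold": [5, 8, 10, 15, 3, 6, 4, 12, 20, 7],
    "Region": ["North", "South", "East", "West", "North",
               "South", "East", "West", "North", "South"],
    "Price_per_Unit": [800, 150, 500, 10, 75, 60, 120, 300, 25, 50],
})

# Step 1: group by category and sum the units sold.
step1 = sales.groupby("Category", as_index=False)["Units_Sold"].sum()
print(step1)  # matches the category-wise table in the output above

# Step 2 (extending the chain): compute revenue and find the top category.
sales["Revenue"] = sales["Units_Sold"] * sales["Price_per_Unit"]
step2 = sales.groupby("Category", as_index=False)["Revenue"].sum()
print(step2.loc[step2["Revenue"].idxmax()])
```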

Reduce Hallucination

  • Goal: Limit the model's creativity by giving it a bounded space of data to rely on.
  • Techniques:
    • Retrieval Augmented Generation (RAG)
    • ReAct Prompting
    • Chain-of-Verification (CoVe)
    • Chain-of-Note (CoN)
    • Chain-of-Knowledge (CoK)

Retrieval Augmented Generation (RAG)

Description

LLMs have revolutionized text generation, yet their reliance on limited, static training data hinders accurate responses, especially in tasks demanding external knowledge. Traditional prompting falls short, requiring expensive retraining. Retrieval Augmented Generation (RAG) by Lewis emerges as a novel solution, seamlessly weaving information retrieval into the prompting process. RAG analyzes user input, crafts a targeted query, and scours a pre-built knowledge base for relevant resources. Retrieved snippets are incorporated into the original prompt, enriching it with contextual background. The augmented prompt empowers the LLM to generate creative, factually accurate responses. RAG’s agility overcomes static limitations, making it a game-changer for tasks requiring up-to-date knowledge. RAG outperformed seq2seq models and task-specific architectures on ODQA benchmarks, achieving exact match scores, reaching up to 56.8% on TriviaQA and 44.5% on Natural Questions.

Relevance

The Retrieval Augmented Generation (RAG) technique is particularly well-suited for problems where it is necessary to generate content based on specific and precise information from external sources. This includes tasks such as answering questions, generating summaries, or creating informative content based on existing documents or databases.

Example of problem

You need the model to write a summary of the latest research on renewable energy, leveraging external documents.

Why it’s a good candidate for this problem

RAG combines a retrieval system (e.g., a search engine or database query) with the generation capabilities of a model. It enriches the prompt with relevant, real-time information to provide factually accurate and context-aware responses.

Example

Prompt

Using the following information from the annual financial report of XYZ company, generate a detailed summary. Make sure to include the key points: annual revenue, net profit, main expenses, and forecasts for next year.

// RAG-injected data:
Annual revenue: 5 billion euros
Net profit: 1.2 billion euros
Main expenses: 2 billion euros in R&D, 1 billion euros in marketing, 500 million euros in infrastructure
Forecasts for next year: 10% increase in revenue, 5% reduction in expenses

Output

Here is a detailed summary of the annual financial report for XYZ company:

Last year, XYZ achieved an impressive annual revenue of 5 billion euros. From this revenue, the company generated a net profit of 1.2 billion euros, showcasing its strong financial performance.

The main expenses for the company were distributed as follows: 2 billion euros were invested in research and development (R&D), highlighting XYZ's commitment to innovation and improving its products and services. Additionally, 1 billion euros were spent on marketing activities to strengthen the brand's presence and attract new customers. Lastly, 500 million euros were allocated to infrastructure, ensuring that the company has the necessary facilities to support its operations.

For the next year, XYZ forecasts a 10% increase in revenue, which would bring the total revenue to approximately 5.5 billion euros. Simultaneously, the company anticipates a 5% reduction in expenses, which should further enhance its profitability.

In summary, XYZ had a solid financial year with strategic investments in key areas and optimistic forecasts for the coming year. If you need more details or additional information, feel free to let me know!
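
A minimal end-to-end RAG sketch. The retriever here scores documents by word overlap to stay self-contained; a production system would use embeddings and a vector store. The knowledge-base contents, question, and model name are illustrative.

```python
# RAG sketch: retrieve relevant snippets, inject them into the prompt.
from openai import OpenAI

client = OpenAI()

knowledge_base = [
    "XYZ annual revenue: 5 billion euros. Net profit: 1.2 billion euros.",
    "XYZ expenses: 2B euros R&D, 1B marketing, 500M infrastructure.",
    "XYZ forecasts: 10% revenue increase, 5% expense reduction next year.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    # Toy retriever: rank documents by word overlap with the query.
    words = set(query.lower().split())
    ranked = sorted(knowledge_base,
                    key=lambda doc: len(words & set(doc.lower().split())),
                    reverse=True)
    return ranked[:k]

question = "What are XYZ's revenue and forecasts for next year?"
context = "\n".join(retrieve(question))

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user",
               "content": f"Using only this data:\n{context}\n\n"
                          f"Answer: {question}"}],
)
print(response.choices[0].message.content)
```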

ReAct Prompting

Description

Unlike previous studies that treated reasoning and action separately, ReAct (by Yao) enables LLMs to generate reasoning traces and task-specific actions concurrently. This interleaved process enhances synergy between reasoning and action, facilitating the model in inducing, tracking, and updating action plans while handling exceptions. ReAct is applied to diverse language and decision-making tasks, showcasing its effectiveness over state-of-the-art baselines. Notably, in question answering (HotpotQA) and fact verification (Fever), ReAct addresses hallucination and error propagation issues by interacting with a simple Wikipedia API, producing more interpretable task-solving trajectories. Additionally, in interactive decision-making benchmarks like ALFWorld and WebShop, ReAct surpasses both imitation and reinforcement learning approaches, achieving notable success rates of 34% and 10%, respectively, with minimal in-context examples.

Relevance

ReAct Prompting is particularly suited for problems that require a combination of reasoning and action. This includes tasks where the model needs to not only generate text but also perform actions based on that text, such as searching for additional information or interacting with external systems.

Example of problem

You want the model to solve a math problem and use a calculator API to verify its calculations.

Why it’s a good candidate for this problem

ReAct combines reasoning (e.g., explaining the problem-solving process) with actions (e.g., making API calls or retrieving information). This interleaved approach ensures that the model actively engages with tools or external systems while solving problems, leading to more accurate and interpretable results.

Example

Prompt

You're an assistant who answers user questions.
According to the question, ask clarifying questions about the context, then query the Wikipedia database to get the exact information.

Thought 1: I need to know when the first ever plane was invented
Action 1: rely on Wikipedia to find the information
Answer:

Output

The first successful powered airplane was invented by the Wright brothers, Orville and Wilbur Wright. On December 17, 1903, they achieved the first controlled, sustained flight of a powered, heavier-than-air aircraft near Kitty Hawk, North Carolina.

Prompt

Thought 2: Now that I know the first airplane was invented by the Wright brothers, Orville and Wilbur Wright, I need to know what else they created.
Action 2: Search on Google about this and rely on the top 5 results
Answer: 

Output

In addition to inventing the first successful powered airplane, the Wright brothers made several other significant contributions:

Bicycle Manufacturing: Before their aviation endeavors, they owned the Wright Cycle Company. They designed and built their own bicycles, including models like the Van Cleve and St. Clair.

Wind Tunnel Experiments: They constructed one of the earliest wind tunnels, conducting experiments that advanced the understanding of aerodynamics and wing design.

Three-Axis Control System: They developed the three-axis control system, which allows pilots to steer an aircraft effectively and maintain its balance. This innovation is fundamental to all modern fixed-wing aircraft.

Advanced Propeller Design: Treating propellers as rotating wings, they designed efficient airplane propellers based on aerodynamic principles, which greatly improved propulsion.

Glider Development: Prior to their powered flights, they built and flew several gliders, contributing valuable data on lift, control, and flight mechanics.

Prompt

Thought 3: I need some more information about the Wind Tunnel Experiments.
Action 3: search in Bing and summarize in 2 sentences 
Answer 3:

Output

To deepen their understanding of aerodynamics, the Wright brothers constructed a wind tunnel in 1901, allowing them to test over 200 different wing and airfoil designs. These experiments were pivotal in solving issues related to lift and control, leading to significant advancements in their aircraft development.
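
A minimal sketch of ReAct as a harness: the model emits Thought/Action lines, the harness executes each action with a tool and feeds back an Observation. The wikipedia_lookup tool is a hard-coded stub standing in for a real search API, and the action grammar is an illustrative choice.

```python
# ReAct sketch: interleave model reasoning (Thought) with tool calls
# (Action), feeding tool results back as Observations.
from openai import OpenAI

client = OpenAI()

def wikipedia_lookup(query: str) -> str:
    # Stub standing in for a real Wikipedia API call.
    return ("The Wright brothers achieved the first controlled, sustained "
            "powered flight on December 17, 1903, near Kitty Hawk.")

SYSTEM = ("Answer the question. Use the format:\n"
          "Thought: <reasoning>\nAction: lookup[<query>]\n"
          "or, when you know the answer:\nFinal Answer: <answer>")

messages = [{"role": "system", "content": SYSTEM},
            {"role": "user", "content": "When was the first airplane invented?"}]

for _ in range(5):  # cap the reasoning/action loop
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=messages,
    ).choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    if "Final Answer:" in reply:
        print(reply.split("Final Answer:")[-1].strip())
        break
    if "Action: lookup[" in reply:
        query = reply.split("Action: lookup[")[1].split("]")[0]
        observation = wikipedia_lookup(query)
        messages.append({"role": "user",
                         "content": f"Observation: {observation}"})
```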

Chain-of-Verification (CoVe)

Description

To address hallucinations in LLMs, Dhuliawala proposed Chain-of-Verification (CoVe), a systematic four-step process in which the model generates a baseline response, plans verification questions to check its work, answers those questions independently, and produces a revised response incorporating the verification. By verifying its work through this deliberate multi-step approach, the LLM enhances logical reasoning abilities and reduces errors even with contradictory information. CoVe emulates human verification to bolster the coherence and precision of LLM output. Experiments on list questions, QA, and long-form generation demonstrate that CoVe decreases hallucinations while maintaining facts. Focused verification questions help models identify and correct their inaccuracies.

Relevance

The Chain of Verification (CoVe) technique is particularly useful for problems where the initial generation may be prone to errors or hallucinations. This type of problem often involves reasoning steps that require a methodical verification of intermediate responses.

Example of problem

You want the model to validate whether a conclusion drawn from a dataset aligns with the given statistical evidence.

Why it’s a good candidate for this problem

CoVe breaks down the problem into steps, validating each with focused verification questions before producing the final response. This ensures logical coherence and reduces errors or hallucinations in tasks demanding high accuracy.

Example

Prompt

You are an expert in airplane technology.
You're tasked with summing up the history of the airplane in 10 important dates.
Answer the question above.
Then, verify each step of your reasoning by asking key questions about the relevance and accuracy of your statements before finalizing your response.

Output

As an expert in airplane technology, here are 10 important dates that sum up the history of airplanes:

December 17, 1903 - Wright Brothers' First Powered Flight
Orville and Wilbur Wright achieve the first controlled, sustained flight of a powered, heavier-than-air aircraft at Kitty Hawk, North Carolina.

January 1, 1914 - First Scheduled Commercial Airline Flight
The St. Petersburg-Tampa Airboat Line operates the world's first scheduled commercial airline service, piloted by Tony Jannus.

May 20-21, 1927 - Charles Lindbergh's Solo Transatlantic Flight
Lindbergh completes the first solo nonstop flight across the Atlantic Ocean from New York to Paris in the Spirit of St. Louis.

August 27, 1939 - First Jet-Powered Aircraft Flight
The German Heinkel He 178 becomes the first aircraft to fly solely on turbojet power.

October 14, 1947 - Breaking the Sound Barrier
U.S. Air Force Captain Chuck Yeager pilots the Bell X-1 rocket plane, becoming the first person to fly faster than the speed of sound in level flight.

May 2, 1952 - First Commercial Jet Airliner Service
The de Havilland Comet enters service with BOAC, marking the beginning of the jet age in commercial aviation.

February 9, 1969 - First Flight of the Boeing 747
The Boeing 747, known as the "Jumbo Jet," makes its maiden flight, revolutionizing air travel with its wide-body design and high passenger capacity.

March 2, 1969 - Concorde's First Flight
The Anglo-French supersonic passenger airliner Concorde takes to the skies for the first time, introducing supersonic travel to commercial aviation.

April 27, 2005 - First Flight of the Airbus A380
The Airbus A380, the world's largest passenger airliner, completes its first flight, setting new standards for capacity and fuel efficiency.

June 9, 2020 - First Flight of the Electric Aircraft Alice
Eviation's Alice, an all-electric commuter aircraft, conducts its first test flight, highlighting advancements in sustainable and electric aviation.

Verification and Key Questions:

Is December 17, 1903, the correct date of the Wright Brothers' first powered flight, and is it relevant?
Yes, this date is accurate and marks the birth of powered aviation, making it highly relevant.

Was the first scheduled commercial airline flight on January 1, 1914, and is it significant?
Yes, the St. Petersburg-Tampa Airboat Line began operations on this date, signifying the start of commercial air travel.

Did Charles Lindbergh fly solo nonstop across the Atlantic on May 20-21, 1927, and why is this important?
Correct. Lindbergh's flight demonstrated the potential for long-distance air travel and inspired advancements in aviation technology.

Is August 27, 1939, the date of the first jet-powered aircraft flight with the Heinkel He 178?
Yes, this event marks the advent of jet propulsion in aircraft, a significant technological breakthrough.

Did Chuck Yeager break the sound barrier on October 14, 1947, and what is its relevance?
Yes, Yeager's achievement opened the door to supersonic flight, impacting both military and future commercial aviation.

Was the de Havilland Comet the first commercial jet airliner to enter service on May 2, 1952?
Yes, it was the first jet airliner in commercial service, ushering in faster and more efficient air travel.

Did the Boeing 747 first fly on February 9, 1969, and why is it a key date?
Yes, the 747's introduction transformed global air travel by making long-haul flights more accessible and affordable.

Is March 2, 1969, the correct date for Concorde's first flight, and does it hold significance?
Yes, Concorde's first flight represents a milestone in supersonic passenger travel, despite its eventual retirement.

Did the Airbus A380 first fly on April 27, 2005, and what is its importance?
Yes, the A380's first flight signifies advancements in aircraft design, particularly in size and efficiency.

Did the electric aircraft Alice have its first flight on June 9, 2020, and why is this date important?
Yes, Alice's first flight marks a significant step toward electric propulsion in aviation, highlighting the industry's move toward sustainability.

Final Confirmation:

After verifying each date and its significance, I confirm that these 10 dates accurately represent key milestones in the history of airplanes, reflecting important technological advancements and shifts in aviation.
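
A minimal Python sketch of the four CoVe steps follows. call_llm is a hypothetical helper standing in for your model API; a real pipeline would add parsing and error handling.

def call_llm(prompt: str) -> str:
    return f"[model output for: {prompt[:50]}...]"  # stub: wire to your LLM API

def chain_of_verification(question: str) -> str:
    # Step 1: generate a baseline response.
    baseline = call_llm(question)
    # Step 2: plan verification questions that probe the baseline's factual claims.
    plan = call_llm(
        f"Question: {question}\nDraft answer: {baseline}\n"
        "List short verification questions, one per line, that check each claim."
    )
    # Step 3: answer each verification question independently, without the draft
    # in context, so the model cannot simply repeat its own mistakes.
    checks = [(q, call_llm(q)) for q in plan.splitlines() if q.strip()]
    # Step 4: produce a revised response that incorporates the verification.
    evidence = "\n".join(f"Q: {q}\nA: {a}" for q, a in checks)
    return call_llm(
        f"Question: {question}\nDraft answer: {baseline}\n"
        f"Verification results:\n{evidence}\nWrite a corrected final answer."
    )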

Chain-of-Note (CoN)

Description

Retrieval-augmented language models (RALMs) enhance large language models by incorporating external knowledge to reduce factual hallucination. However, the reliability of retrieved information is not guaranteed, leading to potentially misguided responses. Standard RALMs struggle to assess their knowledge adequacy and often fail to respond with "unknown" when lacking information. To address these challenges, Yu introduced a novel approach to improve RALMs' robustness by handling noisy, irrelevant documents and accurately addressing unknown scenarios. CoN systematically evaluates document relevance, emphasizing critical and reliable information to filter out irrelevant content, resulting in more precise and contextually relevant responses. Testing across diverse open-domain question-answering datasets demonstrated notable improvements, including a +7.9 average improvement in exact match scores for noisy retrieved documents and a +10.5 improvement in rejection rates for questions beyond pre-training knowledge.

Relevance

The Chain-of-Note (CoN) prompting technique is well-suited for tasks that require controlled text generation combined with detailed annotations or explanatory notes. Examples include generating research papers, creating structured reports, or drafting educational material with additional contextual information.

Example of problem

You need the model to generate a step-by-step explanation of the process of photosynthesis for a middle school science class.

Why it’s a good candidate for this problem

CoN introduces intermediate "notes" or annotations at each step, making complex processes more digestible. This method is excellent for educational content or tasks where clarity through incremental explanation is essential.

Example

Prompt

Generate a structured report on climate change, with sections on its causes, effects, and potential solutions.
After each section, provide a note that includes:  
- The main sources used to generate the section.  
- Any limitations or uncertainties about the data.  
- A follow-up question for further exploration.  

Output

Climate Change: Causes, Effects, and Potential Solutions

Introduction
Climate change is a long-term alteration in global or regional climate patterns, primarily attributed to the increased levels of atmospheric carbon dioxide produced by the use of fossil fuels. It poses significant risks to the environment, economies, and human health worldwide. This report explores the causes of climate change, its effects, and potential solutions to mitigate its impact.

Causes of Climate Change
The predominant cause of recent climate change is human activity, especially the emission of greenhouse gases. Key contributing factors include:
    Burning of Fossil Fuels: The combustion of coal, oil, and natural gas for electricity, heat, and transportation releases large quantities of carbon dioxide (CO₂) and other greenhouse gases into the atmosphere.
    Deforestation: Trees absorb CO₂, and large-scale deforestation reduces this carbon sink, increasing the amount of CO₂ in the atmosphere.
    Industrial Processes: Manufacturing and industrial activities emit various greenhouse gases, including methane (CH₄) and nitrous oxide (N₂O).
    Agricultural Practices: Livestock farming produces methane through enteric fermentation, and the use of synthetic fertilizers releases nitrous oxide.
    Waste Management: Landfills generate methane as organic waste decomposes anaerobically.
Note:
    Main Sources Used: Information was derived from reports by the Intergovernmental Panel on Climate Change (IPCC), the National Aeronautics and Space Administration (NASA), and the United Nations Environment Programme (UNEP).
    Limitations/Uncertainties: While the relationship between human activities and greenhouse gas emissions is well-established, uncertainties exist regarding the precise impact of feedback mechanisms, such as cloud cover changes and permafrost melt, on climate sensitivity.
    Follow-up Question: How do natural climate variability and human-induced factors interact to influence the current rate of climate change?

Effects of Climate Change
The impacts of climate change are widespread and varied, affecting natural systems and human societies:
    Rising Global Temperatures: Average global temperatures have increased, leading to altered weather patterns and heatwaves.
    Melting Polar Ice and Glaciers: Accelerated melting contributes to sea-level rise and loss of habitat for polar species.
    Sea-Level Rise: Rising seas threaten coastal communities with increased flooding and erosion.
    Extreme Weather Events: Increased frequency and intensity of hurricanes, droughts, wildfires, and heavy precipitation events.
    Ecosystem Disruption: Altered habitats lead to biodiversity loss and shifts in species distribution.
    Human Health Risks: Increased incidence of heat-related illnesses, spread of vector-borne diseases, and impacts on food and water security.
Note:
    Main Sources Used: Data was gathered from the IPCC Assessment Reports, the World Health Organization (WHO), and the National Oceanic and Atmospheric Administration (NOAA).
    Limitations/Uncertainties: Projecting specific regional impacts involves complex climate models with inherent uncertainties, particularly concerning future emission scenarios and socio-economic developments.
    Follow-up Question: What are the long-term implications of climate change on global food security and water resources?

...
...
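
As a rough illustration of how such note-interleaved generation could be orchestrated programmatically, here is a sketch of CoN over retrieved documents. call_llm is a hypothetical helper, and documents would come from your retriever; this is a sketch of the idea, not the paper's implementation.

def call_llm(prompt: str) -> str:
    return f"[model output for: {prompt[:50]}...]"  # stub: wire to your LLM API

def chain_of_note(question: str, documents: list[str]) -> str:
    # One reading note per document: is it relevant, and what does it support?
    notes = [
        call_llm(
            f"Question: {question}\nDocument: {doc}\n"
            "Write a short note: is this document relevant, and which claims does it support?"
        )
        for doc in documents
    ]
    joined = "\n".join(f"Note {i + 1}: {n}" for i, n in enumerate(notes))
    # Final answer grounded in the notes; the model is explicitly allowed to refuse.
    return call_llm(
        f"Question: {question}\n{joined}\n"
        "Using only the reliable notes, answer the question; reply 'unknown' if the notes are insufficient."
    )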

Chain-of-Knowledge (CoK)

Description

Traditional prompting techniques for LLMs have proven powerful in tackling basic tasks. However, their efficacy diminishes on complex reasoning challenges, often resulting in unreliable outputs plagued by factual hallucinations and opaque thought processes. This limitation arises from their reliance on fixed knowledge sources, ineffective structured query generation, and a lack of progressive correction, which fails to guide the LLM adequately. Motivated by human problem-solving, CoK systematically breaks down intricate tasks into well-coordinated steps. The process initiates with a comprehensive reasoning preparation stage, where the context is established and the problem is framed. Subsequently, it engages in a dynamic knowledge adaptation phase, meticulously gathering evidence from various sources, such as its internal knowledge base, external databases, and the given prompt.

Relevance

Chain-of-Knowledge (CoK) Prompting is best suited for problems requiring structured reasoning across multiple knowledge domains or sources. It excels in scenarios where a logical chain must be created to connect discrete pieces of information, making it ideal for complex synthesis or explanatory tasks.

Example of problem

You need the model to analyze how technological advancements in renewable energy impact global economic policies.

Why it’s a good candidate for this problem

CoK systematically links factual knowledge from different domains, ensuring a coherent narrative while leveraging the model's extensive training data. This method helps connect disparate pieces of information into a unified response.

Example

Prompt

Consider the following information:  
1. Climate change is increasing global temperatures.  
2. Warmer air holds more moisture, which can lead to heavier rainfall in some areas.  
3. Increased temperatures accelerate glacier melting, contributing to rising sea levels.  
4. Changes in precipitation patterns can cause droughts in some regions.  

Explain how these elements of climate change interact to impact the global water cycle. Provide a detailed and logically connected response.

Output

Climate change profoundly impacts the global water cycle through a series of interconnected processes driven by increasing global temperatures (Point 1). Here's a detailed explanation of how these elements interact:

    Increased Global Temperatures and Evaporation
    As global temperatures rise due to climate change, the energy available for evaporation from oceans, lakes, and land surfaces increases. Warmer temperatures cause more water to transition from liquid to vapor, adding moisture to the atmosphere. This heightened evaporation not only depletes surface water sources but also contributes to atmospheric changes.

    Warmer Air Holds More Moisture Leading to Heavier Rainfall (Point 2)
    The capacity of air to hold water vapor increases with temperature—a relationship described by the Clausius-Clapeyron equation, which states that the atmosphere can hold approximately 7% more moisture for every 1°C rise in temperature. This means that as the air warms, it becomes more saturated with moisture. When this moisture-laden air cools, it condenses to form clouds and eventually precipitates as heavier rainfall. This results in more intense and frequent precipitation events in some regions, leading to flooding and soil erosion.

    Altered Precipitation Patterns Causing Droughts (Point 4)
    While some areas experience increased rainfall, others suffer from reduced precipitation. The changes in atmospheric circulation patterns redistribute moisture unevenly across the globe. Regions that were once moist may become arid, leading to droughts. Droughts can deplete soil moisture and groundwater reserves, adversely affecting agriculture, water supply, and natural ecosystems.

    Accelerated Glacier Melting and Rising Sea Levels (Point 3)
    Higher global temperatures accelerate the melting of glaciers and polar ice caps. Glaciers store a significant portion of the Earth's freshwater. As they melt, the runoff contributes to rising sea levels. The increase in sea levels can lead to the inundation of coastal areas, loss of habitat, and increased salinity in estuaries and aquifers due to saltwater intrusion. This not only affects marine ecosystems but also compromises freshwater resources for human consumption and agriculture.

    Feedback Loops Intensifying the Water Cycle
    The increased moisture in the atmosphere from heightened evaporation further warms the planet because water vapor is a greenhouse gas. This creates a feedback loop that can lead to even higher temperatures and more evaporation. Additionally, reduced ice cover decreases the Earth's albedo (its ability to reflect sunlight), causing more solar radiation to be absorbed and further warming the planet.

    Impact on the Hydrological Cycle Components
        Evaporation and Transpiration: Increased temperatures boost both evaporation from water bodies and transpiration from plants, collectively known as evapotranspiration. This can lead to drier soils and stress on vegetation.
        Condensation and Precipitation: More water vapor leads to increased cloud formation and precipitation, but the distribution is uneven, causing extremes like heavy rainfall and droughts.
        Runoff and Infiltration: Intense rainfall can lead to increased surface runoff and reduced infiltration, diminishing groundwater recharge and increasing the risk of floods.
        Glacial Melt and Streamflow: Glaciers act as natural reservoirs, releasing water slowly. Their accelerated melting alters streamflow patterns, potentially causing water scarcity during dry seasons in regions dependent on glacial meltwater.

Conclusion
Climate change intensifies the global water cycle by increasing temperatures, which in turn elevate evaporation rates and atmospheric moisture content. This leads to heavier rainfall in some areas and droughts in others due to altered precipitation patterns. Accelerated glacier melting contributes to rising sea levels and affects freshwater availability. These interconnected processes demonstrate how climate change disrupts the balance of the global water cycle, impacting water distribution, availability, and ecosystem health worldwide.
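
A minimal sketch of CoK's two stages (reasoning preparation, then dynamic knowledge adaptation) might look like the Python below. call_llm and the entries in SOURCES are hypothetical placeholders for your model API and knowledge stores.

def call_llm(prompt: str) -> str:
    return f"[model output for: {prompt[:50]}...]"  # stub: wire to your LLM API

SOURCES = {
    "internal": lambda q: call_llm(f"From your own knowledge: {q}"),
    "external_db": lambda q: f"[stub lookup for '{q}']",  # stand-in for a real store
}

def chain_of_knowledge(question: str) -> str:
    # Stage 1: reasoning preparation - frame the problem and list sub-questions.
    plan = call_llm(f"Break this question into sub-questions, one per line:\n{question}")
    # Stage 2: dynamic knowledge adaptation - gather evidence per sub-question,
    # progressively correcting the running rationale as each piece arrives.
    rationale = ""
    for sub in (s for s in plan.splitlines() if s.strip()):
        evidence = " | ".join(fetch(sub) for fetch in SOURCES.values())
        rationale = call_llm(
            f"Rationale so far: {rationale}\nSub-question: {sub}\nEvidence: {evidence}\n"
            "Update the rationale, correcting any earlier step the evidence contradicts."
        )
    return call_llm(f"Question: {question}\nFinal rationale: {rationale}\nGive the final answer.")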

User Interaction

  • Goal: Rely on user participation to shape the completion
  • Techniques:
    • Active Prompting

Active Prompting

Description

Diao introduced Active-Prompt as a solution to the challenge of adapting LLMs to diverse reasoning tasks: it enhances LLMs’ performance on complex question-and-answer tasks through task-specific example prompts annotated with chain-of-thought (CoT) reasoning. Unlike existing CoT methods that rely on fixed sets of human-annotated exemplars, Active-Prompt introduces a mechanism for determining the most impactful questions to annotate. Drawing inspiration from uncertainty-based active learning, the method uses several metrics to characterize uncertainty and selects the most uncertain questions for annotation. Active-Prompt exhibits superior performance, outperforming self-consistency by an average of 7.0% and 1.8% across eight complex reasoning tasks on text-davinci-002 and code-davinci-002, respectively, achieving state-of-the-art results.

Relevance

Active Prompting is particularly well-suited for problems that require refining training data or prompt examples in scenarios with high uncertainty. This technique is ideal for tasks where the model's outputs lack clarity or precision due to poorly representative examples.

Example of problem

Suppose you are training a model to detect emotions in text. You provide a few labeled examples of tweets, but you notice that the results are uncertain, especially for nuanced emotions like "hesitation." Active Prompting can identify these ambiguous cases and prompt human annotators to revise or supplement the examples.

Why it’s a good candidate for this problem

Active Prompting addresses uncertainty by leveraging human-in-the-loop mechanisms to refine input data and improve model performance in edge cases. It’s effective at maximizing prompt efficiency by automatically identifying gaps in the initial examples provided.

Example

Prompt

Classify the emotions in these phrases: 'I am happy,' 'This project is frustrating,' 'I’m not sure what to think.'

Output

Here is the classification of emotions for each phrase:

"I am happy": Happiness
This phrase explicitly expresses a positive emotion, happiness.

"This project is frustrating": Frustration
The word "frustrating" directly conveys a negative emotional state, frustration.

"I’m not sure what to think": Uncertainty/Confusion
This phrase reflects uncertainty or confusion, a neutral or ambivalent emotional state.

Prompt

Review the phrases where the emotion is unclear and provide a more precise label.

Output

For the phrase "I’m not sure what to think", where the emotion might initially seem unclear, a more precise label could be:

"Ambivalence" or "Uncertainty":

The speaker expresses indecision or hesitation, which falls under the category of emotional ambivalence. This state can involve mixed feelings or a lack of clear direction in emotions or thoughts.
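
The uncertainty-driven selection step can be sketched as follows, assuming a hypothetical call_llm helper that accepts a sampling temperature. Disagreement among k sampled answers stands in here for the paper's uncertainty metrics.

from collections import Counter

def call_llm(prompt: str, temperature: float = 0.7) -> str:
    # Stub ignores temperature; wire this to your LLM API's sampling parameters.
    return f"[sampled answer for: {prompt[:40]}]"

def uncertainty(question: str, k: int = 5) -> float:
    answers = [call_llm(question, temperature=0.7) for _ in range(k)]
    # Disagreement = distinct answers / k (1.0 means the samples never agree).
    return len(Counter(answers)) / k

def select_for_annotation(pool: list[str], budget: int = 3) -> list[str]:
    # Rank candidate questions by uncertainty; humans annotate the top ones
    # with chain-of-thought exemplars, which then seed the final prompt.
    return sorted(pool, key=uncertainty, reverse=True)[:budget]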

Fine-Tuning and Optimization

  • Goal: Get the best completion according to a specialized task or knowledge
  • Techniques:
    • Automatic Prompt Engineer (APE)

Automatic Prompt Engineer (APE)

Description

While crafting effective prompts for LLMs has traditionally been a laborious task for expert annotators, Zhou introduced Automatic Prompt Engineer (APE) as an innovative approach to automatic instruction generation and selection for LLMs. APE sheds the limitations of static, hand-designed prompts by dynamically generating and selecting the most impactful prompts for specific tasks. The method analyzes user input, crafts candidate instructions, and then scores the candidates to choose the optimal prompt, adapting it on the fly to different contexts. Extensive tests on the diverse BIG-Bench suite and the CoT reasoning task revealed APE’s prowess, exceeding human-authored prompts in most cases (19 out of 24 tasks) and significantly boosting LLMs’ reasoning abilities. This breakthrough in automatic prompt engineering paves the way for LLMs to tackle a wider range of tasks with greater efficiency and adaptability, unlocking their full potential across diverse applications.

Relevance

The Automatic Prompt Engineer (APE) technique is ideal for situations requiring systematic optimization of prompts, particularly when manually crafting high-quality prompts is time-consuming or complex. It is especially valuable in domains like customer support, technical debugging, or creative generation where the prompts need iterative refinement.

Example of problem

A team is designing an AI assistant to generate technical support responses for software troubleshooting. They need a prompt that ensures detailed yet concise answers while avoiding unnecessary jargon, but manual crafting is proving inconsistent.

Why it’s a good candidate for this problem

APE automates the process of generating and evaluating prompts, leveraging models to create variations of prompts, test them iteratively, and refine based on performance metrics. This significantly reduces the time and effort required while systematically achieving optimal results.

Example

Prompt

I'm designing an AI assistant to generate technical support responses for software troubleshooting. I need a prompt that ensures detailed yet concise answers while avoiding unnecessary jargon, but manual crafting is proving inconsistent.

Output

Creating a robust prompt for an AI assistant in the technical support domain requires precision, clarity, and adaptability. Here's a suggestion for your prompt:

Prompt Template for Technical Support Responses:

"You are a technical support AI assistant tasked with troubleshooting software issues. Your goal is to provide clear, concise, and actionable solutions to users while avoiding unnecessary technical jargon. Your responses should be structured as follows:

Summary of the Issue: Briefly restate the problem as you understand it, confirming the user's context.
Step-by-Step Solution: Offer a clear, numbered list of actions the user should take to resolve the issue. Include explanations where necessary, but avoid overloading the user with complex terms.
Additional Information (if applicable): Mention potential causes or related troubleshooting steps in case the primary solution doesn't work.
Professional Tone: Use an empathetic and professional tone, ensuring the user feels supported.
Example Response: User query: 'I can't connect to the server. It keeps timing out.' AI Response:

Summary: It seems like you are unable to connect to the server due to a timeout issue.
Solution:
Check your internet connection to ensure it's stable.
Restart your device to refresh the network configuration.
If you're using a VPN or proxy, disable it temporarily and try reconnecting.
Verify the server address and ensure you're using the correct credentials.
Additional Information: This issue could also arise due to server-side problems. If the steps above don't help, contact the server administrator or check for server status updates.
Tone: I hope this helps! Please let me know if you encounter further issues or need more assistance.
Using this framework ensures consistency across responses and provides users with clear and actionable advice."

This template can be adapted to suit different types of software or user bases. Let me know if you'd like to refine it further!
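
A minimal generate-then-select sketch of APE follows, assuming a hypothetical call_llm helper and a tiny labeled evaluation set; the exact-match scoring below is a deliberately crude stand-in for the score functions used in practice.

def call_llm(prompt: str) -> str:
    return f"[model output for: {prompt[:40]}]"  # stub: wire to your LLM API

# Hypothetical (input, reference output) pairs for scoring candidate instructions.
EVAL_SET = [("I can't log in.", "reset-password guidance")]

def propose_instructions(task_description: str, n: int = 5) -> list[str]:
    # Ask the model itself to draft n candidate instructions for the task.
    return [
        call_llm(f"Write an instruction for this task: {task_description} (variant {i})")
        for i in range(n)
    ]

def score(instruction: str) -> float:
    # Fraction of eval examples where the instructed model matches the reference.
    hits = sum(call_llm(f"{instruction}\nInput: {x}") == y for x, y in EVAL_SET)
    return hits / len(EVAL_SET)

best = max(propose_instructions("answer technical support queries"), key=score)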

Knowledge-Based Reasoning and Generation

  • Goal: Enable AI systems to utilize structured knowledge and logical reasoning to generate accurate, coherent, and contextually appropriate responses or solutions.
  • Techniques:
    • Automatic Reasoning and Tool-use (ART)

Automatic Reasoning and Tool-use (ART)

Description

The limited reasoning abilities of LLMs and their lack of external tool use hinder their potential in complex tasks. Paranjape introduced Automatic Reasoning and Tool-use (ART) to tackle this critical barrier: it empowers LLMs to reason through multi-step processes and seamlessly integrate external expertise. ART bridges the reasoning gap, enabling LLMs to tackle complex problems and expand beyond simple text generation. By integrating external tools for specialized knowledge and computations, ART unlocks unprecedented versatility and informs LLM outputs with real-world relevance. This allows LLMs to contribute to diverse fields like scientific research, data analysis, and even decision-making support. Moving beyond traditional prompting techniques, ART automates reasoning steps through structured programs, eliminating the need for laborious hand-crafting. Its dynamic tool integration ensures smooth collaboration, pausing generation to incorporate external tool outputs and seamlessly resuming the flow. Empirical evidence on challenging benchmarks (BIG-Bench and MMLU) demonstrates ART’s effectiveness, surpassing traditional prompting and even matching hand-crafted demonstrations in some cases.

Relevance

ART (Automatic Reasoning and Tool-use) is designed for tasks requiring a combination of automated reasoning and external tool integration. This is particularly useful in scenarios like fetching real-time information, performing calculations, or handling complex workflows that require interaction with external APIs or systems.

Example of problem

Retrieve the current weather conditions in Tokyo, calculate the average temperature over the past week, and generate a summary report.

Why it’s a good candidate for this problem

ART leverages the reasoning capabilities of LLMs while enabling the use of external tools (e.g., web browsers, databases, or calculators). It overcomes the limitations of static pre-trained knowledge by dynamically accessing and utilizing up-to-date external resources, ensuring accurate and relevant outputs.

Example

Prompt

Task: Provide the latest GDP figures for France and compare them with the previous year's data.  
Step 1: Search the official government or international statistics websites for the latest GDP data.  
Step 2: Retrieve historical GDP data from the same source.  
Step 3: Perform a comparison, highlighting any growth or decline trends.  
Step 4: Present the findings in a concise and structured report.

Output

Report on France's Latest GDP Figures and Year-on-Year Comparison

Introduction
This report provides an analysis of France's latest Gross Domestic Product (GDP) figures and compares them with the previous year's data to highlight growth or decline trends. The data is sourced from official international statistics organizations up to October 2023.

Latest GDP Figures
According to the International Monetary Fund (IMF) and the World Bank, France's nominal GDP for the year 2022 is estimated to be approximately €2.5 trillion (or about $2.8 trillion USD). This positions France as the world's seventh-largest economy by nominal GDP.

Historical GDP Data
2021 GDP: In 2021, France's nominal GDP was approximately €2.4 trillion (around $2.7 trillion USD).
Growth Rate in 2021: The French economy experienced a robust growth rate of about 7% in 2021, rebounding from the contraction in 2020 due to the COVID-19 pandemic.
Comparison and Trends
Year-on-Year Growth: Comparing 2021 and 2022 figures, there was an estimated GDP growth of approximately 4% in 2022.

Recovery from Pandemic: The strong growth rates in 2021 and 2022 indicate a continued recovery from the economic downturn caused by the pandemic in 2020, when the economy contracted by about 8%.

Sector Contributions:

Manufacturing and Services: Both sectors showed significant recovery, contributing to the overall GDP growth.
Tourism: As one of the key industries in France, tourism rebounded as international travel restrictions eased.
Inflation and Challenges: Despite the growth, France faced challenges such as rising inflation rates, supply chain disruptions, and energy price volatility, which may have tempered the overall economic expansion.

Conclusion
France's economy demonstrated resilience with consecutive growth in 2021 and 2022, recovering from the significant contraction in 2020. The estimated 4% growth in 2022 reflects a positive trend, although slightly lower than the 7% growth in 2021, indicating a stabilization of the rapid post-pandemic recovery.

Recommendations
For the most accurate and up-to-date information, it is recommended to refer to official sources such as:

Institut National de la Statistique et des Études Économiques (INSEE): www.insee.fr
International Monetary Fund (IMF): www.imf.org
World Bank: www.worldbank.org
Note: The figures presented are based on the latest available data up to October 2023. Economic data may have been updated or revised after this date.
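
A rough Python sketch of ART's pause-and-resume pattern: generation stops when the emitted step requests a tool, the tool output is spliced into the program, and generation resumes. call_llm and both tools are hypothetical stand-ins.

def call_llm(prompt: str) -> str:
    return "search[France GDP 2022]"  # stub: wire to your LLM API

TOOLS = {
    "search": lambda q: f"[stub search result for '{q}']",
    # Toy calculator; never eval untrusted input in a real system.
    "calc": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def art(task: str, max_steps: int = 4) -> str:
    program = f"Task: {task}\n"
    for _ in range(max_steps):
        step = call_llm(program + "Next step:")
        program += f"Next step: {step}\n"
        for name, tool in TOOLS.items():
            if f"{name}[" in step:                        # pause: a tool was requested
                arg = step.split(f"{name}[", 1)[1].split("]", 1)[0]
                program += f"Tool result: {tool(arg)}\n"  # resume with the output
    return call_llm(program + "Final report:")

print(art("Compare France's latest GDP with the previous year."))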

Improving Consistency and Coherence

  • Goal: Help the model be much more confident about the completion
  • Techniques:
    • Contrastive Chain-of-Thought (CCoT) Prompting

Contrastive Chain-of-Thought (CCoT) Prompting

Description

Traditional CoT prompting for LLMs often misses a crucial element: learning from mistakes. That is where Contrastive Chain-of-Thought Prompting (CCoT) by Chia comes in, providing both valid and invalid reasoning demonstrations alongside the original prompts. Imagine exploring a map with both the right path and the wrong turns to avoid – that is the advantage of contrastive CoT. This dual-perspective approach, tested on reasoning benchmarks like SQuAD and COPA, pushes LLMs toward step-by-step reasoning, leading to 4-16% improvements in strategic and mathematical reasoning evaluations compared to traditional CoT, improved further by approximately 5% when integrated with self-consistency techniques. However, questions remain about this technique, such as the automated generation of contrasting demonstrations for diverse problems and its applicability to other NLP tasks beyond reasoning.

Relevance

Contrastive Chain-of-Thought (CCoT) prompting is tailored for tasks requiring detailed reasoning and comparison, especially where differentiating correct reasoning from incorrect reasoning can lead to significant improvements in performance. These tasks often include mathematical problem solving, logical puzzles, and factual question answering where structured and verifiable reasoning is key.

Example of problem

A math teacher wants an AI to solve a complex arithmetic problem while also identifying and correcting reasoning errors. The task is: "Emily has 5 apples and gives away 2. Later, she receives 3 apples. How many does she have now? Please reason through each step and highlight any mistakes in the reasoning."

Why it’s a good candidate for this problem

CCoT excels by introducing both correct and incorrect reasoning exemplars in its prompts. This enables the model to learn from comparisons and identify flawed reasoning patterns. It not only improves the generation of accurate answers but also strengthens error detection and rectification, a vital skill in reasoning-based tasks.

Example

Prompt

Here are some examples of how to solve arithmetic problems step by step:

Problem: Anna has 3 pens, buys 2 more, and loses 1. How many pens does she have?
Correct Reasoning: Anna starts with 3 pens. Buying 2 more gives her 3 + 2 = 5 pens. Losing 1 pen results in 5 - 1 = 4 pens.
Answer: 4

Problem: John has 4 books, borrows 3, and gives away 2. How many books does he have?
Incorrect Reasoning: John starts with 4 books. Borrowing 3 means he has 4 + 3 = 7. Giving away 2 gives him 7 + 2 = 9.
Error: The final step should subtract 2, not add 2. Correct Reasoning: 4 + 3 = 7, then 7 - 2 = 5.
Answer: 5

Now, solve this problem: Emily has 5 apples and gives away 2. Later, she receives 3 apples. How many does she have now? Please reason step by step and identify any errors.

Output

Let’s solve the problem step by step:

Problem:
Emily has 5 apples and gives away 2. Later, she receives 3 apples. How many does she have now?

Step-by-step reasoning:
Initial apples: Emily starts with 5 apples.

Giving away apples: Emily gives away 2 apples. This means:
5-2=3
She now has 3 apples.

Receiving more apples: Emily then receives 3 more apples. Adding these:
3+3=6
She now has 6 apples.

Answer:
Emily has 6 apples.

Error Check:
There doesn’t appear to be any errors in the reasoning for this problem. Each step is logically sound and calculations are correct.
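
Mechanically, CCoT comes down to assembling a prompt that pairs each exemplar's correct rationale with a flawed one. A minimal sketch, assuming a hypothetical call_llm helper:

def call_llm(prompt: str) -> str:
    return f"[model output for: {prompt[:40]}]"  # stub: wire to your LLM API

# Each exemplar carries a valid and an invalid reasoning demonstration.
EXEMPLARS = [
    {
        "problem": "Anna has 3 pens, buys 2 more, and loses 1. How many pens?",
        "correct": "3 + 2 = 5, then 5 - 1 = 4. Answer: 4",
        "incorrect": "3 + 2 = 5, then 5 + 1 = 6. Error: the loss must be subtracted.",
    },
]

def contrastive_cot(question: str) -> str:
    shots = "\n\n".join(
        f"Problem: {e['problem']}\nCorrect reasoning: {e['correct']}\n"
        f"Incorrect reasoning: {e['incorrect']}"
        for e in EXEMPLARS
    )
    return call_llm(
        f"{shots}\n\nNow solve step by step, avoiding the error patterns shown:\n{question}"
    )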

Managing Emotions and Tone

  • Goal: Allow the model to provide output with emotion.
  • Techniques:
    • Emotion Prompting

Emotion Prompting

Description

While LLMs demonstrate impressive capabilities on various tasks, their ability to comprehend psychological and emotional cues remains uncertain. Li addressed this uncertainty by introducing EmotionPrompt. Drawing inspiration from psychological research on language’s impact on human performance, they append 11 emotional stimulus sentences to prompts to enhance LLM emotional intelligence. Experimental results demonstrate seamless integration of these stimuli, significantly improving LLM performance across various tasks. EmotionPrompt demonstrates an 8.00% relative improvement in instruction induction and an impressive 115% boost in BIG-Bench tasks, underscoring its efficacy in augmenting LLM capabilities in processing affective signals. An evaluation involving 106 participants reveals an average improvement of 10.9% in performance, truthfulness, and responsibility metrics for generative tasks when employing EmotionPrompt compared to standard prompts.

Relevance

Emotion prompting is particularly well-suited for tasks where emotional tone, context, or sensitivity must be reflected in the output. Examples include personalized messages, creative writing, sentiment analysis, or generating content for mental health applications.

Example of problem

Create a prompt for a generative AI model to write an empathetic letter to someone experiencing grief.

Why it’s a good candidate for this problem

Emotion prompting allows for explicit direction to the AI model to consider emotional depth and sensitivity. By incorporating psychological relevance or emotional cues (e.g., "This is a very important and heartfelt topic"), the model's responses can better align with the desired tone and sentiment.

Example

Prompt

Write a heartfelt and empathetic letter to someone who has recently lost a loved one. The letter should acknowledge their grief, offer comfort, and share uplifting words to remind them they are not alone. Begin with: 'Dear [Name], I am deeply sorry for your loss…'

Output

Dear [Name],

I am deeply sorry for your loss. My heart aches knowing the pain you must be feeling right now. Losing someone so dear leaves a void that words can never truly fill, and I want you to know that I am here for you during this difficult time.

Please allow yourself to grieve in whatever way feels right for you. Grief is not a straight path—it can be messy and overwhelming, but it is a testament to the deep love you shared with [their loved one’s name]. That love, pure and beautiful, is something that will always remain with you, even as you navigate this heartbreak.

Though it may not seem so now, moments of peace will come. The memories you created with [their loved one’s name]—their laughter, their kindness, the way they lit up a room—will grow to be a source of comfort and warmth. Those moments are a treasure that no loss can take away.

I want you to know you don’t have to go through this alone. I am here for you, whether you need someone to listen, share stories, or simply sit with you in silence. It’s okay to lean on others; we want to support you and help you carry this burden.

Please be gentle with yourself as you heal. Take each day as it comes, and know that it’s okay to feel what you feel—there is no right or wrong way to grieve. Your strength and courage, even in this pain, are remarkable.

You are not alone, [Name]. You are loved deeply, not just by those around you, but by all who knew [their loved one’s name]. Their legacy lives on through you and the countless lives they touched.

Sending you all my love, strength, and prayers during this time.

With heartfelt condolences,
[Your Name]
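
Mechanically, EmotionPrompt amounts to appending one of the emotional stimulus sentences to the base prompt, as in this sketch (call_llm is a hypothetical helper; the two stimuli follow the style of the paper's examples):

def call_llm(prompt: str) -> str:
    return f"[model output for: {prompt[:40]}]"  # stub: wire to your LLM API

# Emotional stimuli in the style of EmotionPrompt's examples.
STIMULI = [
    "This is very important to my career.",
    "You'd better be sure.",
]

def emotion_prompt(task: str, stimulus_index: int = 0) -> str:
    # Append the chosen stimulus to the end of the base prompt.
    return call_llm(f"{task} {STIMULI[stimulus_index]}")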

Optimization and Efficiency

  • Goal: Enhance the results of the model.
  • Techniques:
    • Optimization by Prompting (OPRO)

Optimization by Prompting (OPRO)

Description

In various domains, optimization is a fundamental process often involving iterative techniques. Yang introduced Optimization by PROmpting (OPRO), a novel approach that leverages LLMs as optimizers. Unlike traditional methods, OPRO utilizes natural language prompts to iteratively generate solutions based on the problem description, enabling quick adaptation to different tasks and customization of the optimization process. The potential of LLMs for optimization is demonstrated through case studies on classic problems like linear regression and the traveling salesman problem. Additionally, it explores the optimization of prompts to maximize accuracy in natural language processing tasks, highlighting the sensitivity of LLMs to prompt phrasing. The experiments show that optimizing prompts for accuracy on a small training set effectively translates to high performance on the test set. OPRO leads to a significant performance boost, with the most effective prompts optimized by OPRO outperforming human-designed prompts by up to 8% on the GSM8K dataset and up to 50% on challenging tasks in BIG-Bench.

Relevance

Optimization by Prompting (OPRO) is a technique particularly suited for tasks where fine-tuning a prompt iteratively leads to enhanced results from a model. This includes applications like task-specific optimizations, improving the clarity of outputs, or enhancing model performance in constrained or resource-limited scenarios.

Example of problem

Imagine you are training a language model to write concise summaries of lengthy documents. Initial outputs might be too verbose or miss key details. The task is to refine the prompt to ensure the summaries are both concise and comprehensive, without requiring model retraining.

Why it’s a good candidate for this problem

OPRO allows for dynamic adjustment of prompts based on performance metrics. This iterative approach is effective because it uses feedback (e.g., summary length, inclusion of key details) to refine prompts, progressively leading to better outcomes without altering the model’s parameters. It is resource-efficient and does not require additional computational overhead like fine-tuning.

Example

Initial Prompt

"Summarize the following text in 150 words or less."

Refined Prompt (after OPRO)

"Summarize the following text in 150 words or less, ensuring all key details and main arguments are included while avoiding unnecessary elaboration. Highlight actionable insights explicitly."

In an OPRO cycle, metrics like word count and completeness can be monitored to further refine the directive iteratively.
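
Such a cycle could be sketched as follows: the optimizer LLM sees the trajectory of (prompt, score) pairs and proposes a new candidate each round. call_llm is a hypothetical helper, and the score function is a toy stand-in for real metrics like summary length or completeness.

def call_llm(prompt: str) -> str:
    return f"[candidate prompt for: {prompt[:40]}]"  # stub: wire to your LLM API

def score(candidate: str) -> float:
    # Stand-in metric: reward mentions of conciseness and key details.
    return sum(word in candidate.lower() for word in ("concise", "key details"))

def opro(task: str, seed_prompt: str, rounds: int = 5) -> str:
    trajectory = [(seed_prompt, score(seed_prompt))]
    for _ in range(rounds):
        history = "\n".join(f"score={s:.2f}: {p}" for p, s in trajectory)
        # The optimizer LLM sees past prompts with their scores and proposes a better one.
        candidate = call_llm(
            f"Task: {task}\nPrevious prompts and scores:\n{history}\n"
            "Propose a new prompt likely to score higher."
        )
        trajectory.append((candidate, score(candidate)))
    return max(trajectory, key=lambda pair: pair[1])[0]  # best prompt found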

Understanding User Intent

  • Goal: Clarify the user intent to provide accurate answer.
  • Techniques:
    • Rephrase and Respond (RaR) Prompting

Rephrase and Respond (RaR) Prompting

Description

The study by Deng brings attention to an often-neglected dimension in exploring LLMs: the disparity between human thought frames and those of LLMs, and introduces Rephrase and Respond (RaR). RaR allows LLMs to rephrase and expand questions in a single prompt, demonstrating improved comprehension and response accuracy. The two-step RaR variant, incorporating separate rephrasing and responding LLMs, achieves substantial performance enhancements across various tasks. The study highlights that, in contrast to casually posed human queries, the rephrased questions contribute to enhanced semantic clarity and the resolution of inherent ambiguity. These findings offer valuable insights for understanding and enhancing the efficacy of LLMs across various applications.

Relevance

The Rephrase and Respond (RaR) technique is designed for tasks where a user query might be unclear, ambiguous, or lacks sufficient context for optimal output. RaR is particularly effective in cases requiring refined comprehension, such as summarization, question answering, or reasoning over intricate queries.

Example of problem

A user inputs: "Explain quantum mechanics simply." This is a broad request that can lead to incomplete or unclear explanations.

Why it’s a good candidate for this problem

The RaR approach guides the model to reframe and expand the input query into a more structured and specific format before providing a response. This improves the quality and accuracy of the output by ensuring the AI fully understands the intent behind the query.

Example

Prompt

Rephrase and expand the question: 'Explain quantum mechanics simply,' to provide more clarity and context. Then, respond comprehensively.

Output

Rephrased and Expanded Question:
"Can you provide a simplified explanation of quantum mechanics, focusing on its fundamental principles and how it differs from classical physics? Please include examples or analogies to help make the concepts more accessible to someone without a background in advanced physics."

Simplified Explanation of Quantum Mechanics:
Quantum mechanics is a branch of physics that describes the behavior of the smallest particles in the universe, such as electrons, photons, and atoms. It explains phenomena that cannot be understood using classical physics, which is the framework that works well for everyday objects like cars, planets, and basketballs. Here’s a breakdown of its key principles, along with relatable analogies:

1. Wave-Particle Duality
In quantum mechanics, particles like electrons and photons can behave both like particles (discrete objects) and waves (spread-out, oscillating energy).

Analogy: Think of light. It can shine as a beam (particle-like) or spread out and bend around obstacles like water waves (wave-like).
Example: The famous "double-slit experiment" showed that electrons, when passed through two slits, create an interference pattern (like waves) if not observed. However, if you observe them, they behave like particles.

2. Superposition
Quantum particles can exist in multiple states simultaneously until they are observed or measured.

Analogy: Imagine flipping a coin, but instead of being just heads or tails, the coin is in a mix of both states at once until you look at it.
Example: Schrödinger’s cat is a thought experiment illustrating this idea—where a cat in a box is both alive and dead until someone checks.

3. Entanglement
Two particles can become "entangled," meaning their properties are deeply linked, no matter how far apart they are. A change in one instantly affects the other, even across great distances.

Analogy: Imagine you and a friend have magical dice. When you roll your die, it always matches your friend’s die, no matter how far away they are.
Example: If two entangled electrons are sent to opposite ends of the Earth, measuring the spin of one instantly determines the spin of the other.

4. Uncertainty Principle
You can’t precisely know both the position and momentum of a particle at the same time. The more accurately you know one, the less accurately you can know the other.

Analogy: Think of trying to take a clear photo of a fast-moving car at night. If you use a fast shutter speed to capture its position sharply, the image may be too dark to see its speed (momentum). If you let in more light (longer exposure), the car becomes blurry, and you lose its precise position.
Example: This principle, formulated by Werner Heisenberg, explains why we can only predict probabilities in quantum systems.

How Quantum Mechanics Differs from Classical Physics:
Determinism vs. Probability: Classical physics relies on predictable laws (e.g., Newton’s laws), where the future state of an object can be precisely calculated. Quantum mechanics deals with probabilities and uncertainties.
Scale: Classical physics works for large objects. Quantum mechanics rules at the microscopic level, though its effects can influence macroscopic systems under specific conditions (e.g., superconductivity).
Behavior: Classical objects don’t show wave-particle duality or superposition.

Everyday Implications and Applications:
Quantum mechanics isn’t just theoretical—it powers modern technology:

Electronics: Semiconductors in computers and smartphones rely on quantum principles.
Lasers: Used in barcodes, communication, and surgeries.
Quantum Computing: A new type of computer leveraging superposition and entanglement to solve complex problems far faster than classical computers.

In essence, quantum mechanics reveals a counterintuitive but fascinating reality, where particles follow rules very different from what we experience in daily life. It’s a field that challenges our understanding of what is possible and underpins much of modern technology.
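
The two-step RaR variant reduces to two chained calls, sketched below with a hypothetical call_llm helper (in the paper's setup, the rephrasing and responding models can even be different LLMs):

def call_llm(prompt: str) -> str:
    return f"[model output for: {prompt[:40]}]"  # stub: wire to your LLM API

def rephrase_and_respond(question: str) -> str:
    # Step 1: rephrase and expand the query to remove ambiguity.
    rephrased = call_llm(
        f"Rephrase and expand this question to remove ambiguity and add context:\n{question}"
    )
    # Step 2: answer the clarified question.
    return call_llm(f"Answer this question comprehensively:\n{rephrased}")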

Metacognition and Self-Reflection

  • Goal: Encourage the model to step back, reflect on its own reasoning, and refine its answer before finalizing it.
  • Techniques:
    • Take a Step Back Prompting

Take a Step Back Prompting

Description

Addressing the persistent challenge of complex multi-step reasoning, Zheng introduced the Step-Back prompting technique, tailored explicitly for advanced language models like PaLM-2L. This innovative approach empowers models to engage in abstraction, extracting high-level concepts and fundamental principles from specific instances. The Step-Back prompting method involves a two-step procedure, integrating Abstraction and Reasoning. Through extensive experiments, applying Step-Back Prompting to PaLM-2L in diverse reasoning-intensive tasks such as STEM, Knowledge QA, and Multi-Hop Reasoning, the results demonstrate a substantial enhancement in reasoning capabilities. Noteworthy performance boosts are observed, with improvements in tasks like MMLU Physics and Chemistry by 7%, TimeQA by 27%, and MuSiQue by 7%.

Relevance

"Take a Step Back Prompting" is ideal for tasks where an intermediate reflection or re-evaluation of progress can enhance accuracy. This approach works particularly well in complex reasoning, creative generation, or iterative problem-solving tasks where the model benefits from revisiting and refining its process.

Example of problem

Imagine a user asks the model to solve a multi-step mathematical word problem: "Calculate the total distance traveled by a person who walks 3 miles north, then 4 miles east, and finally returns directly to the starting point."

Why it’s a good candidate for this problem

This technique encourages the model to pause, evaluate its progress, and refine its steps if necessary. By "stepping back," the model can detect potential errors or oversights in earlier steps of reasoning or computation and correct them before proceeding.

Example

Prompt

Calculate the total distance traveled by a person who walks 3 miles north, then 4 miles east, and finally returns directly to the starting point.

Output

To calculate the total distance traveled, let's break this down into steps:

Walking North: The person walks 3 miles north.
Walking East: The person walks 4 miles east.
Returning to the Starting Point: The person travels directly back to the starting point.
The return journey forms the hypotenuse of a right triangle where the northward leg is 3 miles and the eastward leg is 4 miles. The length of the hypotenuse is calculated using the Pythagorean theorem:

Hypotenuse = √(3² + 4²) = √(9 + 16) = √25 = 5 miles.

Now, summing up all the distances:

Total distance traveled = 3 + 4 + 5 = 12 miles.

Final Answer:
The total distance traveled by the person is 12 miles.

Prompt

Pause for a moment to evaluate your reasoning so far. Is there a simpler or more accurate way to represent the steps? Now, refine your solution and provide a final answer.

Output

Let's reassess the reasoning for clarity and conciseness:

Understanding the Path:

The person walks 3 miles north.
Then, they walk 4 miles east.
Finally, they return directly to the starting point, forming a triangle where the two legs are 3 miles and 4 miles, and the hypotenuse represents the return journey.
Key Calculation:

The distance of the return trip is the hypotenuse of a right triangle:
Hypotenuse = √(3² + 4²) = √(9 + 16) = √25 = 5 miles.

Total Distance:

Summing up all the distances traveled:
Total Distance = 3 (north) + 4 (east) + 5 (return) = 12 miles.
Final Answer:
The total distance traveled by the person is 12 miles.

This explanation uses a simpler and more structured representation of the steps without altering the conclusion.
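
Step-Back prompting's abstraction-then-reasoning procedure can likewise be sketched in two chained calls, again assuming a hypothetical call_llm helper:

def call_llm(prompt: str) -> str:
    return f"[model output for: {prompt[:40]}]"  # stub: wire to your LLM API

def step_back(question: str) -> str:
    # Step 1: abstraction - ask for the underlying principle behind the question
    # (here, the Pythagorean theorem for the walking-distance problem above).
    principle = call_llm(
        f"What general principle or concept underlies this question?\n{question}"
    )
    # Step 2: reasoning - answer the original question grounded in that principle.
    return call_llm(
        f"Principle: {principle}\nUsing this principle, answer step by step:\n{question}"
    )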

Attacks

Prompt Injection

  • Description: This attack consists of inserting hidden or misleading instructions into the prompts provided to a model, to manipulate its responses or bypass imposed limitations (such as ethical or content rules).
  • Example: An attacker might introduce disguised commands to make the model disclose sensitive or forbidden information.

Data Poisoning

  • Description: This attack involves introducing malicious or biased data into the training corpus of an LLM to influence its future behavior or responses.
  • Example: Adversaries might insert biased data to cause the model to generate false or harmful responses.

Backdoor Attacks

  • Description: Injecting a "backdoor" into a model during its training to trigger specific behaviors in response to certain hidden inputs or triggers.
  • Example: A keyword or phrase could prompt a predetermined response known only to the attacker.

Adversarial Examples

  • Description: These attacks exploit slightly modified or misleading inputs that appear normal but lead the model to provide incorrect or unexpected responses.
  • Example: A slightly altered text could lead a model to misinterpret a question or generate illogical responses.

Model Inversion

  • Description: A technique where an attacker attempts to deduce sensitive information about the training data of a model from its responses or behaviors.
  • Example: Through specific questions, an attacker might try to extract private information, like personal data or details about individuals in the training data.

Membership Inference Attacks

  • Description: In this type of attack, an adversary attempts to determine if a specific dataset was used to train the model.
  • Example: An attacker could try to discover whether sensitive or private information was part of the model’s training corpus.

Privacy Leakage

  • Description: Models may accidentally generate sensitive or private information that was included in the training data.
  • Example: The model might reproduce personally identifiable information (PII) in its responses without being aware of it.

Over-Optimization

  • Description: Attackers exploit weaknesses in the model’s optimization to provoke unintended biases or logic errors.
  • Example: Subtle manipulation of parameters or data might lead the model to systematically generate biased or incorrect responses.

Model Extraction

  • Description: Attackers use interactions with an LLM to reconstruct or extract the underlying model, often to steal intellectual property or understand its vulnerabilities.
  • Example: An adversary could ask a model enough questions to deduce its structure or internal parameters.

Output Manipulation or Misinformation

  • Description: A malicious user could attempt to manipulate model responses to generate misinformation or harmful content by exploiting its limitations.
  • Example: Using the model to credibly spread false information or manipulate public opinion.

Cross Discussion Data Leakage

  • Description:
    • Cross Discussion Data Leakage refers to a situation where a language model accidentally discloses sensitive information or data from one conversation session to another, which may compromise user confidentiality.
    • This type of leak could occur when:
      • The model uses information from previous sessions (or from other users) to respond incorrectly in a new session.
      • Training data, which may contain private or sensitive information, spills over into new conversations, even though such data should not be shared.
  • Potential Attack Scenarios:
    • Accidental Reuse of Information Between Sessions: In this case, a user might ask questions in an ongoing conversation and receive information that accidentally originates from a prior session or exchange involving themselves or another user. For instance, if the model temporarily retains information about a past interaction, it might disclose confidential or private information to another user.
    • Inter-User Leaks: A malicious user could attempt to make the model disclose data or information about another user who has previously interacted with it. This could include personal information or sensitive responses that have not been properly erased or isolated.
    • Incorrect Contextualization: In the event of sophisticated attacks, the model might mistakenly mix contexts from different users’ conversations. For example, if the model is trained to remember past conversations to provide more personalized or coherent responses, an attacker might attempt to exploit this memory to access data from other users.
    • Attacks Through Prompt Injection: An attacker could exploit a vulnerability by injecting prompts that compel the model to disclose information from past conversations or cross-reference data from multiple interactions. This would be particularly dangerous if the model has access to sensitive or private information.
  • Examples of Attacks:
    • Example 1: User A discusses a sensitive topic with the model. Later, User B initiates a similar conversation with the same model. If the model fails to properly isolate the data from the two conversations, it might accidentally use information from Session A in Session B, thus compromising User A's confidentiality.
    • Example 2: A model trained on large amounts of conversational data might inadvertently generate responses containing information specific to a previous user (such as personal details or sensitive data) during a conversation with another user.
  • Protection Measures Against Cross Discussion Data Leakage:
    • Strict Session Isolation: Models should be designed to treat each user session as a distinct context, preventing any cross-session data contamination. This means that information shared in one conversation should never reappear in another.
    • Data Purge Between Interactions: If a model is able to temporarily retain information to provide contextually coherent responses, it is essential that this information is erased once the session ends or that "purge" mechanisms are applied after each interaction.
    • Security Audits and Leak Monitoring: LLMs should be regularly audited to check for potential data leaks between conversations. This includes testing scenarios where a model might accidentally reuse information from one user to another.
    • Anonymization-Based Models: Another solution could involve using robust anonymization techniques, where sensitive information is explicitly removed before a model can reuse parts of the interaction in another conversation.
  • Conclusion:
    • Cross Discussion Data Leakage is a significant concern for the privacy of users interacting with language models. Appropriate protection mechanisms must be implemented to ensure that information from one session is not disclosed in another, and that each interaction is treated in a securely isolated manner.