4 December 2024, by Milena Fluck and Andy Schmidt
Prompt Engineering – A Cognitive Approach
A prompt is a cue or instruction that triggers a response or action and is often used to encourage writing, thinking or speaking. It can be very straightforward (e.g. ‘List all the main characters in this book’), require additional information (e.g. ‘What ended in 1964?’) or be open-ended (e.g. ‘Why do you think people want to live near power plants?’).
In (school) teaching, and especially in didactics courses, writing effective prompts is part of the curriculum. Among other things, good prompts are crucial for teachers to stimulate students' thinking, foster their creativity and enable in-depth learning. Effective prompts help to guide responses, set clear expectations and open up opportunities for critical reflection or exploration.
When formulating a prompt, teachers are often advised to:
1. Provide the context, definitions or background information needed for a full understanding of the prompt.
2. Formulate the prompt clearly and unambiguously. State clearly what is expected and use precise language.
3. Specify the scope by indicating the expected length or depth of the answer. This helps learners to assess how detailed their answer should be.
4. Break down complex prompts into smaller, more manageable parts or guide learners through the answer step by step.
5. Test the prompts with a small group of students or colleagues to see how they interpret and respond to them. The feedback can then be used to improve clarity, complexity and engagement.
Prompt engineers reading these lines might be thinking: What? That's exactly what I do when I'm trying to get a precise response from a language model. Yes, there does indeed appear to be overlap between writing prompts for humans and for artificial agents. Well-designed prompts can significantly influence the performance of both language models (LMs) and humans.
What is prompt engineering?
A prompt here is a piece of text input provided to a language model to elicit a response; it supplies the context or direction for the model's output. Prompts are therefore a viable strategy for quickly adapting language models to new domains and tasks. Prompt engineering is the process of developing and refining these prompts – in 2021, Laria Reynolds and Kyle McDonell even described it as ‘programming in natural language’. The design and content of a prompt can be created manually, as a custom prompt, or automatically. In general, the goal of a prompt engineer is to create prompts that lead the language model to produce useful, relevant and accurate results for the task at hand.
What kind of tasks are we talking about?
We are referring here to prompt engineering in the context of natural language tasks. Language plays a fundamental role in communication between humans and in their interaction with machines. There is therefore a growing need to develop language models that can perform complex natural language tasks. LMs are computer programmes that can process and generate text; the associated field is known as Natural Language Processing (NLP). Typical NLP tasks include:
- Text classification: assigning text to categories, for example by sentiment (positive, negative, etc.), topic (e.g. sports, politics) or spam (yes or no).
- Named Entity Recognition (NER): identifying and classifying key entities in the text, such as brands, places, characters and so on.
- Part-of-Speech Tagging: assigning parts of speech (such as noun, verb, adjective) to each word in a sentence.
- Translation: translating text from one language to another (such as from Latin to English).
- Text summarisation: creating a summary by extracting or abstracting important information from a longer text.
- Text generation: creating human-sounding texts from a given input, such as completing sentences, writing newspaper articles or fantasy stories.
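To make these tasks concrete, here is a minimal sketch of how some of them can be phrased as plain-text prompts in Python. The `complete` helper is a hypothetical stand-in for whatever language model call you use; only the prompt strings themselves are the point.

```python
# Minimal sketch: phrasing some of the NLP tasks above as plain-text prompts.
# `complete` is a hypothetical placeholder for any language model call.

def complete(prompt: str) -> str:
    # Placeholder: in practice, send the prompt to your LM provider here.
    return f"<model response to: {prompt[:50]}...>"

review = "The battery dies before lunch, but the screen is gorgeous."

# Text classification by sentiment
print(complete(f"Classify the sentiment of this review as positive, negative or mixed: {review}"))

# Named Entity Recognition
print(complete(f"List all brands, places and characters mentioned in this text: {review}"))

# Text summarisation
print(complete(f"Summarise this review in one sentence: {review}"))
```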
We encounter these tasks every day, whether at university, at work or in research. At school, teachers often gave us very specific tasks and an expected outcome. As soon as we are on our own, we are either prompted by our colleagues, our boss or ourselves. In the latter case, we often have to collect ideas, contextual information, define a goal and perhaps search for examples ourselves. This can be very labour-intensive. Consider how many requirements are necessary for a good user story before it can be successfully implemented. The quality of the prompt has a major influence on how well we solve the associated task.
How do you design a good prompt?
While teachers learn how to design good prompts in didactics courses, general and helpful guidelines for writing prompts for machines can be found on the websites of AI providers, in general prompting guides such as www.promptingguide.ai, or in scientific research. We took the latter route. In doing so, we found that some prompts produce more precise results than others, but also that there are seemingly uncontrollable factors. Powerful prompts do not work equally well in all language models: each model (including GPT-4 or LLaMa) seems to be an agent with specific characteristics and needs. Even the order of the semantic building blocks can influence the result.
Is designing a prompt for LMs the same as for humans?
In 2021, Reynolds and McDonell evaluated prompts from the perspective of natural language and how they are formulated. When designing a prompt, the same considerations of tone, implication, plausibility, style and ambiguity must be taken into account as for human addressees, since language models such as GPT-3 are trained on the basis of natural language. The various components that a prompt can contain are summarised at www.promptingguide.ai.
Structure and types of prompts
You will notice that you have already seen most of these components in assignments at school or even at work:

- Instruction: a clear and concise command that tells the language model what to do, for example ‘Calculate how many chickens and donkeys live on the farm’.
- Context: additional background information or details, for example ‘Chickens and donkeys live on the farm. Alma counts 245 heads and 144 legs.’
- Practice examples: specific input-output pairs that illustrate the desired result – this could be an example from Heidi, who previously counted goats and marmots on her own farm. There has been a lot of research recently on the order, number and fit of such examples, especially in the context of in-context learning. A prompt without practice examples is often referred to as a zero-shot prompt, one with a single example as a one-shot prompt and one with several examples as a few-shot prompt.
- Instructional prompts: prompts that work well without examples because a clear instruction provides a functional keyword, for example ‘Translate this text into Dutch’ (keyword: translate) or ‘Sort animals by average size’ (keyword: sort).
- Specifications or task summaries: detailed and descriptive prompts that guide the language model more precisely, e.g. ‘Create a job ad and make sure the tone is formal’.
- Separators: markers used to clearly separate individual sections in a prompt, e.g. ‘### context ###’.
- Output indicator: the expected format or structure of the response, e.g. ‘Output as chickens: [number of chickens], donkeys: [number of donkeys]’.
- Target group: the audience for whom the response is intended, for example ‘What is cloud computing? Explain it as if I were five years old.’
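Putting this together, a single few-shot prompt with separators and an output indicator might be assembled as in the sketch below. The worked practice example (hens and goats) is our own invention, chosen so that the arithmetic is consistent; everything else follows the article's examples.

```python
# Sketch: assembling the components above into one prompt string.
# The practice example (hens and goats) is invented for illustration and
# chosen so the numbers add up: 3 hens + 2 goats = 5 heads, 6 + 8 = 14 legs.

prompt = "\n".join([
    "### instruction ###",
    "Calculate how many chickens and donkeys live on the farm.",
    "### context ###",
    "Chickens and donkeys live on the farm. Alma counts 245 heads and 144 legs.",
    "### example ###",
    "Input: Heidi counts 5 heads and 14 legs among her hens and goats.",
    "Output: hens: 3, goats: 2",
    "### output indicator ###",
    "Output as chickens: [number of chickens], donkeys: [number of donkeys]",
])
print(prompt)
```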
The role of token limits
Prompts can only contain a limited number of tokens. A token can be a complete word, part of a word or even a single character; 100 tokens correspond to roughly 75 English words. Depending on the GPT model, up to 128,000 tokens can be shared between the prompt and the corresponding answer, so careful consideration of every character within a prompt may be necessary.

For humans, the size of the prompt also plays a role. Humans, like machines, have a limited working memory and can only process a certain number of words at a time. Anyone who works in an agile environment is familiar with those stories on a pinboard that seem to have no end and hide a series of unidentifiable subtasks in a jumble of text. If tasks are not connected, we advise separating them. Breaking a complex task into several steps, for example through step-by-step processes such as debating, planning or sequential reasoning, can in turn lead to more efficient use of our ‘analogue’ working memory. How exactly you divide a prompt into steps depends on how much the next steps build on the context of the previous ones. The maximum number of tokens does not have to be used in a single input: in ChatGPT, for example, you can enter text and press the send button several times, and within a discussion on one topic this still counts as one prompt. As soon as the maximum number of tokens is reached, however, the earliest tokens fall out of the context window and no longer have any influence on the ongoing conversation.
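As a sketch, such token budgets can be checked programmatically before a prompt is sent. The example below uses OpenAI's open-source tiktoken tokeniser; the model name and the 128,000-token limit are assumptions that depend on the model you actually use, and other providers ship their own counters.

```python
# Sketch: counting prompt tokens against an assumed context-window budget.
# tiktoken is OpenAI's open-source tokeniser; other providers have their own.
import tiktoken

MAX_TOKENS = 128_000  # assumed combined budget for prompt plus answer

enc = tiktoken.encoding_for_model("gpt-4")  # pick the encoding for your model
prompt = "Translate this text into Dutch: Prompts can only contain a limited number of tokens."
used = len(enc.encode(prompt))

print(f"{used} tokens used by the prompt, {MAX_TOKENS - used} left for the answer")
```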
Metaprompt programming: More efficiency through self-generated prompts
In addition, Reynolds and McDonell present metaprompt programming, in which language models independently generate further useful prompts for solving the task at hand with the help of so-called metaprompts. Metaprompts can allow language models to solve problems effectively using their own task-specific instructions, without the need for training examples – a process that also consumes fewer tokens. Examples of a metaprompt would be: ‘Let's solve the problem by breaking it down into several steps,’ ‘List the pros and cons before making a decision,’ or ‘Ask questions about the topic before trying to answer the question.’
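A minimal sketch of what this can look like in practice, assuming a generic `complete` helper as before: the metaprompt is simply prepended to the task, so the model lays out its own intermediate steps instead of us supplying examples.

```python
# Sketch: wrapping a task in a metaprompt instead of providing examples.
# `complete` would again be a hypothetical placeholder for an LM call.

METAPROMPT = "Let's solve the problem by breaking it down into several steps."

def with_metaprompt(task: str) -> str:
    # The metaprompt comes first, nudging the model to generate its own
    # task-specific instructions before answering.
    return f"{METAPROMPT}\n\nTask: {task}"

print(with_metaprompt("How many chickens and donkeys live on Alma's farm?"))
```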
Self-review and reflection for improved results
Another typical approach, which we probably all remember from the last few minutes before handing in an exam, is verification mechanisms: we carefully review our answers, reflect on them, add to them and adjust them. Among others, Chung Wang and co-authors demonstrated in 2023 that performance improves when self-review methods are used, in which the learner has to reflect on their answers afterwards. Imagine that the language model has extracted OASIS as a sight in London because apparently everyone has been going there recently and buying tickets. One might wonder whether OASIS is really a sight. On closer inspection, the language model might realise that OASIS is more of a band. Another example of reflection before the final output is the ‘Take A Deep Breath’ strategy used in prompting. It consists of encouraging an LM to produce a more detailed or well-thought-out response by explicitly asking it to ‘take a deep breath’ before replying, thus prompting it to think or reflect. This technique aims to improve the quality and consistency of the language model's output by simulating a moment of reflection that leads to more precise and comprehensive answers.
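A self-review loop of this kind can be sketched in a few lines. Again, `complete` stands in for any language model call, and the draft/critique/revision structure is one simple way to realise the idea, not the exact method from the cited work.

```python
# Sketch: draft -> self-review -> revision, using a hypothetical LM call.

def complete(prompt: str) -> str:
    # Placeholder: in practice, send the prompt to your LM provider here.
    return f"<model response to: {prompt[:50]}...>"

def answer_with_review(question: str) -> str:
    draft = complete(question)
    critique = complete(
        f"Question: {question}\nAnswer: {draft}\n"
        "Review this answer. Is anything wrong, missing or implausible?"
    )
    return complete(
        f"Question: {question}\nAnswer: {draft}\nReview: {critique}\n"
        "Rewrite the answer so that every issue raised in the review is fixed."
    )

print(answer_with_review("Name three sights in London."))
```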
Analogy and change of perspective in prompt design
Furthermore, we use analogies in human communication by employing memetic concepts such as well-known characters as proxies for an intention – for example: what would your grandma think if she knew about it? Each person will give us an individual answer to the same question based on their subjective experiences, opinions and thoughts. For this reason, we collect answers from different people to the same question or try to take different perspectives to find a well-thought-out answer. According to Reynolds and McDonell, GPT-3 is able to simulate famous personalities such as Mahatma Gandhi or Margaret Atwood, which provides access to a variety of perspectives, biases and cultural knowledge on, for example, moral issues. Tip: just tell ChatGPT that it is Barney Stinson and ask it for dating tips.
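With chat-style APIs, such a change of perspective is often expressed as a system message. The sketch below only shows the message structure, as an assumption about how you might wire it up; pass the list to whichever chat-completion endpoint you use.

```python
# Sketch: a persona prompt expressed as chat messages. The system message
# sets the perspective; pass the list to any chat-completion endpoint.

persona_messages = [
    {"role": "system", "content": "You are Barney Stinson. Answer strictly in character."},
    {"role": "user", "content": "What are your three best dating tips?"},
]

for message in persona_messages:
    print(f"{message['role']}: {message['content']}")
```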
Conclusion
Designing precise, unambiguous, and efficient prompts is not easy. Designing prompts for artificial agents that push them to peak performance is at least as difficult as designing prompts for humans. In principle, educators could be considered highly trained ‘natural language programmers’. Although machines are not in school like students, there are important similarities where research into how to write good prompts in education, communication and cognitive studies can, in our opinion, guide the study of prompt engineering.
Would you like to learn more about exciting topics from the adesso world? Then take a look at our previously published blog posts.