Project Introduction

The Chinese version of the course ChatGPT Prompt Engineering for Developers by Wu Enda, the main content of which is to guide developers how to build Prompt and build new LLM based applications based on OpenAI API, including the principles of writing Prompt; Text summary (such as summarizing user comments); Text inference (such as emotion classification, topic extraction); Text conversion (such as translation and automatic error correction); Extension (such as writing email);

 Chinese notes of ChatGPT Prompt Engineering for Developers, I

Project significance

LLM is gradually changing people's lives. For developers, how to quickly and conveniently develop some applications with stronger capabilities and integrate LLM based on the API provided by LLM to easily realize some more novel and practical capabilities is an important ability that needs to be learned. The ChatGPT Prompt Engineering for Developers tutorial, jointly launched by Mr. Wu Enda and OpenAI, is a classic tutorial for beginner LLM developers. It introduces in simple terms how to construct a Prompt for developers and implement multiple common functions based on the API provided by OpenAI, including summary, inference, conversion, etc. It is a classic tutorial for beginner LLM development. Therefore, we translated the course into Chinese, reproduced its sample code, and added Chinese subtitles to the original video to support domestic Chinese learners to use it directly, so as to help Chinese learners better learn LLM development.

Project Audience

It is applicable to all developers who have basic Python capabilities and want to start LLM.

Project highlights

ChatGPT Prompt Engineering for Developers, as an official course jointly launched by Mr. Wu Enda and OpenAI, will become an important introductory course for LLM in the foreseeable future. However, at present, only the English version is supported and domestic access is limited. It is of great significance to create a Chinese version of the course with smooth domestic access.

content syllabus

catalog:

  1. Introduction
  2. Guidelines for Compiling Prompt
  3. Iterative prompt development Prompt Iterative
  4. Text Summary
  5. Text Inferring
  6. Text Transforming
  7. Text Expanding
  8. Chat Chatbot
  9. summary

Chapter I Introduction

Welcome to this course. We will introduce the ChatGPT prompt project to developers. This course is taught by Professor Isa Fulford and me. Isa Fulford is a member of OpenAI's technical team. He has developed the popular ChatGPT retrieval plug-in, and has made great contributions to teaching people how to use LLM or LLM technology in products. She was also involved in writing the OpenAI cookbook that teaches people to use Prompt.

There are many materials about tips on the Internet, such as articles such as "30 prompts every one has to know". These articles mainly focus on the ChatGPT Web user interface, which many people are using to perform specific, usually one-time tasks. However, I think the more powerful function of LLM or large language model as a developer is to use API to call LLM to quickly build software applications. I don't think this aspect has received sufficient attention. In fact, we are in DeepLearning The team of AI Fund, a sister company of AI, has been working with many start-ups to apply these technologies to many different applications. It's exciting to see that LLM API enables developers to build applications very quickly.

In this course, we will share with you some possibilities and best practices on how to implement them.

With the development of Large Language Model (LLM), LLM can be roughly divided into two types, namely Basic LLM and command fine tuning LLM Basic LLM is a model based on text training data to train the ability to predict the next word. It is usually trained on a large amount of data from the Internet and other sources. For example, if you take "there was a unicorn before" as a hint, the basic LLM may continue to predict "living in a magical forest with all unicorn friends". However, if you take "what is the capital of France" as a hint, the basic LLM may predict the answer as "what is the largest city in France? What is the population of France?" according to the articles on the Internet, because the articles on the Internet are probably a list of questions and answers about France.

The research and practice of many LLMs are driven by the command adjusted LLMs. Instruction adjusted LLMs have been trained to follow instructions. Therefore, if you ask it, "What is the capital of France?", it is more likely to export "Paris is the capital of France". The training of instruction adjusted LLMs usually starts from the basic LLMs that have been trained. This model has been trained on a large number of text data. Then, use the data set whose input is instructions and output is the result it should return to fine tune it, and ask it to follow these instructions. Then, a technology called RLHF (reinforcement learning from human feedback) is usually used for further improvement to enable the system to follow instructions more helpfully.

Because instruction adjusted LLMs have been trained to be beneficial, honest and harmless, they are less likely to output problematic text, such as harmful output, than basic LLMs. Many actual use scenarios have turned to LLMs for command adjustment. Some of the best practices you find on the Internet may be more applicable to basic LLMs, but for most practical applications today, we recommend focusing on the LLMs for instruction adjustment. These LLMs are easier to use, and because of the work of OpenAI and other LLMs, they become more secure and coordinated.

Therefore, this course will focus on the best practices of LLM tuning for instructions, which we recommend for most applications. Before continuing, I would like to thank OpenAI and DeepLearning.ai for their contributions to Izzy and the materials I have provided. I am very grateful to Andrew Main, Joe Palermo, Boris Power, Ted Sanders and Lillian Weng of OpenAI who participated in the development and review of our brainstorming materials and prepared the curriculum for this short-term course. I also appreciate the work of Geoff Ladwig, Eddy Shyu and Tommy Nelson in Deep Learning.

When you use instructions to adjust LLM, it is similar to considering providing instructions to another person, assuming that it is a smart person who does not know the specific details of your task. When LLM fails to work normally, it is sometimes because the instructions are not clear enough. For example, if you say "Please write something about Alan Turing for me", it clearly indicates that you hope that the text will be more helpful if it focuses on his scientific work, personal life, historical role or other aspects. More importantly, you can also specify that the text adopt the tone of writing like a professional reporter, or more like an essay you write to a friend.

Of course, if you imagine that a newly graduated college student will complete this task for you, you can even specify in advance which text fragments they should read to write the text about Alan Turing, which will help the newly graduated college student to successfully complete this task. In the next chapter, you will see how to make prompt clear, an important principle for creating prompt. You will also learn from the second principle of prompt to give LLM time to think.

Chapter II Principles of Compiling Prompt

The main content of this chapter is the principles of writing Prompt. In this chapter, we will give two principles of writing Prompt and some related strategies. You will practice writing effective Prompt based on these two principles, so as to use LLM conveniently and effectively.

1、 Environment configuration

This tutorial uses the ChatGPT API opened by OpenAI, so you need to first have a ChatGPT API_KEY (you can also directly visit the official website for online testing), and then you need to install the third-party library of OpenAI

First, you need to install the required third-party libraries:

openai:

 pip install openai

dotenv:

 pip install -U python-dotenv
 #Import your API-KEY into the system environment variable ! export OPENAI_API_KEY='api-key'
 import openai import os from dotenv import load_dotenv, find_dotenv #Import third-party library _ = load_dotenv(find_dotenv()) #Reading environment variables in the system openai.api_key  = os.getenv('OPENAI_API_KEY') #Set API_KEY

We will further explore the use of the ChatCompletion API provided by OpenAI in subsequent courses. Here, we first encapsulate it into a function. You don't need to know its internal mechanism, just know that calling the function and entering Prompt will give you the corresponding Completion.

 #A function that encapsulates the OpenAI interface. The parameter is Prompt, and the corresponding result is returned def get_completion(prompt, model="gpt-3.5-turbo"): ''' Prompt: corresponding prompt Model: The called model is gpt-3.5-turbo (ChatGPT) by default. Users qualified for internal testing can select gpt-4 ''' messages = [{"role": "user", "content": prompt}] response = openai.ChatCompletion.create( model=model, messages=messages, Temperature=0, # temperature coefficient of model output, control the randomness of output ) #Call the ChatCompletion interface of OpenAI return response.choices[0].message["content"]

2、 Two basic principles

Principle 1: Write clear and specific instructions

You should express what you want the model to do by providing as clear and specific instructions as possible. This will guide the model to give correct output and reduce the possibility of irrelevant or incorrect responses. Writing clear instructions does not mean short instructions, because in many cases, longer prompts are actually clearer and provide more context, which may actually lead to more detailed and relevant output.

Strategy 1: Use separators to clearly represent different parts of input , the separator can be: ` , "",<>, <tag>,< tag>, etc

You can use any obvious punctuation to separate specific parts of the text from the rest of the prompt. This can be any tag that makes it clear to the model that this is a separate part. Using separators is a useful technique to avoid prompt injection. Prompt injection means that if the user adds some input to the prompt, it may provide the model with instructions that conflict with the operations you want to perform, so that it follows the conflicting instructions instead of performing the operations you want. That is, the input may contain other instructions, which will overwrite your instructions. In this regard, using separators is a good strategy.

The following is an example. We give a paragraph and ask GPT to summarize. In this example, we use ` As a separator:

 text = f""" You should provide as clear and specific instructions as possible to express the tasks you want the model to perform\ This will guide the model toward the desired output and reduce the likelihood of receiving irrelevant or incorrect responses\ Don't confuse writing clear tips with writing short ones\ In many cases, longer prompts can provide more clarity and contextual information for the model, resulting in more detailed and relevant output. """ #Text content to be summarized prompt = f""" Summarize the text enclosed in three back quotes into one sentence. ```{text}``` """ #Instruction content, use '' to separate instructions and content to be summarized response = get_completion(prompt) print(response)
 Provide clear and specific instructions, avoid irrelevant or incorrect responses, and do not confuse clear and short writing. Longer prompts can provide more clarity and contextual information, leading to more detailed and relevant output.

Strategy 2: Require a structured output , can be Json, HTML, etc

The second strategy is to generate a structured output, which can make the output of the model easier to parse. For example, you can read it into a dictionary or list in Python..

In the following example, we require GPT to generate the title, author and category of three books, and require GPT to return them to us in Json format. To facilitate parsing, we specify the key of Json.

 prompt = f""" Please generate a list of three fictional books, including the title, author and category\ It is provided in JSON format, including the following keys: book_id title、author、genre。 """ response = get_completion(prompt) print(response)
 { "books": [ { "book_id": 1, "title": "The Shadow of the Wind", "author": "Carlos Ruiz Zafón", "genre": "Mystery" }, { "book_id": 2, "title": "The Name of the Wind", "author": "Patrick Rothfuss", "genre": "Fantasy" }, { "book_id": 3, "title": "The Hitchhiker's Guide to the Galaxy", "author": "Douglas Adams", "genre": "Science Fiction" } ] }

Strategy 3: Require the model to check whether it meets the conditions

If the assumptions made by the task are not necessarily satisfied, we can tell the model to check these assumptions first, and if not, instruct and stop execution. You can also consider potential edge situations and how the model should handle them to avoid unexpected errors or results.

In the following example, we will give the model two pieces of text, namely, the steps of making tea and a text without specific steps. We will ask the model to judge whether it contains a series of instructions. If yes, we will rewrite the instructions in the given format. If not, we will answer that no steps are provided.

 #Stepped text text_1 = f""" It's easy to make a cup of tea. First, we need to boil the water\ While waiting, take a cup and put the tea bag in\ Once the water is hot enough, pour it on the tea bag\ Wait for a while and let the tea soak. After a few minutes, take out the tea bag\ If you like, you can add some sugar or milk to taste\ In this way, you can enjoy a cup of delicious tea. """ prompt = f""" You will get the text enclosed in three quotation marks\ If it contains a series of instructions, these instructions need to be rewritten in the following format: Step 1 - Step 2 -  Step N - If the text does not contain a series of instructions, write "No steps provided" directly. " \"\"\"{text_1}\"\"\" """ response = get_completion(prompt) Print ("Summary of Text 1:") print(response)
 Summary of Text 1: Step 1 - boil the water. Step 2 - Take a cup and put the tea bag in. Step 3 - Pour boiling water on the tea bag. Step 4 - Wait a few minutes for the tea to soak. Step 5 - Take out the tea bag. Step 6 - If you like, you can add some sugar or milk to taste. Step 7 - In this way, you can enjoy a cup of delicious tea.
 #Stepless text text_2 = f""" The sun is shining and the birds are singing today\ It's a nice day to go for a walk in the park\ The flowers are in full bloom, and the branches are swaying gently in the breeze\ People go out and enjoy the beautiful weather. Some people are having a picnic. Some people are playing games or relaxing on the grass\ This is a perfect day to spend outdoors and enjoy the beauty of nature. """ prompt = f""" You will get the text enclosed in three quotation marks\ If it contains a series of instructions, these instructions need to be rewritten in the following format: Step 1 - Step 2 -  Step N - If the text does not contain a series of instructions, write "No steps provided" directly. " \"\"\"{text_2}\"\"\" """ response = get_completion(prompt) Print ("Summary of Text 2:") print(response)
 Summary of Text 2: No steps provided.

Strategy 4: Provide a few examples

That is, before asking the model to perform actual tasks, provide it with a few examples of successful task execution.

For example, in the following example, we tell the model that its task is to answer questions in a consistent style, and first give it an example of a dialogue between a child and a grandfather. The child said, "Teach me patience," and the grandfather replied with these metaphors. Therefore, since we have told the model to answer in a consistent tone, now we say "teach me resilience". Since the model already has this small sample example, it will answer the next task in a similar tone.

 prompt = f""" Your task is to answer questions in a consistent style. <Child>: Teach me patience. <Grandparents>: The river where the deepest canyon was dug originates from an inconspicuous spring; The most magnificent symphony begins with a single note; The most complex tapestry begins with a lonely thread. <Children>: Teach me resilience. """ response = get_completion(prompt) print(response)
 <Grandparents>: Resilience is like a tree. It needs to experience wind and rain, winter and summer to grow stronger. In life, we also need to experience various setbacks and difficulties to exercise resilience. Remember, don't give up easily, stick to it, you will find yourself stronger.

Principle 2: Give the model time to think

If the model hastily draws the wrong conclusion, you should try to re conceive the query and request the model to make a series of related reasoning before providing the final answer. In other words, if you give the model a task that cannot be completed in a short time or with a small amount of text, it may guess incorrectly. This situation is the same for people. If you ask someone to complete a complex mathematical problem without having time to calculate the answer, they may also make mistakes. Therefore, in these cases, you can instruct the model to spend more time thinking about problems, which means that it spends more computing resources on tasks.

Strategy 1: Specify the steps required to complete the task

Next, we will show the effect of this strategy by giving a complex task and a series of steps to complete the task

First, we describe the story of Jack and Jill, and give a command. The instruction is to perform the following operations. First, summarize the text defined by three backquotes in one sentence. Second, translate the abstract into French. Third, list each name in the French summary. Fourth, output JSON objects containing the following keys: French summary and number of names. Then we will separate the answers with line breaks.

 text = f""" In a charming village, brother and sister Jack and Jill set out to fetch water from a mountaintop well\ They climbed up while singing happy songs\ Unfortunately, Jack stumbled over a stone and rolled down the mountain, followed by Jill\ Although they were slightly injured, they still returned to their warm home\ Despite such an accident, their spirit of adventure has not weakened and they continue to explore happily. """ # example 1 prompt_1 = f""" Do the following: 1 - Summarize the following text enclosed in three back quotes in one sentence. 2 - Translate the abstract into French. 3 - List each person's name in the French summary. 4 - Output a JSON object containing the following keys: French_summary, num_names. Please separate your answers with line breaks. Text: ```{text}``` """ response = get_completion(prompt_1) print("prompt 1:") print(response)
 prompt 1: 1 - Brother and sister had an accident while fetching water in the well at the top of the mountain, but they still kept an adventurous spirit. 2-Dans un charmant village, les frère et sœur Jack et Jill partent chercher de l'eau dans un puits au sommet de la montagne.  Malheureusement, Jack trébuche sur une pierre et tombe de la montagne, suivi de près par Jill.  Bien qu'ils soient légèrement blessés, ils retournent chez eux chaleureusement. Malgré cet accident, leur esprit d'aventure ne diminue pas et ils continuent à explorer joyeusement. 3-Jack, Jill 4-{ "French_summary": "Dans un charmant village, les frère et sœur Jack et Jill partent chercher de l'eau dans un puits au sommet de la montagne.  Malheureusement, Jack trébuche sur une pierre et tombe de la montagne, suivi de près par Jill.  Bien qu'ils soient légèrement blessés, ils retournent chez eux chaleureusement. Malgré cet accident, leur esprit d'aventure ne diminue pas et ils continuent à explorer joyeusement. ", "num_names": 2 }

There are still some problems with the above output. For example, the key "name" will be replaced by French. Therefore, we give a better Prompt, which specifies the output format

 prompt_2 = f""" 1 - Summarize the following text enclosed with<>in one sentence. 2 - Translate the abstract into English. 3 - List each name in the English summary. 4 - Output a JSON object containing the following keys: English_summary, num_names. Please use the following format: Text:<Text to summarize> Summary:<Summary> Translation:<translation of abstract> Name:<name list in English summary> Output JSON:<JSON with English_summary and num_names> Text: <{text}> """ response = get_completion(prompt_2) print("\nprompt 2:") print(response)
 prompt 2: Abstract: Brother and sister Jack and Jill took an adventure in the charming village. Unfortunately, they returned home after falling, but they were still full of adventure spirit. Translator: In a charging village, siblings Jack and Jill set out to fetch water from a mountaintop well. While climbing and singing, Jack trips on a stone and tumbles down the mountain, with Jill following closely behind. Despite some bruises, they make it back home safely. Their adventurous spirit remains undiminished as they continue to explore with joy. Name: Jack, Jill Output JSON: {"English_summary": "In a charging village, siblings Jack and Jill set out to fetch water from a mountaintop well. While climbing and singing, Jack trips on a stone and tumbles down the mountain, with Jill following closely behind. Despite some bruises, they make it back home safely. Their adventurous spirit remains undiminished as they continue to explore with joy.", "num_names": 2}

Strategy 2: guide the model to find a solution before making a conclusion

Sometimes, we will get better results when we make clear that the guidance model needs to think about solutions before making decisions.

Next, we will give a question and a student's answer, asking the model to judge whether the answer is correct

 prompt = f""" Judge whether the student's solution is correct. Question: I'm building a solar power station, and I need help with financial calculations. The cost of land is $100 per square foot I can buy solar panels for 250 dollars/square foot I have negotiated a maintenance contract, which requires a fixed annual payment of $100000 and an additional payment of $10 per square foot As a function of square feet, what is the total cost of the first year's operation. Student's solution: Let x be the size of the power station, in square feet. cost: Land cost: 100x Solar panel cost: 250x Maintenance cost: $100000+100x Total cost: 100x+250x+100000 US dollars+100x=450x+100000 US dollars """ response = get_completion(prompt) print(response)
 The student's solution is correct.

But notice that the student's solution is actually wrong.

We can solve this problem by guiding the model to find a solution.

In the next Prompt, we ask the model to solve the problem by itself first, and then compare its solution with the student's solution to determine whether the student's solution is correct. At the same time, we have given the output format requirements. By clarifying the steps, the model has more time to think, and sometimes more accurate results can be obtained. In this example, the student's answer is wrong, but if we don't let the model calculate first, we may be misled into thinking that the student is right.

 prompt = f""" Please judge whether the student's solution is correct. Please solve this problem through the following steps: Step: First, solve problems by yourself. Then compare your solution with the student's solution and evaluate whether the student's solution is correct. Do not decide whether the student's solution is correct until you complete the problem yourself. Use the following format: Question: Question Text Student's solution: student's solution text Actual Solution and Step: Actual Solution and Step Text Whether the student's solution is the same as the actual solution: Yes or No Student's grade: correct or incorrect Question: I'm building a solar power station, and I need help with financial calculations. -The cost of land is $100 per square foot -I can buy solar panels at $250 per square foot -I have negotiated a maintenance contract, which requires a fixed annual payment of $100000 and an additional payment of $10 per square foot As a function of square feet, what is the total cost of the first year's operation. Student's solution: Let x be the size of the power station, in square feet. cost: 1. Land cost: 100x 2. Solar panel cost: 250x 3. Maintenance cost: 100000+100x Total cost: 100x+250x+100000+100x=450x+100000 Actual solutions and steps: """ response = get_completion(prompt) print(response)
 Correct solutions and steps: 1. Calculate the land cost: $100/square foot * x square foot=$100x 2. Calculate the cost of solar panel: 250 dollars/square foot * x square foot=250x dollars 3. Calculate the maintenance cost: $100000+$10/square foot * x square foot=$100000+$10x 4. Calculate the total cost: 100x USD+250x USD+100000 USD+10x USD=360x+100000 USD Whether the student's solution is the same as the actual solution: No Student's score: incorrect

3、 Limitations

False knowledge : The model occasionally generates some knowledge that seems to be true but is actually fabricated

If the model is exposed to a lot of knowledge in the training process, it does not fully remember the information it sees, so it is not very clear about the boundaries of its knowledge. This means that it may try to answer questions about obscure topics and fabricate answers that sound reasonable but are not actually correct. We call these fabricated ideas illusions.

For example, in the following example, we are required to tell us the information about the AeroGlide UltraSlim Smart Toothrush product produced by Boie. In fact, the company is real, but the product is fabricated, and the model will tell us the fabricated knowledge seriously.

 prompt = f""" Tell me about AeroGlide UltraSlim Smart Toothrush produced by Boie Company """ response = get_completion(prompt) print(response)
 AeroGlide UltraSlim Smart Toothbrush produced by Boie Company is a smart toothbrush with the following characteristics: 1. Ultra thin design: the brush head is only 0.8mm thick, which can easily enter the deep mouth and clean more thoroughly. 2. Intelligent induction: The toothbrush is equipped with intelligent induction technology, which can automatically identify the position and direction of the brush head to ensure that each part is fully cleaned. 3. Efficient cleaning: The toothbrush adopts high-speed vibration technology, which can reach 40000 times per minute, and can effectively remove dental plaque and stains. 4. Easy to carry: The toothbrush adopts a portable design, which can be easily put into the pocket or travel bag for oral cleaning anytime and anywhere. 5. Environmental protection materials: toothbrushes are made of environmental protection materials, free of harmful substances and environmentally friendly. In a word, AeroGlide UltraSlim Smart Toothbrush produced by Boie Company is an efficient, intelligent and environmentally friendly toothbrush that can help users easily maintain oral health.

The model will output very real fake knowledge, which is sometimes dangerous. Therefore, be sure to use some of the techniques we introduced in this section to try to avoid this when building your own applications. This is a known weakness of the model and a problem that we are actively working to solve. When you want the model to generate answers based on the text, another strategy to reduce illusion is to first require the model to find any relevant references in the text, and then require it to use these references to answer questions. This method of tracing source documents is usually very helpful to reduce illusion.

Note: In this tutorial, we use to adapt the text to the screen size to improve the reading experience. GPT is not affected by , but when you call other large models, you need to consider whether will affect the model performance

Chapter III Iterative Prompt Development

When building applications using LLM, I never successfully used the Prompt required in the final application on the first attempt. But this is not important. As long as you have a good iterative process to constantly improve your Prompt, you can get a Prompt that is suitable for the task. I think in terms of tips, the probability of success for the first time may be higher, but as mentioned above, whether the first tip is effective is not important. The most important is the process of finding effective tips for your application.

Therefore, in this chapter, we will show some frameworks to remind you how to analyze and improve your Prompt iteratively with the example of generating marketing copy from the product specification.

If you have taken machine learning courses with me before, you may have seen a chart I used to illustrate the process of machine learning development. Usually, you have an idea first, and then implement it: write code, obtain data, train models, which will give you an experimental result. Then you can view the output results, conduct error analysis, find out where it works or does not work, and even change the exact idea or method of the problem you want to solve. Then you can change the implementation and run another experiment, and iterate repeatedly to obtain an effective machine learning model. When you write a Prompt to develop an application using LLM, the process may be very similar. You have an idea about the task to be completed. You can try to write the first Prompt to meet the two principles mentioned in the previous chapter: clear and definite, and give the system enough time to think. You can then run it and view the results. If the effect is not good at the first time, the iterative process is to find out why the instructions are not clear enough or why the algorithm has not been given enough time to think, so as to improve ideas, tips, and so on. Repeat several times until you find a suitable Prompt for your application.

Environment configuration

As in the previous chapter, we first need to configure the environment using the OpenAI API

 import openai import os from dotenv import load_dotenv, find_dotenv #Import third-party library _ = load_dotenv(find_dotenv()) #Reading environment variables in the system openai.api_key  = os.getenv('OPENAI_API_KEY') #Set API_KEY
 #A function that encapsulates the OpenAI interface. The parameter is Prompt, and the corresponding result is returned def get_completion(prompt, model="gpt-3.5-turbo"): ''' Prompt: corresponding prompt Model: The called model is gpt-3.5-turbo (ChatGPT) by default. Users qualified for internal testing can select gpt-4 ''' messages = [{"role": "user", "content": prompt}] response = openai.ChatCompletion.create( model=model, messages=messages, Temperature=0, # temperature coefficient of model output, control the randomness of output ) #Call the ChatCompletion interface of OpenAI return response.choices[0].message["content"]

Task - generate a marketing product description from the product description

Here is a product manual of the chair, which describes that it is part of a medieval inspiration family, and discusses the structure, size, chair options, materials, etc. The origin is Italy. Suppose you want to use this manual to help the marketing team write a marketing description for the online retail website:

 #Example: Product manual fact_sheet_chair = """ summary Part of the beautiful medieval style office furniture series, including filing cabinets, desks, bookcases, conference tables, etc. Various housing colors and base coatings are available. Plastic front and rear backrest decoration (SWC-100) or full decoration of 10 kinds of fabrics and 6 kinds of leather (SWC-110) are available. Base coating options are: stainless steel, matte black, gloss white or chrome. Chairs may or may not have arms. Applicable to family or business places. Meet the contract use qualification. structure Plastic coated aluminum base for five wheels. Pneumatic chair adjustment, convenient for lifting. size Width 53 cm | 20.87 in Depth 51 cm | 20.08 in Height 80 cm | 31.50 in Seat height 44 cm | 17.32 in Seat depth 41 cm | 16.14 in option Soft or hard floor wheel options. Two seat foam densities are available: medium (1.8 lbs/cubic foot) or high (2.8 lbs/cubic foot). No handrail or 8 position PU handrail. Material Science Housing base slide Cast aluminum with modified nylon PA6/PA66 coating. Shell thickness: 10 mm. chair HD36 foam Country of origin Italy """
 #Tip: Create a marketing description based on the manual prompt = f""" Your task is to help the marketing team create a product marketing description based on the technical specification. Write a product description according to the information provided in the marked technical specification. Technical description: ` ` ` {fact_sheet_hair}``` """ response = get_completion(prompt) print(response)
 Product description: We are proud to launch a beautiful series of medieval office furniture, including filing cabinets, desks, bookcases, conference tables, etc. Our products use a variety of shell colors and base coatings to meet your personalized needs. You can choose the plastic front and rear backrest decoration (SWC-100) or the comprehensive decoration of 10 kinds of fabrics and 6 kinds of leather (SWC-110) to make your office more comfortable and fashionable. Our base coating options include stainless steel, matte black, gloss white or chrome to meet your different needs. The chair can be used with or without arms, and is suitable for family or business places. Our products are qualified for contract use, providing you with more reliable protection. Our products use five wheels of plastic coated aluminum base, pneumatic chair adjustment, easy lifting. The dimensions are 53 cm | 20.87 inch wide, 51 cm | 20.08 inch deep, 80 cm | 31.50 inch high, 44 cm | 17.32 inch high and 41 cm | 16.14 inch deep, providing you with a comfortable use experience. Our products also provide soft floor or hard floor roller options. Two kinds of seat foam density are available: medium (1.8 pounds/cubic foot) or high (2.8 pounds/cubic foot), and no armrest or eight position PU armrest to meet your different needs. Our products use modified nylon PA6/PA66 coating cast aluminum shell base slider, the shell thickness is 10 mm, and the seat uses HD36 foam, providing you with a more comfortable use experience. The country of origin of our products is Italy, providing you with more high-quality quality assurance.

Problem 1: The generated text is too long

It seems to have written a good description, introduced an amazing medieval inspired office chair, and well fulfilled the requirements, that is, to write a product description from the technical specification. But when I see this, I think it is too long.

So I have an idea. I wrote a hint and got the result. However, I am not very satisfied with it, because it is too long, so I will clarify my tips and say that it can use up to 50 words.

Therefore, I solve this problem by requiring it to limit the length of generated text

 #The optimized Prompt is required to generate no more than 50 words of description prompt = f""" Your task is to help a marketing team create a  description for a retail website of a product based  on a technical fact sheet. Write a product description based on the information  provided in the technical specifications delimited by  triple backticks. Use at most 50 words. Technical specifications: ```{fact_sheet_chair}``` """ response = get_completion(prompt) print(response)
 Introducing our beautiful medieval-style office furniture collection, including filing cabinets, desks, bookcases, and conference tables. Choose from a variety of shell colors and base coatings,   with optional plastic or fabric/leather decoration. The chair features a plastic-coated aluminum base with five wheels and pneumatic height adjustment. Perfect for home or commercial use. Made in Italy.

Take out the answer and split it according to the space. The answer is 54 words, which completes my requirements well

 lst = response.split() print(len(lst))
 fifty-four
 #The optimized Prompt is required to generate no more than 50 words of description prompt = f""" Your task is to help the marketing team create a retail website description of the product based on the technical specification. Write a product description according to the information provided in the marked technical specification. Use up to 50 words. Technical Specifications: ` ` ` {fact_sheet_hair}``` """ response = get_completion(prompt) print(response)
 Medieval style office furniture series, including file cabinet, desk, bookcase, conference table, etc. Various colors and coatings are available, with or without handrails. Base coating options are stainless steel, matte black, gloss white or chrome. It is applicable to family or business places and meets the contract use qualification. Made in Italy.
 #Since Chinese requires word segmentation, the overall length is calculated here directly len(response)
 ninety-seven

LLM is good at following the very precise word limit, but not so good. Sometimes it will output 60 or 65 words of content, but this is reasonable. The reason for this is that LLM interprets text using something called a word breaker, but they tend to behave mediocrely in calculating characters. There are many different ways to try to control the length of the output you get.

Problem 2: The text focuses on the wrong details

The second problem we will find is that this website does not sell furniture directly to consumers. It actually aims to sell furniture to furniture retailers, who will pay more attention to the technical details and materials of chairs. In this case, you can modify the prompt to describe the technical details of the chair more accurately.

Solution: Ask it to focus on aspects related to the target audience.

 #The optimized Prompt explains the object-oriented nature and focus prompt = f""" Your task is to help the marketing team create a retail website description of the product based on the technical specification. Write a product description according to the information provided in the marked technical specification. This description is intended for furniture retailers, so it should be of a technical nature and focus on the material structure of the product. Use up to 50 words. Technical Specifications: ` ` ` {fact_sheet_hair}``` """ response = get_completion(prompt) print(response)
 This medieval style office furniture series includes file cabinets, desks, bookcases and conference tables, which are suitable for home or business places. A variety of housing colors and base coatings are available, and the base coating options are stainless steel, matte black, shiny white or chrome. Chairs can be equipped with or without armrests, soft floor or hard floor rollers can be selected, and two kinds of seat foam density can be selected. The sliding part of the shell base is made of cast aluminum coated with modified nylon PA6/PA66, and the seat is made of HD36 foam. The country of origin is Italy.

I may further want to include the product ID at the end of the description. Therefore, I can further improve this prompt by requiring that each seven character product ID included in the technical description be included at the end of the description.

 #Further prompt = f""" Your task is to help the marketing team create a retail website description of the product based on the technical specification. Write a product description according to the information provided in the marked technical specification. This description is intended for furniture retailers, so it should be of a technical nature and focus on the material structure of the product. At the end of the description, include the product ID of each 7 characters in the technical specification. Use up to 50 words. Technical Specifications: ` ` ` {fact_sheet_hair}``` """ response = get_completion(prompt) print(response)
 This medieval style office furniture series includes file cabinets, desks, bookcases and conference tables, which are suitable for home or business places. A variety of housing colors and base coatings are available, and the base coating options are stainless steel, matte black, shiny white or chrome. The chair can be equipped with or without armrests, and can be decorated with plastic front and rear backrest or 10 kinds of fabrics and 6 kinds of leather. The seat adopts HD36 foam, which can be medium or high density. The height of the seat is 44cm, and the depth is 41cm. The sliding part of the housing base is made of cast aluminum coated with modified nylon PA6/PA66, and the thickness of the housing is 10mm. The country of origin is Italy. Product ID: SWC-100/SWC-110.

Question 3: Need a description in tabular form

The above is a short example of iterative prompt development that many developers usually experience. My suggestion is that, as demonstrated in the previous chapter, Prompt should be clear and explicit, and give the model some time to think when necessary. On the basis of these requirements, it is usually worth trying to write a Prompt first to see what will happen, and then iteratively improve the Prompt from there to gradually approach the desired results. Therefore, many successful Prompts are obtained through this iterative process. I will show you a more complex prompt example, which may give you a deeper understanding of the capabilities of ChatGPT.

Here I add some additional instructions, requiring it to extract information and organize it into tables, specify the table columns, table names and formats, and also require it to format all content into HTML that can be used on the web page.

 #It is required to extract information and organize it into a table, and specify the column, table name and format of the table prompt = f""" Your task is to help the marketing team create a retail website description of the product based on the technical specification. Write a product description according to the information provided in the marked technical specification. This description is intended for furniture retailers, so it should be of a technical nature and focus on the material structure of the product. At the end of the description, include the product ID of each 7 characters in the technical specification. After the description, include a table to provide the dimensions of the product. The table should have two columns. The first column contains the name of the dimension. The second column contains measurements in inches only. Name the table Product Size. Format all content in HTML format that can be used for the website. Place the description in the<div>element. Technical Specifications: ` ` ` {fact_sheet_hair}``` """ response = get_completion(prompt) print(response)
 <div> <h2>Medieval style office furniture series chairs</h2> <p>This chair is part of the Medieval style office furniture series, suitable for home or business places. It is available in a variety of shell colors and base coatings, including stainless steel, matte black, gloss white or chrome. You can select chairs with or without armrests, and soft or hard floor wheel options. In addition, you can choose between two seat foam densities: medium (1.8 pounds/cubic foot) or high (2.8 pounds/cubic foot)</ p> <p>The sliding part of the chair shell base is cast aluminum coated with modified nylon PA6/PA66, and the shell thickness is 10mm. The seat adopts HD36 foam, and the base is a plastic coated aluminum base with five wheels, which can be adjusted by pneumatic chair for easy lifting. In addition, the chair meets the contract qualification and is your ideal choice</ p> <p>Product ID: SWC-100</p> </div> <table> <caption>Product size</caption> <tr> <th>Width</th> <td>20.87 in</td> </tr> <tr> <th>Depth</th> <td>20.08 in</td> </tr> <tr> <th>Height</th> <td>31.50 in</td> </tr> <tr> <th>Seat height</th> <td>17.32 in</td> </tr> <tr> <th>Seat depth</th> <td>16.14 in</td> </tr> </table>
 #The table is rendered in HTML format and loaded from IPython.display import display, HTML display(HTML(response))

Medieval style office furniture series chairs

This chair is part of the Medieval style office furniture series, suitable for home or business places. It is available in a variety of shell colors and base coatings, including stainless steel, matte black, gloss white or chrome. You can select chairs with or without armrests, and soft or hard floor wheel options. In addition, you can choose between two seat foam densities: medium (1.8 pounds/cubic foot) or high (2.8 pounds/cubic foot).

The sliding part of the chair shell base is cast aluminum coated with modified nylon PA6/PA66, and the shell thickness is 10mm. The seat adopts HD36 foam, and the base is a plastic coated aluminum base with five wheels, which can be adjusted by pneumatic chair for easy lifting. In addition, the chair meets the contract qualification and is your ideal choice.

Product ID: SWC-100

Product size
width 20.87 in
depth 20.08 in
height 31.50 in
Seat height 17.32 in
Seat depth 16.14 in

The main content of this chapter is the iterative prompt development process of LLM in developing applications. Developers need to try to write prompts first, and then gradually improve them through iterations until they get the desired results. The key is to have an effective process of developing Prompt, rather than knowing the perfect Prompt. For some more complex applications, multiple samples can be prompted for iterative development and evaluated. Finally, you can test the average or worst performance of multiple Prompts on multiple samples in more mature applications. When using the Jupyter code notebook example, try different changes and see the results.

Chapter IV Text Summary

introduction

There is so much text information in the world today that almost no one has enough time to read all the things we want to know. However, it is gratifying to note that LLM has demonstrated a strong level in text summary tasks, and many teams have already inserted this function into their software applications.

This chapter will introduce how to use the programming method to call the API interface to realize the "text summary" function.

First, we need the OpenAI package, load the API key, and define the getCompletion function.

 import openai import os OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY") openai.api_key = OPENAI_API_KEY def get_completion(prompt, model="gpt-3.5-turbo"):  messages = [{"role": "user", "content": prompt}] response = openai.ChatCompletion.create( model=model, messages=messages, Temperature=0, the lower the # value, the lower the randomness of the output text ) return response.choices[0].message["content"]

Single text summary Prompt experiment

Here we give an example of product reviews. For e-commerce platforms, there are often a large number of product reviews on the website, which reflect the ideas of all customers. If we have a tool to summarize these massive and lengthy comments, we can quickly browse more comments and understand customers' preferences, so as to guide the platform and merchants to provide better services.

Input text (Chinese translation)

 prod_review_zh = """ This panda doll is my birthday gift for my daughter. She likes it very much and takes it everywhere. The doll is soft, super cute, and has a nice facial expression. But compared with the price, It's a little small. I feel I can buy a bigger one at the same price elsewhere. The express arrived a day earlier than expected, so I had a party before giving it to my daughter. """

Limit output text length

We tried to limit the text length to a maximum of 30 words.

Chinese translation

 prompt = f""" Your task is to generate a short summary of product reviews from e-commerce websites. Please summarize the comment text between the three back quotes, up to 30 words. Comments: ` ` ` {prod_review_zh}``` """ response = get_completion(prompt) print(response)
 The cute soft panda doll is liked by her daughter. Her facial expression is kind, but the price is a little expensive. The express delivery arrived one day in advance.

Key perspectives

Sometimes, for different businesses, our emphasis on text will be different. For example, for the product review text, logistics will pay more attention to transportation timeliness, businesses will pay more attention to price and product quality, and the platform will pay more attention to the overall service experience.

We can focus on a specific angle by adding Prompt hints.

Focus on transportation

Chinese translation

 prompt = f""" Your task is to generate a short summary of product reviews from e-commerce websites. Please summarize the comment text between the three back quotes, with a maximum of 30 words, and focus on product transportation. Comments: ` ` ` {prod_review_zh}``` """ response = get_completion(prompt) print(response)
 The delivery arrived in advance. The panda doll is soft and cute, but it is a little small, and the price is not very cost-effective.

As you can see, the output result starts with "delivery arrived one day ahead of schedule", which reflects the focus on delivery efficiency.

Focus on price and quality

Chinese translation

 prompt = f""" Your task is to generate a short summary of product reviews from e-commerce websites. Please summarize the comment text between the three back quotes, with a maximum of 30 words, and focus on product price and quality. Comments: ` ` ` {prod_review_zh}``` """ response = get_completion(prompt) print(response)
 The cute soft panda doll has a friendly facial expression, but its price is a little high and its size is small. Express delivery arrived one day ahead of schedule.

As you can see, the output results begin with "good quality, small price and small size", which reflects the emphasis on product price and quality.

Key information extraction

In section 2.2, although we add a Prompt that focuses on key aspects to make the text summary more focused on a specific aspect, we can find that some other information will also be retained in the results, such as the summary of price and quality still retains the information of "delivery in advance". Sometimes this information is helpful, but if we only want to extract information from a certain angle and filter out all other information, we can ask LLM to do "Extract" instead of "Summarize".

Chinese translation

 prompt = f""" Your task is to extract relevant information from product reviews on e-commerce websites. Please extract the information related to product transportation from the comment text between the following three back quotes, up to 30 words. Comments: ` ` ` {prod_review_zh}``` """ response = get_completion(prompt) print(response)
 The express delivery arrived a day earlier than expected.

Multiple Text Summarization Prompt Experiment

In the actual workflow, we often have many comments. The following shows an example of calling the "Text Summary" tool based on the for loop and printing in turn. Of course, in actual production, it is unrealistic to use the for loop for millions or even tens of millions of comment texts. It may be necessary to consider ways such as integrating comments and distributing to improve computing efficiency.

 review_1 = prod_review  # review for a standing lamp review_2 = """ Needed a nice lamp for my bedroom, and this one \ had additional storage and not too high of a price \ point. Got it fast - arrived in 2 days. The string \ to the lamp broke during the transit and the company \ happily sent over a new one. Came within a few days \ as well. It was easy to put together. Then I had a \ missing part, so I contacted their support and they \ very quickly got me the missing piece! Seems to me \ to be a great company that cares about their customers \ and products.  """ # review for an electric toothbrush review_3 = """ My dental hygienist recommended an electric toothbrush, \ which is why I got this. The battery life seems to be \ pretty impressive so far. After initial charging and \ leaving the charger plugged in for the first week to \ condition the battery, I've unplugged the charger and \ been using it for twice daily brushing for the last \ 3 weeks all on the same charge. But the toothbrush head \ is too small. I’ve seen baby toothbrushes bigger than \ this one. I wish the head was bigger with different \ length bristles to get between teeth better because \ this one doesn’t.  Overall if you can get this one \ around the $50 mark, it's a good deal. The manufactuer's \ replacements heads are pretty expensive, but you can \ get generic ones that're more reasonably priced. This \ toothbrush makes me feel like I've been to the dentist \ every day. My teeth feel sparkly clean!  """ # review for a blender review_4 = """ So, they still had the 17 piece system on seasonal \ sale for around $49 in the month of November, about \ half off, but for some reason (call it price gouging) \ around the second week of December the prices all went \ up to about anywhere from between $70-$89 for the same \ system. And the 11 piece system went up around $10 or \ so in price also from the earlier sale price of $29. \ So it looks okay, but if you look at the base, the part \ where the blade locks into place doesn’t look as good \ as in previous editions from a few years ago, but I \ plan to be very gentle with it (example, I crush \ very hard items like beans, ice, rice, etc. in the \  blender first then pulverize them in the serving size \ I want in the blender then switch to the whipping \ blade for a finer flour, and use the cross cutting blade \ first when making smoothies, then use the flat blade \ if I need them finer/less pulpy). Special tip when making \ smoothies, finely cut and freeze the fruits and \ vegetables (if using spinach-lightly stew soften the \  spinach then freeze until ready for use-and if making \ sorbet, use a small to medium sized food processor) \  that you plan to use that way you can avoid adding so \ much ice if at all-when making your smoothie. \ After about a year, the motor was making a funny noise. \ I called customer service but the warranty expired \ already, so I had to buy another one. FYI: The overall \ quality has gone done in these types of products, so \ they are kind of counting on brand recognition and \ consumer loyalty to maintain sales. Got it in about \ two days. """ reviews = [review_1,  review_2, review_3, review_4]
 for i in range(len(reviews)): prompt = f""" Your task is to generate a short summary of a product \  review from an ecommerce site.  Summarize the review below, delimited by triple \ backticks in at most 20 words.  Review: ```{reviews[i]}``` """ response = get_completion(prompt) print(i, response, "\n")
 0 Soft and cute panda plush toy loved by daughter,  but a bit small for the price. Arrived early.  1 Affordable lamp with storage,  fast shipping, and excellent customer service. Easy to assemble and missing parts were quickly replaced.  2 Good battery life,  small toothbrush head, but effective cleaning. Good deal if bought around $50.  3 The product was on sale for $49 in November,  but the price increased to $70-$89 in December. The base doesn't look as good as previous editions, but the reviewer plans to be gentle with it. A special tip for making smoothies is to freeze the fruits and vegetables beforehand. The motor made a funny noise after a year, and the warranty had expired. Overall quality has decreased.

Please see the next blog post for the preface.