update prompt

Leolty · Aug 18, 2023 · 25bdcb5
1 parent 28f7417
Showing 4 changed files with 88 additions and 55 deletions.
diff --git a/prompt/__init__.py b/prompt/__init__.py
@@ -1,5 +1,5 @@
 from .header_check import header_check_prompt
-from .sort import sort_prompt
+from .sort import base_sort_prompt, smart_sort_prompt, choice_sort_prompt, \
+                  omit_statement, cot_sort_prompt, direct_sort_prompt
 from .cot import cot_prompt
-from .agent import agent_prefix
-from .smarter_agent import smarter_agent_prefix
+from .agent import agent_prefix, smart_agent_prefix
diff --git a/prompt/agent.py b/prompt/agent.py
@@ -1,25 +1,81 @@
 agent_prefix = """
-You are working with a pandas dataframe in Python. The name of the dataframe is `df`. Your task is to use `python_repl_ast` to answer the question posed of you.
+You are working with a pandas dataframe in Python. The name of the dataframe is `df`. Your task is to use `python_repl_ast` to answer the question posed to you.
 
-python_repl_ast: A Python shell. Use this to execute python commands. Input should be a valid python command. When using this tool, sometimes output is abbreviated - make sure it does not look abbreviated before using it in your answer.
+Tool description:
+- `python_repl_ast`: A Python shell. Use this to execute python commands. Input should be a valid python command. When using this tool, sometimes the output is abbreviated - ensure it does not appear abbreviated before using it in your answer.
 
-Use the following format:
+Guidelines:
+- **Aggregated Rows**: Be cautious of rows that aggregate data such as 'total', 'sum', or 'average'. Ensure these rows do not influence your results inappropriately.
+- **Data Verification**: Before concluding the final answer, always verify that your observations align with the original table and question.
+
+Strictly follow the given format to respond:
 
 Question: the input question you must answer
-Thought: you should always think about what to do
-Action: can ONLY be `python_repl_ast`
-Action Input: the input to the action
+Thought: you should always think about what to do to interact with `python_repl_ast`
+Action: can **ONLY** be `python_repl_ast`
+Action Input: the input code to the action
 Observation: the result of the action
 ... (this Thought/Action/Action Input/Observation can repeat N times)
-Thought: I now know the final answer
-Final Answer: the final answer to the original input question 
+Thought: after verifying the table, observations, and the question, I am confident in the final answer
+Final Answer: the final answer to the original input question (AnswerName1, AnswerName2...)
+
+Notes for final answer:
+- Ensure the final answer uses only the form "Final Answer: AnswerName1, AnswerName2...", and no other.
+- Ensure the final answer is a number or entity name(s), as short as possible, without any explanation.
+- Ensure you include a concluding thought that verifies the table, observations, and the question before giving the final answer.
+
+You are provided with a table regarding "[TITLE]". This is the result of `print(df.to_markdown())`:
+
+[TABLE]
+
+**Note**: All cells in the table should be considered as `object` data type, regardless of their appearance.
+
+Begin!
+Question: [QUESTION]
+
+"""
+
+smart_agent_prefix = """
+You are working with a pandas dataframe in Python. The name of the dataframe is `df`. Your task is to use `python_repl_ast` to answer the question posed to you. Depending on the nature of the question, you must carefully decide the approach to answering it.
+
+Tool description:
+- `python_repl_ast`: A Python shell. Use this to execute python commands. Input should be a valid python command. When using this tool, sometimes the output is abbreviated - ensure it does not appear abbreviated before using it in your answer.
+
+Guidelines:
+- **Calculations & Counting**: You are **NOT** good at calculations and counting. **Always** use `python_repl_ast` for those questions, as it will provide more accurate and reliable results.
+- **Large Tables & Data Interpretation**: With expansive tables or complex table layouts, prefer using `python_repl_ast` to ensure accuracy.
+- **Similar Columns/Rows**: In case of columns or rows with close resemblance, the risk of confusion is high. Rely on `python_repl_ast` for precise data extraction.
+- **Beware of Aggregated Rows**: Before performing operations, be extra cautious of rows that aggregate data such as 'total', 'sum', or 'average'. Always ensure that these rows do not skew the results.
+- **Complex or Multi-Step Queries**: For questions that require multiple steps or intricate analysis, prefer to use `python_repl_ast` even if the steps might seem direct. This ensures that no step is overlooked.
+- **Direct Answers**: Only try `answer_directly` when the answer to the question is extremely straightforward, with none of the aforementioned potential risks. Otherwise, use `python_repl_ast` to ensure accuracy.
 
-Ensure the final answer format is only "Final Answer: AnswerName1, AnswerName2..." form, no other form. And ensure the final answer is a number or entity names, as short as possible, without any explanation.
+Strictly follow the given format to respond:
+
+Question: the input question you must answer
+Initial Thought and Decision: read the guidelines above, think and decide whether to use `python_repl_ast` or answer directly
+If the decision is to answer directly:
+   Action: must **ONLY** be `answer_directly`
+   Explanation: explain your step-by-step reasoning toward the final answer; it **MUST** start with "Explanation: Let's think step-by-step".
+   Final Answer: the final answer to the original input question (AnswerName1, AnswerName2...)
+If the decision is to use `python_repl_ast`:
+   Thought: you should always think about what to do to interact with `python_repl_ast`
+   Action: must **ONLY** be `python_repl_ast`
+   Action Input: the input to the action
+   Observation: the result of the action
+   ... (this Thought/Action/Action Input/Observation can repeat N times)
+   Thought: after verifying the table, observations and the question, I am confident in the final answer
+   Final Answer: the final answer to the original input question (AnswerName1, AnswerName2...)
+
+Notes for final answer:
+- Ensure the final answer uses only the form "Final Answer: AnswerName1, AnswerName2...", and no other. Also ensure the final answer is a number or entity name(s), as short as possible, without any explanation.
+- Ensure you include a concluding thought that verifies the table, observations, and the question before giving the final answer.
 
 You are provided with a table regarding "[TITLE]". This is the result of `print(df.to_markdown())`:
 
 [TABLE]
 
+**Note**: All cells in the table should be considered as `object` data type, regardless of their appearance.
+
 Begin!
 Question: [QUESTION]
 """
diff --git a/prompt/smarter_agent.py b/prompt/smarter_agent.py
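
A minimal sketch of how the `[TITLE]`, `[TABLE]`, and `[QUESTION]` placeholders in `agent_prefix` and `smart_agent_prefix` might be filled before the prompt is sent to a model. The commit does not show the substitution code, so the helper name `fill_prompt` and the single-pass regex approach are assumptions for illustration, not the repository's implementation.

```python
import re

# Matches exactly the three bracketed placeholders used by the agent prompts.
_PLACEHOLDER = re.compile(r"\[(TITLE|TABLE|QUESTION)\]")


def fill_prompt(template: str, title: str, table: str, question: str) -> str:
    """Hypothetical helper: substitute the placeholders in a single pass.

    A single regex pass means text inserted for one placeholder is never
    rescanned, so a title that happens to contain "[TABLE]" stays literal.
    """
    values = {"TITLE": title, "TABLE": table, "QUESTION": question}
    return _PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


if __name__ == "__main__":
    # Stand-in template; in the repository this would be agent_prefix.
    demo_template = 'Table regarding "[TITLE]":\n\n[TABLE]\n\nQuestion: [QUESTION]\n'
    # Hand-written markdown table standing in for print(df.to_markdown()).
    demo_table = "|    | city   | population   |\n|---:|:-------|:-------------|\n|  0 | Oslo   | 700000       |"
    print(fill_prompt(demo_template, "Cities", demo_table, "How many cities are listed?"))
```

Chained `str.replace` calls would also work for simple inputs; the single pass only matters when table or title text can itself contain a placeholder token.
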
diff --git a/prompt/sort.py b/prompt/sort.py
@@ -1,16 +1,30 @@
-sort_prompt = """
+direct_sort_prompt = """
 You are an advanced AI capable of analyzing and understanding information within tables. Read the table below regarding "[TITLE]":
 
 [TABLE]
 
-We know that the headings for this table are as follows, separated by semicolons:
+The table column headings are provided below, separated by semicolons:
 
 [HEADINGS]
 
-Your task is to identify whether sorting the table by a particular column would add clarity or enhance understanding. If so, please indicate which column would be best to sort by. If not, specify that no sorting is necessary.
+In order to optimize the interpretability and readability of the data, follow these guidelines to determine the most suitable sorting method:
 
-- If sorting would help, respond with: "Sort by: [COLUMN_NAME]", replacing [COLUMN_NAME] with the specific name of the appropriate column.
-- If the table doesn’t contain information that would benefit from sorting (such as numerical, alphabetical, or chronological data), or if sorting wouldn't add value for other reasons, respond with: "Sort by: N/A".
+Sorting Guidelines:
 
-Please follow one of these responses without additional explanation.
+1. Evaluate columns based on data types such as numerical, alphabetical, chronological, categorical, or other relevant sorting methods.
+2. Identify any patterns or relationships in the data that would be highlighted by certain sorting methods.
+3. Consider column position, as columns on the left may sometimes have sorting priority.
+4. If applicable, consider sorting by multiple columns in a prioritized sequence.
+
+Provide your decision using one of the following statements:
+
+- For sorting using a single column: "Sort by: [Name of Column]".
+- For sorting using multiple columns: "Sort by: [Primary Column Name], [Secondary Column Name], ...".
+- If no specific sorting seems advantageous: "Sort by: N/A".
+
+Your response should strictly follow the formats provided.
+"""
+
+omit_statement = """
+Note: For brevity, only selected rows from the beginning and end of the table are displayed. Intermediate rows are omitted and represented by "...".
 """
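
The three response formats requested by `direct_sort_prompt` ("Sort by: [Name of Column]", a comma-separated list, or "Sort by: N/A") could be parsed along these lines. This is a sketch under assumptions: the function name `parse_sort_response` is hypothetical, the commit does not show how the model's reply is consumed, and column names that themselves contain commas are not handled.

```python
from typing import List

PREFIX = "Sort by:"


def parse_sort_response(response: str, headings: List[str]) -> List[str]:
    """Hypothetical parser: return the columns to sort by, in priority order.

    An empty list means "no sorting" (either "N/A" or no usable answer).
    """
    for raw_line in response.splitlines():
        line = raw_line.strip().lstrip('"')
        if not line.startswith(PREFIX):
            continue
        # Drop surrounding quotes and a trailing period the model may echo back.
        value = line[len(PREFIX):].strip().rstrip(".").strip('"').strip()
        if value.upper() == "N/A":
            return []
        # Split on commas and tolerate literal square brackets around names.
        columns = [part.strip().strip("[]").strip() for part in value.split(",")]
        # Keep only names that match a known heading exactly.
        return [col for col in columns if col in headings]
    return []


if __name__ == "__main__":
    known = ["Year", "Name", "Score"]
    print(parse_sort_response("Sort by: Year, Name", known))
    print(parse_sort_response("Sort by: N/A", known))
```

Filtering against the known headings guards against the model inventing a column name; a caller could then pass a non-empty result to `df.sort_values(by=...)`.
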