
Feature/datascience assistant #562

Merged
merged 37 commits into from Aug 6, 2024
Changes from 1 commit
Commits
37 commits
96e6a31
add plan update logic
dahaipeng Jul 4, 2024
0373d07
adding update task logic
dahaipeng Jul 5, 2024
6c418a3
update datascience assistant logic to achieve better results
dahaipeng Jul 12, 2024
61d6953
Merge branch 'master' into feature/datascience_assistant
dahaipeng Jul 12, 2024
7c1c20b
Merge branch 'refs/heads/master' into feature/datascience_assistant
dahaipeng Jul 15, 2024
70d844c
add ds tools
dahaipeng Jul 18, 2024
2e54983
add ds tools
dahaipeng Jul 19, 2024
0f42462
update prompt
dahaipeng Jul 22, 2024
dd7d05d
update utils
dahaipeng Jul 22, 2024
8b752a5
update init
dahaipeng Jul 22, 2024
800fc06
Merge branch 'refs/heads/master' into feature/datascience_assistant
dahaipeng Jul 22, 2024
4f7bd6a
update log
dahaipeng Jul 23, 2024
62c2bed
Merge branch 'refs/heads/master' into feature/datascience_assistant
dahaipeng Jul 23, 2024
cb05393
delete yml
dahaipeng Jul 23, 2024
5b70701
update ds_assistant
dahaipeng Jul 24, 2024
49c50d2
Merge branch 'refs/heads/master' into feature/datascience_assistant
dahaipeng Jul 25, 2024
428401a
update ds_assistant
dahaipeng Jul 25, 2024
f6bd4e2
update ds_assistant
dahaipeng Jul 25, 2024
ee8857f
Merge branch 'refs/heads/master' into feature/datascience_assistant
dahaipeng Jul 26, 2024
1f1be09
update ds_assistant
dahaipeng Jul 26, 2024
e2310b5
fix openapi tool
dahaipeng Jul 26, 2024
458d4d5
Merge branch 'refs/heads/feature/datascience_assistant'
dahaipeng Jul 30, 2024
999d914
add data science assistant example
dahaipeng Jul 31, 2024
ea16607
add data science assistant example
dahaipeng Jul 31, 2024
fbe91b9
add data science assistant example
dahaipeng Jul 31, 2024
c536e6d
Merge remote-tracking branch 'origin/master'
dahaipeng Aug 1, 2024
6f9ae6a
Merge branch 'refs/heads/master' into feature/datascience_assistant
dahaipeng Aug 1, 2024
894afca
fix requirements
dahaipeng Aug 1, 2024
733ef50
fix requirements
dahaipeng Aug 1, 2024
292c5b9
Merge remote-tracking branch 'origin/master'
dahaipeng Aug 1, 2024
323a529
Merge branch 'refs/heads/master' into feature/datascience_assistant
dahaipeng Aug 1, 2024
67fa1ba
Merge remote-tracking branch 'origin/master'
dahaipeng Aug 5, 2024
8fd1176
Merge branch 'refs/heads/master' into feature/datascience_assistant
dahaipeng Aug 5, 2024
368e36a
add docs
dahaipeng Aug 5, 2024
037b3e8
add docs
dahaipeng Aug 5, 2024
8ea6a80
add docs
dahaipeng Aug 5, 2024
d73f260
add docs
dahaipeng Aug 5, 2024
update ds_assistant
dahaipeng committed Jul 24, 2024
commit 5b707017035e9ab0f55c48ef5b73e9b37ecf114e
61 changes: 42 additions & 19 deletions modelscope_agent/agents/data_science_assistant.py
@@ -223,7 +223,7 @@

Even if the code has been executed successfully, doesn't mean it's totally correct. You need to carefully \
check the code logic to ensure the code can accomplish the task correctly. Ignore the warning messages. \
You don't need to check the metrics of the model. you\
You don't need to check the metrics of the model.

these are the previous code blocks, which have been executed successfully in the previous jupyter notebook code blocks \
{previous_code_blocks}
@@ -233,6 +233,9 @@
- [your step by step thought], incorrect

don't generate code , just give the reason why the code is correct or incorrect.

## Attention
don't use the word 'incorrect' in your step by step thought.
"""

CHECK_DATA_PROMPT = """
@@ -315,12 +318,15 @@ def _update_plan(self, user_request: str, curr_plan: Plan = None) -> Plan:
call_llm_success = False
call_llm_count = 0
tasks_text = ''
messages = [{
'role':
'user',
'content':
PLAN_TEMPLATE.format(
context='User Request: ' + user_request + '\n', )
}]
while not call_llm_success and call_llm_count < 10:
resp = self._call_llm(
prompt=PLAN_TEMPLATE.format(
context='User Request: ' + user_request + '\n', ),
messages=None,
stop=None)
resp = self._call_llm(prompt=None, messages=messages, stop=None)
tasks_text = ''
for r in resp:
tasks_text += r
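The hunk above moves the plan prompt out of a bare `prompt=` argument and into a chat-style `messages` list, retried up to 10 times. A minimal sketch of that retry pattern (the `call_llm` name and streaming behavior here are illustrative stand-ins, not the actual modelscope_agent API):

```python
def call_with_retry(call_llm, prompt_text, max_retries=10):
    # Wrap the plan prompt in a chat-style messages list, as the diff does.
    messages = [{'role': 'user', 'content': prompt_text}]
    for attempt in range(max_retries):
        try:
            # call_llm is assumed to return an iterable of text chunks.
            return ''.join(call_llm(messages=messages))
        except Exception:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
    return ''
```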
@@ -568,7 +574,8 @@ def _judge_code(self, task, previous_code_blocks, code,
if not call_llm_success:
raise Exception('call llm failed')
logger.info(f'judge result for task{task.task_id}: \n {judge_result}')
if 'incorrect' in judge_result:

if 'incorrect' in judge_result.split('\n')[-1]:
success = False
failed_reason = (
'Though the code executes successfully, The code logic is incorrect, here is the reason: '
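This change pairs with the new "don't use the word 'incorrect' in your step by step thought" rule in the judge prompt: the verdict is now read only from the final line, so the word appearing in the reasoning can no longer produce a false negative. A minimal illustration of the check:

```python
def code_is_correct(judge_result: str) -> bool:
    # Only the last line carries the verdict; earlier reasoning lines
    # are ignored, matching the change in _judge_code above.
    return 'incorrect' not in judge_result.split('\n')[-1]
```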
@@ -604,12 +611,24 @@ def _run(self, user_request, save: bool = True, **kwargs):
while not success and code_counter < max_try:
code_execute_success = False
code_logic_success = False
temp_code_interpreter = CodeInterpreter()

temp_code_interpreter.call(
params=json.dumps({
'code':
self._get_previous_code_blocks_without_outputs()
}),
nb_mode=True,
silent_mode=True)
# generate code
code = self._generate_code(code_counter, task,
user_request)

code_execute_success, code_interpreter_resp = self.code_interpreter.call(
params=json.dumps({'code': code}), nb_mode=True)
code_execute_success, code_interpreter_resp = temp_code_interpreter.call(
params=json.dumps({'code': code}),
nb_mode=True,
silent_mode=True)
# tear down the temporary Jupyter environment
temp_code_interpreter.terminate()
judge_resp = ''
if not code_execute_success:
logger.error(
@@ -631,10 +650,9 @@ def _run(self, user_request, save: bool = True, **kwargs):
result=code_interpreter_resp + '\n' + judge_resp,
is_success=False))

if not success:
# delete the last cell if the code execution failed
del self.code_interpreter.nb.cells[-1]
else:
if success:
self.code_interpreter.call(
params=json.dumps({'code': code}), nb_mode=True)
task.code = code
task.result = code_interpreter_resp
code_counter += 1
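The net effect of this hunk: each candidate is tried in a throwaway kernel that first replays the prior code blocks, and only code that succeeds there is re-executed in the persistent notebook, replacing the old "delete the last cell on failure" cleanup. A rough sketch under those assumptions (`FakeInterpreter` is a stand-in, not the real `CodeInterpreter` class):

```python
class FakeInterpreter:
    def __init__(self):
        self.cells = []
    def call(self, code):
        # Record the cell and report success unless the code "errors".
        self.cells.append(code)
        return ('error' not in code), f'ran {len(self.cells)} cells'
    def terminate(self):
        self.cells.clear()

def try_task(previous_blocks, candidate_code, persistent):
    temp = FakeInterpreter()
    temp.call(previous_blocks)           # replay context in a scratch kernel
    success, resp = temp.call(candidate_code)
    temp.terminate()                     # discard the scratch kernel
    if success:
        persistent.call(candidate_code)  # only working code reaches the notebook
    return success
```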
@@ -650,7 +668,7 @@ def _run(self, user_request, save: bool = True, **kwargs):
else:
self.plan = self._update_plan(
user_request=user_request, curr_plan=self.plan)
self.code_interpreter.nb.cells.clear()
self.code_interpreter.reset()
# save the plan into json file
if save:
after_time = time.time()
@@ -662,10 +680,15 @@ def _run(self, user_request, save: bool = True, **kwargs):
'total_token': total_token,
'plan': self.plan.tasks
}
with open(
dir_name + 'plan.json', 'w', encoding='utf-8') as file:
file.write(
json.dumps(plan_dict, indent=4, cls=TaskEncoder))
print(f'plan_dict: {str(plan_dict)}')
try:
with open(
dir_name + 'plan.json', 'w',
encoding='utf-8') as file:
file.write(
json.dumps(plan_dict, indent=4, cls=TaskEncoder))
except Exception as e:
print(f'json write error: {str(e)}')

except Exception as e:
logger.error(f'error: {e}')
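The plan-saving hunk wraps the JSON write in a try/except so a non-serializable task can no longer crash the whole run. A minimal sketch of that guard (`SafeEncoder` is illustrative; the real code uses `TaskEncoder` to serialize `Task` objects):

```python
import json

class SafeEncoder(json.JSONEncoder):
    def default(self, o):
        # Fall back to a string for anything json can't serialize.
        return str(o)

def save_plan(path, plan_dict):
    try:
        with open(path, 'w', encoding='utf-8') as f:
            f.write(json.dumps(plan_dict, indent=4, cls=SafeEncoder))
        return True
    except Exception as e:
        print(f'json write error: {e}')
        return False
```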
10 changes: 7 additions & 3 deletions modelscope_agent/tools/code_interpreter/code_interpreter_nb.py
@@ -45,6 +45,7 @@ def __init__(self, cfg={}):
self.nb_client = NotebookClient(self.nb, timeout=180)
self.console = Console()
self.interaction = ''
self.silent_mode = False
# timeout: int = 600

def __del__(self):
@@ -138,8 +139,9 @@ def parse_outputs(self,
output_text = output['text']
elif output['output_type'] == 'display_data':
if 'image/png' in output['data']:
self.show_bytes_figure(output['data']['image/png'],
self.interaction)
if not self.silent_mode:
self.show_bytes_figure(output['data']['image/png'],
self.interaction)

elif output['output_type'] == 'execute_result':
output_text = output['data']['text/plain']
@@ -227,6 +229,7 @@ def call(self,
params: str,
timeout: Optional[int] = 30,
nb_mode: bool = False,
silent_mode: Optional[bool] = False,
**kwargs) -> (bool, str):
try:
try:
@@ -249,7 +252,8 @@
)
fixed_code = '\n'.join(fixed_code)
if nb_mode:
result, success = self.run(code=fixed_code)
self.silent_mode = silent_mode
result, success = self.run(code=fixed_code, )
return success, result
except Exception as e:
return False, str(e)
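Together, the changes to this file thread a new `silent_mode` flag from `call()` into `parse_outputs()`, so that replayed context cells don't re-render their figures. An illustrative reduction of that control flow (class and field names here are stand-ins, not the real `CodeInterpreter` API):

```python
class Renderer:
    def __init__(self):
        self.shown = []
        self.silent_mode = False  # mirrors the new instance flag

    def call(self, outputs, silent_mode=False):
        # call() stores the flag, and output parsing consults it
        # before displaying any figure, as in the diff above.
        self.silent_mode = silent_mode
        for out in outputs:
            if out.get('image') and not self.silent_mode:
                self.shown.append(out['image'])
```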
9 changes: 5 additions & 4 deletions modelscope_agent/tools/metagpt_tools/task_type.py
@@ -9,20 +9,20 @@
- Remember to `import numpy as np` before using Numpy functions.
- don't plot any columns
- don't calculate correlations
- Avoid showing all data, always use df.head() to show the first 5 rows.

"""

# Prompt for taking on "data_preprocess" tasks
DATA_PREPROCESS_PROMPT = """
The current task is about data preprocessing, please note the following:
- Monitor data types per column, applying appropriate methods.
- make sure train and test data MUST have the same columns except for the label column.
- Remove ID columns if exist.
- Handle missing values with suitable strategies.
- Ensure operations are on existing dataset columns.
- Avoid writing processed data to files.
- Avoid any change to label column, such as standardization, etc.
- Each step do data preprocessing to train, must do same for test separately at the same time.
- Always copy the DataFrame before processing it and use the copy to process.
- Avoid writing processed data to files.
"""

# Prompt for taking on "feature_engineering" tasks
@@ -31,9 +31,9 @@
- Avoid creating redundant or excessively numerous features in one step.
- Each feature engineering operation performed on the train set must also applies to the \
test separately at the same time.
- Avoid using the label column to create features, except for cat encoding.
- Use the data from previous task result if exist, do not mock or reload data yourself.
- Always copy the DataFrame before processing it and use the copy to process.
- Use Label Encoding for non-numeric columns, avoid One-Hot Encoding.
"""

# Prompt for taking on "model_train" tasks
@@ -42,6 +42,7 @@
- If non-numeric columns exist, perform label encode together with all steps.
- If the model caused timeout error, please don't use this model again.
- Use the data from previous task result directly, do not mock or reload data yourself.
- Never save the model to a file.
"""

# Prompt for taking on "model_evaluate" tasks
11 changes: 11 additions & 0 deletions modelscope_agent/tools/metagpt_tools/tool_recommend.py
@@ -18,9 +18,20 @@
- You can utilize pre-defined tools in any code lines from 'Available Tools' in the form of Python class or function.
- You can freely combine the use of any other public packages, like sklearn, numpy, pandas, etc..


## Available Tools:
Each tool is described in JSON format. When you call a tool, import the tool from its path first.
{tool_schemas}

## Attention
Ensure that the tool is imported from the correct path, the tool path is \
./modelscope_agent/tools/metagpt_tools/libs/xxx.py
if you want to use the tool in your code, you need to import the tool first, like this:
```python
from modelscope_agent.tools.metagpt_tools.libs.xxx import ToolName
```


"""

TOOL_RECOMMENDATION_PROMPT = """