fix(json_parser): remove the tab character that can be present in the… #260

Conversation

fabricehong
Contributor

Remove the tab character that can be present in the generated JSON. A literal tab character inside a string value makes the json.loads function raise an invalid JSON error.

@dschonholtz
Contributor

Is this why we fail to parse JSON so often?
This sounds fantastic. I would just want to know how you know that this fix works before we merge it.

@fabricehong
Contributor Author

fabricehong commented Apr 6, 2023

@dschonholtz Because I ran continuous mode in debug for 1 hour and stored every string that caused an invalid JSON error. There are only 2 cases: this one (the tab character), and #256, which is caused by unwanted commas (due to gpt3.5 mistaking the output for a Python function).

Once I had the strings that caused the error, I tried them in this tool: https://jsonformatter.curiousconcept.com/ . By trial and error, a string became valid when I removed a tab inside one of the JSON string values. A hexadecimal editor helped me see that the whitespace I was removing was a TAB character.
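The hex-editor step can also be approximated in Python. This is only a sketch: `find_control_chars` is a hypothetical helper, not part of the project. It lists every character below 0x20 with its position and hex code. Note that newlines and tabs between JSON tokens are legal whitespace, so only hits inside a string value matter.

```python
def find_control_chars(text):
    """Return (index, hex code) for every ASCII control character in text.

    Hypothetical helper for illustration; not part of the project code.
    """
    return [
        (index, f"0x{ord(char):02x}")
        for index, char in enumerate(text)
        if ord(char) < 0x20
    ]


# Illustrative sample: "\t" in a Python literal is a real TAB character.
sample = '{"plan": "-\tUse the browse_website command"}'
hits = find_control_chars(sample)
```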

To be sure, I tested both the fixed and the faulty string against the project code and confirmed that the change fixes the issue.

Here is the JSON that caused the error:

{
    "command": {
        "name": "browse_website",
        "args": {
            "url": "https://www.apta.com/resources/reportsandpublications/Pages/default.aspx",
            "question": "What are the specific needs/problems identified in the APTA publications that can be addressed with public transportation data visualization and machine learning predictions?"
        }
    },
    "thoughts": {
        "text": "The next step now is to extract the data related to public transportation data needs from the shortlisted reports obtained in the previous step. This will shed light on the specific areas where public transport data visualization or machine learning-based solution could be proposed.",
        "reasoning": "Analyzing the information in selected APTA publications and reports will help me identify specific areas where public transport data analytics can be leveraged to solve problems related to public transportation. By using data visualization techniques, it is possible to make trends and patterns visible to public transport users, enabling them to understand complex data effectively. While developing specific product ideas, using machine learning-based predictions can help identify passenger's demographic, travel patterns, and transportation preferences, which can help in delivering a better experience to them.",
        "plan": "-	Use the browse_website command to navigate to the APTA Resources page.\n-	Visit the TCRP Reports section and select relevant research reports and publications.\n-	Read through these reports and identify areas of inefficiency or potential market gaps in public transportation.\n-	Record the ideas in a file named 'public_transport_data_needs.txt'",
        "criticism": "One important thing to keep in mind here is only to extract data related to public transportation data needs; irrelevant information can lead to increased danger of confusion and can become difficult to tackle later on.",
        "speak": "After analyzing the selected APTA publications and reports, I will now record the areas where issues can be identified and solved with the help of public transportation data visualization and machine learning in a file named public_transport_data_needs.txt."
    }
}

Give this string to the json.loads function and it will fail.
Remove the tab characters and it will succeed.
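The behaviour can be reproduced with a much shorter string. The sketch below uses a made-up one-key document rather than the full response above, and `strip_tabs` is a hypothetical stand-in for the fix, not necessarily the exact change in this PR.

```python
import json

# Illustrative sample: "\t" in a Python literal is a real TAB character,
# so the JSON string value contains a raw control character.
bad = '{"plan": "-\tUse the browse_website command"}'


def loads_or_none(text):
    """Parse JSON, returning None instead of raising on invalid input."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


def strip_tabs(text):
    """Hypothetical fix: drop every TAB before parsing."""
    return text.replace("\t", "")


failed = loads_or_none(bad)             # None: strict parsing rejects the TAB
fixed = loads_or_none(strip_tabs(bad))  # parses once the TAB is gone

# Alternative: strict=False makes json.loads accept control characters
# inside strings, keeping the TAB in the parsed value.
lenient = json.loads(bad, strict=False)
```

Stripping tabs changes the text of the value, while `strict=False` preserves it; which is preferable depends on how the parsed text is used afterwards.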

Once this is merged, the json_fixer will always succeed, which will trigger another error that I corrected here: #259

@fabricehong
Contributor Author

I suggest logging the errors and the messages that caused them to an error.log file. It would greatly reduce debugging time and thus improve the overall quality of the project.
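A minimal sketch of that suggestion, using the standard logging module. The names `make_error_logger` and `parse_and_log` and the logger name `json_errors` are assumptions for illustration; the project may organize its logging differently.

```python
import json
import logging


def make_error_logger(path="error.log"):
    """Return a logger that appends errors to the given file (hypothetical helper)."""
    logger = logging.getLogger("json_errors")
    logger.setLevel(logging.ERROR)
    if not logger.handlers:  # avoid attaching a second handler on repeated calls
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger


def parse_and_log(text, logger):
    """Parse JSON; on failure, log the error and the offending input, return None."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # %r escapes invisible characters, so a TAB shows up as \t in the log.
        logger.error("invalid JSON: %s | input=%r", exc, text)
        return None


# Usage (writes to error.log in the current directory):
# logger = make_error_logger()
# parse_and_log('{"a": "b\tc"}', logger)
```

Logging the input with `%r` is a deliberate choice here: it makes control characters such as the tab visible without a hex editor.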

@Torantulino
Member

Because I ran continuous mode in debug for 1 hour and stored every string that caused an invalid JSON error. There are only 2 cases.

Wow. That's what I like to see, excellent work @fabricehong!

@Torantulino Torantulino merged commit 447cb4b into Significant-Gravitas:master Apr 6, 2023
waynehamadi added a commit that referenced this pull request Sep 5, 2023
Signed-off-by: Merwane Hamadi <merwanehamadi@gmail.com>