A Python-based tool that processes questions from a CSV file and uses a large language model (LLM) to generate answers, research, and citations. The tool integrates with Glean's API to index and manage the responses.
- Prerequisites
- Installation
- Configuration
- Usage
- Examples
- Troubleshooting
- Contributing
- License
- Code of Conduct
- Python 3.8 or higher
- Access to a Glean instance
- Glean API token with appropriate permissions
- Required Python packages (see requirements.txt)
- Clone the repository:
git clone [repository-url]
cd querycsv
- Install dependencies:
pip install -r requirements.txt
- Copy the example environment file:
cp _.env-example .env
Create a .env file with the following variables:
DEBUG="true" # Enable/disable debug mode
GLEAN_INSTANCE="dev" # Your Glean instance name
GLEAN_API_TOKEN="your-token-here" # Your Glean API token
QUESTIONS_CSV="" # Path to your questions CSV file
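How the tool itself loads these values is not shown here. As a rough sketch using only the standard library, the following parses lines in the format above (quoted values followed by a # comment) and reports variables that are still empty. The names parse_env, missing_vars, and REQUIRED are illustrative and not part of the tool, and treating these three variables as required is an assumption.

```python
import shlex

# Assumption: these three variables must be non-empty for a run to succeed.
REQUIRED = ("GLEAN_INSTANCE", "GLEAN_API_TOKEN", "QUESTIONS_CSV")


def parse_env(text):
    """Parse KEY="value" # comment lines into a dict (hypothetical helper)."""
    values = {}
    for line in text.splitlines():
        # shlex drops the trailing "# ..." comment and removes the quotes.
        tokens = shlex.split(line, comments=True)
        if not tokens or "=" not in tokens[0]:
            continue  # blank line, comment-only line, or not an assignment
        key, _, value = tokens[0].partition("=")
        values[key] = value
    return values


def missing_vars(values, required=REQUIRED):
    """Return the required variable names that are absent or empty."""
    return [name for name in required if not values.get(name)]


if __name__ == "__main__":
    sample = (
        'DEBUG="true" # Enable/disable debug mode\n'
        'GLEAN_INSTANCE="dev" # Your Glean instance name\n'
        'GLEAN_API_TOKEN="your-token-here" # Your Glean API token\n'
        'QUESTIONS_CSV="" # Path to your questions CSV file\n'
    )
    env = parse_env(sample)
    print(env)
    print("Still empty:", missing_vars(env))
```

With the example file unchanged, only QUESTIONS_CSV is reported as empty, which is a reminder to set the path before the first run.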
To use this tool, you need to:
- Enable the REST API in your Glean environment
- Generate an API token with appropriate permissions
- Add the token to your .env file
Create a CSV file with the following headers:
qid,question,answer,research,citations,datetime
Example (_questions.csv-example):
qid,question,answer,research,citations,datetime
"1","what is artificial intelligence?","","","",""
"2","how much wood can a woodchuck chuck?","","","",""
"3","how does glean index data?","","","",""
"4","how does glean keep the data secure?","","","",""
Column descriptions:
- qid: Unique identifier for each question
- question: The question to be answered
- answer: (Output) The generated answer
- research: (Output) Supporting research information
- citations: (Output) Reference citations
- datetime: (Output) Timestamp of when the answer was generated
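To make the expected layout concrete, here is a minimal sketch that reads a questions file with Python's csv module and rejects it when the header row differs from the one documented above. The function name load_questions is hypothetical, and the tool's own reader may behave differently.

```python
import csv
import io

EXPECTED_HEADERS = ["qid", "question", "answer", "research", "citations", "datetime"]


def load_questions(handle):
    """Return all rows as dicts after checking the header row (hypothetical helper)."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_HEADERS:
        raise ValueError(f"unexpected headers: {reader.fieldnames}")
    return list(reader)


if __name__ == "__main__":
    # In-memory stand-in for a file such as _questions.csv-example.
    sample = io.StringIO(
        "qid,question,answer,research,citations,datetime\n"
        '"1","what is artificial intelligence?","","","",""\n'
        '"3","how does glean index data?","","","",""\n'
    )
    for row in load_questions(sample):
        print(row["qid"], row["question"])
```

For a file on disk, open it with open(path, newline="", encoding="utf-8") and pass the handle to the same function.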
The tool can be run using the run.sh script with various options:
./run.sh -d false # Run in production mode
./run.sh -d true # Run in debug mode
- -d: Debug mode (true/false)
- -v: Verbose output
- Additional options can be found in gleanConstants.py
./run.sh -d true
Output:
vars: -d true
2025-04-10 02:12:58,368 - querycsv - INFO - Reading questions from CSV file: test/questions.csv
2025-04-10 02:12:58,368 - querycsv - INFO - Processing question: what is artificial intelligence?
2025-04-10 02:12:59,373 - querycsv - INFO - Processing question: how much wood can a woodchuck chuck?
2025-04-10 02:13:00,378 - querycsv - INFO - Processing question: how does glean index data?
2025-04-10 02:13:01,382 - querycsv - INFO - Processing question: how does glean keep the data secure?
2025-04-10 02:13:02,385 - querycsv - INFO - Writing question log to: test/questions_20250410_021258.csv
2025-04-10 02:13:02,387 - querycsv - INFO - Processing complete.
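In the log above, the output file name looks like the input file name with the run's start time appended (test/questions.csv becomes test/questions_20250410_021258.csv). The sketch below reproduces that naming pattern. It is inferred from the log only, timestamped_path is a hypothetical name, and the tool may build the path differently.

```python
from datetime import datetime
from pathlib import PurePosixPath


def timestamped_path(source, started):
    """Insert a YYYYMMDD_HHMMSS stamp between the file stem and its suffix."""
    path = PurePosixPath(source)
    stamp = started.strftime("%Y%m%d_%H%M%S")
    return path.with_name(f"{path.stem}_{stamp}{path.suffix}")


if __name__ == "__main__":
    start = datetime(2025, 4, 10, 2, 12, 58)
    print(timestamped_path("test/questions.csv", start))
    # prints: test/questions_20250410_021258.csv
```

Writing to a new, timestamped file on each run leaves the input CSV untouched, so earlier results are not overwritten.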
Common issues and solutions:
- API Token Issues
  - Ensure your API token has the correct permissions
  - Verify the token is correctly copied to the .env file
  - Check if the token has expired
- CSV Format Issues
  - Verify the CSV file has all required headers
  - Ensure the file is properly formatted (no extra spaces, correct quotes)
  - Check file encoding (should be UTF-8)
- Debug Mode
  - If encountering issues, run in debug mode (-d true) for more detailed logs
  - Use verbose mode (-v) for additional information
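The CSV format checks listed above can be automated before a run. The following is a small sketch, not part of the tool, that takes the raw bytes of a questions file and returns a list of problems: invalid UTF-8, header names with stray spaces, and missing headers. The name diagnose_csv is hypothetical.

```python
import csv
import io

REQUIRED_HEADERS = ["qid", "question", "answer", "research", "citations", "datetime"]


def diagnose_csv(raw):
    """Return human-readable problems found in the raw bytes of a CSV file."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8: {exc}"]

    problems = []
    # Only the first row (the header) is inspected here.
    header = next(csv.reader(io.StringIO(text)), [])
    for name in header:
        if name != name.strip():
            problems.append(f"header {name!r} has surrounding spaces")

    present = [name.strip() for name in header]
    missing = [name for name in REQUIRED_HEADERS if name not in present]
    if missing:
        problems.append("missing headers: " + ", ".join(missing))
    return problems


if __name__ == "__main__":
    sample = b"qid, question,answer\n1,what is AI?,\n"
    for problem in diagnose_csv(sample):
        print(problem)
```

To check a real file, pass the result of pathlib.Path(path).read_bytes(); an empty list means none of these three problems were found.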
We welcome contributions! Please read our Contributing Guidelines for details on our code of conduct and the process for submitting pull requests.
This project is licensed under the terms included in the LICENSE file.
Please read our Code of Conduct to understand our community guidelines and expectations.