This SDK provides Python code that helps data scientists query MongoDB and bulk download data and files directly from the SEARS backend for aggregated analysis. Case studies 6.1 and 6.2 from our main paper were conducted using this SDK.
Please refer to our main SEARS platform repository here.
- Copy the `.env` file to the root directory of the project. Update the connection string to use your own MongoDB Atlas connection string, and update the AWS S3 parameters to match your AWS settings.
- Install all requirements using `pip3 install -r requirements.txt`.
- Run `python3 mongo_connect.py` to download data from MongoDB to a CSV file. Set `search_criteria` and `output_file_name` in the program file. Please note that `mongo_connect.py` has been customized to access our own schema in SEARS. As you adapt SEARS to your own needs, you may need to modify the code to suit your schema, so please use this file as a reference for writing your own MongoDB query code. We encourage the use of AI agents to achieve this goal. The easiest way to get a copy of the schema is from the SEARS dashboard: next to any experiment appearing in the dashboard, click the "W" icon button. This downloads the schema of the experiment in JSON format, which you can use to write your own MongoDB queries.
- Run `python3 AWS_Download.py` to download files from AWS S3 to the local directory `./file_fetch/`. All files related to experiments meeting the search criteria will be downloaded.
- Run your ML model on the downloaded data and files.
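
For readers adapting the query step to their own schema, the following is a minimal sketch of a MongoDB-to-CSV export. It is not the code in `mongo_connect.py`. The environment variable names `MONGO_URI` and `MONGO_DB`, the helper names `flatten`, `write_csv`, and `download`, and the choice of the `productData` collection as the query target are all assumptions made for illustration. It also assumes the values from `.env` have already been loaded into the process environment.

```python
import csv
import os


def flatten(doc, parent_key="", sep="."):
    """Flatten nested dicts so that each leaf value becomes one CSV column."""
    flat = {}
    for key, value in doc.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def write_csv(documents, output_file_name):
    """Write an iterable of documents to a CSV file and return the row count."""
    rows = [flatten(doc) for doc in documents]
    # Use the union of all keys as columns, in first-seen order.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    with open(output_file_name, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)  # missing keys are written as empty cells
    return len(rows)


def download(search_criteria, output_file_name):
    """Query MongoDB and export the matching documents (hypothetical helper)."""
    # Imported here so the helpers above can be used without pymongo installed.
    from pymongo import MongoClient

    # MONGO_URI and MONGO_DB are assumed variable names, not taken from the SDK.
    client = MongoClient(os.environ["MONGO_URI"])
    try:
        # Assumption: experiments are stored in the "productData" collection.
        collection = client[os.environ["MONGO_DB"]]["productData"]
        return write_csv(collection.find(search_criteria), output_file_name)
    finally:
        client.close()
```

A call such as `download({"experiment": "Thickness"}, "thickness.csv")` would then export the matching documents, where the `experiment` field is a placeholder to be replaced with a field from the schema downloaded from the dashboard.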
# Steps
- Notice the folder `./uploads` in the root directory of the project. This folder is used to upload data to MongoDB.
- Drop the data for an experiment into the `./uploads` folder. The data should be in the form of a JSON file.
- Run `python3 auto_upload.py` to upload the data to MongoDB. The program will automatically upload the data to the MongoDB collection `productData`.
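
The upload steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of `auto_upload.py`: the helper names `load_json_documents` and `upload` and the environment variable names `MONGO_URI` and `MONGO_DB` are hypothetical, and the sketch assumes each JSON file holds either one document or an array of documents.

```python
import json
import os
from pathlib import Path


def load_json_documents(folder="./uploads"):
    """Read every *.json file in `folder` and return a list of documents."""
    documents = []
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            payload = json.load(handle)
        # A file may hold a single document (object) or several (array).
        if isinstance(payload, list):
            documents.extend(payload)
        else:
            documents.append(payload)
    return documents


def upload(folder="./uploads"):
    """Insert all documents found in `folder` into productData (hypothetical helper)."""
    # Imported here so load_json_documents works without pymongo installed.
    from pymongo import MongoClient

    documents = load_json_documents(folder)
    if not documents:
        return 0  # insert_many rejects an empty list, so stop early

    # MONGO_URI and MONGO_DB are assumed variable names, not taken from the SDK.
    client = MongoClient(os.environ["MONGO_URI"])
    try:
        collection = client[os.environ["MONGO_DB"]]["productData"]
        result = collection.insert_many(documents)
        return len(result.inserted_ids)
    finally:
        client.close()
```

Separating file parsing from the database call makes it possible to check the contents of `./uploads` locally before anything is written to MongoDB.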
- Pool all your upload files in a single folder. The files should be in CSV format. Run `CSV_Validator.ipynb` to validate the CSV files. This ensures that the files are in the correct format for upload. Address any errors that the validator reports.
- Go to the SEARS dashboard.
- Click on the "+" button next to any "Experiment". A dialog box will appear.
- Navigate to the correct experiment tab in the dialog box, e.g. "Thickness".
- Drag and drop files into the upload area. You can upload multiple files at once.
- Click on the "Save Data" button to finish.
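
Before uploading through the dashboard, a basic structural check of the pooled CSV files might look like the sketch below. It does not reproduce the rules in `CSV_Validator.ipynb`, which are not described here. The checks (non-empty file, no blank or duplicate column names, consistent field count per row) and the helper names `validate_csv` and `validate_folder` are assumptions chosen for illustration.

```python
import csv
from pathlib import Path


def validate_csv(path):
    """Return a list of human-readable problems found in one CSV file."""
    problems = []
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header is None:
            return ["file is empty"]
        names = [name.strip() for name in header]
        if any(not name for name in names):
            problems.append("header contains a blank column name")
        if len(set(names)) != len(names):
            problems.append("header contains duplicate column names")
        for row in reader:
            if not row:
                continue  # skip completely blank lines
            if len(row) != len(header):
                # reader.line_num is the last source line read for this row.
                problems.append(
                    f"line {reader.line_num}: expected {len(header)} fields, "
                    f"found {len(row)}"
                )
    return problems


def validate_folder(folder):
    """Validate every *.csv file in `folder`; maps file name to its problems."""
    return {
        path.name: validate_csv(path)
        for path in sorted(Path(folder).glob("*.csv"))
    }
```

An empty list for a file means none of these assumed checks failed. It does not guarantee that the file matches the schema the dashboard expects.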