baskargroup/SEARS-Data-Pull

SEARS SDK

This SDK provides Python code that helps data scientists query the SEARS MongoDB backend and bulk-download experiment data and files for aggregated analysis. Case studies 6.1 and 6.2 in our main paper were conducted using this SDK.

Main SEARS platform

Please refer to our main SEARS platform repository here.

Steps to pull data

  1. Copy the .env file to the root directory of the project. Replace the connection string with your own MongoDB Atlas connection string, and update the AWS S3 parameters to match your AWS settings.
  2. Install all requirements using pip3 install -r requirements.txt.
  3. Run python3 mongo_connect.py to download data from MongoDB to a CSV file. Set search_criteria and output_file_name in the program file before running it. Note that mongo_connect.py is customized for our own SEARS schema. If you adapt SEARS to your own needs, you may need to modify the code to match your schema, so treat this file as a reference for writing your own MongoDB query code. We encourage using AI agents for this. The easiest way to get a copy of the schema is from the SEARS dashboard: next to any experiment in the dashboard, click the "W" icon button to download that experiment's schema in JSON format. You can then use this schema to write your own MongoDB queries.
  4. Run python3 AWS_Download.py to download files from AWS S3 to a local directory ./file_fetch/. All files related to experiments meeting the search criteria will be downloaded.
  5. Run your ML model on the downloaded data and files.
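If you write your own query code instead of adapting mongo_connect.py, the pattern in step 3 can be sketched as follows. This is a minimal sketch, not the SDK's implementation: the environment variable name MONGO_URI, the database and collection names, and the choice to flatten nested sub-documents into dot-separated CSV columns are all assumptions made for illustration.

```python
"""Sketch: query SEARS documents with pymongo and save them as a flat CSV.

Hypothetical assumptions (not taken from mongo_connect.py): the connection
string is read from an environment variable named MONGO_URI, and the database
and collection names are supplied by the caller.
"""
import csv
import os
from typing import Any, Dict, Iterable, List


def flatten(doc: Dict[str, Any], parent: str = "") -> Dict[str, Any]:
    """Turn nested sub-documents into dot-separated keys, e.g. {"a.b": 1}.

    Lists are kept as single values; expand them yourself if your schema needs it.
    """
    flat: Dict[str, Any] = {}
    for key, value in doc.items():
        path = f"{parent}.{key}" if parent else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def write_csv(docs: Iterable[Dict[str, Any]], output_file_name: str) -> int:
    """Write flattened documents to CSV and return the number of rows written."""
    rows = [flatten(doc) for doc in docs]
    # Union of all keys in first-seen order, so documents with missing fields still fit.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    with open(output_file_name, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)  # missing fields become ""
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


def fetch_documents(db_name: str, collection_name: str,
                    search_criteria: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Run find(search_criteria); needs pymongo, python-dotenv and network access."""
    # Imported lazily so flatten/write_csv can be used without these packages.
    from dotenv import load_dotenv
    from pymongo import MongoClient

    load_dotenv()  # copy variables from the .env file into os.environ
    client = MongoClient(os.environ["MONGO_URI"])  # hypothetical variable name
    try:
        return list(client[db_name][collection_name].find(search_criteria))
    finally:
        client.close()
```

A call might look like write_csv(fetch_documents("<database>", "<collection>", {"<field.path>": "<value>"}), "output.csv"). MongoDB matches nested fields through dot-notation keys, so field paths read from the downloaded experiment schema can be used directly in search_criteria.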
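The S3 download in step 4 can be sketched with boto3 as shown below. This is an illustrative sketch rather than the logic of AWS_Download.py: it assumes, hypothetically, that each experiment's files are stored under an S3 prefix equal to its experiment ID, and that AWS credentials come from boto3's standard configuration (environment variables or ~/.aws). Adjust the prefix rule to your own bucket layout.

```python
"""Sketch: download all S3 objects for selected experiments into ./file_fetch/.

Hypothetical assumptions (not taken from AWS_Download.py): files for an
experiment live under the prefix "<experiment_id>/", and credentials and
region come from the standard boto3 configuration chain.
"""
from pathlib import Path
from typing import Iterable, List


def local_path_for(key: str, dest_dir: str = "./file_fetch") -> Path:
    """Map an S3 key to a path inside dest_dir, rejecting keys that escape it."""
    root = Path(dest_dir).resolve()
    target = (root / key).resolve()
    if target != root and root not in target.parents:
        raise ValueError(f"unsafe S3 key: {key!r}")
    return target


def download_experiment_files(bucket: str, experiment_ids: Iterable[str],
                              dest_dir: str = "./file_fetch") -> List[Path]:
    """Download every object under each experiment prefix; returns local paths."""
    import boto3  # imported lazily; requires AWS credentials and network access

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")  # handles >1000 objects
    saved: List[Path] = []
    for exp_id in experiment_ids:
        for page in paginator.paginate(Bucket=bucket, Prefix=f"{exp_id}/"):
            for obj in page.get("Contents", []):
                key = obj["Key"]
                if key.endswith("/"):  # skip zero-byte "folder" placeholder objects
                    continue
                target = local_path_for(key, dest_dir)
                target.parent.mkdir(parents=True, exist_ok=True)
                s3.download_file(bucket, key, str(target))
                saved.append(target)
    return saved
```

The experiment IDs would typically come from the same search criteria used for the CSV export, so that the downloaded files line up with the exported rows. The path check keeps a malformed key such as "../x" from writing outside ./file_fetch/.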

Process to automate the upload of experiment data to MongoDB

Steps

  1. The ./uploads folder in the root directory of the project is the staging area for data to be uploaded to MongoDB.
  2. Place the data for each experiment in ./uploads as a JSON file.
  3. Run python3 auto_upload.py. The program uploads the data to the MongoDB collection productData.
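The upload loop can be sketched roughly as follows. It is a minimal sketch, not the code in auto_upload.py: it assumes, hypothetically, that each JSON file holds either one experiment document or a list of them, that the connection string is in an environment variable named MONGO_URI, and that the database name is supplied by the caller. Only the collection name productData comes from the steps above.

```python
"""Sketch: upload every JSON file in ./uploads to the productData collection.

Hypothetical assumptions (not taken from auto_upload.py): each file holds one
JSON object or a list of objects, the connection string is in MONGO_URI, and
files are left in place after upload.
"""
import json
import os
from pathlib import Path
from typing import Any, Dict, List, Tuple


def load_upload_folder(folder: str = "./uploads") -> List[Tuple[Path, List[Dict[str, Any]]]]:
    """Parse each *.json file and return (path, documents) pairs in name order."""
    batches: List[Tuple[Path, List[Dict[str, Any]]]] = []
    for path in sorted(Path(folder).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        docs = data if isinstance(data, list) else [data]
        if not all(isinstance(doc, dict) for doc in docs):
            raise ValueError(f"{path.name}: expected a JSON object or a list of objects")
        batches.append((path, docs))
    return batches


def upload(db_name: str, folder: str = "./uploads") -> int:
    """Insert all parsed documents into productData; returns the insert count."""
    from pymongo import MongoClient  # imported lazily; needs network access

    batches = load_upload_folder(folder)  # parse everything before inserting anything
    client = MongoClient(os.environ["MONGO_URI"])  # hypothetical variable name
    inserted = 0
    try:
        collection = client[db_name]["productData"]
        for _path, docs in batches:
            if docs:  # insert_many rejects an empty list
                inserted += len(collection.insert_many(docs).inserted_ids)
    finally:
        client.close()
    return inserted
```

Parsing every file before the first insert means one malformed JSON file aborts the run without leaving a partial upload from that batch of files.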

Manual upload of data to SEARS on a per-experiment basis (SOP)

  1. Collect all your upload files in a single folder. Each file must be a CSV file. Run CSV_Validator.ipynb to validate the CSV files; this ensures they are in the correct format for upload. Fix any errors that the validator reports.
  2. Go to the SEARS dashboard.
  3. Click the "+" button next to any "Experiment". A dialog box will appear.
  4. Open the relevant experiment tab in the dialog box (e.g., "Thickness").
  5. Drag and drop the files into the upload area. You can upload multiple files at once.
  6. Click on the "Save Data" button to finish.
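The kind of pre-upload check performed in step 1 can be sketched in plain Python. This does not reproduce CSV_Validator.ipynb: the column names in REQUIRED_COLUMNS and NUMERIC_COLUMNS are hypothetical placeholders, and the two rules shown (required header columns present, numeric columns parse as numbers) are illustrative examples of checks you might run before dragging files into the dashboard.

```python
"""Sketch of a pre-upload CSV check, in the spirit of CSV_Validator.ipynb.

The rules and column names are hypothetical: every file must have a header
containing REQUIRED_COLUMNS, and every value in NUMERIC_COLUMNS must parse
as a number.
"""
import csv
from pathlib import Path
from typing import List

REQUIRED_COLUMNS = ["sample_id", "thickness_nm"]  # hypothetical column names
NUMERIC_COLUMNS = ["thickness_nm"]  # hypothetical


def validate_csv(path: Path) -> List[str]:
    """Return human-readable errors for one file; an empty list means it passed."""
    errors: List[str] = []
    with path.open(newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        header = reader.fieldnames or []  # None for an empty file
        missing = [col for col in REQUIRED_COLUMNS if col not in header]
        if missing:
            return [f"{path.name}: missing columns {missing}"]
        row_count = 0
        for row_no, row in enumerate(reader, start=1):  # data rows, header excluded
            row_count += 1
            for col in NUMERIC_COLUMNS:
                value = (row.get(col) or "").strip()
                try:
                    float(value)
                except ValueError:
                    errors.append(f"{path.name}: data row {row_no}: {col}={value!r} is not a number")
        if row_count == 0:
            errors.append(f"{path.name}: no data rows")
    return errors


def validate_folder(folder: str) -> List[str]:
    """Validate every *.csv file in the folder and collect all errors."""
    errors: List[str] = []
    for path in sorted(Path(folder).glob("*.csv")):
        errors.extend(validate_csv(path))
    return errors
```

Collecting every error in one pass, instead of stopping at the first bad file, makes it easier to fix the whole upload folder before opening the dashboard.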

About

This repo contains an SDK to access data directly from the SEARS backend.