Skip to content

beatybiodiversitymuseum/specify-edit-bot

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

32 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Specify Edit Bot

This tool allows for bulk editing of a given table in Specify 7 via id numbers.

Setup

Before using the script, a dedicated specify user should be created in the instance and given a descriptive name that identifies it as a bot (such as bulk_edit_bot). It should be given the lowest permissions required to perform its task. Usually this is update or delete permissions in the target table, and read access to any tables which are required for that table to load.

Clone the repository

To get a copy of all the required code, you can clone this repository to your machine

git clone https://github.com/beatybiodiversitymuseum/specify-edit-bot.git
cd specify-edit-bot

Dependencies

All required dependencies are listed in requirements.txt. They can be installed via pip install -r requirements.txt. A virtual environment is recommended.

Secrets

The bot requires login credentials, however, these should be kept safe and not committed to git. They are thus defined in an .env in the root of the repository. The .env file should look like this, in which {bot_username} and {bot_password} are replaced with the values of the bot account

SP7_USERNAME = {bot_username}
SP7_PASSWORD = {bot_password}

Input data

Input data should be defined as a csv file. The left-most column should contain id numbers, and be called id. The other columns should represent the fields you wish to edit, and be named to match the database schema. For example, editing some records in collectionobject may look like this

id remarks text1 cataloger
4523 Edits made by a bot test 1 2 /api/specify/agent/3

Note that relationships to other tables just need to be defined by their appropriate API route. Edits to fields within the same table just need to adhere to business rule and type requirements for that field.

Depending on if your data file is csv or tab delimited, you may need to change the sep= argument within pd.read_csv in helpers/data.py. Tab delimited files should use sep='\t' while csv delimited files should not need the sep= argument at all.

Usage

All non-secret variables that need to be adjusted to define the task are in main(). They are outlined here:

Variable Description
instance_url The base url for your instance. For example, https://sp7demofish.specifycloud.org. It should not have a trailing slash.
method Either edit or delete. If the method is edit, then the program will edit existing values based on what is defined in the input table. If the method is delete, then a list of ids is all that is required (if data is present it will be ignored), and the program will attempt to delete records with that id from the specified table. Note that Specify has delete blockers which may prevent a record from being deleted if it has dependents. The program will stop with an error if an API call results in a non-successful status code.
table The table to perform the edits on. Must match the table name as defined in the API documentation. For example, collectionobject
collectionid The collection id that defines the collection to edit. Required for logging in.
input_data_filepath The filepath which the program should look to for the input data. Default is data/data.csv. If you do not use this path, you should add the path to .gitignore if it is within the repository.

Once the variables above have been defined, you may run main.py. A progress bar will display in the terminal to show the estimated time to complete the edits. To avoid excess strain on the specify instance, the program is set to wait 4.5 seconds between api calls.

Logging

By default, the program will print logs to the logging directory, relative to the repository. This repository is ignored by git. This will give timestamps for each edit or deletion action performed.

Testing

To run unit tests, simply run pytest from the terminal prompt. The tests will run against the sp7demofish instance and use data defined in tests/data/test-data.csv, not the instance you have configured through environment variables.

About

Make bulk changes to a specify7 instance from a csv file

Topics

Resources

Stars

Watchers

Forks

Languages