This library provides an object-oriented API for Amazon Mechanical Turk operations, written in Python. It has three distinctive features:
- All major components of the MTurk APIs (Workers, HITs, Assignments, Qualifications) are objects, and you perform operations by calling methods on them.
- More importantly, the API exposes the relations between objects; for example, you can call worker_instance.qualifications to get that worker's qualifications.
- All data is cached locally in a SQLite database. This allows you to query the relations through the ORM API or directly in SQL, and avoid making (relatively) time-consuming queries to the AWS endpoints.
Under the hood, the library uses Amazon's boto3 and the Peewee ORM library.
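As a rough illustration of what "querying the relations directly in SQL" can look like, here is a minimal sketch using Python's standard sqlite3 module against a throwaway in-memory database. The table and column names (worker, qualification, worker_qualification) and the qualifications_for helper are hypothetical, invented for this example; the library's real schema may differ, so inspect your own database file before writing queries.

```python
import sqlite3

# Hypothetical schema for illustration only; the tables and columns that
# the library actually creates may be named differently.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE worker (id TEXT PRIMARY KEY);
    CREATE TABLE qualification (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE worker_qualification (
        worker_id TEXT REFERENCES worker(id),
        qualification_id TEXT REFERENCES qualification(id)
    );
    """
)

# Made-up sample data: two workers, two qualifications.
conn.executemany("INSERT INTO worker VALUES (?)", [("W1",), ("W2",)])
conn.executemany(
    "INSERT INTO qualification VALUES (?, ?)",
    [("Q1", "pilot"), ("Q2", "approved")],
)
conn.executemany(
    "INSERT INTO worker_qualification VALUES (?, ?)",
    [("W1", "Q1"), ("W1", "Q2"), ("W2", "Q1")],
)


def qualifications_for(connection, worker_id):
    """Return the names of a worker's qualifications, sorted alphabetically."""
    rows = connection.execute(
        """
        SELECT q.name
        FROM qualification AS q
        JOIN worker_qualification AS wq ON wq.qualification_id = q.id
        WHERE wq.worker_id = ?
        ORDER BY q.name
        """,
        (worker_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(qualifications_for(conn, "W1"))
```

Because the join runs against the local cache, a lookup like this needs no request to the AWS endpoints.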
You might not want this library. You should first consider using Amazon's boto3 library directly, or maybe some scripts that operate on text files.
In my case, I found that, as my data grew, the text files got cumbersome to manage and also made it hard to take advantage of the natural relations inherent to the data (e.g., a worker has a qualification, an assignment belongs to a HIT). This library is my solution to that specific need.
You'll need to set up your AWS credentials.
pip install git+https://github.com/nmalkin/objective-turk.git
There are three main configuration options you need to set before using the library:
- Are you running in the Sandbox or in production?
- Which AWS account/config are you using?
- Where will the database file be stored?
You can set these by passing appropriate values to the objective_turk.init function, or you can set them using environment variables. The environment variables are described below.
Set MTURK_PRODUCTION=true to run in production; otherwise, the code will use the sandbox.
Set AWS_PROFILE=<your-profile-name> or use one of the alternate AWS configuration methods.
If you have a default profile set up, you can skip this step, but note that this variable is also used to determine the database filename: by default it is <AWS_PROFILE>_<environment>.db.
By default, the database will be stored in the current working directory, but you can change that by setting MTURK_DB_PATH=<path>.
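If you prefer to set these variables from Python rather than from your shell, something like the following sketch should work, provided it runs before the library is initialized. The configure_environment helper, the profile name, and the path are all hypothetical, made up for this example; only the three variable names come from the description above.

```python
import os


def configure_environment(production=False, profile=None, db_path=None):
    """Set the environment variables described above (hypothetical helper)."""
    if production:
        os.environ["MTURK_PRODUCTION"] = "true"
    else:
        # Leaving the variable unset means the sandbox is used.
        os.environ.pop("MTURK_PRODUCTION", None)
    if profile is not None:
        os.environ["AWS_PROFILE"] = profile
    if db_path is not None:
        os.environ["MTURK_DB_PATH"] = db_path


# Example values only: substitute your own profile name and path.
configure_environment(
    production=False,
    profile="my-research-profile",
    db_path="/tmp/mturk-data",
)
```

Defaulting to the sandbox here is a deliberate choice: production should be something you opt into explicitly, since mistakes there cost real money.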
import objective_turk
objective_turk.init()
# TODO