Scripts for metadata validation
This script listens for messages from the ingest API messaging queue and validates each metadata document created in the ingest service:

```
python validation-app.py
```
Metadata JSON documents are validated against specified schemas. A list of the currently applicable schemas is located here. Each document must specify the schema against which it is to be validated: the URL of the schema must be located at the JSON path `core.schema_url` within the JSON document. After validation has completed, a validation report is attached to the metadata document living in the ingestion-infrastructure database at api.ingest.dev.data.humancellatlas.org.
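For illustration, the core of that flow might look roughly like the sketch below. This is not the actual implementation; it assumes the `requests` and `jsonschema` packages, and the function name and report shape are hypothetical.

```python
# Hypothetical sketch of the validation flow, not the actual implementation.
import requests
from jsonschema import Draft4Validator


def validate_metadata(document: dict) -> dict:
    # The document declares the schema it should be validated against
    # at the JSON path core.schema_url.
    schema_url = document['core']['schema_url']
    schema = requests.get(schema_url).json()

    # Collect every validation error instead of stopping at the first one.
    errors = [error.message for error in Draft4Validator(schema).iter_errors(document)]

    # A report along these lines would then be attached to the metadata
    # document in the ingest database (report shape is illustrative).
    return {
        'validation_state': 'VALID' if not errors else 'INVALID',
        'validation_errors': errors,
    }
```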
To facilitate development, it is recommended to set up a Python virtual environment tool, particularly `virtualenv`, to manage the packages and dependencies necessary to develop, test, and run the code. The documentation for setting up `virtualenv` can be found in Python 3's official documentation.
There are two general approaches to managing virtual environments on the local machine. One is to set up a common repository for all virtual environments, such as `$HOME/.venvs` (there is no widely accepted naming convention for this). The other is to set one up locally in the root directory of the project. Further instructions on how to create and manage virtual environments through `virtualenv` can be found in the documentation mentioned above.
In this repository, the `venv` directory has been set up to be ignored by version control. It can be used to contain virtual environments for the local development machine.
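For illustration, creating and using a virtual environment in that directory might look like the following (commands assume a Unix-like shell and are illustrative only):

```
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
deactivate
```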
To install the modules required to develop and run this codebase, run:

```
pip install -r requirements.txt
```

The `requirements.txt` file specifies the list of modules that need to be installed to successfully run the code.
When new modules are added through `pip`, it is advised to keep the list up to date using `freeze`:

```
pip freeze > requirements.txt
```

This command overwrites the file with the full list of required modules, including any newly installed ones, pinned to the versions used in the local environment.
Test-driven development is observed for this project, and unit tests are required for any fairly complex part of the system. Unit tests are written using Python's `unittest` module and can be run using the built-in test runner.
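For example, a minimal test module, mirroring the hypothetical arithmetic example used below, might look like this (module and test names are illustrative):

```python
# tests/test_arithmetic_operations.py -- illustrative example only.
import unittest


class TestArithmeticOperations(unittest.TestCase):

    def test_addition(self):
        self.assertEqual(1 + 2, 3)


if __name__ == '__main__':
    unittest.main()
```

Such a module can be run with the built-in runner, e.g. `python -m unittest tests.test_arithmetic_operations`.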
However, for finer-grained control over testing, `nosetests` can also be used. For one, `nosetests` can run a single test from any given test module. For example, to run the `test_addition` test method in the `test_arithmetic_operations` module, the following can be used:

```
nosetests tests.test_arithmetic_operations:TestArithmeticOperations.test_addition
```
In the sample above, `TestArithmeticOperations` is the name of the test case (i.e. the Python class that extends `unittest.TestCase`). More options can be found on the official nose documentation website.
Being able to run a single test from any given test case means it is possible to debug a specific block of code using any IDE's debugging tools. JetBrains' PyCharm in particular supports single-test runs through a few selected tools, one of which is `nosetests`. To set `nosetests` as the default test runner in PyCharm, set the Default test runner option in the Tools > Python Integrated Tools pane of the IDE preferences.