This application:
- listens for messages from the elife-bot
- downloads XML from S3 via HTTP
- converts XML to a mostly complete representation of our article-json schema
- sends article-json to Lax to be ingested
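Taken together, the four steps above form a small pipeline. The Python below is a minimal sketch of that flow, not the adaptor's actual code. The message shape (an "id" plus a "location" URL for the XML) and every function name here are hypothetical. Listening for messages is left out, and the conversion and Lax steps are injected as plain functions.

```python
import urllib.request
from typing import Callable


def http_fetch(url: str) -> bytes:
    """Download the XML over plain HTTP(S), as the README describes for S3."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def handle_message(
    message: dict,
    convert: Callable[[bytes], dict],
    send_to_lax: Callable[[str, dict], None],
    fetch: Callable[[str], bytes] = http_fetch,
) -> dict:
    """Process one (hypothetical) elife-bot message: download, convert, send.

    Assumed message shape: {"id": "<article id>", "location": "<URL of the XML>"}.
    """
    xml_bytes = fetch(message["location"])    # download XML via HTTP
    article_json = convert(xml_bytes)         # XML -> mostly complete article-json
    send_to_lax(message["id"], article_json)  # hand the result to Lax for ingestion
    return article_json
```

Because each step is passed in, any of them can be replaced with a stub, for example to exercise the flow without network access or a running Lax.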
To install:
$ ./install.sh
The bot-lax-adaptor comes with a simple web interface that allows uploading eLife JATS XML, generating article-json from it and then validating it.
$ ./web.sh
See example-upload-file-to-api.sh.
To convert a single JATS XML file to article-json:
$ source venv/bin/activate
$ python src/main.py /path/to/a/jats.xml
Output at the time of writing looks like this.
A thin wrapper around the above command:
$ ./scrape-article.sh ./article-xml/articles/elife-09560-v1.xml
Converts a random article to article-json:
$ ./scrape-random-article.sh
Converts all articles in the ./article-xml/articles/ directory, writing the results to ./article-json/. This script makes use of all available cores:
$ ./generate-article-json.sh
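The following Python is a rough sketch of what "makes use of all available cores" can look like. It is not the contents of generate-article-json.sh. It assumes it is run from the repository root with the virtualenv active, and that src/main.py prints the article-json to stdout. The output naming (elife-09560-v1.xml becomes elife-09560-v1.xml.json) follows the validate-json.sh example further down. The helper names are hypothetical.

```python
import subprocess
import sys
from multiprocessing import Pool
from pathlib import Path

XML_DIR = Path("./article-xml/articles")
JSON_DIR = Path("./article-json")


def json_path_for(xml_path: Path, out_dir: Path = JSON_DIR) -> Path:
    """elife-09560-v1.xml -> <out_dir>/elife-09560-v1.xml.json"""
    return out_dir / (xml_path.name + ".json")


def convert(xml_path: Path) -> Path:
    # Assumption: src/main.py writes the article-json to stdout.
    result = subprocess.run(
        [sys.executable, "src/main.py", str(xml_path)],
        capture_output=True, text=True, check=True,
    )
    out = json_path_for(xml_path)
    out.write_text(result.stdout)
    return out


def convert_all(xml_dir: Path = XML_DIR) -> list:
    JSON_DIR.mkdir(parents=True, exist_ok=True)
    xml_files = sorted(xml_dir.glob("*.xml"))
    # Pool() starts os.cpu_count() worker processes by default: one per core.
    with Pool() as pool:
        return pool.map(convert, xml_files)
```

Call convert_all() from inside an `if __name__ == "__main__":` guard so that multiprocessing works on platforms that spawn rather than fork worker processes.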
The article-json generated by this application is structured according to the eLife json-schema article specification.
Because the XML only contains a partial representation of an article, validation also involves filling in certain gaps that can only be provided by Lax.
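One way to picture "filling in certain gaps" is a recursive merge. Any key missing from the generated article-json gets a placeholder value, and keys that are already present are left alone. The sketch below assumes this approach. The placeholder fields and values are hypothetical and illustrative only. The actual gaps and how validate.py fills them are defined by the project and Lax, not by this example.

```python
import copy

# Hypothetical placeholders standing in for values that only Lax can provide.
PLACEHOLDERS = {
    "article": {
        "stage": "published",
        "versionDate": "2000-01-01T00:00:00Z",
    }
}


def fill_gaps(article: dict, placeholders: dict) -> dict:
    """Return a copy of `article` with missing keys taken from `placeholders`.

    Nested dicts are merged recursively; values already present in `article`
    are never overwritten, and the input dict is not modified.
    """
    merged = copy.deepcopy(article)
    for key, value in placeholders.items():
        if key not in merged:
            merged[key] = copy.deepcopy(value)
        elif isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = fill_gaps(merged[key], value)
    return merged
```

The merged document, not the raw output, is what would then be checked against the schema.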
To validate a single article-json file:
$ source venv/bin/activate
$ python src/validate.py /path/to/an/article.json
A thin wrapper around the above command:
$ ./validate-json.sh ./article-json/elife-09560-v1.xml.json
Validates all article-json in the ./article-json/ directory. This script makes use of all available cores:
$ ./validate-all-json.sh
This generates, validates and then performs an ingest --force to Lax for each article in the article-xml repository.
$ ./backfill.sh
The generation, validation and ingest actions happen in separate steps for greater parallelism.
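Running the steps as separate stages means each stage can be spread across the whole batch of articles at once, instead of walking each article through all three steps in turn. The Python below is a minimal sketch of that staged pattern. It is not how backfill.sh is written. The stage functions are hypothetical callables that return True on success. Dropping an article from later stages after a failure is a choice made for this sketch. Threads are used on the assumption that each stage mostly waits on an external command, such as one of the scripts above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional


def run_staged(
    items: Iterable[str],
    stages: List[Callable[[str], bool]],
    max_workers: Optional[int] = None,
) -> List[str]:
    """Run each stage over every item before starting the next stage.

    Within a stage, items are processed in parallel. Items whose stage
    returns False are not passed to later stages. Returns the items that
    completed every stage, in their original order.
    """
    current = list(items)
    for stage in stages:
        # Leaving the `with` block waits for the whole stage to finish.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            results = list(pool.map(stage, current))
        current = [item for item, ok in zip(current, results) if ok]
    return current
```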
This reads a list of article IDs from a file and then generates, validates and performs an ingest --force to Lax for each article sequentially. It can be quite slow for a large number of articles.
$ ./backfill-many.sh
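For contrast, the sequential approach can be sketched as follows. It reads IDs from a file and then takes one article at a time through every step. The file format assumed here is one article ID per line, with blank lines ignored. The real format expected by backfill-many.sh may differ, and the step functions are hypothetical callables that return True on success.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def parse_article_ids(text: str) -> List[str]:
    """Assumed format: one article ID per line; blank lines are skipped."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_article_ids(path: Path) -> List[str]:
    return parse_article_ids(path.read_text())


def backfill_sequentially(
    article_ids: Iterable[str],
    steps: List[Callable[[str], bool]],
) -> List[str]:
    """Run generate -> validate -> ingest for one article at a time.

    An article stops at its first failing step. Returns the failed IDs.
    """
    failed = []
    for article_id in article_ids:
        for step in steps:
            if not step(article_id):
                failed.append(article_id)
                break
    return failed
```

Total runtime is roughly the sum of every step for every article, which explains why this is slow for large lists.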
The listener, which receives messages from the elife-bot, is quite eLife-specific but can easily be modified by a developer:
$ ./bot-lax-listener.sh
To run the tests:
$ ./test.sh
Copyright 2023 eLife Sciences. Licensed under the GPLv3.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.