Dockerized papyri.info stack.
Clone with:
git clone --recurse-submodules https://github.com/dcthree/papyri-docker
First, obtain a GitHub Personal Access Token with package registry permissions (see "Creating a personal access token") and set it as the environment variable GITHUB_TOKEN for the docker compose process. You'll also need to set the environment variable GITHUB_USERNAME to your GitHub username. There are a variety of ways to set these environment variables, including an unversioned .env file in the directory where you've cloned this repository. These environment variables must be available for the navigator container to successfully build packages.
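As a minimal sketch (not part of this repository), a pre-flight check like the following can confirm that both variables are visible to the shell before building. The function name check_github_env is hypothetical, and the .env values shown in the comments are placeholders, not real credentials.

```shell
# Example .env contents (placeholder values only):
#   GITHUB_USERNAME=your-github-username
#   GITHUB_TOKEN=your-personal-access-token

# check_github_env: hypothetical helper, not provided by this repository.
# Returns 0 if both variables are set and non-empty in the current shell,
# otherwise prints the missing names to stderr and returns 1.
check_github_env() {
  missing=0
  for name in GITHUB_TOKEN GITHUB_USERNAME; do
    # Indirect lookup of the variable named by $name (POSIX sh compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Possible usage:
#   check_github_env && docker compose build
```

Note that this only inspects the shell environment. If the values live only in a .env file, the check reports them as missing unless the file is also loaded into the shell, for example with `set -a; . ./.env; set +a`, which assumes the file contains only simple KEY=value lines.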
Then, from this repository's directory:
docker compose build
docker compose up -d
- Watch logs in a separate terminal in the same directory:
docker compose logs -f -t
- If all is successful, you should be able to access the running copy once httpd comes up at: http://127.0.0.1:8000
- Disk Space: after bringing up a complete stack, my docker system df shows 40GB of images, 1GB of containers, and 26GB of volumes (67GB total). You may need to increase the default disk allocation if you're running e.g. Docker for Mac.
- Network Port: if another service is already bound to port 8000, httpd will fail to come up. If this happens, you can just stop the other service and run docker compose up -d again.
- Memory: I have 16GB of RAM, 1GB of swap, and 6 VCPUs allocated to Docker. Bringing this up makes my system quite slow...
- Initial Indexing: if something goes wrong with the indexing process, you may need to use docker compose up -d --force-recreate when re-trying.
- Docker Compose Timeout: the default Docker Compose HTTP timeout of 60 seconds can sometimes cause problems with docker compose up / docker compose stop, due to the delay in responsiveness of some services. If you run into this, prefix the commands with e.g. COMPOSE_HTTP_TIMEOUT=10000.
- GitHub Maven Package Registry Auth: if you get a 401 Unauthorized error from trying to build dispatch.war or sync.war when you run docker compose up navigator, you may have an invalid GitHub Personal Access Token (basic) due to token expiration or invalid scope. Try using a new token following the instructions above.
- Want to start over from scratch?: run docker compose down -v.
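For the 401 Unauthorized case, one way to inspect a token is to look at the X-OAuth-Scopes response header that the GitHub REST API returns for classic personal access tokens. The sketch below assumes a classic token and assumes that read:packages is the scope the package registry requires; the function name token_scopes_include is hypothetical, and fine-grained tokens may not report scopes this way.

```shell
# token_scopes_include SCOPE: hypothetical helper, not part of this repository.
# Reads HTTP response headers on stdin and returns 0 if the X-OAuth-Scopes
# header lists SCOPE exactly, 1 otherwise (including when the header is absent).
token_scopes_include() {
  wanted=$1
  tr -d '\r' |                      # strip CR from CRLF header lines
    grep -i '^x-oauth-scopes:' |    # header names are case-insensitive
    cut -d: -f2- |                  # keep everything after the header name
    tr ',' '\n' |                   # one scope per line
    sed 's/^ *//; s/ *$//' |        # trim surrounding spaces
    grep -Fxq "$wanted"             # exact, whole-line match
}

# Possible usage (performs a network request against the GitHub API):
#   curl -sS -I -H "Authorization: Bearer $GITHUB_TOKEN" https://api.github.com/user \
#     | token_scopes_include read:packages && echo "read:packages present"
```

This is an exact-match check only: it does not know whether a broader scope on the token would also grant package read access, and it says nothing about token expiration.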
- httpd: Apache 2.2 server, proxies the Navigator, Editor, XSugar, and Fuseki
- indexer: container that runs the indexing process using the below services
- navigator: the main "Papyrological Navigator" server
- fuseki: Apache Jena Fuseki 1.x SPARQL Server (aka "Numbers Server")
- tomcat-pn: Tomcat server running "dispatch" and "sync" servlets
- solr: Tomcat server running Apache Solr for search
- sosol: Puma server serving the Rails Editor (aka "SoSOL") application
- xsugar: container that runs XSugar, an XML transformer used by sosol
- postgres: PostgreSQL 13 server, shared by sosol and tomcat-pn
- repo_clone: shared Git checkout of the large main idp.data repository, shared by navigator, fuseki, tomcat-pn, sosol, & httpd
The papyri.info "Top Level Data Flow" diagram may help with understanding.
Services get started in the following order:
- postgres: no service/startup dependencies
- fuseki: no service/startup dependencies
- xsugar: no service/startup dependencies
- repo_clone: no service/startup dependencies, clones canonical
- navigator: once canonical is cloned and fuseki is up, sets config for solr, builds WAR files for tomcat-pn, runs "mapping" which loads data into fuseki
- solr: once solr config (/opt/solr/server/solr/solr.xml.lock) is in place, written by navigator
- indexer: once fuseki and solr are up and mapping is done, runs "indexing" which loads data into solr
- tomcat-pn: once WAR files are built by navigator and "mapping" is done
- sosol: once canonical is cloned and postgres is available, though some functionality depends on fuseki (as well as "mapping" from navigator) and xsugar
- httpd: once /srv/data/papyri.info/git/navigator/pn-config/pi.conf is in place and the proxied services sosol, xsugar, tomcat-pn, fuseki, and solr are available, Apache is started up as httpd
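The dependencies described above can be written as "before after" pairs and fed to the standard tsort utility to print one valid startup order. This only illustrates the ordering; it is not how docker-compose.yml enforces it. The function names are hypothetical, and the edges are one reading of the list above (the database service is named postgres here, as in the service list).

```shell
# startup_edges: hypothetical helper that prints dependency pairs for tsort.
# Each line "A B" means A must be up before B; "A A" just declares the node A.
# The first four lines are services with no dependencies; the rest are the
# "once X is ..." conditions from the startup list.
startup_edges() {
  cat <<'EOF'
postgres postgres
fuseki fuseki
xsugar xsugar
repo_clone repo_clone
repo_clone navigator
fuseki navigator
navigator solr
fuseki indexer
solr indexer
navigator indexer
navigator tomcat-pn
repo_clone sosol
postgres sosol
sosol httpd
xsugar httpd
tomcat-pn httpd
fuseki httpd
solr httpd
EOF
}

# startup_order: prints one topological ordering of the services, one per line.
startup_order() {
  startup_edges | tsort
}
```

Running startup_order prints each service once, with every service listed after the ones it waits for. Several orderings are valid, so the exact output can differ between tsort implementations.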
Service startup order is important, and the current docker-compose.yml uses several strategies to control it:
- wait-for-it.sh is used to wait for network service availability; indexer uses it to wait for solr startup, sosol uses it to wait for postgres startup
- lockfiles on shared volumes ensure that processes which only need to run once run only once; these lockfiles are also sometimes used as a wait signal for containers that need the process to finish before they can run (these busy-wait until the lockfile exists)
Some containers also use links and depends_on clauses, but these are no longer relied upon to enforce startup order.
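The waiting strategies can be sketched in plain shell. This is an illustrative sketch under assumptions, not the scripts used in this repository: run_once and wait_until are hypothetical helper names, and the lockfile path in the usage comments is a placeholder.

```shell
# run_once LOCKFILE COMMAND...
# Runs COMMAND only if LOCKFILE does not exist yet, and creates LOCKFILE
# after COMMAND succeeds, so the file doubles as a "finished" signal.
run_once() {
  lockfile=$1
  shift
  if [ -e "$lockfile" ]; then
    return 0   # already done on a previous run
  fi
  "$@" && touch "$lockfile"
}

# wait_until TRIES DELAY COMMAND...
# Polls COMMAND up to TRIES times, sleeping DELAY seconds between attempts.
# Returns 0 as soon as COMMAND succeeds, 1 if it never does.
wait_until() {
  tries=$1
  delay=$2
  shift 2
  while [ "$tries" -gt 0 ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    tries=$((tries - 1))
    if [ "$tries" -gt 0 ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Possible usage:
#   run_once /path/to/shared-volume/step.lock some-build-command
#   wait_until 600 1 test -e /path/to/shared-volume/step.lock
#   wait_until 120 5 curl -fsS -o /dev/null http://127.0.0.1:8000
```

Unlike an unbounded busy-wait, wait_until gives up after a fixed number of attempts, which makes a stuck dependency visible as a failure. run_once is not safe against two containers starting the same step at the same instant; it only skips work that has already completed.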
You may note that we have some containers which run as continuous servers, and others which are containerized processes for building artifacts needed by those servers. Categorizing them may be useful:
Servers:
- httpd
- fuseki
- solr
- sosol
- tomcat-pn
- postgres
- xsugar
Processes:
- repo_clone
- navigator
- indexer