Configuration and deployment for MarkLogic as part of the National Archives Find Case Law service

nationalarchives/ds-caselaw-marklogic

The National Archives: Find Case Law

This repository is part of the Find Case Law project at The National Archives.

MarkLogic Database Configuration

This folder specifies the configuration of the MarkLogic database used by the Case Law public access system. It uses ml-gradle to manage and maintain a versioned configuration.

For full details of what can be set in the files here, see the ml-gradle documentation. The file layout is explained in the project layout documentation.

Setup

  1. Install gradle. On macOS, you can use brew install gradle.

  2. If you're running against anything other than development, copy gradle-development.properties to gradle-{environment}.properties and set the credentials and hostname for your MarkLogic server.
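As a rough sketch of step 2, the helper below copies the development properties file and swaps in a new hostname. It assumes the file uses the standard ml-gradle property names (mlHost, mlUsername, mlPassword); the function name make_env_properties and the example hostname are hypothetical, and the credentials are deliberately left for you to edit by hand.

```shell
# Hypothetical helper: create gradle-<environment>.properties from the
# development file, replacing only the mlHost line.
# Assumes gradle-development.properties contains a line starting "mlHost=".
make_env_properties() {
  env_name="$1"
  host="$2"
  src="gradle-development.properties"
  dest="gradle-${env_name}.properties"

  if [ ! -f "$src" ]; then
    echo "missing $src (run this from the repository root)" >&2
    return 1
  fi

  # Write to a new file rather than editing in place, so the
  # development configuration is never modified.
  sed "s|^mlHost=.*|mlHost=${host}|" "$src" > "$dest"
  echo "Wrote $dest; now set mlUsername and mlPassword in it by hand."
}
```

For example, make_env_properties staging ml.example.org would write gradle-staging.properties, which you would then select with -PenvironmentName=staging.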

Deployment

To deploy a MarkLogic configuration, run gradle mlDeploy -PenvironmentName={environment}.

The development environment will be used by default if you don't specify -PenvironmentName.

Deployment is idempotent, and will automatically configure databases, roles, triggers and modules.
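The deploy command can be wrapped so that the target environment is always printed before anything runs. This is only a sketch: the function names and the DEPLOY_FOR_REAL switch are hypothetical and not part of this repository, and the wrapper passes development explicitly rather than relying on the default.

```shell
# Build the deploy command for an environment; falls back to
# development, mirroring the default described above.
deploy_command() {
  env_name="${1:-development}"
  printf 'gradle mlDeploy -PenvironmentName=%s\n' "$env_name"
}

# Print the command for review, and run it only when DEPLOY_FOR_REAL=1.
# (DEPLOY_FOR_REAL is a made-up safety switch for this sketch.)
deploy() {
  cmd="$(deploy_command "${1:-}")"
  echo "$cmd"
  if [ "${DEPLOY_FOR_REAL:-0}" = "1" ]; then
    # Unquoted on purpose so the string splits into command and arguments.
    $cmd
  fi
}
```

Running deploy staging prints the command only; DEPLOY_FOR_REAL=1 deploy staging prints and then executes it.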

Please also create a GitHub Release when you deploy.

Local Setup

1. Run a MarkLogic Docker container

A docker-compose.yml file for running MarkLogic locally is included. Run docker-compose up -d to start it. Startup takes a minute or so, and visiting localhost:8000 before it finishes will return various HTTP errors.

Note: There is currently a known issue with marklogic-docker, so you might need to run development_scripts/run_local_docker instead.
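Instead of refreshing the browser until the errors stop, you can poll the server from a shell. The function below is a sketch, not something shipped in this repository: wait_for_marklogic is a hypothetical name, and it assumes curl is installed and that the local container accepts the admin/admin credentials mentioned elsewhere in this README.

```shell
# Poll a URL until it answers with a non-error HTTP status.
# Arguments (all optional): URL, number of attempts, seconds between attempts.
wait_for_marklogic() {
  url="${1:-http://localhost:8000}"
  attempts="${2:-30}"
  delay="${3:-5}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP 4xx/5xx responses;
    # --anyauth lets curl negotiate whichever auth scheme the server offers.
    if curl --silent --fail --max-time 5 --output /dev/null \
         --anyauth --user admin:admin "$url"; then
      echo "MarkLogic is responding at $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "MarkLogic did not respond at $url after $attempts attempts" >&2
  return 1
}
```

A typical use would be docker-compose up -d && wait_for_marklogic, followed by the deployment step.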

2. Deploy the MarkLogic configuration

You'll then need to deploy the configuration (see Deployment, above).

3. Make clients point to the local docker container

If you want to use the editor and public UI with the local instance, ensure that MARKLOGIC_HOST is set to host.docker.internal in the .env file of each, and that the username and password are both admin.
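The relevant part of a client's .env might look like the fragment below. Only MARKLOGIC_HOST is named in this README; MARKLOGIC_USER and MARKLOGIC_PASSWORD are assumed variable names used for illustration, so check each client's own .env for the names it actually reads.

```shell
# Point the client at the MarkLogic container running on the Docker host.
MARKLOGIC_HOST=host.docker.internal

# Assumed variable names (not confirmed by this README); the local
# instance uses admin for both username and password.
MARKLOGIC_USER=admin
MARKLOGIC_PASSWORD=admin
```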

4. (Optional) Populate data in the local database

To get some example documents into the local database, use development_scripts/populate_top_judgments_and_neighbours.py and development_scripts/populate_from_caselaw.py, which copy documents from the live caselaw site into your local database (they don't import or fake properties). (Check https://caselaw.nationalarchives.gov.uk/terms-of-use and get in touch if you intend to download many more than these.)

There are also other ways of importing data, detailed further down this README, but they haven't been tested for a while.

5. (Optional) Run unit tests

You can run the unit tests with gradle mlUnitTest. This relies on the tests being deployed; use gradle mlDeploy in the first instance, and make sure that you have gradle mlWatch -i running to automatically deploy changes as you make them.

gradle mlGenerateUnitTestSuite will create a new stub test suite, and gradle mlClearModulesDatabase might be needed if you create tests and then later delete them.

Release versioning

Releases are currently tagged manually. Please do not deploy to production without tagging a release. There is currently no auto-deployment of releases, but we use releases and tags to keep track of what has been deployed to production.

To create a versioned release, use GitHub's release process to create a tag and generate release notes.

When deploying to production, check out the tag you want to deploy using (for example) git checkout tags/v1.0.0, then deploy from there. Git will put you into a "detached HEAD" state. Once you have finished deploying, you can switch back to the main branch (or any branch) by using git checkout branchname as normal.
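The checkout, deploy and switch-back sequence can be sketched as a single function. Nothing here exists in the repository: deploy_tag is a hypothetical name, and the GIT and GRADLE overrides are an assumption added so the sequence can be rehearsed with echo before running it for real.

```shell
# Check out a release tag, deploy it, then return to a branch.
# Arguments: tag, environment name, branch to return to (default: main).
deploy_tag() {
  tag="$1"
  env_name="$2"
  return_branch="${3:-main}"
  git_bin="${GIT:-git}"          # hypothetical override, e.g. GIT=echo
  gradle_bin="${GRADLE:-gradle}" # hypothetical override, e.g. GRADLE=echo

  # Leaves the working tree in a detached HEAD state at the tag.
  "$git_bin" checkout "tags/$tag" || return 1

  # Remember the deploy result so we still switch back on failure.
  status=0
  "$gradle_bin" mlDeploy -PenvironmentName="$env_name" || status=$?

  "$git_bin" checkout "$return_branch" || return 1
  return "$status"
}
```

Running GIT=echo GRADLE=echo deploy_tag v1.0.0 production prints the three steps without touching the repository or the server.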

TODO: Automatically deploy main to staging, and tags to production using CodeBuild.

Bulk import

(This hasn't been used in a long time)

Place the XML files you want to import in the import folder of this repo, then run gradle importDocuments. The documents will be imported, and the URI will be set as the full file path and name within import.

You may want to run gradle publishAllDocuments (see below) afterwards. All files are automatically put under management on import, so there is no need to run the manage task.

Bulk export

To export the latest versions of all documents, for instance for bulk processing, you can use: gradle mlExportToZip -PwhereUrisQuery="const dls = require('/MarkLogic/dls'); cts.uris('', [], dls.documentsQuery())" -PenvironmentName=<env> -PexportPath=export.zip

Document processing

Three gradle tasks are available for bulk management of documents in a database using CoRB. They should not be needed in production, but are provided to automate some development tasks and to serve as examples for future data migrations.

  • gradle manageAllDocuments: Enables version management for all documents.
  • gradle publishAllDocuments: Sets the published flag for all documents.
  • gradle addAllDocumentsToJudgmentsCollection: Adds all documents to the 'judgments' collection.

Loading data from a backup on S3 (deprecated)

Rather than running an import of a set of files, you can restore from a shared backup. Note that this bucket is currently only available to dxw developers.

  1. First, navigate to http://localhost:8001/, which will ask for basic auth. Username and password are both admin.
  2. Then add AWS credentials to MarkLogic (under Security > Credentials), so it can pull the backup from a shared S3 bucket. The credentials (AWS access ID & secret key) should be for your dxwbilling account. You will need to create them in AWS if you haven't already.
  3. In the Backup/Restore tab in MarkLogic for the caselaw-content Judgments database, initiate a restore, using the following as the "directory": s3://tna-judgments-marklogic-backup/. Set Forest topology changed to true.
  4. Uncheck the security database when restoring, or your passwords will be wiped.

Assuming you have entered the S3 credentials correctly, this will kick off a restore from S3. Once you have the data locally, you can then back it up locally using the path /var/opt/backup in the management console. It will be backed up to your local machine in docker/db/backup.

Depending on the backup state, you may need to run gradle manageAllDocuments and gradle publishAllDocuments after the restore has finished.
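If both tasks are needed, they can be run back to back, stopping at the first failure. This is a sketch only: post_restore is a hypothetical name, the order simply follows the sentence above, and the GRADLE override is an assumption added so the sequence can be previewed with echo.

```shell
# Run the two post-restore tasks in order, stopping if either fails.
# Any extra arguments (for example -PenvironmentName=...) are passed
# through to each gradle invocation.
post_restore() {
  gradle_bin="${GRADLE:-gradle}" # hypothetical override, e.g. GRADLE=echo
  for task in manageAllDocuments publishAllDocuments; do
    "$gradle_bin" "$task" "$@" || return 1
  done
}
```

For example, GRADLE=echo post_restore -PenvironmentName=development prints the two invocations without running them.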

MarkLogic URL Guide

All four URLs use basic auth; the username and password are both admin.