Used for black-box testing and data-ingestion procedures.
Make sure that your email server is NOT running, because some of the endpoints used during import send emails to the input email addresses. For example, the endpoint for creating new registration data automatically sends an email, which we don't want since we use this endpoint to import existing data.
```shell
grep mail.server.disabled local.cfg
mail.server.disabled=true

docker compose -p d7ws exec dspace /dspace/bin/dspace dsprop -p mail.server.disabled
true
```
- Install CLARIN-DSpace7.* (postgres, solr, dspace backend) - you can use docker compose. At this point keep your `local.cfg` minimal; you'll modify it once the migration is done.
- Clone these sources:
  - Clone python-api: https://github.com/ufal/dspace-python-api (branch `main`)
  - Clone the submodules:

    ```shell
    git submodule update --init libs/dspace-rest-python/
    ```
- Get the database dump (old CLARIN-DSpace) and unzip it into the `input/dump` directory in the `dspace-python-api` project.
- Go to `dspace/bin` in the dspace7 installation and run `dspace database migrate force` (force because of local types). NOTE: `dspace database migrate force` creates default database data that may not be in the database dump, so after migration some tables may contain more rows than the dump. Data from the dump that already exists in the database is not migrated.
- Create an admin by running `dspace create-administrator` in `dspace/bin`.
```shell
:~$ ls -R ./input
input:
dump

input/dump:
clarin-dspace.sql  clarin-utilities.sql
```
- Create the CLARIN-DSpace5.* databases (dspace, utilities) from the dump. Either:
  - run `scripts/start.local.dspace.db.bat` or use `scripts/init.dspacedb5.sh` directly with your database, or
  - do the import manually:

    ```shell
    docker compose -p d7ws exec dspacedb createdb -p 5430 --username=dspace --owner=dspace --encoding=UNICODE clarin-dspace
    docker compose -p d7ws exec dspacedb createdb -p 5430 --username=dspace --owner=dspace --encoding=UNICODE clarin-utilities
    cat input/dump/clarin-utilities.sql | docker compose -p d7ws exec -T dspacedb psql -p 5430 --username=dspace clarin-utilities
    cat input/dump/clarin-dspace.sql | docker compose -p d7ws exec -T dspacedb psql -p 5430 --username=dspace clarin-dspace
    ```
- Install the dependencies for this project (ideally in a python venv):

  ```shell
  pip install -r requirements.txt
  ```
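Setting up the venv mentioned above can be sketched as follows (the environment name `.venv` is an arbitrary choice, not mandated by the project):

```shell
python3 -m venv .venv    # create an isolated environment (the name is arbitrary)
. .venv/bin/activate     # activate it (bash/zsh syntax)
pip install -r requirements.txt
```

Keeping the dependencies in a venv avoids clashes with system-wide Python packages.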
- Update `project_settings.py` with the db connection and admin user details. You can use an internal backend IP.
- With `"testing": True` the mechanism described in ufal#4 (comment) should kick in (if you also have the mentioned file). You don't want this in the final migration:

  ```shell
  docker compose -p d7ws exec dspace bash -c "mkdir -p /tmp/asset && pushd /tmp/asset && curl -LJO https://github.com/user-attachments/files/21145749/57024294293009067626820405177604023574.zip && mkdir -p /dspace/assetstore/57/02/42 && zcat 570242* > /dspace/assetstore/57/02/42/57024294293009067626820405177604023574 && popd && rm -rf /tmp/asset"
  ```

- Think about the `ignore` section; usually you don't want to ignore eperson `198` (but you might have others that you won't be migrating).
- If you were using many custom licenses whose text was under `xmlui/page`, you can use the `licenses` hash to do an automatic update of the URLs.
- Make sure your backend configuration (`local.cfg`) includes all handle prefixes that you use in the `handle.additional.prefixes` property, e.g., `handle.additional.prefixes = 11858, 11234, 11372, 11346, 20.500.12801, 20.500.12800`. Get them from the old database:

  ```sql
  clarin-dspace=# select distinct(split_part(handle, '/', 1)) as prefix from handle;
  ```
- This project only migrates the data stored in the database(s), not the actual files. Copy the `assetstore` from dspace5 to dspace7 (needed for the bitstream import). The `assetstore` is in the folder where you have installed DSpace: `dspace/assetstore`.
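The copy step above can be sketched like this; the source and destination paths in the usage comment are hypothetical examples, adjust them to where your two DSpace instances actually live:

```shell
# Sketch: copy the dspace5 assetstore into the dspace7 tree.
copy_assetstore() {
    src="$1"; dst="$2"
    mkdir -p "$dst"
    # -a preserves the 2-level hash directory layout, permissions and timestamps
    cp -a "$src/." "$dst/"
}

# hypothetical paths - adjust to your installations:
# copy_assetstore /opt/dspace5/assetstore /opt/dspace7/assetstore
```

If the instances live on different hosts, `rsync -a` over ssh achieves the same while preserving the directory layout.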
- NOTE: the database must be up to date (`dspace database migrate force` must have been called in `dspace/bin`)
- NOTE: the dspace server must be running
- Run the import:

  ```shell
  cd ./src && python repo_import.py
  ```

- Check the logs (by default in `__logs`) for CRITICAL, ERROR or WARNING messages; especially when you see 500 errors, also check `dspace.log` (on the backend in `/dspace/log/dspace.log`).
- If you need to rerun the migration, simply drop the compose project (including the volumes), or just the database as suggested in ufal#4 (comment), and recreate the admin account etc. (if your dumps are in the dspacedb container, you'll need to recreate those as well). Also consider wiping the migration logs and temp files:
```shell
docker compose -p d7ws down --volumes
docker compose --env-file .env -p d7ws -f docker/docker-compose.yml -f docker/docker-compose-rest.yml up -d
docker compose --env-file .env -p d7ws -f docker/docker-compose.yml -f docker/docker-compose-rest.yml -f docker/cli.yml run --rm dspace-cli create-administrator -e test@test.edu -f Sys -l Admin -p password -c en -o UFAL
docker compose -p d7ws exec dspace bash -c "mkdir -p /tmp/asset && pushd /tmp/asset && curl -LJO https://github.com/user-attachments/files/21145749/57024294293009067626820405177604023574.zip && mkdir -p /dspace/assetstore/57/02/42 && zcat 570242* > /dspace/assetstore/57/02/42/57024294293009067626820405177604023574 && popd && rm -rf /tmp/asset"
rm -rf __logs/ src/__temp/ input/tempdbexport_v*
```
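The log check can be automated with a small helper; the `__logs` directory name matches the default mentioned above, and the grep pattern is just a sketch of the severities worth flagging:

```shell
# Sketch: scan the migration logs for problem lines.
scan_logs() {
    # grep exits non-zero when nothing matches, hence the fallback message
    grep -rnE 'CRITICAL|ERROR|WARNING' "$1" || echo "no problems found in $1"
}

# typical use after a run:
# scan_logs __logs
```

For 500 errors flagged this way, remember to cross-check `/dspace/log/dspace.log` on the backend as noted above.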
- Table attributes that describe the last modification time of a dspace object (for example the `last_modified` attribute of the `Item` table) hold the time when the object was migrated, not the value from the migrated database dump.
- If you don't have valid and complete data, not all data will be imported.
- Check whether the license link contains XXX. This is of course unsuitable for a production run!
Use the `tools/repo_diff` utility; see its README.