This repository contains a LibreOffice (LO) Basic macro and a LO Calc template file that can be used to generate DSpace Simple Archive Format (SAF) import packages.
We periodically need to create DSpace-compatible import packages from a given set of text data, such as CSV files or Excel sheets. This script was born out of that need.
The import package is created as follows:
- The issuer (i.e. the customer) has two options: they provide the metadata in a textual format or fill out our LO Calc template.
- We check the data and correct them according to the metadata rules.
- We run the macro.
- Then we compress the generated folders into a ZIP package and import it via the UI or at the DSpace console.
The script can produce import packages in which the items use different metadata schemas. For example, if the items mix the DC and DCTERMS schemas, it creates a separate XML metadata file for each schema, as described in the DSpace manual. Do not forget to register these metadata fields on the DSpace server before importing the package.
Any number of metadata values can be added to the items. The only limit is the maximum number of columns available in LibreOffice Calc.
We chose LibreOffice because it is freely available to anybody. Unfortunately the macro is not compatible with Microsoft Excel (part of the Microsoft Office package), but LibreOffice Calc offers the same level of functionality as its counterpart.
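The per-schema split can be illustrated with a short Python sketch. It is not the macro's own code (the macro is written in LibreOffice Basic). It assumes metadata keys of the form schema.element[.qualifier], split at the configurable separator, and follows the file naming described in the DSpace SAF documentation: dublin_core.xml for the dc schema and metadata_<schema>.xml with a schema attribute for every other schema. The function names split_key and metadata_files are hypothetical.

```python
from collections import defaultdict
from xml.sax.saxutils import escape, quoteattr


def split_key(key, sep="."):
    """Split 'schema<sep>element[<sep>qualifier]'; a missing qualifier becomes 'none'."""
    parts = key.split(sep)
    if len(parts) == 2:
        return parts[0], parts[1], "none"
    if len(parts) == 3:
        return parts[0], parts[1], parts[2]
    raise ValueError(f"unexpected metadata key: {key!r}")


def metadata_files(pairs, sep="."):
    """Group (key, value) pairs by schema and render one SAF XML document per schema."""
    by_schema = defaultdict(list)
    for key, value in pairs:
        schema, element, qualifier = split_key(key, sep)
        by_schema[schema].append((element, qualifier, value))

    files = {}
    for schema, entries in by_schema.items():
        # The dc schema goes to dublin_core.xml, every other schema to metadata_<schema>.xml
        if schema == "dc":
            name, root = "dublin_core.xml", "<dublin_core>"
        else:
            name, root = f"metadata_{schema}.xml", f"<dublin_core schema={quoteattr(schema)}>"
        lines = [root]
        for element, qualifier, value in entries:
            # escape() protects &, < and > inside the metadata value
            lines.append(
                f"  <dcvalue element={quoteattr(element)} "
                f"qualifier={quoteattr(qualifier)}>{escape(value)}</dcvalue>"
            )
        lines.append("</dublin_core>")
        files[name] = "\n".join(lines) + "\n"
    return files
```

For example, metadata_files([("dc.title", "A & B"), ("dcterms.abstract", "Short text")]) returns two documents, one keyed dublin_core.xml and one keyed metadata_dcterms.xml.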
Installation of the macro:
- Open the macro organization window: Tools -> Macros -> Organize macros -> Basic...
- Click the Organizer... button on the right.
- In the Modules list, highlight the Standard entry and then press New at the bottom right.
- Enter a name for the module, e.g. Table2SAF, and click OK.
- Close the Basic Macro Organizer window, highlight the newly created module and press the Edit button on the right.
- Clear all lines in the code window and select File -> Import Basic from the menu.
- Choose the file calc2saf.bas and press Open.
- Finally, save it.
The macro does not use any special classes and should run without issues on LibreOffice 6.3 and later.
The Basic macro processes the LibreOffice Calc template file. This file consists of two sheets:
- The sheet storing the actual metadata values. The first row and the first column contain mandatory data and must not be moved or removed.
  - The first column must contain identification data that distinguishes the items within the sheet. It can be a file name or an increasing sequence of numbers.
  - The following columns are optional and fully customizable through the configuration parameters. They include six special columns as well as the columns holding the actual metadata.
  - The first row of the metadata columns must contain the metadata key, e.g. dc.contributor.author. There may be additional rows above it, but the metadata key row must come before all of the item rows.
- The sheet holding the configuration parameters of the macro.
Items can be set up in two ways:
- Each item has exactly one attachment (a bitstream in DSpace terms). In this case the identification column (column A) must contain the file names, and no separate file name column should be present.
- At least one item has two or more attachments. In this case a dedicated column must hold the real file names belonging to the items, and the first (identification) column must not hold the real file names but some kind of sequence instead.
Configurable parameters on the Config sheet:
- Metadata separator character: the script splits the metadata keys at this character. Defaults to '.'.
- Base folder: the full path of the directory where the attachments of the items are placed. It is recommended to create a folder next to the metadata table and place the files in it.
- Sheet name: the name of the sheet containing the metadata.
- Metadata row: the row containing the metadata keys. The very first row is recommended.
- First data row: there can be additional rows between the metadata key row and the actual item rows, e.g. descriptions or explanations for the librarians. This parameter tells the script where the item rows start.
- Last data column: the last column where data can be found.
- Skip these columns: a comma-separated list of column numbers to exclude from processing. These columns serve the process only, not the items (see below). Leave the parameter empty if not in use.
- Bundle column: the column containing the bundle name, e.g. ORIGINAL.
- Permissions column: the column containing the permissions group. It should be the name of a group registered in DSpace.
- Permissions string column: the kind of rights the group has for the item: w, r, or r|w.
- Description column: this text will be displayed under the bitstream image on the DSpace UI.
- Primary column: true marks the bitstream as the primary one, displayed first for the item; false means the opposite.
- File name column: the actual file name belonging to the item. Use it only when at least one item has multiple attachments.
- Assetstore number column: if this is not empty, the bitstream is registered for the item as an existing file on the specified assetstore. Leave it empty if the bitstream should be stored in the default store. See the DSpace manual for details.
- Collection handle: the handle of the collection that will contain the items.
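How the row and column parameters work together can be illustrated with a Python sketch. It is a hypothetical helper, not part of the macro, and it assumes the data sheet has been read into a list of rows of cell strings and that all row and column numbers are 1-based, as in the Calc UI. Column 1 is the identification column; the special columns (bundle, permissions, etc.) would be listed in Skip these columns so they are not mistaken for metadata. The names read_items and parse_skip_columns are illustrative.

```python
def parse_skip_columns(text):
    """Parse the 'Skip these columns' value, e.g. '2, 5,7'; an empty value skips nothing."""
    return {int(part) for part in text.split(",") if part.strip()}


def read_items(rows, metadata_row, first_data_row, last_data_column, skip_text=""):
    """Yield (identifier, [(key, value), ...]) for every item row.

    rows: list of rows, each a list of cell strings. Row and column numbers are 1-based.
    """
    skip = parse_skip_columns(skip_text)
    keys = rows[metadata_row - 1]
    for row in rows[first_data_row - 1:]:
        identifier = row[0]
        if not identifier:
            continue  # assumption: a row without an identifier is treated as empty
        pairs = []
        for col in range(2, last_data_column + 1):
            if col in skip or col > len(row):
                continue
            value = row[col - 1]
            if value:  # empty cells produce no metadata entry
                pairs.append((keys[col - 1], value))
        yield identifier, pairs
```

Because every column simply pairs its header key with the cell value, repeating a key such as dc.contributor.author in several columns yields several values for the same field.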
At the end of the generation there will be a separate folder for each item under the base folder, named after the value in the item's identification column. Each folder contains only a few files:
- contents file: stores the bitstream information.
- collections file: the handle of the collection that will contain the item.
- dublin_core.xml, metadata_XXX.xml: the actual metadata entries in XML format. Each namespace has exactly one XML file.
- The attachments of the item. If an attachment is already stored in one of the assetstores 1 ... N, the file is not copied into this directory; only its entry is added to the contents file.
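A line of the contents file can be sketched in Python following the tab-separated syntax in the DSpace SAF documentation: the file name, then optional bundle:, permissions:, description: and primary:true fields; a registered bitstream starts with -r -s N -f path instead. The function contents_line is hypothetical, and it only accepts a single r or w right (the documented -r / -w form), so combined values are outside this sketch.

```python
def contents_line(file_name, bundle="ORIGINAL", group=None, rights=None,
                  description=None, primary=False, assetstore=None):
    """Build one tab-separated line of a DSpace SAF 'contents' file."""
    if assetstore not in (None, ""):
        # Registered bitstream: the file already lives on assetstore N and is not copied
        fields = [f"-r -s {assetstore} -f {file_name}"]
    else:
        fields = [file_name]
    if bundle:
        fields.append(f"bundle:{bundle}")
    if group and rights:
        if rights not in ("r", "w"):
            raise ValueError("this sketch only accepts 'r' or 'w'")
        fields.append(f"permissions:-{rights} '{group}'")
    if description:
        fields.append(f"description:{description}")
    if primary:
        # Only the true case is written; a non-primary bitstream simply omits the field
        fields.append("primary:true")
    return "\t".join(fields)
```

Writing one such line per attachment, joined by newlines, gives the contents file of an item folder.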
Finally, the package can be imported into DSpace in two ways. It can be packed into a ZIP archive and uploaded via the user interface, or it can be copied directly to the DSpace server (without packaging) and imported from the console, e.g. [dspace-dir]/bin/dspace import --add --eperson=uploader@domain.com --collection=ANY-HANDLE --source=<package-path> -m <mapfile-path> --zip=<zip-path-if-any>
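The ZIP packing step can be sketched with Python's standard zipfile module. This is an illustrative helper (zip_package is a hypothetical name), assuming the item folders should sit at the root of the archive. It takes an explicit list of item folder names because the base folder also holds the original attachments, which should not end up in the package.

```python
import zipfile
from pathlib import Path


def zip_package(base_folder, item_folders, zip_path):
    """Pack the generated item folders into one ZIP archive, each folder at the archive root."""
    base = Path(base_folder)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in item_folders:
            for path in sorted((base / name).rglob("*")):
                if path.is_file():
                    # Store paths relative to the base folder, e.g. "1/contents"
                    zf.write(path, arcname=path.relative_to(base).as_posix())
    return zip_path
```

For example, zip_package("/data/import", ["1", "2", "3"], "/data/package.zip") packs the three item folders generated from identification values 1 to 3.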