fix spelling
torbjornbp committed Oct 9, 2024
1 parent b8169e1 commit 7ed6b05
Showing 5 changed files with 28 additions and 30 deletions.
8 changes: 4 additions & 4 deletions content/docs/sip-specification/internal-sip-policy/_index.md
@@ -11,13 +11,13 @@ The majority of preserved data in the National Library is either:
- produced through in-house digitization workflows
- deposited from external producers, but processed and structured through internal workflows

-Currently there are 29 such workflows processing data and sending packages to the DPS for long term digital preservation.
+Currently, there are 29 such workflows processing data and sending packages to the DPS for long term digital preservation.
Our internal SIP structuring policy thus needs to function and scale for all existing data workflows.

-This is not attempt to draw up an ideal systems architecture of how it *should* be, but rather an attempt draw the map as it is (from the point of view of the digital preservation team).
-These documents serves as the basis for our ongoing E-ARK standardization work, where we will create different more specific profiles for various content types.
+This is not an attempt to draw up an ideal system architecture of how it *should* be, but rather an attempt to draw the map as it is (from the point of view of the digital preservation team).
+These documents serve as the basis for our ongoing E-ARK standardization work, where we will create different, more specific profiles for various content types.

-The following documents describe the current systems architecture context we are working within, and, define Intellectual Entity and SIP scope that functions in this context.
+The following documents describe the current systems architecture context we are working within, and, define an intellectual scope of SIPs that function in this context.

{{< cards >}}
{{< card link="systems-architecture" title="Systems domain architecture" icon="document-text" >}}
@@ -22,9 +22,9 @@ In these systems we tend to operate with a lot of different IEs, usually organiz
In use-case examples of PREMIS and E-ARK, it is usually the highest level entity from these hierarchies, that is referred to as the IE and used to define *intellectual scope of packages/SIPs*, ie. a *work* or *expression*.
However, we have to define scope differently, using an entity that sits at a lower level of description:

-- SIP scope is defined by the metadata management system IE that holds the UID linking the IE to the SIP.
+- SIP scope is defined by the metadata management system IE that holds the UID linking the IE to the SIP.

-This is a necessity for keeping all components of our [systems architecture](/systems-architecture) in sync.
+This is a necessity for keeping all components of our [system architecture](/system-architecture) in sync.
The UID sits at specifically defined IEs in our metadata mangement systems.

## Hierarchies and flatness
@@ -2,7 +2,7 @@
title: Entity management architecture
summary: This post discusses high-level metadata and data handling at the National Library of Norway
date: 2024-10-04
-tags: [Systems architecture, PREMIS, Intellectual entities, representations]
+tags: [System architecture, PREMIS, Intellectual entities, representations]
authors:
- name: Torbjørn Bakken Pedersen
image: https://avatars.githubusercontent.com/u/113333557?v=4
@@ -21,15 +21,13 @@ Some of these terms come from PREMIS, which again is a framework mainly used in
IEs tend to describe *intellectual content*.
In our suggested implementation of the SIP there is a 1:1 relationship between SIP and IE.
-At the National Library of Norway the IE is the entity that is identified by the UID linking the our three different system environments together.
+At the National Library of Norway, the IE is the entity that is identified by the UID linking our three different system environments together.
Typically, this is the **smallest** described size in any of our metadata management systems.
-This is expanded upon in the following text about [intellectual entities and unique IDs FIKS LENKE](/intellectual-scope).
+This is expanded upon in the following text about [intellectual entities and unique IDs](/intellectual-sip-scope).

The metadata at the core describes the IE that the SIP represents.
The representations are different data renditions of the IE, and thus do not have their own discrete descriptive metadata.
SIPs therefore have metadata about, and representations of, intellectual content.
-<!-- In our metadata management systems, we tend to operate with a lot of different IEs, usually organized in some sort of hierarchy.
-In use-case examples of PREMIS and E-ARK, it is usually the highest level entity from these hierarchies, that is referred to as the IE and used to define *intellectual scope of packages*, ie. a *work* or *expression*. -->

### Representations
In the E-ARK SIP specification a SIP is a package holding *metadata* and *representations*.
@@ -71,7 +69,7 @@ These files are *organized* by intellectual entities and representations.

Files are ingested to the DPS through the delivery of SIPs, which again mirror intellectual entities found in the Metadata management systems.
For the majority of SIPs handled in the National Library, there is a single representation per package.
-As mentioned in the [systems architecture description](/systems-architecture), access copies automatically derived from the preserved master file are usually not handled by the DPS.
+As mentioned in the [system architecture description](/system-architecture), access copies automatically derived from the preserved master file are usually not handled by the DPS.

[^2]: The bitstream level is not yet described in the DPS, but it could be in the future.

@@ -83,7 +81,7 @@ However, If you are seeking files based solely on their technical properties, yo
The public access services manage and provide access to *access representations* and *files*, in addition to harvested intellectual entity descriptive metadata.
The data and metadata here is a subset of what is found in the metadata management systems and the DPS.

-The public access services transforms harvested metadata in a flattened structure of intellectual entities with a single representation each.
+The public access services transform harvested metadata into a flattened structure of intellectual entities with a single representation each.
The intellectual entities found online, does not necessarily mirror a single intellectual entity found in the metadata management systems.

## Architecture
@@ -92,7 +90,7 @@ We can draw up another idealized architecture diagram, using PREMIS entities, to
{{< figure src="premis.svg" alt="Diagram showing various systems' responsibility for PREMIS entities" caption="PREMIS entities across our systems" >}}

The representation entity is somewhat complicated to understand here.
-There is a a 1:1 relationship between the IE used to define the package scope and its primary representation in the metadata management systems.
+There is a 1:1 relationship between the IE used to define the package scope and its primary representation in the metadata management systems.
This means the primary representation per package often *is* described in technical terms in these systems, even though they do not operate with a representation level as a discrete entity.
Any *additional* representations, however, are *not* described in the metadata management systems.
They are only described in the DPS.
@@ -7,7 +7,7 @@ authors:
- name: Torbjørn Bakken Pedersen
image: https://avatars.githubusercontent.com/u/113333557?v=4
images:
-- avsip.pn
+- avsip.png
weight: 4
aliases: ["/representation-types"]
---
@@ -28,24 +28,24 @@ They are used as a proxy/stand-in for the primary representation.
They typically contain much smaller, lossy (or lossier) files.

Any representation can in theory have an access representation, but these are primarily managed and stored in the public access environment alone.
-Access representations are only kept in the DPS if they are the result of significant labour and/or cannot be easily derived from the primary representation.
-The public access environment on the other hand, only supports a single access representatation per UID.
+Access representations are only kept in the DPS if they are the result of significant labor and/or cannot be easily derived from the primary representation.
+The public access environment, on the other hand, only supports a single access representation per UID.

## Preservation derivate
In the case where the primary data object is normalized or converted to a different format for preservation, you can use the preservation derivate representation type.
This enables us to preserve both the primary data object and its preservation derivate.
-Currently this is more of a hypothetical use-case, than something that regularly happens in the organization.
+Currently, this is more of a hypothetical use-case, than something that regularly happens in the organization.

-Below is a few examples showing typical SIPs with representations:
+Below are a few examples showing typical SIPs with representations:

## Examples:
### Example showing a typical SIP with a single representation
{{< figure src="1repsip.svg" alt="Film digitization SIP with 1 representation" >}}

### Example with two representations
-In-house digitization of photo negatives currently produce a large TIFF file for preservation and an inverted and heavily post-processed access JP2-file.
+In-house digitization of photo negatives currently produces a large TIFF file for preservation and an inverted and heavily post-processed access JP2-file.
Only the TIFF is described using a carrier in the metadata management system, but both digital objects are preserved in the DPS.
-The JP2 is the result of extensive manual labour and can not be automatically or easily reproduced from the primary TIFF.
+The JP2 is the result of extensive manual labor and cannot be automatically or easily reproduced from the primary TIFF.

The TIFF file is contained in the primary representation in the SIP, while the JP2 is contained in an access derivate representation.

@@ -7,13 +7,13 @@ authors:
- name: Torbjørn Bakken Pedersen
image: https://avatars.githubusercontent.com/u/113333557?v=4
images:
-- arkitektur.pn
+- arkitektur.png
weight: 1
aliases: ["/systems-architecture"]
---

The digital preservation team uses the [Open Archival Information system functional model](https://en.wikipedia.org/wiki/Open_Archival_Information_System#The_functional_model "Wikipedia page explaining the OAIS functional model") (OAIS) as a reference point.
-However, as our systems architecture is complex and has evolved over several years, you can't simply overlay the OAIS model over our organization and make sense of what is happening.
+However, as our system architecture is complex and has evolved over several years, you can't simply overlay the OAIS model over our organization and make sense of what is happening.

The digital preservation team develops and manages the Digital Preservation Services (DPS) software, but this is only one aspect of data and metadata management in the National Library.
As we continue to develop the DPS and standardizing information packages, awareness of the different system domains' responsibilities, and how these interact, is essential.
@@ -47,36 +47,36 @@ These systems hold the key UID that allows for identification of access and pres
[^2]: Some of these systems are also exposed externally.

### Digital Preservation Services (DPS)
-Our Digital Preservation Services manages all data in our bit-repository and controls data integrity and access for long-term storage.
+Our Digital Preservation Services manage all data in our bit-repository and controls data integrity and access for long-term storage.
It is in this environment we operate with the OAIS concepts of SIP and AIP.
-Along with the preserved data we store a *copy* of select descriptive data, to make the digital object identifiable and usable in the long-term.
+Along with the preserved data, we store a *copy* of select descriptive data, to make the digital object identifiable and usable in the long term.

SIPs received for preservation in the DPS, represents some described entity in the metadata management systems.
The DPS holds the "truth" for technical *file* metadata.

The DPS can be used to identify and find *files* (through technical metadata), but this is still a rare use-case in our organization.

### Public access services
-The public access services manages a *harvested* subset of descriptive metadata from our metadata management systems and access *files*.
+The public access services manage a *harvested* subset of descriptive metadata from our metadata management systems and access *files*.
They disseminate this on our public facing webpages [NB.no](https://www.nb.no/search "National library online portal").
-The access files are proxy copies derived from preserved high quality files in the DPS.
+The access files are proxy copies derived from preserved high-quality files in the DPS.
These are typically smaller and lossy derivates of the much larger preservation files in the DPS.

The public access services are our public facing discovery and access systems.
They disseminate a subset of harvested metadata from the metadata management systems and related access data online.

## Architecture
-This is an idealized and simplified version of our architecture, but still helpful to understand the kind of systems interactions we deal with.
+This is an idealized and simplified version of our architecture, but still helpful to understand the kind of system interactions we deal with.
While we use the OAIS framework to discuss our architecture, the various OAIS components and flows becomes quite abstract in this context[^3].

-[^3]: You could apply the DIP concept to the public access services' dissemination of access copies, but traditionally we have only used the OAIS terminology in the digital preservation domain.
+[^3]: You could apply the DIP concept to the public access services' dissemination of access copies, but traditionally, we have only used the OAIS terminology in the digital preservation domain.

{{< figure src="arkitektur.svg" alt="architecture diagram" caption="Data and metadata flow between systems" >}}

The SIP in this drawing contains data for preservation in addition to a copy of metadata from the metadata management systems in a standardized format (e.g. MODS).
Our DPS is currently not exposed to the public.
Any public access to preserved data goes through other internal services built on top of the DPS.
-The DPS does not preserve access copies that can be automatically derived from presservation files.
+The DPS does not preserve access copies that can be automatically derived from preservation files.
Such copies are managed in the public access services.

