Access Rights metadata in OpenAIRE metadata export is being misapplied #5920

Open
jggautier opened this issue Jun 6, 2019 · 8 comments

Comments

@jggautier
Contributor

jggautier commented Jun 6, 2019

As part of v4.14 (released in May 2019), Dataverse makes available through the UI, API and over OAI-PMH DataCite metadata that complies with OpenAIRE requirements (#4257). Repositories need to follow these requirements in order for their dataset metadata to be made discoverable in OpenAIRE EXPLORE.

The required metadata export called OpenAIRE (in the Dataverse UI) or oai_datacite (over API and OAI-PMH) includes one of four Access Rights terms, which come from the info:eu-repo-Access-Terms vocabulary:

  • Open access
  • Restricted access
  • Closed access
  • Embargoed access

Dataverse chooses these terms based on whether or not any dataset files are set to restricted and whether or not people are able to request access to those restricted files using Dataverse's request access feature:

  • openAccess: If no files are set to restricted, the metadata export uses "openAccess"
  • restrictedAccess: If any of the files in the dataset are set to restricted and the option to request access is enabled (people are allowed to request access using Dataverse's request access feature), the metadata export uses "restrictedAccess"
  • closedAccess: If any of the files in the dataset are set to restricted and the option to request access is disabled, the metadata export uses "closedAccess"
  • embargoedAccess: Not used, because at the time Dataverse had no way to tell whether a dataset had an embargo
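
The rules above can be sketched roughly as follows. This is Python for illustration only, not Dataverse's actual exporter code; the function name and both parameters are hypothetical stand-ins for "any file is restricted" and "the request access option is enabled".

```python
def dataset_access_rights(any_file_restricted: bool, request_access_enabled: bool) -> str:
    """Pick one Access Rights term for a dataset, following the rules listed above.

    Hypothetical helper: the names are assumptions, not Dataverse identifiers.
    """
    if not any_file_restricted:
        return "openAccess"
    if request_access_enabled:
        return "restrictedAccess"
    # Restricted files and no request access option in Dataverse.
    # "embargoedAccess" is never returned, matching the last rule above.
    return "closedAccess"


# The case this issue is about: restricted files, access requests handled
# outside Dataverse, so the export still says "closedAccess".
print(dataset_access_rights(any_file_restricted=True, request_access_enabled=False))
```

Note that the function has no input for access processes that happen outside Dataverse, which is why such datasets end up labelled closedAccess.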

There are datasets in Dataverse repositories whose files are set to restricted, and people cannot request access through Dataverse's request access feature. The OpenAIRE metadata export for these datasets uses closedAccess, even when the dataset metadata indicates that people can request access by some process that happens outside of Dataverse's request access feature, e.g. submitting a DUA or contacting the author.
 
[Image: example dataset with restricted files]
This dataset has restricted files and people aren't able to request access through Dataverse's request access feature, so its OpenAIRE metadata indicates that the dataset is closed access. But people are able to request access by filling out a form (Application For The Use of Data), so the dataset isn't really closed access.

 
When these datasets are harvested by OpenAIRE, because the metadata says they're closedAccess they'll appear and be searchable as closedAccess, grouped with datasets that are more appropriately labelled closedAccess, even though file access is only restricted. This may make these datasets harder to find and use, making OpenAIRE EXPLORE less effective for finding datasets published by Dataverse repositories.

We can think of ways for Dataverse to assign Access Rights terms that the Dataverse community finds more appropriate (e.g. Zenodo depositors choose from a drop-down menu). But other data publishers are using these Access Rights terms (or those terms are being applied to the harvested datasets) in a variety of ways, which can make the Access Rights filters unhelpful for searching through OpenAIRE EXPLORE. "Open data" already means many different things to different groups. Since these Access Rights terms exist to help people find data in OpenAIRE EXPLORE, the scope of this issue might involve learning how OpenAIRE might want to improve the definitions and how repositories can use them in more standardized ways.

@jggautier
Contributor Author

jggautier commented Jul 30, 2019

I wonder if it might be safe to never use "Closed Access", use "Restricted Access" for datasets that have restricted files, and use "Open Access" for all other datasets. Does anyone ever publish datasets whose files can't be accessed at all?

If so, it might help if Dataverse allows depositors to indicate, in a standardized and machine-readable way, that access to restricted files can be requested (even if people need to request access outside of Dataverse's request access feature) or cannot be requested through any means.
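
One way to read the idea in this comment, as a rough Python sketch. Everything here is an assumption made for illustration: the function name, the `access_can_be_requested` flag (standing in for the machine-readable depositor statement suggested above), and the choice to fall back to "restrictedAccess" when the depositor has said nothing.

```python
from typing import Optional


def proposed_access_rights(
    any_file_restricted: bool,
    access_can_be_requested: Optional[bool] = None,
) -> str:
    """Sketch of the proposal: avoid "closedAccess" unless the depositor says so.

    access_can_be_requested is a hypothetical depositor-supplied flag:
    True  = access can be requested somehow (inside or outside Dataverse),
    False = access cannot be requested through any means,
    None  = the depositor has not said either way.
    """
    if not any_file_restricted:
        return "openAccess"
    if access_can_be_requested is False:
        return "closedAccess"
    # Restricted files, and nothing says access is impossible.
    return "restrictedAccess"


print(proposed_access_rights(any_file_restricted=True))
```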

@cmbz

cmbz commented Aug 20, 2024

To focus on the most important features and bugs, we are closing issues created before 2020 (version 5.0) that are not new feature requests with the label 'Type: Feature'.

If you created this issue and you feel the team should revisit this decision, please reopen the issue and leave a comment.

@philippconzett
Contributor

I only recently became aware of this issue. I think resolving this issue eventually depends on #4391 being resolved first. Thus, to me, it seems the Access Rights terms used by OpenAIRE and others (e.g., BASE Bielefeld) depend on Terms of Use being defined at file level.

With support for file-level Terms of Use implemented, I think things would work like this: The metadata record itself, i.e. the registered metadata at dataset or file level, should always be licensed with CC0 and thus have the Access Rights term "Open access". At file level, all of the values can be used, based on the Terms of Use of the individual file in question:

  • openAccess: If the file is not set to restricted or embargoed, the metadata export at file-level should use "openAccess".
  • restrictedAccess: If the file is set to restricted and the option to request access is enabled (people are allowed to request access using Dataverse's request access feature), the metadata export at file-level should use "restrictedAccess".
  • closedAccess: If the file is set to restricted and the option to request access is disabled, the metadata export at file-level should use "closedAccess".
  • embargoedAccess: If the file is set to embargoed, the metadata export at file-level should use "embargoedAccess".
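
The file-level rules above could look roughly like this in Python. This is a sketch only: the function and parameter names are hypothetical, and checking the embargo first is an assumption, since the list does not say which term wins when a file is both restricted and embargoed.

```python
def file_access_rights(restricted: bool, embargoed: bool, request_access_enabled: bool) -> str:
    """Pick one Access Rights term for a single file (hypothetical helper)."""
    # Assumption: an embargo takes precedence over the restricted flag.
    if embargoed:
        return "embargoedAccess"
    if not restricted:
        return "openAccess"
    if request_access_enabled:
        return "restrictedAccess"
    return "closedAccess"


print(file_access_rights(restricted=True, embargoed=False, request_access_enabled=True))
```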

jggautier reopened this Aug 30, 2024

@jggautier
Contributor Author

jggautier commented Sep 9, 2024

@pdurbin and I talked about this issue in relation to #10737 and #8129. And I agreed that I'd open a new GitHub issue about dc:rights specifically, to help manage these different goals and scopes.

But @philippconzett, what do you think of using this GitHub issue instead, since we're already talking about the use of these "Access Rights terms used by OpenAIRE and others (e.g., BASE Bielefeld)"?

I could re-word this GitHub issue's title so it's clear that the issue is about all uses of these "Access Rights" terms, and edit the first comment for the same reason.

@pdurbin
Member

pdurbin commented Sep 9, 2024

I wanted to link to something so I went ahead with the idea that this issue represents the unfinished dc:rights work that was originally part of the scope of #8129, which (if all goes well) will be closed by PR #10737.

The next challenge will be to size it, of course, and figure out what the plan is and when. 😅

@philippconzett
Contributor

@jggautier @pdurbin Thanks for moving this forward. I think both approaches could work, i.e. continuing to use this issue or creating a new one.

@jggautier
Contributor Author

jggautier commented Sep 10, 2024

Thanks. #4176 is also about changes to what's included in dc:rights and we'll need to consider the points raised there, too.

Next week I'll try to find time to help think about either using this GitHub issue or creating a new one, but with other projects and work travel next week, I'm not sure. I definitely don't have time this week.

@jggautier
Contributor Author

So I definitely didn't have time "next week" lol. I'm going to try to sneak some time in today to continue the discussion.

I'm going to keep using this GitHub issue for discussion about how access rights metadata in OpenAIRE metadata is being misapplied.

@philippconzett, I have questions and comments about what you wrote:

> I only recently became aware of this issue. I think resolving this issue eventually depends on #4391 being resolved first. Thus, to me, it seems the Access Rights terms used by OpenAIRE and others (e.g., BASE Bielefeld) depend on Terms of Use being defined at file level.
>
> With support for file-level Terms of Use implemented, I think things would work like this: The metadata record itself, i.e. the registered metadata at dataset or file level, should always be licensed with CC0 and thus have the Access Rights term "Open access". At file level, all of the values can be used, based on the Terms of Use of the individual file in question:
>
>   • openAccess: If the file is not set to restricted or embargoed, the metadata export at file-level should use "openAccess".
>   • restrictedAccess: If the file is set to restricted and the option to request access is enabled (people are allowed to request access using Dataverse's request access feature), the metadata export at file-level should use "restrictedAccess".
>   • closedAccess: If the file is set to restricted and the option to request access is disabled, the metadata export at file-level should use "closedAccess".
>   • embargoedAccess: If the file is set to embargoed, the metadata export at file-level should use "embargoedAccess".

OpenAIRE uses their OpenAIRE standard to determine if a dataset is openAccess, restrictedAccess, closedAccess or embargoedAccess.

It sounds like you're proposing that the OpenAIRE XML exports of datasets would always indicate that the metadata of the dataset is CC0 and "openAccess". Am I understanding that right?

If so, as far as I know, the OpenAIRE standard doesn't have a way to indicate the terms or license of the metadata. As we know, it includes a way to indicate the license or terms of the data that the metadata describes, and I think that's all it can do.

And I think that being able to describe the terms or license of the metadata of the dataset is out of scope here. OpenAIRE's system wants to know the access level of the data in the dataset, using just one of those four access levels. This GitHub issue is about challenges with providing that information to OpenAIRE. The use case I described in this issue's first post assumes that a dataset can be usefully described with just one of the four access levels.

But that model doesn't work when one dataset has data with multiple access levels, right? I think that's the gist of your comments. And if we want to resolve that, then I think it also means working with the OpenAIRE folks so that their systems can support searching for datasets by access level when a dataset's files have different access levels.
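
To make the mismatch concrete, here is a small Python sketch. The `DataFile` class, its fields, and the example file names are all made up for illustration; the point is only that a dataset with mixed files yields more than one access level, while the export has room for exactly one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataFile:
    # Hypothetical file record; access_rights holds one of the four terms.
    name: str
    access_rights: str


def dataset_access_levels(files: List[DataFile]) -> List[str]:
    """Return the distinct access levels found among a dataset's files, sorted."""
    return sorted({f.access_rights for f in files})


files = [
    DataFile("codebook.pdf", "openAccess"),
    DataFile("survey.tab", "restrictedAccess"),
    DataFile("interviews.zip", "restrictedAccess"),
]
levels = dataset_access_levels(files)
print(levels)
# More than one level means no single term describes the whole dataset.
print(len(levels) > 1)
```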

Does all of that make sense? I'd like to make sure before we start thinking about solutions.
