Access Rights metadata in OpenAIRE metadata export is being misapplied #5920
I wonder if it might be safe to never use "Closed Access", use "Restricted Access" for datasets that have restricted files, and use "Open Access" for all other datasets. Does anyone ever publish datasets whose files can't be accessed at all? If so, it might help if Dataverse allows depositors to indicate, in a standardized and machine-readable way, that access to restricted files can be requested (even if people need to request access outside of Dataverse's request access feature) or cannot be requested through any means.
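To make the suggestion above concrete, here is a minimal sketch of that rule. It is only an illustration, not Dataverse code: the `DataFile` class and the `proposed_access_rights` function are hypothetical names, and the returned strings are the bare term names used in this thread.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class DataFile:
    """Hypothetical stand-in for a dataset file; not a Dataverse class."""
    restricted: bool


def proposed_access_rights(files: Iterable[DataFile]) -> str:
    """Apply the suggested rule: never emit closedAccess.

    Datasets with at least one restricted file are restrictedAccess;
    every other dataset is openAccess.
    """
    if any(f.restricted for f in files):
        return "restrictedAccess"
    return "openAccess"
```

Under this rule, a dataset whose files are restricted but can be requested outside of Dataverse would no longer be grouped with genuinely closed datasets.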
To focus on the most important features and bugs, we are closing issues created before 2020 (version 5.0) that are not new feature requests with the label 'Type: Feature'. If you created this issue and you feel the team should revisit this decision, please reopen the issue and leave a comment.
I only recently became aware of this issue. I think resolving it ultimately depends on #4391 being resolved first. To me, it seems the Access Rights terms used by OpenAIRE and others (e.g., BASE Bielefeld) depend on Terms of Use being defined at file level. With support for file-level Terms of Use implemented, I think things would work like this: At the metadata record level, the registered metadata (whether at dataset or file level) should always be licensed with CC0 and thus have the Access Rights term "Open access". At file level, any of the values can be used, depending on the Terms of Use of the individual file.
@pdurbin and I talked about this issue in relation to #10737 and #8129. And I agreed that I'd open a new GitHub issue about dc:rights specifically, to help manage these different goals and scopes. But @philippconzett, what do you think of using this GitHub issue instead, since we're already talking about the use of these "Access Rights terms used by OpenAIRE and others (e.g., BASE Bielefeld)"? I could re-word this GitHub issue's title so it's clear that the issue is about all uses of these "Access Rights" terms, and edit the first comment for the same reason.
I wanted to link to something so I went ahead with the idea that this issue represents the unfinished dc:rights work that was originally part of the scope of #8129, which (if all goes well) will be closed by PR #10737. The next challenge will be to size it, of course, and figure out what the plan is and when. 😅
@jggautier @pdurbin Thanks for moving this forward. I think both approaches could work: either continuing to use this issue or creating a new one.
Thanks. #4176 is also about changes to what's included in dc:rights and we'll need to consider the points raised there, too. Next week I'll try to find time to help think about either using this GitHub issue or creating a new one, but with other projects and work travel next week, I'm not sure. I definitely don't have time this week.
So I definitely didn't have time "next week" lol. I'm going to try to sneak some time in today to continue the discussion. I'm going to keep using this GitHub issue for discussion about how access rights metadata in OpenAIRE metadata is being misapplied. @philippconzett, I have questions and comments about what you wrote:
OpenAIRE uses their OpenAIRE standard to determine if a dataset is openAccess, restrictedAccess, closedAccess or embargoedAccess. It sounds like you're proposing that the OpenAIRE XML exports of datasets would always indicate that the metadata of the dataset is CC0 and "openAccess". Am I understanding that right? If so, as far as I know, the OpenAIRE standard doesn't have a way to indicate the terms or license of the metadata. As we know, it includes a way to indicate the license or terms of the data that the metadata describes, and I think that's all it can do. And I think that being able to describe the terms or license of the metadata of the dataset is out of scope here. OpenAIRE's system wants to know the access level of the data in the dataset, using just one of those four access levels. This GitHub issue is about challenges with providing that information to OpenAIRE. The use case I described in this issue's first post assumes that a dataset can be usefully described with just one of the four access levels. But that model doesn't work when one dataset has data with multiple access levels, right? I think that's the gist of your comments. And if we want to resolve that, then I think it also means working with the OpenAIRE folks so that their systems can support searching for datasets by access level when those datasets have multiple access levels because the datasets' files have multiple access levels. Does all of that make sense? I'd like to make sure before we start thinking about solutions.
As of v4.14 (released in May 2019), Dataverse makes DataCite metadata that complies with OpenAIRE requirements available through the UI, the API, and OAI-PMH (#4257). Repositories need to follow these requirements in order for their dataset metadata to be made discoverable in OpenAIRE EXPLORE.
The required metadata export, called OpenAIRE (in the Dataverse UI) or oai_datacite (over API and OAI-PMH), includes one of four Access Rights terms (openAccess, restrictedAccess, closedAccess or embargoedAccess), which come from the info:eu-repo-Access-Terms vocabulary.
Dataverse chooses these terms based on whether or not any dataset files are set to restricted and whether or not people are able to request access to those restricted files using Dataverse's request access feature.
There are datasets in Dataverse repositories whose files are set to restricted, and people cannot request access through Dataverse's request access feature. The OpenAIRE metadata export for these datasets uses closedAccess, even when the dataset metadata indicates that people can request access by some process that happens outside of Dataverse's request access feature, e.g. submitting a DUA or contacting the author.
This dataset has restricted files and people aren't able to request access through Dataverse's request access feature, so its OpenAIRE metadata indicates that the dataset is closed access. But people are able to request access by filling out a form (Application For The Use of Data), so the dataset isn't really closed access.
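The behavior described above can be sketched roughly as follows. This is a hedged illustration rather than Dataverse's actual export code: the function name and parameters are hypothetical, only the closedAccess branch is stated explicitly in this issue, and the other two branches are assumptions about how the remaining cases are handled.

```python
def current_access_rights(has_restricted_files: bool,
                          request_access_enabled: bool) -> str:
    """Sketch of how the export appears to pick an Access Rights term."""
    if not has_restricted_files:
        # Assumption: datasets with no restricted files are exported as open.
        return "openAccess"
    if request_access_enabled:
        # Assumption: restricted files that can be requested through
        # Dataverse's request access feature are exported as restricted.
        return "restrictedAccess"
    # Described in this issue: restricted files with no in-Dataverse request
    # access become closedAccess, even if access can be requested by other
    # means (e.g. a DUA form or contacting the author).
    return "closedAccess"
```

The last branch is the problem case: the function has no input that captures request processes that happen outside of Dataverse, so those datasets fall through to closedAccess.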
When these datasets are harvested by OpenAIRE, because the metadata says they're closedAccess, they'll appear and be searchable as closedAccess, grouped with datasets that are more appropriately labelled closedAccess, even though access to their files is only restricted. This may make these datasets harder to find and use, making OpenAIRE EXPLORE less effective for finding datasets published by Dataverse repositories.
We can think of ways for Dataverse to assign Access Rights terms that the Dataverse community finds more appropriate (e.g. Zenodo depositors choose from a drop-down menu). But other data publishers are using these Access Rights terms (or those terms are being applied to the harvested datasets) in a variety of ways, which can make the Access Rights filters unhelpful for searching through OpenAIRE EXPLORE. "Open data" already means many different things to different groups. Since these Access Rights terms are used for the benefit of finding data in OpenAIRE EXPLORE, the scope of this issue might involve learning how OpenAIRE might want to improve the definitions and how repositories can use them in more standardized ways.