Update get file endpoint to add a datasetVersion optional query parameter and extend its payload #10299

Merged: 33 commits, Feb 23, 2024
Changes from 28 commits
Commits
f4b9483
Stash: getFileData by datasetVersionId param WIP
GPortas Feb 6, 2024
bf3c2c7
Fixed: if condition and endpoint path
GPortas Feb 7, 2024
9d07eee
Merge branch 'develop' of github.com:IQSS/dataverse into 10280-get-fi…
GPortas Feb 7, 2024
c8f8227
Added: new commands for getting FileMetadata
GPortas Feb 7, 2024
8cda7ce
Added: readability minor change
GPortas Feb 7, 2024
397dbfb
Changed: getFileData endpoint using new commands through DsVersionHan…
GPortas Feb 8, 2024
65de2a5
Refactor: using Bundle string in response
GPortas Feb 8, 2024
153d7d3
Changed: FilesIT testGetFileInfo restructure for upcoming new tests
GPortas Feb 8, 2024
2fb5247
Changed: UtilIT getFileData to support new datasetVersionId optional …
GPortas Feb 8, 2024
e98ea11
Changed: do not overwrite findDataFileOrDie or GetDataFileCommand wit…
GPortas Feb 8, 2024
fc8aac3
Fixed: GetLatestAccessibleFileMetadataCommand
GPortas Feb 14, 2024
153b9ae
Added: FilesIT testGetFileInfo test cases
GPortas Feb 14, 2024
8230058
Merge branch 'develop' of github.com:IQSS/dataverse into 10280-get-fi…
GPortas Feb 14, 2024
2a30f32
Stash: includeDeaccessioned support on get file info endpoint wip
GPortas Feb 14, 2024
b5aeb25
Stash: includeDeaccessioned support on get file info endpoint wip (2)
GPortas Feb 14, 2024
48e71ec
Refactor: DataFile getLatestFileMetadata and getLatestPublishedFileMe…
GPortas Feb 14, 2024
227fe53
Refactor: using DatasetVersion.compareByVersion in getTheNewerFileMet…
GPortas Feb 14, 2024
d7f8040
Merge branch 'develop' of github.com:IQSS/dataverse into 10280-get-fi…
GPortas Feb 15, 2024
95ce492
Changed: includeDeaccessioned optional param in getLatestPublishedFil…
GPortas Feb 16, 2024
61fa571
Fixed: includeDeaccessioned wrong behavior in getFileInfo
GPortas Feb 19, 2024
d0b7454
Added: IT testGetFileInfo cases
GPortas Feb 19, 2024
25cdc00
Merge branch 'develop' of github.com:IQSS/dataverse into 10280-get-fi…
GPortas Feb 19, 2024
a267adc
Removed: commented code in json printer for DatasetVersion
GPortas Feb 19, 2024
ff2e86c
Added: returnDatasetVersion optional parameter to getFileInfo API end…
GPortas Feb 19, 2024
d4eedc2
Added: extended docs for Get JSON Representation of a File
GPortas Feb 19, 2024
ab60747
Added: docs for Get JSON Representation of a File given a Dataset Ver…
GPortas Feb 19, 2024
e5dbfa1
Added: release notes for #10280
GPortas Feb 19, 2024
ffd69e5
Added: #10280 release note tweak
GPortas Feb 20, 2024
c53bba3
Removed: temporal comment
GPortas Feb 22, 2024
2879949
Removed: duplicated check in GetLatestPublishedFileMetadataCommand
GPortas Feb 22, 2024
7a430bf
Changed: docs to point out that files may not have PIDs
GPortas Feb 23, 2024
4703189
Refactor: new dataFile.getDraftFileMetadata() method to avoid extra c…
GPortas Feb 23, 2024
f49a48a
Refactor: new AbstractGetPublishedFileMetadataCommand.getLatestPublis…
GPortas Feb 23, 2024
10 changes: 10 additions & 0 deletions doc/release-notes/10280-get-file-api-extension.md
@@ -0,0 +1,10 @@
The API endpoint `api/files/{id}` has been extended to support the following optional query parameters:

- `includeDeaccessioned`: Whether to consider deaccessioned dataset versions when searching for the latest file metadata. (Default: `false`).
- `returnDatasetVersion`: Whether to include the dataset version of the file in the response. (Default: `false`).

A new endpoint `api/files/{id}/versions/{datasetVersionId}` has been created. This endpoint returns the file metadata present in the requested dataset version. To specify the dataset version, you can use ``:latest-published``, ``:latest``, ``:draft``, a version number such as ``1.0``, or any other available version identifier.

The endpoint supports the `includeDeaccessioned` and `returnDatasetVersion` optional query parameters, as does the `api/files/{id}` endpoint.

The `api/files/{id}/draft` endpoint has been removed in favor of the new endpoint `api/files/{id}/versions/{datasetVersionId}`, which accepts the version identifier ``:draft`` (`api/files/{id}/versions/:draft`) to obtain the same result.
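
The paths described in these notes can be sketched as a small path builder. This is only an illustration: `FileEndpointPaths` and its methods are hypothetical names that do not exist in Dataverse, and the sketch assumes a numeric file id (requests that use `:persistentId` also carry a `persistentId` query parameter, which is not handled here).

```java
import java.util.StringJoiner;

// Hypothetical helper, not part of Dataverse: assembles the request paths
// named in the release notes from a numeric file id and the optional flags.
final class FileEndpointPaths {

    // api/files/{id}, the lookup against the latest dataset version
    static String latest(String fileId, boolean includeDeaccessioned, boolean returnDatasetVersion) {
        return "api/files/" + fileId + query(includeDeaccessioned, returnDatasetVersion);
    }

    // api/files/{id}/versions/{datasetVersionId}
    static String forVersion(String fileId, String datasetVersionId,
                             boolean includeDeaccessioned, boolean returnDatasetVersion) {
        return "api/files/" + fileId + "/versions/" + datasetVersionId
                + query(includeDeaccessioned, returnDatasetVersion);
    }

    // Replacement for the removed api/files/{id}/draft path
    static String draft(String fileId) {
        return forVersion(fileId, ":draft", false, false);
    }

    // Only flags set to true are emitted, since both default to false
    private static String query(boolean includeDeaccessioned, boolean returnDatasetVersion) {
        StringJoiner q = new StringJoiner("&", "?", "");
        q.setEmptyValue(""); // no "?" when no flag is set
        if (includeDeaccessioned) {
            q.add("includeDeaccessioned=true");
        }
        if (returnDatasetVersion) {
            q.add("returnDatasetVersion=true");
        }
        return q.toString();
    }

    public static void main(String[] args) {
        System.out.println(draft("42"));
        System.out.println(forVersion("42", ":latest-published", true, false));
    }
}
```

A client that previously called `api/files/42/draft` would, under these assumptions, call the path returned by `FileEndpointPaths.draft("42")` instead.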
102 changes: 102 additions & 0 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -2724,6 +2724,8 @@ Get JSON Representation of a File

.. note:: Files can be accessed using persistent identifiers. This is done by passing the constant ``:persistentId`` where the numeric id of the file is expected, and then passing the actual persistent id as a query parameter with the name ``persistentId``.

This endpoint returns the file metadata present in the latest dataset version.

Example: Getting the file whose DOI is *10.5072/FK2/J8SJZB*:

.. code-block:: bash
@@ -2790,6 +2792,106 @@ The fully expanded example above (without environment variables) looks like this

The file id can be extracted from the response retrieved from the API which uses the persistent identifier (``/api/datasets/:persistentId/?persistentId=$PERSISTENT_IDENTIFIER``).

By default, files from deaccessioned dataset versions are not included in the search. If no accessible draft version of the dataset exists, the search for the latest published file ignores deaccessioned dataset versions unless the ``includeDeaccessioned`` query parameter is set to ``true``.

Usage example:

.. code-block:: bash

export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/J8SJZB
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/files/:persistentId/?persistentId=$PERSISTENT_IDENTIFIER&includeDeaccessioned=true"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/files/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB&includeDeaccessioned=true"

To include the dataset version of the file in the response, set the optional ``returnDatasetVersion`` query parameter to ``true`` (its default value is ``false``).

Usage example:

.. code-block:: bash

export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/J8SJZB
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/files/:persistentId/?persistentId=$PERSISTENT_IDENTIFIER&returnDatasetVersion=true"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/files/:persistentId/?persistentId=doi:10.5072/FK2/J8SJZB&returnDatasetVersion=true"

Get JSON Representation of a File given a Dataset Version
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. note:: Files can be accessed using persistent identifiers. This is done by passing the constant ``:persistentId`` where the numeric id of the file is expected, and then passing the actual persistent id as a query parameter with the name ``persistentId``.

This endpoint returns the file metadata present in the requested dataset version. To specify the dataset version, you can use ``:latest-published``, ``:latest``, ``:draft``, a version number such as ``1.0``, or any other style listed under :ref:`dataset-version-specifiers`.

Example: Getting the file whose DOI is *10.5072/FK2/J8SJZB* present in the published dataset version ``1.0``:

.. code-block:: bash

export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/J8SJZB
export DATASET_VERSION=1.0
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/files/:persistentId/versions/$DATASET_VERSION?persistentId=$PERSISTENT_IDENTIFIER"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/files/:persistentId/versions/1.0?persistentId=doi:10.5072/FK2/J8SJZB"

If the specified version does not exist, or you do not have permission to view it, the request returns a not found error.

By default, files from deaccessioned dataset versions are not included in the search unless the ``includeDeaccessioned`` query parameter is set to ``true``.

Usage example:

.. code-block:: bash

export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/J8SJZB
export DATASET_VERSION=:latest-published
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/files/:persistentId/versions/$DATASET_VERSION?persistentId=$PERSISTENT_IDENTIFIER&includeDeaccessioned=true"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/files/:persistentId/versions/:latest-published?persistentId=doi:10.5072/FK2/J8SJZB&includeDeaccessioned=true"

To include the dataset version of the file in the response, set the optional ``returnDatasetVersion`` query parameter to ``true`` (its default value is ``false``).

Usage example:

.. code-block:: bash

export SERVER_URL=https://demo.dataverse.org
export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/J8SJZB
export DATASET_VERSION=:draft
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

curl -H "X-Dataverse-key:$API_TOKEN" "$SERVER_URL/api/files/:persistentId/versions/$DATASET_VERSION?persistentId=$PERSISTENT_IDENTIFIER&returnDatasetVersion=true"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" "https://demo.dataverse.org/api/files/:persistentId/versions/:draft?persistentId=doi:10.5072/FK2/J8SJZB&returnDatasetVersion=true"

Adding Files
~~~~~~~~~~~~

66 changes: 29 additions & 37 deletions src/main/java/edu/harvard/iq/dataverse/DataFile.java
@@ -549,57 +549,49 @@ public void setDescription(String description) {
public FileMetadata getFileMetadata() {
return getLatestFileMetadata();
}

public FileMetadata getLatestFileMetadata() {
FileMetadata fmd = null;
FileMetadata resultFileMetadata = null;

// for newly added or harvested, just return the one fmd
if (fileMetadatas.size() == 1) {
return fileMetadatas.get(0);
}

for (FileMetadata fileMetadata : fileMetadatas) {
// if it finds a draft, return it
if (fileMetadata.getDatasetVersion().getVersionState().equals(VersionState.DRAFT)) {
return fileMetadata;
}

// otherwise return the one with the latest version number
// duplicate logic in getLatestPublishedFileMetadata()
if (fmd == null || fileMetadata.getDatasetVersion().getVersionNumber().compareTo( fmd.getDatasetVersion().getVersionNumber() ) > 0 ) {
fmd = fileMetadata;
} else if ((fileMetadata.getDatasetVersion().getVersionNumber().compareTo( fmd.getDatasetVersion().getVersionNumber())==0 )&&
( fileMetadata.getDatasetVersion().getMinorVersionNumber().compareTo( fmd.getDatasetVersion().getMinorVersionNumber()) > 0 ) ) {
fmd = fileMetadata;
}
resultFileMetadata = getTheNewerFileMetadata(resultFileMetadata, fileMetadata);
}
return fmd;

return resultFileMetadata;
}

// //Returns null if no published version

public FileMetadata getLatestPublishedFileMetadata() throws UnsupportedOperationException {
FileMetadata fmd = null;

for (FileMetadata fileMetadata : fileMetadatas) {
// if it finds a draft, skip
if (fileMetadata.getDatasetVersion().getVersionState().equals(VersionState.DRAFT)) {
continue;
}

// otherwise return the one with the latest version number
// duplicate logic in getLatestFileMetadata()
if (fmd == null || fileMetadata.getDatasetVersion().getVersionNumber().compareTo( fmd.getDatasetVersion().getVersionNumber() ) > 0 ) {
fmd = fileMetadata;
} else if ((fileMetadata.getDatasetVersion().getVersionNumber().compareTo( fmd.getDatasetVersion().getVersionNumber())==0 )&&
( fileMetadata.getDatasetVersion().getMinorVersionNumber().compareTo( fmd.getDatasetVersion().getMinorVersionNumber()) > 0 ) ) {
fmd = fileMetadata;
}
}
if(fmd == null) {
FileMetadata resultFileMetadata = fileMetadatas.stream()
.filter(metadata -> !metadata.getDatasetVersion().getVersionState().equals(VersionState.DRAFT))
.reduce(null, DataFile::getTheNewerFileMetadata);

if (resultFileMetadata == null) {
throw new UnsupportedOperationException("No published metadata version for DataFile " + this.getId());
}

return fmd;
return resultFileMetadata;
}

public static FileMetadata getTheNewerFileMetadata(FileMetadata current, FileMetadata candidate) {
if (current == null) {
return candidate;
}

DatasetVersion currentVersion = current.getDatasetVersion();
DatasetVersion candidateVersion = candidate.getDatasetVersion();

if (DatasetVersion.compareByVersion.compare(candidateVersion, currentVersion) > 0) {
return candidate;
}

return current;
}

/**
@@ -610,7 +602,7 @@ public long getFilesize() {
if (this.filesize == null) {
// -1 means "unknown"
return -1;
}
}
return this.filesize;
}
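
The `getTheNewerFileMetadata` refactor in this file replaces two duplicated comparison loops with a single pairwise "keep the newer one" function that can be folded over the list. A minimal, self-contained sketch of that pattern follows. It does not use the real Dataverse classes: the `Meta` record, its fields, and `BY_VERSION` are hypothetical stand-ins for `FileMetadata`, `DatasetVersion`, and `DatasetVersion.compareByVersion`, whose actual ordering may differ. It assumes Java 16 or later for records.

```java
import java.util.Comparator;
import java.util.List;

final class NewestMetadataSketch {

    // Hypothetical stand-in for a FileMetadata together with its DatasetVersion
    record Meta(String label, long major, long minor, boolean draft) {}

    // Assumed ordering: major version first, then minor version
    static final Comparator<Meta> BY_VERSION =
            Comparator.comparingLong(Meta::major).thenComparingLong(Meta::minor);

    // Pairwise step: keep the current entry unless the candidate is strictly newer
    static Meta newer(Meta current, Meta candidate) {
        if (current == null) {
            return candidate;
        }
        return BY_VERSION.compare(candidate, current) > 0 ? candidate : current;
    }

    // Fold the pairwise step over all non-draft entries.
    // The null identity means an empty (or all-draft) list yields null.
    static Meta latestPublished(List<Meta> all) {
        return all.stream()
                .filter(m -> !m.draft())
                .reduce(null, NewestMetadataSketch::newer);
    }

    public static void main(String[] args) {
        List<Meta> metas = List.of(
                new Meta("1.0", 1, 0, false),
                new Meta("2.1", 2, 1, false),
                new Meta("2.0", 2, 0, false),
                new Meta("draft", 3, 0, true));
        System.out.println(latestPublished(metas).label());
    }
}
```

Using `null` as the identity works here because the pairwise step treats a `null` current value as "nothing chosen yet"; the caller then decides what an empty result means, which in the diff above is an `UnsupportedOperationException`.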

@@ -423,7 +423,6 @@ public Command<DatasetVersion> handleLatestPublished() {
}

protected DataFile findDataFileOrDie(String id) throws WrappedResponse {

DataFile datafile;
if (id.equals(PERSISTENT_ID_KEY)) {
String persistentId = getRequestParameter(PERSISTENT_ID_KEY.substring(1));