Easier way to get dataset title #2110

Closed
astrofrog opened this issue Apr 29, 2015 · 12 comments

Comments

@astrofrog

At the moment, there is no easy way to get the title of a dataset from the JSON response:

{'authority': '10.5072/FK2',
 'id': 177,
 'identifier': 'JIGZKZ',
 'latestVersion': {'createTime': '2015-04-28T15:44:09Z',
                   'files': [{'datafile': {'contentType': 'application/fits',
                                           'description': 'This is a '
                                                          'FITS file '
                                                          'with 1 '
                                                          '(primary) '
                                                          'HDU.\n'
                                                          'The following '
                                                          'recognized '
                                                          'metadata keys '
                                                          'have been '
                                                          'found in the '
                                                          'FITS file:\n'
                                                          'INSTRUME; '
                                                          'NAXIS0; '
                                                          'NAXIS1; '
                                                          'TELESCOP; '
                                                          'NAXIS; \n',
                                           'filename': '%2Fusr%2Flocal%2Fglassfish4%2Fglassfish%2Fdomains%2Fdomain1%2Ffiles%2F10.5072%2FFK2%2FJIGZKZ%2F14d00b49b54-9b3f57881c5d',
                                           'id': 178,
                                           'md5': 'd7a4341002ff45c218ebefd073fe3438',
                                           'name': 'MSX_E.fits',
                                           'originalFormatLabel': 'UNKNOWN'},
                              'datasetVersionId': 56,
                              'description': 'This is a FITS file with 1 '
                                             '(primary) HDU.\n'
                                             'The following recognized '
                                             'metadata keys have been '
                                             'found in the FITS file:\n'
                                             'INSTRUME; NAXIS0; NAXIS1; '
                                             'TELESCOP; NAXIS; \n',
                              'label': 'MSX_E.fits',
                              'version': 1}],
                   'id': 56,
                   'lastUpdateTime': '2015-04-28T15:46:39Z',
                   'metadataBlocks': {'astrophysics': {'displayName': 'Astronomy '
                                                                      'and '
                                                                      'Astrophysics '
                                                                      'Metadata',
                                                       'fields': [{'multiple': True,
                                                                   'typeClass': 'controlledVocabulary',
                                                                   'typeName': 'astroType',
                                                                   'value': ['Image']},
                                                                  {'multiple': True,
                                                                   'typeClass': 'primitive',
                                                                   'typeName': 'astroFacility',
                                                                   'value': ['MSX']},
                                                                  {'multiple': True,
                                                                   'typeClass': 'primitive',
                                                                   'typeName': 'astroInstrument',
                                                                   'value': ['SPIRITIII']}]},
                                      'citation': {'displayName': 'Citation '
                                                                  'Metadata',
                                                   'fields': [{'multiple': False,
                                                               'typeClass': 'primitive',
                                                               'typeName': 'title',
                                                               'value': 'MSX '
                                                                        'Band '
                                                                        'E '
                                                                        'image '
                                                                        'of '
                                                                        'the '
                                                                        'Galactic '
                                                                        'Center'},
                                                              {'multiple': True,
                                                               'typeClass': 'compound',
                                                               'typeName': 'author',
                                                               'value': [{'authorName': {'multiple': False,
                                                                                         'typeClass': 'primitive',
                                                                                         'typeName': 'authorName',
                                                                                         'value': 'Onymous, '
                                                                                                  'A. '
                                                                                                  'N.'}}]},
                                                              {'multiple': True,
                                                               'typeClass': 'compound',
                                                               'typeName': 'datasetContact',
                                                               'value': [{'datasetContactEmail': {'multiple': False,
                                                                                                  'typeClass': 'primitive',
                                                                                                  'typeName': 'datasetContactEmail',
                                                                                                  'value': 'trobitaille@cfa.harvard.edu'},
                                                                          'datasetContactName': {'multiple': False,
                                                                                                 'typeClass': 'primitive',
                                                                                                 'typeName': 'datasetContactName',
                                                                                                 'value': 'Robitaille, '
                                                                                                          'Thomas'}}]},
                                                              {'multiple': True,
                                                               'typeClass': 'compound',
                                                               'typeName': 'dsDescription',
                                                               'value': [{'dsDescriptionValue': {'multiple': False,
                                                                                                 'typeClass': 'primitive',
                                                                                                 'typeName': 'dsDescriptionValue',
                                                                                                 'value': 'Test '
                                                                                                          'MSX '
                                                                                                          'dataset'}}]},
                                                              {'multiple': True,
                                                               'typeClass': 'controlledVocabulary',
                                                               'typeName': 'subject',
                                                               'value': ['Astronomy '
                                                                         'and '
                                                                         'Astrophysics']},
                                                              {'multiple': False,
                                                               'typeClass': 'primitive',
                                                               'typeName': 'depositor',
                                                               'value': 'Robitaille, '
                                                                        'Thomas'},
                                                              {'multiple': False,
                                                               'typeClass': 'primitive',
                                                               'typeName': 'dateOfDeposit',
                                                               'value': '2015-04-28'}]}},
                   'productionDate': 'Production Date',
                   'versionState': 'DRAFT'},
 'persistentUrl': 'http://dx.doi.org/10.5072/FK2/JIGZKZ',
 'protocol': 'doi'}

I basically have to do something like:

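    # Walk the citation block's fields until we find the title
    info = response['latestVersion']
    for field in info['metadataBlocks']['citation']['fields']:
        if field['typeName'] == 'title':
            title = field['value']
            break
    else:
        # The loop finished without a break: no title field present
        title = None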

Would it not be possible to add the main title of the dataset at the root level of the response?
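
For reference, the loop above generalizes to a small helper. This is only a sketch: `get_citation_field` is a hypothetical name, not part of any Dataverse client, and it assumes the response has the same shape as the dictionary printed above.

```python
def get_citation_field(dataset, type_name, default=None):
    """Return the value of a citation field from a dataset dict, or default."""
    fields = (
        dataset.get('latestVersion', {})
        .get('metadataBlocks', {})
        .get('citation', {})
        .get('fields', [])
    )
    # First matching field wins; fall back to the default if none matches
    return next(
        (f['value'] for f in fields if f.get('typeName') == type_name),
        default,
    )


# Trimmed-down version of the response shown above
response = {
    'latestVersion': {
        'metadataBlocks': {
            'citation': {
                'fields': [
                    {'typeName': 'title',
                     'value': 'MSX Band E image of the Galactic Center'},
                    {'typeName': 'depositor',
                     'value': 'Robitaille, Thomas'},
                ]
            }
        }
    }
}

title = get_citation_field(response, 'title')
```

The chained `.get()` calls mean a response without a citation block yields the default instead of raising `KeyError`, which is the same behavior as the `for`/`else` version.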

@pdurbin
Member

pdurbin commented Apr 29, 2015

This reminds me of #761.

@scolapasta scolapasta added this to the In Review milestone May 8, 2015
@pdurbin
Member

pdurbin commented Jul 17, 2015

I'm still annoyed by this but at least I found a good way to do it with jq:

$ curl -s "http://localhost:8080/api/datasets/2672184?key=$API_TOKEN" | jq '.data.latestVersion.metadataBlocks.citation.fields[] | select(.typeName=="title").value'
"Darwin's Finches"

Here's the JSON for the dataset: https://github.com/IQSS/dataverse/blob/306bf7ff9fb22c7fd94a4c412198deec22eb1660/scripts/search/tests/data/dataset-finch1.json

@raprasad
Contributor

Painful to watch... Let's switch the metadata schema from relational to NoSQL soon, even if that just means putting JSON into a Postgres field. The main operation, after all, is reading it for the dataset page.

@scolapasta scolapasta removed this from the Not Assigned to a Release milestone Jan 28, 2016
@pdurbin pdurbin added the Component: Code Infrastructure formerly "Feature: Code Infrastructure" label Feb 1, 2016
@mheppler mheppler added Feature: Metadata and removed Component: Code Infrastructure formerly "Feature: Code Infrastructure" labels Feb 1, 2016
@michbarsinai
Member

Note that, strictly speaking, there is no such thing as a "dataset title". A title is a property of a version of a Dataset. So that's one level of indirection that will always be there (at least, until the application logic changes).

@pdurbin
Member

pdurbin commented Aug 15, 2016

@michbarsinai sure, but once we're within a dataset version I'm just saying I would prefer something like...

jq '.data.latestVersion.title'

... rather than what we have to do now:

jq '.data.latestVersion.metadataBlocks.citation.fields[] | select(.typeName=="title").value'

@raprasad
Contributor

raprasad commented Oct 7, 2016

FYI: this will impact #3241 again, primarily through the time it takes to write "efficient" code that pulls the title out of every dataset's version without repeating, e.g. we'd need to write custom SQL, similar to metrics, and not use the ORM layer.

@michbarsinai
Member

How about adding a "dataset summary" endpoint with an easy representation at api/datasets/<dataset-idtf>/summary? This won't break backwards compatibility, won't appear to be a full representation of a dataset (as in, no one will expect that the response body can be POSTed into Dataverse later), and will allow easy access to selected fields.

Open questions: what fields will be included, and what the JSON schema will be. I assume it's not that hard to answer.
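
To make the open questions concrete, here is one possible shape for such a summary, built client-side from the existing JSON. Everything in it is an assumption for illustration: the `summarize` function, the choice of fields, and the flat layout are hypothetical and do not describe an existing Dataverse endpoint.

```python
def summarize(dataset):
    """Flatten a full dataset dict into a small, read-only summary dict."""
    version = dataset.get('latestVersion', {})
    fields = (
        version.get('metadataBlocks', {})
        .get('citation', {})
        .get('fields', [])
    )
    by_name = {f['typeName']: f['value'] for f in fields}

    # Compound fields such as 'author' hold a list of dicts of sub-fields
    authors = [
        a['authorName']['value']
        for a in by_name.get('author', [])
        if 'authorName' in a
    ]

    return {
        'id': dataset.get('id'),
        'persistentUrl': dataset.get('persistentUrl'),
        'versionState': version.get('versionState'),
        'title': by_name.get('title'),
        'authors': authors,
    }


# Trimmed-down version of the dataset from the top of this issue
dataset = {
    'id': 177,
    'persistentUrl': 'http://dx.doi.org/10.5072/FK2/JIGZKZ',
    'latestVersion': {
        'versionState': 'DRAFT',
        'metadataBlocks': {'citation': {'fields': [
            {'typeName': 'title',
             'value': 'MSX Band E image of the Galactic Center'},
            {'typeName': 'author',
             'value': [{'authorName': {'typeName': 'authorName',
                                       'value': 'Onymous, A. N.'}}]},
        ]}},
    },
}

summary = summarize(dataset)
```

Because the summary drops the `typeClass`/`multiple` bookkeeping, it clearly cannot be round-tripped back into a dataset, which matches the "not a full representation" goal.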

@pdurbin
Member

pdurbin commented Oct 15, 2016

I'm well aware that this issue is about JSON and that people don't like XML very much, but I'd like to point out that now that we have export (#907), it isn't too tough to get the title out of XML representations of datasets such as Dublin Core:

curl -s 'https://dataverse.harvard.edu/api/datasets/export?exporter=dcterms&persistentId=doi:10.7910/DVN/FAZJE4' | xpath -e '//dcterms:title/text()'

Found 1 nodes in stdin:
-- NODE --
Replication data for: Roads, Railroads and Decentralization of Chinese Cities

It's still quite painful to get the title out of the JSON representation. I'm cool with a summary or whatever, as @michbarsinai suggests. Is there any standard, JSON Schema or otherwise, for representing minimal information about a dataset in JSON? Let's implement a standard if there is one.
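
For anyone doing the Dublin Core lookup above from Python rather than with `xpath`, a minimal sketch using only the standard library follows. The `title_from_dcterms` helper is a hypothetical name, and the sample document is a hand-written stand-in for the export (the real response has more elements and its root element may differ); only the `dcterms` namespace URI is standard.

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core Terms namespace
DCTERMS = {'dcterms': 'http://purl.org/dc/terms/'}


def title_from_dcterms(xml_text):
    """Return the text of the first dcterms:title element, or None."""
    root = ET.fromstring(xml_text)
    node = root.find('.//dcterms:title', DCTERMS)
    return node.text if node is not None else None


# Hand-written stand-in for the export output, not a real response
sample = """<metadata xmlns:dcterms="http://purl.org/dc/terms/">
  <dcterms:title>Replication data for: Roads, Railroads and Decentralization of Chinese Cities</dcterms:title>
  <dcterms:creator>Example, Author</dcterms:creator>
</metadata>"""

title = title_from_dcterms(sample)
```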

@pdurbin
Member

pdurbin commented Jun 23, 2017

This still drives me crazy. I'm hoping that we work on #3599 some day and that it helps in this area. For now, I recommend getting the title via the SWORD API, I guess. Rather than JSON it's XML (bleh!) but at least it's easy to get the title.

@raprasad
Contributor

@pdurbin: the unofficial JSON puts the title at the top, while also keeping it within the citation metadata block:

https://services.dataverse.harvard.edu/miniverse/metrics/v1/datasets/by-persistent-id?persistentId=doi%3A10.7910%2FDVN%2F26935&pretty=true

source: https://services.dataverse.harvard.edu/static/swagger-ui/index.html
section: "dataverse/dataset JSON"

@pdurbin pdurbin added User Role: API User Makes use of APIs and removed zTriaged labels Jun 30, 2017
@pdurbin
Member

pdurbin commented Mar 25, 2019

@astrofrog these days Dataverse supports exporting in Schema.org JSON-LD so you can get the title ("name"), like this:

curl -s 'https://dataverse.harvard.edu/api/datasets/export?exporter=schema.org&persistentId=doi:10.7910/DVN/RQSQY8' | jq -r '.name'

(The title is "Block and Block Longitudinal Study, 1969 - 1999" in this example.)

Does this help?

I don't know why I didn't mention this earlier, but you can also get the title/name from the Search API, like this:

curl -s 'https://dataverse.harvard.edu/api/search?q=RQSQY8' | jq '.data.items[0].name'

Please let us know if either option helps.
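
The two curl/jq one-liners above translate to Python roughly as follows. This is a sketch under assumptions: the helper names are made up for illustration, the response shapes are inferred only from the jq paths used above (`.name` and `.data.items[0].name`), and error handling is left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = 'https://dataverse.harvard.edu/api'


def export_url(persistent_id, exporter='schema.org'):
    """Build the export URL; urlencode percent-escapes the DOI."""
    query = urlencode({'exporter': exporter, 'persistentId': persistent_id})
    return f'{BASE}/datasets/export?{query}'


def search_url(query):
    """Build the Search API URL for a free-text query."""
    return f'{BASE}/search?{urlencode({"q": query})}'


def fetch_json(url):
    """GET a URL and decode the JSON body (performs a network request)."""
    with urlopen(url) as resp:
        return json.load(resp)


def title_from_schema_org(doc):
    """Schema.org JSON-LD keeps the title under 'name'."""
    return doc.get('name')


def title_from_search(result):
    """Name of the first search hit, or None if there are no hits."""
    items = result.get('data', {}).get('items', [])
    return items[0].get('name') if items else None


# Usage (needs network access, so it is left commented out):
# title = title_from_schema_org(fetch_json(export_url('doi:10.7910/DVN/RQSQY8')))
# title = title_from_search(fetch_json(search_url('RQSQY8')))
```

Note that the search variant returns whatever the first hit is, so it is only reliable when the query string is specific enough to match a single dataset.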

@djbrooke
Contributor

Closing this, as it appears this space has matured a bit and there are now options/workarounds.
