Iqss/9150 handle fundreg reqs for ext cvv #9402

qqmyers · 2023-02-21T23:06:16Z

What this PR does / why we need it: This PR makes changes/extensions to the external vocabulary support mechanism to address requirements from the Crossref Funder Registry use case. These include:

- allowing a script to manage only the one term-uri-field child:
  - when defining an expanded value to be included in the JSON and ORE exports
  - in the metadata display
  - in the search response highlights
- supporting retrieval-filtering params that specify a path including all array elements rather than a single one
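To illustrate what "managing only the term-uri-field child" means for lookups, here is a minimal Python sketch that indexes the external vocab configs by their term-uri-field instead of by their parent field. `term-uri-field` and `js-url` are names used in this PR; `field-name` is assumed here as the key naming the parent field. The helper functions and sample entries are hypothetical and are not the actual SettingsWrapper code.

```python
from typing import Any

# Hypothetical config entries modeled on the fundreg and ROR cases
CVOC_CONF: list[dict[str, Any]] = [
    {"field-name": "grantNumber", "term-uri-field": "grantNumberAgency",
     "js-url": "https://example.org/fundreg.js"},
    {"field-name": "author", "term-uri-field": "authorAffiliation",
     "js-url": "https://example.org/ror.js"},
]


def index_by_parent(conf: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Map parent field name -> config entry (lookup by parent)."""
    return {entry["field-name"]: entry for entry in conf}


def index_by_term_uri_field(conf: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Map term-uri-field name -> config entry, so a child field such as
    grantNumberAgency is recognized as externally managed on its own."""
    return {entry["term-uri-field"]: entry for entry in conf}


def is_managed(field_name: str, by_term_uri: dict[str, dict[str, Any]]) -> bool:
    """True if this field holds the term URI for some external vocab config."""
    return field_name in by_term_uri


by_term = index_by_term_uri_field(CVOC_CONF)
print(is_managed("grantNumberAgency", by_term))  # True
print(is_managed("grantNumberValue", by_term))   # False
```

With the by-term-uri-field map, sibling children such as grantNumberValue stay ordinary fields while only grantNumberAgency is handed to the script.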

Which issue(s) this PR closes:

Special notes for your reviewer:

Suggestions on how to test this:
Use the new fundreg and/or ror example external vocab configs and related JavaScript files at https://github.com/gdcc/dataverse-external-vocab-support/tree/fundregror and verify that you can add Crossref Funder Registry entries in the citation block's Funder (Grant Agency) metadata field, or a ROR as the author affiliation. For regression testing, one could run the older cvoc example metadatablock and the ORCID/Skosmos scripts there and verify they still work (Skosmos requires that "skosmos.dev.finto.fi" be replaced by "demo.skosmos.org" everywhere in the config).

Fundreg and ROR values should work like other external vocabs: they should show correctly in the metadata display, search results, and advanced search. Fundreg should also provide an expandedValue entry in the JSON metadata export that includes the name of the funding org (along with the URI as the "value").
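A minimal sketch of how an exported field could pair the stored URI with looked-up details. Keeping the URI as `value` and adding an `expandedValue` follows the description above; the keys inside `expandedValue` (`@id`, `name`), the `cache` argument, and the sample funder name are assumptions for illustration. In Dataverse, what ends up in the expanded value depends on the config's retrieval-filtering.

```python
from typing import Any, Optional


def export_field(type_name: str, uri: str,
                 cache: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Build a JSON-export-style entry for a controlled-vocab field.

    Assumption: `cache` holds previously retrieved details keyed by term URI.
    """
    entry: dict[str, Any] = {"typeName": type_name, "value": uri}
    details: Optional[dict[str, Any]] = cache.get(uri)
    if details is not None:
        # Keep the URI as the value; attach the human-readable info alongside it
        entry["expandedValue"] = {"@id": uri, **details}
    return entry


# Hypothetical cached lookup result for one funder URI
CACHE = {"http://dx.doi.org/10.13039/100000865": {"name": "Example Funder"}}
print(export_field("grantNumberAgency", "http://dx.doi.org/10.13039/100000865", CACHE))
```

Free-text entries have nothing cached, so in this sketch they are exported with a plain `value` only.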

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:


coveralls · 2023-02-21T23:08:59Z

Coverage: 20.112% (-0.005%) from 20.116% when pulling 16ee26a on GlobalDataverseCommunityConsortium:IQSS/9150-handle_fundreg_reqs_for_ext_cvv into 1a79717 on IQSS:develop.


mreekie · 2023-03-15T17:54:14Z

grooming:

Looks like this will bleed over to March 15 sprint.
@qqmyers Thanks for the sizing estimate.


mreekie · 2023-03-27T18:36:54Z

Prio review:

After the technical review of the code
Needs to be installed on Demo.
At that point Julian and possibly others will review the User Experience.
The intent is that feedback that comes out of the UEX review will spin out additional issues as needed.

@kcondon @jggautier collaborate on the details of how/where the UI testing gets done.

luddaniel · 2023-04-05T15:36:45Z

Closes #9498

kcondon · 2023-04-13T20:26:13Z

@qqmyers @jggautier Is there a simple, brief description of what this does from a user's perspective or do I need to do a deep dive into those "closes" issues? This seems like a tip of the iceberg kind of pr.

qqmyers · 2023-04-13T20:46:39Z

@kcondon - I added some details on testing in the description. Basically this PR does nothing on its own, but it allows you to run the new Fundreg and ROR external vocab config/javascripts that are in a branch/PR in the gdcc external vocab repo. Nominally the setup for that is the same as for the older ORCID/Skosmos ones except that the example configs for fundreg/ror apply them to fields in the citation block rather than a new example block.

kcondon · 2023-04-18T22:36:16Z

Issues found so far:

1. Still having trouble configuring a local fundreg instance due to the URL references in the JSON. Note: got this to work by manually editing the JSON and script files for paths. Can this be made easier? (Jim put the paths to the two scripts needed for fundreg and ROR, i.e. the main script and cvocutil, in the config file so they are easier to adjust in one place. So, call it fixed.)
2. On the temp test instance, after entering data in the first fundreg agency field, hitting + to add a new fundreg field clears any fundreg agency fields. (Mostly fixed; I still see it at times if I click + on a second entry.)
3. On the temp test instance, after making a selection, the fundreg agency field sometimes appears cropped or the selected text appears outside/below the selection box. Maybe due to page sizing? (Mostly fixed; it was due to long strings, but if I narrow the screen and click +, the refresh draws the cropped string.)
4. The controlled vocab search is sometimes sluggish. Performance varies; I have seen delays of 10s or more, sometimes longer.
5. Minor point: when you select the funder agency field, the focus is not automatically placed in the text field, so you click the dropdown, then click into the field. (Jim indicates this is an issue with the jQuery lib; see his comments below.)
6. Display of the fundreg agency shows the human-readable value when short, the machine-readable value when long. (Jim explained this is a delay in the URI-to-display-string resolution by the fundreg service. Strange that some values resolve while others take longer.)
7. Same as J3: basic search does not find fundreg agency display values, but advanced search does because it uses the same select box as for entry. Is this due to separating the display value (human readable) from the recorded value (machine readable), with the recorded value being indexed? (Jim indicates this is a limitation of the current implementation and would need a separate issue.)
8. The OpenAire export fails to generate output, but there is no error in the logs.
9. Some export formats contain only the machine-readable URI for the funder agency (schema.org, DDI), while others show both (OAI_ORE, dataverse_json). (Jim indicates this is because some formats are comprehensive by design, while others, such as schema.org and DDI, would need a decision about adding the full info/strings and a new issue.)
10. The DDI HTML Codebook export does not show any funder agency fields at all, whereas the DDI export does.

I have more testing to do but this is what I've found. What I'm planning to finish, in addition to how to test items, is a. data input, b. data display, c. search, d. export

Paraphrased from Julian's testing:

J1. I ran into the display bug mentioned earlier.
J2. The relevance of search results needs improving for fundreg. This is for data input, the agency field funder lookup.
J3. When a depositor adds a funder name from the suggested list (from the FundReg API), that funder name doesn't seem searchable from the simple search field.
J4. I saw that the search query is using the funder PID, e.g. grantNumberAgency:"http://dx.doi.org/10.13039/100000865". I'm not sure how well this helps people find data of research funded by a particular funder.
J5. When the "Funding Information Agency" facet is added to the Dataverse collection page, there were tooltips for the funder names. I'd like to learn about the goal of these tooltips.
J6. I have thoughts on how funder metadata might be included in a particular metadata export and they are not reflected in the current export

[Julian] I'd like to make sure that it's clear that points J2-4, and maybe J5, are questions I have that might be answered with the research I've been planning. And I was thinking of point J6 as being a matter of scope and timing.
So for point J2 for example, we could learn if the relevance of suggested funder names needs improving.

qqmyers · 2023-04-19T16:58:26Z

1. There are a few possibilities. The issue arises because I refactored so that the main scripts now use a common utility script, and the main scripts handle loading it so that it is only loaded when needed. Options include:
   - Document where the scripts have to be placed.
   - Hardcode the main scripts to use the util script from the gdcc github.io page (if you install locally, you're not getting everything locally).
   - Have Dataverse always add the util script.
   - Use the setting created by EryK to add the util script manually (it may load when not needed, but not all the time).
   - Adjust the config mechanism to allow multiple scripts to be required (probably best, but not a one-line update).

   I'll plan on implementing the last one.
2. Working on it. It looks like it is on the script side rather than in the Dataverse PR.
3. I've made a change (in the Dataverse PR) to help with this.
4. As far as I can tell, this is just from delays at those services. We could tweak the minimum number of characters to search for (currently 3) or the delay to wait for new characters (currently 500 ms, so we don't send repeated requests to the service for your 3, 4, 5, 6, ... character strings) to see if that helps. Both are in the scripts themselves rather than in the Dataverse PR.
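The two script-side knobs mentioned above, a minimum of 3 characters and a 500 ms wait for new characters, can be modeled as a small debounce simulation. This is a Python sketch of the idea only, not the JavaScript in the external vocab repo; the function name and the keystroke timeline are hypothetical.

```python
MIN_CHARS = 3    # minimum query length before asking the service
DELAY_MS = 500   # quiet period to wait for more typing


def queries_sent(keystrokes: list[tuple[int, str]]) -> list[str]:
    """keystrokes: (time_ms, text_in_box) pairs in time order.

    Returns the search strings that would actually be sent: a request fires
    only when the text is long enough and no further keystroke arrives
    within DELAY_MS.
    """
    sent: list[str] = []
    for i, (t, text) in enumerate(keystrokes):
        next_t = keystrokes[i + 1][0] if i + 1 < len(keystrokes) else None
        quiet = next_t is None or next_t - t >= DELAY_MS
        if quiet and len(text) >= MIN_CHARS:
            sent.append(text)
    return sent


typing = [(0, "n"), (100, "na"), (200, "nat"), (300, "nati"),
          (400, "natio"), (1500, "nation")]
print(queries_sent(typing))  # ['natio', 'nation']
```

Six keystrokes produce two requests here; raising either constant trades responsiveness for fewer calls to a slow service.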

qqmyers · 2023-04-19T19:23:47Z

5. As far as I can tell, this is a known issue in select2 that will be fixed when jQuery 3.7 comes out; see the end of the select2 issue "Search not auto focusing in jQuery 3.6.0" (select2/select2#5993). There is a work-around we could try, but given that it's minor, I'd suggest waiting.
6. This is due to the poor performance of the Fundreg service. They do threaten to throttle, so perhaps the many requests from testing are causing the slowdown. FWIW, I've done what they suggest to get on their faster server (basically providing contact info in the requests themselves), and for the latest ~100 values you use, I cache the results, so things should be faster after the first access as we don't go back to Fundreg for them. It may be worth talking with them (as GREI?) and/or looking into whether we can run a local service (I forget whether Fundreg or ROR suggested that, but it may be possible for both).
7. This is a current limitation of the external vocab mechanism that could probably be addressed as a new issue. For now, advanced search and facets should work. Facets might be the best way to quickly find things funded by one agency. For simple search, there could still be problems with the stemming and other settings we have on the underlying field, e.g. "Institutes" would match "Institute" as well.
8. I wasn't able to reproduce this on my test server: the oai_datacite output was produced when I had one ROR affiliation for an author and two or three fundreg entries, using both controlled and free-text entries.
9. This is ~by design in that the ORE and JSON outputs are intended to be complete, and the other exports then have to be updated if/when someone determines whether those formats could/should display something different (the human-readable form, the i18n form, both with the identifier as an attribute, etc.). The raw info should be available to all the exporters through the JSON they receive, so further changes can be made in new issues/by someone familiar with a given exporter.
10. I haven't looked into this, but I assume not seeing something in the DDI HTML is a bug (since it is supposed to mirror the DDI XML?), or perhaps just a missing enhancement if that was added to the DDI XML recently.
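The caching of the latest ~100 values mentioned above is a bounded, least-recently-used cache. Here is a minimal Python sketch of that idea; the real scripts keep their cache in the browser, and the class name and `fetch` callback are hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class RecentLookups:
    """Bounded LRU cache in front of a slow lookup (e.g. a vocab service)."""

    def __init__(self, fetch: Callable[[str], str], capacity: int = 100):
        self._fetch = fetch
        self._capacity = capacity
        self._cache: "OrderedDict[str, str]" = OrderedDict()

    def get(self, uri: str) -> str:
        if uri in self._cache:
            self._cache.move_to_end(uri)      # mark as most recently used
            return self._cache[uri]
        value = self._fetch(uri)              # slow call to the remote service
        self._cache[uri] = value
        if len(self._cache) > self._capacity:
            self._cache.popitem(last=False)   # evict least recently used
        return value


calls: list[str] = []


def fake_fetch(uri: str) -> str:
    calls.append(uri)  # record each trip to the (simulated) service
    return "name for " + uri


cache = RecentLookups(fake_fetch, capacity=2)
for uri in ["a", "a", "b", "c", "a"]:
    cache.get(uri)
print(calls)  # ['a', 'b', 'c', 'a']
```

With capacity 2, "a" is evicted once "c" arrives, so it is fetched again; with the ~100 used in the scripts, repeat lookups of commonly used funders would stay local.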

J2) Unlike ORCID, Fundreg doesn't provide much to help with prioritizing (that I've found, anyway). I have added support for searching not only by name but also by the acronyms and tags that they provide. These help the query 'NIH' bring the National Institutes of Health (whose name doesn't contain NIH) into the search results. I also raise the priority of anything you've chosen recently (the last ~100 choices), so NIH should be first once you've used it once. There's code that prioritizes by prior use, then acronym, then tags (various parts of NIH add that tag even though each is one institute within NIH). The relative priorities there could be changed, or perhaps weights could be added. A more useful thing, which only Crossref/a service could do, would be to prioritize results based on popularity (either hits or the number of existing resources that cite that funder, etc.).
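The ordering described in J2 (prior use, then acronym, then tags) can be expressed as a sort key. This is a Python sketch under stated assumptions: the `Funder` fields, the exact case-insensitive matching, and the sample records are hypothetical and do not describe the shape of Fundreg results or the script's data model.

```python
from dataclasses import dataclass, field


@dataclass
class Funder:
    uri: str
    name: str
    acronyms: list[str] = field(default_factory=list)
    tags: list[str] = field(default_factory=list)


def rank(results: list[Funder], query: str, recent: set[str]) -> list[Funder]:
    """Order results: recently chosen, then acronym match, then tag match."""
    q = query.casefold()

    def tier(f: Funder) -> int:
        if f.uri in recent:
            return 0  # chosen recently by this user
        if any(a.casefold() == q for a in f.acronyms):
            return 1  # exact acronym match (assumption: exact, case-insensitive)
        if any(t.casefold() == q for t in f.tags):
            return 2  # tag match
        return 3      # matched by name only

    # sorted() is stable, so the service's own order is kept within a tier
    return sorted(results, key=tier)


nih = Funder("u:nih", "National Institutes of Health", acronyms=["NIH"])
nci = Funder("u:nci", "National Cancer Institute", acronyms=["NCI"], tags=["NIH"])
other = Funder("u:other", "Nihon Example Foundation")
print([f.uri for f in rank([other, nci, nih], "nih", recent=set())])
# ['u:nih', 'u:nci', 'u:other']
```

Weights, as suggested above, could replace the integer tiers with a numeric score combining several signals.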

J3 - see 7

J4 - this is in the URL? If so, yes: the advanced search and facets use the id rather than the name (or international variants) when searching on the back end, but all the places in the UI should be showing the name. (Again, from 6, these may briefly show as the id until the lookup can complete, against the server the first time or more quickly from browser memory after that.)

J5 - the popup shows the alternate names and acronyms that Crossref provides for the given id. Thus, while the facet shows National Institutes of Health, which is the official name, you may have searched for NIH to add it, and the popup will show that value. Whether there's a popup and what it shows (other names or just the acronym) is controlled by the JavaScript in the external vocab repo, and changes could be made there. There may be some limitations if we want to exclude only some values, i.e. I'm not sure that alternate names are distinguished from i18n values when Fundreg returns them.

J6 - see 9. I'd definitely suggest changes to other formats should be separate issues.

jggautier · 2023-04-20T15:03:16Z

Thanks @qqmyers for being so thorough (as always!).

The rationale for the popup is interesting. I haven't seen anything like it before. Is this being used in other repositories right now? I took a quick look at the facets on QDR's "Root" collection but didn't see anything (although I wouldn't think those facets are being populated by terms from an external vocab).

qqmyers · 2023-04-20T16:07:24Z

The external vocab mechanism tells the script whether an item needs to be displayed, or an input/selection UI is needed. What the script does for each of those is up to the script, but, on the display side, there isn't currently any distinction for display in different parts of the UI, i.e. in the metadata pane, a facet, as a search hit, etc. A couple of the example scripts do use a popup - I think ORCID is the other one - it will show the email of the person (if their email is public at ORCID). For internal text/CVV fields, there is no equivalent mechanism to customize the display (just the basic formatting in the metadata block). If changes are desired, if the change applies to all places a term is displayed, the script can be updated. If a general mechanism is needed to make facet display different from display elsewhere, it would be an extension of the mechanism itself and would involve changes to Dataverse and scripts and the config mechanism, schema, etc.

kcondon · 2023-04-20T16:45:38Z

Stefano has decided that we should release this due to grant commitments and address/fix fundreg service performance with them and NIH after release.

Have finished testing, will consolidate the above open issues and new questions/concerns here.
New:

1. With fundreg, ROR, and skosmos enabled, the log file is very noisy on publish:
   cvoc_noisy_log_on_pub.txt
2. With ROR enabled, the server log complains about the default dataverse.org as being an invalid ROR value:
   default_ror_err.txt
3. ROR affiliation export varies by format: some have none (DC), some the URI only (DDI, Datacite, DDI_HTML, OpenAire, schema.org), some the display string only (OAI_ORE), and some both (json). Is this by design?
4. Fundreg export for a couple of formats (json, oai_ore) has a lot of extra content entries, all similar, for the funder Center for Alternatives to Animal Testing, Johns Hopkins Bloomberg School of Public Health:
   json_jhopkins_xtra_content.txt
5. Likely an existing bug, but the Affiliation facet shows many empty-string facets.

Existing, from above:
E2, E3. Some UI flakiness (cropping of values, apparently disappearing or masked values when adding new record) affected by narrow window scaling of page and possibly server performance.
E7. Basic search does not support searching on display strings for funding agency or ror.
E10. DDI_HTML export does not include funding agency info but ddi export does.

qqmyers · 2023-04-20T21:53:56Z

1. Most of these were real, which is good news. It turns out simple search was enabled and there were parsing errors. I was able to fix those, and now simple search works. I also turned a couple of remaining logger.info() calls into fine() calls.
2. The code was not ignoring free-text entries and would ask the servers about those; now fixed.
3. I added the @id to the ORE export, so it should be complete like the JSON export. For other formats, I'd suggest separate issues for any updates.
4. The Dataverse code stores whatever the config tells it to pull from the server's response. In the case of fundreg, I believe it puts multiple English variants in the same structure as other-language variants. We currently have the capability to store none of that or all of it, but can't filter to get just the other-language terms (assuming we don't actually want the English name variations to be sent). I'd suggest a separate issue(s) to figure out what's desired here, given what fundreg returns and the variability between their entries, and then implement whatever change is desired. (Any given installation can remove what's there now by editing the example configuration in the repo.)
5. This was a cut/paste typo in the ror script. When the value was a string instead of an id, the script added a blank between the () characters instead of the string. Now fixed in the ror.js script.
E2, E3) The blank values are probably due to the fundreg server being slow: the script currently displays nothing until the call to fundreg returns. It is probably possible to add a spinner or temporary value by editing the script. I'd suggest a separate issue here just because that would all be in the JavaScript in the external vocab repo, versus changes to Dataverse. For the scaling, it is a ~known issue with the select2 widget that it doesn't resize when the page is dynamically resized. Refreshing on a resized page will update the widget. I believe when I looked before, this was something that had been reported to the select2 devs, but I'm not sure. In any case, I think it would be hard for us to fix (could be wrong) unless there's an update. An alternative would be to use a different widget in any/all scripts. The original community ROR script, for example, just put an icon on the page that, when clicked, would pop up a separate dialog to let you find the org you wanted. That, or another design, could be done by changing the script; again, an issue for the gdcc ext-vocab repo. I did also notice a static issue with narrow pages: as reported in other images, the text can appear below the field input. I verified on my machine that, with the current CSS, this only happens when the screen is narrow enough that other non-external-cvoc fields are also messed up (I think it was an internal CVV field where the drop-down control slid under the adjacent child field). In any case, the static issue could be fixed by providing different CSS values depending on the screen width, as is done in other places. That would be a Dataverse issue rather than one for the gdcc ext-vocab repo.
E7 - as noted above, simple search is now working.
E10 - I assume this is true regardless of whether the field is associated with an external-vocab? If so, it should probably be a separate issue (like the existing issue about removing the funder contributor type which isn't external vocab related).

kcondon · 2023-04-24T14:46:24Z

Latest testing shows all major issues resolved. It was decided to merge and perform any UX review later, after the SPA.

qqmyers added 3 commits February 21, 2023 15:54

Switch to sending cvoc map by term-uri-field

a6604ca

This allows creating an expandedValue field for a child cvoc field instead of just a primitive one.

Support searching through all objs in array

6bf71a2

Made method recursive. The new =* syntax allows navigating to all children in an array, which is required for handling the crossref funder registry output to extract labels in other languages
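To show how a recursive path walk with an "all elements" wildcard works, here is a minimal Python sketch over parsed JSON. The segment syntax is an assumption for this sketch and Dataverse's actual retrieval-filtering syntax may differ: `key` steps into an object, `key=val` steps into the first array element whose `key` equals `val`, and `key=*` steps into every element that has `key`. The sample data is hypothetical.

```python
from typing import Any


def resolve(node: Any, path: str) -> list[Any]:
    """Follow a '/'-separated path through parsed JSON; return every value reached.

    Segment syntax assumed for this sketch:
      key      -> step into node[key] (node must be a dict)
      key=val  -> node must be a list; step into the FIRST element whose key == val
      key=*    -> node must be a list; step into EVERY element that has key
    """
    segments = [s for s in path.split("/") if s]
    return _walk(node, segments)


def _walk(node: Any, segments: list[str]) -> list[Any]:
    if not segments:
        return [node]
    head, rest = segments[0], segments[1:]
    if "=" in head:
        if not isinstance(node, list):
            return []
        key, wanted = head.split("=", 1)
        if wanted == "*":
            selected = [el for el in node if isinstance(el, dict) and key in el]
        else:
            selected = [el for el in node
                        if isinstance(el, dict) and el.get(key) == wanted][:1]
        results: list[Any] = []
        for el in selected:
            # Recurse: each selected element continues along the rest of the path
            results.extend(_walk(el, rest))
        return results
    if isinstance(node, dict) and head in node:
        return _walk(node[head], rest)
    return []


data = {"labels": [
    {"lang": "en", "value": "Example Agency"},
    {"lang": "fr", "value": "Agence Exemple"},
    {"lang": "en", "value": "Example Agency (alt)"},
]}
print(resolve(data, "labels/lang=fr/value"))  # ['Agence Exemple']
print(resolve(data, "labels/lang=*/value"))
# ['Example Agency', 'Agence Exemple', 'Example Agency (alt)']
```

The recursion is what makes the wildcard cheap to support: each selected array element simply continues along the remaining segments.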

add cvoc on child field in ORE map

d699228

pdurbin mentioned this pull request Mar 1, 2023

Epic: Get the existing ROR plug-in working on the dataverse demo #9151

Closed

3 tasks

qqmyers added 6 commits March 3, 2023 15:55

Merge remote-tracking branch 'IQSS/develop' into IQSS/9150-handle_fundreg_reqs_for_ext_cvv

13fa533

change settingsWrapper to provide either map - by parent or term uri fld

88ce01a

use by term uri field map

0f039f7

handle two compound cases

576565e

use c:set

957d657

adjust for new plan to not always use parent field

93c1eb8

qqmyers mentioned this pull request Mar 13, 2023

Fundreg and ROR support gdcc/dataverse-external-vocab-support#14

Merged

qqmyers marked this pull request as ready for review March 13, 2023 19:12

qqmyers mentioned this pull request Mar 13, 2023

Create a javascript for the frontend that supports Fundref #9150

Closed

2 tasks

qqmyers added the Size: 10 A percentage of a sprint. 7 hours. label Mar 14, 2023

style fix on advanced search page

16ee26a

Merge remote-tracking branch 'IQSS/develop' into IQSS/9150-handle_fundreg_reqs_for_ext_cvv

77a01ec

qqmyers mentioned this pull request Apr 5, 2023

#9498 - fixing issue while searching for TermUri field that is a compound field #9499

Closed

sekmiller self-assigned this Apr 10, 2023

sekmiller approved these changes Apr 12, 2023


pdurbin unassigned sekmiller Apr 12, 2023

kcondon self-assigned this Apr 14, 2023

improve display of long values for child input fields

7829f79

qqmyers added 2 commits April 19, 2023 13:06

add release notes

ba2dabd

allow js-url to be an array

8241739
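A minimal sketch of accepting `js-url` as either a single string or an array: normalize it to a list and emit one script reference per URL. Skipping duplicates, so a shared utility script such as cvocutil is only referenced once across several configs, is a design choice of this sketch rather than confirmed Dataverse behavior; the function names and URLs are hypothetical.

```python
from html import escape


def script_urls(configs: list[dict]) -> list[str]:
    """Collect script URLs from all configs; js-url may be a string or a list."""
    seen: list[str] = []
    for conf in configs:
        js = conf.get("js-url", [])
        urls = [js] if isinstance(js, str) else list(js)
        for url in urls:
            if url not in seen:  # keep first occurrence, preserve order
                seen.append(url)
    return seen


def script_tags(configs: list[dict]) -> str:
    """Render one <script> element per unique URL."""
    return "\n".join(f'<script src="{escape(u)}"></script>'
                     for u in script_urls(configs))


CONFS = [
    {"js-url": ["https://example.org/cvocutil.js", "https://example.org/fundreg.js"]},
    {"js-url": ["https://example.org/cvocutil.js", "https://example.org/ror.js"]},
    {"js-url": "https://example.org/skosmos.js"},  # older single-string form
]
print(script_tags(CONFS))
```

Accepting both forms keeps older single-script configs working unchanged.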

qqmyers added 4 commits April 20, 2023 16:15

fix string return for simple search

d43ebb1

add @id for cvoc fields, quiet logging

233e16a

quiet log

391404f

ignore free text entries when registering cvoc term

bd7cdb4
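Ignoring free-text entries when registering cvoc terms amounts to a guard before any server lookup, so a value like the default "dataverse.org" affiliation is never sent to ROR. In this Python sketch, treating only http(s) URIs that start with a configured vocabulary prefix as controlled terms is an assumption for illustration, as are the prefix and the sample ROR id.

```python
from urllib.parse import urlparse


def is_controlled_term(value: str, uri_prefixes: tuple[str, ...]) -> bool:
    """True only for values that look like term URIs from a known vocabulary."""
    v = value.strip()
    parsed = urlparse(v)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False  # free text, e.g. an organization name typed by the user
    return v.startswith(uri_prefixes)


def terms_to_register(values: list[str], uri_prefixes: tuple[str, ...]) -> list[str]:
    """Keep only the values worth looking up at the vocabulary service."""
    return [v.strip() for v in values if is_controlled_term(v, uri_prefixes)]


ROR_PREFIXES = ("https://ror.org/",)  # hypothetical configured prefix
print(terms_to_register(["dataverse.org", "https://ror.org/0abcd1234"], ROR_PREFIXES))
# ['https://ror.org/0abcd1234']
```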

kcondon merged commit 6dc9a5f into IQSS:develop Apr 24, 2023

pdurbin added this to the 5.14 milestone May 10, 2023

pdurbin mentioned this pull request Jul 26, 2023

Feature Request/Idea: Investigate status of Crossref Funder Registry, review alternatives for retrieving and recording funder metadata #9720

Closed

kcondon mentioned this pull request Oct 17, 2023

QA for Fundreg and ROR support #9973

Closed

luddaniel mentioned this pull request Nov 28, 2023

Feature Request/Idea:Implement mechanism to support integration with OntoPortal #9276

Closed

2 tasks
