Iqss/9150 handle fundreg reqs for ext cvv #9402

@qqmyers qqmyers commented Feb 21, 2023

What this PR does / why we need it: This PR makes changes/extensions to the external vocabulary support mechanism to address requirements from the Crossref Funder registry use case. These include:

  • allowing a script to only manage the one term-uri-field child
  • when defining an expanded value to be included in the json and ore exports
  • in the metadata display
  • in the search response highlights
  • supporting retrieval filtering params to specify a path including all array elements rather than a single one

Which issue(s) this PR closes:

Special notes for your reviewer:

Suggestions on how to test this:
Use the new fundreg and/or ror example external vocab configs and related Javascripts at https://github.com/gdcc/dataverse-external-vocab-support/tree/fundregror and verify that you can add CrossRef fundreg entries in the citation block Funder (Grant Agency) metadata field or an ROR as the author affiliation. For regression testing, one could run the older cvoc example metadatablock and the orcid/skosmos scripts there and verify they still work (skosmos requires that "skosmos.dev.finto.fi" be replaced by "demo.skosmos.org" everywhere in the config).

Fundreg/ROR should work as with other external vocabs in that they should show correctly in the metadata display, search results, advanced search. Fundreg should also provide an expandedValue entry in the json metadata export that includes the name of the funding org (along with the URI as the "value").

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

This allows creating an expandedValue field for a child cvoc field instead of just a primitive one.
Made method recursive. The new =* syntax allows navigating to all children in an array, which is required for handling the Crossref funder registry output to extract labels in other languages.
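
As a rough illustration of what "a path including all array elements" means, here is a minimal JavaScript sketch. It is not the Dataverse implementation (that is Java code in this PR). The function names `resolvePath` and `extractAll`, the sample object, and the exact meaning given to a `key=value` segment are assumptions made for the example.

```javascript
// Hypothetical sketch of recursive path filtering; not the Dataverse code.
// Assumed convention: a segment "key=value" applies to an array and keeps the
// elements whose `key` equals `value`; "key=*" keeps every element that has `key`.
function resolvePath(node, segments) {
  if (segments.length === 0) {
    return [node];
  }
  const [head, ...rest] = segments;
  const eq = head.indexOf("=");
  if (eq !== -1) {
    if (!Array.isArray(node)) {
      return [];
    }
    const key = head.slice(0, eq);
    const wanted = head.slice(eq + 1);
    return node
      .filter((el) => el !== null && typeof el === "object" && key in el)
      .filter((el) => wanted === "*" || String(el[key]) === wanted)
      .flatMap((el) => resolvePath(el, rest)); // recurse into each kept element
  }
  if (node === null || typeof node !== "object" || !(head in node)) {
    return [];
  }
  return resolvePath(node[head], rest);
}

function extractAll(doc, path) {
  return resolvePath(doc, path.split("/").filter(Boolean));
}

// Made-up response shape, for illustration only.
const sample = {
  labels: [
    { lang: "en", text: "Example Funder" },
    { lang: "fr", text: "Bailleur Exemple" },
  ],
};

console.log(extractAll(sample, "/labels/lang=*/text")); // every label
console.log(extractAll(sample, "/labels/lang=fr/text")); // one language only
```

The recursion is what lets a single path fan out over several array elements; a non-recursive walk can only follow one element at a time.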

coveralls commented Feb 21, 2023

Coverage Status

Coverage: 20.112% (-0.005%) from 20.116% when pulling 16ee26a on GlobalDataverseCommunityConsortium:IQSS/9150-handle_fundreg_reqs_for_ext_cvv into 1a79717 on IQSS:develop.

@qqmyers qqmyers marked this pull request as ready for review March 13, 2023 19:12
@qqmyers qqmyers added the Size: 10 A percentage of a sprint. 7 hours. label Mar 14, 2023

mreekie commented Mar 15, 2023

grooming:

  • Looks like this will bleed over to March 15 sprint.
  • @qqmyers Thanks for the sizing estimate.

@mreekie mreekie added NIH OTA: 1.2.1 2 | 1.2.1 | Design and implement integration with controlled vocabularies | 5 prdOwnThis is an it... D: 5 Core PIDs Deliverable Increment defining how we support the 5 core PIDs pm.GREI-d-1.2.1 NIH, yr1, aim2, task1: Design and implement integration with controlled voc labels Mar 15, 2023

mreekie commented Mar 27, 2023

Prio review:

  • After the technical review of the code
  • Needs to be installed on Demo.
  • At that point Julian and possibly others will review the User Experience.
  • The intent is that feedback that comes out of the UEX review will spin out additional issues as needed.

@kcondon @jggautier collaborate on the details of how/where the UI testing gets done.

luddaniel commented Apr 5, 2023

Closes #9498

kcondon commented Apr 13, 2023

@qqmyers @jggautier Is there a simple, brief description of what this does from a user's perspective or do I need to do a deep dive into those "closes" issues? This seems like a tip of the iceberg kind of pr.

qqmyers commented Apr 13, 2023

@kcondon - I added some details on testing in the description. Basically this PR does nothing on its own, but it allows you to run the new Fundreg and ROR external vocab config/javascripts that are in a branch/PR in the gdcc external vocab repo. Nominally the setup for that is the same as for the older ORCID/Skosmos ones except that the example configs for fundreg/ror apply them to fields in the citation block rather than a new example block.

@kcondon kcondon self-assigned this Apr 14, 2023

kcondon commented Apr 18, 2023

Issues found so far:

  1. Still having trouble configuring a local fundreg instance due to the URL reference in the json. Note: got this to work by manually editing the json and script files for paths. Can this be made easier? (Jim put the paths to the two scripts needed for fundreg and ror, plus cvocutil, in the config file, so they are easier to adjust in one place. So, call it fixed.)
  2. On the temp test instance, adding a new fundreg field by hitting + after entering data in the first fundreg agency field clears any fundreg agency fields. (Mostly fixed; still see it at times when clicking + on a second entry.)
  3. On the temp test instance, after making a selection the fundreg agency field sometimes appears cropped, or the selected text appears outside/below the selection box. Maybe due to page sizing? (Mostly fixed; it was due to long strings, but if you narrow the screen and click +, the refresh draws the cropped string.)

Screen Shot 2023-04-18 at 6 33 36 PM

Screen Shot 2023-04-18 at 6 33 47 PM

  4. The controlled vocab search is sometimes sluggish. Performance varies; have seen 10s+ delays, sometimes longer.

  5. Minor point: when you select the funder agency field, the focus is not automatically placed in the text field. So click the dropdown, then click into the field. (Jim indicates this is an issue with the jquery lib, see his comments below.)

  6. Display of fundreg agency shows human readable when short, machine readable when long. (Jim explained this is a delay in the uri to display string resolution by the fundreg service. Strange that some values resolve, others take longer.)

Screen Shot 2023-04-19 at 2 05 21 PM

  7. Same as J3: basic search does not find fundreg agency display values but adv search does, due to using the same select box as for entry. Is this due to separating display (human readable) from recorded value (machine readable), with only the recorded value indexed? (Jim indicates this is a limitation of the current implementation and would need a separate issue.)

  8. OpenAire export fails to generate output but no error in logs.

  9. Some export formats contain only the machine readable uri for funder agency (schema.org, ddi) while others show both (OAI_ORE, dataverse_json). (Jim indicates this is because some formats are by design comprehensive, but others, schema, ddi, would need some decision to add full info/strings and a new issue.)

  10. DDI HTML Codebook export does not show any funder agency fields at all, whereas ddi export does.

I have more testing to do but this is what I've found. What I'm planning to finish, in addition to how to test items, is a. data input, b. data display, c. search, d. export

Paraphrased from Julian's testing:

J1. I ran into the display bug mentioned earlier.
J2. The relevance of search results needs improving for fundreg. This is for data input, the agency field funder lookup.
J3. When a depositor adds a funder name from the suggested list (from the FundReg API), that funder name doesn't seem searchable from the simple search field.
J4. I saw that the search query is using the funder PID, e.g. grantNumberAgency:"http://dx.doi.org/10.13039/100000865". I'm not sure how well this helps people find data of research funded by a particular funder.
J5. When the "Funding Information Agency" facet is added to the Dataverse collection page, there were tooltips for the funder names. I'd like to learn about the goal of these tooltips.
J6. I have thoughts on how funder metadata might be included in a particular metadata export and they are not reflected in the current export

[Julian] I'd like to make sure that it's clear that points J2-4, and maybe J5, are questions I have that might be answered with the research I've been planning. And I was thinking of point J6 as being a matter of scope and timing.
So for point J2 for example, we could learn if the relevance of suggested funder names needs improving.

qqmyers commented Apr 19, 2023

  1. There are a few possibilities. The issue arises because I refactored so that the main scripts now use a common utility script, and the main scripts handle loading it so that it is only loaded when needed. Options include:
  • Document where the scripts have to be placed
  • Hardcode so that the main scripts use the util script from the gdcc github.io page (with a local install, you're then not getting everything locally)
  • Add the util script to always be added by Dataverse
  • Use the setting created by EryK to add the util script manually (may load when not needed, but not all the time)
  • Adjust the config mechanism to allow multiple scripts to be required (probably best, not a one-line update)
    I'll plan on implementing the last one.
  2. Working on it - looks like it is on the script side rather than the Dataverse PR

  3. I've made a change (in the Dataverse PR) to help with this.

  4. As far as I can tell, this is just from delays at those services. We could tweak the minimum number of characters to search for (currently 3) or the delay to wait for new chars (so we don't send repeated requests to the service for your 3, 4, 5, 6, ... char strings), currently 500ms, to see if that helps. Both are in the scripts themselves rather than the Dataverse PR.
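
The two tunables mentioned above (a minimum of 3 characters and a 500ms wait for further typing) can be sketched as a small debounce helper. This is an illustrative sketch only, not the code in the external vocab scripts; `createDebouncedSearch` and its option names are hypothetical, and the timer functions are injectable purely to make the helper easy to test.

```javascript
// Hypothetical debounce helper; the real scripts may be structured differently.
function createDebouncedSearch(
  search,
  { minChars = 3, delayMs = 500, setTimer = setTimeout, clearTimer = clearTimeout } = {}
) {
  let pending = null;
  return function onInput(text) {
    // Any new keystroke cancels the request that was waiting to be sent.
    if (pending !== null) {
      clearTimer(pending);
      pending = null;
    }
    const term = text.trim();
    if (term.length < minChars) {
      return; // too short: do not query the service at all
    }
    pending = setTimer(() => {
      pending = null;
      search(term);
    }, delayMs);
  };
}
```

Raising `minChars` or `delayMs` reduces the number of requests sent to the remote service, at the cost of a slower-feeling lookup for the user.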

qqmyers commented Apr 19, 2023

  5. As far as I can tell, this is a known issue for select2 which will be fixed when jquery 3.7 comes out - see the end of Search not auto focusing in jQuery 3.6.0 select2/select2#5993 (comment). There is a work-around we could try, but given that it's minor, I'd suggest waiting.

  6. This is due to the poor performance of the Fundreg service. They do threaten to throttle, so perhaps many requests from testing are causing the slowdown. FWIW, I've done what they suggest to get on their faster server (basically providing contact info in the requests themselves) and, for the latest ~100 values you use, I cache the results, so things should be faster after the first access as we don't go back to Fundreg for them. It may be worth talking with them (as GREI?) and/or looking into whether we can run a local service (I forget whether Fundreg or ROR suggested that, but it may be possible for both).

  7. This is a current limitation of the external vocab mechanism that probably could be addressed as a new issue. For now, advanced search and use in facets should be possible. Facets might be the best way to quickly find things funded by one agency. For simple search, there could still be problems with the stemming and other settings we have on the underlying field, i.e. "Institutes" would match "Institute" as well.

  8. I wasn't able to reproduce this on my test server - the oai_datacite was produced when I had one ror affiliation for author and two or three fundreg entries, using controlled and free-text entries.

  9. This is ~by design in that the ORE and json outputs are intended to be complete, and the other exports then have to be updated if/when someone determines whether those formats could/should display something different (the human readable form, the i18n form, both with the identifier as an attribute, etc.). The raw info should be available to all the exporters through the json they receive, so further changes can be done in new issues/by someone familiar with a given exporter.

  10. I haven't looked into this, but I assume not seeing something in the DDI HTML is a bug (since it is supposed to mirror the DDI XML?), or perhaps just a missing enhancement if that was added to the DDI XML recently.
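
The caching of the latest ~100 looked-up values mentioned above can be sketched as a small least-recently-used cache. This is an assumption about one reasonable way to do it, not the script's actual code; the class name `RecentCache` and its methods are hypothetical, and persisting the cache in the browser is left out.

```javascript
// Hypothetical LRU cache keyed by term URI; a Map keeps insertion order,
// so the first key is always the least recently used one.
class RecentCache {
  constructor(limit = 100) {
    this.limit = limit;
    this.entries = new Map();
  }

  get(uri) {
    if (!this.entries.has(uri)) {
      return undefined;
    }
    const value = this.entries.get(uri);
    // Re-insert to mark this entry as most recently used.
    this.entries.delete(uri);
    this.entries.set(uri, value);
    return value;
  }

  set(uri, value) {
    this.entries.delete(uri);
    this.entries.set(uri, value);
    if (this.entries.size > this.limit) {
      // Evict the least recently used entry.
      this.entries.delete(this.entries.keys().next().value);
    }
  }
}
```

With such a cache, a display script would only call the remote service when `get` returns `undefined`, which is why repeat views of the same funder should be faster than the first one.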

J2) Unlike ORCID, Fundreg doesn't provide much to help with prioritizing (that I've found anyway). I have added support to search not only by name but by the acronym and tags that they provide. These help the query 'NIH' bring the National Institutes of Health (whose name doesn't contain NIH) into the search results. I also raise the priority of anything you've chosen recently (last ~100 choices), so NIH should be first once you've used it once. There's code that prioritizes by prior use, then acronym, then tags (various parts of NIH add that tag even though they are one institute within NIH). The relative priorities there could be changed, or perhaps weights could be added. A more useful thing that only Crossref/a service could do would be to prioritize results based on popularity (either hits or the number of existing resources that cite that funder, etc.).
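
The ordering described in J2 (prior use, then acronym match, then tag match) could look roughly like the sketch below. The function name `rankFunders` and the candidate fields `id`, `acronyms`, and `tags` are hypothetical; the real script and the Fundreg response use their own structures.

```javascript
// Hypothetical ranking: lower score sorts first; ties keep the service's order.
function rankFunders(candidates, query, recentIds) {
  const q = query.trim().toLowerCase();
  const matches = (list) => (list || []).some((s) => s.toLowerCase() === q);
  const score = (c) => {
    if (recentIds.includes(c.id)) return 0; // chosen recently by this user
    if (matches(c.acronyms)) return 1; // query equals an acronym
    if (matches(c.tags)) return 2; // query equals a tag
    return 3; // plain name match from the service
  };
  return candidates
    .map((c, index) => ({ c, index, s: score(c) }))
    .sort((a, b) => a.s - b.s || a.index - b.index)
    .map((entry) => entry.c);
}
```

Replacing the fixed tiers with numeric weights, as suggested above, would only require changing `score`.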

J3 - see 7

J4 - this is in the URL? If so, yes, the advanced search and facets use the id when searching rather than the name (or international variants) for the back end, but all the places in the UI should be showing the name (again, from 6, these may show briefly as the id until the lookup can complete, against the server the first time or more quickly from browser memory after that).

J5 - the popup shows the alternate names and acronyms that Crossref provides for the given id. Thus while the facet shows National Institutes of Health, which is the official name, you may have searched for NIH to add it and the popup will show that value. Whether there's a popup and what it shows (other names or just acronym) is controlled by the javascript in the external vocab repo and changes could be made there. There may be some limitations if we want to just exclude some values, i.e. I'm not sure that alternate names are distinguished from i18n values when Fundreg returns them.

J6 - see 9. I'd definitely suggest changes to other formats should be separate issues.

@jggautier

Thanks @qqmyers for being so thorough (as always!).

The rationale for the popup is interesting. I haven't seen anything like it before. Is this being used in other repositories right now? I took a quick look at the facets on QDR's "Root" collection but didn't see anything (although I wouldn't think those facets are being populated by terms from an external vocab).

qqmyers commented Apr 20, 2023

The external vocab mechanism tells the script whether an item needs to be displayed, or an input/selection UI is needed. What the script does for each of those is up to the script, but, on the display side, there isn't currently any distinction for display in different parts of the UI, i.e. in the metadata pane, a facet, as a search hit, etc. A couple of the example scripts do use a popup - I think ORCID is the other one - it will show the email of the person (if their email is public at ORCID). For internal text/CVV fields, there is no equivalent mechanism to customize the display (just the basic formatting in the metadata block). If changes are desired, if the change applies to all places a term is displayed, the script can be updated. If a general mechanism is needed to make facet display different from display elsewhere, it would be an extension of the mechanism itself and would involve changes to Dataverse and scripts and the config mechanism, schema, etc.

kcondon commented Apr 20, 2023

Stefano has decided that we should release this due to grant commitments and address/fix fundreg service performance with them and NIH after release.

Have finished testing, will consolidate the above open issues and new questions/concerns here.
New:

  1. With fundreg, ror, skosmos enabled, on publish the log file is very noisy:
    cvoc_noisy_log_on_pub.txt
  2. With ror enabled, server log complains about default dataverse.org as being invalid ror value:
    default_ror_err.txt
  3. Ror affiliation export varies by format, some have none (DC), uri only (DDI,Datacite,DDI_HTML, OpenAire, schema.org), some display string only (OAI_ORE), some both (json). Is this by design?
  4. Fundreg export for a couple formats (json, oai_ore) has a lot of extra content entries, all similar for funder, Center for Alternatives to Animal Testing, Johns Hopkins Bloomberg School of Public Health
    json_jhopkins_xtra_content.txt
  5. Likely an existing bug but Affiliation facet shows many empty string facets:

Screen Shot 2023-04-20 at 12 38 51 PM

Existing, from above:
E2, E3. Some UI flakiness (cropping of values, apparently disappearing or masked values when adding new record) affected by narrow window scaling of page and possibly server performance.
E7. Basic search does not support searching on display strings for funding agency or ror.
E10. DDI_HTML export does not include funding agency info but ddi export does.

qqmyers commented Apr 20, 2023

  1. Most of these were real, which is good news. It turns out simple search was enabled and there were parsing errors. I was able to fix those, and now simple search works. I also turned a couple of remaining logger.info()s into fine()s.
  2. The code was not ignoring free-text entries and would ask the servers about those - now fixed.
  3. I added the @id to the ORE export - it should be complete like the json export. For other formats, I'd suggest separate issues for any updates.
  4. The Dataverse code stores whatever the config tells it to pull from the server's response. In the case of fundreg, I believe it puts multiple English variants in the same structure with other language variants. We currently have the capability to store none of that or all of it, but can't filter to just get other language terms (assuming we don't actually want the English name variations to be sent.) I'd suggest a separate issue(s) to figure out what's desired here given what fundreg returns and the variability between their entries and then implement whatever change is desired. (Any given installation can remove what's there now by editing the example configuration in the repo.)
  5. This was a cut/paste typo in the ror script. When the value was a string instead of an id, the script added a blank between the () chars instead of the string. Now fixed in the ror.js script.

E2, E3) The blank values are probably due to the fundreg server being slow: the script currently displays nothing until the call to fundreg returns. It is probably possible to add a spinner or temporary value by editing the script. I'd suggest a separate issue here, because that would all be in the javascript in the external vocab repo rather than in Dataverse.

For the scaling, it is a ~known issue with the select2 widget that it doesn't resize when the page is dynamically resized. Refreshing on a resized page will update the widget. I believe when I looked before this was something that was reported to the select2 devs, but I'm not sure. In any case, I think it would be hard for us to fix (could be wrong) unless there's an update. An alternative would be to switch to a different widget in any/all scripts. The original community ror script, for example, just put an icon on the page, and clicking it would pop up a separate dialog to let you find the org you wanted. That, or another design, could be done by changing the script - again an issue for the gdcc ext-vocab repo.

I did also notice a static issue with narrow pages: as reported in other images, the text can appear below the field input. I verified on my machine that with the current css that only happens when the screen is narrow enough that other non-external-cvoc fields are also messed up (I think it was an internal CVV field where the drop-down control slid under the adjacent child field). In any case, the static issue could be fixed by providing different css values depending on the screen width, as is done in other places. That would be a Dataverse issue rather than one for the gdcc ext-vocab repo.

E7 - as noted above, simple search is now working.

E10 - I assume this is true regardless of whether the field is associated with an external vocab? If so, it should probably be a separate issue (like the existing issue about removing the funder contributor type, which isn't external vocab related).
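
The free-text fix described above (no longer asking the servers about values that are not vocabulary terms) comes down to separating term URIs from plain strings before any lookup. The sketch below shows the idea in JavaScript; it is not the actual Dataverse change, and `isTermUri`, `splitValues`, and the prefix list are assumptions chosen for the example.

```javascript
// Hypothetical check: a value counts as a vocabulary term only if it starts
// with one of the configured URI prefixes (assumed values below).
function isTermUri(value, allowedPrefixes) {
  if (typeof value !== "string") {
    return false;
  }
  const trimmed = value.trim();
  return allowedPrefixes.some((prefix) => trimmed.startsWith(prefix));
}

// Split stored values so that only term URIs are sent to the remote service.
function splitValues(values, allowedPrefixes) {
  const termUris = [];
  const freeText = [];
  for (const value of values) {
    (isTermUri(value, allowedPrefixes) ? termUris : freeText).push(value);
  }
  return { termUris, freeText };
}

const prefixes = ["http://dx.doi.org/10.13039/", "https://ror.org/"];
const stored = ["http://dx.doi.org/10.13039/100000865", "dataverse.org"];
console.log(splitValues(stored, prefixes));
```

Here the free-text affiliation `dataverse.org` would be kept as entered and never sent to the ROR service.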

kcondon commented Apr 24, 2023

Latest testing shows all major issues resolved. It was decided to merge and do any UX work later, after the SPA.
