Skip to content

Fix DOI fetcher and add documentation on fetcher trust levels #6990

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jan 16, 2021

Conversation

koppor
Copy link
Member

@koppor koppor commented Oct 7, 2020

Follow-up to This reverts #6937 in order to add documentation on our decisions taken in 2018.. That PR added a workaround to thing introduced in 15c7981.

In this PR,

  • A bug in DoiResultion is fixed: String similarity was used (where it should not have been)
  • Documentation on fetcher trust levels is added

Small, unlreated changes:

  • Fix reading of keys from build.properties. Gradle wrote literally null in the generated build.properties. It was changed to contain "" instead of "null" for an unknown key
  • Fix typo "chache"

Side note: It seems that 15c7981 accidently committed to the master branch and not using a PR.

  • Change in CHANGELOG.md described (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked documentation: Is the information available and up to date? If not created an issue at https://github.com/JabRef/user-documentation/issues or, even better, submitted a pull request to the documentation repository.

Copy link
Member

@tobiasdiez tobiasdiez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

I still think that Publisher should be a higher priority as Source. They also take the DOI into account, but use the publisher-specific API to get the fulltext url. Thus, they are more reliable than just following the DOI and then trying to guess the PDF path from the webpage.

@stefan-kolb
Copy link
Member

stefan-kolb commented Oct 8, 2020

Thanks!

I still think that Publisher should be a higher priority as Source. They also take the DOI into account, but use the publisher-specific API to get the fulltext url. Thus, they are more reliable than just following the DOI and then trying to guess the PDF path from the webpage.

  • Publisher can only be higher ranked than source if it ALWAYS searches via DOI and does not return close/related results or even searches based on strings.
  • We might switch priority then but should rename SOURCE to e.g. DOI then
  • Another "problem" is that the downloads might take longer than before as they Publisher fetcher s are generally not that fast compared to the direct DOI forwarding

@tobiasdiez
Copy link
Member

Ok, so we have: Publisher_Identifier (use doi or other identifier, uses publisher API), Identifier (general identifier-based), Publisher (not identifier-based), preprint, other

I'm not sure if the distinction between publisher with and without identifier is really necessary, but that should make it clear also in the future.

@koppor
Copy link
Member Author

koppor commented Nov 23, 2020

The issue is about full-text fetchers.

The issue was that one fetcher implementeation fetches the complete proceedings and the other fetcher fetches the concrete paper. A DOI points to the paper; the publisher fetcher fetches the complete proceedings. This is unintended.

I will try to pair-up with @stefan-kolb

@stefan-kolb
Copy link
Member

We fixed a similarity problem inside the DoiResolution that REALLY caused this issue.

@koppor koppor changed the title Add documentation on fetcher trust levels Fix DOI fetcher and add documentation on fetcher trust levels Jan 10, 2021
- remove similarity of links
- add citation meta tag <meta name="citation_pdf_url">
Default value is "" and not "null"
@koppor koppor removed their assignment Jan 10, 2021
@koppor koppor added the status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers label Jan 10, 2021
Copy link
Member

@tobiasdiez tobiasdiez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, LGTM. One question though.

tobiasdiez
tobiasdiez previously approved these changes Jan 11, 2021
@Siedlerchr Siedlerchr merged commit 8a66328 into master Jan 16, 2021
@Siedlerchr Siedlerchr deleted the add-fetcher-doc branch January 16, 2021 17:08
Siedlerchr added a commit that referenced this pull request Jan 17, 2021
…dtask

* upstream/master:
  Export urldate to MSOffice (#7357)
  Fix DOI fetcher and add documentation on fetcher trust levels (#6990)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants