Introduce Web-scraping inside JabRef

Currently, our web search sends search strings to API endpoints and interprets the results. In other words, we have fetchers that use an API key and fetchers that rely on screen scraping. Most of the screen scrapers no longer work, mainly because of Cloudflare. We should switch to browser-based screen scraping.

JabRef should display the HTML page **inside** JabRef and offer to scrape the citations directly from the page, similar to what BibDesk does.

<img width="1233" alt="316482562-b4a3d1e7-bd0a-4475-ae52-71120ae0d1fe" src="https://github.com/koppor/jabref/assets/1366654/fd7b54c3-1a43-455f-be91-6d1d6ae88f69">

<img width="1182" alt="316482726-6a80130f-f920-44a4-8689-f420fa459226" src="https://github.com/koppor/jabref/assets/1366654/56f335c6-a164-4775-abc5-e872829a36f0">

---

Maybe the [Java Chromium Embedded Framework (JCEF)](https://github.com/JetBrains/jcef) helps. The test class https://github.com/chromiumembedded/java-cef/blob/master/java/tests/detailed/handler/RequestHandler.java seems to show how it can be used.

---

The PR https://github.com/JabRef/jabref/pull/7075 attempted to display the Google Scholar captchas in JabRef, but it was not completed. This issue asks for rewriting the fetchers to use JCEF instead of [`URLDownload`](https://github.com/JabRef/jabref/blob/main/src/main/java/org/jabref/logic/net/URLDownload.java).

Note that this is different from https://github.com/JabRef/jabref/issues/11093. There, a new UI is demanded.

Here, the fetchers should be able to run **stand-alone**, **without user interaction**.
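
One way to keep the fetchers independent of how a page is retrieved could be a small loader abstraction, so that a `URLDownload`-based and a JCEF-based implementation are interchangeable. The following is only a sketch under assumptions: `PageLoaderSketch`, `PageLoader`, and `TitleScraper` are hypothetical names and not part of JabRef, and the `<title>` extraction merely stands in for real citation scraping.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PageLoaderSketch {

    /** Hypothetical abstraction: given a URL, return the HTML of the loaded page. */
    interface PageLoader {
        String load(String url);
    }

    /** Hypothetical scraper that depends only on PageLoader, not on the retrieval mechanism. */
    static final class TitleScraper {
        // Captures the text between <title> and </title>, ignoring case and line breaks.
        private static final Pattern TITLE = Pattern.compile(
                "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

        private final PageLoader loader;

        TitleScraper(PageLoader loader) {
            this.loader = loader;
        }

        /** Loads the page through the injected loader and returns the trimmed title, if any. */
        Optional<String> scrapeTitle(String url) {
            String html = loader.load(url);
            Matcher matcher = TITLE.matcher(html);
            return matcher.find() ? Optional.of(matcher.group(1).trim()) : Optional.empty();
        }
    }
}
```

In tests, a lambda such as `url -> "<html>...</html>"` can serve as the loader, so the scraping logic can be checked without a browser or network access.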

---

Affected fetchers:


- ACS: org.jabref.logic.importer.fetcher.ACS
- Google Scholar: org.jabref.logic.importer.fetcher.GoogleScholar
- IACR: org.jabref.logic.importer.fetcher.IacrEprintFetcher
- JSTOR: org.jabref.logic.importer.fetcher.JstorFetcher
- ResearchGate: org.jabref.logic.importer.fetcher.ResearchGate
- ScienceDirect: org.jabref.logic.importer.fetcher.ScienceDirect
- SpringerLink: org.jabref.logic.importer.fetcher.SpringerLink

Some of these fetchers use an API for searching. In that case, `findFullText` is the only method that handles HTML.
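
To illustrate what such an HTML-only step might look like, here is a minimal sketch that pulls a PDF link out of a page. It assumes the page exposes a `citation_pdf_url` meta tag with `name` written before `content`; `PdfLinkSketch` and `findPdfUrl` are hypothetical names, and the actual fetchers may parse the page differently (for example with a real HTML parser instead of a regular expression).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PdfLinkSketch {

    // Matches <meta name="citation_pdf_url" content="..."> (assumption: name precedes content).
    private static final Pattern PDF_META = Pattern.compile(
            "<meta\\s+name=[\"']citation_pdf_url[\"']\\s+content=[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    /** Returns the URL from the first citation_pdf_url meta tag, or empty if none is found. */
    static Optional<String> findPdfUrl(String html) {
        Matcher matcher = PDF_META.matcher(html);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }
}
```

The HTML passed in would come from whatever loads the page, so the same extraction works regardless of whether the page was fetched directly or rendered in an embedded browser.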

