PDF.js word and sentence highlighting demo with Next.js

Get up and running

Clone the repository, then install dependencies and start the dev server:

cd <path-to-cloned-repo>
npm install
npm run dev

Then open the localhost URL printed by `npm run dev`.

Use case

Text-to-speech functionality is coming soon, so we need a way to visualize what the voice is currently reading: highlight the boundaries of the current sentence and the word being spoken.
- Speechify is a good example of such an implementation
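At its core, the use case means mapping the speech engine's current position to a word on the page. The sketch below assumes the engine reports a character offset into the page text (for example, the Web Speech API's `boundary` event exposes `charIndex`). The `WordSpan` shape and both helper names are hypothetical and not taken from this repo.

```typescript
// Hypothetical word record: character offsets into the page's plain text.
interface WordSpan {
  text: string;
  start: number; // offset of the first character
  end: number; // offset one past the last character
}

// Split text on whitespace, remembering where each word sits in the string.
function tokenizeWords(text: string): WordSpan[] {
  const words: WordSpan[] = [];
  const re = /\S+/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    words.push({ text: m[0], start: m.index, end: m.index + m[0].length });
  }
  return words;
}

// Binary search for the word containing charIndex.
// Returns -1 when the offset falls on whitespace or outside the text.
function findActiveWord(words: WordSpan[], charIndex: number): number {
  let lo = 0;
  let hi = words.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const w = words[mid];
    if (charIndex < w.start) hi = mid - 1;
    else if (charIndex >= w.end) lo = mid + 1;
    else return mid;
  }
  return -1;
}
```

Because words are sorted by offset, each boundary event costs O(log n). The same lookup over sentence spans gives the active sentence.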

Requirements

Do as many calculations on the front end as possible
Highlight sentences and words precisely
Demo viewer
Demo integration with Next.js (this repo)
Handle resizes properly
Don't interfere with annotations or search
For POC #1, use a simple React + Vite + TS template
For POC #2, use a simple Next.js template
Verify WebKit, Blink and Gecko on desktop
Verify WebKit on iOS
Verify Chrome-based (Blink) and Firefox (Gecko) on Android
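One way to handle resizes without re-running any detection is to store word boxes as fractions of the page size and only rescale them when the page changes size. This is a minimal sketch under assumptions: the `PixelBox` shape mirrors Tesseract.js-style word bboxes, which may not match the OCR output used here, and all names are hypothetical.

```typescript
// Pixel-space word box as an OCR engine might report it (x0,y0 top-left, x1,y1 bottom-right).
// Assumption: Tesseract.js-style bbox; the POC's OCR output may be shaped differently.
interface PixelBox {
  x0: number;
  y0: number;
  x1: number;
  y1: number;
}

// Resolution-independent box: every field is a fraction of the page width or height.
interface NormBox {
  x: number;
  y: number;
  w: number;
  h: number;
}

// Run once per OCR pass, using the size of the canvas the OCR engine actually read.
function normalizeBox(b: PixelBox, canvasWidth: number, canvasHeight: number): NormBox {
  return {
    x: b.x0 / canvasWidth,
    y: b.y0 / canvasHeight,
    w: (b.x1 - b.x0) / canvasWidth,
    h: (b.y1 - b.y0) / canvasHeight,
  };
}

// Run on every resize: map the stored fractions onto the page's current on-screen size.
function toScreenBox(n: NormBox, pageWidth: number, pageHeight: number) {
  return {
    x: n.x * pageWidth,
    y: n.y * pageHeight,
    width: n.w * pageWidth,
    height: n.h * pageHeight,
  };
}
```

In a viewer, `toScreenBox` could be called whenever a `ResizeObserver` on the page container reports a new size. The OCR results themselves stay untouched.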

Remarks

Same as in POC #1 (https://github.com/inlinecoder/curie-pdfjs-viewer):

- Possible implementations of word and sentence highlighting:
    - With Canvas: simple; easy scaling; animations and transitions are harder to achieve;
    - With DOM elements: simple, but harder to scale; animations and transitions are a breeze; can't use compound shapes;
    - With SVG: highly flexible and customizable; supports compound shapes, animations and transitions;
- SVG was chosen
- PDF.js `textContent` doesn't provide the information we need to work with
    - Opted for client-side OCR instead
    - OCR, however, isn't ideal and doesn't recognize some text, e.g. gray text
        - Needs investigation and tinkering
    - OCR gives much more reliable information, but requires normalization (done)
- Firefox has an odd rendering issue the first time a PDF is rendered
- Mobile browsers need further investigation
- In dev mode, PDF.js logs a `console.error` message. This is caused by React Strict Mode, which in development renders components twice and runs effects through an extra setup/cleanup cycle to help developers catch unintended side effects
    - This doesn't happen in production. Source: React docs
- A bunch of demo PDFs are included
- Sentence boundary detection needs more work. For example, in `I'm not Dr. Watson.`, the `Dr.` should not end a sentence.
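For the `Dr. Watson` case, a common first step is to treat a period as a sentence terminator unless the token is a known abbreviation. The sketch below is a heuristic under assumptions: the abbreviation list is hypothetical and intentionally short, and this is not the splitting logic currently in the repo. `Intl.Segmenter` with `granularity: "sentence"` is another option worth evaluating.

```typescript
// Hypothetical, deliberately short abbreviation list; a real one would be longer and locale-aware.
const ABBREVIATIONS: string[] = ["dr.", "mr.", "mrs.", "ms.", "prof.", "st.", "vs.", "e.g.", "i.e."];

interface SentenceSpan {
  start: number; // offset of the sentence's first character
  end: number; // offset one past its last character
}

function splitSentences(text: string): SentenceSpan[] {
  const spans: SentenceSpan[] = [];
  const tokenRe = /\S+/g;
  let start = -1; // start of the sentence being built, -1 if none
  let lastEnd = 0; // end of the most recent token
  let m: RegExpExecArray | null;
  while ((m = tokenRe.exec(text)) !== null) {
    if (start < 0) start = m.index;
    lastEnd = m.index + m[0].length;
    // Ignore trailing quotes/brackets so `Watson."` still ends in a period.
    const core = m[0].replace(/["')\]]+$/, "");
    const isAbbreviation = ABBREVIATIONS.indexOf(core.toLowerCase()) !== -1;
    if (/[.!?]$/.test(core) && !isAbbreviation) {
      spans.push({ start, end: lastEnd });
      start = -1;
    }
  }
  // Text that ends without a terminator still forms a final sentence.
  if (start >= 0) spans.push({ start, end: lastEnd });
  return spans;
}
```

The trade-off: an abbreviation that really does end a sentence (say, `...Baker St.`) merges two sentences, which is why `etc.` is left out of the list here.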

Added a start page, a viewer page, and a not-found page
The URL is updated when selecting a different demo PDF
There may be other changes I've forgotten to list

Not doing

Zoom in / out
- Should be straightforward with the SVG layer
Scrolling across all pages
- It's not clear when to run OCR to highlight anything while the user scrolls back and forth
Changing pages, as it's just a POC
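The reason zoom should be easy with SVG: if the overlay's `viewBox` is expressed in page units, zooming only changes the rendered size of the `<svg>` element and the browser rescales every highlight. This sketch builds such an overlay as a string. The page-unit coordinates (e.g. PDF points at scale 1), the `sentence`/`word` class names, and the helper names are all assumptions for illustration; in React the same markup would be JSX.

```typescript
// A box in page units (assumption: PDF points at scale 1).
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

// One closed subpath per line fragment, so a sentence spanning several
// lines becomes a single compound shape that can be styled and animated as one.
function rectsToPath(boxes: Box[]): string {
  return boxes
    .map((b) => `M${b.x} ${b.y}h${b.width}v${b.height}h${-b.width}Z`)
    .join("");
}

// The viewBox stays in page units; zooming only changes the element's CSS size.
function buildHighlightSvg(pageWidth: number, pageHeight: number, sentence: Box[], word: Box | null): string {
  const wordRect = word
    ? `<rect class="word" x="${word.x}" y="${word.y}" width="${word.width}" height="${word.height}"/>`
    : "";
  return (
    `<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 ${pageWidth} ${pageHeight}" width="100%" height="100%">` +
    `<path class="sentence" d="${rectsToPath(sentence)}"/>` +
    wordRect +
    `</svg>`
  );
}
```

Fill colour, opacity, and transitions for `.sentence` and `.word` can then live in CSS, which is where SVG's animation advantage over Canvas comes in.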

Out of scope

It's just a proof of concept and feasibility check, i.e. a playground. To speed things up, some things are left out:

❌ Proper testing
❌ Linter and formatting settings
❌ Common abstractions for PDF, ePub, Mobi, etc.
❌ Optimizations
❌ Decoupling, etc.
❌ Edge-cases
etc.
