Skip to content

Conversation

@Soxasora
Copy link
Member

@Soxasora Soxasora commented Oct 11, 2025

Description

fixes #2341
fixes #1433
can address #850 for images

We check if a link is a media file by downloading it fully, and then we download it again when we want to render it.
This PR improves media type recognition on links by fetching HEAD or, as a fallback, the first magic bytes via magic-bytes.js.

Client and imgproxy can use an endpoint placed in the capture micro service (avoids CORS) to know if they're dealing with an image or a video.

Also checks if a link has HTTP Basic Auth.

Screenshots

Loading an image/avif file from the browser to check and render media vs checking the magic bytes
Proof of concept

Screen.Recording.2025-10-11.at.17.19.03.mp4

Loading a URL with CORS

Screen.Recording.2025-10-28.at.12.20.07.mp4

Additional Context

The endpoint lives in the capture micro service, because of this, the compose profile must have "capture".
Maybe we can add an extra fallback for when the capture instances go offline, if ever


We can get rid of the HEAD fetch if there's the possibility of false informations, it's just a cheap way to get Content-Type


Can address #850 for images

The first magic bytes of an image can also contain informations about dimensions, and we could use it to avoid render jumps before imgproxy takes over. It's not implemented in this PR


This job could also be done in-house but we would deal with lots of magic numbers this way, so a popular and well-maintained library seemed a better idea.


This doesn't get rid of the heuristics involved in the imgproxy worker, it's still something that we know for sure that it works. But it's definitely redundant now.

Checklist

Are your changes backward compatible? Please answer below:

For example, a change is not backward compatible if you removed a GraphQL field or dropped a database column.
Yes

On a scale of 1-10 how well and how have you QA'd this change and any features it might affect? Please answer below:
7, pretty good actually

For frontend changes: Tested on mobile, light and dark mode? Please answer below:
n/a

Did you introduce any new environment variables? If so, call them out explicitly here:
The following env vars have been introduced

MEDIA_CHECK_URL_DOCKER=http://capture:5678/media
-- url for imgproxy, communication between containers
NEXT_PUBLIC_MEDIA_CHECK_URL=http://localhost:5678/media
-- url for client-side fetches, e.g. media-or-link.js

The last one has been introduced in .env.production too but I don't think that file is even used

Did you use AI for this? If so, how much did it assist you?
The readFirstBytes function is partially vibed, there were some things not really clear to me in that moment about the part of reading the small chunk with Reader, so it came in help.


Note

Adds a /media endpoint in the capture service for HEAD/magic-bytes media detection and integrates it in the client and imgproxy worker, with new env/config wiring.

  • Capture service:
    • Add GET /media/:url route with CORS, using HEAD then magic bytes (magic-bytes.js) to detect image/* or video/* (capture/index.js, capture/media-check.js).
    • New deps: cors, magic-bytes.js (capture/package.json).
  • Worker (imgproxy):
    • Use MEDIA_CHECK_URL_* to query capture media check before falling back to direct HEAD/GET (worker/imgproxy.js).
  • Frontend:
    • Replace DOM-based media probing with fetch to PUBLIC_MEDIA_CHECK_URL in components/media-or-link.js.
  • Config/Env:
    • Add MEDIA_CHECK_URL_DOCKER and NEXT_PUBLIC_MEDIA_CHECK_URL (.envs).
    • Expose process.env.NEXT_PUBLIC_MEDIA_CHECK_URL via webpack DefinePlugin (next.config.js).
    • Update docker compose: add images profile for capture service (docker-compose.yml).

Written by Cursor Bugbot for commit e4e2e6f. This will update automatically on new commits. Configure here.

@socket-security
Copy link

socket-security bot commented Oct 11, 2025

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff Package Supply Chain
Security
Vulnerability Quality Maintenance License
Addedmagic-bytes.js@​1.12.110010010080100

View full report

@Soxasora Soxasora marked this pull request as ready for review October 11, 2025 18:15
cursor[bot]

This comment was marked as outdated.

cursor[bot]

This comment was marked as outdated.

cursor[bot]

This comment was marked as outdated.

@Soxasora Soxasora marked this pull request as draft October 12, 2025 09:25
@Soxasora Soxasora marked this pull request as ready for review October 12, 2025 16:24
cursor[bot]

This comment was marked as outdated.

cursor[bot]

This comment was marked as outdated.

Copy link
Member

@ekzyis ekzyis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a quick look, didn't pull the code

cursor[bot]

This comment was marked as outdated.

@Soxasora
Copy link
Member Author

Thanks ^^
Like suggested, I think we can also give it its own micro-service, which could also house many more tools in the future

@Soxasora Soxasora force-pushed the better_media_recognition branch from f90baef to 3178619 Compare October 18, 2025 23:43
@Soxasora Soxasora added the enhancement improvements to existing features label Oct 20, 2025
Copy link
Member

@huumn huumn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. I need to redeploy capture before merging this.

cursor[bot]

This comment was marked as outdated.

Copy link
Member

@ekzyis ekzyis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs https://capture.stacker.news/ to return CORS headers to work, see discussion in chat

cursor[bot]

This comment was marked as outdated.

capture/index.js Outdated
Comment on lines 56 to 61
app.use(cors({
origin: process.env.NEXT_PUBLIC_URL,
methods: ['GET', 'OPTIONS'],
credentials: false
}))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm I think this applies to capture too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't think that it's gonna be a problem, for NextSeo in particular, but if everything worked before then maybe it's better to restrict CORS to the media endpoint

export const IMAGE_PIXELS_MAX = 35000000
// backwards compatibile with old media domain env var and precedence for docker url if set
export const MEDIA_URL = process.env.MEDIA_URL_DOCKER || process.env.NEXT_PUBLIC_MEDIA_URL || `https://${process.env.NEXT_PUBLIC_MEDIA_DOMAIN}`
export const PUBLIC_MEDIA_CHECK_URL = process.env.NEXT_PUBLIC_MEDIA_CHECK_URL
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Undefined Environment Variable Causes Invalid Fetch URLs

PUBLIC_MEDIA_CHECK_URL is exported as undefined if NEXT_PUBLIC_MEDIA_CHECK_URL environment variable is not set. This causes undefined to be used as part of fetch URLs in components/media-or-link.js (line 137) and worker/imgproxy.js (line 152), resulting in invalid URLs like "undefined/..." being fetched. While environment variables are currently always set, this should have a fallback or null check to prevent unexpected behavior in edge cases.

Fix in Cursor Fix in Web

@ekzyis
Copy link
Member

ekzyis commented Oct 28, 2025

Okay, looks good to me now.

Please upload a video where you test this, so I can see that you tested it and how you tested it.

Then I will test this myself again.

@Soxasora
Copy link
Member Author

I uploaded a video where I test the endpoint by loading a post with a number of URLs (that are images), and a new post with just a single URL.
I also took a video of the capture container being loaded by the images profile, but I realized that I captured (tss) nothing worth being shown (docker loading)

@huumn huumn merged commit 654af65 into stackernews:master Oct 28, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement improvements to existing features

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Attempt to render links as media triggers HTTP Basic Auth AVIF images are rendered as video

3 participants