-
-
Notifications
You must be signed in to change notification settings - Fork 135
enhance: improve media type recognition with HEAD or magic bytes #2599
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
ekzyis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a quick look, didn't pull the code
|
Thanks ^^ |
f90baef to
3178619
Compare
huumn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. I need to redeploy capture before merging this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs https://capture.stacker.news/ to return CORS headers to work, see discussion in chat
…egrate capture in the images compose profile, address useEffect bug
capture/index.js
Outdated
| app.use(cors({ | ||
| origin: process.env.NEXT_PUBLIC_URL, | ||
| methods: ['GET', 'OPTIONS'], | ||
| credentials: false | ||
| })) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm I think this applies to capture too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't think that it's gonna be a problem, for NextSeo in particular, but if everything worked before then maybe it's better to restrict CORS to the media endpoint
| export const IMAGE_PIXELS_MAX = 35000000 | ||
| // backwards compatibile with old media domain env var and precedence for docker url if set | ||
| export const MEDIA_URL = process.env.MEDIA_URL_DOCKER || process.env.NEXT_PUBLIC_MEDIA_URL || `https://${process.env.NEXT_PUBLIC_MEDIA_DOMAIN}` | ||
| export const PUBLIC_MEDIA_CHECK_URL = process.env.NEXT_PUBLIC_MEDIA_CHECK_URL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Undefined Environment Variable Causes Invalid Fetch URLs
PUBLIC_MEDIA_CHECK_URL is exported as undefined if NEXT_PUBLIC_MEDIA_CHECK_URL environment variable is not set. This causes undefined to be used as part of fetch URLs in components/media-or-link.js (line 137) and worker/imgproxy.js (line 152), resulting in invalid URLs like "undefined/..." being fetched. While environment variables are currently always set, this should have a fallback or null check to prevent unexpected behavior in edge cases.
|
Okay, looks good to me now. Please upload a video where you test this, so I can see that you tested it and how you tested it. Then I will test this myself again. |
|
I uploaded a video where I test the endpoint by loading a post with a number of URLs (that are images), and a new post with just a single URL. |
Description
fixes #2341
fixes #1433
can address #850 for images
We check if a link is a media file by downloading it fully, and then we download it again when we want to render it.
This PR improves media type recognition on links by fetching
HEADor, as a fallback, the first magic bytes via magic-bytes.js.Client and imgproxy can use an endpoint placed in the capture micro service (avoids CORS) to know if they're dealing with an image or a video.
Also checks if a link has HTTP Basic Auth.
Screenshots
Loading an
image/aviffile from the browser to check and render media vs checking the magic bytesProof of concept
Screen.Recording.2025-10-11.at.17.19.03.mp4
Loading a URL with CORS
Screen.Recording.2025-10-28.at.12.20.07.mp4
Additional Context
The endpoint lives in the capture micro service, because of this, the compose profile must have "capture".
Maybe we can add an extra fallback for when the capture instances go offline, if ever
We can get rid of the
HEADfetch if there's the possibility of false informations, it's just a cheap way to getContent-TypeThe first magic bytes of an image can also contain informations about dimensions, and we could use it to avoid render jumps before imgproxy takes over. It's not implemented in this PR
This job could also be done in-house but we would deal with lots of magic numbers this way, so a popular and well-maintained library seemed a better idea.
This doesn't get rid of the heuristics involved in the imgproxy worker, it's still something that we know for sure that it works. But it's definitely redundant now.
Checklist
Are your changes backward compatible? Please answer below:
For example, a change is not backward compatible if you removed a GraphQL field or dropped a database column.
Yes
On a scale of 1-10 how well and how have you QA'd this change and any features it might affect? Please answer below:
7, pretty good actually
For frontend changes: Tested on mobile, light and dark mode? Please answer below:
n/a
Did you introduce any new environment variables? If so, call them out explicitly here:
The following env vars have been introduced
The last one has been introduced in
.env.productiontoo but I don't think that file is even usedDid you use AI for this? If so, how much did it assist you?
The readFirstBytes function is partially vibed, there were some things not really clear to me in that moment about the part of reading the small chunk with Reader, so it came in help.
Note
Adds a /media endpoint in the capture service for HEAD/magic-bytes media detection and integrates it in the client and imgproxy worker, with new env/config wiring.
GET /media/:urlroute with CORS, usingHEADthen magic bytes (magic-bytes.js) to detectimage/*orvideo/*(capture/index.js,capture/media-check.js).cors,magic-bytes.js(capture/package.json).MEDIA_CHECK_URL_*to query capture media check before falling back to directHEAD/GET(worker/imgproxy.js).PUBLIC_MEDIA_CHECK_URLincomponents/media-or-link.js.MEDIA_CHECK_URL_DOCKERandNEXT_PUBLIC_MEDIA_CHECK_URL(.envs).process.env.NEXT_PUBLIC_MEDIA_CHECK_URLvia webpack DefinePlugin (next.config.js).imagesprofile forcaptureservice (docker-compose.yml).Written by Cursor Bugbot for commit e4e2e6f. This will update automatically on new commits. Configure here.