Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ATProto: check blob size before storing [blocked on Bluesky] #1348

Closed
snarfed opened this issue Sep 28, 2024 · 4 comments
Closed

ATProto: check blob size before storing [blocked on Bluesky] #1348

snarfed opened this issue Sep 28, 2024 · 4 comments
Labels

Comments

@snarfed
Copy link
Owner

snarfed commented Sep 28, 2024

We're finally rolling out lexicon schema validation, #1327 / snarfed/lexrpc#3, and started seeing a few failed blobs that were over the embed limits. Traceback below.

Traceback (most recent call last):
...
  File "/workspace/protocol.py", line 1726, in send_task
    sent = PROTOCOLS[protocol].send(obj, url, from_user=user, orig_obj=orig_obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/atproto.py", line 523, in send
    record = to_cls.convert(base_obj, fetch_blobs=True, from_user=from_user)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/protocol.py", line 569, in convert
    converted = cls._convert(obj, from_user=from_user, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/workspace/atproto.py", line 790, in _convert
    appview.validate(nsid, type, ret)
  File "/layers/google.python.pip/pip/lib/python3.12/site-packages/lexrpc/base.py", line 275, in validate
    self._validate_schema(name=type, val=obj, type_=nsid, lexicon=nsid,
  File "/layers/google.python.pip/pip/lib/python3.12/site-packages/lexrpc/base.py", line 434, in _validate_schema
    self._validate_schema(name=inner_name, val=inner_val, type_=inner_type,
...
  File "/layers/google.python.pip/pip/lib/python3.12/site-packages/lexrpc/base.py", line 314, in fail

lexrpc.base.ValidationError: in app.bsky.embed.images#image, blob image with value `{'$type': 'blob', 'ref': CID('base32', 1, 'raw', '…`: has size 2493302 over maxSize 1000000

Also, related but not quite the same, we should check Content-Length and bail if a blob is too big before fetching it to hash and store in AtpRemoteBlob. Here's an example of downloading a 900MB (!) video:

2024-09-24 09:53:17.557 PDT
POST
http://atproto-hub.bridgy-federated.uc.r.appspot.com/queue/send

Task 4493810335435324008
Params: [('obj', 'ahBicmlkZ3ktZmVkZXJhdGVkclMLEgZPYmplY3QiR2h0dHBzOi8vZXcuY29tL3RoZS1ib3lzLXNlYXNvbi01LWRhdmVlZC1kaWdncy1jYXN0aW5nLWV4Y2x1c2l2ZS04NzE2MTUzDA'), ('orig_obj', ''), ('protocol', 'atproto'), ('url', 'https://atproto.brid.gy'), ('user', 'ahBicmlkZ3ktZmVkZXJhdGVkchQLEghNYWdpY0tleSIGZXcuY29tDA')]
Sending web post https://ew.com/the-boys-season-5-daveed-diggs-casting-exclusive-8716153 to atproto https://atproto.brid.gy (4 s behind)
<function requests_fn.<locals>.call at 0x78e7c9abf560> https://ew.com/thmb/T1QNu_e-AKj1LEs3Sy5IHJ0X_uI=/filters:no_upscale():max_bytes(150000):strip_icc():format(jpeg)/daveed-diggs-karl-urban-the-boys-092424-2-5a8cef4417f440079b47791a38efa8b4.jpg
requests.get https://ew.com/thmb/T1QNu_e-AKj1LEs3Sy5IHJ0X_uI=/filters:no_upscale():max_bytes(150000):strip_icc():format(jpeg)/daveed-diggs-karl-urban-the-boys-092424-2-5a8cef4417f440079b47791a38efa8b4.jpg {}
Received 200
Got 200 image/jpeg 131140 bytes https://ew.com/thmb/T1QNu_e-AKj1LEs3Sy5IHJ0X_uI=/filters:no_upscale():max_bytes(150000):strip_icc():format(jpeg)/daveed-diggs-karl-urban-the-boys-092424-2-5a8cef4417f440079b47791a38efa8b4.jpg
Creating new AtpRemoteBlob for https://ew.com/thmb/T1QNu_e-AKj1LEs3Sy5IHJ0X_uI=/filters:no_upscale():max_bytes(150000):strip_icc():format(jpeg)/daveed-diggs-karl-urban-the-boys-092424-2-5a8cef4417f440079b47791a38efa8b4.jpg CID bafkreihgxry5eckdhq47tnevc35dtkuhmo56btzbqyi3uyp3psb2b2u2l4
<function requests_fn.<locals>.call at 0x78e7c9abf560> https://content.jwplatform.com/videos/XQ52cjQE-GQ0UtFMY.mp4
requests.get https://content.jwplatform.com/videos/XQ52cjQE-GQ0UtFMY.mp4 {}
Redirected to https://videos-cloudfront.jwpsrv.com/66f2f316_93cbdb2700a87e6cb17326171db08e92bfffd6bc/content/conversions/B9u4tisZ/videos/xrRbDuXS-34577338.mp4
Received 200
Got 200 video/mp4 919236715 bytes https://videos-cloudfront.jwpsrv.com/66f2f316_93cbdb2700a87e6cb17326171db08e92bfffd6bc/content/conversions/B9u4tisZ/videos/xrRbDuXS-34577338.mp4
Creating new AtpRemoteBlob for https://content.jwplatform.com/videos/XQ52cjQE-GQ0UtFMY.mp4 CID bafkreiavppchmh7yn2zx5x7rzwarovkul4g4f5jfovya2uilp6ehtiswlm
ew.com is did:plc:6rzq64c4u5swi25ppvq3ckav
Loading repo Key('AtpRepo', 'did:plc:6rzq64c4u5swi25ppvq3ckav')
loaded repo for did:plc:6rzq64c4u5swi25ppvq3ckav at commit bafyreif7gpmrrgda5xyhkg5uxmsppuwy7r3zlfppex3ehhdtlhxrk4zv5a
Storing ATProto Action.CREATE app.bsky.feed.post 3l4w3gx73pwx2 b'{"$type":"app.bsky.feed.post","bridgyOriginalText":"The \\"Hamilton\\" and \\"Little Mermaid\\" star is the first newbie to join the established cast for the final season.","bridgyOriginalUrl":"https://ew.com/the-boys-season-5-daveed-diggs-casting-exclusive-8716153","createdAt":"2024-09-24T16:00:00.000Z","embed":{"$type":"app.bsky.embed.external","external":{"$type":"app.bsky.embed.external#external","description":"The \\"Hamilton\\" and \\"Little Mermaid\\" star is the first newbie to join the established cast for the final season.","thumb":{"$type":"blob","mimeType":"image/jpeg","ref":{"/":"bafkreihgxry5eckdhq47tnevc35dtkuhmo56btzbqyi3uyp3psb2b2u2l4"},"size":131140},"title":"\xe2\x80\x9cThe Boys\xe2\x80\x9d season 5 adds Daveed Diggs in mystery role (exclusive)","uri":"https://ew.com/the-boys-season-5-daveed-diggs-casting-exclusive-8716153"}},"text":""}'
Updating Key('AtpRepo', 'did:plc:6rzq64c4u5swi25ppvq3ckav')
Added atproto-commit task projects/bridgy-federated/locations/us-central1/queues/atproto-commit/tasks/4402426836361459938 delay None 
@snarfed snarfed added the now label Sep 28, 2024
@snarfed
Copy link
Owner Author

snarfed commented Sep 29, 2024

Interesting plot twist, evidently the AppView itself currently isn't enforcing these limits, at least for images and maybe for video too. Here's where both have maxSize set on their blobs: app.bsky.embed.images#image (1000000), app.bsky.embed.video (50000000). And here's an example post with an image embed blob (getBlob) with size 1858016. The AppView allowed it and it's visible on the post, despite being over the limit.

@snarfed
Copy link
Owner Author

snarfed commented Sep 29, 2024

Asked on Discord, @ericvolp12 is looking into it.

snarfed added a commit to snarfed/lexrpc that referenced this issue Sep 29, 2024
…Size

hopefully we'll bring this back once we figure out why the AppView isn't currently enforcing this: snarfed/bridgy-fed#1348 (comment)

...and maybe also advocate for raising it from 1MiB :)
snarfed added a commit that referenced this issue Sep 29, 2024
@snarfed
Copy link
Owner Author

snarfed commented Sep 29, 2024

We're now doing this for videos but not images.

@snarfed snarfed changed the title ATProto: check blob size before storing ATProto: check blob size before storing [blocked on Bluesky] Sep 29, 2024
@snarfed snarfed added blocked and removed now labels Sep 29, 2024
@snarfed
Copy link
Owner Author

snarfed commented Sep 30, 2024

Asked about this in bluesky-social/atproto#2845

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant