unable to remove invalid block #1040

Closed
daltschu22 opened this issue Sep 30, 2024 · 5 comments
Labels
bug Something isn't working

daltschu22 commented Sep 30, 2024

Mountpoint for Amazon S3 version

1.8.0

AWS Region

us-east-1

Describe the running environment

Running on an EC2 instance with Rocky Linux 8.10

Runs as a systemd service

Mountpoint options

/usr/bin/mount-s3 --read-only --allow-other --file-mode 0555 --dir-mode 0555 --part-size 134217728 --metadata-ttl 300 --cache /opt/mountpoint/cache/<bucket name> --max-cache-size 1024 <bucket name> --prefix .fuse/references_nosymlinks/
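
Note on units: per Mountpoint's command-line help, --part-size is given in bytes and --max-cache-size in MiB, and --metadata-ttl is in seconds. The Rust sketch below converts the values above and compares the cache limit with the object named in the log output (3101804844 bytes). It assumes a fixed 1 MiB cache block size, inferred from the log (block_index=586 begins at byte 614465536 = 586 * 1048576) rather than taken from documentation, and it ignores per-block metadata, so the capacity figure is only an upper bound.

```rust
// Converts the mount options from this report into readable units.
// Assumption: the disk cache stores fixed 1 MiB blocks (inferred from the log).
const MIB: u64 = 1024 * 1024;

/// Upper bound on blocks that fit in a cache of `max_cache_mib` MiB
/// (ignores per-block metadata overhead).
fn cache_capacity_blocks(max_cache_mib: u64, block_size: u64) -> u64 {
    max_cache_mib * MIB / block_size
}

/// Number of blocks needed to cache an entire object (ceiling division).
fn blocks_for_object(object_size: u64, block_size: u64) -> u64 {
    object_size.div_ceil(block_size)
}

fn main() {
    let part_size_bytes: u64 = 134_217_728; // --part-size (bytes)
    let max_cache_mib: u64 = 1024; // --max-cache-size (MiB)
    let metadata_ttl_secs: u64 = 300; // --metadata-ttl (seconds)
    let object_size: u64 = 3_101_804_844; // object size reported in the log

    println!("part size:    {} MiB", part_size_bytes / MIB);
    println!("cache limit:  {} MiB", max_cache_mib);
    println!("metadata TTL: {} minutes", metadata_ttl_secs / 60);
    println!(
        "cache holds at most {} blocks; this object alone spans {} blocks",
        cache_capacity_blocks(max_cache_mib, MIB),
        blocks_for_object(object_size, MIB)
    );
}
```

With these inputs the sketch reports 128 MiB parts, a 5-minute metadata TTL, and at most 1024 cached blocks against 2959 blocks for the single object in the log, so that object alone is roughly 2.9 times the configured cache limit.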

What happened?

Mountpoint was running cleanly for months and then hard failed.

We had to run fusermount -zu before we could remount.

The logs below appeared for many different object keys before the crash, all in the same directory of the bucket.

Relevant log output

Sep 29 00:38:10 <host name> mount-s3[15471]: [WARN] mountpoint_s3::prefetch::caching_stream: error reading block from cache cache_key=ObjectId { inner: InnerObjectId { key: "<object key>", etag: ETag("\"51fbfbec40872e0057cd626920cb58e7-24\"") } } block_index=30 range=18874368..85983232 out of 3101804844 error=IoFailure(Error { kind: UnexpectedEof, message: "failed to fill whole buffer" })
Sep 29 00:38:10 <host name> mount-s3[15471]: [WARN] mountpoint_s3::data_cache::disk_data_cache: unable to remove invalid block: Os { code: 2, kind: NotFound, message: "No such file or directory" }
Sep 29 00:38:10 <host name> mount-s3[15471]: [WARN] mountpoint_s3::prefetch::caching_stream: error reading block from cache cache_key=ObjectId { inner: InnerObjectId { key: "<object key>", etag: ETag("\"51fbfbec40872e0057cd626920cb58e7-24\"") } } block_index=30 range=29360128..230686720 out of 3101804844 error=IoFailure(Error { kind: UnexpectedEof, message: "failed to fill whole buffer" })
Sep 29 00:38:27 <host name> mount-s3[15471]: [WARN] mountpoint_s3::data_cache::disk_data_cache: block could not be deserialized: Io(Error { kind: UnexpectedEof, message: "failed to fill whole buffer" })
Sep 29 00:38:27 <host name> mount-s3[15471]: [WARN] mountpoint_s3::prefetch::caching_stream: error reading block from cache cache_key=ObjectId { inner: InnerObjectId { key: "<object key>", etag: ETag("\"51fbfbec40872e0057cd626920cb58e7-24\"") } } block_index=586 range=614465536..2761949184 out of 3101804844 error=InvalidBlockContent
Sep 29 00:38:59 <host name> mount-s3[15471]: [WARN] mountpoint_s3::data_cache::disk_data_cache: unable to remove invalid block: Os { code: 2, kind: NotFound, message: "No such file or directory" }
Sep 29 00:39:15 <host name> systemd[1]: <service name>.service: Main process exited, code=killed, status=6/ABRT
Sep 29 00:39:15 <host name> systemd[1]: <service name>.service: Failed with result 'signal'.
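
The block_index and range fields in the warnings above can be cross-checked against each other. The sketch below, again assuming 1 MiB cache blocks, computes each logged block's byte range and checks that it falls inside the logged range, which appears to be the half-open byte range of the read request. The helper names are hypothetical and not part of Mountpoint.

```rust
// Cross-checks the block_index and range fields from the WARN lines above.
// Assumptions: cache blocks are a fixed 1 MiB, and `range=a..b` is the
// half-open byte range of the read request. Helper names are hypothetical.
const BLOCK_SIZE: u64 = 1024 * 1024;

/// Half-open byte range [start, end) covered by a block, clamped to the object size.
fn block_range(block_index: u64, object_size: u64) -> (u64, u64) {
    let start = block_index * BLOCK_SIZE;
    let end = (start + BLOCK_SIZE).min(object_size);
    (start, end)
}

/// True if the block's first byte lies inside the request range [req_start, req_end).
fn block_in_request(block_index: u64, req_start: u64, req_end: u64, object_size: u64) -> bool {
    let (start, _) = block_range(block_index, object_size);
    start >= req_start && start < req_end
}

fn main() {
    let object_size: u64 = 3_101_804_844;
    // (block_index, range start, range end) copied from the log lines above.
    let entries: [(u64, u64, u64); 3] = [
        (30, 18_874_368, 85_983_232),
        (30, 29_360_128, 230_686_720),
        (586, 614_465_536, 2_761_949_184),
    ];
    for (index, req_start, req_end) in entries {
        let (start, end) = block_range(index, object_size);
        let inside = block_in_request(index, req_start, req_end, object_size);
        println!("block {index}: bytes {start}..{end}, inside {req_start}..{req_end}: {inside}");
    }
}
```

Block 30 begins at byte 31457280 and block 586 at byte 614465536, so every logged block lies inside its logged range. This only shows that the fields are consistent with 1 MiB blocks; it says nothing about why the reads failed.
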
daltschu22 added the bug label Sep 30, 2024
daltschu22 changed the title from "Mountpoint crashed for unknown reasons" to "unable to remove invalid block" Sep 30, 2024
passaro (Contributor) commented Oct 1, 2024

Hi @daltschu22, the warnings indicate that Mountpoint cannot retrieve the data stored in the local cache. Could there be another process modifying or deleting files in the cache directory /opt/mountpoint/cache/<bucket name>? We recommend avoiding that since Mountpoint will automatically manage the files in the cache to respect the specified max-cache-size.

That said, it is not at all clear whether these issues are related to Mountpoint crashing. Are you able to reproduce the crash? If so, could you enable debug logging (--debug, see docs) and provide more details on what is happening before the crash?
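
For context on the two warnings: their error texts match what the Rust standard library produces when Read::read_exact reaches end of file before filling its buffer (kind UnexpectedEof) and when fs::remove_file targets a path that no longer exists (kind NotFound, OS error 2 on Linux). The standalone sketch below reproduces both with a hypothetical temporary file. It does not use Mountpoint's code or cache format, so it only shows what a truncated or already-deleted block file would look like, not what caused it here.

```rust
// Standalone sketch using only the Rust standard library; it does not use
// Mountpoint's cache format. The temporary file name is hypothetical.
use std::fs::{self, File};
use std::io::{self, Read, Write};
use std::path::Path;

/// Reads exactly `expected_len` bytes from `path`, failing with
/// ErrorKind::UnexpectedEof if the file is shorter than that.
fn read_block(path: &Path, expected_len: usize) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; expected_len];
    File::open(path)?.read_exact(&mut buf)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join(format!("demo-cache-block-{}", std::process::id()));

    // A 16-byte "block" read as if it should hold 64 bytes, i.e. a truncated file.
    File::create(&path)?.write_all(&[0u8; 16])?;
    let err = read_block(&path, 64).unwrap_err();
    println!("short read: kind={:?}, message={}", err.kind(), err);

    // Delete the file, then delete it again, as if another actor removed it first.
    fs::remove_file(&path)?;
    let err = fs::remove_file(&path).unwrap_err();
    println!("second remove: kind={:?}, os error={:?}", err.kind(), err.raw_os_error());
    Ok(())
}
```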

daltschu22 (Author) commented Oct 1, 2024

Thanks @passaro !

I can't see any reason why anything else would have been modifying data in that cache directory. The permissions don't allow normal users on the machine to modify anything in there.

Unfortunately, these machines are used by many scientists for various purposes, so it's hard to say specifically what the mount was being used for at the time of the crash. That said, we have five Mountpoint mounts on the system, and none of the others experienced any issues. We also didn't see any other indication of problems with the machine itself, only with the Mountpoint service for that bucket.

I will be happy to update if we do see the problem again, but I'm not sure we want to run in debug mode all the time until it happens.

I figured these logs wouldn't point to a smoking gun, but I wanted to document this either way. I'm happy to close this out and reopen it if the problem comes back; up to you.

Thank you!

passaro (Contributor) commented Oct 1, 2024

Before closing, can I ask if you already checked whether this is another occurrence of out-of-memory, like #674?

daltschu22 (Author) commented

We didn't see any out-of-memory events in our monitoring. The machine seemed fine otherwise.

passaro (Contributor) commented Oct 7, 2024

Thanks @daltschu22. I'll close this for now. But please do reopen if the issue occurs again and/or you have more information.

passaro closed this as completed Oct 7, 2024