Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalidate ORC metadata cache based on file modification time #24346

Merged

Conversation

nmahadevuni
Copy link
Member

@nmahadevuni nmahadevuni commented Jan 10, 2025

Description

Supports invalidation of ORC metadata cache entries based on the file modification time.

Motivation and Context

For traditional Hive ORC tables, after Presto reads and caches ORC metadata, if other tools modify the ORC files in-place. Presto would try to read old offsets in the new file resulting in errors like "Malformed ORC file..". This PR implements support to invalidate the cache entries based on the file modified timestamp.

Impact

No impact.

Test Plan

Added tests to AbstractTestOrcReader.java.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Add support for ORC metadata cache invalidation based on file modification time. :pr:`24346`

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Jan 10, 2025
@prestodb-ci prestodb-ci requested review from a team, czentgr and aditi-pandit and removed request for a team January 10, 2025 13:38
@nmahadevuni nmahadevuni changed the title [WIP]: Invalidate ORC metadata cache based on file modification time Invalidate ORC metadata cache based on file modification time Jan 10, 2025
Copy link
Contributor

@ZacBlanco ZacBlanco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One concern I have with this solution is that the file modification times are also cached by the directory lister implementation. If the caching directory lister is enabled, then we won't get updated file modification times, and this solution won't solve the original issue.

I traced the call paths to where we populate the fileLastModified time and it comes from the result of directoryLister.list inside of StoragePartitionLoader.java. If the listing result is cached, we won't get the updated file modification time. "Fixing" the issue would mean bypassing the listing cache which would also have a pretty significant impact on performance. It might be easier to just provide a procedure to invalidate the cache - either through a JMX procedure or SQL procedure.

@steveburnett
Copy link
Contributor

Please add the PR number to the release note entry, and nit rephrasing following the Order of changes in the Release Notes Guidelines.

== RELEASE NOTES ==

General Changes
* Add support for ORC metadata cache invalidation based on file modification time. :pr:`24346`

@nmahadevuni
Copy link
Member Author

One concern I have with this solution is that the file modification times are also cached by the directory lister implementation. If the caching directory lister is enabled, then we won't get updated file modification times, and this solution won't solve the original issue.

I traced the call paths to where we populate the fileLastModified time and it comes from the result of directoryLister.list inside of StoragePartitionLoader.java. If the listing result is cached, we won't get the updated file modification time. "Fixing" the issue would mean bypassing the listing cache which would also have a pretty significant impact on performance. It might be easier to just provide a procedure to invalidate the cache - either through a JMX procedure or SQL procedure.

Agree. All caches which do time based invalidation will come with the caveat that file list caching may interfere with its functioning. There is a stored procedure invalidate_directory_list_cache() which users can use to flush the file list cache.

@nmahadevuni nmahadevuni force-pushed the invalidate_orc_metadata_cache branch from 290adc3 to 3138555 Compare January 13, 2025 09:15
@nmahadevuni
Copy link
Member Author

@ZacBlanco Addressed the comments. Please review. @agrawalreetika @pratyakshsharma can you please have a look?

Copy link
Member

@agrawalreetika agrawalreetika left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the changes. LGTM

@nmahadevuni
Copy link
Member Author

@hantangwangd Can you please review this?

Copy link
Member

@hantangwangd hantangwangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the change, overall looks good to me, this can alleviate a significant portion of the problems caused by cache stale.

Bring one question for discussing: when we enable the cached directoryLister and a new file is added into any partition of a table through an external application without modify existing ones, do we still unable to recognize it?

return delegate.getStripeFooterSlice(orcDataSource, stripeId, footerOffset, footerLength, cacheable);
CacheableSlice cacheableSlice = footerSliceCache.getIfPresent(stripeId);
if (cacheableSlice != null) {
if (cacheableSlice.getFileModificationTime() == fileModificationTime) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little question, what is more appropriate when cacheableSlice.getFileModificationTime() > fileModificationTime? Is it intentional to invalidate and refill the cache in such scenario?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are assuming that new fileModificationTime will always be same or more recent than cacheableSlice.getFileModificationTime().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's ok.

@nmahadevuni
Copy link
Member Author

Thanks for the change, overall looks good to me, this can alleviate a significant portion of the problems caused by cache stale.

Bring one question for discussing: when we enable the cached directoryLister and a new file is added into any partition of a table through an external application without modify existing ones, do we still unable to recognize it?

Right, it will not recognize the new file because the CachingDirectoryLister will have the stale file list.

@hantangwangd
Copy link
Member

Right, it will not recognize the new file because the CachingDirectoryLister will have the stale file list.

Thanks, glad to confirm this.

@nmahadevuni
Copy link
Member Author

@hantangwangd @ZacBlanco Can one of you merge this please? I see the merge button is disabled for me even after approvals.

@ZacBlanco
Copy link
Contributor

We need a committer approval - @tdcmeehan could you help?

@tdcmeehan tdcmeehan merged commit cf19b92 into prestodb:master Jan 23, 2025
52 checks passed
@shangm2
Copy link
Contributor

shangm2 commented Jan 23, 2025

Hey guys. We are seeing the following errors due to the conflict between this pr and #24351. @nmahadevuni @tdcmeehan

org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.8.0:compile (default-cli) on project presto-orc: Compilation failure /Users/shangma/workspace/presto/presto-orc/src/main/java/com/facebook/presto/orc/StripeReader.java:[260,75] method getRowIndexes in interface com.facebook.presto.orc.StripeMetadataSource cannot be applied to given types; required: com.facebook.presto.orc.metadata.MetadataReader,com.facebook.presto.orc.metadata.PostScript.HiveWriterVersion,com.facebook.presto.orc.StripeReader.StripeId,com.facebook.presto.orc.StreamId,com.facebook.presto.orc.stream.OrcInputStream,java.util.List<com.facebook.presto.orc.metadata.statistics.HiveBloomFilter>,com.facebook.presto.common.RuntimeStats,long found: com.facebook.presto.orc.metadata.MetadataReader,com.facebook.presto.orc.metadata.PostScript.HiveWriterVersion,com.facebook.presto.orc.StripeReader.StripeId,com.facebook.presto.orc.StreamId,com.facebook.presto.orc.stream.OrcInputStream,<nulltype>,com.facebook.presto.common.RuntimeStats reason: actual and formal argument lists differ in length

@tdcmeehan
Copy link
Contributor

The trivial fix is in #24421

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
from:IBM PR from IBM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants