Write metadata cache data to mappings _meta with refresh time update #805

Open: wants to merge 24 commits into base: main

@seankao-az (Collaborator) commented Oct 24, 2024

Description

Metadata Cache Writer

For the most part, same as

In addition to the regular metadata storage through FlintIndexMetadataService, we dual-write additional fields, defined by FlintMetadataCache, to the index mappings _meta field. This lets frontend users read crucial metadata for an index quickly, without invoking another backend API call.
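To make the dual-write concrete, here is a minimal Java sketch under loud assumptions: two in-memory maps stand in for the FlintIndexMetadataService storage and for each index's mappings _meta, and the names DualWriteSketch and updateMetadata are hypothetical, not Flint's API. It only shows the intended behavior: the regular metadata write always happens, and the cache fields go to _meta only when spark.flint.metadataCacheWrite.enabled is true. Treating a missing flag as disabled is an assumption of the sketch, not a documented default.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the dual-write; in-memory maps stand in for both stores.
final class DualWriteSketch {
  static final String CACHE_WRITE_ENABLED = "spark.flint.metadataCacheWrite.enabled";

  // Stand-in for the regular metadata storage (FlintIndexMetadataService in the PR).
  final Map<String, Map<String, Object>> metadataStore = new HashMap<>();
  // Stand-in for each index's mappings _meta.properties.
  final Map<String, Map<String, Object>> mappingsMeta = new HashMap<>();

  private final Map<String, String> sparkConf;

  DualWriteSketch(Map<String, String> sparkConf) {
    this.sparkConf = sparkConf;
  }

  // Assumption: a missing flag is treated as disabled.
  boolean cacheWriteEnabled() {
    return Boolean.parseBoolean(sparkConf.getOrDefault(CACHE_WRITE_ENABLED, "false"));
  }

  void updateMetadata(String indexName, Map<String, Object> metadata,
                      Map<String, Object> cacheFields) {
    // The regular metadata write always happens.
    metadataStore.put(indexName, new HashMap<>(metadata));
    // The cache fields are additionally written to _meta only when enabled.
    if (cacheWriteEnabled()) {
      mappingsMeta.put(indexName, new HashMap<>(cacheFields));
    }
  }
}
```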

This PR adds the following fields to all indexes when the Spark config spark.flint.metadataCacheWrite.enabled is set to true:

  • _meta.properties.metadataCacheVersion: "1.0"
  • _meta.properties.refreshInterval: Integer. Refresh interval of the index, in seconds. Added only if the index uses auto refresh and refresh_interval is set.
  • _meta.properties.sourceTables: Array of Strings. Currently mock data; real values will come in a later PR.
  • _meta.properties.lastRefreshTime: Long. Timestamp, in milliseconds, of the last refresh. Added only if the index has been refreshed at least once.
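A minimal sketch of how the conditional fields above could be assembled. It assumes a hypothetical RefreshType enum and uses null inputs to mean "refresh_interval not set" and "never refreshed"; buildProperties is illustrative only and is not the FlintMetadataCache API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of building _meta.properties for the metadata cache.
final class MetadataCacheFieldsSketch {
  // Assumed stand-in for an index's refresh type.
  enum RefreshType { AUTO, INCREMENTAL, FULL }

  static Map<String, Object> buildProperties(
      RefreshType refreshType,
      Integer refreshIntervalSeconds, // null if refresh_interval is not set
      List<String> sourceTables,
      Long lastRefreshTimeMillis) {   // null if the index was never refreshed
    Map<String, Object> props = new LinkedHashMap<>();
    props.put("metadataCacheVersion", "1.0");
    // Only for auto refresh with refresh_interval set.
    if (refreshType == RefreshType.AUTO && refreshIntervalSeconds != null) {
      props.put("refreshInterval", refreshIntervalSeconds);
    }
    props.put("sourceTables", sourceTables);
    // Only once the index has been refreshed at least once.
    if (lastRefreshTimeMillis != null) {
      props.put("lastRefreshTime", lastRefreshTimeMillis);
    }
    return props;
  }
}
```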

Last Refresh Time

Added two new fields to FlintMetadataLogEntry and bumped the version of its JSON document from 1.0 to 1.1 (new fields are added; existing fields are unchanged):

  • lastRefreshStartTime: Long. Timestamp when last refresh started
  • lastRefreshCompleteTime: Long. Timestamp when last refresh completed
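Because 1.1 only adds fields, a reader can still accept 1.0 documents by defaulting the missing values. The sketch below shows that idea with a Map standing in for the parsed JSON document; the class name LogEntrySketch, the flat field layout, and the default of 0 for missing timestamps are assumptions, not the actual FlintMetadataLogEntry serialization.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a Map stands in for the parsed JSON log entry document.
final class LogEntrySketch {
  static final String CURRENT_VERSION = "1.1";

  final long lastRefreshStartTime;
  final long lastRefreshCompleteTime;

  LogEntrySketch(long lastRefreshStartTime, long lastRefreshCompleteTime) {
    this.lastRefreshStartTime = lastRefreshStartTime;
    this.lastRefreshCompleteTime = lastRefreshCompleteTime;
  }

  // Writers always emit the current version together with the new fields.
  Map<String, Object> toDoc() {
    Map<String, Object> doc = new HashMap<>();
    doc.put("version", CURRENT_VERSION);
    doc.put("lastRefreshStartTime", lastRefreshStartTime);
    doc.put("lastRefreshCompleteTime", lastRefreshCompleteTime);
    return doc;
  }

  // Readers also accept 1.0 docs: missing timestamps default to 0 (assumed sentinel).
  // Number is used because a JSON parser may return either Integer or Long.
  static LogEntrySketch fromDoc(Map<String, Object> doc) {
    long start = ((Number) doc.getOrDefault("lastRefreshStartTime", 0L)).longValue();
    long complete = ((Number) doc.getOrDefault("lastRefreshCompleteTime", 0L)).longValue();
    return new LogEntrySketch(start, complete);
  }
}
```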

lastRefreshStartTime and lastRefreshCompleteTime are accurate only for manual refresh (full and incremental) and for auto refresh with the external scheduler.
With the internal scheduler, jobStartTime (createTime in FlintMetadataLogEntry) tracks the streaming job's start time instead.

I'm not reusing createTime because the two are updated at different times.
createTime (internal scheduler) is updated in refreshIndex, recoverIndex, and updateIndexManualToAuto,
while lastRefreshStartTime and lastRefreshCompleteTime (manual refresh and external scheduler) are updated only in refreshIndex.
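The update rules above can be summarized as a small lookup. In this hypothetical sketch, Operation names the three methods mentioned in the PR, Mode is an assumed enum for the refresh setup, and fieldsUpdated returns which timestamps an operation would touch. It encodes the rules as described here, not the actual FlintSpark code path.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical encoding of which timestamps each operation updates.
final class RefreshTimestampRulesSketch {
  enum Operation { REFRESH_INDEX, RECOVER_INDEX, UPDATE_INDEX_MANUAL_TO_AUTO }
  enum Mode { MANUAL, AUTO_INTERNAL_SCHEDULER, AUTO_EXTERNAL_SCHEDULER }
  enum Field { CREATE_TIME, LAST_REFRESH_START_TIME, LAST_REFRESH_COMPLETE_TIME }

  static Set<Field> fieldsUpdated(Operation op, Mode mode) {
    Set<Field> fields = EnumSet.noneOf(Field.class);
    if (mode == Mode.AUTO_INTERNAL_SCHEDULER) {
      // Internal scheduler: createTime tracks the streaming job start,
      // and all three operations update it.
      fields.add(Field.CREATE_TIME);
    } else if (op == Operation.REFRESH_INDEX) {
      // Manual refresh and external scheduler: only refreshIndex records refresh times.
      fields.add(Field.LAST_REFRESH_START_TIME);
      fields.add(Field.LAST_REFRESH_COMPLETE_TIME);
    }
    return fields;
  }
}
```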

Related Issues

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on the Developer Certificate of Origin and signing off your commits, please check here.

…rch-project#744)

* write mock metadata cache data to mappings _meta
* Enable write to cache by default
* bugfix: _meta.latestId missing when create index
* set and unset config in test suite
* fix: use member flintSparkConf

Signed-off-by: Sean Kao <seankao@amazon.com>
@seankao-az self-assigned this Oct 24, 2024
@seankao-az added the enhancement label Oct 24, 2024
@seankao-az (Collaborator, Author):

Added the label to backport this to the nexus branch.
To be clear, it shouldn't be backported to 0.5;
the "0.5-" part of the branch name 0.5-nexus is obsolete.

* Handles refresh for refresh mode AUTO, which is used exclusively by auto refresh index with
* internal scheduler.
*/
private def refreshIndexAuto(

Collaborator:

Why don't we update for auto refresh?

* @throws IllegalArgumentException if the schedule string is invalid
*/
public static IntervalSchedule parse(String scheduleStr) {
public static Long parseMillis(String scheduleStr) {

Collaborator:

parseMillis sounds like it parses a millisecond string. Should we call it parseAndConvertToMillis instead?
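For illustration, a minimal sketch of what a parseAndConvertToMillis helper might look like. It assumes a schedule string of the form "<integer> <unit>" (for example "10 minutes"), which may not match the exact grammar IntervalSchedule accepts; the class name ScheduleParserSketch and the supported unit spellings are hypothetical. It keeps the documented contract of throwing IllegalArgumentException for invalid input.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the accepted format is an assumption.
final class ScheduleParserSketch {
  // Assumed format: "<non-negative integer> <unit>", e.g. "10 minutes" or "1 hour".
  private static final Pattern SCHEDULE = Pattern.compile("^\\s*(\\d+)\\s*([a-zA-Z]+)\\s*$");

  /**
   * @throws IllegalArgumentException if the schedule string is invalid
   */
  static long parseAndConvertToMillis(String scheduleStr) {
    if (scheduleStr == null) {
      throw new IllegalArgumentException("Schedule string must not be null");
    }
    Matcher m = SCHEDULE.matcher(scheduleStr);
    if (!m.matches()) {
      throw new IllegalArgumentException("Invalid schedule string: " + scheduleStr);
    }
    // NumberFormatException (a subclass of IllegalArgumentException) on overflow.
    long amount = Long.parseLong(m.group(1));
    TimeUnit unit = toTimeUnit(m.group(2).toLowerCase(Locale.ROOT));
    return unit.toMillis(amount);
  }

  private static TimeUnit toTimeUnit(String unit) {
    switch (unit) {
      case "s": case "sec": case "second": case "seconds": return TimeUnit.SECONDS;
      case "m": case "min": case "minute": case "minutes": return TimeUnit.MINUTES;
      case "h": case "hour": case "hours": return TimeUnit.HOURS;
      case "d": case "day": case "days": return TimeUnit.DAYS;
      default: throw new IllegalArgumentException("Unsupported time unit: " + unit);
    }
  }
}
```

Returning a primitive long rather than the Long in the diff is a choice made for this sketch: invalid input throws, so there is no need for a nullable result.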
