
[Pinot 1.2.0] Batch upload for realtime table using Spark fails with error "Creation time must be set for uploaded realtime segment name generator" #14083

Open · ajeydudhe opened this issue Sep 25, 2024 · 1 comment

ajeydudhe commented Sep 25, 2024

Steps to reproduce

  • Create the schema for a realtime table and define the table config with full upsert enabled.
  • Use the attached job spec for the spark-submit command.
  • Note that there was an issue with using the HTTP endpoint to fetch the table config, since it seems to expect the config to be returned only for an OFFLINE table. Hence, the local file path is used for the realtime table config. This is a separate issue.
  • The following is the segmentNameGeneratorSpec used.
  • The input file name has the format: uploaded__myTable__0__20220101T0000Z__suffix
  • Tried using the type as both inputFile and uploadedRealtime:
  • If type = uploadedRealtime, it fails with the error "Creation time must be set for uploaded realtime segment name generator".
  • If type = inputFile and the generated segment has the same name format, the segment gets pushed, but the server fails to load it.
segmentNameGeneratorSpec:

  # type: Current supported type is 'simple' and 'normalizedDate'.
  type: uploadedRealtime
  #type: inputFile

  # configs: Configs to init SegmentNameGenerator.
  configs:
    #segment.name.prefix: 'uploaded__myTable__0__20220101T0000Z__suffix'
    #exclude.sequence.id: true
    # Below is for using file name as segment name
    file.path.pattern: '.+/(.+)\.json'
    segment.name.template: '${filePathPattern:\1}'
  • Please confirm which segmentNameGeneratorSpec.type should be used to generate segments from JSON files for a realtime table using Spark (a sketch of what I would expect follows below).
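
For reference, this is the shape of spec I would expect to work, assuming the uploadedRealtime generator needs a creation time and partition id supplied through configs. The keys segment.creation.time and segment.upload.partition.id below are assumptions for illustration only, not confirmed config names — the exact keys would need to be checked against the Pinot segment generation code, and missing plumbing for them may be exactly what triggers this error:

segmentNameGeneratorSpec:
  type: uploadedRealtime
  configs:
    # Assumed key names (hypothetical, not verified against the ingestion task runner).
    # Creation time as epoch millis; 1640995200000 corresponds to 20220101T0000Z.
    segment.creation.time: '1640995200000'
    # Partition id matching the '0' in uploaded__myTable__0__20220101T0000Z__suffix.
    segment.upload.partition.id: '0'

If keys like these are not recognized by the ingestion job, that would explain the behavior: the generator is constructed without a creation time and the precondition fails.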

sparkIngestionJobSpec_myTable.yaml.txt

Jackie-Jiang (Contributor) commented:

@rohityadav1993 Can you help take a look? This is related to changes introduced in #13107
