@xushiyan
Member

Change Logs

Update dbt example with more detailed instructions.

Impact

Improve dbt example for learning.

Risk level

None.

Documentation Update

NA

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@hudi-bot
Collaborator

CI report:

Bot commands

@hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@xushiyan xushiyan merged commit 18f7181 into apache:master Nov 22, 2023
@xushiyan xushiyan deleted the HUDI-7133-dbt-example branch November 22, 2023 08:00
@xushiyan xushiyan linked an issue Nov 22, 2023 that may be closed by this pull request
jonvex pushed a commit to jonvex/hudi that referenced this pull request Nov 29, 2023
commit dfa3bde
Merge: bfc0a85 473cf9a
Author: Jonathan Vexler <=>
Date:   Wed Nov 29 15:01:45 2023 -0500

    Merge branch 'master' into fg_reader_implement_bootstrap

commit bfc0a85
Author: Jonathan Vexler <=>
Date:   Wed Nov 29 14:55:57 2023 -0500

    fix bug with nested required fields due to spark nested schema pruning bug

commit 473cf9a
Author: Rajesh Mahindra <76502047+rmahindra123@users.noreply.github.com>
Date:   Wed Nov 29 08:37:40 2023 -0800

    [HUDI-7138] Fix error table writer and schema registry provider (apache#10173)

    ---------

    Co-authored-by: rmahindra123 <rmahindra@Rajeshs-MacBook-Pro.local>

commit 91eabab
Author: Lin Liu <141371752+linliu-code@users.noreply.github.com>
Date:   Tue Nov 28 23:49:37 2023 -0800

    [HUDI-7103] Support time travel queries for COW tables (apache#10109)

    This is based on HadoopFsRelation.

commit b300728
Author: Rajesh Mahindra <76502047+rmahindra123@users.noreply.github.com>
Date:   Tue Nov 28 22:31:12 2023 -0800

    [HUDI-7086] Fix the default for gcp pub sub max sync time to 1min (apache#10171)

    Co-authored-by: rmahindra123 <rmahindra@Rajeshs-MacBook-Pro.local>

commit 8370c62
Author: Shiyan Xu <2701446+xushiyan@users.noreply.github.com>
Date:   Tue Nov 28 22:31:34 2023 -0600

    [HUDI-7149] Add a dbt example project with CDC capability (apache#10192)

commit 817d81a
Author: zhuanshenbsj1 <34104400+zhuanshenbsj1@users.noreply.github.com>
Date:   Wed Nov 29 11:46:20 2023 +0800

    [MINOR] Add log to print wrong number of instant metadata files (apache#10196)

commit cadeade
Author: leixin <1403342953@qq.com>
Date:   Wed Nov 29 11:45:24 2023 +0800

    [minor] when metric prefix length is 0 ignore the metric prefix (apache#10190)

    Co-authored-by: leixin1 <leixin1@jd.com>

commit 91daa7d
Author: Lin Liu <141371752+linliu-code@users.noreply.github.com>
Date:   Tue Nov 28 19:03:50 2023 -0800

    [HUDI-7102] Fix bugs related to time travel queries (apache#10102)

commit d1dfa5b
Author: Dongsj <90449228+eric9204@users.noreply.github.com>
Date:   Wed Nov 29 10:49:38 2023 +0800

    [HUDI-7148] Add an additional fix to the potential thread insecurity problem of heartbeat client (apache#10188)

    Co-authored-by: dongsj <dongsj@asiainfo.com>

commit b0b711e
Author: Jonathan Vexler <=>
Date:   Tue Nov 28 21:35:20 2023 -0500

    nested schema kinda fix

commit 77cfb3a
Author: YueZhang <69956021+zhangyue19921010@users.noreply.github.com>
Date:   Wed Nov 29 09:46:53 2023 +0800

    [HUDI-7147] Fix CDC write flush bug (apache#10186)

    * Using iterator instead of values to avoid unsupported operation exception

    * check style

commit b144ee0
Author: Jon Vexler <jbvexler@gmail.com>
Date:   Tue Nov 28 14:23:46 2023 -0500

    Update hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/HoodieFileGroupReaderBasedParquetFileFormat.scala

    Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>

commit 89fab14
Author: Jonathan Vexler <=>
Date:   Tue Nov 28 14:23:03 2023 -0500

    fix failing tests and address some of sagar pr review

commit 675abf1
Author: Tim Brown <tim@onehouse.ai>
Date:   Mon Nov 27 23:21:56 2023 -0600

    [MINOR] Schema Converter should use default identity transform if not specified (apache#10178)

commit 5450aff
Author: Jonathan Vexler <=>
Date:   Mon Nov 27 22:21:06 2023 -0500

    disable vector for bootstrap

commit fb062df
Author: Danny Chan <yuzhao.cyz@gmail.com>
Date:   Tue Nov 28 10:52:33 2023 +0800

    [Minor] Fix the flaky tests in TestRemoteHoodieTableFileSystemView (apache#10179)

commit 3ae4d30
Author: Jonathan Vexler <=>
Date:   Mon Nov 27 21:07:17 2023 -0500

    fix various issues that caused failing tests

commit a045da6
Author: Jonathan Vexler <=>
Date:   Mon Nov 27 18:00:46 2023 -0500

    see if this works

commit 91be81a
Author: Jonathan Vexler <=>
Date:   Mon Nov 27 17:07:30 2023 -0500

    use java to create unary operator

commit c22d1db
Merge: 38b2603 4c3a1db
Author: Jonathan Vexler <=>
Date:   Mon Nov 27 15:56:39 2023 -0500

    Merge branch 'master' into fg_reader_implement_bootstrap

commit 38b2603
Author: Jonathan Vexler <=>
Date:   Mon Nov 27 15:42:22 2023 -0500

    set precombine in test

commit 2a9a363
Author: Jonathan Vexler <=>
Date:   Mon Nov 27 13:27:38 2023 -0500

    try to fix scala2.11 unary operator issue

commit 60bdf14
Author: Jonathan Vexler <=>
Date:   Mon Nov 27 13:02:16 2023 -0500

    try fix ci

commit 4c3a1db
Author: majian <47964462+majian1998@users.noreply.github.com>
Date:   Mon Nov 27 16:44:25 2023 +0800

    [HUDI-7110][FOLLOW-UP] Improve call procedure for show column stats information (apache#10169)

commit 499423c
Author: zhuanshenbsj1 <34104400+zhuanshenbsj1@users.noreply.github.com>
Date:   Sun Nov 26 10:13:46 2023 +0800

    [HUDI-7041] Optimize the memory usage of timeline server for table service (apache#10002)

commit 4f875ed
Author: Y Ethan Guo <ethan.guoyihua@gmail.com>
Date:   Sat Nov 25 15:10:37 2023 -0800

    [HUDI-7139] Fix operation type for bulk insert with row writer in Hudi Streamer (apache#10175)

    This commit fixes the bug which causes the `operationType` to be null in the commit metadata of bulk insert operation with row writer enabled in Hudi Streamer (`hoodie.datasource.write.row.writer.enable=true`).  `HoodieStreamerDatasetBulkInsertCommitActionExecutor` is updated so that `#preExecute` and `#afterExecute` should run the same logic as regular bulk insert operation without row writer.

commit 332e7e8
Author: harshal <harshal.j.patil@gmail.com>
Date:   Sat Nov 25 14:04:29 2023 +0530

    [HUDI-7006] Reduce unnecessary is_empty rdd calls in StreamSync (apache#10158)

    ---------

    Co-authored-by: sivabalan <n.siva.b@gmail.com>

commit 86232d2
Author: Sivabalan Narayanan <n.siva.b@gmail.com>
Date:   Thu Nov 23 19:27:50 2023 -0800

    [HUDI-7095] Making perf enhancements to JSON serde (apache#10097)

commit a7fd27c
Author: Sivabalan Narayanan <n.siva.b@gmail.com>
Date:   Thu Nov 23 19:20:01 2023 -0800

    [HUDI-7086] Scaling gcs event source (apache#10073)

    -  Scaling gcs event source

    ---------

    Co-authored-by: rmahindra123 <rmahindra@Rajeshs-MacBook-Pro.local>

commit bb42c4b
Author: Sivabalan Narayanan <n.siva.b@gmail.com>
Date:   Thu Nov 23 18:33:32 2023 -0800

    [HUDI-7097] Fix instantiation of Hms Uri with HiveSync tool (apache#10099)

commit 0b7f47a
Author: Jonathan Vexler <=>
Date:   Thu Nov 23 16:27:36 2023 -0500

    decently working

commit bcb974b
Author: VitoMakarevich <vitaliy.makarevich.work@gmail.com>
Date:   Thu Nov 23 11:22:14 2023 +0100

    [HUDI-7034] Fix refresh table/view (apache#10151)

    * [HUDI-7034] Refresh index fix - remove cached file slices within partitions

    ---------

    Co-authored-by: vmakarevich <vitali.makarevich@instructure.com>
    Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>

commit b77eff2
Author: Lokesh Jain <ljain@apache.org>
Date:   Thu Nov 23 10:47:40 2023 +0530

    [HUDI-7120] Performance improvements in deltastreamer executor code path (apache#10135)

commit 405be17
Author: Sivabalan Narayanan <n.siva.b@gmail.com>
Date:   Wed Nov 22 21:00:33 2023 -0800

    [MINOR] Making misc fixes to deltastreamer sources(S3 and GCS) (apache#10095)

    * Making misc fixes to deltastreamer sources

    * Fixing test failures

    * adding inference to CloudSourceconfig... cloud.data.datafile.format

    * Fix the tests for s3 events source

    * Fix the tests for s3 events source

    ---------

    Co-authored-by: rmahindra123 <rmahindra@Rajeshs-MacBook-Pro.local>

commit 3d21285
Author: Tim Brown <tim@onehouse.ai>
Date:   Wed Nov 22 22:51:14 2023 -0600

    [HUDI-7112] Reuse existing timeline server and performance improvements (apache#10122)

    - Reuse timeline server across tables.

    ---------

    Co-authored-by: sivabalan <n.siva.b@gmail.com>

commit 72ff9a7
Author: Rajesh Mahindra <76502047+rmahindra123@users.noreply.github.com>
Date:   Wed Nov 22 20:49:15 2023 -0800

    [HUDI-7052] Fix partition key validation for custom key generators. (apache#10014)

    ---------

    Co-authored-by: rmahindra123 <rmahindra@Rajeshs-MacBook-Pro.local>

commit 8d6d043
Author: majian <47964462+majian1998@users.noreply.github.com>
Date:   Thu Nov 23 10:08:17 2023 +0800

    [HUDI-7110] Add call procedure for show column stats information (apache#10120)

commit aabaa99
Author: huangxiaoping <1754789345@qq.com>
Date:   Thu Nov 23 09:06:45 2023 +0800

    [MINOR] Remove unused import (apache#10159)

commit f88a73f
Author: Y Ethan Guo <ethan.guoyihua@gmail.com>
Date:   Wed Nov 22 10:48:48 2023 -0800

    [HUDI-7123] Improve CI scripts (apache#10136)

    Improves the CI scripts in the following aspects:
    - Removes `hudi-common` tests from `test-spark` job in GH CI as they are already covered by Azure CI
    - Removes unnecesary bundle validation jobs and adds new bundle validation images (`flink1153hive313spark323`, `flink1162hive313spark331`)
    - Updates `validate-release-candidate-bundles` jobs
    - Moves functional tests of `hudi-spark-datasource/hudi-spark` from job 4 (3 hours) to job 2 (1 hour) in Azure CI to rebalance the finish time.

commit 38c87b7
Author: harshal <harshal.j.patil@gmail.com>
Date:   Wed Nov 22 20:53:42 2023 +0530

    [HUDI-7004] Add support of snapshotLoadQuerySplitter in s3/gcs sources (apache#10152)

commit d0edfb5
Author: Sivabalan Narayanan <n.siva.b@gmail.com>
Date:   Wed Nov 22 10:22:53 2023 -0500

    [HUDI-6961] Fixing DefaultHoodieRecordPayload to honor deletion based on meta field as well as custom delete marker (apache#10150)

    - Fixing DefaultHoodieRecordPayload to honor deletion based on meta field as well as custom delete marker across all delete apis

commit cda9dbc
Author: Jing Zhang <beyond1920@gmail.com>
Date:   Wed Nov 22 18:04:39 2023 +0800

    [HUDI-7129] Fix bug when upgrade from table version three using UpgradeOrDowngradeProcedure (apache#10147)

commit 18f7181
Author: Shiyan Xu <2701446+xushiyan@users.noreply.github.com>
Date:   Wed Nov 22 02:00:27 2023 -0600

    [HUDI-7133] Improve dbt example for better guidance (apache#10155)

commit c5af85d
Author: Sivabalan Narayanan <n.siva.b@gmail.com>
Date:   Wed Nov 22 01:33:49 2023 -0500

    [HUDI-7096] Improving incremental query to fetch partitions based on commit metadata (apache#10098)

commit 2522f6d
Author: xuzifu666 <xuyu@zepp.com>
Date:   Wed Nov 22 11:53:21 2023 +0800

    [HUDI-7128] DeleteMarkerProcedures support delete in batch mode (apache#10148)

    Co-authored-by: xuyu <11161569@vivo.com>

commit a1afcdd
Author: Tim Brown <tim@onehouse.ai>
Date:   Tue Nov 21 14:58:12 2023 -0600

    [HUDI-7115] Add in new options for the bigquery sync (apache#10125)

    - Add in new options for the bigquery sync

commit 35cd873
Author: Sivabalan Narayanan <n.siva.b@gmail.com>
Date:   Tue Nov 21 13:11:21 2023 -0500

    [HUDI-7084] Fixing schema retrieval for table w/ no commits (apache#10069)

    * Fixing schema retrieval for table w/ no commits

    * fixing compilation failure

commit 74793d5
Author: Rajesh Mahindra <76502047+rmahindra123@users.noreply.github.com>
Date:   Tue Nov 21 09:53:12 2023 -0800

    [HUDI-7106] Fix sqs deletes, deltasync service close and error table default configs. (apache#10117)

    Co-authored-by: rmahindra123 <rmahindra@Rajeshs-MacBook-Pro.local>

commit b981877
Author: harshal <harshal.j.patil@gmail.com>
Date:   Tue Nov 21 22:52:28 2023 +0530

    [HUDI-7003] Add option to fallback to full table scan if files are deleted due to cleaner (apache#9941)

commit 600fd4d
Author: Akira Ajisaka <akiraaj@amazon.com>
Date:   Wed Nov 22 01:24:37 2023 +0900

    [HUDI-6734] Add back HUDI-5409: Avoid file index and use fs view cache in COW input format (apache#9567)

    * [HUDI-6734] Add back HUDI-5409: Avoid file index and use fs view cache in COW input format

    This reverts commit 2567ada.

     Conflicts:
    	hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/HoodieCopyOnWriteTableInputFormat.java
    	hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieMergeOnReadTableInputFormat.java

    * Always use file index if files partition is available

    ---------

    Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>

commit 9e2500c
Author: Sivabalan Narayanan <n.siva.b@gmail.com>
Date:   Tue Nov 21 09:55:23 2023 -0500

    [HUDI-7083] Adding support for multiple tables with Prometheus Reporter (apache#10068)

    * Adding support for multiple tables with Prometheus Reporter

    * Fixing closure of http server

    * Remove entry from port-collector registry map after stopping http server

    ---------

    Co-authored-by: Sagar Sumit <sagarsumit09@gmail.com>

commit baffe1d
Author: Sivabalan Narayanan <n.siva.b@gmail.com>
Date:   Tue Nov 21 09:32:39 2023 -0500

    [MINOR] Misc fixes in deltastreamer (apache#10067)

commit 0c4f3a3
Author: Sivabalan Narayanan <n.siva.b@gmail.com>
Date:   Tue Nov 21 02:17:13 2023 -0500

    [HUDI-7127] Fixing set up and tear down in tests (apache#10146)

commit eaba114
Author: Akira Ajisaka <akiraaj@amazon.com>
Date:   Tue Nov 21 11:37:47 2023 +0900

    [HUDI-7107] Reused MetricsReporter fails to publish metrics in Spark streaming job (apache#10132)

commit 578e756
Author: Jing Zhang <beyond1920@gmail.com>
Date:   Tue Nov 21 10:04:33 2023 +0800

    [HUDI-7118] Set conf 'spark.sql.parquet.enableVectorizedReader' to true automatically only if the value is not explicitly set (apache#10134)

commit d24220a
Author: Jing Zhang <beyond1920@gmail.com>
Date:   Tue Nov 21 09:56:07 2023 +0800

    [HUDI-7111] Fix performance regression of tag when written into simple bucket index table (apache#10130)

commit 84990ae
Author: Rajesh Mahindra <76502047+rmahindra123@users.noreply.github.com>
Date:   Mon Nov 20 11:17:45 2023 -0800

    Fix schema refresh for KafkaAvroSchemaDeserializer (apache#10118)

    Co-authored-by: rmahindra123 <rmahindra@Rajeshs-MacBook-Pro.local>

commit 979132b
Author: majian <47964462+majian1998@users.noreply.github.com>
Date:   Mon Nov 20 10:43:11 2023 +0800

    [HUDI-7099] Providing metrics for archive and defining some string constants (apache#10101)

commit 3225625
Author: Fabio Buso <dev.siroibaf@gmail.com>
Date:   Mon Nov 20 03:19:41 2023 +0100

    [MINOR] Add Hopsworks File System to StorageSchemes (apache#10141)

commit 3913dca
Author: Sivabalan Narayanan <n.siva.b@gmail.com>
Date:   Sat Nov 18 23:50:37 2023 -0500

    [HUDI-7098] Add max bytes per partition with cloud stores source in DS (apache#10100)

commit 4c295b2
Author: hehuiyuan <471627698@qq.com>
Date:   Sun Nov 19 09:43:52 2023 +0800

    [HUDI-7119] Don't write precombine field to hoodie.properties when the ts field does not exist for append mode (apache#10133)

commit b2f4493
Author: Jing Zhang <beyond1920@gmail.com>
Date:   Sun Nov 19 09:35:54 2023 +0800

    [HUDI-7072] Remove support for Flink 1.13 (apache#10052)

commit dfe1674
Author: Sagar Lakshmipathy <18vidhyasagar@gmail.com>
Date:   Fri Nov 17 18:43:07 2023 -0800

    [Minor] Fixed twitter link to redirect to twitter (apache#10139)

commit f58d9cb
Author: Jonathan Vexler <=>
Date:   Fri Nov 17 18:10:00 2023 -0500

    current point

commit 184858b
Author: Jonathan Vexler <=>
Date:   Fri Nov 17 16:21:56 2023 -0500

    non-working. Want to review with team that this makes sense

commit 8240b6a
Author: Y Ethan Guo <ethan.guoyihua@gmail.com>
Date:   Fri Nov 17 11:20:57 2023 -0800

    [HUDI-7113] Update release scripts and docs for Spark 3.5 support (apache#10123)

commit 216aeb4
Author: Danny Chan <yuzhao.cyz@gmail.com>
Date:   Fri Nov 17 14:35:17 2023 +0800

    [HUDI-7116] Add docker image for flink 1.14 and spark 2.4.8 (apache#10126)

commit 3d0c450
Author: YueZhang <69956021+zhangyue19921010@users.noreply.github.com>
Date:   Fri Nov 17 09:48:59 2023 +0800

    [HUDI-7109] Fix Flink may re-use a committed instant in append mode (apache#10119)

commit f06ff5b
Author: hehuiyuan <471627698@qq.com>
Date:   Fri Nov 17 09:43:21 2023 +0800

    [HUDI-7090] Set the maxParallelism for singleton operator  (apache#10090)

commit faa73e9
Author: Y Ethan Guo <ethan.guoyihua@gmail.com>
Date:   Thu Nov 16 12:12:22 2023 -0800

    [MINOR] Disable failed test on master (apache#10124)

commit 6cc39bf
Author: Sivabalan Narayanan <n.siva.b@gmail.com>
Date:   Thu Nov 16 06:00:54 2023 -0500

    [MINOR] Removing unnecessary guards to row writer (apache#10004)

commit 4ea752f
Author: voonhous <voonhousu@gmail.com>
Date:   Thu Nov 16 16:53:28 2023 +0800

    [MINOR] Modified description to include missing trigger strategy (apache#10114)

commit 874b5de
Author: Shawn Chang <42792772+CTTY@users.noreply.github.com>
Date:   Wed Nov 15 21:57:14 2023 -0800

    [HUDI-6806] Support Spark 3.5.0 (apache#9717)

    ---------

    Co-authored-by: Shawn Chang <yxchang@amazon.com>
    Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>

commit 35af64d
Author: Shawn Chang <42792772+CTTY@users.noreply.github.com>
Date:   Wed Nov 15 18:36:42 2023 -0800

    [Minor] Throw exceptions when cleaner/compactor fail (apache#10108)

    Co-authored-by: Shawn Chang <yxchang@amazon.com>

commit bada5d9
Author: Shawn Chang <42792772+CTTY@users.noreply.github.com>
Date:   Wed Nov 15 16:50:38 2023 -0800

    [HUDI-5936] Fix serialization problem when FileStatus is not serializable (apache#10065)

    Co-authored-by: Shawn Chang <yxchang@amazon.com>

commit dcd5a81
Author: majian <47964462+majian1998@users.noreply.github.com>
Date:   Wed Nov 15 16:10:15 2023 +0800

    [HUDI-7069] Optimize metaclient construction and include table config options (apache#10048)

commit f218e54
Author: Jing Zhang <beyond1920@gmail.com>
Date:   Wed Nov 15 16:07:04 2023 +0800

    [MINOR] Add detailed error logs in RunCompactionProcedure (apache#10070)

    * add detailed error logs in RunCompactionProcedure
    * only print 100 error file paths into logs

commit 2185abb
Author: Jing Zhang <beyond1920@gmail.com>
Date:   Wed Nov 15 16:03:23 2023 +0800

    [HUDI-7094] AlterTableAddColumnCommand/AlterTableChangeColumnCommand update table with ro/rt suffix (apache#10094)

commit abd3afc
Author: Hussein Awala <hussein@awala.fr>
Date:   Wed Nov 15 06:55:47 2023 +0200

    [HUDI-6695] Use the AWS provider chain in Glue sync and add a new provider for STS assume role (apache#9260)

commit 424e0ce
Author: chao chen <59957056+waywtdcc@users.noreply.github.com>
Date:   Wed Nov 15 12:20:10 2023 +0800

    [HUDI-7050] Flink HoodieHiveCatalog supports hadoop parameters (apache#10013)

commit 19b3e7f
Author: leixin <1403342953@qq.com>
Date:   Wed Nov 15 09:24:29 2023 +0800

    [Minor] Throws an exception when using bulk_insert and stream mode (apache#10082)

    Co-authored-by: leixin1 <leixin1@jd.com>

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

[SUPPORT] hudi-examples-dbt not running with spark thrift server
