
[datalake](hudi) add hudi docker compose to run hudi examples #37451

Merged: 4 commits, Jul 9, 2024

Conversation

AshinGau (Member)

@AshinGau commented Jul 8, 2024

Proposed changes

Doris + Hudi + MinIO environment: launches Spark/Doris/Hive/Hudi/MinIO test environments and provides examples of querying Hudi from Doris.

Launch Docker Compose

Create the network:

```shell
sudo docker network create -d bridge hudi-net
```

Launch all components in Docker:

```shell
sudo ./start-hudi-compose.sh
```

Log in to Spark:

```shell
sudo ./login-spark.sh
```

Log in to Doris:

```shell
sudo ./login-doris.sh
```
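A hypothetical pre-flight sketch (not part of this PR): since `docker network create` fails if `hudi-net` already exists, a re-runnable setup could check for the network first. To stay side-effect free, this sketch only prints the command it would run.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of this PR: only create the
# hudi-net bridge network when it does not already exist, so the setup
# can be re-run safely. Prints the command instead of executing it.
NET_NAME="hudi-net"

network_exists() {
    docker network inspect "${1}" >/dev/null 2>&1
}

if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found; nothing to check"
elif network_exists "${NET_NAME}"; then
    echo "${NET_NAME} already exists; skipping creation"
else
    echo "would run: sudo docker network create -d bridge ${NET_NAME}"
fi
```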

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what to do next? See "How to process your PR".

Since 2024-03-18, the documentation has been moved to doris-website. See the Doris documentation.

github-actions bot (Contributor) commented Jul 8, 2024

sh-checker report

To get the full details, please check in the job output.

shellcheck errors

'shellcheck ' returned error 1 finding the following syntactical issues:

----------

In samples/datalake/hudi/scripts/spark-hudi.sh line 7:
if [ ! -d "$SPARK_HOME" ]; then
   ^--------------------^ SC2292 (style): Prefer [[ ]] over [ ] for tests in Bash/Ksh.
           ^---------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
if [[ ! -d "${SPARK_HOME}" ]]; then


In samples/datalake/hudi/scripts/spark-hudi.sh line 8:
  cp -r /opt/spark-3.4.2-bin-hadoop3 $SPARK_HOME
                                     ^---------^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.
                                     ^---------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
  cp -r /opt/spark-3.4.2-bin-hadoop3 "${SPARK_HOME}"


In samples/datalake/hudi/scripts/spark-hudi.sh line 11:
cp ${HIVE_HOME}/conf/hive-site.xml ${SPARK_HOME}/conf/
   ^----------^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.
                                   ^-----------^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.

Did you mean: 
cp "${HIVE_HOME}"/conf/hive-site.xml "${SPARK_HOME}"/conf/


In samples/datalake/hudi/scripts/spark-hudi.sh line 12:
cp ${HIVE_HOME}/lib/postgresql-jdbc.jar ${SPARK_HOME}/jars/
   ^----------^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.
                                        ^-----------^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.

Did you mean: 
cp "${HIVE_HOME}"/lib/postgresql-jdbc.jar "${SPARK_HOME}"/jars/


In samples/datalake/hudi/scripts/spark-hudi.sh line 13:
cp ${HADOOP_HOME}/etc/hadoop/core-site.xml ${SPARK_HOME}/conf/
   ^------------^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.
                                           ^-----------^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.

Did you mean: 
cp "${HADOOP_HOME}"/etc/hadoop/core-site.xml "${SPARK_HOME}"/conf/


In samples/datalake/hudi/scripts/spark-hudi.sh line 15:
${SPARK_HOME}/bin/spark-sql \
^-----------^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.

Did you mean: 
"${SPARK_HOME}"/bin/spark-sql \


In samples/datalake/hudi/start-hudi-compose.sh line 20:
  echo "Download $FILE_PATH"
                 ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
  echo "Download ${FILE_PATH}"


In samples/datalake/hudi/start-hudi-compose.sh line 22:
  if [ -f "$FILE_PATH" ]; then
     ^-----------------^ SC2292 (style): Prefer [[ ]] over [ ] for tests in Bash/Ksh.
           ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
  if [[ -f "${FILE_PATH}" ]]; then


In samples/datalake/hudi/start-hudi-compose.sh line 24:
    FILE_MD5=$(md5sum "$FILE_PATH" | awk '{ print $1 }')
                       ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    FILE_MD5=$(md5sum "${FILE_PATH}" | awk '{ print $1 }')


In samples/datalake/hudi/start-hudi-compose.sh line 26:
    if [ "$FILE_MD5" = "$EXPECTED_MD5" ]; then
       ^-- SC2292 (style): Prefer [[ ]] over [ ] for tests in Bash/Ksh.
          ^-------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                        ^-----------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    if [[ "${FILE_MD5}" = "${EXPECTED_MD5}" ]]; then


In samples/datalake/hudi/start-hudi-compose.sh line 27:
      echo "$FILE_PATH is ready!"
            ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
      echo "${FILE_PATH} is ready!"


In samples/datalake/hudi/start-hudi-compose.sh line 29:
      echo "$FILE_PATH is broken, Redownloading ..."
            ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
      echo "${FILE_PATH} is broken, Redownloading ..."


In samples/datalake/hudi/start-hudi-compose.sh line 30:
      rm $FILE_PATH
         ^--------^ SC2086 (info): Double quote to prevent globbing and word splitting.
         ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
      rm "${FILE_PATH}"


In samples/datalake/hudi/start-hudi-compose.sh line 31:
      wget ${DOWNLOAD_URL}/${FILE_PATH}
           ^-------------^ SC2086 (info): Double quote to prevent globbing and word splitting.
                           ^----------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean: 
      wget "${DOWNLOAD_URL}"/"${FILE_PATH}"


In samples/datalake/hudi/start-hudi-compose.sh line 34:
    echo "Downloading $FILE_PATH ..."
                      ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    echo "Downloading ${FILE_PATH} ..."


In samples/datalake/hudi/start-hudi-compose.sh line 35:
    wget ${DOWNLOAD_URL}/${FILE_PATH}
         ^-------------^ SC2086 (info): Double quote to prevent globbing and word splitting.
                         ^----------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean: 
    wget "${DOWNLOAD_URL}"/"${FILE_PATH}"


In samples/datalake/hudi/start-hudi-compose.sh line 41:
cd ${curdir}
^----------^ SC2164 (warning): Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
   ^-------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean: 
cd "${curdir}" || exit


In samples/datalake/hudi/start-hudi-compose.sh line 43:
if [ ! -d "packages" ]; then
   ^-----------------^ SC2292 (style): Prefer [[ ]] over [ ] for tests in Bash/Ksh.

Did you mean: 
if [[ ! -d "packages" ]]; then


In samples/datalake/hudi/start-hudi-compose.sh line 46:
cd packages
^---------^ SC2164 (warning): Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

Did you mean: 
cd packages || exit


In samples/datalake/hudi/start-hudi-compose.sh line 48:
download_source_file "aws-java-sdk-bundle-1.12.48.jar" "$md5_aws_java_sdk" "https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.12.48"
                                                        ^---------------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
download_source_file "aws-java-sdk-bundle-1.12.48.jar" "${md5_aws_java_sdk}" "https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.12.48"


In samples/datalake/hudi/start-hudi-compose.sh line 49:
download_source_file "hadoop-aws-3.3.1.jar" "$md5_hadoop_aws" "https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.3.1"
                                             ^-------------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
download_source_file "hadoop-aws-3.3.1.jar" "${md5_hadoop_aws}" "https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.3.1"


In samples/datalake/hudi/start-hudi-compose.sh line 50:
download_source_file "hudi-spark3.4-bundle_2.12-0.14.1.jar" "$md5_hudi_bundle" "https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark3.4-bundle_2.12/0.14.1"
                                                             ^--------------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
download_source_file "hudi-spark3.4-bundle_2.12-0.14.1.jar" "${md5_hudi_bundle}" "https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark3.4-bundle_2.12/0.14.1"


In samples/datalake/hudi/start-hudi-compose.sh line 51:
download_source_file "openjdk-17.0.2_linux-x64_bin.tar.gz" "$md5_jdk17" "https://download.java.net/java/GA/jdk17.0.2/dfd4a8d0985749f896bed50d7138ee7f/8/GPL"
                                                            ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
download_source_file "openjdk-17.0.2_linux-x64_bin.tar.gz" "${md5_jdk17}" "https://download.java.net/java/GA/jdk17.0.2/dfd4a8d0985749f896bed50d7138ee7f/8/GPL"


In samples/datalake/hudi/start-hudi-compose.sh line 52:
download_source_file "spark-3.4.2-bin-hadoop3.tgz" "$md5_spark" "https://archive.apache.org/dist/spark/spark-3.4.2"
                                                    ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
download_source_file "spark-3.4.2-bin-hadoop3.tgz" "${md5_spark}" "https://archive.apache.org/dist/spark/spark-3.4.2"


In samples/datalake/hudi/start-hudi-compose.sh line 53:
download_source_file "${DORIS_PACKAGE}.tar.gz" "$md5_doris" "$DORIS_DOWNLOAD_URL"
                                                ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                                                             ^-----------------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
download_source_file "${DORIS_PACKAGE}.tar.gz" "${md5_doris}" "${DORIS_DOWNLOAD_URL}"


In samples/datalake/hudi/start-hudi-compose.sh line 55:
if [ ! -f "jdk-17.0.2/SUCCESS" ]; then
   ^---------------------------^ SC2292 (style): Prefer [[ ]] over [ ] for tests in Bash/Ksh.

Did you mean: 
if [[ ! -f "jdk-17.0.2/SUCCESS" ]]; then


In samples/datalake/hudi/start-hudi-compose.sh line 57:
  if [ -d "jdk-17.0.2" ]; then
     ^-----------------^ SC2292 (style): Prefer [[ ]] over [ ] for tests in Bash/Ksh.

Did you mean: 
  if [[ -d "jdk-17.0.2" ]]; then


In samples/datalake/hudi/start-hudi-compose.sh line 65:
if [ ! -f "spark-3.4.2-bin-hadoop3/SUCCESS" ]; then
   ^-- SC2292 (style): Prefer [[ ]] over [ ] for tests in Bash/Ksh.

Did you mean: 
if [[ ! -f "spark-3.4.2-bin-hadoop3/SUCCESS" ]]; then


In samples/datalake/hudi/start-hudi-compose.sh line 67:
  if [ -d "spark-3.4.2-bin-hadoop3" ]; then
     ^-- SC2292 (style): Prefer [[ ]] over [ ] for tests in Bash/Ksh.

Did you mean: 
  if [[ -d "spark-3.4.2-bin-hadoop3" ]]; then


In samples/datalake/hudi/start-hudi-compose.sh line 78:
if [ ! -f "doris-bin/SUCCESS" ]; then
   ^--------------------------^ SC2292 (style): Prefer [[ ]] over [ ] for tests in Bash/Ksh.

Did you mean: 
if [[ ! -f "doris-bin/SUCCESS" ]]; then


In samples/datalake/hudi/start-hudi-compose.sh line 79:
  echo "Prepare $DORIS_PACKAGE environment"
                ^------------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
  echo "Prepare ${DORIS_PACKAGE} environment"


In samples/datalake/hudi/start-hudi-compose.sh line 80:
  if [ -d "doris-bin" ]; then
     ^----------------^ SC2292 (style): Prefer [[ ]] over [ ] for tests in Bash/Ksh.

Did you mean: 
  if [[ -d "doris-bin" ]]; then


In samples/datalake/hudi/start-hudi-compose.sh line 81:
    echo "Remove broken $DORIS_PACKAGE"
                        ^------------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    echo "Remove broken ${DORIS_PACKAGE}"


In samples/datalake/hudi/start-hudi-compose.sh line 84:
  echo "Unpackage $DORIS_PACKAGE"
                  ^------------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
  echo "Unpackage ${DORIS_PACKAGE}"


In samples/datalake/hudi/start-hudi-compose.sh line 85:
  tar xzf ${DORIS_PACKAGE}.tar.gz
          ^--------------^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.

Did you mean: 
  tar xzf "${DORIS_PACKAGE}".tar.gz


In samples/datalake/hudi/start-hudi-compose.sh line 86:
  mv ${DORIS_PACKAGE} doris-bin
     ^--------------^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.

Did you mean: 
  mv "${DORIS_PACKAGE}" doris-bin

For more information:
  https://www.shellcheck.net/wiki/SC2164 -- Use 'cd ... || exit' or 'cd ... |...
  https://www.shellcheck.net/wiki/SC2086 -- Double quote to prevent globbing ...
  https://www.shellcheck.net/wiki/SC2248 -- Prefer double quoting even when v...
----------
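Most of the findings above are stylistic, but SC2086 flags a genuine correctness hazard: an unquoted expansion such as `rm $FILE_PATH` undergoes word splitting. A small self-contained sketch of the difference:

```shell
#!/usr/bin/env bash
# Demonstrates the word splitting that SC2086 warns about.
FILE_PATH="file with spaces.txt"

set -- $FILE_PATH          # unquoted: the shell splits on whitespace
unquoted_count=$#          # three separate arguments

set -- "$FILE_PATH"        # quoted: the value stays one argument
quoted_count=$#            # a single argument

echo "unquoted: ${unquoted_count} args, quoted: ${quoted_count} arg"
```

With the unquoted form, `rm $FILE_PATH` would try to remove three files named `file`, `with`, and `spaces.txt` instead of the one intended file.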

You can address the above issues in one of three ways:
1. Manually correct the issue in the offending shell script;
2. Disable specific issues by adding the comment:
  # shellcheck disable=NNNN
above the line that contains the issue, where NNNN is the error code;
3. Add '-e NNNN' to the SHELLCHECK_OPTS setting in your .yml action file.
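For example, option 2 looks like this (a minimal sketch; SC2250 is the brace-style rule from the report):

```shell
#!/usr/bin/env bash
# Sketch of option 2 above: a directive comment suppresses one
# ShellCheck rule (here SC2250) for the next line only, leaving all
# other checks active.
SPARK_HOME="/opt/spark"
# shellcheck disable=SC2250
msg="SPARK_HOME is $SPARK_HOME"
echo "${msg}"
```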



shfmt errors

'shfmt ' returned error 1 finding the following formatting issues:

----------
--- samples/datalake/hudi/scripts/spark-hudi.sh.orig
+++ samples/datalake/hudi/scripts/spark-hudi.sh
@@ -5,7 +5,7 @@
 export HADOOP_HOME=/opt/hadoop-3.3.1
 
 if [ ! -d "$SPARK_HOME" ]; then
-  cp -r /opt/spark-3.4.2-bin-hadoop3 $SPARK_HOME
+    cp -r /opt/spark-3.4.2-bin-hadoop3 $SPARK_HOME
 fi
 
 cp ${HIVE_HOME}/conf/hive-site.xml ${SPARK_HOME}/conf/
@@ -13,9 +13,9 @@
 cp ${HADOOP_HOME}/etc/hadoop/core-site.xml ${SPARK_HOME}/conf/
 
 ${SPARK_HOME}/bin/spark-sql \
-  --master local[*] \
-  --name "spark-hudi-sql" \
-  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
-  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog \
-  --conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension \
-  --conf spark.sql.catalogImplementation=hive
+    --master local[*] \
+    --name "spark-hudi-sql" \
+    --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
+    --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog \
+    --conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension \
+    --conf spark.sql.catalogImplementation=hive
--- samples/datalake/hudi/start-hudi-compose.sh.orig
+++ samples/datalake/hudi/start-hudi-compose.sh
@@ -3,7 +3,6 @@
 DORIS_PACKAGE=apache-doris-2.1.4-bin-x64
 DORIS_DOWNLOAD_URL=https://apache-doris-releases.oss-accelerate.aliyuncs.com
 
-
 md5_aws_java_sdk="452d1e00efb11bff0ee17c42a6a44a0a"
 md5_hadoop_aws="a3e19d42cadd1a6862a41fd276f94382"
 md5_hudi_bundle="a9cb8c752d1d7132ef3cfe3ead78a30d"
@@ -11,37 +10,35 @@
 md5_spark="b393d314ffbc03facdc85575197c5db9"
 md5_doris="a4d8bc9730aca3a51294e87d7d5b3e8e"
 
-
 download_source_file() {
-  local FILE_PATH="$1"
-  local EXPECTED_MD5="$2"
-  local DOWNLOAD_URL="$3"
+    local FILE_PATH="$1"
+    local EXPECTED_MD5="$2"
+    local DOWNLOAD_URL="$3"
 
-  echo "Download $FILE_PATH"
+    echo "Download $FILE_PATH"
 
-  if [ -f "$FILE_PATH" ]; then
-    local FILE_MD5
-    FILE_MD5=$(md5sum "$FILE_PATH" | awk '{ print $1 }')
+    if [ -f "$FILE_PATH" ]; then
+        local FILE_MD5
+        FILE_MD5=$(md5sum "$FILE_PATH" | awk '{ print $1 }')
 
-    if [ "$FILE_MD5" = "$EXPECTED_MD5" ]; then
-      echo "$FILE_PATH is ready!"
+        if [ "$FILE_MD5" = "$EXPECTED_MD5" ]; then
+            echo "$FILE_PATH is ready!"
+        else
+            echo "$FILE_PATH is broken, Redownloading ..."
+            rm $FILE_PATH
+            wget ${DOWNLOAD_URL}/${FILE_PATH}
+        fi
     else
-      echo "$FILE_PATH is broken, Redownloading ..."
-      rm $FILE_PATH
-      wget ${DOWNLOAD_URL}/${FILE_PATH}
+        echo "Downloading $FILE_PATH ..."
+        wget ${DOWNLOAD_URL}/${FILE_PATH}
     fi
-  else
-    echo "Downloading $FILE_PATH ..."
-    wget ${DOWNLOAD_URL}/${FILE_PATH}
-  fi
 }
 
-
 curdir="$(cd "$(dirname "${BASH_SOURCE[0]}")" &>/dev/null && pwd)"
 cd ${curdir}
 
 if [ ! -d "packages" ]; then
-  mkdir packages
+    mkdir packages
 fi
 cd packages
 
@@ -53,38 +50,38 @@
 download_source_file "${DORIS_PACKAGE}.tar.gz" "$md5_doris" "$DORIS_DOWNLOAD_URL"
 
 if [ ! -f "jdk-17.0.2/SUCCESS" ]; then
-  echo "Prepare jdk17 environment"
-  if [ -d "jdk-17.0.2" ]; then
-    echo "Remove broken jdk-17.0.2"
-    rm -rf jdk-17.0.2
-  fi
-  echo "Unpackage jdk-17.0.2"
-  tar xzf openjdk-17.0.2_linux-x64_bin.tar.gz
-  touch jdk-17.0.2/SUCCESS
+    echo "Prepare jdk17 environment"
+    if [ -d "jdk-17.0.2" ]; then
+        echo "Remove broken jdk-17.0.2"
+        rm -rf jdk-17.0.2
+    fi
+    echo "Unpackage jdk-17.0.2"
+    tar xzf openjdk-17.0.2_linux-x64_bin.tar.gz
+    touch jdk-17.0.2/SUCCESS
 fi
 if [ ! -f "spark-3.4.2-bin-hadoop3/SUCCESS" ]; then
-  echo "Prepare spark3.4 environment"
-  if [ -d "spark-3.4.2-bin-hadoop3" ]; then
-    echo "Remove broken spark-3.4.2-bin-hadoop3"
-    rm -rf spark-3.4.2-bin-hadoop3
-  fi
-  echo "Unpackage spark-3.4.2-bin-hadoop3"
-  tar -xf spark-3.4.2-bin-hadoop3.tgz
-  cp aws-java-sdk-bundle-1.12.48.jar spark-3.4.2-bin-hadoop3/jars/
-  cp hadoop-aws-3.3.1.jar spark-3.4.2-bin-hadoop3/jars/
-  cp hudi-spark3.4-bundle_2.12-0.14.1.jar spark-3.4.2-bin-hadoop3/jars/
-  touch spark-3.4.2-bin-hadoop3/SUCCESS
+    echo "Prepare spark3.4 environment"
+    if [ -d "spark-3.4.2-bin-hadoop3" ]; then
+        echo "Remove broken spark-3.4.2-bin-hadoop3"
+        rm -rf spark-3.4.2-bin-hadoop3
+    fi
+    echo "Unpackage spark-3.4.2-bin-hadoop3"
+    tar -xf spark-3.4.2-bin-hadoop3.tgz
+    cp aws-java-sdk-bundle-1.12.48.jar spark-3.4.2-bin-hadoop3/jars/
+    cp hadoop-aws-3.3.1.jar spark-3.4.2-bin-hadoop3/jars/
+    cp hudi-spark3.4-bundle_2.12-0.14.1.jar spark-3.4.2-bin-hadoop3/jars/
+    touch spark-3.4.2-bin-hadoop3/SUCCESS
 fi
 if [ ! -f "doris-bin/SUCCESS" ]; then
-  echo "Prepare $DORIS_PACKAGE environment"
-  if [ -d "doris-bin" ]; then
-    echo "Remove broken $DORIS_PACKAGE"
-    rm -rf doris-bin
-  fi
-  echo "Unpackage $DORIS_PACKAGE"
-  tar xzf ${DORIS_PACKAGE}.tar.gz
-  mv ${DORIS_PACKAGE} doris-bin
-  touch doris-bin/SUCCESS
+    echo "Prepare $DORIS_PACKAGE environment"
+    if [ -d "doris-bin" ]; then
+        echo "Remove broken $DORIS_PACKAGE"
+        rm -rf doris-bin
+    fi
+    echo "Unpackage $DORIS_PACKAGE"
+    tar xzf ${DORIS_PACKAGE}.tar.gz
+    mv ${DORIS_PACKAGE} doris-bin
+    touch doris-bin/SUCCESS
 fi
 
 cd ../
@@ -102,5 +99,3 @@
 echo "./login-spark.sh to login into spark"
 echo "./login-doris.sh to login into doris"
 echo "======================================================"
-
-
----------

You can reformat the above files to meet shfmt's requirements by typing:

  shfmt  -w filename
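A hedged usage sketch: shfmt's standard `-d` flag prints the formatting diff without touching the file, which is useful for reviewing the changes before applying `-w`. The target path below is taken from the report; the guard makes the sketch a no-op when shfmt or the file is absent.

```shell
#!/usr/bin/env bash
# Sketch: preview shfmt's formatting changes before rewriting in place.
TARGET="samples/datalake/hudi/start-hudi-compose.sh"
if command -v shfmt >/dev/null 2>&1 && [ -f "${TARGET}" ]; then
    shfmt -d "${TARGET}"    # print the formatting diff only
    shfmt -w "${TARGET}"    # then rewrite the file in place
else
    echo "shfmt or target file not available; skipping"
fi
```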


@AshinGau marked this pull request as ready for review on July 8, 2024 at 06:51
Copy link
Contributor

github-actions bot commented Jul 8, 2024

sh-checker report

To get the full details, please check in the job output.

shellcheck errors

'shellcheck ' returned error 1 finding the following syntactical issues:

----------

In samples/datalake/hudi/scripts/spark-hudi.sh line 24:
if [ ! -d "$SPARK_HOME" ]; then
   ^--------------------^ SC2292 (style): Prefer [[ ]] over [ ] for tests in Bash/Ksh.
           ^---------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
if [[ ! -d "${SPARK_HOME}" ]]; then


In samples/datalake/hudi/scripts/spark-hudi.sh line 25:
  cp -r /opt/spark-3.4.2-bin-hadoop3 $SPARK_HOME
                                     ^---------^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.
                                     ^---------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
  cp -r /opt/spark-3.4.2-bin-hadoop3 "${SPARK_HOME}"


In samples/datalake/hudi/scripts/spark-hudi.sh line 28:
cp ${HIVE_HOME}/conf/hive-site.xml ${SPARK_HOME}/conf/
   ^----------^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.
                                   ^-----------^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.

Did you mean: 
cp "${HIVE_HOME}"/conf/hive-site.xml "${SPARK_HOME}"/conf/


In samples/datalake/hudi/scripts/spark-hudi.sh line 29:
cp ${HIVE_HOME}/lib/postgresql-jdbc.jar ${SPARK_HOME}/jars/
   ^----------^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.
                                        ^-----------^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.

Did you mean: 
cp "${HIVE_HOME}"/lib/postgresql-jdbc.jar "${SPARK_HOME}"/jars/


In samples/datalake/hudi/scripts/spark-hudi.sh line 30:
cp ${HADOOP_HOME}/etc/hadoop/core-site.xml ${SPARK_HOME}/conf/
   ^------------^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.
                                           ^-----------^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.

Did you mean: 
cp "${HADOOP_HOME}"/etc/hadoop/core-site.xml "${SPARK_HOME}"/conf/


In samples/datalake/hudi/scripts/spark-hudi.sh line 32:
${SPARK_HOME}/bin/spark-sql \
^-----------^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.

Did you mean: 
"${SPARK_HOME}"/bin/spark-sql \


In samples/datalake/hudi/start-hudi-compose.sh line 37:
  echo "Download $FILE_PATH"
                 ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
  echo "Download ${FILE_PATH}"


In samples/datalake/hudi/start-hudi-compose.sh line 39:
  if [ -f "$FILE_PATH" ]; then
     ^-----------------^ SC2292 (style): Prefer [[ ]] over [ ] for tests in Bash/Ksh.
           ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
  if [[ -f "${FILE_PATH}" ]]; then


In samples/datalake/hudi/start-hudi-compose.sh line 41:
    FILE_MD5=$(md5sum "$FILE_PATH" | awk '{ print $1 }')
                       ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    FILE_MD5=$(md5sum "${FILE_PATH}" | awk '{ print $1 }')


In samples/datalake/hudi/start-hudi-compose.sh line 43:
    if [ "$FILE_MD5" = "$EXPECTED_MD5" ]; then
       ^-- SC2292 (style): Prefer [[ ]] over [ ] for tests in Bash/Ksh.
          ^-------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                        ^-----------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    if [[ "${FILE_MD5}" = "${EXPECTED_MD5}" ]]; then


In samples/datalake/hudi/start-hudi-compose.sh line 44:
      echo "$FILE_PATH is ready!"
            ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
      echo "${FILE_PATH} is ready!"


In samples/datalake/hudi/start-hudi-compose.sh line 46:
      echo "$FILE_PATH is broken, Redownloading ..."
            ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
      echo "${FILE_PATH} is broken, Redownloading ..."


In samples/datalake/hudi/start-hudi-compose.sh line 47:
      rm $FILE_PATH
         ^--------^ SC2086 (info): Double quote to prevent globbing and word splitting.
         ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
      rm "${FILE_PATH}"


In samples/datalake/hudi/start-hudi-compose.sh line 48:
      wget ${DOWNLOAD_URL}/${FILE_PATH}
           ^-------------^ SC2086 (info): Double quote to prevent globbing and word splitting.
                           ^----------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean: 
      wget "${DOWNLOAD_URL}"/"${FILE_PATH}"


In samples/datalake/hudi/start-hudi-compose.sh line 51:
    echo "Downloading $FILE_PATH ..."
                      ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    echo "Downloading ${FILE_PATH} ..."


In samples/datalake/hudi/start-hudi-compose.sh line 52:
    wget ${DOWNLOAD_URL}/${FILE_PATH}
         ^-------------^ SC2086 (info): Double quote to prevent globbing and word splitting.
                         ^----------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean: 
    wget "${DOWNLOAD_URL}"/"${FILE_PATH}"


In samples/datalake/hudi/start-hudi-compose.sh line 58:
cd ${curdir}
^----------^ SC2164 (warning): Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
   ^-------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean: 
cd "${curdir}" || exit


In samples/datalake/hudi/start-hudi-compose.sh line 60:
if [ ! -d "packages" ]; then
   ^-----------------^ SC2292 (style): Prefer [[ ]] over [ ] for tests in Bash/Ksh.

Did you mean: 
if [[ ! -d "packages" ]]; then


In samples/datalake/hudi/start-hudi-compose.sh line 63:
cd packages
^---------^ SC2164 (warning): Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

Did you mean: 
cd packages || exit


In samples/datalake/hudi/start-hudi-compose.sh line 65:
download_source_file "aws-java-sdk-bundle-1.12.48.jar" "$md5_aws_java_sdk" "https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.12.48"
                                                        ^---------------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
download_source_file "aws-java-sdk-bundle-1.12.48.jar" "${md5_aws_java_sdk}" "https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk-bundle/1.12.48"


In samples/datalake/hudi/start-hudi-compose.sh line 66:
download_source_file "hadoop-aws-3.3.1.jar" "$md5_hadoop_aws" "https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.3.1"
                                             ^-------------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
download_source_file "hadoop-aws-3.3.1.jar" "${md5_hadoop_aws}" "https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/3.3.1"


In samples/datalake/hudi/start-hudi-compose.sh line 67:
download_source_file "hudi-spark3.4-bundle_2.12-0.14.1.jar" "$md5_hudi_bundle" "https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark3.4-bundle_2.12/0.14.1"
                                                             ^--------------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
download_source_file "hudi-spark3.4-bundle_2.12-0.14.1.jar" "${md5_hudi_bundle}" "https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark3.4-bundle_2.12/0.14.1"


In samples/datalake/hudi/start-hudi-compose.sh line 68:
download_source_file "openjdk-17.0.2_linux-x64_bin.tar.gz" "$md5_jdk17" "https://download.java.net/java/GA/jdk17.0.2/dfd4a8d0985749f896bed50d7138ee7f/8/GPL"
                                                            ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
download_source_file "openjdk-17.0.2_linux-x64_bin.tar.gz" "${md5_jdk17}" "https://download.java.net/java/GA/jdk17.0.2/dfd4a8d0985749f896bed50d7138ee7f/8/GPL"


In samples/datalake/hudi/start-hudi-compose.sh line 69:
download_source_file "spark-3.4.2-bin-hadoop3.tgz" "$md5_spark" "https://archive.apache.org/dist/spark/spark-3.4.2"
                                                    ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
download_source_file "spark-3.4.2-bin-hadoop3.tgz" "${md5_spark}" "https://archive.apache.org/dist/spark/spark-3.4.2"


In samples/datalake/hudi/start-hudi-compose.sh line 70:
download_source_file "${DORIS_PACKAGE}.tar.gz" "$md5_doris" "$DORIS_DOWNLOAD_URL"
                                                ^--------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                                                             ^-----------------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
download_source_file "${DORIS_PACKAGE}.tar.gz" "${md5_doris}" "${DORIS_DOWNLOAD_URL}"


In samples/datalake/hudi/start-hudi-compose.sh line 72:
if [ ! -f "jdk-17.0.2/SUCCESS" ]; then
   ^---------------------------^ SC2292 (style): Prefer [[ ]] over [ ] for tests in Bash/Ksh.

Did you mean: 
if [[ ! -f "jdk-17.0.2/SUCCESS" ]]; then


In samples/datalake/hudi/start-hudi-compose.sh line 74:
  if [ -d "jdk-17.0.2" ]; then
     ^-----------------^ SC2292 (style): Prefer [[ ]] over [ ] for tests in Bash/Ksh.

Did you mean: 
  if [[ -d "jdk-17.0.2" ]]; then


In samples/datalake/hudi/start-hudi-compose.sh line 82:
if [ ! -f "spark-3.4.2-bin-hadoop3/SUCCESS" ]; then
   ^-- SC2292 (style): Prefer [[ ]] over [ ] for tests in Bash/Ksh.

Did you mean: 
if [[ ! -f "spark-3.4.2-bin-hadoop3/SUCCESS" ]]; then


In samples/datalake/hudi/start-hudi-compose.sh line 84:
  if [ -d "spark-3.4.2-bin-hadoop3" ]; then
     ^-- SC2292 (style): Prefer [[ ]] over [ ] for tests in Bash/Ksh.

Did you mean: 
  if [[ -d "spark-3.4.2-bin-hadoop3" ]]; then


In samples/datalake/hudi/start-hudi-compose.sh line 95:
if [ ! -f "doris-bin/SUCCESS" ]; then
   ^--------------------------^ SC2292 (style): Prefer [[ ]] over [ ] for tests in Bash/Ksh.

Did you mean: 
if [[ ! -f "doris-bin/SUCCESS" ]]; then


In samples/datalake/hudi/start-hudi-compose.sh line 96:
  echo "Prepare $DORIS_PACKAGE environment"
                ^------------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
  echo "Prepare ${DORIS_PACKAGE} environment"


In samples/datalake/hudi/start-hudi-compose.sh line 97:
  if [ -d "doris-bin" ]; then
     ^----------------^ SC2292 (style): Prefer [[ ]] over [ ] for tests in Bash/Ksh.

Did you mean: 
  if [[ -d "doris-bin" ]]; then


In samples/datalake/hudi/start-hudi-compose.sh line 98:
    echo "Remove broken $DORIS_PACKAGE"
                        ^------------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
    echo "Remove broken ${DORIS_PACKAGE}"


In samples/datalake/hudi/start-hudi-compose.sh line 101:
  echo "Unpackage $DORIS_PACKAGE"
                  ^------------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
  echo "Unpackage ${DORIS_PACKAGE}"


In samples/datalake/hudi/start-hudi-compose.sh line 102:
  tar xzf ${DORIS_PACKAGE}.tar.gz
          ^--------------^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.

Did you mean: 
  tar xzf "${DORIS_PACKAGE}".tar.gz


In samples/datalake/hudi/start-hudi-compose.sh line 103:
  mv ${DORIS_PACKAGE} doris-bin
     ^--------------^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.

Did you mean: 
  mv "${DORIS_PACKAGE}" doris-bin

For more information:
  https://www.shellcheck.net/wiki/SC2164 -- Use 'cd ... || exit' or 'cd ... |...
  https://www.shellcheck.net/wiki/SC2086 -- Double quote to prevent globbing ...
  https://www.shellcheck.net/wiki/SC2248 -- Prefer double quoting even when v...
----------

You can address the above issues in one of three ways:
1. Manually correct the issue in the offending shell script;
2. Disable specific issues by adding the comment:
  # shellcheck disable=NNNN
above the line that contains the issue, where NNNN is the error code;
3. Add '-e NNNN' to the SHELLCHECK_OPTS setting in your .yml action file.
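The findings above boil down to three recurring style rules. A minimal illustrative snippet of the compliant style (the paths and variable names here are made up for the example, not taken from the PR's scripts):

```shell
#!/usr/bin/env bash
# Demonstrates the style the three recurring findings ask for:
#   SC2292 - prefer [[ ]] over [ ] in Bash
#   SC2250 - brace variable references: "${var}" rather than "$var"
#   SC2248 - double-quote expansions even when they look safe
set -euo pipefail

demo_home="$(mktemp -d)"   # stand-in for something like SPARK_HOME
pkg="demo-package"         # stand-in for a package-name variable

# SC2292-compliant test, SC2250-compliant braces, SC2248-compliant quoting
if [[ ! -d "${demo_home}/jars" ]]; then
    mkdir -p "${demo_home}/jars"
fi

touch "${demo_home}/${pkg}.tar.gz"
echo "Prepare ${pkg} environment"

rm -rf "${demo_home}"
```

The same mechanical rewrite (brace, quote, switch `[ ]` to `[[ ]]`) resolves every SC2292/SC2250/SC2248 finding in the report.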



shfmt errors

'shfmt ' returned error 1 finding the following formatting issues:

----------
--- samples/datalake/hudi/scripts/spark-hudi.sh.orig
+++ samples/datalake/hudi/scripts/spark-hudi.sh
@@ -22,7 +22,7 @@
 export HADOOP_HOME=/opt/hadoop-3.3.1
 
 if [ ! -d "$SPARK_HOME" ]; then
-  cp -r /opt/spark-3.4.2-bin-hadoop3 $SPARK_HOME
+    cp -r /opt/spark-3.4.2-bin-hadoop3 $SPARK_HOME
 fi
 
 cp ${HIVE_HOME}/conf/hive-site.xml ${SPARK_HOME}/conf/
@@ -30,9 +30,9 @@
 cp ${HADOOP_HOME}/etc/hadoop/core-site.xml ${SPARK_HOME}/conf/
 
 ${SPARK_HOME}/bin/spark-sql \
-  --master local[*] \
-  --name "spark-hudi-sql" \
-  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
-  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog \
-  --conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension \
-  --conf spark.sql.catalogImplementation=hive
+    --master local[*] \
+    --name "spark-hudi-sql" \
+    --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
+    --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog \
+    --conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension \
+    --conf spark.sql.catalogImplementation=hive
--- samples/datalake/hudi/start-hudi-compose.sh.orig
+++ samples/datalake/hudi/start-hudi-compose.sh
@@ -20,7 +20,6 @@
 DORIS_PACKAGE=apache-doris-2.1.4-bin-x64
 DORIS_DOWNLOAD_URL=https://apache-doris-releases.oss-accelerate.aliyuncs.com
 
-
 md5_aws_java_sdk="452d1e00efb11bff0ee17c42a6a44a0a"
 md5_hadoop_aws="a3e19d42cadd1a6862a41fd276f94382"
 md5_hudi_bundle="a9cb8c752d1d7132ef3cfe3ead78a30d"
@@ -28,37 +27,35 @@
 md5_spark="b393d314ffbc03facdc85575197c5db9"
 md5_doris="a4d8bc9730aca3a51294e87d7d5b3e8e"
 
-
 download_source_file() {
-  local FILE_PATH="$1"
-  local EXPECTED_MD5="$2"
-  local DOWNLOAD_URL="$3"
+    local FILE_PATH="$1"
+    local EXPECTED_MD5="$2"
+    local DOWNLOAD_URL="$3"
 
-  echo "Download $FILE_PATH"
+    echo "Download $FILE_PATH"
 
-  if [ -f "$FILE_PATH" ]; then
-    local FILE_MD5
-    FILE_MD5=$(md5sum "$FILE_PATH" | awk '{ print $1 }')
+    if [ -f "$FILE_PATH" ]; then
+        local FILE_MD5
+        FILE_MD5=$(md5sum "$FILE_PATH" | awk '{ print $1 }')
 
-    if [ "$FILE_MD5" = "$EXPECTED_MD5" ]; then
-      echo "$FILE_PATH is ready!"
+        if [ "$FILE_MD5" = "$EXPECTED_MD5" ]; then
+            echo "$FILE_PATH is ready!"
+        else
+            echo "$FILE_PATH is broken, Redownloading ..."
+            rm $FILE_PATH
+            wget ${DOWNLOAD_URL}/${FILE_PATH}
+        fi
     else
-      echo "$FILE_PATH is broken, Redownloading ..."
-      rm $FILE_PATH
-      wget ${DOWNLOAD_URL}/${FILE_PATH}
+        echo "Downloading $FILE_PATH ..."
+        wget ${DOWNLOAD_URL}/${FILE_PATH}
     fi
-  else
-    echo "Downloading $FILE_PATH ..."
-    wget ${DOWNLOAD_URL}/${FILE_PATH}
-  fi
 }
 
-
 curdir="$(cd "$(dirname "${BASH_SOURCE[0]}")" &>/dev/null && pwd)"
 cd ${curdir}
 
 if [ ! -d "packages" ]; then
-  mkdir packages
+    mkdir packages
 fi
 cd packages
 
@@ -70,38 +67,38 @@
 download_source_file "${DORIS_PACKAGE}.tar.gz" "$md5_doris" "$DORIS_DOWNLOAD_URL"
 
 if [ ! -f "jdk-17.0.2/SUCCESS" ]; then
-  echo "Prepare jdk17 environment"
-  if [ -d "jdk-17.0.2" ]; then
-    echo "Remove broken jdk-17.0.2"
-    rm -rf jdk-17.0.2
-  fi
-  echo "Unpackage jdk-17.0.2"
-  tar xzf openjdk-17.0.2_linux-x64_bin.tar.gz
-  touch jdk-17.0.2/SUCCESS
+    echo "Prepare jdk17 environment"
+    if [ -d "jdk-17.0.2" ]; then
+        echo "Remove broken jdk-17.0.2"
+        rm -rf jdk-17.0.2
+    fi
+    echo "Unpackage jdk-17.0.2"
+    tar xzf openjdk-17.0.2_linux-x64_bin.tar.gz
+    touch jdk-17.0.2/SUCCESS
 fi
 if [ ! -f "spark-3.4.2-bin-hadoop3/SUCCESS" ]; then
-  echo "Prepare spark3.4 environment"
-  if [ -d "spark-3.4.2-bin-hadoop3" ]; then
-    echo "Remove broken spark-3.4.2-bin-hadoop3"
-    rm -rf spark-3.4.2-bin-hadoop3
-  fi
-  echo "Unpackage spark-3.4.2-bin-hadoop3"
-  tar -xf spark-3.4.2-bin-hadoop3.tgz
-  cp aws-java-sdk-bundle-1.12.48.jar spark-3.4.2-bin-hadoop3/jars/
-  cp hadoop-aws-3.3.1.jar spark-3.4.2-bin-hadoop3/jars/
-  cp hudi-spark3.4-bundle_2.12-0.14.1.jar spark-3.4.2-bin-hadoop3/jars/
-  touch spark-3.4.2-bin-hadoop3/SUCCESS
+    echo "Prepare spark3.4 environment"
+    if [ -d "spark-3.4.2-bin-hadoop3" ]; then
+        echo "Remove broken spark-3.4.2-bin-hadoop3"
+        rm -rf spark-3.4.2-bin-hadoop3
+    fi
+    echo "Unpackage spark-3.4.2-bin-hadoop3"
+    tar -xf spark-3.4.2-bin-hadoop3.tgz
+    cp aws-java-sdk-bundle-1.12.48.jar spark-3.4.2-bin-hadoop3/jars/
+    cp hadoop-aws-3.3.1.jar spark-3.4.2-bin-hadoop3/jars/
+    cp hudi-spark3.4-bundle_2.12-0.14.1.jar spark-3.4.2-bin-hadoop3/jars/
+    touch spark-3.4.2-bin-hadoop3/SUCCESS
 fi
 if [ ! -f "doris-bin/SUCCESS" ]; then
-  echo "Prepare $DORIS_PACKAGE environment"
-  if [ -d "doris-bin" ]; then
-    echo "Remove broken $DORIS_PACKAGE"
-    rm -rf doris-bin
-  fi
-  echo "Unpackage $DORIS_PACKAGE"
-  tar xzf ${DORIS_PACKAGE}.tar.gz
-  mv ${DORIS_PACKAGE} doris-bin
-  touch doris-bin/SUCCESS
+    echo "Prepare $DORIS_PACKAGE environment"
+    if [ -d "doris-bin" ]; then
+        echo "Remove broken $DORIS_PACKAGE"
+        rm -rf doris-bin
+    fi
+    echo "Unpackage $DORIS_PACKAGE"
+    tar xzf ${DORIS_PACKAGE}.tar.gz
+    mv ${DORIS_PACKAGE} doris-bin
+    touch doris-bin/SUCCESS
 fi
 
 cd ../
@@ -119,5 +116,3 @@
 echo "./login-spark.sh to login into spark"
 echo "./login-doris.sh to login into doris"
 echo "======================================================"
-
-
----------

You can reformat the above files to meet shfmt's requirements by typing:

  shfmt  -w filename
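Since the expected formatting in the diff above uses 4-space indentation, `shfmt -i 4` reproduces it. The sketch below runs shfmt against a throwaway script and skips the call gracefully when shfmt is not installed; the `-i 4` flag choice is an assumption inferred from the diff, not read from the CI job config:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write a deliberately 2-space-indented script to a temp file.
tmp_script="$(mktemp)"
printf '#!/bin/bash\nif [[ -d "x" ]]; then\n  echo hi\nfi\n' >"${tmp_script}"

# Rewrite it in place with 4-space indentation, guarding the call
# because shfmt may not be installed in every environment.
if command -v shfmt >/dev/null 2>&1; then
    shfmt -i 4 -w "${tmp_script}"
fi

cat "${tmp_script}"
rm -f "${tmp_script}"
```

Running `shfmt -i 4 -d <file>` instead of `-w` prints the would-be diff without modifying the file, which matches the output format shown in the report.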


@AshinGau
Member Author

AshinGau commented Jul 8, 2024

run buildall

Contributor

github-actions bot commented Jul 9, 2024

sh-checker report

To get the full details, please check in the job output.

shellcheck errors
'shellcheck ' found no issues.

shfmt errors

'shfmt ' returned error 1 finding the following formatting issues:

----------
--- samples/datalake/hudi/scripts/spark-hudi.sh.orig
+++ samples/datalake/hudi/scripts/spark-hudi.sh
@@ -22,7 +22,7 @@
 export HADOOP_HOME=/opt/hadoop-3.3.1
 
 if [[ ! -d "${SPARK_HOME}" ]]; then
-  cp -r /opt/spark-3.4.2-bin-hadoop3 "${SPARK_HOME}"
+    cp -r /opt/spark-3.4.2-bin-hadoop3 "${SPARK_HOME}"
 fi
 
 cp "${HIVE_HOME}"/conf/hive-site.xml "${SPARK_HOME}"/conf/
@@ -30,9 +30,9 @@
 cp "${HADOOP_HOME}"/etc/hadoop/core-site.xml "${SPARK_HOME}"/conf/
 
 "${SPARK_HOME}"/bin/spark-sql \
-  --master local[*] \
-  --name "spark-hudi-sql" \
-  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
-  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog \
-  --conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension \
-  --conf spark.sql.catalogImplementation=hive
+    --master local[*] \
+    --name "spark-hudi-sql" \
+    --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
+    --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog \
+    --conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension \
+    --conf spark.sql.catalogImplementation=hive
--- samples/datalake/hudi/start-hudi-compose.sh.orig
+++ samples/datalake/hudi/start-hudi-compose.sh
@@ -28,34 +28,34 @@
 md5_doris="a4d8bc9730aca3a51294e87d7d5b3e8e"
 
 download_source_file() {
-  local FILE_PATH="$1"
-  local EXPECTED_MD5="$2"
-  local DOWNLOAD_URL="$3"
+    local FILE_PATH="$1"
+    local EXPECTED_MD5="$2"
+    local DOWNLOAD_URL="$3"
 
-  echo "Download ${FILE_PATH}"
+    echo "Download ${FILE_PATH}"
 
-  if [[ -f "${FILE_PATH}" ]]; then
-    local FILE_MD5
-    FILE_MD5=$(md5sum "${FILE_PATH}" | awk '{ print $1 }')
+    if [[ -f "${FILE_PATH}" ]]; then
+        local FILE_MD5
+        FILE_MD5=$(md5sum "${FILE_PATH}" | awk '{ print $1 }')
 
-    if [[ "${FILE_MD5}" = "${EXPECTED_MD5}" ]]; then
-      echo "${FILE_PATH} is ready!"
+        if [[ "${FILE_MD5}" = "${EXPECTED_MD5}" ]]; then
+            echo "${FILE_PATH} is ready!"
+        else
+            echo "${FILE_PATH} is broken, Redownloading ..."
+            rm "${FILE_PATH}"
+            wget "${DOWNLOAD_URL}"/"${FILE_PATH}"
+        fi
     else
-      echo "${FILE_PATH} is broken, Redownloading ..."
-      rm "${FILE_PATH}"
-      wget "${DOWNLOAD_URL}"/"${FILE_PATH}"
+        echo "Downloading ${FILE_PATH} ..."
+        wget "${DOWNLOAD_URL}"/"${FILE_PATH}"
     fi
-  else
-    echo "Downloading ${FILE_PATH} ..."
-    wget "${DOWNLOAD_URL}"/"${FILE_PATH}"
-  fi
 }
 
 curdir="$(cd "$(dirname "${BASH_SOURCE[0]}")" &>/dev/null && pwd)"
 cd "${curdir}" || exit
 
 if [[ ! -d "packages" ]]; then
-  mkdir packages
+    mkdir packages
 fi
 cd packages || exit
 
@@ -67,38 +67,38 @@
 download_source_file "${DORIS_PACKAGE}.tar.gz" "${md5_doris}" "${DORIS_DOWNLOAD_URL}"
 
 if [[ ! -f "jdk-17.0.2/SUCCESS" ]]; then
-  echo "Prepare jdk17 environment"
-  if [[ -d "jdk-17.0.2" ]]; then
-    echo "Remove broken jdk-17.0.2"
-    rm -rf jdk-17.0.2
-  fi
-  echo "Unpackage jdk-17.0.2"
-  tar xzf openjdk-17.0.2_linux-x64_bin.tar.gz
-  touch jdk-17.0.2/SUCCESS
+    echo "Prepare jdk17 environment"
+    if [[ -d "jdk-17.0.2" ]]; then
+        echo "Remove broken jdk-17.0.2"
+        rm -rf jdk-17.0.2
+    fi
+    echo "Unpackage jdk-17.0.2"
+    tar xzf openjdk-17.0.2_linux-x64_bin.tar.gz
+    touch jdk-17.0.2/SUCCESS
 fi
 if [[ ! -f "spark-3.4.2-bin-hadoop3/SUCCESS" ]]; then
-  echo "Prepare spark3.4 environment"
-  if [[ -d "spark-3.4.2-bin-hadoop3" ]]; then
-    echo "Remove broken spark-3.4.2-bin-hadoop3"
-    rm -rf spark-3.4.2-bin-hadoop3
-  fi
-  echo "Unpackage spark-3.4.2-bin-hadoop3"
-  tar -xf spark-3.4.2-bin-hadoop3.tgz
-  cp aws-java-sdk-bundle-1.12.48.jar spark-3.4.2-bin-hadoop3/jars/
-  cp hadoop-aws-3.3.1.jar spark-3.4.2-bin-hadoop3/jars/
-  cp hudi-spark3.4-bundle_2.12-0.14.1.jar spark-3.4.2-bin-hadoop3/jars/
-  touch spark-3.4.2-bin-hadoop3/SUCCESS
+    echo "Prepare spark3.4 environment"
+    if [[ -d "spark-3.4.2-bin-hadoop3" ]]; then
+        echo "Remove broken spark-3.4.2-bin-hadoop3"
+        rm -rf spark-3.4.2-bin-hadoop3
+    fi
+    echo "Unpackage spark-3.4.2-bin-hadoop3"
+    tar -xf spark-3.4.2-bin-hadoop3.tgz
+    cp aws-java-sdk-bundle-1.12.48.jar spark-3.4.2-bin-hadoop3/jars/
+    cp hadoop-aws-3.3.1.jar spark-3.4.2-bin-hadoop3/jars/
+    cp hudi-spark3.4-bundle_2.12-0.14.1.jar spark-3.4.2-bin-hadoop3/jars/
+    touch spark-3.4.2-bin-hadoop3/SUCCESS
 fi
 if [[ ! -f "doris-bin/SUCCESS" ]]; then
-  echo "Prepare ${DORIS_PACKAGE} environment"
-  if [[ -d "doris-bin" ]]; then
-    echo "Remove broken ${DORIS_PACKAGE}"
-    rm -rf doris-bin
-  fi
-  echo "Unpackage ${DORIS_PACKAGE}"
-  tar xzf "${DORIS_PACKAGE}".tar.gz
-  mv "${DORIS_PACKAGE}" doris-bin
-  touch doris-bin/SUCCESS
+    echo "Prepare ${DORIS_PACKAGE} environment"
+    if [[ -d "doris-bin" ]]; then
+        echo "Remove broken ${DORIS_PACKAGE}"
+        rm -rf doris-bin
+    fi
+    echo "Unpackage ${DORIS_PACKAGE}"
+    tar xzf "${DORIS_PACKAGE}".tar.gz
+    mv "${DORIS_PACKAGE}" doris-bin
+    touch doris-bin/SUCCESS
 fi
 
 cd ../
----------

You can reformat the above files to meet shfmt's requirements by typing:

  shfmt  -w filename


@AshinGau
Member Author

AshinGau commented Jul 9, 2024

run buildall

Contributor

@morningman morningman left a comment


LGTM

@github-actions github-actions bot added the `approved` label (indicates a PR has been approved by one committer) on Jul 9, 2024
Contributor

github-actions bot commented Jul 9, 2024

PR approved by at least one committer and no changes requested.

Contributor

github-actions bot commented Jul 9, 2024

PR approved by anyone and no changes requested.

@AshinGau AshinGau merged commit e338cce into apache:master Jul 9, 2024
28 of 29 checks passed
AshinGau added a commit to AshinGau/incubator-doris that referenced this pull request Jul 15, 2024
…#37451)

## Proposed changes

**Doris+Hudi+MINIO Environments**:
Launch spark/doris/hive/hudi/minio test environments, and give examples
to query hudi in Doris.

## Launch Docker Compose
**Create Network**
```shell
sudo docker network create -d bridge hudi-net
```
**Launch all components in docker**
```shell
sudo ./start-hudi-compose.sh
```
**Login into Spark**
```shell
sudo ./login-spark.sh
```
**Login into Doris**
```shell
sudo ./login-doris.sh
```
dataroaring pushed a commit that referenced this pull request Jul 17, 2024
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.5-merged dev/3.0.1-merged reviewed