Commit 9d99de0
[SPARK-32245][INFRA] Run Spark tests in GitHub Actions

This PR aims to run the Spark tests in GitHub Actions. The main idea, briefly:

- Reuse `dev/run-tests.py` with the SBT build.
- Reuse the modules in `dev/sparktestsupport/modules.py` to test each module.
- Pass the modules to test into `dev/run-tests.py` directly via the `TEST_ONLY_MODULES` environment variable, for example `pyspark-sql,core,sql,hive`.
- `dev/run-tests.py` does _not_ take dependent modules into account; it tests only the modules specified.

Another thing to note is the `SlowHiveTest` annotation. Running the tests in the Hive modules takes too long, so the slow tests are extracted and run as a separate job. The split was derived from the actual elapsed times in Jenkins:

![Screen Shot 2020-07-09 at 7 48 13 PM](https://user-images.githubusercontent.com/6477701/87050238-f6098e80-c238-11ea-9c4a-ab505af61381.png)

So, the Hive tests are separated into two jobs: one runs the slow test cases, and the other runs the remaining test cases.

_Note that_ the current GitHub Actions build virtually copies what the default PR builder on Jenkins does (without other profiles such as JDK 11, Hadoop 2, etc.). The only exception is Kinesis: https://github.com/apache/spark/pull/29057/files#diff-04eb107ee163a50b61281ca08f4e4c7bR23

From last week onwards, the Jenkins machines became very unstable for many reasons:

- Apparently, the machines became extremely slow. Almost no tests can pass.
- One machine (worker 4) started to have a corrupt `.m2` cache, which fails the build.
- The documentation build fails from time to time, for an unknown reason specific to the Jenkins machines. It is disabled for now at apache#29017.
- Almost all PRs are currently blocked by this instability.

The advantages of using GitHub Actions:

- Avoid depending on the few people who have access to the cluster.
- Reduce the elapsed build time: we can split the tests (e.g., SQL, ML, core) and run them in parallel, so the total build time drops significantly.
- Control the environment more flexibly.
- Other contributors can test and propose fixes to the GitHub Actions configuration, so we can distribute this build-management cost.

Note that:

- The current build in Jenkins takes _more than 7 hours_. With GitHub Actions it takes _less than 2 hours_.
- We can now control the environments easily, especially for Python.
- The tests and builds look more stable than the Jenkins ones.

No user-facing change; this is dev-only.

Tested at #4

Closes apache#29057 from HyukjinKwon/migrate-to-github-actions.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
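As a rough illustration, one matrix job amounts to the following local invocation (a sketch, assuming a working Spark dev setup; the `TEST_ONLY_*` variable names and the `./dev/run-tests` command are taken from the workflow diff below, and the module list is just the example above):

```bash
# Test only the listed modules (dependent modules are not pulled in),
# skipping tests tagged as unstable on GitHub Actions.
export TEST_ONLY_MODULES="pyspark-sql,core,sql,hive"
export TEST_ONLY_EXCLUDED_TAGS="org.apache.spark.tags.GitHubActionsUnstableTest"
./dev/run-tests --parallelism 2
```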
1 parent: a36514e · commit: 9d99de0

22 files changed: +432 -169 lines

.github/workflows/master.yml

Lines changed: 179 additions & 106 deletions
```diff
@@ -1,156 +1,229 @@
 name: master
 
 on:
-  push:
-    branches:
-    - branch-3.0
   pull_request:
     branches:
     - branch-3.0
 
 jobs:
+  # TODO(SPARK-32248): Recover JDK 11 builds
+  # Build: build Spark and run the tests for specified modules.
   build:
-
+    name: "Build modules: ${{ matrix.modules }} ${{ matrix.comment }} (JDK ${{ matrix.java }}, ${{ matrix.hadoop }}, ${{ matrix.hive }})"
     runs-on: ubuntu-latest
     strategy:
+      fail-fast: false
       matrix:
-        java: [ '1.8', '11' ]
-        hadoop: [ 'hadoop-2.7', 'hadoop-3.2' ]
-        hive: [ 'hive-1.2', 'hive-2.3' ]
-        exclude:
-        - java: '11'
-          hive: 'hive-1.2'
-        - hadoop: 'hadoop-3.2'
-          hive: 'hive-1.2'
-    name: Build Spark - JDK${{ matrix.java }}/${{ matrix.hadoop }}/${{ matrix.hive }}
-
+        java:
+          - 1.8
+        hadoop:
+          - hadoop2.7
+        hive:
+          - hive2.3
+        # TODO(SPARK-32246): We don't test 'streaming-kinesis-asl' for now.
+        # Kinesis tests depends on external Amazon kinesis service.
+        # Note that the modules below are from sparktestsupport/modules.py.
+        modules:
+          - |-
+            core, unsafe, kvstore, avro,
+            network_common, network_shuffle, repl, launcher
+            examples, sketch, graphx
+          - |-
+            catalyst, hive-thriftserver
+          - |-
+            streaming, sql-kafka-0-10, streaming-kafka-0-10,
+            mllib-local, mllib,
+            yarn, mesos, kubernetes, hadoop-cloud, spark-ganglia-lgpl
+          - |-
+            pyspark-sql, pyspark-mllib
+          - |-
+            pyspark-core, pyspark-streaming, pyspark-ml
+          - |-
+            sparkr
+        # Here, we split Hive and SQL tests into some of slow ones and the rest of them.
+        included-tags: [""]
+        # Some tests are disabled in GitHun Actions. Ideally, we should remove this tag
+        # and run all tests.
+        excluded-tags: ["org.apache.spark.tags.GitHubActionsUnstableTest"]
+        comment: [""]
+        include:
+          # Hive tests
+          - modules: hive
+            java: 1.8
+            hadoop: hadoop2.7
+            hive: hive2.3
+            included-tags: org.apache.spark.tags.SlowHiveTest
+            comment: "- slow tests"
+          - modules: hive
+            java: 1.8
+            hadoop: hadoop2.7
+            hive: hive2.3
+            excluded-tags: org.apache.spark.tags.SlowHiveTest,org.apache.spark.tags.GitHubActionsUnstableTest
+            comment: "- other tests"
+          # SQL tests
+          - modules: sql
+            java: 1.8
+            hadoop: hadoop2.7
+            hive: hive2.3
+            included-tags: org.apache.spark.tags.ExtendedSQLTest
+            comment: "- slow tests"
+          - modules: sql
+            java: 1.8
+            hadoop: hadoop2.7
+            hive: hive2.3
+            excluded-tags: org.apache.spark.tags.ExtendedSQLTest,org.apache.spark.tags.GitHubActionsUnstableTest
+            comment: "- other tests"
+    env:
+      TEST_ONLY_MODULES: ${{ matrix.modules }}
+      TEST_ONLY_EXCLUDED_TAGS: ${{ matrix.excluded-tags }}
+      TEST_ONLY_INCLUDED_TAGS: ${{ matrix.included-tags }}
+      HADOOP_PROFILE: ${{ matrix.hadoop }}
+      HIVE_PROFILE: ${{ matrix.hive }}
+      # GitHub Actions' default miniconda to use in pip packaging test.
+      CONDA_PREFIX: /usr/share/miniconda
     steps:
-    - uses: actions/checkout@master
-    # We split caches because GitHub Action Cache has a 400MB-size limit.
-    - uses: actions/cache@v1
+    - name: Checkout Spark repository
+      uses: actions/checkout@v2
+    # Cache local repositories. Note that GitHub Actions cache has a 2G limit.
+    - name: Cache Scala, SBT, Maven and Zinc
+      uses: actions/cache@v1
       with:
         path: build
         key: build-${{ hashFiles('**/pom.xml') }}
         restore-keys: |
           build-
-    - uses: actions/cache@v1
-      with:
-        path: ~/.m2/repository/com
-        key: ${{ matrix.java }}-${{ matrix.hadoop }}-maven-com-${{ hashFiles('**/pom.xml') }}
-        restore-keys: |
-          ${{ matrix.java }}-${{ matrix.hadoop }}-maven-com-
-    - uses: actions/cache@v1
+    - name: Cache Maven local repository
+      uses: actions/cache@v2
       with:
-        path: ~/.m2/repository/org
-        key: ${{ matrix.java }}-${{ matrix.hadoop }}-maven-org-${{ hashFiles('**/pom.xml') }}
-        restore-keys: |
-          ${{ matrix.java }}-${{ matrix.hadoop }}-maven-org-
-    - uses: actions/cache@v1
-      with:
-        path: ~/.m2/repository/net
-        key: ${{ matrix.java }}-${{ matrix.hadoop }}-maven-net-${{ hashFiles('**/pom.xml') }}
+        path: ~/.m2/repository
+        key: ${{ matrix.java }}-${{ matrix.hadoop }}-maven-${{ hashFiles('**/pom.xml') }}
         restore-keys: |
-          ${{ matrix.java }}-${{ matrix.hadoop }}-maven-net-
-    - uses: actions/cache@v1
+          ${{ matrix.java }}-${{ matrix.hadoop }}-maven-
+    - name: Cache Ivy local repository
+      uses: actions/cache@v2
       with:
-        path: ~/.m2/repository/io
-        key: ${{ matrix.java }}-${{ matrix.hadoop }}-maven-io-${{ hashFiles('**/pom.xml') }}
+        path: ~/.ivy2/cache
+        key: ${{ matrix.java }}-${{ matrix.hadoop }}-ivy-${{ hashFiles('**/pom.xml') }}-${{ hashFiles('**/plugins.sbt') }}
         restore-keys: |
-          ${{ matrix.java }}-${{ matrix.hadoop }}-maven-io-
-    - name: Set up JDK ${{ matrix.java }}
+          ${{ matrix.java }}-${{ matrix.hadoop }}-ivy-
+    - name: Install JDK ${{ matrix.java }}
       uses: actions/setup-java@v1
       with:
         java-version: ${{ matrix.java }}
-    - name: Build with Maven
-      run: |
-        export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=1g -Dorg.slf4j.simpleLogger.defaultLogLevel=WARN"
-        export MAVEN_CLI_OPTS="--no-transfer-progress"
-        mkdir -p ~/.m2
-        ./build/mvn $MAVEN_CLI_OPTS -DskipTests -Pyarn -Pmesos -Pkubernetes -Phive -P${{ matrix.hive }} -Phive-thriftserver -P${{ matrix.hadoop }} -Phadoop-cloud -Djava.version=${{ matrix.java }} install
-        rm -rf ~/.m2/repository/org/apache/spark
-
-
-  lint:
-    runs-on: ubuntu-latest
-    name: Linters (Java/Scala/Python), licenses, dependencies
-    steps:
-    - uses: actions/checkout@master
-    - uses: actions/setup-java@v1
+    # PySpark
+    - name: Install PyPy3
+      # SQL component also has Python related tests, for example, IntegratedUDFTestUtils.
+      # Note that order of Python installations here matters because default python3 is
+      # overridden by pypy3.
+      uses: actions/setup-python@v2
+      if: contains(matrix.modules, 'pyspark') || (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-'))
       with:
-        java-version: '11'
-    - uses: actions/setup-python@v1
+        python-version: pypy3
+        architecture: x64
+    - name: Install Python 2.7
+      uses: actions/setup-python@v2
+      if: contains(matrix.modules, 'pyspark') || (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-'))
       with:
-        python-version: '3.x'
-        architecture: 'x64'
-    - name: Scala
-      run: ./dev/lint-scala
-    - name: Java
-      run: ./dev/lint-java
-    - name: Python
-      run: |
-        pip install flake8 sphinx numpy
-        ./dev/lint-python
-    - name: License
-      run: ./dev/check-license
-    - name: Dependencies
-      run: ./dev/test-dependencies.sh
-
-  lintr:
-    runs-on: ubuntu-latest
-    name: Linter (R)
-    steps:
-    - uses: actions/checkout@master
-    - uses: actions/setup-java@v1
+        python-version: 2.7
+        architecture: x64
+    - name: Install Python 3.6
+      uses: actions/setup-python@v2
+      if: contains(matrix.modules, 'pyspark') || (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-'))
       with:
-        java-version: '11'
-    - uses: r-lib/actions/setup-r@v1
+        python-version: 3.6
+        architecture: x64
+    - name: Install Python packages
+      if: contains(matrix.modules, 'pyspark') || (contains(matrix.modules, 'sql') && !contains(matrix.modules, 'sql-'))
+      # PyArrow is not supported in PyPy yet, see ARROW-2651.
+      # TODO(SPARK-32247): scipy installation with PyPy fails for an unknown reason.
+      run: |
+        python3 -m pip install numpy pyarrow pandas scipy
+        python3 -m pip list
+        python2 -m pip install numpy pyarrow pandas scipy
+        python2 -m pip list
+        pypy3 -m pip install numpy pandas
+        pypy3 -m pip list
+    # SparkR
+    - name: Install R 3.6
+      uses: r-lib/actions/setup-r@v1
+      if: contains(matrix.modules, 'sparkr')
       with:
-        r-version: '3.6.2'
-    - name: Install lib
+        r-version: 3.6
+    - name: Install R packages
+      if: contains(matrix.modules, 'sparkr')
       run: |
         sudo apt-get install -y libcurl4-openssl-dev
-    - name: install R packages
+        sudo Rscript -e "install.packages(c('knitr', 'rmarkdown', 'testthat', 'devtools', 'e1071', 'survival', 'arrow', 'roxygen2'), repos='https://cloud.r-project.org/')"
+        # Show installed packages in R.
+        sudo Rscript -e 'pkg_list <- as.data.frame(installed.packages()[, c(1,3:4)]); pkg_list[is.na(pkg_list$Priority), 1:2, drop = FALSE]'
+    # Run the tests.
+    - name: "Run tests: ${{ matrix.modules }}"
       run: |
-        sudo Rscript -e "install.packages(c('curl', 'xml2', 'httr', 'devtools', 'testthat', 'knitr', 'rmarkdown', 'roxygen2', 'e1071', 'survival'), repos='https://cloud.r-project.org/')"
-        sudo Rscript -e "devtools::install_github('jimhester/lintr@v2.0.0')"
-    - name: package and install SparkR
-      run: ./R/install-dev.sh
-    - name: lint-r
-      run: ./dev/lint-r
+        # Hive tests become flaky when running in parallel as it's too intensive.
+        if [[ "$TEST_ONLY_MODULES" == "hive" ]]; then export SERIAL_SBT_TESTS=1; fi
+        mkdir -p ~/.m2
+        ./dev/run-tests --parallelism 2
+        rm -rf ~/.m2/repository/org/apache/spark
 
-  docs:
+  # Static analysis, and documentation build
+  lint:
+    name: Linters, licenses, dependencies and documentation generation
     runs-on: ubuntu-latest
-    name: Generate documents
     steps:
-    - uses: actions/checkout@master
-    - uses: actions/cache@v1
+    - name: Checkout Spark repository
+      uses: actions/checkout@v2
+    - name: Cache Maven local repository
+      uses: actions/cache@v2
       with:
         path: ~/.m2/repository
         key: docs-maven-repo-${{ hashFiles('**/pom.xml') }}
         restore-keys: |
-          docs-maven-repo-
-    - uses: actions/setup-java@v1
+          docs-maven-
+    - name: Install JDK 1.8
+      uses: actions/setup-java@v1
       with:
-        java-version: '1.8'
-    - uses: actions/setup-python@v1
+        java-version: 1.8
+    - name: Install Python 3.6
+      uses: actions/setup-python@v2
       with:
-        python-version: '3.x'
-        architecture: 'x64'
-    - uses: actions/setup-ruby@v1
+        python-version: 3.6
+        architecture: x64
+    - name: Install Python linter dependencies
+      run: |
+        pip3 install flake8 sphinx numpy
+    - name: Install R 3.6
+      uses: r-lib/actions/setup-r@v1
       with:
-        ruby-version: '2.7'
-    - uses: r-lib/actions/setup-r@v1
+        r-version: 3.6
+    - name: Install R linter dependencies and SparkR
+      run: |
+        sudo apt-get install -y libcurl4-openssl-dev
+        sudo Rscript -e "install.packages(c('devtools'), repos='https://cloud.r-project.org/')"
+        sudo Rscript -e "devtools::install_github('jimhester/lintr@v2.0.0')"
+        ./R/install-dev.sh
+    - name: Install Ruby 2.7 for documentation generation
+      uses: actions/setup-ruby@v1
       with:
-        r-version: '3.6.2'
-    - name: Install lib and pandoc
+        ruby-version: 2.7
+    - name: Install dependencies for documentation generation
       run: |
         sudo apt-get install -y libcurl4-openssl-dev pandoc
-    - name: Install packages
-      run: |
         pip install sphinx mkdocs numpy
         gem install jekyll jekyll-redirect-from rouge
-        sudo Rscript -e "install.packages(c('curl', 'xml2', 'httr', 'devtools', 'testthat', 'knitr', 'rmarkdown', 'roxygen2', 'e1071', 'survival'), repos='https://cloud.r-project.org/')"
-    - name: Run jekyll build
+        sudo Rscript -e "install.packages(c('devtools', 'testthat', 'knitr', 'rmarkdown', 'roxygen2'), repos='https://cloud.r-project.org/')"
+    - name: Scala linter
+      run: ./dev/lint-scala
+    - name: Java linter
+      run: ./dev/lint-java
+    - name: Python linter
+      run: ./dev/lint-python
+    - name: R linter
+      run: ./dev/lint-r
+    - name: License test
+      run: ./dev/check-license
+    - name: Dependencies test
+      run: ./dev/test-dependencies.sh
+    - name: Run documentation build
       run: |
         cd docs
         jekyll build
```
org/apache/spark/tags/GitHubActionsUnstableTest.java (new file)

Lines changed: 30 additions & 0 deletions

```diff
@@ -0,0 +1,30 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.tags;
+
+import org.scalatest.TagAnnotation;
+
+import java.lang.annotation.ElementType;
+import java.lang.annotation.Retention;
+import java.lang.annotation.RetentionPolicy;
+import java.lang.annotation.Target;
+
+@TagAnnotation
+@Retention(RetentionPolicy.RUNTIME)
+@Target({ElementType.METHOD, ElementType.TYPE})
+public @interface GitHubActionsUnstableTest { }
```
org/apache/spark/tags/SlowHiveTest.java (new file)

Lines changed: 30 additions & 0 deletions

```diff
@@ -0,0 +1,30 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.tags;
+
+import org.scalatest.TagAnnotation;
+
+import java.lang.annotation.ElementType;
+import java.lang.annotation.Retention;
+import java.lang.annotation.RetentionPolicy;
+import java.lang.annotation.Target;
+
+@TagAnnotation
+@Retention(RetentionPolicy.RUNTIME)
+@Target({ElementType.METHOD, ElementType.TYPE})
+public @interface SlowHiveTest { }
```
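These tag annotations are what the `included-tags` and `excluded-tags` matrix values in the workflow refer to: ScalaTest suites annotated with them can be selected or skipped as a group. As a sketch, assuming `./dev/run-tests` honors the same `TEST_ONLY_*` variables outside CI as it does in the workflow above, the two Hive jobs boil down to:

```bash
# "hive - slow tests" job: only suites tagged with SlowHiveTest.
export TEST_ONLY_MODULES=hive
export TEST_ONLY_INCLUDED_TAGS=org.apache.spark.tags.SlowHiveTest
export SERIAL_SBT_TESTS=1  # the workflow sets this for Hive to reduce flakiness
./dev/run-tests --parallelism 2

# "hive - other tests" job: the remaining suites, minus the unstable ones.
export TEST_ONLY_INCLUDED_TAGS=
export TEST_ONLY_EXCLUDED_TAGS=org.apache.spark.tags.SlowHiveTest,org.apache.spark.tags.GitHubActionsUnstableTest
./dev/run-tests --parallelism 2
```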
