
[SPARK-29248][SQL] Add PhysicalWriteInfo with number of partitions #25990


Closed
wants to merge 46 commits
Changes from all commits
Commits
46 commits
c281373
[SPARK-29248][SQL] Pass in number of partitions to WriteBuilder
edrevo Sep 27, 2019
903d863
move numpartitions to physicalbuildinfo
edrevo Nov 15, 2019
c6b95f5
Merge branch 'master' of https://github.com/apache/spark into add-par…
edrevo Nov 15, 2019
30c8800
fixes
edrevo Nov 15, 2019
15096b2
fixes
edrevo Nov 15, 2019
389afee
lint fixes
edrevo Nov 15, 2019
4f10e54
[SPARK-29655][SQL] Read bucketed tables obeys spark.sql.shuffle.parti…
wangyum Nov 15, 2019
ca4e894
PR feedback
edrevo Nov 15, 2019
16e3c4a
more pr feedback
edrevo Nov 15, 2019
ee4784b
[SPARK-26499][SQL][FOLLOW-UP] Replace `update` with `setByte` for Byt…
maropu Nov 15, 2019
1521889
[SPARK-29902][DOC][MINOR] Add listener event queue capacity configura…
shahidki31 Nov 15, 2019
848bdfa
[SPARK-29829][SQL] SHOW TABLE EXTENDED should do multi-catalog resolu…
planga82 Nov 15, 2019
c0507e0
[SPARK-29833][YARN] Add FileNotFoundException check for spark.yarn.jars
ulysses-you Nov 16, 2019
7720781
[SPARK-29127][SQL][PYTHON] Add a clue for Python related version info…
HyukjinKwon Nov 16, 2019
16e7195
[SPARK-29834][SQL] DESC DATABASE should look up catalog like v2 commands
fuwhu Nov 16, 2019
6d6b233
[SPARK-29343][SQL][FOLLOW-UP] Remove floating-point Sum/Average/Centr…
maropu Nov 16, 2019
1112fc6
[SPARK-29867][ML][PYTHON] Add __repr__ in Python ML Models
huaxingao Nov 16, 2019
f77c10d
[SPARK-29923][SQL][TESTS] Set io.netty.tryReflectionSetAccessible for…
dongjoon-hyun Nov 16, 2019
40ea4a1
[SPARK-29807][SQL] Rename "spark.sql.ansi.enabled" to "spark.sql.dial…
xuanyuanking Nov 16, 2019
d0470d6
[MINOR][TESTS] Ignore GitHub Action and AppVeyor file changes in testing
dongjoon-hyun Nov 16, 2019
5336473
[SPARK-29476][WEBUI] add tooltip for Thread
PavithraRamachandran Nov 16, 2019
e88267c
[SPARK-29928][SQL][TESTS] Check parsing timestamps up to microsecond …
MaxGekk Nov 17, 2019
cc12cf6
[SPARK-29378][R] Upgrade SparkR to use Arrow 0.15 API
dongjoon-hyun Nov 17, 2019
388a737
[SPARK-29858][SQL] ALTER DATABASE (SET DBPROPERTIES) should look up c…
fuwhu Nov 17, 2019
a9959be
[SPARK-29456][WEBUI] Improve tooltip for Session Statistics Table col…
PavithraRamachandran Nov 17, 2019
e1fc38b
[SPARK-29932][R][TESTS] lint-r should do non-zero exit in case of errors
dongjoon-hyun Nov 17, 2019
5eb8973
[SPARK-29930][SQL] Remove SQL configs declared to be removed in Spark…
MaxGekk Nov 17, 2019
c5f644c
[SPARK-16872][ML][PYSPARK] Impl Gaussian Naive Bayes Classifier
zhengruifeng Nov 18, 2019
d83cacf
[SPARK-29907][SQL] Move DELETE/UPDATE/MERGE relative rules to dmlStat…
Nov 18, 2019
f280c6a
[SPARK-29378][R][FOLLOW-UP] Remove manual installation of Arrow depen…
HyukjinKwon Nov 18, 2019
42f8f79
[SPARK-29936][R] Fix SparkR lint errors and add lint-r GitHub Action
dongjoon-hyun Nov 18, 2019
9ff8ac7
javadoc fixes
edrevo Nov 18, 2019
ee3bd6d
[SPARK-25694][SQL] Add a config for `URL.setURLStreamHandlerFactory`
jiangzho Nov 18, 2019
7391237
[SPARK-29020][SQL] Improving array_sort behaviour
Nov 18, 2019
5cebe58
[SPARK-29783][SQL] Support SQL Standard/ISO_8601 output style for int…
yaooqinn Nov 18, 2019
50f6d93
[SPARK-29870][SQL] Unify the logic of multi-units interval string to …
yaooqinn Nov 18, 2019
c32e228
[SPARK-29859][SQL] ALTER DATABASE (SET LOCATION) should look up catal…
fuwhu Nov 18, 2019
ae6b711
[SPARK-29941][SQL] Add ansi type aliases for char and decimal
yaooqinn Nov 18, 2019
ea010a2
[SPARK-29873][SQL][TEST][FOLLOWUP] set operations should not escape w…
yaooqinn Nov 18, 2019
9514b82
[SPARK-29777][SPARKR] SparkR::cleanClosure aggressively removes a fun…
falaki Nov 19, 2019
8469614
[SPARK-25694][SQL][FOLLOW-UP] Move 'spark.sql.defaultUrlStreamHandler…
HyukjinKwon Nov 19, 2019
882f54b
[SPARK-29870][SQL][FOLLOW-UP] Keep CalendarInterval's toString
HyukjinKwon Nov 19, 2019
28a502c
[SPARK-28527][FOLLOW-UP][SQL][TEST] Add guides for ThriftServerQueryT…
wangyum Nov 19, 2019
a834dba
Revert "[SPARK-29644][SQL] Corrected ShortType and ByteType mapping t…
shivsood Nov 19, 2019
3d45779
[SPARK-29728][SQL] Datasource V2: Support ALTER TABLE RENAME TO
imback82 Nov 19, 2019
118d81f
[SPARK-29248][SQL] Add PhysicalWriteInfo with number of partitions
edrevo Nov 19, 2019
25 changes: 24 additions & 1 deletion .github/workflows/master.yml
@@ -50,7 +50,7 @@ jobs:

lint:
runs-on: ubuntu-latest
name: Linters
name: Linters (Java/Scala/Python), licenses, dependencies
steps:
- uses: actions/checkout@master
- uses: actions/setup-java@v1
@@ -72,3 +72,26 @@ jobs:
run: ./dev/check-license
- name: Dependencies
run: ./dev/test-dependencies.sh

lintr:
runs-on: ubuntu-latest
name: Linter (R)
steps:
- uses: actions/checkout@master
- uses: actions/setup-java@v1
with:
java-version: '11'
- name: install R
run: |
echo 'deb https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/' | sudo tee -a /etc/apt/sources.list
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
sudo apt-get update
sudo apt-get install -y r-base r-base-dev libcurl4-openssl-dev
- name: install R packages
run: |
sudo Rscript -e "install.packages(c('curl', 'xml2', 'httr', 'devtools', 'testthat', 'knitr', 'rmarkdown', 'roxygen2', 'e1071', 'survival'), repos='https://cloud.r-project.org/')"
sudo Rscript -e "devtools::install_github('jimhester/lintr@v2.0.0')"
- name: package and install SparkR
run: ./R/install-dev.sh
- name: lint-r
run: ./dev/lint-r
2 changes: 1 addition & 1 deletion R/pkg/.lintr
@@ -1,2 +1,2 @@
linters: with_defaults(line_length_linter(100), multiple_dots_linter = NULL, object_name_linter = NULL, camel_case_linter = NULL, open_curly_linter(allow_single_line = TRUE), closed_curly_linter(allow_single_line = TRUE))
linters: with_defaults(line_length_linter(100), multiple_dots_linter = NULL, object_name_linter = NULL, camel_case_linter = NULL, open_curly_linter(allow_single_line = TRUE), closed_curly_linter(allow_single_line = TRUE), object_usage_linter = NULL, cyclocomp_linter = NULL)
exclusions: list("inst/profile/general.R" = 1, "inst/profile/shell.R")
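
The two linters newly set to NULL here, object_usage_linter and cyclocomp_linter, are the ones most likely to fire broadly on the existing SparkR sources. A rough sketch of what object_usage_linter reports, assuming lintr 2.0's defaults:

    # object_usage_linter flags locals that are assigned but never used:
    f <- function(x) {
      unused <- 1   # would be flagged as assigned but never used
      x + 1
    }
    # cyclocomp_linter instead flags functions whose cyclomatic complexity
    # exceeds lintr's default threshold.
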
8 changes: 4 additions & 4 deletions R/pkg/R/DataFrame.R
@@ -2252,7 +2252,7 @@ setMethod("mutate",

# The last column of the same name in the specific columns takes effect
deDupCols <- list()
for (i in 1:length(cols)) {
for (i in seq_len(length(cols))) {
deDupCols[[ns[[i]]]] <- alias(cols[[i]], ns[[i]])
}

@@ -2416,7 +2416,7 @@ setMethod("arrange",
# builds a list of columns of type Column
# example: [[1]] Column Species ASC
# [[2]] Column Petal_Length DESC
jcols <- lapply(seq_len(length(decreasing)), function(i){
jcols <- lapply(seq_len(length(decreasing)), function(i) {
if (decreasing[[i]]) {
desc(getColumn(x, by[[i]]))
} else {
@@ -2749,7 +2749,7 @@ genAliasesForIntersectedCols <- function(x, intersectedColNames, suffix) {
col <- getColumn(x, colName)
if (colName %in% intersectedColNames) {
newJoin <- paste(colName, suffix, sep = "")
if (newJoin %in% allColNames){
if (newJoin %in% allColNames) {
stop("The following column name: ", newJoin, " occurs more than once in the 'DataFrame'.",
"Please use different suffixes for the intersected columns.")
}
@@ -3475,7 +3475,7 @@ setMethod("str",
cat(paste0("'", class(object), "': ", length(names), " variables:\n"))

if (nrow(localDF) > 0) {
for (i in 1 : ncol(localDF)) {
for (i in seq_len(ncol(localDF))) {
# Get the first elements for each column

firstElements <- if (types[i] == "character") {
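
The recurring 1:length(x) to seq_len(length(x)) rewrites in this and the following R files guard against the empty-input case, where 1:length(x) silently produces a two-element index vector. A minimal illustration:

    cols <- list()            # empty input
    1:length(cols)            # c(1, 0): a for loop over this runs twice
    seq_len(length(cols))     # integer(0): the loop body is skipped, as intended
    seq_along(cols)           # equivalent shorthand when indexing cols itself
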
8 changes: 4 additions & 4 deletions R/pkg/R/SQLContext.R
@@ -166,9 +166,9 @@ writeToFileInArrow <- function(fileName, rdf, numPartitions) {
for (rdf_slice in rdf_slices) {
batch <- arrow::record_batch(rdf_slice)
if (is.null(stream_writer)) {
stream <- arrow::FileOutputStream(fileName)
stream <- arrow::FileOutputStream$create(fileName)
schema <- batch$schema
stream_writer <- arrow::RecordBatchStreamWriter(stream, schema)
stream_writer <- arrow::RecordBatchStreamWriter$create(stream, schema)
}

stream_writer$write_batch(batch)
@@ -197,7 +197,7 @@ getSchema <- function(schema, firstRow = NULL, rdd = NULL) {
as.list(schema)
}
if (is.null(names)) {
names <- lapply(1:length(firstRow), function(x) {
names <- lapply(seq_len(length(firstRow)), function(x) {
paste0("_", as.character(x))
})
}
@@ -213,7 +213,7 @@ getSchema <- function(schema, firstRow = NULL, rdd = NULL) {
})

types <- lapply(firstRow, infer_type)
fields <- lapply(1:length(firstRow), function(i) {
fields <- lapply(seq_len(length(firstRow)), function(i) {
structField(names[[i]], types[[i]], TRUE)
})
schema <- do.call(structType, fields)
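
The FileOutputStream and RecordBatchStreamWriter changes track the Arrow 0.15 R API, where these classes are constructed through $create() factory methods. A minimal standalone sketch of the same write path, assuming the arrow R package (>= 0.15) is installed; the output path and the close() cleanup calls are illustrative:

    library(arrow)

    stream <- FileOutputStream$create("/tmp/example-batches.bin")
    batch <- record_batch(data.frame(x = 1:3))
    writer <- RecordBatchStreamWriter$create(stream, batch$schema)
    writer$write_batch(batch)
    # assumed cleanup per the R6-style API
    writer$close()
    stream$close()
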
2 changes: 1 addition & 1 deletion R/pkg/R/context.R
@@ -416,7 +416,7 @@ spark.getSparkFiles <- function(fileName) {
#' @examples
#'\dontrun{
#' sparkR.session()
#' doubled <- spark.lapply(1:10, function(x){2 * x})
#' doubled <- spark.lapply(1:10, function(x) {2 * x})
#'}
#' @note spark.lapply since 2.0.0
spark.lapply <- function(list, func) {
2 changes: 1 addition & 1 deletion R/pkg/R/deserialize.R
@@ -242,7 +242,7 @@ readDeserializeInArrow <- function(inputCon) {
# for now.
dataLen <- readInt(inputCon)
arrowData <- readBin(inputCon, raw(), as.integer(dataLen), endian = "big")
batches <- arrow::RecordBatchStreamReader(arrowData)$batches()
batches <- arrow::RecordBatchStreamReader$create(arrowData)$batches()

if (useAsTibble) {
as_tibble <- get("as_tibble", envir = asNamespace("arrow"))
2 changes: 1 addition & 1 deletion R/pkg/R/group.R
@@ -162,7 +162,7 @@ methods <- c("avg", "max", "mean", "min", "sum")
#' @note pivot since 2.0.0
setMethod("pivot",
signature(x = "GroupedData", colname = "character"),
function(x, colname, values = list()){
function(x, colname, values = list()) {
stopifnot(length(colname) == 1)
if (length(values) == 0) {
result <- callJMethod(x@sgd, "pivot", colname)
14 changes: 9 additions & 5 deletions R/pkg/R/utils.R
@@ -131,7 +131,7 @@ hashCode <- function(key) {
} else {
asciiVals <- sapply(charToRaw(key), function(x) { strtoi(x, 16L) })
hashC <- 0
for (k in 1:length(asciiVals)) {
for (k in seq_len(length(asciiVals))) {
hashC <- mult31AndAdd(hashC, asciiVals[k])
}
as.integer(hashC)
@@ -543,10 +543,14 @@ processClosure <- function(node, oldEnv, defVars, checkedFuncs, newEnv) {
funcList <- mget(nodeChar, envir = checkedFuncs, inherits = F,
ifnotfound = list(list(NULL)))[[1]]
found <- sapply(funcList, function(func) {
ifelse(identical(func, obj), TRUE, FALSE)
ifelse(
identical(func, obj) &&
# Also check if the parent environment is identical to current parent
identical(parent.env(environment(func)), func.env),
TRUE, FALSE)
})
if (sum(found) > 0) {
# If function has been examined, ignore.
# If function has been examined ignore
break
}
# Function has not been examined, record it and recursively clean its closure.
@@ -724,7 +728,7 @@ assignNewEnv <- function(data) {
stopifnot(length(cols) > 0)

env <- new.env()
for (i in 1:length(cols)) {
for (i in seq_len(length(cols))) {
assign(x = cols[i], value = data[, cols[i], drop = F], envir = env)
}
env
@@ -750,7 +754,7 @@ launchScript <- function(script, combinedArgs, wait = FALSE, stdout = "", stderr
if (.Platform$OS.type == "windows") {
scriptWithArgs <- paste(script, combinedArgs, sep = " ")
# on Windows, intern = F seems to mean output to the console. (documentation on this is missing)
shell(scriptWithArgs, translate = TRUE, wait = wait, intern = wait) # nolint
shell(scriptWithArgs, translate = TRUE, wait = wait, intern = wait)
} else {
# http://stat.ethz.ch/R-manual/R-devel/library/base/html/system2.html
# stdout = F means discard output
2 changes: 1 addition & 1 deletion R/pkg/inst/worker/worker.R
@@ -194,7 +194,7 @@ if (isEmpty != 0) {
} else {
# gapply mode
outputs <- list()
for (i in 1:length(data)) {
for (i in seq_len(length(data))) {
# Timing reading input data for execution
inputElap <- elapsedSecs()
output <- compute(mode, partition, serializer, deserializer, keys[[i]],
4 changes: 2 additions & 2 deletions R/pkg/tests/fulltests/test_sparkSQL.R
@@ -172,7 +172,7 @@ test_that("structField type strings", {
typeList <- c(primitiveTypes, complexTypes)
typeStrings <- names(typeList)

for (i in seq_along(typeStrings)){
for (i in seq_along(typeStrings)) {
typeString <- typeStrings[i]
expected <- typeList[[i]]
testField <- structField("_col", typeString)
@@ -203,7 +203,7 @@ test_that("structField type strings", {
errorList <- c(primitiveErrors, complexErrors)
typeStrings <- names(errorList)

for (i in seq_along(typeStrings)){
for (i in seq_along(typeStrings)) {
typeString <- typeStrings[i]
expected <- paste0("Unsupported type for SparkDataframe: ", errorList[[i]])
expect_error(structField("_col", typeString), expected)
9 changes: 9 additions & 0 deletions R/pkg/tests/fulltests/test_utils.R
@@ -110,6 +110,15 @@ test_that("cleanClosure on R functions", {
actual <- get("y", envir = env, inherits = FALSE)
expect_equal(actual, y)

# Test for combination of nested and sequential functions in a closure
f1 <- function(x) x + 1
f2 <- function(x) f1(x) + 2
userFunc <- function(x) { f1(x); f2(x) }
cUserFuncEnv <- environment(cleanClosure(userFunc))
expect_equal(length(cUserFuncEnv), 2)
innerCUserFuncEnv <- environment(cUserFuncEnv$f2)
expect_equal(length(innerCUserFuncEnv), 1)

# Test for function (and variable) definitions.
f <- function(x) {
g <- function(y) { y * 2 }
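
The new cleanClosure test mirrors the user-facing scenario from SPARK-29777, where a function shipped to executors depends on helpers that in turn call each other. A minimal sketch of that shape, with spark.lapply as one public entry point that exercises closure capture; the helper names follow the test above:

    library(SparkR)
    sparkR.session()

    f1 <- function(x) x + 1
    f2 <- function(x) f1(x) + 2

    # Both f1 and f2 must survive closure cleaning for this to run on executors.
    spark.lapply(1:4, function(x) f1(x) + f2(x))
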
2 changes: 1 addition & 1 deletion R/run-tests.sh
@@ -23,7 +23,7 @@ FAILED=0
LOGFILE=$FWDIR/unit-tests.out
rm -f $LOGFILE

SPARK_TESTING=1 NOT_CRAN=true $FWDIR/../bin/spark-submit --driver-java-options "-Dlog4j.configuration=file:$FWDIR/log4j.properties" --conf spark.hadoop.fs.defaultFS="file:///" $FWDIR/pkg/tests/run-all.R 2>&1 | tee -a $LOGFILE
SPARK_TESTING=1 NOT_CRAN=true $FWDIR/../bin/spark-submit --driver-java-options "-Dlog4j.configuration=file:$FWDIR/log4j.properties" --conf spark.hadoop.fs.defaultFS="file:///" --conf spark.driver.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" --conf spark.executor.extraJavaOptions="-Dio.netty.tryReflectionSetAccessible=true" $FWDIR/pkg/tests/run-all.R 2>&1 | tee -a $LOGFILE
FAILED=$((PIPESTATUS[0]||$FAILED))

NUM_TEST_WARNING="$(grep -c -e 'Warnings ----------------' $LOGFILE)"
5 changes: 1 addition & 4 deletions appveyor.yml
@@ -42,10 +42,7 @@ install:
# Install maven and dependencies
- ps: .\dev\appveyor-install-dependencies.ps1
# Required package for R unit tests
- cmd: R -e "install.packages(c('knitr', 'rmarkdown', 'e1071', 'survival'), repos='https://cloud.r-project.org/')"
# Use Arrow R 0.14.1 for now. 0.15.0 seems not working for now. See SPARK-29378.
- cmd: R -e "install.packages(c('assertthat', 'bit64', 'fs', 'purrr', 'R6', 'tidyselect'), repos='https://cloud.r-project.org/')"
- cmd: R -e "install.packages('https://cran.r-project.org/src/contrib/Archive/arrow/arrow_0.14.1.tar.gz', repos=NULL, type='source')"
- cmd: R -e "install.packages(c('knitr', 'rmarkdown', 'e1071', 'survival', 'arrow'), repos='https://cloud.r-project.org/')"
# Here, we use the fixed version of testthat. For more details, please see SPARK-22817.
# As of devtools 2.1.0, it requires testthat higher then 2.1.1 as a dependency. SparkR test requires testthat 1.0.2.
# Therefore, we don't use devtools but installs it directly from the archive including its dependencies.
Changes to the CalendarInterval Java test suite (file path header not captured)
@@ -46,36 +46,6 @@ public void equalsTest() {
assertEquals(i1, i6);
}

@Test
public void toStringTest() {
CalendarInterval i;

i = new CalendarInterval(0, 0, 0);
assertEquals("0 seconds", i.toString());

i = new CalendarInterval(34, 0, 0);
assertEquals("2 years 10 months", i.toString());

i = new CalendarInterval(-34, 0, 0);
assertEquals("-2 years -10 months", i.toString());

i = new CalendarInterval(0, 31, 0);
assertEquals("31 days", i.toString());

i = new CalendarInterval(0, -31, 0);
assertEquals("-31 days", i.toString());

i = new CalendarInterval(0, 0, 3 * MICROS_PER_HOUR + 13 * MICROS_PER_MINUTE + 123);
assertEquals("3 hours 13 minutes 0.000123 seconds", i.toString());

i = new CalendarInterval(0, 0, -3 * MICROS_PER_HOUR - 13 * MICROS_PER_MINUTE - 123);
assertEquals("-3 hours -13 minutes -0.000123 seconds", i.toString());

i = new CalendarInterval(34, 31, 3 * MICROS_PER_HOUR + 13 * MICROS_PER_MINUTE + 123);
assertEquals("2 years 10 months 31 days 3 hours 13 minutes 0.000123 seconds",
i.toString());
}

@Test
public void periodAndDurationTest() {
CalendarInterval interval = new CalendarInterval(120, -40, 123456);
Changes to ExecutorThreadDumpPage.scala (file path header not captured)
@@ -89,7 +89,12 @@ private[ui] class ExecutorThreadDumpPage(
<th onClick="collapseAllThreadStackTrace(false)">Thread ID</th>
<th onClick="collapseAllThreadStackTrace(false)">Thread Name</th>
<th onClick="collapseAllThreadStackTrace(false)">Thread State</th>
<th onClick="collapseAllThreadStackTrace(false)">Thread Locks</th>
<th onClick="collapseAllThreadStackTrace(false)">
<span data-toggle="tooltip" data-placement="top"
title="Objects whose lock the thread currently holds">
Thread Locks
</span>
</th>
</thead>
<tbody>{dumpRows}</tbody>
</table>
5 changes: 4 additions & 1 deletion dev/lint-r
@@ -17,14 +17,17 @@
# limitations under the License.
#

set -o pipefail
set -e

SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )"
SPARK_ROOT_DIR="$(dirname $SCRIPT_DIR)"
LINT_R_REPORT_FILE_NAME="$SPARK_ROOT_DIR/dev/lint-r-report.log"


if ! type "Rscript" > /dev/null; then
echo "ERROR: You should install R"
exit
exit 1
fi

`which Rscript` --vanilla "$SPARK_ROOT_DIR/dev/lint-r.R" "$SPARK_ROOT_DIR" | tee "$LINT_R_REPORT_FILE_NAME"
2 changes: 1 addition & 1 deletion dev/lint-r.R
@@ -27,7 +27,7 @@ if (! library(SparkR, lib.loc = LOCAL_LIB_LOC, logical.return = TRUE)) {
# Installs lintr from Github in a local directory.
# NOTE: The CRAN's version is too old to adapt to our rules.
if ("lintr" %in% row.names(installed.packages()) == FALSE) {
devtools::install_github("jimhester/lintr@5431140")
devtools::install_github("jimhester/lintr@v2.0.0")
}

library(lintr)
7 changes: 6 additions & 1 deletion dev/run-tests.py
@@ -43,15 +43,20 @@ def determine_modules_for_files(filenames):
"""
Given a list of filenames, return the set of modules that contain those files.
If a file is not associated with a more specific submodule, then this method will consider that
file to belong to the 'root' module.
file to belong to the 'root' module. GitHub Action and Appveyor files are ignored.

>>> sorted(x.name for x in determine_modules_for_files(["python/pyspark/a.py", "sql/core/foo"]))
['pyspark-core', 'sql']
>>> [x.name for x in determine_modules_for_files(["file_not_matched_by_any_subproject"])]
['root']
>>> [x.name for x in determine_modules_for_files( \
[".github/workflows/master.yml", "appveyor.yml"])]
[]
"""
changed_modules = set()
for filename in filenames:
if filename in (".github/workflows/master.yml", "appveyor.yml"):
continue
matched_at_least_one_module = False
for module in modules.all_modules:
if module.contains_file(filename):
45 changes: 45 additions & 0 deletions docs/configuration.md
@@ -1857,6 +1857,51 @@ Apart from these, the following properties are also available, and may be useful
driver using more memory.
</td>
</tr>
<tr>
<td><code>spark.scheduler.listenerbus.eventqueue.shared.capacity</code></td>
<td><code>spark.scheduler.listenerbus.eventqueue.capacity</code></td>
<td>
Capacity for shared event queue in Spark listener bus, which hold events for external listener(s)
that register to the listener bus. Consider increasing value, if the listener events corresponding
to shared queue are dropped. Increasing this value may result in the driver using more memory.
</td>
</tr>
<tr>
<td><code>spark.scheduler.listenerbus.eventqueue.appStatus.capacity</code></td>
<td><code>spark.scheduler.listenerbus.eventqueue.capacity</code></td>
<td>
Capacity for appStatus event queue, which hold events for internal application status listeners.
Consider increasing value, if the listener events corresponding to appStatus queue are dropped.
Increasing this value may result in the driver using more memory.
</td>
</tr>
<tr>
<td><code>spark.scheduler.listenerbus.eventqueue.executorManagement.capacity</code></td>
<td><code>spark.scheduler.listenerbus.eventqueue.capacity</code></td>
<td>
Capacity for executorManagement event queue in Spark listener bus, which hold events for internal
executor management listeners. Consider increasing value if the listener events corresponding to
executorManagement queue are dropped. Increasing this value may result in the driver using more memory.
</td>
</tr>
<tr>
<td><code>spark.scheduler.listenerbus.eventqueue.eventLog.capacity</code></td>
<td><code>spark.scheduler.listenerbus.eventqueue.capacity</code></td>
<td>
Capacity for eventLog queue in Spark listener bus, which hold events for Event logging listeners
that write events to eventLogs. Consider increasing value if the listener events corresponding to eventLog queue
are dropped. Increasing this value may result in the driver using more memory.
</td>
</tr>
<tr>
<td><code>spark.scheduler.listenerbus.eventqueue.streams.capacity</code></td>
<td><code>spark.scheduler.listenerbus.eventqueue.capacity</code></td>
<td>
Capacity for streams queue in Spark listener bus, which hold events for internal streaming listener.
Consider increasing value if the listener events corresponding to streams queue are dropped. Increasing
this value may result in the driver using more memory.
</td>
</tr>
<tr>
<td><code>spark.scheduler.blacklist.unschedulableTaskSetTimeout</code></td>
<td>120s</td>
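
The per-queue capacities documented above are ordinary Spark configuration properties and fall back to spark.scheduler.listenerbus.eventqueue.capacity when unset. For example, they can be raised at session start from SparkR; the values below are illustrative, not recommendations:

    library(SparkR)
    sparkR.session(sparkConfig = list(
      spark.scheduler.listenerbus.eventqueue.appStatus.capacity = "20000",
      spark.scheduler.listenerbus.eventqueue.eventLog.capacity = "20000"
    ))
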