-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-18817][SPARKR][SQL] change derby log output to temp dir #16330
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
928c2f1
8fc7033
b001730
ca4f08e
4604a53
2eb75f8
ac9fbfc
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -60,6 +60,7 @@ unsetHiveContext <- function() { | |
|
||
# Tests for SparkSQL functions in SparkR | ||
|
||
filesBefore <- list.files(path = sparkRDir, all.files = TRUE) | ||
sparkSession <- sparkR.session() | ||
sc <- callJStatic("org.apache.spark.sql.api.r.SQLUtils", "getJavaSparkContext", sparkSession) | ||
|
||
|
@@ -2909,6 +2910,39 @@ test_that("Collect on DataFrame when NAs exists at the top of a timestamp column | |
expect_equal(class(ldf3$col3), c("POSIXct", "POSIXt")) | ||
}) | ||
|
||
compare_list <- function(list1, list2) { | ||
# get testthat to show the diff by first making the 2 lists equal in length | ||
expect_equal(length(list1), length(list2)) | ||
l <- max(length(list1), length(list2)) | ||
length(list1) <- l | ||
length(list2) <- l | ||
expect_equal(sort(list1, na.last = TRUE), sort(list2, na.last = TRUE)) | ||
} | ||
|
||
# This should always be the **very last test** in this test file. | ||
test_that("No extra files are created in SPARK_HOME by starting session and making calls", { | ||
# Check that it is not creating any extra file. | ||
# Does not check the tempdir which would be cleaned up after. | ||
filesAfter <- list.files(path = sparkRDir, all.files = TRUE) | ||
|
||
expect_true(length(sparkRFilesBefore) > 0) | ||
# first, ensure derby.log is not there | ||
expect_false("derby.log" %in% filesAfter) | ||
# second, ensure only spark-warehouse is created when calling SparkSession, enableHiveSupport = F | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm a little confused how these two setdiff commands map to with or without hive support. Can we make this a bit more easier to understand ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. agreed. |
||
# note: currently all other test files have enableHiveSupport = F, so we capture the list of files | ||
# before creating a SparkSession with enableHiveSupport = T at the top of this test file | ||
# (filesBefore). The test here is to compare that (filesBefore) against the list of files before | ||
# any test is run in run-all.R (sparkRFilesBefore). | ||
# sparkRWhitelistSQLDirs is also defined in run-all.R, and should contain only 2 whitelisted dirs, | ||
# here allow the first value, spark-warehouse, in the diff, everything else should be exactly the | ||
# same as before any test is run. | ||
compare_list(sparkRFilesBefore, setdiff(filesBefore, sparkRWhitelistSQLDirs[[1]])) | ||
# third, ensure only spark-warehouse and metastore_db are created when enableHiveSupport = T | ||
# note: as the note above, after running all tests in this file while enableHiveSupport = T, we | ||
# check the list of files again. This time we allow both whitelisted dirs to be in the diff. | ||
compare_list(sparkRFilesBefore, setdiff(filesAfter, sparkRWhitelistSQLDirs)) | ||
}) | ||
|
||
unlink(parquetPath) | ||
unlink(orcPath) | ||
unlink(jsonPath) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lengths should be equal if we get to this line ? Or am I missing something ?
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the idea is to show enough information in the log without having to rerun the check manually to find out what is different.
the first check will show the numeric values but it wouldn't say how exactly they are different.
the next check (or moved to compare_list() here) will get testthat to dump the delta too, but first it must set the 2 lists into the same size etc..
in fact, all of these are well tested in "Check masked functions" test in test_context.R, just duplicated here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here's what it looks like
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it - that sounds good