Revert "[SPARK-40039][SS] Introducing a streaming checkpoint file man…
Browse files Browse the repository at this point in the history
…ager based on Hadoop's Abortable interface"

This reverts commit 7e4064c.
HyukjinKwon committed Aug 27, 2022
1 parent 8fb8532 commit fb4dba1
Showing 11 changed files with 59 additions and 539 deletions.
12 changes: 3 additions & 9 deletions docs/cloud-integration.md
@@ -231,15 +231,9 @@ The size of the window needs to be set to handle this.
is no need for a workflow of write-then-rename to ensure that files aren't picked up
while they are still being written. Applications can write straight to the monitored directory.

-1. In case of the default checkpoint file manager called `FileContextBasedCheckpointFileManager`
-streams should only be checkpointed to a store implementing a fast and
-atomic `rename()` operation. Otherwise the checkpointing may be slow and potentially unreliable.
-On AWS S3 with Hadoop 3.3.1 or later using the S3A connector the abortable stream based checkpoint
-file manager can be used (by setting the `spark.sql.streaming.checkpointFileManagerClass`
-configuration to `org.apache.spark.internal.io.cloud.AbortableStreamBasedCheckpointFileManager`)
-which eliminates the slow rename. In this case users must be extra careful to avoid the reuse of
-the checkpoint location among multiple queries running parallelly as that could lead to corruption
-of the checkpointing data.
+1. Streams should only be checkpointed to a store implementing a fast and
+atomic `rename()` operation.
+Otherwise the checkpointing may be slow and potentially unreliable.

## Committing work into cloud storage safely and fast.
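For context, a minimal sketch of the configuration the deleted docs paragraph described, assuming a Spark build that ships the hadoop-cloud module. The config key and class name come from the diff above; the session builder boilerplate and names are illustrative, not part of the commit:

```scala
// Sketch: enabling the abortable-stream checkpoint file manager the reverted
// docs described for S3A on Hadoop 3.3.1+. Config key and class name are taken
// from the docs text above; the app name is a placeholder.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("s3a-checkpoint-example") // placeholder name
  .config("spark.sql.streaming.checkpointFileManagerClass",
    "org.apache.spark.internal.io.cloud.AbortableStreamBasedCheckpointFileManager")
  .getOrCreate()

// Per the removed docs: never reuse one checkpoint location across queries
// running in parallel with this manager, as that could corrupt checkpoint data.
```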

20 changes: 0 additions & 20 deletions hadoop-cloud/README.md

This file was deleted.

47 changes: 0 additions & 47 deletions hadoop-cloud/pom.xml
@@ -49,13 +49,6 @@
<version>${project.version}</version>
<scope>provided</scope>
</dependency>
-<dependency>
-<groupId>org.apache.spark</groupId>
-<artifactId>spark-sql_${scala.binary.version}</artifactId>
-<version>${project.version}</version>
-<type>test-jar</type>
-<scope>test</scope>
-</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.binary.version}</artifactId>
@@ -219,22 +212,6 @@

<build>
<plugins>
-<plugin>
-<groupId>org.scalatest</groupId>
-<artifactId>scalatest-maven-plugin</artifactId>
-<executions>
-<execution>
-<id>test</id>
-<phase>test</phase>
-<goals>
-<goal>test</goal>
-</goals>
-<configuration>
-<tagsToExclude>org.apache.spark.internal.io.cloud.IntegrationTestSuite</tagsToExclude>
-</configuration>
-</execution>
-</executions>
-</plugin>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>build-helper-maven-plugin</artifactId>
@@ -320,30 +297,6 @@
</dependencies>
</profile>

-<profile>
-<id>integration-test</id>
-<build>
-<plugins>
-<plugin>
-<groupId>org.scalatest</groupId>
-<artifactId>scalatest-maven-plugin</artifactId>
-<executions>
-<execution>
-<id>test</id>
-<phase>test</phase>
-<goals>
-<goal>test</goal>
-</goals>
-<configuration>
-<tagsToExclude>None</tagsToExclude>
-<tagsToInclude>org.apache.spark.internal.io.cloud.IntegrationTestSuite</tagsToInclude>
-</configuration>
-</execution>
-</executions>
-</plugin>
-</plugins>
-</build>
-</profile>
</profiles>

</project>
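For context on what the removed `tagsToExclude`/`tagsToInclude` filtering did, here is a minimal ScalaTest sketch. The tag string mirrors the value in the removed plugin configuration; the tag object and suite below are hypothetical, and Spark's actual `IntegrationTestSuite` tag may be defined differently:

```scala
// Sketch: ScalaTest string-named tags are what the scalatest-maven-plugin's
// tagsToInclude/tagsToExclude settings matched against. The tag name mirrors
// the removed pom.xml; the suite and test body are placeholders.
import org.scalatest.Tag
import org.scalatest.funsuite.AnyFunSuite

object IntegrationTest
    extends Tag("org.apache.spark.internal.io.cloud.IntegrationTestSuite")

class ExampleCloudSuite extends AnyFunSuite {
  // Excluded by the default plugin config; included under the removed profile.
  test("runs only when the integration tag is included", IntegrationTest) {
    assert(1 + 1 == 2) // placeholder assertion
  }
}
```

Under the removed `integration-test` profile, `mvn test -Pintegration-test` would invert the default filter so that only tests carrying this tag execute.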

