Skip to content

[SPARK-6578] [core] Fix thread-safety issue in outbound path of network library. #5336

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

vanzin
Copy link
Contributor

@vanzin vanzin commented Apr 2, 2015

While the inbound path of a netty pipeline is thread-safe, the outbound
path is not. That means that multiple threads can compete to write messages
to the next stage of the pipeline.

The network library sometimes breaks a single RPC message into multiple
buffers internally to avoid copying data (see MessageEncoder). This can
result in the following scenario (where "FxBy" means "frame x, buffer y"):

           T1         F1B1            F1B2
                        \               \
                         \               \
           socket        F1B1   F2B1    F1B2  F2B2
                                 /             /
                                /             /
           T2                  F2B1         F2B2

And the frames now cannot be rebuilt on the receiving side because the
different messages have been mixed up on the wire.

The fix wraps these multi-buffer messages into a FileRegion object
so that these messages are written "atomically" to the next pipeline handler.

Marcelo Vanzin and others added 2 commits April 2, 2015 12:56
…rk library.

While the inbound path of a netty pipeline is thread-safe, the outbound
path is not. That means that multiple threads can compete to write messages
to the next stage of the pipeline.

The network library sometimes breaks a single RPC message into multiple
buffers internally to avoid copying data (see MessageEncoder). This can
result in the following scenario (where "FxBy" means "frame x, buffer y"):

               T1         F1B1            F1B2
                            \               \
                             \               \
               socket        F1B1   F2B1    F1B2  F2B2
                                     /             /
                                    /             /
               T2                  F2B1         F2B2

And the frames now cannot be rebuilt on the receiving side because the
different messages have been mixed up on the wire.

The fix wraps these multi-buffer messages into a `FileRegion` object
so that these messages are written "atomically" to the next pipeline handler.

Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes apache#5234 from vanzin/SPARK-6578 and squashes the following commits:

16b2d70 [Marcelo Vanzin] Forgot to update a type.
c9c2e4e [Marcelo Vanzin] Review comments: simplify some code.
9c888ac [Marcelo Vanzin] Small style nits.
8474bab [Marcelo Vanzin] Fix multiple calls to MessageWithHeader.transferTo().
e26509f [Marcelo Vanzin] Merge branch 'master' into SPARK-6578
c503f6c [Marcelo Vanzin] Implement a custom FileRegion instead of using locks.
84aa7ce [Marcelo Vanzin] Rename handler to the correct name.
432f3bd [Marcelo Vanzin] Remove unneeded method.
8d70e60 [Marcelo Vanzin] Fix thread-safety issue in outbound path of network library.

(cherry picked from commit f084c5d)

Conflicts:
	network/common/pom.xml
…hHeader.transferTo.

Author: Reynold Xin <rxin@databricks.com>

Closes apache#5319 from rxin/SPARK-6578 and squashes the following commits:

7c62a64 [Reynold Xin] Small rewrite to make the logic more clear in transferTo.

(cherry picked from commit 899ebcb)
@rxin
Copy link
Contributor

rxin commented Apr 2, 2015

Jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Apr 2, 2015

Test build #29622 has started for PR 5336 at commit 4d3395e.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Apr 2, 2015

Test build #29622 has finished for PR 5336 at commit 4d3395e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29622/
Test PASSed.

@rxin
Copy link
Contributor

rxin commented Apr 2, 2015

Thanks. Merging.

asfgit pushed a commit that referenced this pull request Apr 2, 2015
…ork library.

While the inbound path of a netty pipeline is thread-safe, the outbound
path is not. That means that multiple threads can compete to write messages
to the next stage of the pipeline.

The network library sometimes breaks a single RPC message into multiple
buffers internally to avoid copying data (see MessageEncoder). This can
result in the following scenario (where "FxBy" means "frame x, buffer y"):

               T1         F1B1            F1B2
                            \               \
                             \               \
               socket        F1B1   F2B1    F1B2  F2B2
                                     /             /
                                    /             /
               T2                  F2B1         F2B2

And the frames now cannot be rebuilt on the receiving side because the
different messages have been mixed up on the wire.

The fix wraps these multi-buffer messages into a `FileRegion` object
so that these messages are written "atomically" to the next pipeline handler.

Author: Reynold Xin <rxin@databricks.com>
Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes #5336 from vanzin/SPARK-6578-1.2 and squashes the following commits:

4d3395e [Reynold Xin] [SPARK-6578] Small rewrite to make the logic more clear in MessageWithHeader.transferTo.
526f230 [Marcelo Vanzin] [SPARK-6578] [core] Fix thread-safety issue in outbound path of network library.
@rxin
Copy link
Contributor

rxin commented Apr 2, 2015

Mind closing the PR? Since this is not in master, github won't close the pr automatically.

@vanzin vanzin closed this Apr 2, 2015
@vanzin vanzin deleted the SPARK-6578-1.2 branch April 2, 2015 23:21
markhamstra pushed a commit to markhamstra/spark that referenced this pull request Apr 15, 2015
…ork library.

While the inbound path of a netty pipeline is thread-safe, the outbound
path is not. That means that multiple threads can compete to write messages
to the next stage of the pipeline.

The network library sometimes breaks a single RPC message into multiple
buffers internally to avoid copying data (see MessageEncoder). This can
result in the following scenario (where "FxBy" means "frame x, buffer y"):

               T1         F1B1            F1B2
                            \               \
                             \               \
               socket        F1B1   F2B1    F1B2  F2B2
                                     /             /
                                    /             /
               T2                  F2B1         F2B2

And the frames now cannot be rebuilt on the receiving side because the
different messages have been mixed up on the wire.

The fix wraps these multi-buffer messages into a `FileRegion` object
so that these messages are written "atomically" to the next pipeline handler.

Author: Reynold Xin <rxin@databricks.com>
Author: Marcelo Vanzin <vanzin@cloudera.com>

Closes apache#5336 from vanzin/SPARK-6578-1.2 and squashes the following commits:

4d3395e [Reynold Xin] [SPARK-6578] Small rewrite to make the logic more clear in MessageWithHeader.transferTo.
526f230 [Marcelo Vanzin] [SPARK-6578] [core] Fix thread-safety issue in outbound path of network library.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants