Skip to content
This repository was archived by the owner on Nov 11, 2022. It is now read-only.

Backport apache/incubator-beam#1327 #508

Merged

Conversation

mizitch
Copy link
Contributor

@mizitch mizitch commented Dec 14, 2016

Includes:

  • Limit max memory for ExternalSorter and BufferedExternalSorter to 2047 MB to prevent int overflow within Hadoop's sorting library
  • Fix int overflow for large memory values in InMemorySorter
  • Add note about estimated disk use to README.MD
  • Fix to make Hadoop's sorting library put all temp files under the specified directory
  • Have Hadoop clean up the temp directory on exit
  • Stop shading hadoop dependencies. Some context:
    ** The existing shading is broken (modules that depend on this one cannot use it successfully).
    ** Hadoop's use of reflection in several instances makes shading the dependency "in a good way" nearly impossible. It requires a couple of rather brittle hacks, and, for clients that depend on certain conflicting versions of hadoop these hacks can mean it doesn't meet its intended goal of preventing conflicts anyway.
    ** From what I can tell, there's no good way to shade this to make it universally usable, so leaving it unshaded seems like a reasonable default.
    ** Without shading Hadoop, this module can be successfully used from Beam's wordcount example (which actually does have pre-existing hadoop dependencies already).

@mizitch mizitch changed the title Some minor changes and fixes for sorter module Backport: some minor changes and fixes for sorter module Dec 14, 2016
@mizitch mizitch changed the title Backport: some minor changes and fixes for sorter module Backport apache/incubator-beam#1327 Dec 14, 2016
Some minor changes and fixes for sorter module

Includes:

* Limit max memory for ExternalSorter and BufferedExternalSorter to 2047 MB to prevent int overflow within Hadoop's sorting library
* Fix int overflow for large memory values in InMemorySorter
* Add note about estimated disk use to README.MD
* Fix to make Hadoop's sorting library put all temp files under the specified directory
* Have Hadoop clean up the temp directory on exit
* Stop shading hadoop dependencies. Some context:
** The existing shading is broken (modules that depend on this one cannot use it successfully).
** Hadoop's use of reflection in several instances makes shading the dependency "in a good way" nearly impossible. It requires a couple of rather brittle hacks, and, for clients that depend on certain conflicting versions of hadoop these hacks can mean it doesn't meet its intended goal of preventing conflicts anyway.
** From what I can tell, there's no good way to shade this to make it universally usable, so leaving it unshaded seems like a reasonable default.
** Without shading Hadoop, this module can be successfully used from Beam's wordcount example (which actually does have pre-existing hadoop dependencies already).
@mizitch mizitch force-pushed the backport-sorter-changes3 branch from d22208e to defcd01 Compare December 14, 2016 21:19
@mizitch
Copy link
Contributor Author

mizitch commented Dec 14, 2016

R: @dhalperi

@dhalperi
Copy link
Contributor

LGTM, thanks.

@dhalperi dhalperi merged commit 8981893 into GoogleCloudPlatform:master Dec 16, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants