YARN-1964 Launching containers from docker #2

Closed

Conversation

ashahab-altiscale

This adds a new ContainerExecutor called DockerContainerExecutor.
This executor launches each YARN container inside a Docker container,
providing a full filesystem namespace and software isolation for the container.
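
For context, a minimal sketch of the yarn-site.xml wiring this implies. The property names below follow the DockerContainerExecutor documentation that later shipped with Hadoop; treat them as illustrative, not as this patch's final names:

    <property>
      <name>yarn.nodemanager.container-executor.class</name>
      <value>org.apache.hadoop.yarn.server.nodemanager.DockerContainerExecutor</value>
    </property>
    <property>
      <!-- path to the docker client binary; an assumed default -->
      <name>yarn.nodemanager.docker-container-executor.exec-name</name>
      <value>/usr/bin/docker</value>
    </property>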

@ashahab-altiscale ashahab-altiscale force-pushed the aws-yarn-1964-trunk-docker branch from b06d4c5 to 2b1b75e Compare September 22, 2014 22:51
@ashahab-altiscale ashahab-altiscale force-pushed the aws-yarn-1964-trunk-docker branch 4 times, most recently from bf1b521 to a9a896d Compare September 24, 2014 19:02
@ashahab-altiscale ashahab-altiscale deleted the aws-yarn-1964-trunk-docker branch September 29, 2014 02:35
asfgit pushed a commit that referenced this pull request Dec 2, 2014
lirui-apache pushed a commit to lirui-apache/hadoop that referenced this pull request Nov 12, 2015
tangzhankun referenced this pull request in tangzhankun/hadoop Feb 17, 2017
mekasone pushed a commit to mekasone/hadoop that referenced this pull request Feb 19, 2017
Slight improvements to the demo script
tangzhankun referenced this pull request in tangzhankun/hadoop Feb 24, 2017
add hadoop-yarn-mxnet project into HDL
isimonenko pushed a commit to metamx/hadoop that referenced this pull request Sep 15, 2017
leongu-tc pushed a commit to leongu-tc/hadoop that referenced this pull request Jul 15, 2019
steveloughran referenced this pull request in steveloughran/hadoop Oct 4, 2019
…tify slow points

Listing files is surprisingly slow.

Theories:
* the listFiles() call is the wrong scan for local (and HDFS?)
* overuse of Java 8 streams/maps, etc.

Explore #2 and then worry about #1. We must stay with listFiles() for the
magic committer's scans of S3, but for the staging committers we just need
a flat listing of the source dir with a filter, as sketched below.
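
A minimal sketch of that flat listing with the standard FileSystem API
(the .pendingset suffix and class name here are illustrative, not taken
from this change):

    import java.io.IOException;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FlatListSketch {
      // One non-recursive listStatus call with a filter, instead of the
      // recursive listFiles(path, true) treewalk the magic committer needs.
      static FileStatus[] listPending(FileSystem fs, Path sourceDir)
          throws IOException {
        return fs.listStatus(sourceDir,
            path -> path.getName().endsWith(".pendingset"));
      }
    }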

Change-Id: I7e29b6004e71b146500a95c9822c5eed17390fb4
asfgit pushed a commit that referenced this pull request Jan 25, 2020
Contributed by Steve Loughran.

This fixes two problems with S3Guard authoritative mode and
the auth directory flags which are stored in DynamoDB.

1. mkdirs was creating dir markers without the auth bit,
   forcing needless scans on newly created directories and
   files subsequently added; it was only with the first listStatus call
   on that directory that the dir would be marked as authoritative, even
   though it was already complete.

2. listStatus(path) would reset the authoritative status bit of all
   child directories even if they were already marked as authoritative.

Issue #2 is possibly the most expensive, as any treewalk using listStatus
(e.g. glob listings) would clear the auth bit for all child directories before
listing them. And this would happen every single time;
essentially, you were never getting authoritative directory listings.
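
Conceptually, the invariant the fix restores looks like this (a sketch
using public names from the s3guard package; the real call sites in
S3AFileSystem and S3Guard.java differ):

    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.s3a.s3guard.DirListingMetadata;
    import org.apache.hadoop.fs.s3a.s3guard.MetadataStore;

    class AuthoritativeListingSketch {
      // An authoritative cached listing is complete: serve it as-is and
      // never rewrite it in a way that clears its auth bit.
      static DirListingMetadata listIfAuthoritative(MetadataStore ms, Path path)
          throws IOException {
        DirListingMetadata cached = ms.listChildren(path);
        if (cached != null && cached.isAuthoritative()) {
          return cached;   // no need to rescan S3
        }
        return null;       // caller lists S3, then marks the merged result
                           // with setAuthoritative(true) before put()
      }
    }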

For the curious, the major bug was actually found during testing;
we'd all missed it during reviews.

A lesson there: the better the tests, the fewer the bugs.

Maybe also: something obvious and significant can get past code review.

	modified:   hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
	modified:   hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/BulkOperationState.java
	modified:   hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/DynamoDBMetadataStore.java
	modified:   hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/LocalMetadataStore.java
	modified:   hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStore.java
	modified:   hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/NullMetadataStore.java
	modified:   hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/S3Guard.java
	modified:   hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3GuardWriteBack.java
	modified:   hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/auth/ITestRestrictedReadAccess.java
	modified:   hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/impl/TestPartialDeleteFailures.java
	modified:   hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/ITestDynamoDBMetadataStore.java
	modified:   hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/ITestDynamoDBMetadataStoreAuthoritativeMode.java
	modified:   hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/ITestDynamoDBMetadataStoreScale.java
	modified:   hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/ITestS3GuardFsck.java
	modified:   hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStoreTestBase.java
	modified:   hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/TestS3Guard.java

Change-Id: Ic3ffda13f2af2430afedd50fd657b595c83e90a7
RogPodge pushed a commit to RogPodge/hadoop that referenced this pull request Mar 25, 2020
steveloughran referenced this pull request in steveloughran/hadoop Sep 3, 2020
* move from -expect to -min and -max; easier for CLI testing. Plus it works
  in -nonauth mode: even when policy == keep, files not in an auth path
  count as failure.
* the bucket-info option also prints out the authoritative path, so you have
  a better idea of what is happening
* reporting of command failure is more informative

The reason for change #2 is a workflow where you want to audit a dir even
though you are in keep mode and don't have any auth path. You'd expect
-nonauth to say "no auth path", but instead it treats the whole dir as
auth.
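
A plausible invocation after this change (the option names come from the
bullet points above; the bucket, path, and the rest of the s3guard CLI
syntax are assumptions):

    hadoop s3guard markers -audit -min 0 -max 0 -nonauth s3a://example-bucket/path/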

Change-Id: Ib310e321e5862957fbd92bebfade93231f92b16f
qinghui-xu pushed a commit to qinghui-xu/hadoop that referenced this pull request Feb 4, 2022
…o allow downgrade to 2.6 (apache#2)

For details see https://confluence.criteois.com/display/HAD/Namenode+downgrade

Co-authored-by: William Montaz <w.montaz@criteo.com>
bachue referenced this pull request in qiniu/hadoop Apr 6, 2023
Qiniu tests use XML file & sync project
Yifan122 added a commit to Yifan122/hadoop-apache that referenced this pull request Apr 19, 2023
@steveloughran steveloughran mentioned this pull request Oct 9, 2023
NyteKnight pushed a commit to NyteKnight/hadoop that referenced this pull request Jun 25, 2024
NyteKnight pushed a commit to NyteKnight/hadoop that referenced this pull request Jun 25, 2024
singer-bin pushed a commit to singer-bin/hadoop that referenced this pull request Dec 19, 2024
Reopening Parquet/parquet-mr#403 against the new Apache repository.

Author: Matthieu Martin <ma.tt.b.ma.rt.in+parquet@gmail.com>

Closes apache#2 from matt-martin/master and squashes the following commits:

99bb5a3 [Matthieu Martin] Minor javadoc and whitespace changes. Also added the FileStatusWrapper class to ParquetInputFormat to make sure that the debugging log statements print out meaningful paths.
250a398 [Matthieu Martin] Be less aggressive about checking whether the underlying file has been appended to/overwritten/deleted in order to minimize the number of namenode interactions.
d946445 [Matthieu Martin] Add javadocs to parquet.hadoop.LruCache.  Rename cache "entries" as cache "values" to avoid confusion with java.util.Map.Entry (which contains key value pairs whereas our old "entries" really only refer to the values).
a363622 [Matthieu Martin] Use LRU caching for footers in ParquetInputFormat.
singer-bin pushed a commit to singer-bin/hadoop that referenced this pull request Dec 19, 2024
…2 api

Currently, when creating a user-defined predicate with the new filter2 API,
no value can be passed in to build a dynamic filter at runtime. This limits
the usefulness of user-defined predicates, since meaningful predicates often
need runtime state. We can add a generic Object value that is passed through
the API and used internally in the keep function of the user-defined
predicate, enabling many different kinds of filters.
For example, in Spark SQL we can pass in the list of values from a
WHERE ... IN clause and filter row values against that list, as sketched below.
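
A sketch of the kind of stateful predicate this enables: an IN-set filter
whose value set is supplied at runtime (the class and field names are
illustrative; the package is the pre-rename filter2 API):

    import java.io.Serializable;
    import java.util.Set;
    import parquet.filter2.predicate.Statistics;
    import parquet.filter2.predicate.UserDefinedPredicate;

    public class InSetPredicate extends UserDefinedPredicate<Integer>
        implements Serializable {
      private final Set<Integer> values;  // runtime state, e.g. from WHERE ... IN

      public InSetPredicate(Set<Integer> values) {
        this.values = values;
      }

      @Override
      public boolean keep(Integer value) {
        return value != null && values.contains(value);
      }

      // Conservative: never drop a row group based on min/max statistics.
      @Override
      public boolean canDrop(Statistics<Integer> statistics) {
        return false;
      }

      @Override
      public boolean inverseCanDrop(Statistics<Integer> statistics) {
        return false;
      }
    }

With the serializable-instance support described above, such a predicate
can be attached via FilterApi.userDefined(...) with the value set baked in
at job-submission time.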

Author: Yash Datta <Yash.Datta@guavus.com>
Author: Alex Levenson <alexlevenson@twitter.com>
Author: Yash Datta <saucam@gmail.com>

Closes apache#73 from saucam/master and squashes the following commits:

7231a3b [Yash Datta] Merge pull request apache#3 from isnotinvain/alexlevenson/fix-binary-compat
dcc276b [Alex Levenson] Ignore binary incompatibility in private filter2 class
7bfa5ad [Yash Datta] Merge pull request apache#2 from isnotinvain/alexlevenson/simplify-udp-state
0187376 [Alex Levenson] Resolve merge conflicts
25aa716 [Alex Levenson] Simplify user defined predicates with state
51952f8 [Yash Datta] PARQUET-116: Fix whitespace
d7b7159 [Yash Datta] PARQUET-116: Make UserDefined abstract, add two subclasses, one accepting udp class, other accepting serializable udp instance
40d394a [Yash Datta] PARQUET-116: Fix whitespace
9a63611 [Yash Datta] PARQUET-116: Fix whitespace
7caa4dc [Yash Datta] PARQUET-116: Add ConfiguredUserDefined that takes a serializable udp directly
0eaabf4 [Yash Datta] PARQUET-116: Move the config object from keep method to a configure method in udp predicate
f51a431 [Yash Datta] PARQUET-116: Adding type safety for the filter object to be passed to user defined predicate
d5a2b9e [Yash Datta] PARQUET-116: Enforce that the filter object to be passed must be Serializable
dfd0478 [Yash Datta] PARQUET-116: Add a test case for passing a filter object to user defined predicate
4ab46ec [Yash Datta] PARQUET-116: Pass a filter object to user defined predicate in filter2 api
singer-bin pushed a commit to singer-bin/hadoop that referenced this pull request Dec 19, 2024
This buffer initializes itself to a default size when instantiated.
This leads to a lot of unused small buffers when there are a lot of empty columns.
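
The fix direction, in a sketch (illustrative only, not the actual
CapacityByteArrayOutputStream internals): allocate the first slab lazily,
so an instance that is never written to allocates nothing.

    public class LazySlabBuffer {
      private final int initialSlabSize;
      private byte[] slab;   // stays null until the first write
      private int used;

      public LazySlabBuffer(int initialSlabSize) {
        this.initialSlabSize = initialSlabSize;  // no allocation here
      }

      public void write(byte b) {
        if (slab == null) {
          slab = new byte[initialSlabSize];      // pay only on first use
        } else if (used == slab.length) {
          byte[] bigger = new byte[slab.length * 2];
          System.arraycopy(slab, 0, bigger, 0, used);
          slab = bigger;
        }
        slab[used++] = b;
      }

      public int size() {
        return used;
      }
    }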

Author: Alex Levenson <alexlevenson@twitter.com>
Author: julien <julien@twitter.com>
Author: Julien Le Dem <julien@twitter.com>

Closes apache#98 from julienledem/avoid_wasting_64K_per_empty_buffer and squashes the following commits:

b0200dd [julien] add license
a1b278e [julien] Merge branch 'master' into avoid_wasting_64K_per_empty_buffer
5304ee1 [julien] remove unused constant
81e399f [julien] Merge branch 'avoid_wasting_64K_per_empty_buffer' of github.com:julienledem/incubator-parquet-mr into avoid_wasting_64K_per_empty_buffer
ccf677d [julien] Merge branch 'master' into avoid_wasting_64K_per_empty_buffer
37148d6 [Julien Le Dem] Merge pull request apache#2 from isnotinvain/PR-98
b9abab0 [Alex Levenson] Address Julien's comment
965af7f [Alex Levenson] one more typo
9939d8d [Alex Levenson] fix typos in comments
61c0100 [Alex Levenson] Make initial slab size heuristic into a helper method, apply in DictionaryValuesWriter as well
a257ee4 [Alex Levenson] Improve IndexOutOfBoundsException message
64d6c7f [Alex Levenson] update comments
8b54667 [Alex Levenson] Don't use CapacityByteArrayOutputStream for writing page chunks
6a20e8b [Alex Levenson] Remove initialSlabSize decision from InternalParquetRecordReader, use a simpler heuristic in the column writers instead
3a0f8e4 [Alex Levenson] Use simpler settings for column chunk writer
b2736a1 [Alex Levenson] Some cleanup in CapacityByteArrayOutputStream
1df4a71 [julien] refactor CapacityByteArray to be aware of page size
95c8fb6 [julien] avoid wasting 64K per empty buffer.