
ODP-2462: HDFS-3246|HDFS-14564|HDFS-14304|HDFS-14033|HDFS-14267|HDFS-15977|HDFS-14846|HDFS-14111|HDFS-14478 Making Hadoop 3.2.3.3.2.3.3-2 compatible with Impala 4.4.0 #60


Closed
wants to merge 40 commits into from

Conversation

deepakdamri

The branch compiles successfully.

Manual change: commented out @Override on the readFully method in hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/CryptoInputStream.java.

Open Source commits:

HDFS-3246. pRead equivalent for direct read path. (apache#597)
HDFS-14564. Add libhdfs APIs for readFully; add readFully to ByteBufferPositionedReadable. (apache#963) Contributed by Sahil Takiar.
HDFS-14304. High lock contention on hdfsHashMutex in libhdfs.
HDFS-14033. [libhdfs++] Disable libhdfs++ build on systems that do not support thread_local. Contributed by Anatoli Shein.
HDFS-14267. Add test_libhdfs_ops to libhdfs tests, mark libhdfs_read/write.c as examples. Contributed by Sahil Takiar. Signed-off-by: Wei-Chiu Chuang <weichiu@apache.org>
HDFS-15977. Call explicit_bzero only if it is available. (apache#2914)
HDFS-14846. libhdfs tests are failing on trunk due to JNI usage bugs.
HDFS-14111. hdfsOpenFile on HDFS causes unnecessary IO from file offset 0.
HDFS-14478. Add libhdfs APIs for openFile.
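The positioned readFully contract that HDFS-3246/HDFS-14564 add to the direct read path can be sketched in plain Java against a FileChannel. This is a conceptual model only: ReadFullySketch and its readFully signature are illustrative, not the actual Hadoop ByteBufferPositionedReadable or libhdfs API.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Conceptual sketch of readFully(position, buffer): read at an absolute
// offset without moving the stream's own position, looping until the
// buffer is full, and failing if EOF is reached first.
public class ReadFullySketch {
    public static void readFully(FileChannel ch, long position, ByteBuffer buf)
            throws IOException {
        while (buf.hasRemaining()) {
            int n = ch.read(buf, position); // positioned read: channel position unchanged
            if (n < 0) {
                throw new IOException("EOF before buffer was filled");
            }
            position += n;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("readfully", ".bin");
        Files.write(tmp, "hello-hadoop".getBytes());
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(6);
            readFully(ch, 6, buf);          // read "hadoop" at offset 6
            System.out.println(new String(buf.array())); // prints "hadoop"
        } finally {
            Files.delete(tmp);
        }
    }
}
```

The loop matters: a single positioned read may return fewer bytes than requested, which is exactly the gap readFully closes for callers such as Impala's direct read path.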

sahilTakiar and others added 20 commits December 2, 2024 19:03
…erPositionedReadable (apache#963) Contributed by Sahil Takiar.

Reviewed-by: Siyao Meng <smeng@cloudera.com>
HDFS-3246: pRead equivalent for direct read path

Contributed by Sahil Takiar
…t support thread_local. Contributed by Anatoli Shein.

(cherry picked from commit 9c438ab)
…write.c as examples. Contributed by Sahil Takiar.

Signed-off-by: Wei-Chiu Chuang <weichiu@apache.org>
This closes apache#595

Signed-off-by: Todd Lipcon <todd@apache.org>
Reviewed-by: Masatake Iwasaki <iwasakims@apache.org>
Reviewed-by: Inigo Goiri <inigoiri@apache.org>
(cherry picked from commit f0241ec)

 Conflicts:
	hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/lib/x-platform/syscall_linux.cc
… Contributed by Syed Shameerur Rahman.

Reviewed-by: Akira Ajisaka <aajisaka@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
(cherry picked from commit 3b9faf6)
Contributed by Daniel Templeton

(cherry picked from commit 5446e3c)
(cherry picked from commit 11d6c4e)

Co-authored-by: Eric Yang <eyang@apache.org>
… Node Placement. Contributed by Prabhu Joseph and Qi Zhu (#43)

(cherry picked from commit bc815b3)
…4 upgrade. (apache#5311) (#47)

* MAPREDUCE-7431. ShuffleHandler refactor and fix after Netty4 upgrade. (apache#5311)

(cherry picked from commit 151b71d)
(cherry picked from commit 8271c03)

* MAPREDUCE-7431: fix compile

(cherry picked from commit 1aaf4a6)

---------

Co-authored-by: Tamas Domok <tdomok@cloudera.com>
…oder.java (apache#5388) (#48)

(cherry picked from commit e4b5314)
(cherry picked from commit df1cf3e)

Co-authored-by: Tamas Domok <tdomok@cloudera.com>
…mas Domok (#49)

(cherry picked from commit 8f6be36)
(cherry picked from commit 14a608b)

Co-authored-by: Szilard Nemeth <snemeth@apache.org>
#50)

(cherry picked from commit 1fddf35)

Co-authored-by: manishsinghmowall <manishsingh@acceldata.io>
(cherry picked from commit d6b90a7)
(cherry picked from commit 7e2516f)
<artifactId>jna</artifactId>
<version>${jna.version}</version>
</dependency>
<dependency>
@prisma-cloud-devsecops prisma-cloud-devsecops bot Jan 21, 2025

org.xerial.snappy:snappy-java / pom.xml

🎉   All vulnerabilities were fixed

Total vulnerabilities: 4

Critical: 0 High: 2 Medium: 2 Low: 0
Vulnerability ID   Severity  CVSS  Fixed in  Status
CVE-2023-34455     HIGH      7.5   1.1.10.1  Fixed
CVE-2023-43642     HIGH      7.5   1.1.10.4  Fixed
CVE-2023-34453     MEDIUM    5.9   1.1.10.1  Fixed
CVE-2023-34454     MEDIUM    5.9   1.1.10.1  Fixed
Vulnerabilities scan results were updated by commit 4f924c2

@@ -1214,9 +1192,9 @@
</exclusions>
</dependency>
<dependency>
@prisma-cloud-devsecops prisma-cloud-devsecops bot Jan 21, 2025

org.apache.avro:avro / pom.xml

🎉   All vulnerabilities were fixed

Total vulnerabilities: 2

Critical: 1 High: 1 Medium: 0 Low: 0
Vulnerability ID   Severity  CVSS  Fixed in  Status
CVE-2024-47561     CRITICAL  9.8   1.11.4    Fixed
CVE-2023-39410     HIGH      7.5   1.11.3    Fixed
Vulnerabilities scan results were updated by commit 4f924c2

<groupId>io.netty</groupId>
<artifactId>netty-buffer</artifactId>
<version>${netty4.version}</version>
</dependency>

<dependency>
@prisma-cloud-devsecops prisma-cloud-devsecops bot Jan 21, 2025

io.netty:netty / pom.xml

🎉   All vulnerabilities were fixed

Total vulnerabilities: 8

Critical: 1 High: 2 Medium: 5 Low: 0
Vulnerability ID   Severity  CVSS  Fixed in  Status
CVE-2019-20444     CRITICAL  9.1   -         Fixed
CVE-2021-37136     HIGH      7.5   -         Fixed
CVE-2021-37137     HIGH      7.5   -         Fixed
CVE-2019-20445     MEDIUM    9.1   -         Fixed
CVE-2021-21290     MEDIUM    6.2   -         Fixed
CVE-2021-21295     MEDIUM    5.9   -         Fixed
CVE-2021-21409     MEDIUM    5.9   -         Fixed
CVE-2021-43797     MEDIUM    6.5   -         Fixed
Vulnerabilities scan results were updated by commit 4f924c2

@@ -1410,7 +1371,7 @@
<dependency>
@prisma-cloud-devsecops prisma-cloud-devsecops bot Jan 21, 2025

org.codehaus.jettison:jettison / pom.xml

🎉   All vulnerabilities were fixed

Total vulnerabilities: 5

Critical: 0 High: 4 Medium: 1 Low: 0
Vulnerability ID   Severity  CVSS  Fixed in  Status
CVE-2022-40150     HIGH      7.5   1.5.2     Fixed
CVE-2022-45685     HIGH      7.5   1.5.2     Fixed
CVE-2022-45693     HIGH      7.5   1.5.2     Fixed
CVE-2023-1436      HIGH      7.5   1.5.4     Fixed
CVE-2022-40149     MEDIUM    6.5   1.5.1     Fixed
Vulnerabilities scan results were updated by commit 4f924c2

@@ -1031,9 +943,9 @@
<version>${commons-logging-api.version}</version>
</dependency>
<dependency>
@prisma-cloud-devsecops prisma-cloud-devsecops bot Jan 21, 2025

log4j:log4j / pom.xml

🎉   All vulnerabilities were fixed

Total vulnerabilities: 7

Critical: 3 High: 3 Medium: 1 Low: 0
Vulnerability ID   Severity  CVSS  Fixed in  Status
CVE-2022-23305     CRITICAL  9.8   2.0       Fixed
CVE-2019-17571     CRITICAL  9.8   2.0       Fixed
CVE-2020-9493      CRITICAL  9.8   2.0       Fixed
CVE-2022-23302     HIGH      8.8   2.0       Fixed
CVE-2022-23307     HIGH      8.8   2.0       Fixed
CVE-2023-26464     HIGH      7.5   2.0       Fixed
CVE-2021-4104      MEDIUM    6.6   -         Fixed
Vulnerabilities scan results were updated by commit 4f924c2

@@ -1159,6 +1112,26 @@
<artifactId>woodstox-core</artifactId>
<version>${woodstox.version}</version>
</dependency>
<dependency>
@prisma-cloud-devsecops prisma-cloud-devsecops bot Jan 21, 2025

org.codehaus.jackson:jackson-mapper-asl / pom.xml

🎉   All vulnerabilities were fixed

Total vulnerabilities: 2

Critical: 1 High: 1 Medium: 0 Low: 0
Vulnerability ID   Severity  CVSS  Fixed in  Status
CVE-2019-10202     CRITICAL  9.8   -         Fixed
CVE-2019-10172     HIGH      7.5   -         Fixed
Vulnerabilities scan results were updated by commit 4f924c2

tasanuma and others added 5 commits January 21, 2025 12:45
…eateFile();

S3A to implement S3 Select through this API.

The new openFile() API is asynchronous, and implemented across FileSystem and FileContext.

The MapReduce V2 inputs have been moved to this API, and must/may options can now be passed in.

This is more useful for setting things like the s3a seek policy than for S3 Select, as the existing input formats/record readers can't handle S3 Select output, where the stream is shorter than the file length, and splitting plain text is suboptimal. Future work is needed there.

In the meantime, any/all filesystem connectors are now free to add their own filesystem-specific
configuration parameters which can be set in jobs and used to set filesystem input stream
options (seek policy, retry, encryption secrets, etc).

Contributed by Steve Loughran
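The must/may semantics of the asynchronous openFile() builder described above can be modelled in plain Java: mandatory options the store does not understand fail eagerly, optional ones are silently ignored, and the open itself completes asynchronously. Class and method names here are illustrative, not the real Hadoop FileSystem builder API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

// Sketch of a must/may option builder with an asynchronous build() step.
public class OpenFileBuilderSketch {
    private final Map<String, String> must = new HashMap<>();
    private final Map<String, String> may = new HashMap<>();
    private final Set<String> supported;

    public OpenFileBuilderSketch(Set<String> supportedOptions) {
        this.supported = supportedOptions;
    }

    public OpenFileBuilderSketch must(String key, String value) { must.put(key, value); return this; }
    public OpenFileBuilderSketch opt(String key, String value)  { may.put(key, value);  return this; }

    // Mandatory options are validated eagerly; the open runs asynchronously.
    public CompletableFuture<String> build() {
        for (String key : must.keySet()) {
            if (!supported.contains(key)) {
                // must() options the store does not understand are a hard failure...
                throw new IllegalArgumentException("Unsupported mandatory option: " + key);
            }
        }
        // ...while unknown opt() options are silently ignored.
        return CompletableFuture.supplyAsync(() -> "stream[" + may + "]");
    }

    public static void main(String[] args) {
        OpenFileBuilderSketch b =
            new OpenFileBuilderSketch(Set.of("fs.option.openfile.length"));
        String stream = b.must("fs.option.openfile.length", "1024")
                         .opt("unknown.store.option", "x")
                         .build()
                         .join();
        System.out.println(stream);
    }
}
```

This split is what lets job configurations carry filesystem-specific tuning: a connector that recognises an option can act on it, and one that doesn't simply ignores it unless the caller insisted with must().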
Contributed by Steve Loughran.

This complements the StreamCapabilities Interface by allowing applications to probe for a specific path on a specific instance of a FileSystem client
to offer a specific capability.

This is intended to allow applications to determine

* Whether a method is implemented before calling it and dealing with UnsupportedOperationException.
* Whether a specific feature is believed to be available in the remote store.

As well as a common set of capabilities defined in CommonPathCapabilities,
file systems are free to add their own capabilities, prefixed with
 fs. + schema + .

The plan is to identify and document more capabilities, and for file systems which add new features, to always provide a declaration of the feature's availability.

Note

* The remote store is not expected to be checked for the feature;
  It is more a check of client API and the client's configuration/knowledge
  of the state of the remote system.
* Permissions are not checked.

Change-Id: I80bfebe94f4a8bdad8f3ac055495735b824968f5
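The probe pattern this interface enables can be sketched in plain Java: ask the client whether a capability is believed available instead of invoking the method and catching UnsupportedOperationException. Names are illustrative, not the real org.apache.hadoop.fs.PathCapabilities API; the capability string is an assumed example.

```java
import java.util.Set;

// Sketch of a per-path capability probe. Per the notes above, this is a
// check of the client's own configuration/knowledge: the remote store is
// not contacted and permissions are not checked.
public class PathCapabilitySketch {
    private final Set<String> capabilities;

    public PathCapabilitySketch(Set<String> caps) { this.capabilities = caps; }

    public boolean hasPathCapability(String path, String capability) {
        return capabilities.contains(capability);
    }

    public static void main(String[] args) {
        PathCapabilitySketch fs =
            new PathCapabilitySketch(Set.of("fs.capability.paths.append"));
        // Probe before use, rather than catching UnsupportedOperationException.
        if (fs.hasPathCapability("/logs/app.log", "fs.capability.paths.append")) {
            System.out.println("append supported");
        }
    }
}
```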
apache#1761). Contributed by Steve Loughran

* Enhanced builder + FS spec
* s3a FS to use this to skip HEAD on open
* and to use version/etag when opening the file

works with S3AFileStatus FS and S3ALocatedFileStatus
@deepakdamri deepakdamri force-pushed the ODP-2462_III branch 4 times, most recently from cda90c1 to 09a1093 Compare January 21, 2025 09:05
steveloughran and others added 9 commits January 21, 2025 14:46
…1)

This defines standard option and values for the
openFile() builder API for opening a file:

fs.option.openfile.read.policy
 A list of desired read policies, in preferred order.
 Standard values are:
 adaptive, default, random, sequential, vector, whole-file

fs.option.openfile.length
 How long the file is.

fs.option.openfile.split.start
 start of a task's split

fs.option.openfile.split.end
 end of a task's split

These can be used by filesystem connectors to optimize their
reading of the source file, including but not limited to
* skipping existence/length probes when opening a file
* choosing a policy for prefetching/caching data

The hadoop shell commands which read files all declare "whole-file"
and "sequential", as appropriate.

Contributed by Steve Loughran.

Change-Id: Ia290f79ea7973ce8713d4f90f1315b24d7a23da1
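A connector consuming the standard options listed above might look like the following plain-Java sketch. The option key strings come from the text; the surrounding class, methods, and defaulting behaviour are assumptions for illustration.

```java
import java.util.Map;
import java.util.Set;

// Sketch of a connector consuming the standard openFile() option keys:
// use a caller-declared length to skip an existence/length probe, and pick
// the first supported entry from the caller's read-policy preference list.
public class OpenFileOptionsSketch {
    static final String READ_POLICY = "fs.option.openfile.read.policy";
    static final String LENGTH      = "fs.option.openfile.length";
    static final String SPLIT_START = "fs.option.openfile.split.start";
    static final String SPLIT_END   = "fs.option.openfile.split.end";

    // Declared file length, or -1 when the caller did not supply one and the
    // store must probe for it itself.
    static long declaredLength(Map<String, String> opts) {
        String v = opts.get(LENGTH);
        return v == null ? -1L : Long.parseLong(v);
    }

    // First policy in the caller's preference list that the store supports.
    static String choosePolicy(Map<String, String> opts, Set<String> supported) {
        String v = opts.getOrDefault(READ_POLICY, "default");
        for (String p : v.split(",")) {
            if (supported.contains(p.trim())) return p.trim();
        }
        return "default";
    }

    public static void main(String[] args) {
        Map<String, String> opts = Map.of(READ_POLICY, "vector,random", LENGTH, "4096");
        System.out.println(choosePolicy(opts, Set.of("random", "sequential"))); // prints "random"
        System.out.println(declaredLength(opts));                              // prints 4096
    }
}
```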
This is the API and implementation classes of HADOOP-16830,
which allows callers to query IO object instances
(filesystems, streams, remote iterators, ...) and other classes
for statistics on their I/O Usage: operation count and min/max/mean
durations.

New Packages

org.apache.hadoop.fs.statistics.
  Public API, including:
    IOStatisticsSource
    IOStatistics
    IOStatisticsSnapshot (serializable to Java objects and JSON)
    +helper classes for logging and integration
    BufferedIOStatisticsInputStream
       implements IOStatisticsSource and StreamCapabilities
     BufferedIOStatisticsOutputStream
       implements IOStatisticsSource, Syncable and StreamCapabilities

org.apache.hadoop.fs.statistics.impl
  Implementation classes for internal use.

org.apache.hadoop.util.functional
  functional programming support for RemoteIterators and
  other operations which raise IOEs; all wrapper classes
  implement and propagate IOStatisticsSource

Contributed by Steve Loughran.

Change-Id: If56e8db2981613ff689c39239135e44feb25f78e
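The kind of statistic this API exposes (operation count plus min/max/mean duration) can be modelled in a few lines of plain Java. The real classes live in org.apache.hadoop.fs.statistics; this is only a toy conceptual model.

```java
// Toy model of a duration statistic: operation count and min/max/mean,
// as exposed by IOStatistics sources on filesystems, streams and iterators.
public class DurationStatSketch {
    private long count, sum;
    private long min = Long.MAX_VALUE, max = Long.MIN_VALUE;

    public synchronized void record(long durationMillis) {
        count++;
        sum += durationMillis;
        min = Math.min(min, durationMillis);
        max = Math.max(max, durationMillis);
    }

    public synchronized long count()  { return count; }
    public synchronized long min()    { return min; }
    public synchronized long max()    { return max; }
    public synchronized double mean() { return count == 0 ? 0 : (double) sum / count; }

    public static void main(String[] args) {
        DurationStatSketch readLatency = new DurationStatSketch();
        readLatency.record(2);
        readLatency.record(4);
        System.out.println(readLatency.mean()); // prints 3.0
    }
}
```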
Contributed by Steve Loughran and Daryn Sharp.
…apache.hadoop.util.

Contributed by Abhishek Modi
…ze().

Contributed by Steve Loughran.

This hardens FileSystem instantiation so that, if an IOException or RuntimeException is
raised in the invocation of FileSystem.initialize(), a best-effort
attempt is made to close the FS instance; exceptions raised there
are swallowed.

The S3AFileSystem is also modified to do its own cleanup if an
IOException is raised during its initialize() process, it being the
FS we know has the "potential" to leak threads, especially in
extension points (e.g AWS Authenticators) which spawn threads.

Change-Id: Ib84073a606c9d53bf53cbfca4629876a03894f04
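The "close on failed initialize()" pattern described above can be sketched as follows. Names are illustrative; this is not the actual FileSystem instantiation code, just the shape of the fix under those assumptions.

```java
import java.io.Closeable;
import java.io.IOException;

// Sketch: if initialization throws, make a best-effort attempt to close
// the instance (releasing any threads it spawned) while ensuring the
// original exception still propagates.
public class SafeInitSketch {
    interface InitCloseable extends Closeable {
        void initialize() throws IOException;
    }

    static <T extends InitCloseable> T init(T fs) throws IOException {
        try {
            fs.initialize();
            return fs;
        } catch (IOException | RuntimeException e) {
            try {
                fs.close();                   // best-effort cleanup
            } catch (Exception suppressed) {
                e.addSuppressed(suppressed);  // never mask the original failure
            }
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        InitCloseable ok = new InitCloseable() {
            public void initialize() { System.out.println("initialized"); }
            public void close() {}
        };
        init(ok);
    }
}
```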
… a file is forbidden

Contributed by Steve Loughran.

Not all stores do complete validation here; in particular the S3A
connector does not: checking the entire directory tree to see whether any
parent path is a file significantly slows things down.

This check does take place in S3A mkdirs(), which walks backwards up the list of
parent paths until it finds a directory (success) or a file (failure).
In practice, production applications invariably create destination directories
before writing one or more files into them, so restricting the check to the mkdirs()
call delivers a significant speedup while implicitly retaining it.

Change-Id: I2c9df748e92b5655232e7d888d896f1868806eb0
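The ancestor walk described above can be sketched in plain Java, modelling store contents as simple path sets. This is purely illustrative of the algorithm, not the S3A mkdirs() implementation.

```java
import java.util.Set;

// Sketch of the mkdirs() ancestor check: step backwards up the parent
// chain until hitting an existing directory (success) or a file (failure).
public class MkdirsCheckSketch {
    static boolean ancestorsAllowMkdirs(String path, Set<String> dirs, Set<String> files) {
        String p = parent(path);
        while (p != null) {
            if (files.contains(p)) return false; // a parent is a file: forbidden
            if (dirs.contains(p))  return true;  // found an existing directory
            p = parent(p);
        }
        return true;                             // reached the root without conflict
    }

    static String parent(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? null : path.substring(0, i);
    }

    public static void main(String[] args) {
        Set<String> dirs = Set.of("/a");
        Set<String> files = Set.of("/a/b");
        System.out.println(ancestorsAllowMkdirs("/a/b/c", dirs, files)); // prints false: /a/b is a file
        System.out.println(ancestorsAllowMkdirs("/a/x/y", dirs, files)); // prints true: stops at directory /a
    }
}
```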
…ream applications call it explicitly. Contributed by Konstantin V Shvachko.

(cherry picked from commit b3786d6)
…(and tombstones).

Contributed by Gabor Bota.

Change-Id: I73a2d2861901dedfe7a0e783b310fbb95e7c1af9
… inconsistent read after replace/overwrite.

Contributed by Ben Roling.

S3Guard will now track the etag of uploaded files and, if an S3
bucket is versioned, the object version.

You can then control how to react to a mismatch between the data
in the DynamoDB table and that in the store: warn, fail, or, when
using versions, return the original value.

This adds two new columns to the table: etag and version.
This is transparent to older S3A clients, but when such clients
add or update data in the S3Guard table, they will not set these values.
As a result, the etag/version checks will not work with files uploaded by older clients.

For a consistent experience, upgrade all clients to use the latest hadoop version.