HADOOP-16202. Enhance openFile() for better read performance against object stores #2584

Commits on Apr 1, 2022

  1. HADOOP-16202. Enhance openFile()

    Roll-up of the previous PR
    
    Change-Id: Ib0aec173afcd8aae33f52da3f99ac813bd38c32f
    
    HADOOP-16202. javadocs and style
    
    Change-Id: Id4294ac7034155a10be22fb4631edf43cbadc22b
    
    HADOOP-16202. openFile: read policies
    
    Change "fs.option.openfile.fadvise" to "fs.option.openfile.read.policy"
    and expand with "vectored", "parquet" and "orc", all of which map in s3a
    to random.
    
    The concept is that by choosing a read policy you can do more than just
    change the seek policy: it could also switch buffering, caching, etc.
    
    Change-Id: I2147840f58fb54853c797d2cab5d668c3d1d2541
    
    HADOOP-16202: documentation changes and IOStatistics of open operations
    
    * Thomas Marquardt's suggestions on the docs
    * standard action name for file opened
    * S3AInputStream measures the count and duration of this, and reports it
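Measuring the count and duration of open operations could look roughly like the sketch below. It is a hypothetical, self-contained illustration and does not use the IOStatistics API: the class `OpenStatsSketch` and its method names are invented for this example.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** Hypothetical sketch: counts "file opened" actions and sums their duration. */
class OpenStatsSketch {
  private final AtomicLong count = new AtomicLong();
  private final AtomicLong totalNanos = new AtomicLong();

  /** Run the open action, recording one invocation and its elapsed time. */
  <T> T track(Supplier<T> openAction) {
    long start = System.nanoTime();
    try {
      return openAction.get();
    } finally {
      // Recorded even if the open fails, so failed attempts are still counted.
      count.incrementAndGet();
      totalNanos.addAndGet(System.nanoTime() - start);
    }
  }

  long count() {
    return count.get();
  }

  long totalNanos() {
    return totalNanos.get();
  }

  public static void main(String[] args) {
    OpenStatsSketch stats = new OpenStatsSketch();
    // Stand-in for a real open call.
    String stream = stats.track(() -> "stream-handle");
    System.out.println(stream + " opens=" + stats.count()
        + " nanos=" + stats.totalNanos());
  }
}
```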
    
    Change-Id: I7feacf4eb4d6494bb93b3dfc05b060ad75e52c18
    
    HADOOP-16202. rebase to trunk; add "whole-file" option
    
    +slacken checks on Open contract tests so that they are less likely to fail
     when run against an external connector.
    
     TODO: make that a compliance switch
    
    Change-Id: I9a4535d785949822752571f82f9448b9aac66aad
    
    HADOOP-16202: remove the orc, parquet and vectored options from read policy
    
    Going through Thomas's feedback...
    
    Change-Id: Ibdf2c4ec64c54704f8631d5775d83444660c923a
    steveloughran committed Apr 1, 2022
    c5f543c
  2. HADOOP-16202. enhance-openfile

    * reinstate "vector" as Mukund is about to merge that patch into a feature branch
    * remove refs to orc/parquet in docs
    * all style, deprecation, spotbugs, EOLs
    
    Change-Id: Id44617916e60688bdce2e1f107082704567e3515
    steveloughran committed Apr 1, 2022
    068eb16
  3. HADOOP-16202. enhance-openfile

    * review docs, including hrefs
    * s3a maps vector to random
    * FileUtil uses whole-file in its copy. This matters when using the CLI on a system
      where the s3a policy is set to random
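The policy handling described in these commits can be sketched as follows. This is a minimal, hypothetical illustration rather than the connector's code: the class `ReadPolicySketch`, the `SeekPolicy` enum, the treatment of the option value as a comma-separated list in which the first recognized name wins, and the mapping of `whole-file` to a sequential policy are all assumptions. Only the mapping of `vector` to random is taken from the commit notes.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of mapping read policy names to a seek policy. */
class ReadPolicySketch {

  /** Illustrative seek policies; the names are assumptions. */
  enum SeekPolicy { NORMAL, SEQUENTIAL, RANDOM }

  /** Assumed mapping; only "vector" -> RANDOM comes from the commit notes. */
  private static final Map<String, SeekPolicy> POLICIES = Map.of(
      "sequential", SeekPolicy.SEQUENTIAL,
      "whole-file", SeekPolicy.SEQUENTIAL,
      "random", SeekPolicy.RANDOM,
      "vector", SeekPolicy.RANDOM);

  /**
   * Return the seek policy for the first recognized name in a
   * comma-separated option value, or the fallback if none is recognized.
   */
  static SeekPolicy resolve(String option, SeekPolicy fallback) {
    if (option == null) {
      return fallback;
    }
    for (String name : option.split(",")) {
      SeekPolicy policy = POLICIES.get(name.trim().toLowerCase(Locale.ROOT));
      if (policy != null) {
        return policy;
      }
    }
    return fallback;
  }

  public static void main(String[] args) {
    // A copy utility asking for whole-file reads is not affected by a
    // store-wide default of random.
    System.out.println(resolve("whole-file", SeekPolicy.RANDOM));
    System.out.println(resolve("vector, sequential", SeekPolicy.NORMAL));
    System.out.println(resolve("unknown-policy", SeekPolicy.NORMAL));
  }
}
```

Passing the store-wide default as the fallback shows why a per-file policy matters: a caller that names `whole-file` gets sequential behavior even when the default is random.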
    
    Excluding complaints from Yetus, I think this is ready to go in.
    
    Change-Id: I54d43c5b4947e9e7ee91fa9c3feb0a075b4b4527
    steveloughran committed Apr 1, 2022
    c0dfe72
  4. HADOOP-16202. yetus

    Change-Id: Ibe564e22d336a5ff8a85e1fc678dcd06fa99bb9d
    steveloughran committed Apr 1, 2022
    97ebbef
  5. HADOOP-16202. resync with trunk

    imports were invalid
    
    Change-Id: Ie80fe283a88e390c968d8136f45cb6ce41f29143
    steveloughran committed Apr 1, 2022
    756935f
  6. HADOOP-16202. s3a openfile to support a new drain policy fs.s3a.input.async.drain.threshold
    
    If the number of bytes to drain is above this threshold, the stream is
    drained in a separate thread. The threshold exists because, for small
    amounts, scheduling the work seems to be slower than the actual processing.
    
    Draining is also done through a 16K buffer, which greatly reduces the number of OS API calls.
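The threshold decision and buffered drain can be sketched as below. This is a hypothetical, self-contained illustration and not the S3A implementation: the class `DrainSketch`, its method names, and the use of `CompletableFuture` with a caller-supplied executor are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch of draining a stream inline or in another thread. */
class DrainSketch {

  /** 16K buffer, so each read call discards many bytes at once. */
  static final int DRAIN_BUFFER_SIZE = 16 * 1024;

  /** Read and discard the rest of the stream; returns the bytes drained. */
  static long drain(InputStream in) {
    byte[] buffer = new byte[DRAIN_BUFFER_SIZE];
    long drained = 0;
    try {
      int read;
      while ((read = in.read(buffer)) != -1) {
        drained += read;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return drained;
  }

  /**
   * Drain in the calling thread when few bytes remain; hand the work to
   * the executor only when the remaining byte count is above the threshold.
   */
  static CompletableFuture<Long> drainOrSchedule(
      InputStream in, long remaining, long threshold, ExecutorService pool) {
    if (remaining > threshold) {
      return CompletableFuture.supplyAsync(() -> drain(in), pool);
    }
    return CompletableFuture.completedFuture(drain(in));
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      InputStream large = new ByteArrayInputStream(new byte[100_000]);
      // 100,000 bytes is above the 16,000 byte threshold: drained in the pool.
      long drained = drainOrSchedule(large, 100_000, 16_000, pool).join();
      System.out.println("drained " + drained + " bytes");
    } finally {
      pool.shutdown();
    }
  }
}
```

Returning a future in both branches lets the caller treat the two cases uniformly; in the inline case the future is already complete when it is returned.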
    
    Change-Id: I1b2a71a05c37ada289cf23e128da0a6b01452ee6
    steveloughran committed Apr 1, 2022
    1bc73e6
  7. HADOOP-16202. one of Thomas's comments i'd missed

    ChecksumFileSystem simplification
    
    Change-Id: I9637d386c5e9aea95f568d3f80104bdff99ffc31
    steveloughran committed Apr 1, 2022
    cf13d12

Commits on Apr 5, 2022

  1. HADOOP-16202. build warnings (not yet the spotbugs), and

    clean the draining code
    
    Change-Id: I602e0414004dd2806e6e942b553c069770bd1250
    steveloughran committed Apr 5, 2022
    f1a68eb

Commits on Apr 6, 2022

  1. HADOOP-16202. improve s3a opening; fix tests

    * fix build
    * mock unbuffer test now works, by mocking all read() calls and turning off async drain
    * all file open options moved into OpenFileHelper; these can directly update the
      S3AReadOpContext builder.
    
    This makes the s3a fs class slightly smaller, as a few fields have been cut.
    
    +fix style, javadoc complaints
    
    Change-Id: I36d888fba40152328eeeb6d17ceb192530ef76e3
    steveloughran committed Apr 6, 2022
    2843391

Commits on Apr 7, 2022

  1. Revert "HADOOP-16202. one of Thomas's comments i'd missed"

    This reverts commit cf13d12.
    
    Change-Id: I56d69c1a6ae2692607009e8da089b11315ef08ae
    steveloughran committed Apr 7, 2022
    c9c989e
  2. HADOOP-16202. spotbugs

    Change-Id: I56185c5f866269dc2e253570a6564a3dad074666
    steveloughran committed Apr 7, 2022
    8cc26f9

Commits on Apr 12, 2022

  1. HADOOP-16202. review comments

    Change-Id: I4d2fe8936e066970c2c54a4a053b06025845646c
    steveloughran committed Apr 12, 2022
    e7b29ef

Commits on Apr 19, 2022

  1. HADOOP-16202. enhance-openfile: review feedback

    Feedback from dannycjones.
    
    Change-Id: I546f28411c2475e1254b259c7e0734cc868ea9f0
    steveloughran committed Apr 19, 2022
    98ebf76
  2. Merge branch 'trunk' into s3/HADOOP-16202-enhance-openfile

    Change-Id: If30684e9b4d39e9d1ba9cfdf50963b655c20144f
    steveloughran committed Apr 19, 2022
    bf8e1d4

Commits on Apr 22, 2022

  1. HADOOP-16202. checkstyle; unused import

    Change-Id: I64ac45369e4a6e1e9cd651b01acd380d258782fb
    steveloughran committed Apr 22, 2022
    60cb6b5