Skip to content

Conversation

cardil
Copy link
Contributor

@cardil cardil commented Jul 3, 2019

Installing modules from Puppet Forge is extremely slow. It's due to extraction of tarballs using Puppet::ModuleTool::Tar::Mini implementation. The unpacking is 100x times slower then executing a tar xzvf system command!

As a walkaround I'm switching to Puppet::ModuleTool::Tar::Gnu implementation as default.

Also, adding a pending performance test, that clearly shows that Puppet::ModuleTool::Tar::Mini is about 100x times slower then system tar xzvf command. Test should be unskipped if performance of Puppet::ModuleTool::Tar::Mini is fixed somehow.

Issue: https://tickets.puppetlabs.com/browse/PUP-9813

@cardil cardil requested a review from a team July 3, 2019 23:11
@puppetlabs-jenkins
Copy link
Collaborator

Can one of the admins verify this patch?

Adding a pending performance test that clearly shows that
ModuleTool::Tar::Mini is about 100x times slower then system `tar xzvf`
command.

Test should be unskipped if performance of ModuleTool::Tar::Mini is
fixed.

As a walkaround I'm switching to ModuleTool::Tar::Gnu implementation as
default.
@puppetcla
Copy link

CLA signed by all contributors.

@cardil cardil changed the title (PUP-9813) Performance test for ModuleTool::Tar::Mini (PUP-9813) Performance walkaround for ModuleTool::Tar::Mini, plus a perf test Jul 4, 2019
@joshcooper
Copy link
Contributor

joshcooper commented Jul 15, 2019

Thanks for your contribution @cardil! We currently default to minitar to prevent problems like https://tickets.puppetlabs.com/browse/PUP-7390 and https://tickets.puppetlabs.com/browse/PUP-3198. I thought I read somewhere that the ruby minitar implementation performs fsync unnecessarily for each file and directory it extracts. I say unnecessarily, because puppet extracts tar entries into a temp directory and then renames the directory to commit the changes. So we don't need to perform all of those intermediate fsync operations. It would be helpful to see if we can fix this in the minitar gem, at least to make the fsync calls optional.

@joshcooper
Copy link
Contributor

Using minitar:

# time puppet module install puppetlabs-stdlib --version 4.25.1
Notice: Preparing to install into /etc/puppetlabs/code/environments/production/modules ...
Notice: Downloading from https://forgeapi.puppet.com ...
Notice: Installing -- do not interrupt ...
/etc/puppetlabs/code/environments/production/modules
└── puppetlabs-stdlib (v4.25.1)

real	0m8.973s
user	0m1.360s
sys	0m0.132s

Using patched minitar to avoid fsync calls:

# time puppet module install puppetlabs-stdlib --version 4.25.1
Notice: Preparing to install into /etc/puppetlabs/code/environments/production/modules ...
Notice: Downloading from https://forgeapi.puppet.com ...
Notice: Installing -- do not interrupt ...
/etc/puppetlabs/code/environments/production/modules
└── puppetlabs-stdlib (v4.25.1)

real	0m2.352s
user	0m1.172s
sys	0m0.076s

I don't think we can directly compare the above times to executing the tar system command, because the puppet module install time include the time to load ruby, load puppet code and providers, apply settings catalogs, etc. However, it does provide the fastest time to unpack a module:

# time bash -c 'curl -s https://forge.puppet.com/v3/files/puppetlabs-stdlib-4.25.1.tar.gz -o puppetlabs-stdlib-4.25.1.tar.gz && tar xzf puppetlabs-stdlib-4.25.1.tar.gz'

real	0m0.157s
user	0m0.036s
sys	0m0.020s
root@lbshku31ofxu

If we include puppet --version to force ruby/puppet/facter loading, then using system tar is 2.3 times faster than optimized minitar (2.352/1.003)

# time bash -c 'puppet --version && curl -s https://forge.puppet.com/v3/files/puppetlabs-stdlib-4.25.1.tar.gz -o puppetlabs-stdlib-4.25.1.tar.gz && tar xzf puppetlabs-stdlib-4.25.1.tar.gz'
6.7.0

real	0m1.003s
user	0m0.768s
sys	0m0.084s

Given that I'm inclined to fix minitar, as in the maintenance costs with having different tar implementations (gnu/bsd/ustar/pax) outweighs the 2.3x performance benefit.

@cardil
Copy link
Contributor Author

cardil commented Jul 15, 2019

@joshcooper I think Roby native implementation should be default as it is, but only when it performs well. Unfortunately now it performs awful.

Switching to Gnu implementation on compatible platforms (Linux, Macos) should be a temporary solution. I can rewrite the code to default Minitar on other platforms (Windows, Solaris, HP-UX). Minitar performance should be boosted as a final solution.

@cardil
Copy link
Contributor Author

cardil commented Jul 15, 2019

Given that I'm inclined to fix minitar, as in the maintenance costs with having different tar implementations (gnu/bsd/ustar/pax) outweighs the 2.3x performance benefit.

I've actually written in PUP-9813 that "t might be at most 2x times slower then system command" so I think your fixed minitar implementation achieve that goal 🍻

@cardil
Copy link
Contributor Author

cardil commented Jul 17, 2019

@joshcooper I've written a test in this PR that isolate extraction with minitar and compares it to tar system command.

I'm wondering how your patched minitar performs in that test. Could you execute that test?

@joshcooper
Copy link
Contributor

Thanks for submitting this @cardil. I'm going to close this as we need to update puppet to call the new minitar method that allows us to skip fsync'ing every file and directory. See https://tickets.puppetlabs.com/browse/PUP-10013. We'll be sure to compare that performance with similar native tar. Hopefully minitar should be less than 2x slower than native tar.

@joshcooper joshcooper closed this Sep 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants