Skip to content

Conversation

@kuahyeow
Copy link

@kuahyeow kuahyeow commented Aug 2, 2022

What was the end-user or developer problem that led to this PR?

This mitigates issues such as cache poisoning, or dependency confusion. If the remote gem server has gems compromised via those issues, then the application installing such gems can use the locked checksum to warn
there is a potential issue. e.g. GHSA-2jmx-8mh8-pm8w

See also discussion in #3379

What is your fix for the problem, implemented in this PR?

Lock the checksum of each gem file in the lockfile, in a new CHECKSUMS section

Note only installed gems will lock the checksum. Gem that are not matching the current local platform will not lock its checksum (because it was never resolved) - however if they had a previous locked checksum it should be preserved)

Once we have locked checksums, we can then later add install-time checking (not in this PR)

Make sure the following tasks are checked

This is still in draft, but it shows one way to add a backwards compatible Checksums section to the lockfile

TODOs:

  • The format can be changed. It's on two lines currently for ease of parsing so I can re-use the NAME_VERSION regex
  • What format for path sources ?
  • Do we want to be explicit on what checksum type is locked (c1191bd01cbe54c3bec85a33e16f87162f47f9ab08d0cbe6e64232b52cd84d17 vs sha256-c1191bd01cbe54c3bec85a33e16f87162f47f9ab08d0cbe6e64232b52cd84d17)
  • Some checksums need to be calculated in the spec because the gem is generated via build_gem
  • Not lock checksums on legacy lockfiles (lockfiles with only ruby PLATFORM)
  • Lots of specs still fail because the expectation needs to add CHECKSUMS section
  • When running specs in parallel, some fail because the cached gem repositories have in-consistent checksums for the same gem file (e.g. rack-1.0.0)
  • Check that lockfiles with multiple platforms work as expected

@welcome
Copy link

welcome bot commented Aug 2, 2022

Thanks for opening a pull request and helping make RubyGems and Bundler better! Someone from the RubyGems team will take a look at your pull request shortly and leave any feedback. Please make sure that your pull request has tests for any changes or added functionality.

We use GitHub Actions to test and make sure your change works functionally and uses acceptable conventions, you can review the current progress of GitHub Actions in the PR status window below.

If you have any questions or concerns that you wish to ask, feel free to leave a comment in this PR or join our #rubygems or #bundler channel on Slack.

For more information about contributing to the RubyGems project feel free to review our CONTRIBUTING guide

@kuahyeow kuahyeow marked this pull request as draft August 2, 2022 22:34
@kuahyeow
Copy link
Author

kuahyeow commented Aug 2, 2022

There's a lot of work in updating the specs (see above TODOs), so please provide feedback firstly on:

  • format of lockfile
  • whether this feature should be enabled by default
  • backwards compatibility concerns
  • multi platform concerns

@indirect
Copy link
Contributor

format of lockfile

I believe a new section is the best option, since older versions of Bundler should be able to ignore new lockfile sections. You might also be able to experiment with adding the checksum after the gem name and version number and see if the parser ignores the extra field.

whether this feature should be enabled by default

Yes, I think it should be enabled by default. Gems changing content between installs is always a bad sign, because the goal of Bundler is for gems to not change content between installs. :)

backwards compatibility concerns

I think the way to test this would be to test the lockfile from this PR against the lockfile parser from Bundler 1.0.x, 1.1.x, 1.12.x, and 2.0.x, so that we know what warnings to print at the time lockfiles are upgraded to include checksums. It would still be ok to ship this PR even if some older Bundler versions can't parse the lockfile, but we would want to know which versions work and don't work before shipping.

multi platform concerns

As long as the checksums support separate checksums for the same gem and version but different platforms, this should be ok. For examples of this kind of gem, try checking out Nokogiri 1.13.8, which has at least 10 separate .gem files with 10 separate checksums for various platforms.

@deivid-rodriguez
Copy link
Contributor

I agree with @indirect.

By the way I'm sorry we hardcode so many lock files in specs, makes this kind of mass lock file updates painful, but I guess it also will highlight a bunch of edge cases, so probably worth going through it anyways.

@kuahyeow kuahyeow force-pushed the bundler_lockfile_checksums branch 5 times, most recently from 697ed2b to 0f1e31d Compare September 18, 2022 03:35
@kuahyeow kuahyeow force-pushed the bundler_lockfile_checksums branch 4 times, most recently from a329261 to b8d0871 Compare September 28, 2022 22:01
@kuahyeow
Copy link
Author

All specs now pass, save for one. I think this is a legit case I did not handle.

I will be away for a month - so will pick this back up in November


(arm64) tkgl2:bundler tkuah$ bin/rspec spec/commands/install_spec.rb:962
Run options:
  include {:locations=>{"./spec/commands/install_spec.rb"=>[962]}}
  exclude {:man=>false, :truffleruby_only=>true, :jruby_only=>true, :readline=>false, :permissions=>false, :no_color_tty=>false, :ruby_repo=>false, :rubygems=>"!= 3.3.16", :bundler=>"!= 2", :git=>"!= 2.36.1", :realworld=>true, :sudo=>true}

Randomized with seed 9765
F

Failures:

  1) bundle install with gem sources with missing platform specific gems in lockfile automatically fixes the lockfile
     Failure/Error:
       expect(lockfile).to eq <<~L
         GEM
           remote: https://gem.repo4/
           specs:
             crass (1.0.6)
             loofah (2.12.0)
               crass (~> 1.0.2)
               nokogiri (>= 1.5.9)
             nokogiri (1.12.4-x86_64-darwin)
               racc (~> 1.4)
     
     
       Commands:
       $ gem build racc
       Successfully built RubyGem
         Name: racc
         Version: 1.5.2
         File: racc-1.5.2.gem
       # $? => 0
     
       $ gem build nokogiri
       Successfully built RubyGem
         Name: nokogiri
         Version: 1.12.4
         File: nokogiri-1.12.4-x86_64-darwin.gem
       # $? => 0
     
       $ gem build nokogiri
       Successfully built RubyGem
         Name: nokogiri
         Version: 1.12.4
         File: nokogiri-1.12.4-x86_64-linux.gem
       # $? => 0
     
       $ gem build crass
       Successfully built RubyGem
         Name: crass
         Version: 1.0.6
         File: crass-1.0.6.gem
       # $? => 0
     
       $ gem build loofah
       Successfully built RubyGem
         Name: loofah
         Version: 2.12.0
         File: loofah-2.12.0.gem
       WARNING:  open-ended dependency on nokogiri (>= 1.5.9) is not recommended
         if nokogiri is semantically versioned, use:
           add_runtime_dependency 'nokogiri', '~> 1.5', '>= 1.5.9'
       WARNING:  See https://guides.rubygems.org/specification-reference/ for help
       # $? => 0
     
       $ gem generate_index
       Generating Marshal quick index gemspecs for 5 gems
       .....
       Complete
       Generated Marshal quick index gemspecs: 0.002s
       Generating specs index
       Generated specs index: 0.000s
       Generating latest specs index
       Generated latest specs index: 0.000s
       Generating prerelease specs index
       Generated prerelease specs index: 0.000s
       Compressing indices
       Compressed indices: 0.002s
       # $? => 0
     
       $ /Users/tkuah/.rbenv/versions/2.7.5/bin/ruby -I/Users/tkuah/code/rubygems/bundler/spec -r/Users/tkuah/code/rubygems/bundler/spec/support/artifice/fail.rb -r/Users/tkuah/code/rubygems/bundler/spec/support/hax.rb /Users/tkuah/code/rubygems/bundler/tmp/1/gems/system/bin/bundle config set --local path vendor/bundle
       # $? => 0
     
       $ /Users/tkuah/.rbenv/versions/2.7.5/bin/ruby -I/Users/tkuah/code/rubygems/bundler/spec -r/Users/tkuah/code/rubygems/bundler/spec/support/artifice/compact_index.rb -r/Users/tkuah/code/rubygems/bundler/spec/support/hax.rb /Users/tkuah/code/rubygems/bundler/tmp/1/gems/system/bin/bundle install
       Fetching gem metadata from https://gem.repo4/.
       Resolving dependencies...
       Using bundler 2.4.0.dev
       Fetching racc 1.5.2
       Fetching crass 1.0.6
       Installing racc 1.5.2
       Installing crass 1.0.6
       Fetching nokogiri 1.12.4 (x86_64-linux)
       Installing nokogiri 1.12.4 (x86_64-linux)
       Fetching loofah 2.12.0
       Installing loofah 2.12.0
       Bundle complete! 1 Gemfile dependency, 5 gems now installed.
       Bundled gems are installed into `./vendor/bundle`
       # $? => 0
     
       expected: "GEM\n  remote: https://gem.repo4/\n  specs:\n    crass (1.0.6)\n    loofah (2.12.0)\n      crass (~>...077c27ed0aad679b12d4568ead3a7546\n\nRUBY VERSION\n   ruby 2.7.5p203\n\nBUNDLED WITH\n   2.4.0.dev\n"
            got: "GEM\n  remote: https://gem.repo4/\n  specs:\n    crass (1.0.6)\n    loofah (2.12.0)\n      crass (~>...077c27ed0aad679b12d4568ead3a7546\n\nRUBY VERSION\n   ruby 2.7.5p203\n\nBUNDLED WITH\n   2.4.0.dev\n"
     
       (compared using ==)
     
       Diff:
       @@ -23,8 +23,6 @@
            e940c93a46e3ed1ead8ad611d940b79c907e4724ea16a82f07dfa16117898c31
          loofah (2.12.0)
            9fcbeb860a2404d0d95b076fc0a531593cdcbec9e04f5d653666261e547d2899
       -  nokogiri (1.12.4-x86_64-darwin)
       -    729cd133c08e33f2722284603fbcbefa2fbb1c17b080f12cb01b8af6ea134550
          nokogiri (1.12.4-x86_64-linux)
            5944ba9447ada8b81589f8c299c08f5cede7c2d071c58e0c60c886908be47c27
          racc (1.5.2)
       
     # ./spec/commands/install_spec.rb:977:in `block (3 levels) in <top (required)>'
     # ./spec/spec_helper.rb:105:in `block (4 levels) in <top (required)>'
     # ./spec/spec_helper.rb:105:in `block (3 levels) in <top (required)>'
     # ./spec/support/helpers.rb:350:in `block in with_gem_path_as'
     # ./spec/support/helpers.rb:364:in `without_env_side_effects'
     # ./spec/support/helpers.rb:345:in `with_gem_path_as'
     # ./spec/spec_helper.rb:104:in `block (2 levels) in <top (required)>'
     # ./spec/support/rubygems_ext.rb:103:in `load'
     # ./spec/support/rubygems_ext.rb:103:in `gem_load_and_activate'
     # ./spec/support/rubygems_ext.rb:18:in `gem_load'

Finished in 3.18 seconds (files took 0.11469 seconds to load)
1 example, 1 failure

Failed examples:

rspec ./spec/commands/install_spec.rb:962 # bundle install with gem sources with missing platform specific gems in lockfile automatically fixes the lockfile

Randomized with seed 9765

@kuahyeow kuahyeow force-pushed the bundler_lockfile_checksums branch 3 times, most recently from 1af2ae4 to 40df63a Compare November 7, 2022 02:47
@kuahyeow
Copy link
Author

kuahyeow commented Nov 8, 2022

I think I have updated all the specs now to use the new CHECKSUMS section. The nice thing is that the checksums in specs are asserted using helpers I created in spec/support/checksums.rb so it should be easy-ish to switch formats.

Only https://github.com/kuahyeow/rubygems/actions/runs/3407168585/jobs/5668783991 is failing now for Windows - no idea why.

Talking about formats....

  # Two lines
  CHECKSUMS
    platform_specific (1.0)
       63982d8d6d9cd8ea115bd8af6de28ff03674d81e858e5b1ce24270801d3f584c
    platform_specific (1.0-java)
       6c193ab5f1c4aa804709be9b5ffb276e89df1afa7d0d131730dffd7dea849137

  # One line
  CHECKSUMS
    platform_specific (1.0) 63982d8d6d9cd8ea115bd8af6de28ff03674d81e858e5b1ce24270801d3f584c
    platform_specific (1.0-java) 6c193ab5f1c4aa804709be9b5ffb276e89df1afa7d0d131730dffd7dea849137

  # One line, with prefix
  CHECKSUMS
    platform_specific (1.0) sha256-63982d8d6d9cd8ea115bd8af6de28ff03674d81e858e5b1ce24270801d3f584c
    platform_specific (1.0-java) sha256-6c193ab5f1c4aa804709be9b5ffb276e89df1afa7d0d131730dffd7dea849137

Any preferences ?

For reference, here's what yarn.lock, and go.mod have

"@_ueberdosis/prosemirror-tables@1.1.3", "@_ueberdosis/prosemirror-tables@^1.1.3":
  version "1.1.3"
  resolved "https://registry.yarnpkg.com/@_ueberdosis/prosemirror-tables/-/prosemirror-tables-1.1.3.tgz#56fdbc8b1d6ec43e7b7beb21e213c131eec451cd"
  integrity sha512-su3pbFi1DT89g6Cuh72TE0MWWKHmWgHcQJ3ODRkm6XfIppWaGpU49t02ur3sgJc7hUhfQXjB93aSkDgOmIii2w==
  dependencies:
    prosemirror-keymap "^1.1.2"
    prosemirror-model "^1.8.1"
    prosemirror-state "^1.3.1"
    prosemirror-transform "^1.2.1"
    prosemirror-view "^1.13.3"
gopkg.in/yaml.v2 v2.4.0/go.mod h1:RDklbk79AGWmwhnvt/jBztapEOGDOx6ZbXqjP6csGnQ=

Comment on lines 74 to 102
if spec.is_a?(LazySpecification) && spec.generic_platform_with_specific_installed_platform?
# If a spec started off as a generic ruby platform, and is resolved
# to something more specific on install time, we skip recording the
# checksum.
# For example if we have:
#
# specs:
# pry (0.11.3)
#
# PLATFORMS
# ruby
# java
#
# Recording the checksum for ruby does not make sense, because we
# will end up with something like:
#
# specs:
# pry (0.11.3)
#
# CHECKSUMS
# pry(0.11.3-java)
# 1f0bd9ce0282e8a20151011e4677746206a0c84b648e0a1c981b3232bb47a3b9
# pry(0.11.3-java)
# 1f0bd9ce0282e8a20151011e4677746206a0c84b648e0a1c981b3232bb47a3b9
#
# PLATFORMS
# ruby
# java
next
Copy link
Author

@kuahyeow kuahyeow Nov 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to #3379 (comment), I had to skip "generic" platforms like "ruby".

I think this means gems like nokogiri will not have their checksums locked under legacy lockfiles where ruby is the only PLATFORMS.

Copy link
Author

@kuahyeow kuahyeow Dec 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the cache of compact entries, here is a non-exhaustive list of gems that have more than one platform

index = Bundler::Fetcher::CompactIndex.new(nil, Bundler::Source::Rubygems::Remote.new(Bundler::URI("https://rubygems.org")), nil)
dependencies = Dir[index.send(:compact_index_client).instance_variable_get(:@cache).directory.join("info").join('*')].map {|d| File.basename d}
dependencies.select do |d|
  index.send(:compact_index_client).instance_variable_get(:@cache).dependencies(d).group_by(&:first).values.last.size > 1
end
["nio4r", "aws-crt", "msgpack", "mixlib-shellout", "ffi-yajl", "concurrent-ruby-ext", "strscan", "mysql2", "sorted_set", "puma", "rest-client", "io-wait", "hpricot", "io-console", "libxml-ruby", "unf_ext", "grpc", "win32console", "google-protobuf", "ruby-prof", "bcrypt", "chef", "hamlit", "digest", "date", "websocket-driver", "hiredis", "bcrypt_pbkdf", "binding_ninja", "murmurhash3", "ptools", "atomic", "eventmachine", "mongrel", "pg", "psych", "cgi", "bcrypt-ruby", "win32-api", "sassc", "json", "stringio", "snappy", "mixlib-archive", "ffi", "sqlite3", "racc", "pry", "libv8", "thread_safe", "jaro_winkler", "character_set", "sys-admin", "ed25519", "nokogiri", "jruby-pageant", "regexp_property_values", "linecache"]

@indirect
Copy link
Contributor

indirect commented Nov 8, 2022

Personally I would lean towards the "one line with prefix" format, but I think any of these would be a great improvement over the lack of lockfile checksums that we have now. :)

@kuahyeow kuahyeow force-pushed the bundler_lockfile_checksums branch from f9310b5 to 7fdd674 Compare November 20, 2022 23:06
@kuahyeow kuahyeow force-pushed the bundler_lockfile_checksums branch 5 times, most recently from 4c52b43 to 0a7b217 Compare December 11, 2022 23:35
@kuahyeow kuahyeow force-pushed the bundler_lockfile_checksums branch from 0a7b217 to 54b8da6 Compare December 27, 2022 06:09
@kuahyeow kuahyeow force-pushed the bundler_lockfile_checksums branch 4 times, most recently from 7568170 to 4c55c26 Compare December 27, 2022 07:43
We lock the checksum for each resolved spec under a new CHECKSUMS
section in the lockfile.

The checksum is calculated from the SHA256 checksum of the cached gem.

If the locked spec does not resolve for the local platform, we preserve
the locked checksum, similar to how we preserve specs.
@kuahyeow kuahyeow force-pushed the bundler_lockfile_checksums branch 5 times, most recently from 727808e to b4ee6bb Compare December 28, 2022 20:04
Thong Kuah added 5 commits December 29, 2022 17:33
If ruby platform specs resolve into specific platforms,
skip locking checksum for that spec, as its legacy behaviour
We switch to a different gem name to avoid this conflict
@kuahyeow kuahyeow force-pushed the bundler_lockfile_checksums branch 3 times, most recently from 0b1b831 to e19b499 Compare December 30, 2022 02:51
The special case of adding checksum for gems with ruby only lockfile,
and ruby only gem (i.e. no platform specific gems) is probably YAGNI.
@kuahyeow kuahyeow force-pushed the bundler_lockfile_checksums branch from e19b499 to 440e787 Compare January 7, 2023 22:32
@kuahyeow
Copy link
Author

kuahyeow commented Jan 7, 2023

  1. I have updated to this be a one-line format with sha-256 prefix.

  2. Also I have moved the checksum locking from checking the gem cache to checking the remote checksum (EndpointSpecification's #checksum). This makes this more predictable as I do not know if all gems generate the same gem consistently in all cases (e.g. private forks)

  3. Multiple platforms - see Keep track of checksum per gem in the lockfile #3379 (comment). I tried to make LazySpecification "pretend" to materialize for other platforms but I think I am in the wrong place. Maybe Make lockfiles generated on macOS include a lock for Linux by default #5700 is a better approach

Checksum locking only makes sense on install. The compact index
information is only available then.
@kuahyeow kuahyeow force-pushed the bundler_lockfile_checksums branch from 440e787 to 597932a Compare January 7, 2023 23:10
@martinemde
Copy link
Contributor

Merged in #6374.

@martinemde martinemde closed this Dec 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants