Add language support for hosts files #6391

Merged
merged 9 commits into from
Jun 8, 2023

Conversation

Alhadis
Collaborator

@Alhadis Alhadis commented Apr 25, 2023

Description

This PR adds language support for hosts files, as detailed by @DandelionSprout in #6389.

Usage

A filename search for hosts yields roughly 25,200 results. Most of these are legit hosts(5) files, but a non-trivial number of them aren't (such as this). I contemplated registering this as a generic extension, but quickly realised that feature only accommodates file extensions, not unabridged filenames (the two are handled by different classification strategies). I'm wondering if the “generic filetypes” feature added to Linguist in 2020 should accommodate “generic filenames” as well. @lildude, any thoughts?
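
To make the "generic filename" idea concrete, here is a minimal sketch of a content check that could separate hosts(5) files from unrelated files that happen to be named hosts. It is not Linguist's heuristic mechanism: the method names `hosts_entry?` and `looks_like_hosts_file?` are hypothetical, and the rule (every non-blank, non-comment line is an IP address followed by at least one name) is an assumption based on the hosts(5) format.

```ruby
require "ipaddr"

# Hypothetical helper: true if a single line is blank, a comment,
# or an "address name [alias...]" entry in the style of hosts(5).
def hosts_entry?(line)
  body = line.sub(/#.*/, "").strip   # drop trailing comments
  return true if body.empty?

  address, *names = body.split(/\s+/)
  return false if names.empty?       # an address needs at least one name

  IPAddr.new(address)                # raises on anything that isn't an IP
  true
rescue IPAddr::Error
  false
end

# Hypothetical helper: true only if every line passes the check above.
def looks_like_hosts_file?(text)
  text.each_line.all? { |line| hosts_entry?(line) }
end
```

A file such as an inventory listing with a `[servers]` header would fail this check on its first line, whereas a conventional `127.0.0.1 localhost` file would pass.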

Checklist

Closes #6389

Footnotes

  1. File is ineligible for copyright, at least according to my reading of Wikipedia.

  2. Concatenated from several configuration examples embedded in markdown files.

Still incomplete; real-world usage suggests that generic filenames might
be necessary (we currently only support generic *file extensions*).
@Alhadis Alhadis requested a review from a team as a code owner April 25, 2023 14:32
@Alhadis Alhadis mentioned this pull request Apr 25, 2023
@Alhadis
Collaborator Author

Alhadis commented Apr 25, 2023

@lildude Any idea how we can get our tests to pass when both hosts and HOSTS are expected to exist simultaneously under samples/Hosts File/filenames/…? Assuming a case-insensitive filesystem and everything…

Failed tests
  1) Failure:
TestSamples#test_Hosts File_has_samples [/home/runner/work/linguist/linguist/test/test_samples.rb:96]:
Missing sample in "samples/Hosts File/filenames/HOSTS". See https://github.com/github/linguist/blob/master/CONTRIBUTING.md

  2) Failure:
TestSamples#test_INI_has_samples [/home/runner/work/linguist/linguist/test/test_samples.rb:96]:
Missing sample in "samples/INI/filenames/HOSTS". See https://github.com/github/linguist/blob/master/CONTRIBUTING.md

(This would be a non-issue if Linguist matched filenames case-insensitively, the way it does with file extensions).
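
The difference between the two lookup styles can be sketched as follows. This is only an illustration under stated assumptions: `FILENAMES`, `lookup_exact`, and `lookup_folded` are hypothetical names, not Linguist's actual data structures or API.

```ruby
# Hypothetical filename table for illustration only; Linguist's real
# lookup structures are not shown in this thread.
FILENAMES = {
  "hosts" => "Hosts File",
  "HOSTS" => "Hosts File",
}.freeze

# Case-sensitive lookup: every spelling needs its own entry
# (and therefore its own sample file).
def lookup_exact(path)
  FILENAMES[File.basename(path)]
end

# Case-folded index: a single entry covers all spellings.
FOLDED_FILENAMES = FILENAMES.each_with_object({}) do |(name, language), index|
  index[name.downcase] ||= language
end.freeze

def lookup_folded(path)
  FOLDED_FILENAMES[File.basename(path).downcase]
end
```

With the folded index, `hosts`, `HOSTS`, and `Hosts` all resolve through one key, so only one sample file would be required and the case-insensitive-filesystem clash would not arise.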

@lildude
Member

lildude commented May 17, 2023

I'm wondering if the “generic filetypes” feature added to Linguist in 2020 should accommodate “generic filenames” as well. @lildude, any thoughts?

I thought about this in #6364 and I'm still not sure it's worth the effort, but I'm open to being convinced with a compelling argument 😁.

Any idea how we can get our tests to pass when both hosts and HOSTS are expected to exist simultaneously under samples/Hosts File/filenames/…?

Fudge it 😁.

The simplest "solution" is to skip the tests just for this language.

A more convoluted "solution" is to adjust the various logic and append something like an underscore to these filenames, e.g. hosts and HOSTS_, to give the files unique names.

Either way, I don't think we need to get too fancy right now as I don't expect this to be particularly common.
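
For reference, the underscore idea could look roughly like the sketch below. It is not what was merged; `disambiguate` and `original_name` are hypothetical helpers, and the scheme assumes no registered filename legitimately ends in an underscore.

```ruby
# Hypothetical helper: give each filename an on-disk name that stays unique
# on a case-insensitive filesystem by appending underscores to later clashes.
def disambiguate(filenames)
  seen = Hash.new(0)
  filenames.map do |name|
    key = name.downcase
    on_disk = name + ("_" * seen[key])
    seen[key] += 1
    on_disk
  end
end

# Hypothetical inverse: recover the registered filename from the sample name.
# Assumes no real filename ends in "_".
def original_name(on_disk)
  on_disk.sub(/_+\z/, "")
end
```

Here `disambiguate(["hosts", "HOSTS"])` yields `hosts` and `HOSTS_`, and the test logic would call `original_name` before comparing against the list of registered filenames.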

@Alhadis
Collaborator Author

Alhadis commented May 17, 2023

Either way, I don't think we need to get too fancy right now as I don't expect this to be particularly common.

Yeah, me neither. 😀 I've added a comment with a permalink to your response (since this is the sort of last-minute hack that's all but indecipherable to my future self, give or take 2-3 years…)

@Alhadis
Collaborator Author

Alhadis commented May 17, 2023

@lildude I require your Rubyist expertise. Why has RubyGems suddenly carked it?

Output of $ bundle exec rake samples
--- ERROR REPORT TEMPLATE -------------------------------------------------------

```
Gem::GemNotFoundException: can't find gem bundler (= 2.4.5) with executable bundle
  /usr/local/lib/ruby/site_ruby/3.2.0/rubygems.rb:261:in `find_spec_for_exe'
  /usr/local/lib/ruby/site_ruby/3.2.0/rubygems.rb:241:in `bin_path'
  /usr/local/lib/ruby/site_ruby/3.2.0/bundler/rubygems_integration.rb:178:in `bin_path'
  /usr/local/lib/ruby/site_ruby/3.2.0/bundler/shared_helpers.rb:282:in `set_bundle_variables'
  /usr/local/lib/ruby/site_ruby/3.2.0/bundler/shared_helpers.rb:76:in `set_bundle_environment'
  /usr/local/lib/ruby/site_ruby/3.2.0/bundler/cli/exec.rb:20:in `run'
  /usr/local/lib/ruby/site_ruby/3.2.0/bundler/cli.rb:491:in `exec'
  /usr/local/lib/ruby/site_ruby/3.2.0/bundler/vendor/thor/lib/thor/command.rb:27:in `run'
  /usr/local/lib/ruby/site_ruby/3.2.0/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
  /usr/local/lib/ruby/site_ruby/3.2.0/bundler/vendor/thor/lib/thor.rb:392:in `dispatch'
  /usr/local/lib/ruby/site_ruby/3.2.0/bundler/cli.rb:34:in `dispatch'
  /usr/local/lib/ruby/site_ruby/3.2.0/bundler/vendor/thor/lib/thor/base.rb:485:in `start'
  /usr/local/lib/ruby/site_ruby/3.2.0/bundler/cli.rb:28:in `start'
  /usr/local/Cellar/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/bundler-2.4.10/exe/bundle:45:in `block in <top (required)>'
  /usr/local/lib/ruby/site_ruby/3.2.0/bundler/friendly_errors.rb:117:in `with_friendly_errors'
  /usr/local/Cellar/ruby/3.2.2/lib/ruby/gems/3.2.0/gems/bundler-2.4.10/exe/bundle:33:in `<top (required)>'
  /usr/local/opt/ruby/bin/bundle:25:in `load'
  /usr/local/opt/ruby/bin/bundle:25:in `<main>'

```

## Environment

```
Bundler       2.4.5
  Platforms   ruby, x86_64-darwin-21
Ruby          3.2.2p53 (2023-03-30 revision e51014f9c05aa65cbf203442d37fef7c12390015) [x86_64-darwin-21]
  Full Path   /usr/local/opt/ruby/bin/ruby
  Config Dir  /usr/local/Cellar/ruby/3.2.2/etc
RubyGems      3.4.5
  Gem Home    /usr/local/lib/ruby/gems/3.2.0
  Gem Path    /Users/Alhadis/.gem/ruby/3.2.0:/usr/local/lib/ruby/gems/3.2.0:/usr/local/Cellar/ruby/3.2.2/lib/ruby/gems/3.2.0
  User Home   /Users/Alhadis
  User Path   /Users/Alhadis/.gem/ruby/3.2.0
  Bin Dir     /usr/local/lib/ruby/gems/3.2.0/bin
Tools         
  Git         2.40.1
  RVM         not installed
  rbenv       not installed
  chruby      not installed
Gem.ruby      /usr/local/opt/ruby/bin/ruby
bundle #!     /usr/local/Cellar/ruby/3.2.2/bin/ruby

```

## Bundler Build Metadata

```
Built At          2023-01-21
Git SHA           d8ff3b6e4a
Released Version  true
```

## Bundler settings

```
build.charlock_holmes
  Set for the current user (/Users/Alhadis/.bundle/config): "--with-icu-dir=/usr/local/opt/icu4c"
path
  Set for your local app (/Users/Alhadis/Forks/GitHub-Linguist/.bundle/config): "vendor/gems"
```

## Gemfile

### Gemfile

```ruby
source 'https://rubygems.org'
gemspec :name => "github-linguist"

group :debug do
  gem 'byebug' if RUBY_VERSION >= '2.2'
end
```

### Gemfile.lock

```
PATH
  remote: .
  specs:
    github-linguist (7.25.0)
      cgi
      charlock_holmes (~> 0.7.7)
      mini_mime (~> 1.0)
      rugged (~> 1.0)

GEM
  remote: https://rubygems.org/
  specs:
    addressable (2.8.4)
      public_suffix (>= 2.0.2, < 6.0)
    byebug (11.1.3)
    cgi (0.3.6)
    charlock_holmes (0.7.7)
    coderay (1.1.3)
    dotenv (2.8.1)
    faraday (2.7.4)
      faraday-net_http (>= 2.0, < 3.1)
      ruby2_keywords (>= 0.0.4)
    faraday-net_http (3.0.2)
    json (2.6.3)
    licensed (4.3.1)
      json (~> 2.6)
      licensee (~> 9.16)
      parallel (~> 1.22)
      pathname-common_prefix (~> 0.0.1)
      reverse_markdown (~> 2.1)
      ruby-xxHash (~> 0.4.0)
      thor (~> 1.2)
      tomlrb (~> 2.0)
    licensee (9.16.0)
      dotenv (~> 2.0)
      octokit (>= 4.20, < 7.0)
      reverse_markdown (>= 1, < 3)
      rugged (>= 0.24, < 2.0)
      thor (>= 0.19, < 2.0)
    method_source (1.0.0)
    mini_mime (1.1.2)
    minitest (5.18.0)
    mocha (1.16.1)
    nokogiri (1.15.0-x86_64-darwin)
      racc (~> 1.4)
    octokit (6.1.1)
      faraday (>= 1, < 3)
      sawyer (~> 0.9)
    parallel (1.23.0)
    pathname-common_prefix (0.0.1)
    plist (3.7.0)
    pry (0.14.2)
      coderay (~> 1.1)
      method_source (~> 1.0)
    public_suffix (5.0.1)
    racc (1.6.2)
    rake (13.0.6)
    rake-compiler (0.9.9)
      rake
    reverse_markdown (2.1.1)
      nokogiri
    ruby-xxHash (0.4.0.2)
    ruby2_keywords (0.0.5)
    rugged (1.6.3)
    sawyer (0.9.2)
      addressable (>= 2.3.5)
      faraday (>= 0.17.3, < 3)
    thor (1.2.2)
    tomlrb (2.0.3)
    yajl-ruby (1.4.3)

PLATFORMS
  x86_64-darwin-21

DEPENDENCIES
  bundler (~> 2.0)
  byebug
  github-linguist!
  licensed (~> 4.0)
  licensee (~> 9.15)
  minitest (~> 5.15)
  mocha (~> 1.3)
  plist (~> 3.1)
  pry (~> 0.14)
  rake (~> 13.0)
  rake-compiler (~> 0.9)
  yajl-ruby (~> 1.4)

BUNDLED WITH
   2.4.5
```

## Gemspecs

### github-linguist.gemspec

```ruby
require File.expand_path('../lib/linguist/version', __FILE__)

Gem::Specification.new do |s|
  s.name    = 'github-linguist'
  s.version = ENV['GEM_VERSION'] || Linguist::VERSION
  s.summary = "GitHub Language detection"
  s.description = 'We use this library at GitHub to detect blob languages, highlight code, ignore binary files, suppress generated files in diffs, and generate language breakdown graphs.'

  s.authors  = "GitHub"
  s.homepage = "https://github.com/github/linguist"
  s.license  = "MIT"
  s.metadata = {
    "github_repo" => "ssh://github.com/github/linguist"
  }

  s.files = Dir['{lib,ext}/**/*', 'grammars/*', 'LICENSE'] - Dir['lib/linguist/linguist.{so,bundle}']
  s.platform = Gem::Platform::RUBY
  s.executables = ['github-linguist', 'git-linguist']
  s.extensions = ['ext/linguist/extconf.rb']
  s.require_paths = ['lib', 'ext']

  s.add_dependency 'cgi',             '>= 0'
  s.add_dependency 'charlock_holmes', '~> 0.7.7'
  s.add_dependency 'mini_mime',       '~> 1.0'
  s.add_dependency 'rugged',          '~> 1.0'

  s.add_development_dependency 'minitest', '~> 5.15'
  s.add_development_dependency 'rake-compiler', '~> 0.9'
  s.add_development_dependency 'mocha', '~> 1.3'
  s.add_development_dependency 'plist', '~>3.1'
  s.add_development_dependency 'pry', '~> 0.14'
  s.add_development_dependency 'rake', '~> 13.0'
  s.add_development_dependency 'yajl-ruby', '~> 1.4'
  s.add_development_dependency 'licensed', '~> 4.0'
  s.add_development_dependency 'licensee', '~> 9.15'
  s.add_development_dependency 'bundler', '~> 2.0'
end
```

--- TEMPLATE END ----------------------------------------------------------------

Unfortunately, an unexpected error occurred, and Bundler cannot continue.

First, try this link to see if there are any existing issue reports for this error:
https://github.com/rubygems/rubygems/search?q=can%27t+find+gem+bundler+%28%3D+2.4.5%29+with+executable+bundle&type=Issues

If there aren't any reports for this error yet, please fill in the new issue form located at https://github.com/rubygems/rubygems/issues/new?labels=Bundler&template=bundler-related-issue.md, and copy and paste the report template above in there.

This is stopping me from running tests locally (hence the uncaught test failure above).

UPDATE: Waaaait, never mind. I forgot there's two package managers(?) for Ruby: gem and bundle. I ran bundle install, but not gem update. My bad. 😠
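
One reading of the report above: the lockfile ends with `BUNDLED WITH 2.4.5`, while the stack trace shows the `bundle` executable loading from a `bundler-2.4.10` directory. A quick way to spot that kind of mismatch is sketched below. The `bundled_with` helper is hypothetical and parses the lockfile text with a plain regex rather than through Bundler's own API.

```ruby
# Hypothetical helper: extract the version recorded under "BUNDLED WITH"
# from the text of a Gemfile.lock. Returns nil if the section is absent.
def bundled_with(lockfile_text)
  lockfile_text[/^BUNDLED WITH\n\s+(\S+)/, 1]
end

# Compare the locked version against another version string,
# using RubyGems' version semantics rather than string comparison.
def same_bundler?(lockfile_text, other_version)
  locked = bundled_with(lockfile_text)
  !locked.nil? && Gem::Version.new(locked) == Gem::Version.new(other_version)
end
```

For a local checkout, `bundled_with(File.read("Gemfile.lock"))` can be compared against the version printed by `bundle --version`.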

Member

@lildude lildude left a comment


See inline comments

test/test_samples.rb (outdated, resolved)
test/test_samples.rb (outdated, resolved)
@DandelionSprout
Contributor

DandelionSprout commented May 23, 2023

If I understand the CSON syntax correctly (There's a high chance that I don't), then I can't seem to get (?:\\d+\\.){3,}\\d+(?=\\s|$) from https://github.com/Alhadis/language-etc/blob/master/grammars/etc.cson#L422 to match any of these most common IP syntaxes, at least not in Sublime Text 4:

127.0.0.1 example.org
23d4:affb:0000:6547:8654:8906:7548:9675 example.org
::1 example.org www.example.org
:: example.org

I therefore wonder if ^((\d{1,3}\.){3,}\d{1,3}|[0-9a-f:]{1,5}:([0-9a-f:]{1,34})?) would work better.
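
The two patterns can be compared outside the editor with a few lines of Ruby. This sketch runs the unescaped forms (single backslashes) through Ruby's regex engine, so it says nothing about how Sublime Text or the grammar loader handles the CSON escaping; the constant and method names are illustrative only.

```ruby
# The grammar's pattern with the CSON string escaping removed.
ORIGINAL = /(?:\d+\.){3,}\d+(?=\s|$)/

# The alternative proposed in the comment above.
PROPOSED = /^((\d{1,3}\.){3,}\d{1,3}|[0-9a-f:]{1,5}:([0-9a-f:]{1,34})?)/

SAMPLES = [
  "127.0.0.1 example.org",
  "23d4:affb:0000:6547:8654:8906:7548:9675 example.org",
  "::1 example.org www.example.org",
  ":: example.org",
].freeze

# Return the matched text, or nil if the pattern does not match the line.
def matched_text(pattern, line)
  match = pattern.match(line)
  match && match[0]
end

SAMPLES.each do |line|
  puts "#{line.inspect}: original=#{matched_text(ORIGINAL, line).inspect} " \
       "proposed=#{matched_text(PROPOSED, line).inspect}"
end
```

Run this way, the unescaped original pattern matches only the IPv4 line, while the proposed pattern matches the address at the start of all four lines. Note that `[0-9a-f]` is lowercase-only, so an uppercase IPv6 address would need the character class widened or a case-insensitive flag.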

@Alhadis
Collaborator Author

Alhadis commented May 23, 2023

I therefore wonder if ^((\d{1,3}\.){3,}\d{1,3}|[0-9a-f:]{1,5}:([0-9a-f:]{1,34})?) would work better.

@DandelionSprout I've already started work on improving the accuracy of matching IP addresses in HOSTS files; it's currently staged locally. I hope to get it done before this PR's merged and/or @lildude cuts the next release.

FYI, "work" includes reading up on CIDR notation, IPv6 addresses, subnet masks, and various other concepts I was entirely unfamiliar with until now. Highlighting IPv6 addresses in a self-validating manner might be impossible (or at least unreasonable), so expect some leniency in terms of digit-omission.
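
To illustrate the gap between lenient highlighting and real validation, the sketch below contrasts a deliberately loose regex with Ruby's standard `IPAddr` parser. The `LENIENT` pattern and both method names are made up for this example and are not taken from the grammar.

```ruby
require "ipaddr"

# A deliberately loose pattern of the kind a syntax grammar might use:
# dotted digits, or any run of hex digits and colons containing a colon.
LENIENT = /\A(?:(?:\d{1,3}\.){3}\d{1,3}|(?=[0-9A-Fa-f:]*:)[0-9A-Fa-f:]+)\z/

def lenient_ip?(text)
  LENIENT.match?(text)
end

# Strict check using the standard library parser.
# Returns :ipv4, :ipv6, or nil when the text is not a valid address.
def strict_ip_family(text)
  address = IPAddr.new(text)
  address.ipv4? ? :ipv4 : :ipv6
rescue IPAddr::Error
  nil
end
```

Strings such as `999.0.0.1` or `1::2::3` pass the lenient pattern but are rejected by `IPAddr`, which is the sort of leniency a regex-based highlighter would have to tolerate. `IPAddr.new` also accepts CIDR forms such as `192.168.0.0/24`, which the loose pattern above does not attempt to cover.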

test/test_samples.rb (outdated, resolved)
@lildude lildude requested a review from a team as a code owner June 6, 2023 15:17
@Alhadis Alhadis added this pull request to the merge queue Jun 8, 2023
Merged via the queue into master with commit af34cb5 Jun 8, 2023
@Alhadis Alhadis deleted the hosts branch June 8, 2023 00:11
@DandelionSprout
Contributor

DandelionSprout commented Jun 8, 2023

So, now that this has been merged, which colours will end up being used in Hosts files and for which things, and is there a timeline for when such colours would begin to appear in GitHub's web GUI?

Edit: I now see that # comments are shown in grey, at least. I'm hoping for the IP addresses to get shown as red (or blue or such; --color-prettylights-syntax-keyword), but have struggled to figure out Linguist language formats.

@github-linguist github-linguist locked as resolved and limited conversation to collaborators Jun 17, 2024
Successfully merging this pull request may close these issues.

Add Language: Hosts
4 participants