Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Linguist to detect the language #537

Merged
merged 1 commit into from
Mar 19, 2017

Conversation

pchaigno
Copy link
Contributor

As discussed in github-linguist/linguist#2497, this pull requests makes use of Linguist::Blob to detect the language of a markup document.

Linguist::Blob is not yet part of Linguist and this pull request still has failing tests so it's work in progress.

/cc @bkeepers

Gemfile Outdated
@@ -11,3 +11,4 @@ gem "creole", "~>0.3.6"
gem "wikicloth", "=0.8.1", :platforms => :ruby
gem "asciidoctor", "= 1.5.2"
gem "rake"
gem "github-linguist"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the purposes of getting this branch passing, you could use :github => "pchaigno/linguist", :branch => "blob_in_memory" here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done but the Travis build still fails because the samples.json file of my Linguist fork isn't generated :/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bkeepers Anything I can do to fix that?

@bkeepers
Copy link
Contributor

bkeepers commented Sep 4, 2015

This is a great start!

I'm curious, what if linguist was responsible for deciding which markup format to use? Something like markup_format could be added to the language definitions in linguist, and then here:

    def renderer(filename, content = nil)
      blob = Linguist::Blob.new(filename, content)
      if language = Linguist::Language.detect(blob)
        markups.find { |impl| impl.name == language.markup_format }
      end
    end

Enabling markup rendering for a language would just involve adding a markup_format to the language definition (for example Literate CoffeeScript or RMarkdown).

Would that work?

/cc @arfon @aroben @vmg

@aroben
Copy link

aroben commented Sep 4, 2015

Enabling markup rendering for a language would just involve adding a markup_format to the language definition (for example Literate CoffeeScript or RMarkdown).

So for Textile files, would the markup_format be "Textile" or "Redcloth"? If the former, I'm not sure how that's different from the language name. If the latter, that seems mysteriously tied to the implementation of github/markup; i.e., someone adding or modifying a markup_format entry would need to know what renderers github/markup supports.

@pchaigno pchaigno force-pushed the detect-language-using-linguist branch from 402a7e8 to fa488ce Compare September 4, 2015 13:54
@bkeepers
Copy link
Contributor

bkeepers commented Sep 4, 2015

So for Textile files, would the markup_format be "Textile" or "Redcloth"?

I would think "Textile".

If the former, I'm not sure how that's different from the language name.

For the default case it wouldn't be, but it would allow other languages to opt-in to being rendered. Literate CoffeeScript and RMarkdown are literally the only two examples I can think of right now, so maybe it's not a justifiable abstraction right now.

@kivikakk
Copy link
Contributor

👋 Is this PR still something you're interested in working on? It looks like tests are passing; I think you merge master (and resolve conflicts) the one failing test in 343d993 is marked as allowed-to-fail in master!

@pchaigno pchaigno force-pushed the detect-language-using-linguist branch from 343d993 to 179819e Compare March 16, 2017 06:44
@pchaigno
Copy link
Contributor Author

Is this PR still something you're interested in working on?

Definitely! I've updated the branch :-)

@pchaigno pchaigno changed the title [WIP] Use Linguist to detect the language Use Linguist to detect the language Mar 16, 2017
@kivikakk
Copy link
Contributor

I love it! ❤️ Thank you so much!

@kivikakk kivikakk merged commit b298378 into github:master Mar 19, 2017
@pchaigno pchaigno deleted the detect-language-using-linguist branch March 19, 2017 09:41
@kivikakk
Copy link
Contributor

This is now in use on the live site. Thanks a bunch! ✨

@andrew
Copy link

andrew commented Mar 20, 2017

github-linguist should probably be added as dependency in the gemspec too otherwise people will run into errors like the following if they don't already have it installed:

LoadError: cannot load such file -- linguist

@kivikakk
Copy link
Contributor

@andrew colour me confused:

markup/Gemfile

Line 15 in 5685837

gem "github-linguist", ">= 5.0.7"

Can you paste a more complete traceback?

@andrew
Copy link

andrew commented Mar 20, 2017

The Gemfile is different to the gemspec, I'm away from the computer at the moment but it either needs to be in both or the Gemfile can pull dependencies from the gemspec: http://yehudakatz.com/2010/12/16/clarifying-the-roles-of-the-gemspec-and-gemfile/

@kivikakk
Copy link
Contributor

@andrew Thanks for that. I've added it to the gemspec accordingly.

@PikachuEXE
Copy link
Contributor

Linguist requires

  • rugged: Just cannot be installed
  • charlock_holmes cannot be installed without installing something extra
    Can it be optional?

@kivikakk
Copy link
Contributor

@PikachuEXE I'm not quite sure what you mean by "Just cannot be installed" re: rugged. Either way, we cannot make it an optional dependency as of this PR.

@dometto
Copy link
Contributor

dometto commented Mar 26, 2017

This PR breaks support for JRuby. This may be what @PikachuEXE was referring to, as both rugged and charlock_holmes (which are dependencies of linguist) require C-extensions, which is the cause of errors on JRuby.

Of course, these errors on JRuby have gone unnoticed because it seems that .travis.yml explicitly allows JRuby failures.

So this ends up breaking gollum at least, and possibly more projects (such as @PikachuEXE 's). I must say this is rather disappointing and and unexpected behavior, especially since this was apparently not even considered a breaking change and was released as part of a patch-level update.

I would appreciate it if these changes could be reverted in the 1.4.x gems. Ideally, linguist support would be introduced as an MRI-only option in 1.5.x, as @PikachuEXE suggests.

@PikachuEXE
Copy link
Contributor

@dometto
Yes that's what I mean
( Although I am not working on any JRuby project :P )

This PR looks more like a new feature than a bug fix
It requires extra external libraries (not just C extensions)

Forcing people to install those dependencies looks a bit "breaking" to me
Which is why I want these things to be optional (Opt-in)

@kivikakk
Copy link
Contributor

Understood. I think I'm 👍 on reverting the change in the final 1.4 release, and re-introducing as an optional dependency in 1.5.0. I'll fix this up tomorrow. ❤️

@PikachuEXE
Copy link
Contributor

@kivikakk
Thanks!

Off topic:
I like that anime with the character on your avatar :P

@kivikakk
Copy link
Contributor

@PikachuEXE hahaha :) Kobayashi and I have a lot in common, it turns out.

kivikakk pushed a commit that referenced this pull request Mar 27, 2017
kivikakk pushed a commit that referenced this pull request Mar 27, 2017
@kivikakk
Copy link
Contributor

kivikakk commented Mar 27, 2017

Change is reverted in v1.4.9, reintroduced in v1.5.0 with automatic fallback (and no hard dependency).

@PikachuEXE
Copy link
Contributor

Created #1033 to update change log

@dometto
Copy link
Contributor

dometto commented Mar 27, 2017

Many thanks for the quick response and release!

@bartkamphorst
Copy link
Contributor

Terrific, thanks @kivikakk for resolving this so quickly!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants