Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for crystal language #441

Merged
merged 1 commit into from
Jul 23, 2018

Conversation

splattael
Copy link
Contributor

Heavily based on the Ruby lexer.

@splattael
Copy link
Contributor Author

@jneen Do you need more info/support from me to get this PR merged? :-)
I'd love to help here ❤️

@ryphill
Copy link

ryphill commented Apr 27, 2016

Not the author, but this seems nearly identical to the Ruby lexer so you might be better off subclassing it to prevent unnecessary duplication. The C/C++/ObjC lexers are good examples of how to do this.

@splattael
Copy link
Contributor Author

@ryphill You're right. Both lexers are almost identical because Crystal syntax is very similar to Ruby's.

I've copied the lexer on purpose because Crystal is still under heavy development so syntax changes are likely.

Once Crystal's syntax has settled one can think of re-using the Ruby lexer :-)

@jneen
Copy link
Member

jneen commented Jun 2, 2016

Hi @splattael, thanks for the pull request. In this case, given the sheer complexity of the ruby lexer, I'd prefer re-using the ruby lexer and breaking off a new lexer for crystal if enough breaking syntax changes are made.

@jneen jneen added discussion-open lexer-request A request for a lexer to be developed. labels Jun 2, 2016
@splattael
Copy link
Contributor Author

Hey @jneen! Yes, re-using the Ruby lexer is definitely an option!

In order to make the Ruby lexer re-usable, we have to tweak it, I guess.
TBH, I'm not sure I'm confident enough to make those changes in the Ruby lexer w/o the fear to break things there 😨

What I am confident about is to clean up the current Crystal lexer a bit (removing some items from builtins_*) 😎

What do you think?

Making the Ruby lexer re-usable would definitely take longer and need some assistance from you (and/or from other contributors), I guess.
Tweaking just the Crystal lexer is easy-peasy 😏

Thanks 💚,
Peter

@jneen
Copy link
Member

jneen commented Jun 2, 2016

Hm. I wonder, how much of ruby syntax does Crystal support?

e: also, could you give me a summary of the differences between this lexer and the ruby lexer?

@splattael
Copy link
Contributor Author

AFAIK, Crystal started as a subset of Ruby's syntax.

Since then, the syntax has diverged (map-syntax, annotations, variable types) so now it's NOT a subset any more :neckbeard:

The differences (for now) in the Crystal lexer are mostly in keywords and keywords_pseudo (oops, I lied about builtins_* earlier, sorry!). I've also added one single rule for the decorator thingy (e.g. @[Link("libnotify")]).

Here's the (shortened) diff:

       keywords = %w(
         BEGIN END alias begin break case defined\? do else elsif end
-        ensure for if in next redo rescue raise retry return super then
-        undef unless until when while yield
+        ensure for if ifdef in next redo rescue raise retry return super then
+        undef unless until when while yield lib fun type of as
       )

       keywords_pseudo = %w(
         initialize new loop include extend raise attr_reader attr_writer
         attr_accessor alias_method attr catch throw private module_function
         public protected true false nil __FILE__ __LINE__
+        getter getter? getter! property property? property! struct record
       )

       builtins_g = %w(
@@ -166,6 +165,8 @@
         rule /0b[01]+(?:_[01]+)*/, Num::Bin
         rule /[\d]+(?:_\d+)*/, Num::Integer

+        rule /@\[([^\]]+)\]/, Name::Decorator
+

Again, the Crystal lexer still needs some ❤️ (mostly tweaking keywords etc.) but it's works just fine 😎.

@jneen
Copy link
Member

jneen commented Jun 2, 2016

Okay, so the process to use subclassing instead of copy-pasta would be this:

  • use static methods for keywords and keywords_pseudo instead of local variables
  • subclass the ruby lexer, override those methods with super + %w(...)
  • use a prepend or append to add in your extra rule

This should be doable without too much effort, yes?

@splattael
Copy link
Contributor Author

@jneen If you wish to subclass instead of copy-paste I'll try it. Stay tuned :-)

@splattael
Copy link
Contributor Author

Hey @jneen!

Sorry for the delay. I'm trying to implement the Crystal lexer as a subclass of the Ruby lexer. As suggested.

I am having problems defining e.g. keywords as a static method and using it within the rule statement.

I'm talking about this (https://github.com/jneen/rouge/blob/c8a9194b4637eec6595182e6ded9766c7e9b0e0e/lib/rouge/lexers/ruby.rb#L179-L180):

        rule /(?:#{keywords.join('|')})\b/, Keyword, :expr_start
        rule /(?:#{keywords_pseudo.join('|')})\b/, Keyword::Pseudo, :expr_start

Defining keywords and keywords_pseudo as static methods won't work because this won't work:

        rule /(?:#{self.class.keywords.join('|')})\b/, Keyword, :expr_start
        rule /(?:#{self.class.keywords_pseudo.join('|')})\b/, Keyword::Pseudo, :expr_start

(Note the self.class.)

Defining a generic rule like the C lexer does (https://github.com/jneen/rouge/blob/c8a9194b4637eec6595182e6ded9766c7e9b0e0e/lib/rouge/lexers/c.rb#L116-L130) won't work either because AFAIK a rule block does only return a token and no next_state as required for the "keyword" rule above (Keyword and :expr_start).

I fear I need help :-(

I can push my not-working code if you need more insights.

Kind regards,
Peter

@jneen
Copy link
Member

jneen commented Jun 13, 2016

Hi @splattael! Thanks for getting this far :]. Of those options, I'd prefer the generic rule like the C lexer (it's what ruby should have used anyways...)

A rule block can call the #push method with a new state:

rule /someregex/ do
  # ...
  push :state_name
end

It can also call pop!, stash arbitrary state in instance variables, and a number of other cool things.

@splattael
Copy link
Contributor Author

Thanks! I will try it. Expect broken code soonish %-)

@faustinoaq
Copy link

@splattael splattael force-pushed the crystal-lang branch 3 times, most recently from 300986f to c68a152 Compare July 25, 2017 07:06
@splattael
Copy link
Contributor Author

@faustinoaq Thanks for waking me up ;)

So, yes, I did not have time to implement @jneen suggestions from #441 (comment) 😞

I'd also love ❤️ to see Crystal being highlighted beautifully on GitLab 😄 as I also use Crystal on GitLab a lot 😊

@jneen Is there a way to merge 🔀 this PR without subclassing the Ruby lexer FOR NOW? 😃 🙏
I would start working on this right after the merge. I promise 🤞 :neckbeard:

The changes between Ruby lexer and the Crystal lexer are mostly keywords and highlighting annotations like @[AlwaysInline]:

--- lib/rouge/lexers/ruby.rb    2017-07-25 08:54:39.629975025 +0200
+++ lib/rouge/lexers/crystal.rb 2017-07-25 09:06:11.014454805 +0200
@@ -2,19 +2,17 @@
 
 module Rouge
   module Lexers
-    class Ruby < RegexLexer
-      title "Ruby"
-      desc "The Ruby programming language (ruby-lang.org)"
-      tag 'ruby'
-      aliases 'rb'
-      filenames '*.rb', '*.ruby', '*.rbw', '*.rake', '*.gemspec', '*.podspec',
-                'Rakefile', 'Guardfile', 'Gemfile', 'Capfile', 'Podfile',
-                'Vagrantfile', '*.ru', '*.prawn', 'Berksfile'
+    class Crystal < RegexLexer
+      title "Crystal"
+      desc "Crystal The Programming Language (crystal-lang.org)"
+      tag 'crystal'
+      aliases 'cr'
+      filenames '*.cr'
 
-      mimetypes 'text/x-ruby', 'application/x-ruby'
+      mimetypes 'text/x-crystal', 'application/x-crystal'
 
       def self.analyze_text(text)
-        return 1 if text.shebang? 'ruby'
+        return 1 if text.shebang? 'crystal'
       end
 
       state :symbols do
@@ -103,41 +101,33 @@
 
       keywords = %w(
         BEGIN END alias begin break case defined\? do else elsif end
-        ensure for if in next redo rescue raise retry return super then
-        undef unless until when while yield
+        ensure for if ifdef in next redo rescue raise retry return super then
+        undef unless until when while yield lib fun type of as
       )
 
       keywords_pseudo = %w(
-        loop include extend raise
-        alias_method attr catch throw private module_function
+        initialize new loop include extend raise attr_reader attr_writer
+        attr_accessor alias_method attr catch throw private module_function
         public protected true false nil __FILE__ __LINE__
+        getter getter? getter! property property? property! struct record
       )
 
       builtins_g = %w(
-        attr_reader attr_writer attr_accessor
-
-        __id__ __send__ abort ancestors at_exit autoload binding callcc
-        caller catch chomp chop class_eval class_variables clone
-        const_defined\? const_get const_missing const_set constants
+        abort ancestors at_exit
+        caller catch chomp chop clone constants
         display dup eval exec exit extend fail fork format freeze
         getc gets global_variables gsub hash id included_modules
         inspect instance_eval instance_method instance_methods
-        instance_variable_get instance_variable_set instance_variables
-        lambda load local_variables loop method method_missing
+        lambda loop method method_missing
         methods module_eval name object_id open p print printf
-        private_class_method private_instance_methods private_methods proc
-        protected_instance_methods protected_methods public_class_method
-        public_instance_methods public_methods putc puts raise rand
-        readline readlines require require_relative scan select self send set_trace_func
-        singleton_methods sleep split sprintf srand sub syscall system
-        taint test throw to_a to_s trace_var trap untaint untrace_var warn
+        proc putc puts raise rand
+        readline readlines require scan select self send
+        sleep split sprintf srand sub syscall system
+        test throw to_a to_s trap warn
       )
 
       builtins_q = %w(
-        autoload block_given const_defined eql equal frozen
-        include instance_of is_a iterator kind_of method_defined
-        nil private_method_defined protected_method_defined
-        public_method_defined respond_to tainted
+        eql equal include is_a iterator kind_of nil
       )
 
       builtins_b = %w(chomp chop exit gsub sub)
@@ -169,6 +159,8 @@
         rule /\d+\.\d+(e[\+\-]?\d+)?/, Num::Float
         rule /[\d]+(?:_\d+)*/, Num::Integer
 
+        rule /@\[([^\]]+)\]/, Name::Decorator
+
         # names
         rule /@@[a-z_]\w*/i, Name::Variable::Class
         rule /@[a-z_]\w*/i, Name::Variable::Instance

@jneen WDYT? :)

@faustinoaq
Copy link

faustinoaq commented Aug 24, 2017

Hi @jneen do you think we can merge this PR? 😅

@Ilya-Kuchaev
Copy link

Any news here?

@matiasgarciaisaia
Copy link

👋 Hi there!

Just jumping in to offer some help to get this merged.

Crystal's own website uses this gem - so Crystal's site can't highlight Crystal yet 😨

@Jcambass
Copy link

Jcambass commented Apr 4, 2018

@jneen Thanks for this wonderful gem. I personally think that subclassing the ruby lexer is the wrong direction since unlike C++ being a superset of C, crystal isn't a superset of ruby (anymore).

This sort of DRY code should be avoided since it introduces coupling and direct dependency for technical reasons only.
The Crystal Lexer isn't really a Ruby Lexer it may share some code and for now some logic. This shared logic can be extracted into modules if we really wanna reuse them and avoid code duplication.

But subclassing the Lexer feels kinda wrong for me unless the Language is an actual superset since future changes in the ruby syntax will most likely not be present in crystal in the same form.

But those are just my 2 cents.

@splattael
Copy link
Contributor Author

Rebased against latest master.

qrghii2

💜

@RX14
Copy link

RX14 commented Jul 15, 2018

Crystal isn't ruby and it doesn't aim for compatability. Asking for the crystal lexer to subclass the ruby lexer is like asking for the java lexer to subclass the C lexer because they share syntax heritage. In fact, there's little point since crystal will not be adopting any future ruby syntax features.

Please merge this PR (or #863), it's far better than the nothing we have now on Gitlab, the crystal blog, etc.

Heavily based on the Ruby lexer.
splattael pushed a commit to splattael/rouge-lexers-crystal that referenced this pull request Jul 16, 2018
@splattael
Copy link
Contributor Author

splattael commented Jul 16, 2018

In the meantime...

As @ysbaddaden suggested I've (finally) extracted the Crystal lexer into an own gem: rouge-lexers-crystal.

Put in your Gemfile:

gem "rouge"
gem "rouge-lexers-crystal"

The lexer for Crystal should be available then.

Additionally, I've created a merge request on GitLab to use this plugin in order to make Crystal code look great 🤞

screenshot from 2018-07-16 17-01-04

Wish us luck 😁

@dblessing
Copy link
Collaborator

I will review this week. Given the discussion I think not subclassing Ruby is the right solution. I also would rather merge this than see GitLab add a dependency on another gem.

@splattael
Copy link
Contributor Author

@dblessing Awesome, thank you 💜

@dblessing dblessing merged commit d445dc5 into rouge-ruby:master Jul 23, 2018
@bcardiff
Copy link

bcardiff commented Sep 1, 2018

Hi @splattael I was checking to use this in crystal's website, I've found a couple of keywords missing like abstract, module, spawn. Could you clarify to me the difference between keywords and keywords_pseudo so could add them? Thanks!

@splattael
Copy link
Contributor Author

@bcardiff Crystal's lexer is heavily based on Ruby's lexer.

I suspect that keywords would be the language's reserved words whereas keywords_pseudo defines a list a keyword-like words. I am not sure how it applies to Crystal, TBH.

I'd add abstract to keywords and spawn to keywords_pseudo 👍
module is already handled in the lexer so no need to add it.

I hope this helps!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion-open lexer-request A request for a lexer to be developed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants