rouge-ruby · pyrmont · Jun 21, 2019 · Jun 1, 2019 · Jun 3, 2019 · Jun 3, 2019
diff --git a/docs/LexerDevelopment.md b/docs/LexerDevelopment.md
@@ -30,11 +30,11 @@ This guide assumes a familiarity with git. If you're new to git, GitHub has
 Rouge automatically loads lexers saved in the `lib/rouge/lexers/` directory and
 so if you're submitting a new lexer, that's the right place to put it.
 
-Your lexer needs to be a subclass of the {Rouge::Lexer Lexer} abstract class.
-Most lexers are in fact subclassed from {Rouge::RegexLexer RegexLexer} as the
-simplest way to define the states of a lexer is to use rules consisting of
-regular expressions. The remainder of this guide assumes your lexer is
-subclassed from {Rouge::RegexLexer RegexLexer}.
+Your lexer needs to be a subclass of the {Rouge::Lexer} abstract class.  Most
+lexers are in fact subclassed from {Rouge::RegexLexer} as the simplest way to
+define the states of a lexer is to use rules consisting of regular expressions.
+The remainder of this guide assumes your lexer is subclassed from
+{Rouge::RegexLexer}.
 
 You can learn a lot by reading through some of the existing lexers. A good
 example that's not too long is [the JSON lexer][json-lexer].
@@ -68,31 +68,30 @@ To be usable by Rouge, a lexer should declare a **title**, a **description**, a
 title "JSON"
 ```
 
-The title of the lexer. It is declared using the {Rouge::Lexer.title
-Lexer.title} method.
+The title of the lexer. It is declared using the {Rouge::Lexer.title} method.
 
-Note: As a subclass of {Rouge::RegexLexer RegexLexer}, the JSON lexer inherits
-this method (and its inherited methods) into its namespace and can call those
-methods without needing to prefix each with `Rouge::Lexer`.  This is the case
-with all of the property defining methods.
+Note: As a subclass of {Rouge::RegexLexer}, the JSON lexer inherits this method
+(and its inherited methods) into its namespace and can call those methods
+without needing to prefix each with `Rouge::Lexer`.  This is the case with all
+of the property defining methods.
 
 #### Description
 
 ```rb
 desc "JavaScript Object Notation (json.org)"
 ```
 
-The description of the lexer. It is declared using the {Rouge::Lexer.desc
-Lexer.desc} method.
+The description of the lexer. It is declared using the {Rouge::Lexer.desc}
+method.
 
 #### Tag
 
 ```rb
 tag "json"
 ```
 
-The tag associated with the lexer. It is declared using the {Rouge::Lexer.tag
-Lexer.tag} method.
+The tag associated with the lexer. It is declared using the {Rouge::Lexer.tag}
+method.
 
 A tag provides a way to specify the lexer that should apply to text within a
 given code block. In various flavours of Markdown, it's used after the opening
@@ -110,8 +109,8 @@ https://github.com/rouge-ruby/rouge/blob/master/lib/rouge/lexers/ruby.rb
 #### Aliases
 
 The aliases associated with a lexer. These are declared using the
-{Rouge::Lexer.aliases Lexer.aliases} method. Aliases are alternative ways that
-the lexer can be identified.
+{Rouge::Lexer.aliases} method. Aliases are alternative ways that the lexer can
+be identified.
 
 The JSON lexer does not define any aliases but [the Ruby one][ruby-lexer] does.
 We can see how it could be used by looking at another example in Markdown. This
@@ -129,7 +128,7 @@ filenames "*.json"
 ```
 
 The filename(s) associated with a lexer. These are declared using the
-{Rouge::Lexer.filenames Lexer.filenames} method.
+{Rouge::Lexer.filenames} method.
 
 Filenames are declared as "globs" that will match a particular pattern. A
 "glob" may be merely the specific name of a file (eg. `Rakefile`) or it could
@@ -142,25 +141,25 @@ mimetypes "application/json", "application/vnd.api+json", "application/hal+json"
 ```
 
 The mimetype(s) associated with a lexer. These are declared using the
-{Rouge::Lexer.mimetypes Lexer.mimetypes} method.
+{Rouge::Lexer.mimetypes} method.
 
 ### Lexer States
 
 The other major element of a lexer is the collection of one or more states.
-For lexers that subclass {Rouge::RegexLexer RegexLexer}, a state will consist
+For lexers that subclass {Rouge::RegexLexer}, a state will consist
 of one or more rules with a rule consisting of a regular expression and an
 action. The action yields tokens and manipulates the _state stack_.
 
 #### The State Stack
 
-The state stack represents the series of states through which the lexer has
-passed. States are added and removed from the "top" of the stack. The oldest
-state is on the bottom of the stack and the newest state is on the top.
+The state stack represents an ordered sequence of states the lexer is currently
+processing. States are added and removed from the "top" of the stack. The
+oldest state is on the bottom of the stack and the newest state is on the top.
 
 The initial (and therefore bottommost) state is the `:root` state. The lexer
 works by looking at the rules that are in the state that is on top of the
 stack. These are tried _in order_ until a match is found. At this point, the
-action defined in the rule is run, the match is removed from the input stream
+action defined in the rule is run, the head of the input stream is advanced
 and the process is repeated with the state that is now on top of the stack.
 
 Now that we've explained the concepts, let's look at how you actually define
@@ -174,14 +173,14 @@ state :root do
 end
 ```
 
-A state is defined using the {Rouge::RegexLexer.state RegexLexer.state} method.
+A state is defined using the {Rouge::RegexLexer.state} method.
 The method consists of the name of the state as a `Symbol` and a block
 specifying the rules that Rouge will try to match as it parses the text.
 
 #### Rules
 
-A rule is defined using the {Rouge::RegexLexer::StateDSL#rule StateDSL#rule}
-method. The `rule` method can define either "simple" rules or "complex" rules.
+A rule is defined using the {Rouge::RegexLexer::StateDSL#rule} method. The
+`rule` method can define either "simple" rules or "complex" rules.
 
 *Simple Rules*
 
@@ -232,9 +231,9 @@ The block called can take one argument, usually written as `m`, that contains
 the regular expression match object.
 
 These kinds of rules allow for more fine-grained control of the state stack.
-Inside a complex rule's block, it's possible to {Rouge::RegexLexer#push push},
-{Rouge::RegexLexer#pop! pop}, {Rouge::RegexLexer#token yield a token} and
-{Rouge::RegexLexer#delegate delegate to another lexer}.
+Inside a complex rule's block, it's possible to call {Rouge::RegexLexer#push},
+{Rouge::RegexLexer#pop!}, {Rouge::RegexLexer#token} and
+{Rouge::RegexLexer#delegate}.
 
 You can see an example of these more complex rules in [the Ruby
 lexer][ruby-lexer].
@@ -256,19 +255,22 @@ Rouge will attempt to guess the appropriate lexer if it is not otherwise clear.
 If Rouge is unable to do this on the basis of any tag, associated filename or
 associated mimetype, it will try to detect the appropriate lexer on the basis of
 the text itself (the source). This is done by calling `self.detect?` on the
-possible lexer (a default `self.detect?` method is defined in {Rouge::Lexer
-Lexer} and simply returns `false`).
+possible lexer (a default `self.detect?` method is defined in {Rouge::Lexer}
+and simply returns `false`).
 
 A lexer can implement its own `self.detect?` method that takes as a parameter a
-{Rouge::TextAnalyzer TextAnalyzer} object. If the `self.detect?` method returns
-true, the lexer will be selected as the appropriate lexer.
+{Rouge::TextAnalyzer} object. If the `self.detect?` method returns true, the
+lexer will be selected as the appropriate lexer.
 
-The `self.detect?` method is intended to work by looking at the shebang or
-doctype that identifies a piece of text. To make this easier, Rouge provides
-the {Rouge::TextAnalyzer#shebang TextAnalyzer#shebang} method and the
-{Rouge::TextAnalyzer#doctype TextAnalyzer#doctype} method. For more general
-disambiguation between different lexers, see [Conflicting Filename
-Globs][conflict-globs] below.
+It is important to note that `self.detect?` should _only_ return `true` if it
+is 100% sure that the language is detected. The most common ways for source
+code to identify the language it's written in are a shebang or a doctype, and
+Rouge provides the {Rouge::TextAnalyzer#shebang} method and the
+{Rouge::TextAnalyzer#doctype} method specifically for use with `self.detect?`
+to make these checks easy to perform.
+
+For more general disambiguation between different lexers, see [Conflicting
+Filename Globs][conflict-globs] below.
 
 [conflict-globs]: #Conflicting_Filename_Globs
 
@@ -280,7 +282,7 @@ for these words easier, many lexers will put the applicable keywords in an
 array and make them available in a particular way (be it as a local variable,
 an instance variable or what have you).
 
-We recommend lexers use a class method:
+For performance and safety, we strongly recommend lexers use a class method:
 
 ```rb
 module Rouge
@@ -297,10 +299,24 @@ module Rouge
 end
 ```
 
-These keywords can then be included in a regular expression like so:
+These keywords can then be used like so:
 
 ```rb
-rule /(#{keywords.join('|')})\b/, Keyword
+rule /\w+/ do |m|
+  if self.class.keywords.include?(m[0])
+    token Keyword
+  else
+    token Name
+  end
+end
+```
+
+In some cases, you may want to interpolate your keywords into a regular
+expression. If you do, be careful to use the `\b` anchor to avoid inadvertently
+matching part of a longer word (eg. `if` matching `iff`):
+
+```rb
+rule /\b(#{keywords.join('|')})\b/, Keyword
 ```
 
 #### Startup
@@ -312,16 +328,16 @@ start do
 end
 ```
 
-The {Rouge::RegexLexer.start RegexLexer.start} method can take a block that
+The {Rouge::RegexLexer.start} method can take a block that
 will be called when the lexer commences lexing. This provides a way to enter
 into a special state "before" entering into the `:root` state (the `:root`
 state is still the bottommost state in the state stack; the state pushed by
 `start` sits "on top" but is the state in which the lexer begins).
 
 Why would you want to do this? In some languages, there may be language
-structures that can appear at the beginning of a file. {Rouge::RegexLexer.start
-RegexLexer.start} provides a way to parse these structures. An example is a
-preprocessor directive in C. You can see how these are lexed in [the C
+structures that can appear at the beginning of a file.
+{Rouge::RegexLexer.start} provides a way to parse these structures. An example
+is a preprocessor directive in C. You can see how these are lexed in [the C
 lexer][c-lexer].
 
 [c-lexer]: https://github.com/rouge-ruby/rouge/blob/master/lib/rouge/lexers/c.rb 
@@ -340,13 +356,12 @@ lexer][cpp-lexer] and [the JSX lexer][jsx-lexer] for examples.
 #### Conflicting Filename Globs
 
 If two or more lexers define the same filename glob, this will cause an
-{Rouge::Guesser::Ambiguous Ambiguous} error to be raised by certain guessing
-methods (including the one used by the `assert_guess` method used in your
-spec).
+{Rouge::Guesser::Ambiguous} error to be raised by certain guessing methods
+(including the one used by the `assert_guess` method used in your spec).
 
 The solution to this is to define a disambiguation procedure in the
-{Rouge::Guessers::Disambiguation Disambiguation} class. Here's the procedure
-for the `*.pl` filename glob as an example:
+{Rouge::Guessers::Disambiguation} class. Here's the procedure for the `*.pl`
+filename glob as an example:
 
 ```rb
 disambiguate "*.pl" do
@@ -431,6 +446,11 @@ returns true should be tested.
 The demo file is tested automatically as part of Rouge's test suite. The file
 should be able to be parsed without producing any `Error` tokens.
 
+The demo is also used on [rouge.jneen.net][hp] as the default text to display
+when a lexer is chosen. It should be short (less than 20 lines if possible).
+
+[hp]: http://rouge.jneen.net/
+
 ### Visual Samples
 
 While the visual sample is tested by the testing suite to ensure that it does