Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(clj) Fix broken/inconsistent highlighting for $, Numbers, namespaced maps, HINT mode. Add charachter, regex and punctuation modes. #3397

Merged
merged 13 commits into from
Nov 10, 2021

Conversation

MrEbbinghaus
Copy link
Contributor

@MrEbbinghaus MrEbbinghaus commented Nov 10, 2021

Hey, it's me again.

I added some more fixes to Clojure highlighting. Sorry if the scope of this PR is too big, but I couldn't be bothered to open 7 different PRs. :-)

Here is a write-down in your format. I don't know if I should add this to the CHANGES.md in this format? Or everything together with my last line into one enh(clj) Fixup a lot of things.?

fix(clj) `$` in symbol breaks highlighting
fix(clj) Add complete regex for number detection
enh(clj) Add character mode for character literals
fix(clj) Inconsistent namespaced map highlighting
enh(clj) Add `regex` mode to regex literal
fix(clj) Remove inconsistent/broken highlighting for metadata
enh(clj) Add `punctuation` mode for commas.

Checklist

  • Added markup tests.
  • Updated the changelog at CHANGES.md

Following is the list of stuff this PR fixes, together with reference images and my reasoning.

$ in symbol breaks highlighting

Clojure symbols can contain $. I fixed it.
Current:
image

GitHub:

(.getEnclosingClass java.util.Map$Entry)
(java.util.Map$Entry. .getEnclosingClass)

Number highlighting just matched integer.

This is how valid Clojure numbers are highlighted currently:
image

I added a regex to include them all.

GitHub does it wrong as well:

; integer ; BigInt
00          42N
42          0N

; octal   ; hex
052       0x2a
00N       0x0N

; radix.  ; radix BigInt
2r101010  2r101010N
8r52      8r52N
16r2a     16r2aN
36r16     36r16N


;; ratios
1/2
-1/2

;; floats
42.0
-42.0
42.

; BigDecimal ; with Exponent
42.0M        42.0E2
-42M         -42.0E+9
42.M         42E-0
42M          42.0E2M
             42E+9M

Missing character literals

Clojure has different character literals, they were highlighted not at all or wrong. I added the character mode to them.
Current:
image

Namespaced map keys highlighting inconsistent with itself and other highlighters

Current:
#:ship and #:person have different colours.
image

I changed it to be always not coloured.

GitHub:

#:person{:first "Han"
         :last "Solo"
         :ship #:ship{:name "Millennium Falcon"}}
#::{:a 1, :b 2}

Cursive for IntelliJ:
image
Calva for VS Code highlights them in the colour of the rainbow parentheses:
image

Add Regex Highlighting

Clojure has a regex literal #"My\sRegex!", I added the regex scope for this.

GitHub:

#"\bMy\sRegex!\b"   ; Special regex stuff is highlighted
# "\bMy\sRegex!\b"  ; # and string have different highlighting.

Remove HINT mode.

Metadata (which can be a keyword, symbol, collection, ...) was highlighted like a comment, and it wasn't consistent.
image

I removed this mode. You can have huge nested maps of metadata if you like. Removing highlighting from them doesn't make sense.

GitHub:

; different highlighting for meta data
^String
^{:tag String}
^{:tag java.lang.String}
^{:any ["collection" {:I #{:like}}]}

^:private
^{:private true}

; deprecated but still working:
#^String
#^{:tag String}
#^:private

Cursive:
image
Calva:
image

Add punctuation scope for ,

In Clojure, the comma , is treated like a whitespace. It is sometimes used to make compact maps more readable:
{:a 1, :b 2} is equal to: {:a 1 :b 2}
I have seen editor themes that made them slightly opaque.

@joshgoebel
Copy link
Member

Remove HINT mode.

What is the purpose of metadata hints in Clojure?

@joshgoebel
Copy link
Member

I don't know if I should add this to the CHANGES.md

Please, the more detail the better though if you did a "lots of fixes" with a second level bullet list under that that would be even better.

@MrEbbinghaus
Copy link
Contributor Author

MrEbbinghaus commented Nov 10, 2021

What is the purpose of metadata hints in Clojure?

General meta information. Anything really.

The test function in clojure.core executes the :test meta key.

(defn my-function
  "this function adds two numbers"
  {:test #(do
            (assert (= (my-function 2 3) 5))
            (assert (= (my-function 4 4) 8)))}
  ([x y] (+ x y)))
  
(meta #'my-function)
; => {:arglists ([x y]), 
;     :doc "this function adds two numbers", 
;     :test #object[user$fn__143 0x4482469c "user$fn__143@4482469c"], 
;     :line 1, :column 1, :file "NO_SOURCE_PATH", 
;     :name my-function, 
;     :ns #object[clojure.lang.Namespace 0x3703bf3c "user"]}

(test #'my-function)
; => :ok

clojure.core uses it for documentation and compiler hints.
image

image

CIDER the Clojure Emacs IDE uses it for breakpoints.

(dotimes [i 10]
  #dbg ^{:break/when (= i 7)}
  (prn i))

koacha, a test runner, uses them to focus on specific sets of tests

(deftest ^:yyy other-test
  (is (= 3 3)))

meta-merge uses keys in metadata to specify per-key merge behaviour.

(meta-merge {:a [:b :c]} {:a ^:prepend [:d]})
; => {:a [:d :b :c]}
(meta-merge {:a [:b :c]} {:a ^:replace [:d]})
; => {:a [:d]}
(meta-merge {:a [:b :c]} {:a ^:displace [:d]})
; => {:a [:b :c]}

https://clojure.org/reference/reader#_metadata

Metadata (^)

Metadata is a map associated with some kinds of objects: Symbols, Lists, Vector, Sets, Maps, tagged literals returning an IMeta, and record, type, and constructor calls. The metadata reader macro first reads the metadata and attaches it to the next form read (see with-meta to attach meta to an object):
^{:a 1 :b 2} [1 2 3] yields the vector [1 2 3] with a metadata map of {:a 1 :b 2}.

A shorthand version allows the metadata to be a simple symbol or string, in which case it is treated as a single entry map with a key of :tag and a value of the (resolved) symbol or string, e.g.:
^String x is the same as ^{:tag java.lang.String} x

Such tags can be used to convey type information to the compiler.

Another shorthand version allows the metadata to be a keyword, in which case it is treated as a single entry map with a key of the keyword and a value of true, e.g.:
^:dynamic x is the same as ^{:dynamic true} x

Metadata can be chained in which case they are merged from right to left.

I noticed that other highlighters make mistakes with these, maybe this is close enough to a universal test set?
@MrEbbinghaus
Copy link
Contributor Author

highlight.js does not have regex languages, does it?

Maybe an idea for the future, as GitHub supports regex highlighting.

#"\bMy\sRegex!\b"   ; Special regex stuff is highlighted
"\bMy\sRegex!\b"  

But not in all languages 🤔

const RE = /\bMy\sRegex!\b/;

@joshgoebel
Copy link
Member

Special regex stuff is highlighted

See the history behind #2751. I'm open to this, but we do not need (or want) a whole regex grammar. (there is sadly no good way to reuse it properly at the moment). If we picked a few key things to focus on and decided to do that within our existing regex support, I think that could be nice. And it would only involve changing: REGEXP_MODE and most every grammar would benefit.

One easy start would be start by moving the existing escapes we recognize into char.escape scope. If you wanted to tackle that separately...

{match: /\\o[0-3]?[0-7]{1,2}/}, // Unicode Octal 0 - 377
{match: /\\u[0-9a-fA-F]{4}/}, // Unicode Hex 0000 - FFFF
{match: /\\(newline|space|tab|formfeed|backspace|return)/}, // special characters
{match: /\\\S/} // any non-whitespace char
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is so tiny a match... can we do any looking forward to qualify it? I'm imagining false positives with any language that allows \SOMETHING style tags... if we can't qualify it I think we should make it relevance: 0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, it's [literal \][non space] (even broader than I was first reading it)... yeah I think we should add relevance: 0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a literal and can appear anywhere, so you are right with relevance: 0.
I didn't put any thought in this or the other modes, what the relevance may be. What is "typically Clojure"?

It would be a nice exercise to generate possible values for modes and try to match them against other languages to automatically calculate the relevance. 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added relevance: 0.

Although, I am not sure how many programming languages support the full first Unicode plane. \⠓ is a valid Clojure literal.

@joshgoebel
Copy link
Member

Looking good, almost there!

{match: /\\o[0-3]?[0-7]{1,2}/}, // Unicode Octal 0 - 377
{match: /\\u[0-9a-fA-F]{4}/}, // Unicode Hex 0000 - FFFF
{match: /\\(newline|space|tab|formfeed|backspace|return)/}, // special characters
{match: /\\\S/} // any non-whitespace char
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, it's [literal \][non space] (even broader than I was first reading it)... yeah I think we should add relevance: 0.

@joshgoebel joshgoebel merged commit 0f73808 into highlightjs:main Nov 10, 2021
@joshgoebel
Copy link
Member

@MrEbbinghaus Thanks so much for help on this!

@MrEbbinghaus MrEbbinghaus deleted the more-clj-fixes branch November 10, 2021 22:59
@MrEbbinghaus
Copy link
Contributor Author

@joshgoebel Thank you for the clear and fast review. :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants