Contributor

@ggrossetie ggrossetie commented Dec 11, 2020

resolves #2869

Changes

  • Remove obsolete syntax 'emphasis'
  • Support unconstrained emphasis: That's fan__freakin__tastic!
  • Improve constrained and unconstrained syntax highlighting
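
To illustrate the constrained/unconstrained distinction in the list above, here is a minimal sketch. It is not code from this pull request: the two regular expressions and the `findEmphasis` helper are hypothetical, and the patterns in the actual highlight.js grammar may differ. The assumption is that constrained emphasis uses single underscores on word boundaries, while unconstrained emphasis uses double underscores and may sit inside a word.

```javascript
// Hypothetical patterns for illustration only; not the highlight.js grammar.

// Unconstrained emphasis: double underscores, allowed inside a word.
const UNCONSTRAINED_EM = /__(.+?)__/g;

// Constrained emphasis: single underscores that sit on word boundaries.
// The content may not start or end with whitespace or an underscore.
const CONSTRAINED_EM = /(?<!\w)_([^\s_](?:.*?[^\s_])?)_(?!\w)/g;

// Returns the emphasized phrases found by each pattern.
function findEmphasis(text) {
  return {
    unconstrained: [...text.matchAll(UNCONSTRAINED_EM)].map((m) => m[1]),
    constrained: [...text.matchAll(CONSTRAINED_EM)].map((m) => m[1]),
  };
}

console.log(findEmphasis("That's fan__freakin__tastic!"));
console.log(findEmphasis("an _emphasized_ word"));
```

Because `\w` includes the underscore, the lookarounds in the constrained pattern keep it from firing on the double-underscore form, so the two patterns do not overlap on these inputs.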

Checklist

  • Added markup tests, or they don't apply here because...
  • Updated the changelog at CHANGES.md
  • Added myself to AUTHORS.txt, under Contributors

using an explicit link:http://example.com[link prefix].

* single quotes around a phrase place 'emphasis'
* single quotes around a phrase does not place 'emphasis'
Contributor Author

This syntax is obsolete and unsupported in Asciidoctor.
Since the Asciidoctor project now maintains the official definition of the AsciiDoc syntax I think we should drop it.

Member

@joshgoebel joshgoebel Dec 11, 2020

This could all be my fault for not bringing this up sooner, but you do realize AsciiDoc and AsciiDoctor are two DIFFERENT things?

https://asciidoctor.org/docs/asciidoc-asciidoctor-diffs/

I think we may have made a misstep here. Our existing grammar is named and described as:

Language: AsciiDoc
Website: http://asciidoc.org

Does this change everything?

Contributor Author

AsciiDoc Python and Asciidoctor are both implementations of the AsciiDoc syntax.
Historically, http://asciidoc.org was managed by the AsciiDoc Python project, but AsciiDoc is now a "trademark" registered by Eclipse on behalf of the AsciiDoc WG.
As a result, http://asciidoc.org will be updated to clarify the situation.

AsciiDoc Python is now deprecated, as you can read on http://asciidoc.org:

Asciidoctor provides a modern, compliant, and substantially faster implementation of the AsciiDoc processor written in Ruby. This implementation can also be run on the JVM (with AsciidoctorJ) or using JavaScript (with Asciidoctor.js). The Asciidoctor project now maintains the official definition of the AsciiDoc syntax.

I think this is confusing because AsciiDoc (Python) is named AsciiDoc, but AsciiDoc is a syntax, not a specific processor.

Likewise, Asciidoctor is not a syntax, it's a processor. Having said that, and until the AsciiDoc WG produces a specification, it's where the official definition of the AsciiDoc syntax is maintained.

Does this change everything?

No, we are all good!
We are implementing a syntax highlighter for AsciiDoc (the syntax).

Contributor

As the lead of the Asciidoctor implementation and the chair of the AsciiDoc Working Group, I back up this statement by @Mogztter. The language is AsciiDoc and the asciidoc.org website is currently in the process of being transitioned to reflect that fact. The content that currently lives there is being phased out (and moved).

Member

@joshgoebel joshgoebel Dec 18, 2020

@mojavelinux So you're the lead on the Ruby implementation and also an AsciiDoc Python maintainer? :)

Contributor

@joshgoebel That's correct. For AsciiDoc.py, I'm really more than just a maintainer. I'm also the co-administrator with Lex, a duty we took on when Stuart left the project.

Member

Cool. I'm not familiar with the term "co-administrator", but I think I get the gist. Here I typically say I'm the current "maintainer" (historically, maintainers change every few years due to churn) and that we have other core team members. I don't think we have any co-administrators. :)

Contributor

I think you understood my point.


- escape characters are supported
- you can escape a quote inside emphasized text like 'here\'s johnny!'
- you can escape underscores like \_surrounded by underscores!_
Contributor Author

ditto

@joshgoebel
Member

joshgoebel commented Dec 15, 2020

I guess the part that concerned me was the intro (emphasis mine):

Asciidoctor aims to be compliant with the AsciiDoc syntax, but there are some differences to keep in mind. Many of these differences exist so that Asciidoctor can:

Additionally, Asciidoctor offers new syntax, attributes, and features to help you write, style, and publish your documents.

The known cases where Asciidoctor differs from AsciiDoc are categorized and listed in the sections below.

So it seems we need to decide if we're supporting the "AsciiDoc syntax" referred to here or the Asciidoctor "new syntax" plus "differences". The intro and list make it clear there are behavioral differences, no?

Perhaps none of the differences matter for our uses, but when you start talking about "obsolete" syntax I wonder if you're talking about some of these differences... and if so that means we're forced to decide which version of asciidoc we intend to support.

@ggrossetie
Contributor Author

So it seems we need to decide if we're supporting the "AsciiDoc syntax" referred to here or the Asciidoctor "new syntax" plus "differences". The intro and list make it clear there are behavioral differences, no?

Simply put, Asciidoctor is the current version of the AsciiDoc language and AsciiDoc Python is the "previous version". I believe that Highlight.js supports the latest syntax available for a given language? For instance, the syntax highlighting should recognize Java 15 new language features, right?

Perhaps none of the differences matter for our uses, but when you start talking about "obsolete" syntax I wonder if you're talking about some of these differences... and if so that means we're forced to decide which version of asciidoc we intend to support.

I was referring to the single quote character to emphasize text.
AsciiDoc Python actually supports both notations:

Emphasized text
Word phrases 'enclosed in single quote characters' (acute accents) or _underline characters_ are emphasized.

Asciidoctor has a compat-mode to recognize this syntax so we might be able to support both notations (even if single quote character is deprecated).
I will give it a try.
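
For illustration only (this is not code from the pull request): a minimal sketch of how a single pattern could accept both emphasis notations without any mode switch. The regular expression and the `emphasizedPhrases` helper are hypothetical names, and the pattern assumes that both markers must sit on word boundaries so that apostrophes inside words are left alone.

```javascript
// Hypothetical sketch; the highlight.js grammar may handle this differently.

// Matches a phrase wrapped in a pair of identical markers (' or _).
// The marker may not be adjacent to a word character on the outside,
// and the phrase may not start or end with whitespace or a marker.
const EMPHASIS_EITHER = /(?<!\w)(['_])([^\s'_](?:.*?[^\s'_])?)\1(?!\w)/g;

// Returns each emphasized phrase together with the marker that wrapped it.
function emphasizedPhrases(text) {
  return [...text.matchAll(EMPHASIS_EITHER)].map((m) => ({
    marker: m[1],
    phrase: m[2],
  }));
}

console.log(
  emphasizedPhrases(
    "Word phrases 'enclosed in single quote characters' or _underline characters_ are emphasized."
  )
);
```

The backreference `\1` requires the closing marker to match the opening one, so a phrase opened with a single quote cannot be closed by an underscore.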

@joshgoebel
Member

This is helpful perhaps:

https://github.com/asciidoctor/asciidoctor.org/blob/master/docs/_includes/migrating-from-asciidoc-python.adoc

Simply put, Asciidoctor is the current version of the AsciiDoc language and AsciiDoc Python is the "previous version".

Is that what all the Python people think? IE, is this the universal agreement? Is the Python version dead and no longer maintained and literally being replaced by the AsciiDoctor Ruby project? Their GitHub seems to say the opposite.

I'm not trying to be difficult at all, just make sure we understand the bigger picture fully.

I believe that Highlight.js supports the latest syntax available for a given language? For instance, the syntax highlighting should recognize Java 15 new language features, right?

Generally. We have typically been "slow" to drop old keywords and such things as languages develop, so that we continue to highlight "older" code well.

I'm not sure that's what is happening here though. To me (at a glance from the sidelines) this feels perhaps more like a sorta fork... Perl 6 vs Perl 5, etc... if some people continue to use the "old" AsciiDoc and there is even a "compat" mode (for AsciiDoctor) then we definitely have to stop and ask the question of how that impacts us and whether we're "breaking" anything with these changes. (such as when you suggest removing "obsolete" things)

If we supported "compat" mode before and now (let's say) we're choosing to drop it (which might even be the right call) - that's a change we need to be fully aware of - and making very intentionally. If there are incompatibilities (and both versions plan to co-exist in the world) I could see someone arguing we need both an asciidoc and asciidoctor. This I would like to avoid if at all possible.

Asciidoctor has a compat-mode to recognize this syntax so we might be able to support both notations (even if single quote character is deprecated).

I think it would be optimal if we just supported ALL the syntax (old and new) if that was not too hard to do. Optimally if someone throws either AsciiDoc OR AsciiDoctor (ie, compat vs new) code at us we should do a "nice" job of highlighting it with our singular "asciidoc" grammar.

If possible the grammar should probably be renamed to AsciiDoctor or AsciiDoc / AsciiDoctor.

@joshgoebel
Member

joshgoebel commented Dec 17, 2020

Asciidoctor has a compat-mode to recognize this syntax so we might be able to support both notations

To be clear this shouldn't mean detecting a :compat-mode!: tag and then doing some crazy hacks to "switch" into compatibility mode (with sublanguages or something super complex)... preferably we simply want to recognize as much valid markup as possible and highlight it as best as we can - regardless of new vs compat. And if that's truly not possible and we have real conflicts (between compat and not) we'd need to look at that very carefully.

I think given the hopeful goal of a single grammar that "adding new features" is a reasonable thing to do just so long as we aren't "breaking old/obsolete features"... I'm not yet as convinced as you are that AsciiDoctor (Ruby) is the v2 and Asciidoc (Python) is the v1. @Mogztter Do you have any idea of the actual politics and interplay between the Python and Ruby projects (which both seem to be in active development) - and how they think of the format? I feel like your prior explanation was perhaps over-simplifying? The Python project refers to AsciiDoc plenty yet doesn't mention AsciiDoctor a single time.

I will circle back to this in the next few days and try to review this more thoroughly (and the original PR) and see if it feels like we're still on an "ok" path... @Mogztter I recognize the effort you've put into this... so don't worry about us throwing it away or anything (that would be rude), but I think we're just asking if perhaps we really might need asciidoc vs asciidoctor as two discrete things... in which case we'd revert the prior changes and then likely add all your work as asciidoctor... I'd like to avoid that, but if we have two diverging formats it may get messy.

@allejo @egor-rogov Any thoughts here or knowledge of the AsciiDoc(tor) community?

@ggrossetie
Contributor Author

ggrossetie commented Dec 17, 2020

Is that what all the Python people think? IE, is this the universal agreement?

I think the "universal agreement" is that the Working Group will drive the standardization of the AsciiDoc language.
Anyone is welcome to join this effort and we certainly reached out to the AsciiDoc Python community (among other communities):

Is the Python version dead and no longer maintained and literally being replaced by the AsciiDoctor Ruby project?

I won't say that the Python implementation is being replaced by Asciidoctor. Currently, the Asciidoctor project maintains the official definition of the AsciiDoc syntax, but ultimately it's the Working Group's job to define what AsciiDoc is and establish a language specification.

Once the specification is established, both AsciiDoc Python and Asciidoctor will have to comply with the specification and pass the technology compatibility kit (TCK).
In other words, both can coexist and as part of the Working Group we will also work on a new Java/JVM implementation.

Their GitHub seems to say the opposite.

asciidoc/asciidoc states: "This repository (asciidoc-py2) is no longer supported as python 2 has entered end-of-life." but some community members are working on https://github.com/asciidoc/asciidoc-py3.

I'm not trying to be difficult at all, just make sure we understand the bigger picture fully.

No worries, I think it's beneficial and it emphasizes the importance of the AsciiDoc Working Group.

Generally. We have typically been "slow" to drop old keywords and such things as languages develop, so that we continue to highlight "older" code well.

Fair enough.

I'm not sure that's what is happening here though. To me (at a glance from the sidelines) this feels perhaps more like a sorta fork... Perl 6 vs Perl 5, etc... if some people continue to use the "old" AsciiDoc and there is even a "compat" mode (for AsciiDoctor) then we definitely have to stop and ask the question of how that impacts us and whether we're "breaking" anything with these changes. (such as when you suggest removing "obsolete" things)

It's definitely not a fork but I agree we should not get ahead of the AsciiDoc specification. Today, AsciiDoc and Asciidoctor (with "compat" mode) support single quote to emphasize text so I think we should keep it.

If we supported "compat" mode before and now (let's say) we're choosing to drop it (which might even be the right call) - that's a change we need to be fully aware of - and making very intentionally.

No, I think it was a mistake, we should keep it.

If there are incompatibilities (and both versions plan to co-exist in the world) I could see someone arguing we need both an asciidoc and asciidoctor. This I would like to avoid if at all possible.

Asciidoctor is not a language, we only need a syntax highlighter for the AsciiDoc language.

Asciidoctor has a compat-mode to recognize this syntax so we might be able to support both notations (even if single quote character is deprecated).

Yes 👍

I think it would be optimal if we just supported ALL the syntax (old and new) if that was not too hard to do. Optimally if someone throws either AsciiDoc OR AsciiDoctor (ie, compat vs new) code at us we should do a "nice" job of highlighting it with our singular "asciidoc" grammar.

From a syntax highlighter point of view, I think it's doable.

If possible the grammar should probably be renamed to AsciiDoctor or AsciiDoc / AsciiDoctor.

No, the grammar is AsciiDoc and should be named AsciiDoc.
Again, Asciidoctor is just one implementation.
For instance, Ruby has several implementations: JRuby (Java), IronRuby (.NET), Rubinius (Ruby)... but they all implement the same language, Ruby.
It's the same with AsciiDoc, one language with several implementations.

@ggrossetie
Contributor Author

To be clear this shouldn't mean detecting a :compat-mode!: tag and then doing some crazy hacks to "switch" into compatibility mode (with sublanguages or something super complex)...

Agreed and in any case it's not even possible to do so 😃

preferably we simply want to recognize as much valid markup as possible and highlight it as best as we can - regardless of new vs compat. And if that's truly not possible and we have real conflicts (between compat and not) we'd need to look at that very carefully.

Yes 💯

I think given the hopeful goal of a single grammar that "adding new features" is a reasonable thing to do just so long as we aren't "breaking old/obsolete features"... I'm not yet as convinced as you are that AsciiDoctor (Ruby) is the v2 and Asciidoc (Python) is the v1. @Mogztter Do you have any idea of the actual politics and interplay between the Python and Ruby projects (which both seem to be in active development) - and how they think of the format? I feel like your prior explanation was perhaps over-simplifying?

Yes, it was oversimplified. I think the common agreement is that the Asciidoctor project now maintains the official definition of the AsciiDoc syntax (as stated on https://asciidoc.org).
Please keep in mind that there's an ongoing process to hand over this responsibility to the AsciiDoc Working Group (a vendor-neutral space where the specification will be established).

To be honest, I don't think there's a huge gap (if any) between AsciiDoc Python and Asciidoctor with "compat" mode.
The Asciidoctor community was very careful/conservative regarding language changes. That's one of the reasons why the Working Group was initiated: to allow us to move forward with the language. Otherwise, any language change would have been perceived as a one-sided decision from the Asciidoctor project.

The Python project refers to AsciiDoc plenty yet doesn't mention AsciiDoctor a single time.

Again, AsciiDoc is the name of the language and Asciidoctor is the name of another implementation, so it's not really a surprise.

I will circle back to this in the next few days and try to review this more thoroughly (and the original PR) and see if it feels like we're still on an "ok" path... @Mogztter I recognize the effort you've put into this... so don't worry about us throwing it away or anything (that would be rude),

No worries, it's a good thing to be thorough and dropping this syntax was a mistake (or at the very least overzealous).
I will reinstate the "single quote to emphasize text" feature.

but I think we're just asking if perhaps we really might need asciidoc vs asciidoctor as two discrete things... in which case we'd revert the prior changes and then likely add all your work as asciidoctor... I'd like to avoid that, but if we have two diverging formats it may get messy.

asciidoctor as a syntax/language does not make any sense, so we should definitely stay with a single language asciidoc.

@mojavelinux
Contributor

mojavelinux commented Dec 17, 2020

I think the common agreement is that the Asciidoctor project now maintains the official definition of the AsciiDoc syntax (as stated on asciidoc.org).

This is correct and has been verified and approved by the AsciiDoc Working Group. There are no outstanding objections (legal or otherwise), even from the maintainers of the AsciiDoc.py project (of which I am one). It is Asciidoctor from which the initial contribution for the AsciiDoc language specification at Eclipse has been submitted (https://github.com/asciidoctor/asciidoc-docs/). That further confirms that it is Asciidoctor that is currently maintaining the definition of AsciiDoc (until the specification is drafted and ratified). Some of the statements found on asciidoctor.org are very outdated and make statements that predate the current situation. We are working on getting these lingering statements removed / revised.

@joshgoebel
Member

joshgoebel commented Dec 18, 2020

Asciidoctor is not a language, we only need a syntax highlighter for the AsciiDoc language.

Yes, I'm pretty sure I got it now. It's a bit confusing when the website talks about ~"syntax differences between AsciiDoc and AsciiDoctor"... and my two choices to install with Homebrew are asciidoc and asciidoctor. :) Easy to get into the trap of talking about them using the implementation names.

But if the goal is for the working group to define the blessed spec and then everyone catches up then it seems clear our AsciiDoc grammar should try and maintain pace with the "official spec". The problems will come if we run into conflicts between the "compat" mode and the newer spec. At that point we might need to make a choice to only support the newer syntax or to split into asciidoc and asciidoc_compat, although I think it's more likely we'd just drop the compat stuff... (or allow someone to maintain it as a 3rd party grammar)

But hopefully we won't come to that bridge for a bit.

So would referring to the two as "AsciiDoc" and "AsciiDoc (compat mode)" be an improvement? If not, what wording should I use?


So given this understanding I don't think we removed anything in the first PR, so we should be good and I'll try to find time in the next few days to review this. If this "deprecated" any compat syntax obviously we should put that back in if at all possible and if not look at the issues.

No, I think it was a mistake, we should keep it.

I don't think you'd removed anything yet, had you?

@joshgoebel
Member

I don't think you'd removed anything yet, had you?

Yes, looks like we did. So we'd need to revert those changes - to still allow the compat syntax.

@joshgoebel joshgoebel added this to the 10.5 milestone Dec 18, 2020
@ggrossetie
Contributor Author

Yes, looks like we did. So we'd need to revert those changes - to still allow the compat syntax.

I reverted this change.

- Remove obsolete syntax 'emphasis'
- Support unconstrained emphasis: That's fan__freakin__tastic!
- Improve constrained and unconstrained syntax highlighting
@ggrossetie ggrossetie force-pushed the issue-2869-followup-emphasis-mono branch from 233f6fb to 9ef1cc8 Compare December 18, 2020 12:29
Member

@joshgoebel joshgoebel left a comment

Looks good other than the one comment!

@mojavelinux
Contributor

But if the goal is for the working group to define the blessed spec and then everyone catches up then it seems clear our AsciiDoc grammar should try and maintain pace with the "official spec".

This ☝️

The problems will come if we run into conflicts between the "compat" mode and the newer spec.

At this point, I would just forget about the compat mode. It was a transitional mode that we added to help people migrate when we updated the syntax in 2014, and hardly any documents remain that use it. It's behind us.

So would referring to the two as "AsciiDoc" and "AsciiDoc (compat mode)" be an improvement? If not, what wording should I use?

Just use AsciiDoc. I think you are reading way too much into this. highlight.js shouldn't be the one determining what AsciiDoc is.

@joshgoebel
Member

joshgoebel commented Dec 18, 2020

I think you are reading way too much into this.

Perhaps. But I was only doing due diligence. :) If compat mode were still super popular or this change had happened 2 months ago (vs 6 years ago) then those would all be very relevant factors to the choices we should make regarding the grammar.

Your showing up and clarifying the history is certainly helpful. Thanks again.

highlight.js shouldn't be the one determining what AsciiDoc is.

Well, not in the "working group official AsciiDoc standard" sense no - and that's not what I was doing; but so long as the grammar is first-party we are forced to decide when we will adopt new things, deprecate old things, etc... or whether we might adopt things that seem likely but that the working group hasn't fully signed off on yet (I'd think not), etc... and we of course have certain "library wide" ideas of how we handle certain things - though that's probably less of an issue with markup languages (which tend to have a lot more semantic consistency) than other programming languages.

It's also easy to publish a 3rd party grammar and bypass our thinking entirely on such matters. :) IE, the AsciiDoc working group could publish their own 3rd party grammar module and do anything they wanted with it. :-)

I'm not sure that'll be necessary here, but it's always an option.


At this point, I would just forget about the compat mode. It was a transitional mode that we added to help people migrate when we updated the syntax in 2014, and hardly any documents remain that use it. It's behind us.

Well that is certainly very helpful knowledge. 🙂 I don't think I realized this was all "so long ago in a galaxy far far away". I think I might agree with you now that we can just ignore "compat" if it dates back to 2014 and "hardly any documents remain that use it".

@Mogztter Thoughts? I wouldn't have made such a big deal about it if I had known how long ago all this had transpired (and that such documents are super rare today). Sounds like you could/should just git revert your last commit.

@joshgoebel
Member

joshgoebel commented Dec 18, 2020

Quoting myself: ...or whether we might adopt things that seem likely but that the working group hasn't fully signed off on yet (I'd think not)...

Off the top of my head I would presume we should be willing to add things as they are approved (not proposed). How we might handle removals from the spec that happen in 2021 (vs 2014) might have to be decided on a case-by-case basis. I assume though (for everyone's sanity) that the spec will be semi-stable over time. :)

@mojavelinux One other consideration: we are often used to highlight a lot of static/historic content... someone blogs about XYZLang v4 code on their blog and we highlight it nicely. When XYZLang v5 comes out (even if XYZLang v4 is deprecated, no longer maintained, etc.) we are still somewhat cognizant of the fact that the XYZLang v4 code might exist on that blog forever and we are still being trusted to highlight it properly. (StackOverflow being a great example of this)

In some ways it's an impossible problem and certainly (for everyone's sanity) we often have to break the highlighting of XYZLang v4 at some point, but we try not to be in a hurry to do so - and when doing so we try to understand the context fully so we can know who we are breaking things for (if anyone).

@ggrossetie
Contributor Author

Thoughts? I wouldn't have made such a big deal about it if I had known how long ago all this had transpired (and that such documents are super rare today). Sounds like you could/should just git revert your last commit.

I don't mind keeping this syntax around. As you mentioned, Highlight.js is conservative and as long as it does not significantly increase complexity or induce side effects I think we can keep it.

In any case, if we decide to remove it, I think we should do it in a dedicated pull request.

@joshgoebel
Member

As you mentioned, Highlight.js is conservative and as long as it does not significantly increase complexity or induce side effects I think we can keep it.

I have a feeling eventually someone will say "hey this shouldn't be highlighted", but I'm ok waiting for that day.

In any case, if we decide to remove it, I think we should do it in a dedicated pull request.

I can live with that, even if it's only because you're just tired of going back and forth on this LOL.

Member

@joshgoebel joshgoebel left a comment

Changelog entry please?

@joshgoebel
Member

Just need that changelog entry, then this will ship in 10.5. :)

@joshgoebel joshgoebel merged commit 04f3904 into highlightjs:master Dec 22, 2020
@joshgoebel
Member

@Mogztter Thanks much!

Development

Successfully merging this pull request may close these issues.

(asciidoc) Unconstrained text formatting is not correctly highlighted