Word/character counter includes Markdown syntax and HTML tags #3009

xplosionmind · 2020-04-09T14:56:51Z

Joplin version: Joplin 1.0.197 (prod, darwin)
Platform: MacOS 10.15.4 (19E266)

Description

<br> tags and Markdown syntax elements, such as ## before a heading or - [ ] for checkboxes, are included in the word count.

Steps to reproduce

When I open the word counter for this text:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip

<br>
<br>
<br>

ex ea commodo consequat.

- [ ] Duis aute irure dolor in reprehenderit in voluptate velit
- [ ] esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

I get this:

When I remove the syntax and tags:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip


ex ea commodo consequat.

 Duis aute irure dolor in reprehenderit in voluptate velit
 esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

I get a different count:

Describe what you expected to happen

The system should recognize HTML and Markdown syntax and exclude it from character/word counting.


laurent22 · 2020-04-09T21:40:49Z

For this, the note preview feature, and the auto title, we need a method that parses the Markdown and gives back the plain text only.
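A minimal sketch of such a method, assuming a regex-based approach is acceptable for counting purposes. `stripMarkdown` and `countWords` are hypothetical names, not Joplin's API; a real implementation would more likely walk a parsed Markdown token stream, since regexes miss code fences, tables, and nested syntax.

```typescript
// Hypothetical helper, not Joplin's API: a rough, regex-based
// Markdown-to-plain-text stripper. It covers common constructs only and is
// not a full Markdown parser (code fences, tables, and nested syntax are
// handled only partially or not at all).
function stripMarkdown(markdown: string): string {
  return (
    markdown
      // Drop HTML tags such as <br> or <span class="x">, keeping inner text.
      .replace(/<[^>]+>/g, "")
      // Images: keep only the alt text.
      .replace(/!\[([^\]]*)\]\([^)]*\)/g, "$1")
      // Links: keep only the link text.
      .replace(/\[([^\]]*)\]\([^)]*\)/g, "$1")
      // ATX headings: remove the leading #'s.
      .replace(/^[ \t]{0,3}#{1,6}[ \t]+/gm, "")
      // Blockquote markers.
      .replace(/^[ \t]*>[ \t]?/gm, "")
      // Horizontal rules such as --- or ***.
      .replace(/^[ \t]*([-*_])(?:[ \t]*\1){2,}[ \t]*$/gm, "")
      // Task list markers: "- [ ] " or "- [x] ".
      .replace(/^[ \t]*[-*+][ \t]+\[[ xX]\][ \t]+/gm, "")
      // Plain bullet and numbered list markers.
      .replace(/^[ \t]*(?:[-*+]|\d+[.)])[ \t]+/gm, "")
      // Emphasis, strikethrough, and inline-code characters.
      .replace(/[*_~`]/g, "")
  );
}

// Count whitespace-separated words.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

const note = "## Heading\n\n- [ ] first item\n- [x] second<br>";
console.log(countWords(note), countWords(stripMarkdown(note))); // 10 4
```

The raw count of 10 includes "##", "-", "[", "]" and the <br> tag glued to "second", which is the behaviour reported in this issue; the stripped count of 4 covers only the words a reader sees.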

tessus · 2020-04-10T02:55:33Z

I don't see how this is a bug. If I had implemented this, I would have done it the same way. It's a counter for the source text (in the editor), not for the rendered text. Or do you want to count images and references and whatnot too?

If we really want to make this available, we'd need two columns (source, rendered). IMO, replacing one with the other is not OK.
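To make the two-column idea concrete, here is a hedged TypeScript sketch that computes the same statistics twice: once on the source and once on a plain-text version of it. `TextStats`, `computeStats`, and `computeBothColumns` are hypothetical names; the Markdown stripper is passed in as a parameter, and the naive one in the usage example only removes HTML tags and heading markers.

```typescript
// Hypothetical statistics shown in one column of the word-count dialog.
interface TextStats {
  characters: number;
  charactersExcludingSpaces: number;
  words: number;
  lines: number;
}

// Count characters, words, and hard lines in a piece of text.
function computeStats(text: string): TextStats {
  const trimmed = text.trim();
  return {
    characters: text.length,
    charactersExcludingSpaces: text.replace(/\s/g, "").length,
    words: trimmed === "" ? 0 : trimmed.split(/\s+/).length,
    lines: text === "" ? 0 : text.split(/\r\n|\r|\n/).length,
  };
}

// Compute both columns from the same note body. `toPlainText` is whatever
// Markdown stripper is chosen; it is injected so the counting logic does not
// depend on one particular implementation.
function computeBothColumns(
  source: string,
  toPlainText: (markdown: string) => string,
): { source: TextStats; rendered: TextStats } {
  return {
    source: computeStats(source),
    rendered: computeStats(toPlainText(source)),
  };
}

// Usage with a deliberately naive stripper (HTML tags and heading markers only).
const naiveStrip = (markdown: string): string =>
  markdown.replace(/<[^>]+>/g, "").replace(/^#{1,6}[ \t]+/gm, "");

const columns = computeBothColumns("## Title\nHello <br> world", naiveStrip);
console.log(columns.source.words, columns.rendered.words); // 5 3
```

Keeping the source column unchanged and adding the rendered one beside it means neither count replaces the other, which is the concern raised above.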

xplosionmind · 2020-04-10T08:02:51Z

Sorry, I interpreted it as a bug because I thought counting the source text isn't of much use. I think having two columns would be perfect. Thanks a lot for your interest and help, anyway!

tessus · 2020-04-10T08:28:02Z

Laurent has the final say on what is or isn't a bug, so maybe I'm wrong. I'm just saying that a second column would make the most sense.

coderrsid · 2020-04-10T17:38:29Z

Yeah, tessus, I agree. A second column with all the counts for the rendered text would be better. Since the word count is already implemented, whether it runs before or after rendering shouldn't matter, I guess.

RedDocMD · 2020-04-11T16:04:11Z

I am currently working on this enhancement. From what I understand from the comments above, the word counting feature should show the counts for both the source and the stripped-down Markdown side by side.

RedDocMD · 2020-04-11T17:12:36Z

@tessus, @laurent22: There is a library called remove-markdown that does exactly what its name suggests (link to npmjs).
I intend to use it to strip out the Markdown and use the stripped-out text for the additional column.
It seems to be a pretty small library, so is it okay if I use it?

taw00 · 2020-04-21T17:21:15Z

I like the idea of two columns. One for the source text and one for the rendered text.

Personally, I think 9 times out of 10, people want the numbers to reflect the rendered text. But others would prefer them to represent the source (the editor text). For similar reasons, the number of lines is almost meaningless outside the editor context: there, it's the number of hard newlines in the file. With rendered text it can't be calculated, for the same reason the number of pages can't be calculated. But then again, it could count the soft newlines as well. Oi!
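To illustrate the hard vs. soft newline distinction, a hedged TypeScript sketch: `countHardLines` counts newline-separated lines in the source, while `estimateSoftLines` approximates wrapped lines for an assumed fixed editor width measured in characters. Both names are hypothetical, and the fixed-width wrapping is an assumption: real editors wrap at word boundaries and often use proportional fonts, so the soft count is only an estimate that changes whenever the window is resized.

```typescript
// Hard lines: the number of newline-separated lines in the source text.
function countHardLines(text: string): number {
  if (text === "") return 0;
  return text.split(/\r\n|\r|\n/).length;
}

// Soft lines: a rough estimate of visual lines when the editor wraps every
// `columns` characters. Empty hard lines still occupy one visual line.
function estimateSoftLines(text: string, columns: number): number {
  if (columns <= 0) throw new RangeError("columns must be positive");
  if (text === "") return 0;
  return text
    .split(/\r\n|\r|\n/)
    .reduce((total, line) => total + Math.max(1, Math.ceil(line.length / columns)), 0);
}

// One long paragraph: a single hard line that wraps onto several visual lines.
const paragraph = "word ".repeat(40).trim(); // 199 characters
console.log(countHardLines(paragraph), estimateSoftLines(paragraph, 80)); // 1 3
```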

The problem with columns: there should also be a column for selected text (see #160 (comment)). So now you are looking at three, maybe four, columns: Editor, Editor Selected, Rendered, Rendered Selected. Maybe "Rendered Selected" makes no sense.

I suppose you could have a tabbed document statistics window. One tab is for the editor text and one for the rendered document.

On the whole ... these stats are inexact anyway. But they are helpful. I suppose the more Markdown you have, the less sensible the stats become. Unless, of course, they only matter to you in the context of the editor.

… and HTML tags (#3037) * Updated commit * Update package.json * Update package.json Co-authored-by: Laurent Cozic <laurent22@users.noreply.github.com>

xplosionmind added the bug (It's a bug) label on Apr 9, 2020

laurent22 added the high (High priority issues) and enhancement (Feature requests and code enhancements) labels and removed the bug (It's a bug) label on Apr 9, 2020

RedDocMD mentioned this issue Apr 13, 2020

Desktop: Fixes #3009: Word/character counter includes Markdown syntax and HTML tags #3037

Merged

laurent22 closed this as completed in #3037 May 15, 2020
