Word/character counter includes Markdown syntax and HTML tags #3009

Closed
xplosionmind opened this issue Apr 9, 2020 · 8 comments · Fixed by #3037
Labels
enhancement Feature requests and code enhancements high High priority issues

Comments

@xplosionmind

Joplin version: Joplin 1.0.197 (prod, darwin)
Platform: MacOS 10.15.4 (19E266)

Description

<br> tags and Markdown syntax elements, such as ## before a heading or - [ ] for checkboxes, are included in the word count.

Steps to reproduce

When I open the word counter for this text:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip

<br>
<br>
<br>

ex ea commodo consequat.

- [ ] Duis aute irure dolor in reprehenderit in voluptate velit
- [ ] esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

I get this:
[Screenshot 2020-04-09 at 4.42.05 PM]

When I remove the syntax and tags:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip


ex ea commodo consequat.

 Duis aute irure dolor in reprehenderit in voluptate velit
 esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

I get a different count:
[Screenshot 2020-04-09 at 4.42.37 PM]

Describe what you expected to happen

The system should recognize HTML and Markdown syntax and exclude it from character/word counting.

@xplosionmind xplosionmind added the bug It's a bug label Apr 9, 2020
@laurent22
Owner

For this, the note preview feature, and the auto title, we need a method that parses the Markdown and gives back the plain text only.
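A method like the one described above could, as a rough sketch, look like the following. It assumes a simple regex-based approach and uses hypothetical names (`markdownToPlainText`, `countWords`) that are not part of Joplin; a real implementation would more likely walk the token tree produced by a Markdown parser, since regexes miss edge cases such as code blocks or nested syntax.

```typescript
// Hypothetical regex-based sketch: strip common Markdown and HTML syntax so
// that only the readable text is left for counting. Not Joplin's actual code.
function markdownToPlainText(markdown: string): string {
  return (
    markdown
      // Replace HTML tags such as <br> or <span class="x"> with a space.
      .replace(/<[^>]+>/g, " ")
      // Drop images entirely: ![alt](url)
      .replace(/!\[[^\]]*\]\([^)]*\)/g, " ")
      // Keep only the visible text of links: [text](url) -> text
      .replace(/\[([^\]]*)\]\([^)]*\)/g, "$1")
      // Remove heading markers at the start of a line: "## Title" -> "Title"
      .replace(/^[ \t]{0,3}#{1,6}[ \t]+/gm, "")
      // Remove checkbox markers: "- [ ] item" or "- [x] item" -> "item"
      .replace(/^[ \t]*[-*+][ \t]+\[[ xX]\][ \t]+/gm, "")
      // Remove ordinary bullet and numbered list markers.
      .replace(/^[ \t]*(?:[-*+]|\d+[.)])[ \t]+/gm, "")
      // Remove blockquote markers.
      .replace(/^[ \t]*>[ \t]?/gm, "")
      // Remove emphasis, strikethrough and inline-code markers.
      .replace(/[*_`~]+/g, "")
  );
}

// Count whitespace-separated tokens; empty input yields 0.
function countWords(text: string): number {
  return text.split(/\s+/).filter((w) => w.length > 0).length;
}
```

With this sketch, the text `## Title`, followed by a line `- [ ] Buy milk<br>now`, counts as 4 words after stripping, versus 7 whitespace-separated tokens in the source.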

@laurent22 laurent22 added high High priority issues enhancement Feature requests and code enhancements and removed bug It's a bug labels Apr 9, 2020
@tessus
Collaborator

tessus commented Apr 10, 2020

I don't see how this is a bug. If I had implemented this, I would have done it the same way. It's a counter for the source text (in the editor), not for the rendered text. Or do you want to count images and references and whatnot too?

If we really want to make this available, we'd need two columns (source, rendered). IMO, replacing one with the other is not OK.
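The two-column idea can be sketched as one stats function applied to both the source and its plain-text version. The `TextStats` shape and the `documentStats` name are hypothetical; the Markdown-to-text conversion is passed in as a parameter so the counting code does not depend on any particular stripping approach.

```typescript
// Hypothetical shape of one column in a two-column statistics dialog.
interface TextStats {
  characters: number;
  charactersExcludingSpaces: number;
  words: number;
  lines: number;
}

function computeStats(text: string): TextStats {
  return {
    // text.length counts UTF-16 code units, so some emoji count as 2.
    characters: text.length,
    charactersExcludingSpaces: text.replace(/\s/g, "").length,
    words: text.split(/\s+/).filter((w) => w.length > 0).length,
    // Hard line breaks only; an empty note counts as 0 lines.
    lines: text.length === 0 ? 0 : text.split(/\r?\n/).length,
  };
}

// One column for the editor (source) text, one for the rendered (plain) text.
function documentStats(
  source: string,
  toPlainText: (markdown: string) => string,
): { source: TextStats; rendered: TextStats } {
  return {
    source: computeStats(source),
    rendered: computeStats(toPlainText(source)),
  };
}
```

Passing a converter that only drops `<br>` tags, the text `ex ea commodo` followed by two `<br>` lines reports 5 words in the source column and 3 in the rendered one, while neither column replaces the other.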

@xplosionmind
Author

Sorry, I interpreted it as a bug because I thought counting the source text wasn't of much use. I think having two columns would be perfect. Thanks a lot for your interest and help, anyway!

@tessus
Collaborator

tessus commented Apr 10, 2020

Laurent has the last say in what is a bug or not, so maybe I'm wrong. I'm just saying that a second column would make the most sense.

@coderrsid
Contributor

Yeah, tessus, I agree. A second column with all the counts for the rendered text would be better: the counting logic is the same whether it runs before or after rendering, so it shouldn't matter which text it is given, I guess.

@RedDocMD
Contributor

I am currently working on this enhancement. As I understand the comments above, the word counting feature should show, side by side, the counts for both the source and the stripped-down Markdown.

@RedDocMD
Contributor

@tessus, @laurent22 There is a library called remove-markdown which does exactly what its name suggests (available on npm).
I intend to use it to strip out the Markdown and produce the plain text for the additional column.
It seems to be a pretty small library, so is it okay if I use it?

@taw00
Contributor

taw00 commented Apr 21, 2020

I like the idea of two columns. One for the source text and one for the rendered text.

Personally, I think 9 times out of 10, people want the numbers to reflect the rendered text. But others would prefer them to represent the source (the editor text). For similar reasons, the number of lines is almost meaningless except in the editor context: it's the number of hard newlines in the file. With rendered text, it's not calculable, for the same reasons the number of pages is incalculable. But then again, it could count the soft newlines as well. Oi!

The problem with columns: there should also be a column for selected text (see #160 (comment)). So now you are looking at three, maybe four columns: Editor, Editor Selected, Rendered, Rendered Selected. Maybe "Rendered Selected" makes no sense.
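Of these, the editor-side columns are the easy part, since a selection maps directly onto a slice of the source text. A minimal sketch, assuming (hypothetically) that the editor reports the selection as `anchor`/`head` character offsets; `EditorSelection` and `editorColumns` are illustrative names, not Joplin APIs.

```typescript
// Hypothetical selection shape: anchor may come after head when the user
// selects backwards.
interface EditorSelection {
  anchor: number;
  head: number;
}

function countWords(text: string): number {
  return text.split(/\s+/).filter((w) => w.length > 0).length;
}

// Word counts for the "Editor" and "Editor Selected" columns.
function editorColumns(
  source: string,
  selection: EditorSelection | null,
): { editor: number; editorSelected: number } {
  // No selection (or an empty one) yields a selected count of 0.
  const selected =
    selection === null
      ? ""
      : source.slice(
          Math.min(selection.anchor, selection.head),
          Math.max(selection.anchor, selection.head),
        );
  return { editor: countWords(source), editorSelected: countWords(selected) };
}
```

A "Rendered Selected" column would additionally need a mapping from positions in the rendered view back to source offsets, which this sketch does not attempt.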

I suppose you could have a tabbed document statistics window. One tab is for the editor text and one for the rendered document.

On the whole ... these stats are inexact anyway. But they are helpful. I suppose the more Markdown you have, the less sensible the stats become. Unless, of course, they only matter to you in the context of the editor.

laurent22 added a commit that referenced this issue May 15, 2020
… and HTML tags (#3037)

* Updated commit

* Update package.json

* Update package.json

Co-authored-by: Laurent Cozic <laurent22@users.noreply.github.com>