-
-
Notifications
You must be signed in to change notification settings - Fork 3.5k
Txt2tags writer #6551
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Txt2tags writer #6551
Conversation
adding txt2tags writer
src/Text/Pandoc/Writers.hs
Outdated
@@ -67,6 +67,7 @@ module Text.Pandoc.Writers | |||
, writeTEI | |||
, writeTexinfo | |||
, writeTextile | |||
, writeTxt2Tags |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should line up with the other entries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that's strange because I got some compilation warnings into other places, which I fixed, and I believed it was clean when I've submitted the PR. I'll check this soon.
Thank you! This seems a good start, but it still needs some work before it's merge-ready. At the very least, we need automated tests. The recommended (and easiest) approach would be to add to Another thing you might try is doing round-trip tests with the t2t reader. This will help identify bugs and missing things. |
added test case
Thanks @farvardin, I think it would be great if we could get this moving again. I see you already added an "old-style" golden file test, but no code to use it. Are you still interested in advancing this? Please let us know how we can help. I had quick read through the writer source, I think it looks good. Would you be interested in an in-depth review, or do you rather just want to get it merged soon? |
@tarleb if you could help me how to improve it, it would be great. I don't really understand the test-thing. |
Gladly. You can enable the tests by adding the appropriate commands to It would also be great if you could take a look at the doclayout package. It is used in plain text writers to avoid overlong lines. If I understand correctly, then txt2tags supports breaking long paragraphs into shorter lines by inserting newline characters. This is what doclayout is designed to do, see also pandoc's LMK if you run into problems. |
@tarleb Thank you, I understand better the test process now. I've added the txt2tags writer to the file and it seems to work now. About the Lint error the missing file is in my pull request (I've never used lint or Haskell before and I'm not really a developper) and therefore I don't know why it's complaining. I've had a look at doclayout but even with the example I can't figure out what I could do with this, sorry. I've seen all the writers are importing it, but I don't even find where they really use it in the code. |
The linting issue is caused by a missing entry in pandoc.cabal. All test files must be listed under The |
@tarleb thanks again for the explanation. I should have figured it out myself by looking for the reference to "writer.rtf" for example, since it can only be found in pandoc.cabal. lint is no longer complaining, so I suppose it could be merged officially! I've just converted an epub file into txt2tags, using the command "pandoc -f epub -t txt2tags doc.epub > doc.t2t" and it works fine! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are good news, thanks! I went through the code and it looks good; I made inline comments whenever I was unsure about something.
I'm not sure if @jgm considers it essential, but it would be best if we could switch the result type of blockToTxt2Tags
from Text
to Doc Text
so pandoc's --columns
option would work better with this writer. Do you have time to give it a try?
src/Text/Pandoc/Writers/Txt2Tags.hs
Outdated
"javascript", "latex", "lisp", "lua", "matlab", "mirc", "mpasm", "mysql", "nsis", "objc", | ||
"ocaml", "ocaml-brief", "oobas", "oracle8", "pascal", "perl", "php", "php-brief", "plsql", | ||
"python", "qbasic", "rails", "reg", "robots", "ruby", "sas", "scheme", "sdlbasic", | ||
"smalltalk", "smarty", "sql", "tcl", "", "thinbasic", "tsql", "vb", "vbnet", "vhdl", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The empty string in this list strikes me as weird. Is it intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I'm pretty sure I've just copied this part from the DokuWiki.hs without modification. I'm not sure how this part should work. The new code in DokuWiki is quite different now:
I'll remove the empty string, though I'm sure most of this part could be removed. But if I change it myself, I might break it!
|
||
inlineToTxt2Tags _ il@(RawInline f str) | ||
| f == Format "Txt2Tags" = return str | ||
| f == Format "html" = return $ "<html>" <> str <> "</html>" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most other <html>
tags are uppercase, but here they are lower case. Is there a difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose there is no difference. Again, this part is straight from the DokuWiki.hs file:
Those are good news, thanks! I went through the code and it looks good; I made inline comments whenever I was unsure about something.
I'm not sure if @jgm considers it essential, but it would be best if we could switch the result type of
blockToTxt2Tags
fromText
toDoc Text
so pandoc's--columns
option would work better with this writer. Do you have time to give it a try?
I don't know how to do this, sorry. I've just hacked the existing dokuwiki code so it's outputting valid txt2tags markup (most of the difference is with the headings)
case txt of | ||
[Str s] | "mailto:" `T.isPrefixOf` src -> return $ "<" <> s <> ">" | ||
| escapeURI s == src -> return src | ||
_ -> if isURI src |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this if
statement necessary? I'm not aware of a string with a leading /
that would be recognized as valid URI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
, stUseTags = False | ||
, stBackSlashLB = False } | ||
|
||
type Txt2Tags m = ReaderT WriterEnvironment (StateT WriterState m) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we don't need the WriterState
, so we could shorten this to ReaderT WriterEnvironment m
. The function runTxt2Tags
would then become runTxt2Tags = flip runReaderT def
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the main stumbling block for us will be maintenance. We have to ensure that this will be maintained after it's merged. Not sure how to proceed there, I'm not too keen to up to take up an additional responsibility. @jgm, what do you think?
the txt2tags language is not evolving, for 2 reasons: it's stable for many years, and also it's possible to add custom markup with some preproc and postprocessors, so the core doesn't need changing. So the only maintenance would be to adapt the writer code in the case pandoc change something in its internal code. But it would require the same maintenance as for the dokuwiki writer.
Oh, I forgot: I think we should also add a |
done |
I think the main stumbling block for us will be maintenance. We have to ensure that this will be maintained after it's merged. Not sure how to proceed there, I'm not too keen to up to take up an additional responsibility. @jgm, what do you think? |
Have you considered putting this through its paces with round trip tests? Pandoc doesn't generally offer a round-tripping guarantee, so we usually don't put things like this in the test suite, but doing this is a good way to identify bugs. I'd also echo @tarleb's comment that doclayout should be used. It's not used in some of the older writers, and in formats where newlines cause paragraph breaks, but I'd rather have it supported in new writers. This is a matter of
Anyway, it's good to know that you don't need to know Haskell to contribute to pandoc. |
I've done it, and it's rather faithful (I've written in the header of the writer which parts are not correctly working, but it's not very important, like definition lists) while testing your command, I've noticed the writer is using the "txt2tags" tag, while the reader is using "t2t". I don't know which one is better.
about doclayout it's totally beyond my skills, I'm sorry. |
What are you using for |
Can you try with |
As for the name, we'd better stick with |
fixed hr lines
it was a sample document from the txt2tags website (https://txt2tags.org/sample.html)
I still get several differences because pandoc creates id and adds some < p > markup in the original export (or maybe I've missed an option to avoid this) but yet it's still acceptable as result.
I suppose it would be more logical to rename the reader part to txt2tags, because markdown and dokuwiki are written respectively "markdown" and "dokuwiki", but I understand if you prefer not breaking the previous behavior. So I've just changed all referencies of txt2tags to "t2t" in the writer part option, which will be a new addition, so it wouldn't confuse people. |
@@ -20,8 +20,7 @@ Txt2Tags: <https://www.Txt2Tags.org/> | |||
- Definition lists are broken | |||
- Tables could be improved | |||
- Some formats make better results than others (html is ok, md is somehow broken) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain this comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember, it was 6 months ago. I suppose when writing from markdown, some elements were not correctly rendered
src/Text/Pandoc/Writers/Txt2Tags.hs
Outdated
@@ -20,8 +20,7 @@ Txt2Tags: <https://www.Txt2Tags.org/> | |||
- Definition lists are broken | |||
- Tables could be improved | |||
- Some formats make better results than others (html is ok, md is somehow broken) | |||
- code related to dokuwiki only could be removed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there still dokuwiki-specific code? IF so, let's remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose there are still some code, yes
Yes, I agree, |
$pagetitle$ | ||
$author-meta$ | ||
$date-meta$ | ||
|
||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is why you're getting all that blank space at the beginning. Do you want the 3 blank lines if there's a title? If there's no title block, would it be better to omit them? You can use conditionals like
$if(pagetitle)$
$pagetitle$
$author-meta$
$date-meta$
$endif$
I ran writer.t2t through txt2tags, and it appears that some of the syntax is invalid. (e.g. the |
I've removed the html part, and now the test no longer works. There are some parts I'll never be able to fix (def lists), so I suppose I'll give up then, I'll use the generated binary for myself. There are fewer and fewer people who are still using txt2tags today, so it's not a big deal :( |
You can update the tests, no? |
Note: dokuwiki falls back to HTML for tables and lists whose contents are too complex to represent using the native syntax. I don't know if t2t offers any fallback possibility like that, so there's a question what to do if some of the things pandoc can represent can't be renedred in t2t. We could simply skip them and issue a warning, I suppose. |
I don't know, I've updated the code, and thought I've also updated the test, but I still get:
so I don't know what is wrong. |
it's still possible to start a line with % which means it's a comment. It won't be interpreted, but user could see it and check and correct it. |
|
hello,
I've made a txt2tags writer, adapted from Dokuwiki writer. It works rather ok but will be improved. Could you integrate this in pandoc please?
thank you