I have a folder of Markdown documents that I translated to HTML using Pandoc 1.14.0.4. The command used to translate from Markdown to HTML is:
pandoc -t html5 -s --no-highlight --no-wrap MY_MARKDOWN.md -o MY_HTML.html
The <head> section from one such HTML file is as follows:
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
<title></title>
<style type="text/css">code{white-space: pre;}</style>
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
Notice the presence of the empty <title> element.
I wrote a small Ruby script to process the HTML files so that I could paste them into my content management system. The last line of my Ruby script is:
%x(tidy -m -i -w 0 -utf8 #{File.join(dir, "*.html")})
This runs Tidy on all of the HTML files in the directory dir. When I run this script on a folder of HTML files, the output from Tidy is:
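One way to narrow this down might be to invoke Tidy once per file from Ruby and capture its output separately, so that each warning can be tied to a file name. The following is a minimal sketch, not the original script: the helper name run_per_file and the shape of its return value are hypothetical, and it assumes only that the command accepts a file path as its last argument.

```ruby
require "open3"

# Hypothetical helper (not part of the original script): runs `command`
# once for every *.html file in `dir`, without going through a shell, and
# returns a hash of file path => { stdout:, stderr:, exit_code: }.
def run_per_file(dir, command)
  Dir.glob(File.join(dir, "*.html")).sort.each_with_object({}) do |path, results|
    # Passing the command as separate arguments means no shell is involved,
    # so spaces or special characters in `path` are not re-interpreted.
    stdout, stderr, status = Open3.capture3(*command, path)
    results[path] = { stdout: stdout, stderr: stderr, exit_code: status.exitstatus }
  end
end

# Example usage (assumes `tidy` is on the PATH and `dir` is defined):
#   run_per_file(dir, %w[tidy -m -i -w 0 -utf8]).each do |path, result|
#     puts "#{File.basename(path)} (exit #{result[:exit_code]})"
#     puts result[:stderr]
#   end
```

Because the glob is expanded by Ruby rather than by the shell, this also removes one difference between the scripted run and the interactive run.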
line 1 column 1 - Warning: inserting missing 'title' element
Info: Document content looks like HTML5
1 warning, 0 errors were found!
Info: Document content looks like HTML5
No warnings or errors were found.
line 1 column 1 - Warning: inserting missing 'title' element
Info: Document content looks like HTML5
1 warning, 0 errors were found!
Info: Document content looks like HTML5
No warnings or errors were found.
Info: Document content looks like HTML5
No warnings or errors were found.
line 1 column 1 - Warning: inserting missing 'title' element
Info: Document content looks like HTML5
1 warning, 0 errors were found!
line 1 column 1 - Warning: inserting missing 'title' element
Info: Document content looks like HTML5
1 warning, 0 errors were found!
line 1 column 1 - Warning: inserting missing 'title' element
Info: Document content looks like HTML5
1 warning, 0 errors were found!
line 1 column 1 - Warning: inserting missing 'title' element
Info: Document content looks like HTML5
1 warning, 0 errors were found!
line 1 column 1 - Warning: inserting missing 'title' element
Info: Document content looks like HTML5
1 warning, 0 errors were found!
line 1 column 1 - Warning: inserting missing 'title' element
Info: Document content looks like HTML5
1 warning, 0 errors were found!
Info: Document content looks like HTML5
No warnings or errors were found.
line 1 column 1 - Warning: inserting missing 'title' element
Info: Document content looks like HTML5
1 warning, 0 errors were found!
Info: Document content looks like HTML5
No warnings or errors were found.
Info: Document content looks like HTML5
No warnings or errors were found.
Info: Document content looks like HTML5
No warnings or errors were found.
Info: Document content looks like HTML5
No warnings or errors were found.
line 1 column 1 - Warning: inserting missing 'title' element
Info: Document content looks like HTML5
1 warning, 0 errors were found!
About HTML Tidy: https://github.com/htacg/tidy-html5
Bug reports and comments: https://github.com/htacg/tidy-html5/issues
Or send questions and comments to: https://lists.w3.org/Archives/Public/public-htacg/
Latest HTML specification: http://dev.w3.org/html5/spec-author-view/
Validate your HTML documents: http://validator.w3.org/nu/
Lobby your company to join the W3C: http://www.w3.org/Consortium
Note the warnings about missing "title" elements, even though every file contains a "title" element (albeit an empty one).
For the files that trigger the missing "title" warning, Tidy does not actually process the file. I can tell because the indentation is not changed, and the following <meta> tag does not get added:
<meta name="generator" content="HTML Tidy for HTML5 for Mac OS X version 5.1.2">
However, when I make a new folder of Pandoc-generated HTML files, re-run the Ruby script with the %x(tidy -m -i -w 0 -utf8 #{File.join(dir, "*.html")}) line commented out, and then run the following in the shell:
$ cd /path/to/my/Pandoc-generated/HTML/files/on/which/I/ran/my/Ruby/script
$ tidy -m -i -w 0 -utf8 *.html
I get the expected output of no warnings or errors on any of the files:
Info: Document content looks like HTML5
No warnings or errors were found.
Info: Document content looks like HTML5
No warnings or errors were found.
Info: Document content looks like HTML5
No warnings or errors were found.
Info: Document content looks like HTML5
No warnings or errors were found.
Info: Document content looks like HTML5
No warnings or errors were found.
Info: Document content looks like HTML5
No warnings or errors were found.
Info: Document content looks like HTML5
No warnings or errors were found.
Info: Document content looks like HTML5
No warnings or errors were found.
Info: Document content looks like HTML5
No warnings or errors were found.
Info: Document content looks like HTML5
No warnings or errors were found.
Info: Document content looks like HTML5
No warnings or errors were found.
Info: Document content looks like HTML5
No warnings or errors were found.
Info: Document content looks like HTML5
No warnings or errors were found.
Info: Document content looks like HTML5
No warnings or errors were found.
Info: Document content looks like HTML5
No warnings or errors were found.
Info: Document content looks like HTML5
No warnings or errors were found.
Info: Document content looks like HTML5
No warnings or errors were found.
Info: Document content looks like HTML5
No warnings or errors were found.
About HTML Tidy: https://github.com/htacg/tidy-html5
Bug reports and comments: https://github.com/htacg/tidy-html5/issues
Or send questions and comments to: https://lists.w3.org/Archives/Public/public-htacg/
Latest HTML specification: http://dev.w3.org/html5/spec-author-view/
Validate your HTML documents: http://validator.w3.org/nu/
Lobby your company to join the W3C: http://www.w3.org/Consortium
and Tidy has run on all of the files, which I can verify by looking at the indentation and seeing the addition of the Tidy <meta> tag.
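The manual check described above (looking for the Tidy <meta> tag) could be automated roughly as follows. This is only a sketch: the helper names tidy_processed? and unprocessed_files are hypothetical, and the pattern assumes the generator tag is written with name before content, as in the example tag quoted earlier.

```ruby
# Hypothetical check: true when the HTML contains a generator <meta> tag
# whose content starts with "HTML Tidy".
def tidy_processed?(html)
  html.match?(/<meta\s+name="generator"\s+content="HTML Tidy[^"]*"/i)
end

# Hypothetical helper: lists the *.html files in `dir` that do not carry
# the Tidy generator tag, i.e. files Tidy apparently left untouched.
def unprocessed_files(dir)
  Dir.glob(File.join(dir, "*.html")).sort.reject do |path|
    # binread avoids encoding errors; the pattern only contains ASCII.
    tidy_processed?(File.binread(path))
  end
end
```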
As far as I can tell, the same files always fail to be processed by Tidy when run from the Ruby script. I checked, and all of the <head> sections have empty <title> elements in them before running the Ruby script. I don't know what the difference is between running Tidy from my Ruby script and running it directly in the shell, so I am stuck.
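To rule out the possibility that the files differ at the moment Tidy is invoked from the script, a check like the one below could be placed immediately before the %x(...) line. It is a sketch under assumptions: the helper name files_without_title is hypothetical, and it only tests for the presence of an opening <title> tag, not for valid markup.

```ruby
# Hypothetical helper: lists the *.html files in `dir` that contain no
# opening <title ...> tag at all. An empty <title></title> counts as
# present, so such files are not listed.
def files_without_title(dir)
  Dir.glob(File.join(dir, "*.html")).sort.reject do |path|
    # binread avoids encoding errors; the pattern only contains ASCII.
    File.binread(path).match?(/<title\b[^>]*>/i)
  end
end

# Example usage (assumes `dir` is the directory used by the script):
#   missing = files_without_title(dir)
#   warn "No <title> in: #{missing.join(', ')}" unless missing.empty?
```

If this list is empty right before Tidy runs and the warnings still appear, the cause is more likely in how Tidy is invoked than in the file contents.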