-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
reading and writing of HTML (compatible) to a simple ADT format. #1680
Conversation
jurgenvinju
commented
Oct 18, 2022
- first version, not finished
- HTML parser produces AST nodes now instead of node
- perfecting the HTML reader
- fixed static error in m3 Core
- added yielding of HTMLElement as XHTML output, both string and file
- set copyright year
- fixed some more documentation
This is a simple design made to facilitate demonstrating the use of HTML and JS with Rascal:
|
Codecov Report
@@ Coverage Diff @@
## main #1680 +/- ##
=======================================
Coverage 48% 48%
- Complexity 5953 5967 +14
=======================================
Files 672 672
Lines 58463 58508 +45
Branches 8514 8522 +8
=======================================
+ Hits 28355 28411 +56
+ Misses 27961 27942 -19
- Partials 2147 2155 +8
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
yes, escaping works:
|
With this in the library the tutor can be used to demonstrate visuals without an apriori dependency on salix, also basic visuals like tables for relations can be part of the standard library. This should enhance the usability of the "vanilla" Rascal repl quite a bit, and achieve the kind of user interactions we used to have in Eclipse. For now this initial version seems finished and I'm ready to merge. |
How does this relate to And maybe similarly, can you explain the relation to Also, we need tests. (I think before adding new libraries, we need to try and add a tests). |
@@ -117,7 +117,7 @@ M3 createM3FromDirectory(loc project, bool errorRecovery = false, bool includeJa | |||
registerProject(project, result); | |||
|
|||
if (includeJarModels) { | |||
results = composeJavaM3(project, {result, *{createM3FromJar(j, classPaths) | j <- classPaths}}); | |||
results = composeJavaM3(project, {result, *{createM3FromJar(j, classPath=classPaths) | j <- classPaths}}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so, this is not related to this branch right? just accidental changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. My bad. Saw it and couldn't leave it there to rot but also didn't have the energy to checkout another branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, we've all been there.
@@ -428,13 +425,17 @@ str nodeToString(str n, set[HTML5Attr] attrs, list[value] kids) { | |||
return s; | |||
} | |||
|
|||
public HTML5Node example | |||
private HTML5Node example |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this some kind of inline documentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know yet. Have to ask Tijs. I think it's test code
@@ -87,29 +209,66 @@ private void storeAttributes(MutableAttributeSet set) { | |||
for (Enumeration<?> names = set.getAttributeNames(); names.hasMoreElements(); ) { | |||
Object label = names.nextElement(); | |||
Object value = set.getAttribute(label); | |||
attributes.peek().put(label.toString(), factory.string(value.toString())); | |||
if (!"_implied_".equals(label)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some comments on what is happening here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The editor kit puts implied false or true on every node but that's a leaky internal implementation detail not a DOM aspect imho. We can use it to filter implied nodes but I didn't see a need for that. The implied nodes complete the dom semantically. Like a normal form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the "editor kit"?
I meant, this magic if should have an inline comment to explain the reasoning. For future you ;)
/** | ||
* Produces a HTML string output from an HTMLElement AST. | ||
* | ||
* Why go through all the trouble of building a DOM? The only reason is compliance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does introduce scaling issues, so we shouldn't be generating big HTML trees :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? Big html trees are perfectly fine with this DOM tree. Building them is linear time and space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
memory consumption. This is a thing with large xml & json files, but I'm guessing nobody is gonna build such a big html.
* Escapes, encodings, etc. all are maintained by these classes from org.w3c.dom. | ||
*/ | ||
public IString writeHTMLString(IConstructor cons) { | ||
try { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can maybe share more code with the writeHTMLFile
call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 thanks for the suggestion. Especially since this code will become twice as long when we add the right XHTML dtd and wrap internal nodes with proper header and body.
public IValue readHTMLString(IString string) { | ||
if (string.length() == 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should really think about adding isEmpty
to vallang IString, especially since length is not always a o(1) operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well it's log(n) max with the current balanced tree implementation, so you'd have to call it many times on a deep tree before you would notice any improvement. It's doable though to add isEmpty with a default false implementation in the interface and only return true for the maximally shared empty constant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also length is cached in internal nodes I believe. So it's probably even in O(1) already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a default method should be: return length() == 0; and then subclasses can override it with true or false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's Have a look at IStringImplementation first. It's a private interface used by all implementation classes of IString, all of them but one are non-empty I believe. And that one is empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just saying the default interface method should work for future overloads as well.
It's a different way of representing HTML that has its own benefits and pitfalls. What I didn't like is that it's match patterns are different from its constructor patterns. This AST version does have that. Also this AST version imitates the way we model programming languages and DSLs so I thought that more appropriate for examples and demo's. Finally the unparser including all of the intricate escaping rules for XHTML is reused here which is probably for the best. |
This pure data AST module contains only the data declarations that the IO module uses when reading and writing the HTML streams.
Yes! Very much agreed. Working on that already, and also nice documentation that also functions as tests. I've a scraper from the CBS cite and planning a static site generator for the other direction. Tutor will be extended with browser screenshots for the testing and the visual docs. Might have to merge this one though before I can write the tests properly with the tutor. Now that it's a separate project I'm running into cycles now and then. |
do we need both? can we not cleanup? since they appear very similar from a user perspective. |
…of less complexity. factored code clones
…ess test roundtripping with those libraries
There is still some options to make this faster and even better tested, but for now I'm done and I want to use it and document it with the Tutor. |
pom.xml
Outdated
<artifactId>jdom2</artifactId> | ||
<version>2.0.6</version> | ||
<artifactId>jdom</artifactId> | ||
<version>2.0.2</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why go backwards?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
accident! on grand central the example was 2.0.2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jdom2!! not jdom. that's why the version was off too
try (OutputStream out = URIResolverRegistry.getInstance().getOutputStream(file, false)) { | ||
Document doc = createHTMLDocument(cons); | ||
doc = doc.outputSettings(createOutputSettings(charset.getValue(), escapeMode.getName(), outline.getValue(), prettyPrint.getValue(), indentAmount.intValue(), maxPaddingWidth.intValue(), syntax.getName())); | ||
out.write(doc.outerHtml().getBytes("UTF-8")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we ask the registry for a Writer or something? so that we don't have to do .getBytes
on the string?
* Escapes, encodings, etc. all are maintained by these classes from org.jdom. | ||
*/ | ||
public IString writeHTMLString(IConstructor cons, IString charset, IConstructor escapeMode, IBool outline, IBool prettyPrint, IInteger indentAmount, IInteger maxPaddingWidth, IConstructor syntax ) { | ||
try (StringWriter out = new StringWriter()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like out
isn't used?