[WIP] Markdown reader: extension to insert source code position attributes #4659

ehildenb commented May 17, 2018

Optionally inserts source-code position attributes for `CodeBlock` in Markdown
sources. The extension is `Ext_code_block_source_position`.

This reads the parser position (`SourcePos name line col`) at the beginning of
parsing a `CodeBlock` and adds the attributes
`[("sourceName", name), ("sourceLine", line), ("sourceColumn", col)]` if the
extension is turned on.

Addresses #4657

WIP Part

Currently, `sourceName` returns nothing useful (just the word `source`). It would be better if it returned the name of the source file, which would allow generating things like line directives for files produced from multiple literate sources: https://gcc.gnu.org/onlinedocs/cpp/Line-Control.html
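
To make the line-directive use case concrete, here is a minimal sketch of what a downstream tool might do with these attributes. It is not part of this PR: the helper name `line_directive` is hypothetical, Python is used only for brevity, and the sketch assumes the attributes arrive as key/value pairs named `sourceName` and `sourceLine`. Whether `sourceLine` points at the opening fence or at the first line of code is not settled here, so a real tool may need an offset.

```python
def line_directive(attrs):
    """Build a C preprocessor `#line` directive from CodeBlock attributes.

    `attrs` is a list of (key, value) pairs. Returns None when no
    `sourceLine` attribute is present.
    """
    kv = dict(attrs)
    line = kv.get("sourceLine")
    if line is None:
        return None
    name = kv.get("sourceName")
    if name:
        # Escape backslashes and quotes so the name is a valid C string.
        escaped = name.replace("\\", "\\\\").replace('"', '\\"')
        return f'#line {int(line)} "{escaped}"'
    # Without a usable source name, emit the line number only.
    return f"#line {int(line)}"
```

With the current behaviour the directive would carry the placeholder name `source`, which is why a real file name matters for this use case.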

Any ideas, @jgm? I could either (1) omit the `sourceName` attribute, or (2) figure out what the original file name is (using `stdin` or `-` for standard input). I would prefer (2) for the reason above.

jgm (Owner) commented May 17, 2018

The way pandoc currently works, we concatenate input from all source files and pass it to the parser. The parser readMarkdown doesn't know where its input comes from -- it could be from stdin, or from one file, or from many. So it's hard to give any useful source name.

One way to change this would be to insert special directives in the input stream that cause the markdown parser to reset source name and position information. That would require a change in the markdown reader and in Text.Pandoc.App.

I wonder if it would make sense to have a single sourcepos attribute whose values look like: SOURCENAME:STARTLINE:STARTCOL-ENDLINE:ENDCOL.
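
For illustration only, a consumer of such a value might parse it roughly as follows. This is a sketch under assumptions: the names `SourceSpan` and `parse_sourcepos` are hypothetical, Python is used only for brevity, and the format is taken literally as `SOURCENAME:STARTLINE:STARTCOL-ENDLINE:ENDCOL`. The numeric fields are matched from the end of the string so that a source name containing colons still parses.

```python
import re
from typing import NamedTuple, Optional


class SourceSpan(NamedTuple):
    name: str
    start_line: int
    start_col: int
    end_line: int
    end_col: int


# Anchor the numeric part at the end so that colons inside the
# source name (e.g. a Windows drive letter) stay part of the name.
_SOURCEPOS = re.compile(r"^(.*):(\d+):(\d+)-(\d+):(\d+)$", re.DOTALL)


def parse_sourcepos(value: str) -> Optional[SourceSpan]:
    """Parse a sourcepos string; return None if it does not match."""
    m = _SOURCEPOS.match(value)
    if m is None:
        return None
    name, *nums = m.groups()
    return SourceSpan(name, *map(int, nums))
```

For example, `parse_sourcepos("doc.md:3:1-7:4")` yields a span named `doc.md` running from line 3, column 1 to line 7, column 4.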

ehildenb (Author) commented

I'm not a huge fan of inserting source directives, which may make the parser more fragile. I think I would prefer a structured representation of the source information, especially since we have all of it in hand at some point, but I guess that would require much more extensive surgery.
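
One shape such a structured representation could take, sketched outside pandoc: record the starting line of each file while concatenating, then map positions in the combined text back to a file and a local line. The names `concat_with_index` and `locate` are hypothetical, Python is used only for brevity, and forcing a trailing newline on each input is an assumption of this sketch rather than a statement about how pandoc joins files.

```python
from bisect import bisect_right


def concat_with_index(sources):
    """Concatenate (name, text) pairs and record where each one starts.

    Returns the combined text plus a list of (first_line, name) entries,
    with lines numbered from 1. Each text is forced to end with a newline
    so that no line spans two files.
    """
    parts, index, next_line = [], [], 1
    for name, text in sources:
        if not text.endswith("\n"):
            text += "\n"
        index.append((next_line, name))
        parts.append(text)
        next_line += text.count("\n")
    return "".join(parts), index


def locate(index, line):
    """Map a 1-based line in the combined text to (name, local_line)."""
    starts = [first for first, _ in index]
    # Find the last file whose first line is at or before `line`.
    i = bisect_right(starts, line) - 1
    if i < 0:
        raise ValueError("line numbers start at 1")
    first, name = index[i]
    return name, line - first + 1
```

The parser would stay unaware of file boundaries in this design; only the code that reports positions would consult the index.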

I like the single sourcepos attribute, though I'm a bit worried about losing the structure of the information. I think I would prefer something like STARTLINE:STARTCOL:ENDLINE:ENDCOL so it's easier to say "split on colons" in a Lua filter.

Should I:

  1. Investigate parsing files individually?
  2. Call it good without having SOURCENAME for the moment?

gpoore commented May 18, 2018

Would this same approach with getPosition allow adding position attributes to inline code? Code block position attributes would give me everything I would need in most cases, but having position attributes for all types of code would be very convenient in some situations.
