This is a (Ruby) module to add literate programming support to Asciidoctor.
In short, literate programming is an approach to writing software and its documentation that prioritizes human language. A literate programming “source” is composed of text in some documentation system (in this case, Asciidoctor’s flavor of AsciiDoc) that describes the logic, with interspersed code snippets whose union constitutes the source code as it will be passed to the compiler.
The process of creating the code from the snippets is known as Tangling, while the extraction of the documentation is known as Weaving. The literate programming support we’re introducing in Asciidoctor with this module does not require a separate weaving step, since the source document is assumed to be a valid Asciidoctor document, and should therefore be processable as-is by the standard processor, even without special modules. In our case, we call Weaving the process that enhances the document processing by improving the appearance and functionality of the chunk references.
(In fact, this README.adoc file is itself the literate programming source for the module.)
There have been previous efforts to introduce literate programming features in the AsciiDoc format, including eWEB, nowasp and the Model Realization Tools' aweb (not to be confused with the Ada-centered one). So why do we need another one?
- Integration: we hook directly into Asciidoctor, allowing single-pass processing of a document to produce both the documentation and the source; this also allows us to include proper cross-referencing and indexing for the code chunks at the documentation output level;
- More features: most importantly, support for the creation of multiple files from a single document, a feature that is missing from some of the existing tools; other features include improved navigation between chunks in the woven documentation, and the option to create a DOT graph of the chunk structure;
- Syntax compatibility: the existing tools have slightly different syntaxes; the obvious solution to this would be to introduce a new, incompatible syntax, but the actual plan is to also support the syntax of the other tools.
The syntax we currently support is a small extension to the aweb syntax. Similarly to noweb, chunks in aweb are defined by an introductory line of the form <<Chunk title>>=, and are referenced by writing the chunk title between double angle brackets: <<Chunk title>>.
It’s interesting to note how both AsciiDoc and noweb/aweb use the <<...>> syntax for references.
In contrast to noweb, aweb relies on AsciiDoc syntax to separate chunk definitions from documentation, and it does not support inline chunk references. In particular, this means that a source line is either a chunk reference (optionally surrounded by whitespace) or a text line (to be taken verbatim), and that an at symbol (@) at the beginning of the line has no special meaning.
The downside of this simplified syntax, aside from the restriction on chunk reference usage, is that there is no markup to indicate definition and usage of symbols. The upside is that the aweb syntax virtually eliminates the need for escaping chunk references or @ symbols. (On the other hand, it is impossible to have an actual code line that begins with << and ends with >>, so if your language needs those, you’re (currently) out of luck.)
Another minor difference between the two syntaxes is that in the aweb syntax chunks with the same name are automatically concatenated, so there is no need for the <<Chunk title>>+= notation.
This module supports the standard (“legacy”) aweb syntax (with the caveats below). In addition, we interpret every source block as the definition of a (single) chunk, using the block’s own title as the title of the chunk.
NOTE: While we do support the “legacy” aweb syntax, the output is not guaranteed to match atangle's output. We output line directives more aggressively, and the behavior with empty definitions is slightly different.
IMPORTANT: The root chunk auto-detection mechanism we employ with the “legacy” aweb syntax is quite aggressive, and may be subject to changes in the future.
The module is implemented as an Asciidoctor extension. Since we are mainly interested in producing secondary outputs, the core of the module will be a TreeProcessor, which traverses the document tree to gather all blocks that define chunks to be output to the secondary files. The processor needs to track some state, so it overrides the default constructor (the initialize method) to set things up properly. Asciidoctor’s processors take a configuration Hash on construction, so we follow the convention, even though we do not (at present) make use of any configuration, and remember to call the superclass constructor (otherwise, the extension won’t work properly).
class LiterateProgrammingTreeProcessor < Asciidoctor::Extensions::TreeProcessor
  include Asciidoctor::Logging
  <<Plugin version>>
  def initialize config = {}
    super config
    <<Declare and initialize variables needed by the processor>>
  end
  <<Support methods>>
  <<Tangling methods>>
  <<Weaving methods>>
  <<Processing methods>>
end
Of course, we need to require the asciidoctor/extensions Ruby module to have the Asciidoctor::Extensions::TreeProcessor class available:
require 'asciidoctor/extensions'
The plugin version matches the version of this document:
VERSION = '2.3'
The Weaving process will introduce cross-referencing between the chunks as well as navigation links between blocks contributing to the same chunk. We want to be able to provide default styling for these links, which we can do using a Docinfo processor that inserts the needed CSS in the document head.
class LiterateProgrammingDocinfoProcessor < Asciidoctor::Extensions::DocinfoProcessor
  <<Plugin version>>
  use_dsl
  at_location :head
  def process doc
    %(<style>
    <<Styling for woven links>>
    </style>)
  end
end
Each chunk is identified by a title, and the corresponding source code may be split across multiple blocks. The (final) content of a chunk is obtained by the concatenation of all the blocks with the same title.
The title of the chunk is used as a handle that can be referenced by other chunks to declare that the content of the referenced chunk should be inlined in the referencing chunk (this inlining process is known as Tangling). A special kind of chunk is the root chunk, which is not referenced by any other chunk and represents the starting point for the tangling process. We support the creation of multiple files from the same source, so we can have multiple root chunks, and we use the chunk title to represent the name of the file to be created by each root chunk.
The natural data structure to store chunks (be they generic or root chunks) is a Hash that maps the title (a String) to the content (an Array). For the processor we need to declare two such hashes: @chunks will hold the generic code chunks, while @roots will hold root chunks.
Since the source code associated with a generic chunk can be spread out over multiple blocks, we define a default value constructor for @chunks: this will simplify the process of appending new lines to a value each time we come across a new block.
The root chunk is assumed to be unique per output file (i.e. per title), but we still provide the same default value constructor, since this will allow us to handle the extraction in the same way for both types. Uniqueness of root chunks will be handled explicitly during block processing.
@roots = Hash.new { |hash, key| hash[key] = [] }
@chunks = Hash.new { |hash, key| hash[key] = [] }
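Using the block form of the default (rather than Hash.new([])) matters because chunk Arrays are also grown in place, for example with Array#append when tracking source locations. The stand-alone comparison below uses illustrative variable names (shared, fresh) that are not part of the module:

```ruby
# Hash.new(default) shares ONE default object among all missing keys,
# while the block form creates (and stores) a fresh Array per missing key.
shared = Hash.new([])
fresh  = Hash.new { |hash, key| hash[key] = [] }

# In-place append through the shared default mutates it without storing a key
shared['a'].append 'line 1'
# In-place append through the block default stores a per-key Array
fresh['a'].append 'line 1'

puts shared.key?('a')    # false: nothing was stored under 'a'
puts shared['b'].inspect # ["line 1"]: the shared default was mutated
puts fresh['a'].inspect  # ["line 1"]
puts fresh['b'].inspect  # []: a brand-new Array
```

With the block form, both += and in-place appends behave as expected for every chunk title.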
Chunk titles can be nearly arbitrary strings, but are conventionally short natural-language descriptions of the chunk’s intended use. As these can get on the longish side, and typing them multiple times can be time-consuming and error-prone, additional uses of the same title can be shortened to any unambiguous prefix followed by an ellipsis of three literal dots (...). For example, a chunk may be titled Automagical creation of bug-free code, and this may be shortened to Automagic... if there are no other chunks whose title begins with Automagic.
We do require that the first time a chunk title is encountered (be it to define it or as a reference in another chunk) it must be written out in full. Moreover, since the trailing ellipsis is taken to be a shorthand notation, a chunk title cannot naturally end with it.
To assist in the handling of shortened chunk titles, we keep track of all the (full) titles we’ve come across so far:
@chunk_names = Set.new
and we provide a support method that will take a (possibly shortened) chunk title and return the full title, raising an exception if we do not find one (and only one) chunk title starting with the given prefix:
def full_title string
  pfx = string.chomp("...")
  # nothing to do if title was not shortened
  return string if string == pfx
  hits = @chunk_names.find_all { |s| s.start_with? pfx }
  raise ArgumentError, "No chunk #{string}" if hits.length == 0
  raise ArgumentError, "Chunk title #{string} is not unique" if hits.length > 1
  hits.first
end
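The lookup can be tried out in isolation. In this sketch the hypothetical expand_title takes the set of known titles as an argument instead of reading @chunk_names, but otherwise follows the same rules:

```ruby
require 'set'

# Stand-alone version of the prefix lookup done by full_title; the Set of
# known titles is passed in explicitly instead of living in @chunk_names.
def expand_title known_titles, string
  pfx = string.chomp('...')
  return string if string == pfx # title was not shortened
  hits = known_titles.find_all { |title| title.start_with? pfx }
  raise ArgumentError, "No chunk #{string}" if hits.empty?
  raise ArgumentError, "Chunk title #{string} is not unique" if hits.length > 1
  hits.first
end

titles = Set['Automagical creation of bug-free code', 'Automatic testing']
puts expand_title(titles, 'Automagic...')    # Automagical creation of bug-free code
puts expand_title(titles, 'Support methods') # Support methods (returned unchanged)
begin
  expand_title(titles, 'Auto...')
rescue ArgumentError => e
  puts e.message # Chunk title Auto... is not unique
end
```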
The chunk content is stored as an Array whose elements are either Strings (the actual chunk lines), Asciidoctor::Reader::Cursors (an Asciidoctor-provided structure that carries information about the origin, file and line number, of the blocks), or Hashes (the attributes of the block that originated this component). Since, as we mentioned, a chunk may span multiple blocks, we can easily track information about the origin of each of the component blocks by storing the corresponding Cursor before the corresponding lines, as detailed in the Collecting chunks section.
We also track separately which chunks are referred to by which other chunks (and in which block) to be able to provide a relationship graph if requested.
@chunk_backrefs = Hash.new { |hash, key| hash[key] = [] }
Updates to @chunk_backrefs are abstracted by the add_chunk_ref method:
def add_chunk_ref includer, includer_block_id, included
  @chunk_backrefs[included].push [includer, includer_block_id]
end
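To illustrate how this backreference map lends itself to a relationship graph, here is a hypothetical sketch that renders a Hash shaped like @chunk_backrefs as Graphviz DOT. It is not the module's own graph output, whose format is not shown here; backrefs_to_dot and the sample file name are illustrative assumptions:

```ruby
# Hypothetical sketch, NOT the module's own graph output: render a map shaped
# like @chunk_backrefs (included chunk => [[includer, includer_block_id], ...])
# as a Graphviz DOT digraph with edges from including to included chunk.
def backrefs_to_dot backrefs
  lines = ['digraph chunks {']
  backrefs.each do |included, refs|
    # one edge per including chunk, even if it references `included` from several blocks
    refs.map(&:first).uniq.each do |includer|
      # String#inspect gives a double-quoted string; adequate for simple titles
      lines << "  #{includer.inspect} -> #{included.inspect};"
    end
  end
  lines << '}'
  lines.join("\n")
end

backrefs = Hash.new { |hash, key| hash[key] = [] }
backrefs['Support methods'].push ['lib/literate.rb', '_chunk_lib_literate_rb_block_1']
backrefs['Tangling methods'].push ['lib/literate.rb', '_chunk_lib_literate_rb_block_1']
backrefs['Tangling methods'].push ['lib/literate.rb', '_chunk_lib_literate_rb_block_2']
puts backrefs_to_dot(backrefs)
```

Keeping the includer block ID in the map costs nothing here and would allow a finer, per-block graph if desired.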
The origin information for a block can be used to add appropriate metadata to the output files. The format with which this information is output is set by the litprog-line-template document attribute, a string where the %{line} and %{file} keywords will be replaced by the source line number and file name, respectively. As an example, for languages that do not have built-in support for a line directive, a vim-friendly solution for code navigation would be:
:litprog-line-template: # %{file}:%{line}
The default value for this template produces a C-style #line directive:
doc.set_attr 'litprog-line-template', '#line %{line} "%{file}"', false
Syntax-specific line templates can be specified through attributes of the form litprog-line-template-lang, where lang is the language name as it would be used to specify the syntax highlighting language of a source block. The module comes with a specialization for CSS:
doc.set_attr 'litprog-line-template-css', '/* %{file}:%{line} */', false
In the tree processor, the templates used to print the line information are stored in the member variable @line_directive_template, a hash mapping the language to the template. During Tangling, line directives may change based on the language of the chunk block being output, so we keep track of active directives in the @active_line_directive_template stack:
@line_directive_template = { }
@active_line_directive_template = []
These variables are initialized at the beginning of the tangling phase, with the special key _ used for the default template.
@line_directive_template['_'] = doc.attr('litprog-line-template').dup
doc.attributes.each do |key, value|
  lang = key.dup
  if lang.delete_prefix! 'litprog-line-template-'
    @line_directive_template[lang] = value unless lang.empty?
  end
end
@active_line_directive_template.push @line_directive_template['_']
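The template table and its use can be sketched with a plain Hash standing in for the document attributes (the two template values are the module's defaults; the unrelated doctitle entry and the directive_for helper are illustrative assumptions). Ruby's String#% fills in the %{line} and %{file} named references:

```ruby
# Plain Hash standing in for doc.attributes (template values are the module's defaults)
attributes = {
  'litprog-line-template' => '#line %{line} "%{file}"',
  'litprog-line-template-css' => '/* %{file}:%{line} */',
  'doctitle' => 'unrelated attribute',
}

templates = { '_' => attributes['litprog-line-template'].dup }
attributes.each do |key, value|
  lang = key.dup
  # delete_prefix! returns nil (falsy) when the prefix is absent
  templates[lang] = value if lang.delete_prefix!('litprog-line-template-') && !lang.empty?
end

# Hypothetical helper: pick the template for a language, falling back to '_'
def directive_for templates, lang, file, line
  template = templates.fetch(lang) { templates['_'] }
  template % { line: line, file: file }
end

puts directive_for(templates, 'c', 'README.adoc', 42)  # #line 42 "README.adoc"
puts directive_for(templates, 'css', 'README.adoc', 7) # /* README.adoc:7 */
```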
The actual output of the line directive is encapsulated in the output_line_directive method:
def output_line_directive file, fname, lineno
  template = @active_line_directive_template.last
  file.puts(template % { line: lineno, file: fname }) unless template.nil_or_empty?
end
Tangling is the process of “stitching together” all the code blocks, recursively following the referenced chunks starting from the root chunk, for each file.
References to other chunks are identified by a chunk title written between double angle brackets (e.g. <<(Possibly shortened) chunk title>>) on a line of its own, optionally surrounded by whitespace. When processing chunks line by line, we may want to check if a particular line is a chunk reference, and if so we’ll want the full name of the chunk, as well as any indentation that precedes the reference:
def is_chunk_ref line
  if line.match(/^(\s*)<<(.*)>>\s*$/)
    return full_title($2), $1
  else
    return false
  end
end
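The reference pattern can be exercised on its own. The sketch below (split_chunk_ref and CHUNK_REF_RX are illustrative names) skips the full_title expansion and returns the raw title together with the leading indent:

```ruby
# Same pattern as is_chunk_ref: capture 1 is the indent, capture 2 the title
CHUNK_REF_RX = /^(\s*)<<(.*)>>\s*$/

# Hypothetical helper: [title, indent] for a reference line, false otherwise
def split_chunk_ref line
  m = CHUNK_REF_RX.match(line)
  m ? [m[2], m[1]] : false
end

p split_chunk_ref('    <<Hash case>>')     # ["Hash case", "    "]
p split_chunk_ref('<<Support methods>>  ') # ["Support methods", ""]
p split_chunk_ref('x = a << b')            # false
```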
The recursive tangling of chunks is achieved by starting at the root chunk, outputting any line that is not a reference to another chunk, and recursively calling the function any time a reference is encountered.
The state we need to keep track of during the recursion is composed of:
- the output stream, to which we are writing the lines;
- the title of the chunk being processed, to detect circular references and produce meaningful error messages;
- the current indent, added to all lines being output;
- the contents of the chunk being processed: these could be obtained from the chunk name and the chunk type, but passing the chunk contents directly simplifies the logic of the method;
- the names of the chunks we’re in the middle of processing: this is a Set to which chunk names are added when entering the method and removed on exit, and it is used to detect circular references.
As mentioned in Chunk contents and metadata, the chunk is an Array whose elements are either Strings (the actual chunk lines), Hashes of attributes, or Asciidoctor::Reader::Cursors (that provide source line information). We handle the three cases separately, and raise an appropriate exception if we come across something unexpected. We return the number of times the active line directive template was pushed, so that the caller can pop it as many times.
def recursive_tangle file, chunk_name, indent, chunk, stack
  stack.add chunk_name
  fname = ''
  lineno = 0
  line_directive_template_push = 0
  chunk.each do |line|
    case line
    <<Hash case>>
    <<Cursor case>>
    <<String case>>
    else
      raise TypeError, "Unknown chunk element #{line.inspect}"
    end
  end
  stack.delete chunk_name
  return line_directive_template_push
end
In the Hash case, we only care about finding the source language of the block, if defined, to set the @active_line_directive_template appropriately:
when Hash
  lang = line.fetch('language', '_')
  lang = '_' unless @line_directive_template.key? lang
  @active_line_directive_template.push @line_directive_template[lang]
  line_directive_template_push += 1
A Cursor always precedes the content lines it refers to. We use it to update the file name (fname) and line number (lineno) information, and we output a line directive, since the upcoming text lines will have a different origin compared to what has been output so far:
when Asciidoctor::Reader::Cursor
  fname = line.file
  lineno = line.lineno + 1
  output_line_directive file, fname, lineno
If the chunk element we’re processing is a String, this can be either a reference to another chunk, or an actual content line. In both cases, we update the current origin line number lineno, so that the origin information is correct if we need to output a new line directive. If the line is not a reference, we just output it as-is, preserving indent, except for empty strings, in which case the indent is not added.
when String
  lineno += 1
  ref, new_indent = is_chunk_ref line
  if ref
    <<Reference case>>
  else
    file.puts line.empty? ? line : indent + line
  end
In the reference case, we check for circular references or references to undefined chunks (raising appropriate exceptions), and then recurse into the referenced chunk. After returning from the referenced chunk, we output a new line directive, so that subsequent lines from the current chunk have correct origin information metadata. If the line directive template was changed in the recursion, we pop it only after outputting the new line directive, under the assumption that the language change will not be in effect until the next actual line of output.
NOTE: The rationale for this is that language changes happen in embedded-language contexts, where the fences delimiting the embedded part of the block are written in the block’s original language. An example of this is the CSS embedded by the Docinfo processor of this module.
# must not be in the stack
raise RuntimeError, "Recursive reference to #{ref} from #{chunk_name}" if stack.include? ref
# must be defined
raise ArgumentError, "Found reference to undefined chunk #{ref}" unless @chunks.has_key? ref
# recurse and get line directive stack growth
to_pop = recursive_tangle file, ref, indent + new_indent, @chunks[ref], stack
output_line_directive file, fname, lineno
# pop line directive stack
@active_line_directive_template.pop to_pop
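The core of the recursion can be seen in a minimal, self-contained sketch. It simplifies the method above under stated assumptions: chunks are plain Arrays of Strings (no attribute Hashes or Cursors, hence no line directives), titles are never shortened, and the circular-reference check happens on entry rather than before recursing. The name mini_tangle is hypothetical:

```ruby
require 'set'
require 'stringio'

# Hypothetical, simplified tangler: Strings only, no line directives
def mini_tangle out, chunks, name, indent = '', stack = Set.new
  raise "Recursive reference to #{name}" if stack.include? name
  raise ArgumentError, "Found reference to undefined chunk #{name}" unless chunks.key? name
  stack.add name
  chunks[name].each do |line|
    if (m = line.match(/^(\s*)<<(.*)>>\s*$/))
      # a reference: recurse, accumulating the reference's own indent
      mini_tangle out, chunks, m[2], indent + m[1], stack
    else
      # a content line: emit with the current indent (but never indent empty lines)
      out.puts line.empty? ? line : indent + line
    end
  end
  stack.delete name
end

chunks = {
  'hello.rb' => ['def hello', '  <<Greeting>>', 'end'],
  'Greeting' => ['puts "Hello"', '', 'puts "World"'],
}
out = StringIO.new
mini_tangle out, chunks, 'hello.rb'
puts out.string
```

The indent of the reference line ('  ' before <<Greeting>>) is prepended to every non-empty line of the referenced chunk, which is what keeps nested chunks correctly indented in the output.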
The recursive tangling process must be repeated for each root chunk defined by the document. Each root chunk will use the root name as output file name, unless overridden. The special root chunk name * indicates that the chunks have to be streamed to the standard output.
def tangle doc
  <<Set line directive>>
  <<Prepare output directory>>
  <<Root name map creation>>
  @roots.each do |name, initial_chunk|
    <<Remap file name if requested>>
    if name == '*'
      to_pop = recursive_tangle STDOUT, name, '', initial_chunk, Set[]
      @active_line_directive_template.pop to_pop
    else
      <<Convert name to full_path>>
      File.open(full_path, 'w') do |f|
        to_pop = recursive_tangle f, name, '', initial_chunk, Set[]
        @active_line_directive_template.pop to_pop
      end
    end
  end
end
We allow users to specify where the output files should be placed by overriding the litprog-outdir document attribute. If set, this must be a path relative to the docdir. If unset, the docdir will be used directly. The output directory is created if not present (and if different from the docdir).
docdir = doc.attributes['docdir']
outdir = doc.attributes['litprog-outdir']
if outdir and not outdir.empty?
  outdir = File.join(docdir, outdir)
  FileUtils.mkdir_p outdir
else
  outdir = docdir
end
Accessing FileUtils introduces a new requirement:
require 'fileutils'
When tangling a new file, the name provided by the user is considered relative to the (literate programming) output directory:
full_path = File.join(outdir, name)
Root chunk names are used as output file names by default, but this behavior can be overridden on a name-by-name basis by setting the litprog-file-map document attribute. If not empty, this is a colon-separated list of entries of the form chunk_name > file_name. Whitespace around the file and chunk names is optional and will be stripped. The user is warned if either the chunk or file name is empty, and for any referenced root chunk name that was not found in the document. Identity maps (mapping the root chunk name to itself) are ignored.
root_name_map = {}
doc.attr('litprog-file-map').to_s.split ':' do |entry|
  cname, fname = entry.split '>', 2
  # entries without '>' (or blank entries) have nil components: treat them as empty
  cname = cname.to_s.strip
  fname = fname.to_s.strip
  if cname.empty? or fname.empty?
    logger.warn 'empty chunk name in litprog-file-map ignored' if cname.empty?
    logger.warn 'empty file name in litprog-file-map ignored' if fname.empty?
    next
  end
  unless @roots.include? cname
    logger.warn "non-existent chunk #{cname} in litprog-file-map ignored"
    next
  end
  next if cname == fname # nothing to remap
  <<Check for fname uniqueness>>
  root_name_map[cname] = fname
end
We want output file names to be unique, i.e. different both from other file names and from root chunk names, so that one output cannot overwrite another.
NOTE: Due to the way this check is done, it’s not possible to swap two chunk names with an A > B : B > A file map.
raise ArgumentError, "#{cname} remapped to existing #{fname}" if @roots.include? fname
mapped_already = root_name_map.key fname
raise ArgumentError, "#{cname} remapped to #{fname}, same as #{mapped_already}" if mapped_already
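The parsing and validation rules for litprog-file-map can be checked without Asciidoctor. In this stand-alone sketch the hypothetical parse_file_map receives the root chunk names as a plain list (standing in for the keys of @roots) and collects warnings in an Array instead of sending them to the logger; the chunk and file names in the example are illustrative:

```ruby
# Stand-alone sketch of the litprog-file-map rules described above.
def parse_file_map spec, roots, warnings = []
  map = {}
  spec.to_s.split(':').each do |entry|
    cname, fname = entry.split('>', 2)
    # to_s turns a missing component (no '>' or a blank entry) into ''
    cname = cname.to_s.strip
    fname = fname.to_s.strip
    if cname.empty? || fname.empty?
      warnings << 'empty chunk name in litprog-file-map ignored' if cname.empty?
      warnings << 'empty file name in litprog-file-map ignored' if fname.empty?
      next
    end
    unless roots.include? cname
      warnings << "non-existent chunk #{cname} in litprog-file-map ignored"
      next
    end
    next if cname == fname # identity map: nothing to remap
    raise ArgumentError, "#{cname} remapped to existing #{fname}" if roots.include? fname
    mapped_already = map.key fname
    raise ArgumentError, "#{cname} remapped to #{fname}, same as #{mapped_already}" if mapped_already
    map[cname] = fname
  end
  map
end

warnings = []
map = parse_file_map 'main.c > src/main.c : util.h>include/util.h : missing > x', %w[main.c util.h], warnings
map.each { |chunk, file| puts "#{chunk} -> #{file}" }
puts warnings
```

Remapping main.c to util.h in the same setup raises an ArgumentError, since util.h is itself a root chunk name; this is also why two names cannot simply be swapped.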
Once the root_name_map hash is constructed, its use is trivial:
name = root_name_map.fetch name, name
AsciiDoc’s syntax allows us to forgo special syntax to identify code chunks: we assume that any listing block with the source style is (part of) a single code chunk.
Processing of a single block requires us to identify the chunk type (root or generic) and title, add the title to the known chunk titles (if necessary) and append the block lines to the chunk contents. Since the default value for missing chunks is an empty Array, we can append the new lines directly with the += operator (i.e. Array#+ followed by an assignment back into the hash), without special-casing the first block that defines a chunk.
We also need to check if the new lines reference other chunks, and if so we add the title to the list of known titles, to allow shortened names to be used henceforth. This information can also be used for cross-referencing chunks, in which case the ID of the block is necessary to identify exactly which block in a chunk references another chunk. This block ID is described below.
def add_to_chunk chunk_hash, chunk_title, block_lines, block_id
  @chunk_names.add chunk_title
  chunk_hash[chunk_title] += block_lines
  <<Check for references and prime the chunk names>>
end
We want to be able to reference blocks by the title of the chunk(s) they define, so we generate a chunk-specific ID and assign it to the block if appropriate.
To simplify management, we keep track of the blocks that contribute to each chunk:
@chunk_blocks = Hash.new { |hash, key| hash[key] = [] }
Since a source block contributes to a single chunk, this map would be sufficient to trivially reconstruct the whole chunk contents with origin information. However, since the “legacy” aweb syntax has a more complex many-to-many correspondence between chunks and blocks, we need to keep the two pieces of information separate.
The chunk-specific block ID is always generated when a block is added to a chunk, but since Asciidoctor does not support having multiple IDs referring to the same block, it is assigned as the block ID only if the block does not already have a user-defined ID. The chunk-specific ID is generated using the method Asciidoctor uses for sections, applied to the chunk title with _chunk_ prepended and _block_N appended, where N is the sequential block number (1-based, computed after appending the current block to @chunk_blocks). The map between title and block ID is also registered in the document catalog, for use in the weaving process.
def add_chunk_block_with_id chunk_title, block
  block_count = @chunk_blocks[chunk_title].append(block).size
  title_for_id = "_chunk_#{chunk_title}_block_#{block_count}"
  new_id = Asciidoctor::Section.generate_id title_for_id, block.document
  # TODO error handling
  block.document.register :refs, [new_id, block]
  block.id = new_id unless block.id
  block.document.catalog[:lit_prog_chunks][chunk_title] << new_id
  return new_id
end
NOTE: Since the chunk-specific block ID is only assigned to the block if it doesn’t have an ID already, it should not be used in cross-references directly. An auxiliary function helps remap from the chunk-based ID to the Asciidoctor ID:
def remap_chunk_block_id doc, chunk_block_id
  return doc.catalog[:refs][chunk_block_id].id
end
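The numbering and the “keep the user-defined ID” rule can be illustrated without Asciidoctor. This sketch is only an approximation: simple_id is a hypothetical stand-in for Asciidoctor::Section.generate_id (which also honours the idprefix/idseparator attributes and de-duplicates IDs), and FakeBlock is a hypothetical stand-in for an Asciidoctor block:

```ruby
# Hypothetical stand-ins: a block with just an id, and a crude ID generator
FakeBlock = Struct.new(:id)

def simple_id text
  # lowercase and replace runs of non-word characters with '_'
  text.downcase.gsub(/\W+/, '_')
end

chunk_blocks = Hash.new { |hash, key| hash[key] = [] }

def add_block_with_id chunk_blocks, chunk_title, block
  block_count = chunk_blocks[chunk_title].append(block).size # 1-based
  new_id = simple_id "_chunk_#{chunk_title}_block_#{block_count}"
  block.id ||= new_id # never overwrite a user-defined ID
  new_id
end

user_block = FakeBlock.new('my-anchor')
plain_block = FakeBlock.new(nil)
puts add_block_with_id(chunk_blocks, 'Support methods', user_block)  # _chunk_support_methods_block_1
puts add_block_with_id(chunk_blocks, 'Support methods', plain_block) # _chunk_support_methods_block_2
puts user_block.id  # my-anchor
puts plain_block.id # _chunk_support_methods_block_2
```

The first block keeps its user-defined ID, which is exactly why cross-references must go through a remapping step rather than using the chunk-specific ID directly.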
To allow document metadata to be used in source blocks (e.g. to share author and version information) we allow the :attributes substitutions (and only those) to be applied to the block lines:
def apply_supported_subs block
  if block.subs.include? :attributes
    block.apply_subs block.lines, [:attributes]
  else
    block.lines
  end
end
A source block contributes to a single chunk. This will be a root chunk if the block has an output attribute, or a generic chunk otherwise. The chunk_hash local variable is used to track which of the @roots and @chunks collections this block needs to be added to.
def process_source_block block
  chunk_hash = @chunks
  if block.attributes.has_key? 'output'
    <<Handle root chunk>>
  else
    <<Handle generic chunk>>
  end
  <<Track source location information>>
  block_lines = apply_supported_subs block
  block_id = add_chunk_block_with_id chunk_title, block
  add_to_chunk chunk_hash, chunk_title, block_lines, block_id
end
For a root chunk, chunk_hash must be set to @roots, and we take the output block attribute as chunk_title.
chunk_hash = @roots
chunk_title = block.attributes['output']
<<Ensure root chunk title is unique>>
Root chunks are unique (we do not append to them), so we need to check that there are no root chunks already defined with the given chunk_title:
raise ArgumentError, "Duplicate root chunk for #{chunk_title}" if @roots.has_key?(chunk_title)
For a generic chunk, chunk_hash is left at the default value (@chunks), and chunk_title is set from the title of the block. We want to use the raw block title for this, which Asciidoctor does not expose directly. Because of this, we need to “monkey patch” the Block class to provide an appropriate method:
Block class
module Asciidoctor
  class Block
    def litprog_raw_title
      @title
    end
  end
end
We can use this method to retrieve the raw block title, and if the block title was shortened, we also replace it with the full chunk title, to improve the legibility of the documentation.
# We use the block title (TODO up to the first full stop or colon) as chunk name
title = block.litprog_raw_title
chunk_title = full_title title
block.title = chunk_title if title != chunk_title
Regardless of the chunk type, processing of the block is finished by scanning the lines of the block, to add any referenced chunk name to @chunk_names (and to record the reference in @chunk_backrefs):
block_lines.each do |line|
  mentioned, _ = is_chunk_ref line
  if mentioned
    @chunk_names.add mentioned
    add_chunk_ref chunk_title, block_id, mentioned
  end
end
For each block composing a chunk we want to keep track of where it was defined, so that this information can be added to the output file if requested, and also of the source language for the block, to control the way the location is output. We do this by pushing the attributes and the source_location metadata of each block into the corresponding chunk Array, right before the corresponding lines:
chunk_hash[chunk_title].append block.attributes
chunk_hash[chunk_title].append block.source_location
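The resulting layout (attributes, location, then lines, repeated for each contributing block) can be visualized with a small walk over a hand-built chunk. Location is a hypothetical Struct standing in for Asciidoctor::Reader::Cursor, and the walk is a simplification of the bookkeeping done during Tangling, reporting the origin of each line rather than emitting directives:

```ruby
# Hypothetical Struct standing in for Asciidoctor::Reader::Cursor (file, lineno)
Location = Struct.new(:file, :lineno)

# One chunk contributed by two blocks: attributes, location, then lines
chunk = [
  { 'language' => 'ruby' }, Location.new('README.adoc', 120), 'def a', 'end',
  { 'language' => 'css' }, Location.new('README.adoc', 300), 'span { color: red; }',
]

origins = []
lang = '_'
file = ''
lineno = 0
chunk.each do |element|
  case element
  when Hash
    lang = element.fetch('language', '_')
  when Location
    file = element.file
    # assume, as the module's Cursor case does, that content starts one line after the location
    lineno = element.lineno + 1
  when String
    origins << "#{file}:#{lineno} [#{lang}] #{element}"
    lineno += 1
  else
    raise TypeError, "Unknown chunk element #{element.inspect}"
  end
end
puts origins
```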
The source_location is only tracked correctly when the sourcemap feature is enabled for the document. This must be done at the preprocessing stage, during which we can also set the defaults for our custom attributes:
preprocessor do
  process do |doc, reader|
    doc.sourcemap = true
    <<Set default attributes>>
    nil
  end
end
In aweb, chunk definition is done in anonymous listing blocks (without special attributes or styles). A listing block is assumed to define a chunk if the block begins with a chunk assignment line, i.e. a line that contains only <<Chunk title>>=, without leading whitespace, optionally followed by whitespace.
CHUNK_DEF_RX = /^<<(.*)>>=\s*$/
def process_listing_block block
  <<Filter legacy listing block>>
  <<Define listing block processing variables>>
  <<Legacy block processing>>
end
If the block does not begin with a chunk definition, we can bail out early:
return if block.lines.empty?
return unless block.lines.first.match(CHUNK_DEF_RX)
A single block can define multiple chunks: each definition spans from the line following the assignment line to the end of the block or the next chunk assignment line. We know however that we have at least one chunk (since otherwise the block is skipped):
chunk_titles = [ full_title($1) ]
Since we can have multiple chunks defined in the same block, we cannot use the block’s source_location directly: we need to track the offset (in lines) from the block source location at which each chunk definition begins.
block_location = block.source_location
chunk_offset = 0
To group the block lines into chunk definitions, we can leverage Ruby’s Enumerable#slice_when method. A new slice starts when the second line in the pair is a chunk assignment. In this case, the match gives us the chunk title, which we store in chunk_titles, and the block_lines we’re interested in are the lines in the slice, except for the first one (which holds the chunk assignment expression).
block.lines.slice_when do |l1, l2|
  l2.match(CHUNK_DEF_RX) and chunk_titles.append(full_title $1)
end.each do |lines|
  chunk_title = chunk_titles.shift
  block_lines = lines.drop 1
  chunk_hash = @chunks
  <<Detect legacy chunk type>>
  <<Track legacy chunk location information>>
  block_id = add_chunk_block_with_id chunk_title, block
  add_to_chunk chunk_hash, chunk_title, block_lines, block_id
end
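The grouping, together with the root auto-detection and offset bookkeeping described next, can be tried on a hand-written legacy block. This sketch uses titles as-is (no full_title expansion), and the results Array is an illustrative stand-in for the calls into add_to_chunk:

```ruby
CHUNK_DEF_RX = /^<<(.*)>>=\s*$/

lines = [
  '<<hello.rb>>=',
  'puts greeting',
  '<<Greeting setup>>=',
  'greeting = "Hello"',
]

titles = [lines.first[CHUNK_DEF_RX, 1]] # the block is known to start with an assignment
offset = 0
results = []
lines.slice_when do |_l1, l2|
  # a new slice starts when the *second* line of the pair is a chunk assignment
  (m = l2.match(CHUNK_DEF_RX)) && titles.append(m[1])
end.each do |slice|
  title = titles.shift
  body = slice.drop 1 # skip the assignment line itself
  kind = title.include?(' ') ? :generic : :root # aggressive root auto-detection
  results << [title, kind, offset + 1, body] # offset + 1: the amount passed to Cursor#advance
  offset += slice.size
end
results.each { |title, kind, advance, body| puts "#{title} (#{kind}, advance #{advance}): #{body.join(' | ')}" }
```

Because titles is used as a FIFO queue, each slice picks up the title that was matched when that slice started, regardless of when slice_when evaluates its predicate.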
In aweb, the root chunk is chosen by the user from the command line, and by default it is identified by the special chunk title *. Multiple root chunks are supported, but require multiple passes (one per root) to extract. We extend the root chunk auto-detection by assuming that any chunk whose title does not contain spaces is a root chunk.
unless chunk_title.include? " "
  chunk_hash = @roots
  <<Ensure root chunk title is unique>>
end
The actual location of the chunk being processed can be obtained from the block location by adding the chunk_offset, plus one to skip the chunk assignment line. After we’ve set the origin for the current chunk lines, we can increment the chunk_offset for the next chunk.
chunk_location = block_location.dup
chunk_location.advance(chunk_offset + 1)
chunk_hash[chunk_title].append(chunk_location)
chunk_offset += lines.size
Since our documents are natively AsciiDoc documents, the literate source itself can be processed by any AsciiDoc processor, even without support for the special syntax that defines chunks. The weaving process in this case is limited to a manipulation of the source blocks to improve the appearance and functionality of chunk references. Additionally, the graph describing chunk inclusion is also output during this phase, if requested.
To support chunk cross-referencing, we manipulate all the blocks associated with a chunk, adding links to the other blocks that define the same chunk, and replacing chunk references with AsciiDoc hyperlinks, in addition to the block title normalization done during the processing.
For each block we need to know whether it is the last block in the list, to determine whether it needs a “next” link, so we cache the index of the last block to speed up the check.
def weave doc
  @chunk_blocks.each do |chunk_title, block_list|
    last_block_index = block_list.size - 1
    block_list.each_with_index do |block, i|
      <<Add chunk navigation links>>
    end
  end
  if doc.attr('litprog-dot-graph')
    <<Output chunk reference graph>>
  end
end
The chunk navigation links are added to the title of the block if there are preceding/following blocks in the same list. We also include a link to the chunk block(s) that include the chunk this block belongs to: for these, we have to remember that the chunk-specific block ID may not correspond to the actual block ID known to Asciidoctor.
links = []
# link to previous block in this chunk
links << "xref:\##{block_list[i-1].id}[⮝,role=prev]" if i > 0
# link to next block in this chunk
links << "xref:\##{block_list[i+1].id}[⮟,role=next]" if i != last_block_index
# link to block(s) that include the chunk this block belongs to
if @chunk_backrefs.key? chunk_title
# uplinks are placed using unshift, so process them in reverse order
@chunk_backrefs[chunk_title].reverse_each do |inc|
includer, includer_block_id = inc
if count_chunk_blocks(doc, includer) > 1
includer_block_num = includer_block_id.split('_').last
desc = "Used in: #{includer} [#{includer_block_num}]"
else
desc = "Used in: #{includer}"
end
# remap from the chunk-specific block ID to the Asciidoctor block ID
includer_block_id = remap_chunk_block_id doc, includer_block_id
links.unshift '|' if links.length > 0
# TODO apparently AsciiDoc(tor) doesn't support anchor titles?
# links.unshift "xref:\##{includer_block_id}[⏚,role=up,title=\"#{desc}\"]"
desc.gsub!("'", '&#39;')
links.unshift "+++<a href='\##{includer_block_id}' class='up' title='#{desc}'>⏚</a>+++"
end
end
if links.length > 0
# protect against a nil title ---------v
block.title = (block.litprog_raw_title || '') + ' [.litprog-nav]#' + (links * ' ') + '#'
end
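As a rough illustration of the previous/next part of the logic above, the following standalone sketch computes the navigation links for every block of a single chunk. The Block struct, the nav_links helper and the block IDs are hypothetical stand-ins for Asciidoctor's blocks and the real chunk IDs; uplinks and the title rewriting are left out.

```ruby
# Hypothetical stand-in for an Asciidoctor block: only the ID matters here.
Block = Struct.new(:id)

# Build the prev/next navigation links for each block of one chunk,
# using the same xref syntax as the weaver.
def nav_links(block_list)
  last_block_index = block_list.size - 1
  block_list.each_with_index.map do |_block, i|
    links = []
    # link to previous block in this chunk
    links << "xref:\##{block_list[i - 1].id}[⮝,role=prev]" if i > 0
    # link to next block in this chunk
    links << "xref:\##{block_list[i + 1].id}[⮟,role=next]" if i != last_block_index
    links.join(' ')
  end
end

blocks = %w[main_block_1 main_block_2 main_block_3].map { |id| Block.new(id) }
nav_links(blocks).each { |l| puts l }
```

The first block only gets a “next” link, the last one only a “previous” link, and a chunk made of a single block gets no previous/next links at all.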
The default style for the navigation links floats them to the end of the line (we fall back to right floating for older user agents), prints them in an upright font, and removes the text underline:
span.litprog-nav {
float: right;
float: inline-end;
font-style: normal;
}
span.litprog-nav a {
text-decoration: none;
}
The final part of the weaving process is to turn chunk references found inside chunks into hyperlinks to the corresponding chunk definition(s). Since code snippets in the document are rendered by the syntax highlighter, we need to hook into the syntax highlighting mechanism to capture and manage the chunk references.
Currently we only support the rouge syntax highlighter, which we extend with a custom derived class that overrides the lexer and formatter:
rouge highlighter
class LitProgRouge < (Asciidoctor::SyntaxHighlighter.for 'rouge')
register_for 'rouge'
def create_lexer node, source, lang, opts
<<Custom lexer>>
end
def create_formatter node, source, lang, opts
<<Custom formatter>>
end
end
The new lexer overrides whatever lexer would normally be used by Asciidoctor, but extends the step method (used by the regular-expression lexers in rouge) to look for whole lines that consist of a chunk reference, and to yield a Comment::Special token for them instead of whatever the original lexer would produce:
lexer = super
class << lexer
def step state, stream
if state == get_state(:root) or stream.beginning_of_line?
if stream.scan /((?:^|[\r\n]+)\s*)(<<.*>>)(\s*)$/
yield_token Text::Whitespace, stream.captures[0]
yield_token Comment::Special, stream.captures[1]
yield_token Text::Whitespace, stream.captures[2]
return true
end
end
super
end
end
lexer
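The line-matching rule in step can be tried outside rouge, since rouge's regex lexers scan their input with Ruby's StringScanner (the stream argument above). This sketch reuses the same pattern on made-up input; the scan_chunk_ref helper is hypothetical and simply returns the three captures that become the whitespace, Comment::Special and whitespace tokens.

```ruby
require 'strscan'

# Same pattern the custom lexer's step method tries at the start of a line.
CHUNK_REF = /((?:^|[\r\n]+)\s*)(<<.*>>)(\s*)$/

# Return [leading whitespace, chunk reference, trailing whitespace] if the
# scanner is positioned on a line consisting only of a chunk reference,
# or nil otherwise.
def scan_chunk_ref(stream)
  return nil unless stream.scan(CHUNK_REF)
  stream.captures
end

p scan_chunk_ref(StringScanner.new("    <<Add chunk navigation links>>\nend\n"))
p scan_chunk_ref(StringScanner.new("x = 1 << 2\n"))
```

Scanning from the start of a line such as x = 1 << 2 returns nil, so that line is left to the original lexer.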
The custom formatter looks for Comment::Special tokens and turns them into hyperlinks if the comment content matches a chunk reference.
To resolve the chunk references, the formatter needs to query the document catalog, which we make available through a new @litprog_catalog instance variable.
If multiple blocks contribute to a chunk, separate numbered links are created for each block past the first.
NOTE: this formatter only works as expected for HTML output.
IMPORTANT: we override the span method rather than safe_span, to simplify title matching. Otherwise we would need to unescape the special characters <, > and &, and then re-escape them when creating the links.
formatter = super
# make the document catalog accessible to the formatter
formatter.instance_variable_set :@litprog_catalog, node.document.catalog[:lit_prog_chunks]
class << formatter
include Asciidoctor::Logging
<<Define function to link to a literate programming chunk>>
def span tok, val
special = tok.matches? ::Rouge::Token::Tokens::Comment::Special
if special
m = val.match /<<(.*)>>/
if m
title = m[1]
<<Query the document catalog of literary programming chunks>>
if hits.empty?
logger.warn "Unresolved chunk reference #{title.inspect} found in special comment while formatting source"
else
first, *rest = *hits
# safe_span does not escape its argument, so the delimiters are written as entities
safe_val = "&lt;&lt;" + litprog_link(first, title)
if rest.length > 0
safe_val += "<sup> " + rest.each_with_index.map { |hit, index|
litprog_link(hit, index+2)
}.join(' ') + "</sup>"
end
safe_val += "&gt;&gt;"
return safe_span tok, safe_val
end
end
end
super
end
end
formatter
The function to generate the link is trivial: it simply returns an HTML a element with the litprog-nav class.
def litprog_link id, text
target = '#' + id
"<a class='litprog-nav' href='#{target}'>#{text}</a>"
end
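The following standalone sketch combines litprog_link with the multi-block logic of the formatter's span override: the chunk title links to the first block, and each further block gets a numbered superscript link. The chunk_ref_html helper and the block IDs are hypothetical; in this sketch the << and >> delimiters are written as HTML entities, because the string goes into the HTML output without further escaping.

```ruby
# litprog_link as defined above.
def litprog_link id, text
  target = '#' + id
  "<a class='litprog-nav' href='#{target}'>#{text}</a>"
end

# Hypothetical helper: HTML for a resolved chunk reference, given the list of
# block IDs (hits) and the (already escaped) title to display.
def chunk_ref_html(hits, title)
  first, *rest = hits
  html = '&lt;&lt;' + litprog_link(first, title)
  unless rest.empty?
    # blocks past the first are numbered starting from 2
    html += '<sup> ' + rest.each_with_index.map { |hit, i| litprog_link(hit, i + 2) }.join(' ') + '</sup>'
  end
  html + '&gt;&gt;'
end

puts chunk_ref_html(%w[_custom_lexer_block_1 _custom_lexer_block_2], 'Custom lexer')
```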
These are also styled without underline:
a.litprog-nav {
text-decoration: none;
}
The map between titles and link targets is retrieved from the document catalog.
We use an ad-hoc version of the full_title function, because we expect any duplicated or missing chunks to have been detected at previous stages.
This section of the code also applies escape_special_html_chars to the title, which takes care of any <, > and & in the text, as the standard rouge HTML formatter would do.
pfx = title.chomp("...")
if pfx != title
fulltitle, hits = @litprog_catalog.find { |k, v| k.start_with? pfx }
fulltitle = fulltitle.gsub("'", '&#39;')
title = "<abbr title='#{fulltitle}'>#{escape_special_html_chars title}</abbr>"
else
hits = @litprog_catalog[title]
title = escape_special_html_chars title
end
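The lookup above can be exercised in isolation. In the sketch below, CATALOG is a hypothetical stand-in for the :lit_prog_chunks catalog (titles mapped to lists of block IDs, with made-up values), escape_html is a minimal stand-in for rouge's escape_special_html_chars, and resolve_title is a hypothetical helper. Unlike the code above, it also escapes <, > and & in the full title, and it tolerates an abbreviation that matches nothing.

```ruby
# Hypothetical catalog: chunk title => list of block IDs (made-up values).
CATALOG = {
  'Output chunk reference graph' => ['_output_chunk_reference_graph_block_1'],
  'Add chunk navigation links' => ['_add_chunk_navigation_links_block_1',
                                   '_add_chunk_navigation_links_block_2'],
}

# Minimal HTML escaping of &, < and > (ampersand first, to avoid double escaping).
def escape_html(text)
  text.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;')
end

# Resolve a possibly abbreviated ("...") title to [html_title, hits].
def resolve_title(catalog, title)
  pfx = title.chomp('...')
  if pfx != title
    fulltitle, hits = catalog.find { |k, _v| k.start_with? pfx }
    return [escape_html(title), []] if fulltitle.nil? # no chunk with that prefix
    fulltitle = escape_html(fulltitle).gsub("'", '&#39;')
    ["<abbr title='#{fulltitle}'>#{escape_html title}</abbr>", hits]
  else
    [escape_html(title), catalog.fetch(title, [])]
  end
end

p resolve_title(CATALOG, 'Add chunk...')
p resolve_title(CATALOG, 'Output chunk reference graph')
```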
If the litprog-dot-graph attribute is set, we produce in the output directory a DOT source named after the document source, with a .litprog.dot extension.
This DOT file describes the inclusion graph between the chunks, with a left-to-right orientation (included chunks on the left and including chunks on the right).
IMPORTANT: the mechanism is currently very bare-bones. Several possible improvements under consideration are listed in the TODO list.
dotfile = doc.attr('docname') + '.litprog.dot'
dotdir = doc.attr('outdir', '.', 'docdir')
File.open(File.join(dotdir, dotfile), 'w') do |f|
f.puts %(
digraph {
rankdir=LR;
nodesep="1";
overlap=false;
)
<<Output DOT connections>>
<<Output DOT chunks>>
f.puts '}'
end
The DOT file uses the same symbolic naming convention as the block IDs, but without the block count.
def dot_chunk_id doc, chunk_name
block_id = doc.catalog[:lit_prog_chunks][chunk_name].first
return block_id.gsub(/_block_\d+$/,'')
end
We use DOT record shapes to identify chunks composed of multiple blocks.
For this, we frequently need to determine how many blocks a chunk is composed of.
def count_chunk_blocks doc, chunk_name
doc.catalog[:lit_prog_chunks][chunk_name].length
end
Chunk names used as labels in the DOT file have to be properly quoted, and we limit their line length to keep the graph nodes from getting too long.
The wrap-around is implemented by adding newlines whenever adding a word to a non-empty line would exceed the line length. The “non-empty line” condition is added to allow words longer than the limit to be added.
def limit_line_length text, maxlen
words = text.split ' '
ret = []
line = ''
words.each { |word|
if line.length > 0 and line.length + word.length > maxlen
ret.push line
line = ''
end
line += ' ' if line.length > 0
line += word
}
ret.push line
ret.join("\\n")
end
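To check the wrapping rule, the function can be run on made-up titles. The sketch below is a standalone, slightly more idiomatic equivalent of limit_line_length. Since the overflow test does not count the separating space, a wrapped line can end up one character longer than maxlen, as the last call shows ("ab cd" stays on one line with a limit of 4).

```ruby
# Standalone equivalent of limit_line_length above (same wrapping rule).
def limit_line_length(text, maxlen)
  lines = []
  line = ''
  text.split(' ').each do |word|
    # start a new line only if the current one is non-empty and would overflow
    if !line.empty? && line.length + word.length > maxlen
      lines << line
      line = ''
    end
    line += ' ' unless line.empty?
    line += word
  end
  lines << line
  lines.join('\n') # literal backslash-n, which DOT renders as a line break
end

puts limit_line_length('Query the document catalog of literary programming chunks', 20)
puts limit_line_length('Supercalifragilisticexpialidocious word', 10)
puts limit_line_length('ab cd', 4)
```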
Quoting actually does more than just quoting: it also adds the record structure for chunks composed of multiple blocks:
def quote_for_dot doc, chunk_name
nblocks = count_chunk_blocks doc, chunk_name
# start by escaping the characters that are special in DOT record labels
base = limit_line_length(chunk_name, 33).gsub(/["{}|<>]/) { |c| "\\#{c}" }
# add a <chunk> port to the base name
base = "<chunk> #{base}"
# add the other ports for multi-block chunks
if nblocks > 1
base += "| { " + 1.upto(nblocks).map { |i| "<block_#{i}> #{i}" }.join(' | ') + " }"
end
return '"' + base + '"'
end
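The record labels produced this way can be previewed without a document. The sketch below is a hypothetical simplification of quote_for_dot: record_label takes the block count as an argument instead of querying the catalog, skips the line wrapping, and uses a regular expression to backslash-escape double quotes and the characters that DOT treats specially in record labels (braces, vertical bars and angle brackets).

```ruby
# Escape the characters that are special inside a quoted DOT record label.
def escape_record_text(text)
  text.gsub(/["{}|<>]/) { |c| "\\#{c}" }
end

# Hypothetical simplified variant of quote_for_dot: the block count is passed
# in directly, and no line wrapping is applied.
def record_label(chunk_name, nblocks)
  # the <chunk> port identifies the chunk as a whole
  base = "<chunk> #{escape_record_text(chunk_name)}"
  if nblocks > 1
    # one <block_N> port per block, grouped in a sub-record
    base += '| { ' + 1.upto(nblocks).map { |i| "<block_#{i}> #{i}" }.join(' | ') + ' }'
  end
  '"' + base + '"'
end

puts record_label('Output DOT chunks', 1)
puts record_label('Custom lexer', 3)
```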
The connections in the graph are obtained simply by iterating over the chunk references, extracting the IDs of both the referenced and the referencing chunk, and connecting the primary record of the referenced chunk to the appropriate block record of the referencing chunk.
@chunk_backrefs.each { |chunk, refs|
this_id = dot_chunk_id doc, chunk
refs.each { |ref, block_id|
ref_id = dot_chunk_id doc, ref
port = count_chunk_blocks(doc, ref) == 1 ? "chunk" : block_id.match(/block_\d+$/)[0]
f.puts "#{this_id}:chunk:e -> #{ref_id}:#{port}:w"
}
}
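To see what the connection loop emits, here is a standalone sketch with hypothetical data: BACKREFS maps each chunk title to [including chunk, including block ID] pairs, BLOCK_COUNTS gives made-up block counts, and dot_id is a hypothetical ID scheme (underscore prefix, lowercase, words joined by underscores) standing in for the catalog-based dot_chunk_id.

```ruby
# Hypothetical data: chunk title => [[including chunk title, including block ID], ...]
BACKREFS = {
  'Custom lexer' => [['rouge highlighter', '_rouge_highlighter_block_1']],
  'Output DOT chunks' => [['Output chunk reference graph', '_output_chunk_reference_graph_block_2']],
}
# Hypothetical number of blocks in each including chunk.
BLOCK_COUNTS = { 'rouge highlighter' => 1, 'Output chunk reference graph' => 2 }

# Hypothetical ID scheme standing in for dot_chunk_id.
def dot_id(title)
  '_' + title.downcase.gsub(/\W+/, '_')
end

# One edge per reference: from the included chunk's main port to the
# including chunk's main port (single block) or matching block port.
def dot_edges(backrefs, block_counts)
  backrefs.flat_map do |chunk, refs|
    this_id = dot_id(chunk)
    refs.map do |ref, block_id|
      ref_id = dot_id(ref)
      port = block_counts.fetch(ref, 1) == 1 ? 'chunk' : block_id[/block_\d+$/]
      "#{this_id}:chunk:e -> #{ref_id}:#{port}:w"
    end
  end
end

puts dot_edges(BACKREFS, BLOCK_COUNTS)
```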
Chunk node definitions are output to the DOT file after all connections, with proper quoting, and forcing a monospace font for the root chunks.
Since for code chunks we want to output the full (quoted, wrapped) chunk names, we iterate over the @chunk_names array.
@chunk_names.each { |chunk|
chunk_id = dot_chunk_id doc, chunk
quoted_chunk = quote_for_dot doc, chunk
fontspec = @roots.key?(chunk) ? ",fontname=\"Monospace\"" : ""
f.puts "#{chunk_id} [shape=record,label=#{quoted_chunk}#{fontspec}]"
}
After initializing the catalog of literate programming chunks, which maps titles to chunk IDs, the document as a whole is processed simply by processing all the listing blocks, Tangling the output files, and Weaving the documentation.
def process doc
doc.catalog[:lit_prog_chunks] = Hash.new { |h, k| h[k] = [] }
doc.find_by context: :listing do |block|
if block.style == 'source'
process_source_block block
else
process_listing_block block
end
end
tangle doc
weave doc
doc
end
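The catalog is created with a default block, so that appending the first block ID of a new chunk needs no explicit initialization. The standalone sketch below shows this behavior with made-up titles and block IDs. Note that merely reading a missing key also stores an empty entry, which is worth keeping in mind wherever the catalog is queried with a title that may not exist; key? does not trigger the default.

```ruby
# Chunk catalog: title => list of block IDs, auto-creating empty lists.
catalog = Hash.new { |h, k| h[k] = [] }

catalog['Custom lexer'] << '_custom_lexer_block_1'
catalog['Custom lexer'] << '_custom_lexer_block_2'
catalog['Requires'] << '_requires_block_1'

p catalog['Custom lexer'] # => ["_custom_lexer_block_1", "_custom_lexer_block_2"]
p catalog.key?('Unknown') # => false
p catalog['Unknown']      # => [] (and the key is now stored)
p catalog.key?('Unknown') # => true
```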
The complete module simply assembles what we’ve seen so far, and registers the extension with Asciidoctor:
<<Licensing statement>>
<<Requires>>
<<Override...>>
<<Monkey patch...>>
<<Main class...>>
Asciidoctor::Extensions.register do
<<Enable sourcemap...>>
tree_processor LiterateProgrammingTreeProcessor
docinfo_processor LiterateProgrammingDocinfoProcessor
end
The software is copyright © 2021–2024 by Giuseppe Bilotta, and is made available under the MIT license. See the LICENSE file for further details.
# Copyright (C) 2021–2024 Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
# This software is licensed under the MIT license. See LICENSE for details
Known issues and limitations so far:
- we need to monkey-patch the Block class to access raw (unconverted) titles;
- conversion of chunk references to hyperlinks during weaving is specific to the syntax highlighter; currently we only support the rouge syntax highlighter;
- other aspects of the weaving process introduce converter-specific output. These include:
  - uplinks to including chunks (due to a limitation in Asciidoctor’s link management that prevents adding a title attribute for tooltips);
  - links to the included chunks within source blocks;
  - expansion of abbreviated chunk titles to the full form in source blocks.
  For these features, the only supported converter is the HTML converter.
- improve chunk title parsing: the block title should only be used up to the first full stop or colon; the biggest problem in implementing this is arguably the ambiguity between full stops and ellipses.
- support for the eWEB and nowasp syntax: the nowasp/noweb syntax support in particular will require support for inline chunk reference expansion, escaping of inline <</>> pairs as well as of start-of-line @ symbols (see the test/noweb-alike.adoc test file); this will probably require a flag to enable/disable it (probably a document attribute :litprog-syntax: with possible values aweb and noweb).
- lineno configuration:
  - ✓ global setting implemented via the litprog-line-template document attribute;
  - ✓ per-language overrides (possibly with good defaults);
  - ❏ per-file overrides; this should be doable by adding other keys to the @line_directive_template hash.
- auto-indent configuration: the preservation of leading whitespace during tangling should be optional (again, globally + per-file / per-language and possibly per-chunk overrides). The need for this can be seen in this very source file, in the graph output code that outputs the DOT header with an indent.
- selective writing: in particular, avoid overwriting the destination file if the content would be unchanged; this is important to support large-scale projects where we want to avoid recompiling unchanged modules.
- support other kinds of formatters: presently, chunks are only hyperlinked when using the rouge syntax highlighter.
- allow swapping file names in litprog-file-map: as pointed out in the relevant note, this is currently not supported due to the way the check for uniqueness is done, but could be supported with a smarter check.
- graph output tuning: currently the graph output is very bare-bones. Several possible improvements include:
  - ❏ customizable header;
  - ❏ customizable node width (currently wrapping is hard-coded at 33);
  - ❏ take file mapping into consideration for the root chunks output;
  - ✓ output chunks by their (HTML) ID and only quote/limit the label;
  - ✓ output multi-block chunks differently from single-block ones (maybe as records?);
  - ❏ add links in the chunks to the source in the HTML documentation.