© 2011-2017 Alice Bevan-McGregor and contributors.
https://github.com/marrow/dsl
Domain-specific languages allow you to write code in ways more optimized to specialized tasks. These can often be thought of as interpreters for programming languages other than the one the interpreter is written in, and you may already be familiar with a few, such as template engines or testing frameworks. Marrow DSL is a framework for easily constructing new ones in Python, using a preprocessor approach that performs transformation seamlessly at module import time.
Want to write a really fast template engine of your own? You can do that.
Want to write a testing framework for story driven development? Totally possible.
Want to obfuscate or encrypt your code? We don't recommend it, but yeah, doable.
Want to prank your co-workers and turn their Python into Pascal or C? We beg you, please don't do this.
You might notice the base name of the engine is GalfiDecoder – this was the original internal project name. Marrow
projects tend to follow a more... functionally literal... naming scheme in English, and this legacy remains. So what
in the world is "galfi"?
It's a word from the constructed language Lojban. A combination of Chinese "gǎi", English "alter", Hindi "badalanā", Spanish "modificar", Russian "modificirovatʹ", and Arabic "gaiar". It translates as "(an event) X modifies/alters/changes/transforms/converts Y into Z", and is a fairly literal interpretation for the mechanism this DSL engine provides, where X is module import, Y is your DSL, and Z is Python code.
Specific engines, such as cinje and korcu, will be released using their literal names, though. Lojban root words, or gismu, are neat: built from global natural languages, descriptive, and they scratch the regular expression itch.
We find most DSLs (especially template engines) in Python to:
- Be overly complex, often taking a classical lexer/parser/AST approach to language construction. This can be difficult for developers new to the language (or new to programming) to understand or extend, and poses a hurdle to the understanding of the basic principles. Constructing new ways to write code should be easy and accessible, not difficult and opaque wizardry.
- Repeatedly solve the same problems in similar ways that could benefit from deduplication between engines. The needs of most engines are similar; these should be fulfilled by a common codebase benefitting many engines.
- Duplicate functionality such as the import pipeline (e.g. to acquire an invokable object to generate templated text) or bytecode caching layer already present in Python, instead of leveraging these built-in tools.
Marrow DSL takes a simpler approach than most by:
- Treating the domain-specific code fundamentally as lines of input text which can trigger transformations and code generation, reducing lexing/parsing problems to simple string matching and manipulation. This results in a basic DSL framework less than a quarter the size of an average template engine, and engines utilizing Marrow DSL a fraction of that.
- Ensuring transformation is seamless at module import time, allowing full utilization of Python's own internal bytecode cache as well as the existing package/module discovery and import mechanisms.
- Allowing the bytecode resulting from translation (as managed by Python itself) to have no dependency at all on the engine that produced it, making the engine (and Marrow DSL) a build time, not production deployment dependency.
Domain-specific languages written using Marrow DSL integrate into general Python codebases seamlessly, and are transformed in a predictable, understandable way that is easy to extend.
Installing marrow.dsl is easy; just execute the following in a terminal:
pip install marrow.dsl
Note: We strongly recommend always using a container, virtualization, or sandboxing environment of some kind when developing using Python; installing things system-wide is yucky (for a variety of reasons) nine times out of ten. We prefer light-weight virtualenv, others prefer solutions as robust as Vagrant.
If you add marrow.dsl to the install_requires argument of the call to setup() in your application's
setup.py file, the engine will be automatically installed and made available when your own application or
library is installed. You can alternatively make marrow.dsl a build-time dependency by declaring it against the
setup_requires argument instead.
We recommend "less than" version number pinning to ensure there are no unintentional side-effects when updating. Use
marrow.dsl<1.1 to get all bugfixes for the current release, and marrow.dsl<2.0 to get bugfixes and feature
updates while ensuring that large breaking changes are not installed.
Development takes place on GitHub in the marrow/dsl project. Issue tracking, documentation, and downloads are provided there.
Installing the current development version requires Git, a distributed source code management system. If you have Git you can run the following to download and link the development version into your Python runtime:
git clone https://github.com/marrow/dsl.git
(cd dsl; python setup.py develop)
You can then upgrade to the latest version at any time:
(cd dsl; git pull; python setup.py develop)
If you would like to make changes and contribute them back to the project, fork the GitHub project, make your changes, and submit a pull request. This process is beyond the scope of this documentation; for more information see GitHub's documentation.
A Marrow DSL boils down to two things: first, DSL metadata registration and processing customization, represented as a
class registered via entry_points under the marrow.dsl namespace; and second, one or more transformation classes,
registered under the entry_points namespace for your named DSL, which are used to inspect, claim, and transform lines
of input.
The mechanism by which transformation is triggered may be somewhat alien: Python unicode decoding hooks for source
files, executed when opening the source file, prior to parsing, compilation, byte code storage, and evaluation during
import. Controlling this magic relies on the # [en]coding: module encoding declaration, which names the DSL as if it
were a text encoding and so triggers transformation at import time.
Python modules written using a DSL are otherwise just .py files given a DSL encoding declaration.
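To make the decoding-hook idea concrete, here is a minimal sketch using only the standard library's codecs module. It is not Marrow DSL's implementation: the saydsl codec name, the transform function, and the "say" rule are all invented for illustration. It only demonstrates that a registered codec can rewrite text while decoding bytes. A codec intended for real source files would likely also need incremental and stream decoders, and would have to be registered before the module is imported.

```python
import codecs


def transform(text):
    """Hypothetical DSL rule: a line reading 'say <words>' becomes a print() call."""
    output = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip('\r\n')
        if body.startswith('say '):
            newline = line[len(body):]  # preserve the original line ending
            line = 'print({!r})'.format(body[4:]) + newline
        output.append(line)
    return ''.join(output)


def decode(data, errors='strict'):
    """Decode UTF-8 bytes, then apply the transformation to the resulting text."""
    text, consumed = codecs.utf_8_decode(data, errors, True)
    return transform(text), consumed


def search(name):
    """Codec search function; 'saydsl' is a made-up encoding name."""
    if name != 'saydsl':
        return None
    utf8 = codecs.lookup('utf-8')
    return codecs.CodecInfo(name='saydsl', encode=utf8.encode, decode=decode)


codecs.register(search)

source = b"say hi\nx = 1\n"
print(source.decode('saydsl'))
```

Lines the rule does not recognize pass through untouched, which is why a DSL module can remain mostly ordinary Python.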
In accordance with PEP 3120, the default encoding of the underlying textual content of all pre-transformation DSLs is UTF-8. Transformers should only operate on native unicode text unless additional processing, such as AST analysis, is absolutely required for the operation of the transformer. The standard library offers a vast array of introspection, parsing, compilation, and other tools to reach for before resorting to processing and regenerating the whole source file from an abstraction. Any DSL whose purpose is the generation of text should similarly default to UTF-8 output.
DSLs may have flags and simple options associated with them. Due to limitations on the way Python searches for encoding prefixes on source files, the names available are restricted.
- Within the general name for a specific DSL, any alphanumeric characters (a-z, 0-9, regardless of case) may be used. This name is parsed early and used to look up the appropriate named metadata entry_point from the marrow.dsl namespace. E.g.: cinje
- Allowed flags must be declared via FLAGS DSL metadata and are enabled within individual encoding declarations as suffixes on the name, with the same restrictions while allowing hyphens, each prefixed with a period. Multiple may be concatenated and should be lexicographically sorted. E.g. the raw and unsafe flags on the cinje encoding: cinje.raw.unsafe
- Options are identified as hyphen-separated key value pairs. These are kept unambiguous from flags containing hyphens by the explicit declaration of allowed flags in the DSL metadata. Allowed options are defined through assignment of __slots__ explicitly naming options to allocate storage for. (This causes Python to forbid assignment of unknown attributes.) While the value may contain hyphens, the key may not contain any. Numeric-seeming values will be cast to integers automatically during encoding declaration parsing.
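The naming rules above can be sketched as a small parser. This is an illustration only, not the library's parsing code: the parse_declaration function and the width option are hypothetical, and the sketch assumes that options, like flags, appear as period-prefixed suffixes (the text above does not say where options are placed).

```python
def parse_declaration(declaration, allowed_flags, allowed_options):
    """Split an encoding name such as 'cinje.raw.unsafe' into (name, flags, options).

    Assumption: every period-separated suffix is either a declared flag or a
    'key-value' option. Declared flags are checked first, so a flag containing
    a hyphen is never mistaken for an option.
    """
    name, *suffixes = declaration.split('.')
    if not (name.isascii() and name.isalnum()):
        raise ValueError('DSL name must be alphanumeric: {!r}'.format(name))

    flags, options = set(), {}
    for suffix in suffixes:
        if suffix in allowed_flags:
            flags.add(suffix)
            continue
        # The key may not contain hyphens, so split on the first one only.
        key, separator, value = suffix.partition('-')
        if not separator or key not in allowed_options:
            raise ValueError('Unknown flag or option: {!r}'.format(suffix))
        # Numeric-seeming values are cast to integers.
        numeric = value.isascii() and value.isdigit()
        options[key] = int(value) if numeric else value
    return name, flags, options


print(parse_declaration('cinje.raw.unsafe', {'raw', 'unsafe'}, set()))
print(parse_declaration('cinje.raw.width-80', {'raw'}, {'width'}))
```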
Lines of code, both input written in the DSL and output Python code, are individually represented by Line
instances. Collections of lines are stored in Lines instances. At all scales tags are used to help identify
the lines and collections, represented as sets.
Line defines the content, original line number, scope, and metadata for a single line. Lines represents a collection of such lines.
The attributes of a line are:
- line - The string (unicode text) value of the line.
- stripped - A whitespace stripped (leading and trailing) version of the line.
- number - The line number this line originated from in the source, or was triggered by for the purpose of code generation.
- scope - The Python scope, denoted by indentation level in the resulting code. Leading whitespace in a given line already has the scope-based indentation removed. This means indented blocks in docstrings, manually aligned continuations, and other such situations will have their additional whitespace preserved.
- tag - An optional set of tags to associate with the line. For example, the built-in continued tag identifies lines that are manually wrapped and "continued" on the next line.
Each Line offers a rich programmers' representation and upon casting to a unicode string will regenerate the line,
including leading indentation. As most lines are constructed from the mutation of an existing line, or based on a
triggering line in the case of code generation, two methods are provided to assist:
- clone(**kw) - Return a new Line instance as a mutable shallow copy. Any arguments provided will override (replace) the attribute of the same name.
- format(*args, **kw) - Apply string formatting interpolation (str.format) to the line attribute, and return a clone of the line with this new value.
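As a rough illustration of the attributes and helper methods described above, the following stand-in class mimics the documented behaviour. It is a sketch, not the marrow.dsl class: the constructor signature, the repr format, and the choice of one tab per scope level for indentation are all assumptions.

```python
class Line:
    """Illustrative stand-in for a single line of DSL input or generated Python."""

    __slots__ = ('line', 'number', 'scope', 'tag')

    def __init__(self, line, number=None, scope=0, tag=None):
        self.line = line                        # unicode text, scope indentation removed
        self.number = number                    # originating (or triggering) line number
        self.scope = scope                      # indentation level in the resulting code
        self.tag = set(tag) if tag else set()   # optional set of tags

    @property
    def stripped(self):
        return self.line.strip()

    def __repr__(self):
        return 'Line({0.number}, {0.scope}, {0.line!r})'.format(self)

    def __str__(self):
        # Assumption: one tab per scope level when regenerating the line.
        return '\t' * self.scope + self.line

    def clone(self, **kw):
        """Return a shallow copy, with keyword arguments overriding attributes."""
        values = dict(line=self.line, number=self.number, scope=self.scope, tag=self.tag)
        values.update(kw)
        return Line(**values)

    def format(self, *args, **kw):
        """Apply str.format to the text and return a clone carrying the result."""
        return self.clone(line=self.line.format(*args, **kw))


template = Line('print({name!r})', number=3, scope=1, tag={'generated'})
rendered = template.format(name='world')
print(str(rendered))
```

Because format returns a clone, the triggering line keeps its original text while the generated line inherits its number, scope, and tags.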
TBD
- common metadata
- extended metadata
- origin tracing
- continuation
TBD
- common metadata
- context stack
- reentrant FIFO, push to head mid-iteration
- named sections
- global metadata
- reentrant line producer
- named scopes
TBD
TBD
Transformation is a stack-based, almost coroutine-like streaming process utilizing Python's yield syntax extensively. Individual transformers cooperate to construct the working context as they go, with block transformers manipulating whole lines, and inline transformers manipulating substrings of a line. Additionally, block transformers may be unbuffered, where they may generate one or more lines in response to a line, or buffered, where they act as context managers helping to subdivide the source text into logical sections by constructing "nested" (though not really) buffers.
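The unbuffered block case can be pictured with plain generators. The sketch below is not the Marrow DSL transformer API, whose details are still marked TBD in this document: the claims method, the callable-yielding-lines convention, and the Echo and Twice rules are all invented to show the streaming shape of the idea, where each input line is offered to transformers in turn and the claimant yields one or more output lines.

```python
class Echo:
    """Hypothetical unbuffered transformer: 'echo <text>' becomes one print() line."""

    def claims(self, line):
        return line.startswith('echo ')

    def __call__(self, line):
        yield 'print({!r})'.format(line[5:])


class Twice:
    """Hypothetical unbuffered transformer emitting two lines for one input line."""

    def claims(self, line):
        return line.startswith('twice ')

    def __call__(self, line):
        statement = line[6:]
        yield statement
        yield statement


def process(lines, transformers):
    """Stream lines through the first transformer that claims each one."""
    for line in lines:
        for transformer in transformers:
            if transformer.claims(line):
                yield from transformer(line)
                break
        else:
            yield line  # unclaimed lines pass through as ordinary Python


source = ['x = 1', 'echo hi', 'twice x += 1']
for output in process(source, [Echo(), Twice()]):
    print(output)
```

Because process is itself a generator, output lines are produced lazily as input is consumed, which is the property the streaming description above relies on.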
TBD
- unbuffered
- buffered
TBD
- delimited interpolation
TBD
- buffer context exit triggered
- post other transformation on the buffer contents
- Initial release.
Marrow DSL (marrow.dsl) has been released under the MIT Open Source license.
Copyright © 2011-2017 Alice Bevan-McGregor and contributors.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.