\begin{aosachapter}{A Template Engine}{s:template-engine}{Ned Batchelder}
\aosasecti{Introduction}\label{introduction}
Most programs contain a lot of logic, and a little bit of literal
textual data. Programming languages are designed to be good for this
sort of programming. But some programming tasks involve only a little
bit of logic, and a great deal of textual data. For these tasks, we'd
like to have a tool better suited to these text-heavy problems. A
template engine is such a tool. In this chapter, we build a simple
template engine.
The most common example of one of these text-heavy tasks is in web
applications. An important phase in any web application is generating
HTML to be served to the browser. Very few HTML pages are completely
static: they involve at least a small amount of dynamic data, such as
the user's name. Usually, they contain a great deal of dynamic data:
product listings, friends' news updates, and so on.
At the same time, every HTML page contains large swaths of static text.
And these pages are large, containing tens of thousands of bytes of
text. The web application developer has a problem to solve: how best to
generate a large string containing a mix of static and dynamic data? To
add to the problem, the static text is actually HTML markup that is
authored by another member of the team, the front-end designer, who
wants to be able to work with it in familiar ways.
For purposes of illustration, let's imagine we want to produce this toy
HTML:
\begin{verbatim}
<p>Welcome, Charlie!</p>
<p>Products:</p>
<ul>
    <li>Apple: $1.00</li>
    <li>Fig: $1.50</li>
    <li>Pomegranate: $3.25</li>
</ul>
\end{verbatim}
Here, the user's name will be dynamic, as will the names and prices of
the products. Even the number of products isn't fixed: at another
moment, there could be more or fewer products to display.
One simple way to make this HTML would be to have string constants in
our code, and join them together to produce the page. Dynamic data would
be inserted with string substitution of some sort. Some of our dynamic
data is repetitive, like our lists of products. This means we'll have
chunks of HTML that repeat, so those will have to be handled separately
and combined with the rest of the page.
Producing our toy page in this way might look like this:
\begin{verbatim}
# The main HTML for the whole page.
PAGE_HTML = """
<p>Welcome, {name}!</p>
<p>Products:</p>
<ul>
{products}
</ul>
"""
# The HTML for each product displayed.
PRODUCT_HTML = "<li>{prodname}: {price}</li>\n"

def make_page(username, products):
    product_html = ""
    for prodname, price in products:
        product_html += PRODUCT_HTML.format(
            prodname=prodname, price=format_price(price))
    html = PAGE_HTML.format(name=username, products=product_html)
    return html
\end{verbatim}
This works, but we have a mess on our hands. The HTML is in multiple
string constants embedded in our application code. The logic of the page
is hard to see because the static text is broken into separate pieces.
The details of how data is formatted are lost in the Python code. In
order to modify the HTML page, our front-end designer would need to be
able to edit Python code to make HTML changes. Imagine what the code
would look like if the page were ten (or one hundred) times more
complicated; it would quickly become unworkable.
\aosasecti{Templates}\label{templates}
The better way to produce HTML pages is with \emph{templates}. The HTML
page is authored as a template, meaning that the file is mostly static
HTML, with dynamic pieces embedded in it using special notation. Our toy
page above could look like this as a template:
\begin{verbatim}
<p>Welcome, {{user_name}}!</p>
<p>Products:</p>
<ul>
{% for product in product_list %}
    <li>{{ product.name }}:
        {{ product.price|format_price }}</li>
{% endfor %}
</ul>
\end{verbatim}
Here the focus is on the HTML text, with logic embedded in the HTML.
Contrast this document-centric approach with our logic-centric code
above. Our earlier program was mostly Python code, with HTML embedded in
the Python logic. Here our program is mostly static HTML markup.
The mostly-static style used in templates is the opposite of how most
programming languages work. For example, with Python, most of the source
file is executable code, and if you need literal static text, you embed
it in a string literal:
\begin{verbatim}
def hello():
    print("Hello, world!")

hello()
\end{verbatim}
When Python reads this source file, it interprets text like
\texttt{def hello():} as instructions to be executed. The double quote
character in \texttt{print("Hello, world!")} indicates that the
following text is meant literally, until the closing double quote. This
is how most programming languages work: mostly dynamic, with some static
pieces embedded in the instructions. The static pieces are indicated by
the double-quote notation.
A template language flips this around: the template file is mostly
static literal text, with special notation to indicate the executable
dynamic parts.
\begin{verbatim}
<p>Welcome, {{user_name}}!</p>
\end{verbatim}
Here the text is meant to appear literally in the resulting HTML page,
until the `\texttt{\{\{}' notation indicates a switch into dynamic mode,
where the \texttt{user\_name} variable will be substituted into the
output.
String formatting functions such as Python's
\texttt{"foo = \{foo\}!".format(foo=17)} are examples of mini-languages
used to create text from a string literal and the data to be inserted.
Templates extend this idea to include logic constructs like conditionals
and loops, but the difference is only of degree.
These files are called templates because they are used to produce many
pages with similar structure but differing details.
To use HTML templates in our programs, we need a \emph{template engine}:
a function that takes a static template describing the structure and
static content of the page, and a dynamic \emph{context} that provides
the dynamic data to plug into the template. The template engine combines
the template and the context to produce a complete string of HTML. The
job of a template engine is to interpret the template, replacing the
dynamic pieces with real data.
By the way, there's often nothing specific to HTML in a template
engine; it could be used to produce any textual result. For example,
template engines are also used to produce plain-text email messages. But usually
they are used for HTML, and occasionally have HTML-specific features,
such as escaping, which makes it possible to insert values into the HTML
without worrying about which characters are special in HTML.
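To make the idea of escaping concrete, here is a minimal sketch using
\texttt{html.escape} from Python's standard library. The
\texttt{render\_greeting} helper is hypothetical, introduced only for
illustration, and is not part of the engine built in this chapter:

```python
import html


def render_greeting(user_name):
    """Insert an untrusted value into HTML, escaping special characters first."""
    # html.escape replaces the characters that are special in HTML
    # (ampersand, angle brackets, and by default quotes) with entities.
    return "<p>Welcome, {}!</p>".format(html.escape(user_name))


print(render_greeting("<b>Charlie</b> & friends"))
# prints: <p>Welcome, &lt;b&gt;Charlie&lt;/b&gt; &amp; friends!</p>
```

Without the escaping step, the markup in the name would be passed to the
browser as markup rather than displayed as text.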
\aosasecti{Supported Syntax}\label{supported-syntax}
Template engines vary in the syntax they support. Our template syntax is
based on Django, a popular web framework. Since we are implementing our
engine in Python, some Python concepts will appear in our syntax. We've
already seen some of this syntax in our toy example at the top of the
chapter, but this is a quick summary of all of the syntax we'll
implement.
Data from the context is inserted using double curly braces:
\begin{verbatim}
<p>Welcome, {{user_name}}!</p>
\end{verbatim}
The data available to the template is provided in the context when the
template is rendered. More on that later.
Template engines usually provide access to elements within data using a
simplified and relaxed syntax. In Python, these expressions all have
different effects:
\begin{verbatim}
dict["key"]
obj.attr
obj.method()
\end{verbatim}
In our template syntax, all of these operations are expressed with a
dot:
\begin{verbatim}
dict.key
obj.attr
obj.method
\end{verbatim}
The dot will access object attributes or dictionary values, and if the
resulting value is callable, it's automatically called. This differs
from the Python code, where you need a different syntax for each of
those operations. This results in simpler template syntax:
\begin{verbatim}
<p>The price is: {{product.price}}, with a {{product.discount}}% discount.</p>
\end{verbatim}
Dots can be used multiple times on a single value to navigate down an
attribute or element chain.
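The relaxed dot rules described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the name
\texttt{resolve\_dots} and the \texttt{Product} class are hypothetical,
and the sketch tries attribute access first, then item access, then calls
the value if it is callable:

```python
def resolve_dots(value, *names):
    """Follow a chain of dotted names, mimicking the relaxed template rules."""
    for name in names:
        try:
            # First try attribute access, as in obj.attr.
            value = getattr(value, name)
        except AttributeError:
            # Fall back to item access, as in dict["key"].
            value = value[name]
        if callable(value):
            # Methods and other callables are called with no arguments.
            value = value()
    return value


class Product(object):
    """Hypothetical data class, used only for this example."""
    def __init__(self, name, price):
        self.name = name
        self._price = price

    def price(self):
        return self._price


context = {"product": Product("Fig", 1.5), "shop": {"city": "Boston"}}
print(resolve_dots(context, "product", "name"))   # attribute access
print(resolve_dots(context, "product", "price"))  # method, called automatically
print(resolve_dots(context, "shop", "city"))      # dictionary key
```

One consequence of trying attributes first in this sketch is that a
dictionary key such as \texttt{items} would be shadowed by the dictionary
method of the same name.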
You can use helper functions, called filters, to modify values. Filters
are invoked with a pipe character:
\begin{verbatim}
<p>Short name: {{story.subject|slugify|lower}}</p>
\end{verbatim}
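One way to read a piped expression is as a left-to-right chain of
function calls. The sketch below assumes that reading; the helper names
\texttt{apply\_filters} and \texttt{evaluate} are hypothetical, the
\texttt{slugify} stand-in is deliberately crude, and dots are left out
for brevity:

```python
def apply_filters(value, filter_names, filters):
    """Pass value through each named filter, left to right."""
    for name in filter_names:
        value = filters[name](value)
    return value


def evaluate(expr, context, filters):
    """Evaluate a 'name|filter1|filter2' expression (no dots, for brevity)."""
    pipes = [part.strip() for part in expr.split("|")]
    return apply_filters(context[pipes[0]], pipes[1:], filters)


filters = {
    "lower": str.lower,
    # A deliberately crude stand-in for a real slugify filter.
    "slugify": lambda s: "-".join(s.split()),
}
print(evaluate("subject|slugify|lower", {"subject": "Hello Big World"}, filters))
# prints: hello-big-world
```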
Building interesting pages usually requires at least a small amount of
logic, so conditionals are available:
\begin{verbatim}
{% if user.is_logged_in %}
    <p>Welcome, {{ user.name }}!</p>
{% else %}
    <p><a href="/login">Log in</a></p>
{% endif %}
\end{verbatim}
Looping lets us include collections of data in our pages:
\begin{verbatim}
<p>Products:</p>
<ul>
{% for product in product_list %}
    <li>{{ product.name }}: {{ product.price|format_price }}</li>
{% endfor %}
</ul>
\end{verbatim}
As with other programming languages, conditionals and loops can be
nested to build complex logical structures.
Lastly, so that we can document our templates, comments appear between
brace-hashes:
\begin{verbatim}
{# This is the best template ever! #}
\end{verbatim}
\aosasecti{Implementation Approaches}\label{implementation-approaches}
In broad strokes, the template engine will have two main phases:
\begin{aosaitemize}
\item
Parse the template
\item
Render the template to assemble the text result, which involves:
\begin{aosaitemize}
\item
Managing the dynamic context, the source of the data
\item
Executing the logic elements
\item
Implementing dot access and filter execution
\end{aosaitemize}
\end{aosaitemize}
The question of what to pass from the parsing phase to the rendering
phase is key. What does parsing produce that can be rendered? There are
two main options; we'll call them \emph{interpretation} and
\emph{compilation}, using the terms loosely from other language
implementations.
In an interpretation model, parsing produces a data structure
representing the structure of the template. The rendering phase walks
that data structure, assembling the result text based on the
instructions it finds. For a real-world example, the Django template
engine uses this approach.
In a compilation model, parsing produces some form of directly
executable code. The rendering phase executes that code, producing the
result. Jinja2 and Mako are two examples of template engines that use
the compilation approach.
Our implementation of the engine uses compilation: we compile the
template into Python code. When run, the Python code assembles the
result.
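The difference between the two models can be seen on a toy scale. The
following sketch is not the engine from this chapter: it assumes a
template that has already been parsed into a hypothetical list of
\texttt{("text", ...)} and \texttt{("var", ...)} pairs, and renders it
both ways:

```python
# A pre-parsed template: literal text and variable references.
NODES = [("text", "Hello, "), ("var", "name"), ("text", "!")]


def interpret(nodes, context):
    """Interpretation: walk the parsed structure on every render."""
    parts = []
    for kind, value in nodes:
        if kind == "text":
            parts.append(value)
        else:
            parts.append(str(context[value]))
    return "".join(parts)


def compile_nodes(nodes):
    """Compilation: turn the parsed structure into Python source, once."""
    lines = ["def render(context):", "    parts = []"]
    for kind, value in nodes:
        if kind == "text":
            lines.append("    parts.append(%r)" % value)
        else:
            lines.append("    parts.append(str(context[%r]))" % value)
    lines.append("    return ''.join(parts)")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["render"]


render = compile_nodes(NODES)
print(interpret(NODES, {"name": "Ned"}))   # walks NODES each time
print(render({"name": "Ned"}))             # runs the generated function
```

Both calls produce the same text. The compiled version does its
node-by-node dispatch once, when the source is generated, rather than on
every render.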
The template engine described here was originally written as part of
coverage.py, to produce HTML reports. In coverage.py, there are only a
few templates, and each is used over and over to produce many files.
Overall, the program ran faster when the templates were compiled to
Python code. The compilation process was a bit more complicated, but it
only had to run once, while the compiled code ran many times and was
faster than interpreting a data structure each time.
It's a bit more complicated to compile the template to Python, but it's
not as bad as you might think. And besides, as any developer can tell
you, it's more fun to write a program to write a program than it is to
write a program!
Our template compiler is a small example of a general technique called
code generation. Code generation underlies many powerful and flexible
tools, including programming language compilers. Code generation can get
complex, but is a useful technique to have in your toolbox.
Another application of templates might prefer the interpreted approach,
if templates will be used only a few times each. Then the effort to
compile to Python won't pay off in the long run, and a simpler
interpretation process might perform better overall.
\aosasecti{Compiling to Python}\label{compiling-to-python}
Before we get to the code of the template engine, let's look at the code
it produces. The parsing phase will convert a template into a Python
function. Here is our small template again:
\begin{verbatim}
<p>Welcome, {{user_name}}!</p>
<p>Products:</p>
<ul>
{% for product in product_list %}
    <li>{{ product.name }}:
        {{ product.price|format_price }}</li>
{% endfor %}
</ul>
\end{verbatim}
Our engine will compile this template to Python code. The resulting
Python code looks unusual, because we've chosen some shortcuts that
produce slightly faster code. Here is the Python (slightly reformatted
for readability):
\begin{verbatim}
def render_function(context, do_dots):
    c_user_name = context['user_name']
    c_product_list = context['product_list']
    c_format_price = context['format_price']

    result = []
    append_result = result.append
    extend_result = result.extend
    to_str = str

    extend_result([
        '<p>Welcome, ',
        to_str(c_user_name),
        '!</p>\n<p>Products:</p>\n<ul>\n'
    ])
    for c_product in c_product_list:
        extend_result([
            '\n    <li>',
            to_str(do_dots(c_product, 'name')),
            ':\n        ',
            to_str(c_format_price(do_dots(c_product, 'price'))),
            '</li>\n'
        ])
    append_result('\n</ul>\n')
    return ''.join(result)
\end{verbatim}
Each template is converted into a \texttt{render\_function} function
that takes a dictionary of data called the context. The body of the
function starts by unpacking the data from the context into local names,
because they are faster for repeated use. All the context data goes into
locals with a \texttt{c\_} prefix so that we can use other local names
without fear of collisions.
The result of the template will be a string. The fastest way to build a
string from parts is to create a list of strings, and join them together
at the end. \texttt{result} will be the list of strings. Because we're
going to add strings to this list, we capture its \texttt{append} and
\texttt{extend} methods in the local names \texttt{append\_result} and
\texttt{extend\_result}. The last local we create is a \texttt{to\_str}
shorthand for the \texttt{str} built-in.
These kinds of shortcuts are unusual. Let's look at them more closely.
In Python, a method call on an object like
\texttt{result.append("hello")} is executed in two steps. First, the
append attribute is fetched from the result object:
\texttt{result.append}. Then the value fetched is invoked as a function,
passing it the argument \texttt{"hello"}. Although we're used to seeing
those steps performed together, they really are separate. If you save
the result of the first step, you can perform the second step on the
saved value. So these two Python snippets do the same thing:
\begin{verbatim}
# The way we're used to seeing it:
result.append("hello")
# But this works the same:
append_result = result.append
append_result("hello")
\end{verbatim}
In the template engine code, we've split it out this way so that we only
do the first step once, no matter how many times we do the second step.
This saves us a small amount of time, because we avoid taking the time
to look up the append attribute.
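If you want to check the effect on your own machine, a rough timing
sketch with the standard \texttt{timeit} module might look like the
following. The two function names are hypothetical, and the measured
times depend on the Python version and hardware, so no numbers are quoted
here:

```python
import timeit


def with_attribute_lookup(n):
    result = []
    for i in range(n):
        result.append(i)            # looks up .append on every iteration
    return result


def with_saved_method(n):
    result = []
    append_result = result.append   # look up .append once
    for i in range(n):
        append_result(i)
    return result


if __name__ == "__main__":
    for func in (with_attribute_lookup, with_saved_method):
        seconds = timeit.timeit(lambda: func(10000), number=200)
        print("%-22s %.4f s" % (func.__name__, seconds))
```

Both functions build the same list; only the way the method is reached
differs.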
This is an example of a micro-optimization: an unusual coding technique
that gains us tiny improvements in speed. Micro-optimizations can be
less readable, or more confusing, so they are only justified for code
that is a proven performance bottleneck. Developers disagree on how much
micro-optimization is justified, and some beginners overdo it. The
optimizations here were added only after timing experiments showed that
they improved performance, even if only a little bit.
Micro-optimizations can be instructive, as they make use of some exotic
aspects of Python, but don't over-use them in your own code.
The shortcut for \texttt{str} is also a micro-optimization. Names in
Python can be local to a function, global to a module, or built-in to
Python. Looking up a local name is faster than looking up a global or a
built-in. We're used to the fact that \texttt{str} is a builtin that is
always available, but Python still has to look up the name \texttt{str}
each time it is used. Putting it in a local saves us another small slice
of time because locals are faster than builtins.
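One way to see the distinction is to inspect a function's code object,
which records local names separately from global and built-in names. The
sketch below uses two hypothetical functions and only the standard
\texttt{\_\_code\_\_} attributes; it shows where each name is recorded,
not how much time is saved:

```python
def uses_builtin(values):
    out = []
    for v in values:
        out.append(str(v))      # 'str' is referred to by name inside the loop
    return out


def uses_local(values):
    to_str = str                # resolve 'str' once, keep it in a local
    out = []
    for v in values:
        out.append(to_str(v))
    return out


# co_varnames lists a function's local names; co_names lists the global,
# built-in, and attribute names its bytecode refers to.
print("to_str" in uses_local.__code__.co_varnames)    # True
print("str" in uses_builtin.__code__.co_names)        # True
print("str" in uses_builtin.__code__.co_varnames)     # False
```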
Once those shortcuts are defined, we're ready for the Python lines
created from our particular template. Strings will be added to the
result list using the \texttt{append\_result} or \texttt{extend\_result}
shorthands, depending on whether we have one string to add, or more than
one. Literal text in the template becomes a simple string literal.
Having both append and extend adds complexity, but remember we're aiming
for the fastest execution of the template, and using extend for one item
means making a new list of one item so that we can pass it to extend.
Expressions in \texttt{\{\{ ... \}\}} are computed, converted to
strings, and added to the result. Dots in the expression are handled by
the \texttt{do\_dots} function passed into our function, because the
meaning of the dotted expressions depends on the data in the context: it
could be attribute access or item access, and it could be a callable.
The logical structures \texttt{\{\% if ... \%\}} and
\texttt{\{\% for ... \%\}} are converted into Python conditionals and
loops. The expression in the \texttt{\{\% if/for ... \%\}} tag will
become the expression in the \texttt{if} or \texttt{for} statement, and
the contents up until the \texttt{\{\% end... \%\}} tag will become the
body of the statement.
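As an illustration of that conversion, here is a hand-written
approximation of what a compiled function for a small
\texttt{\{\% if ... \%\}} template could look like. It follows the shape
of the generated code shown earlier, but it is a sketch rather than
actual compiler output, and the \texttt{do\_dots} stand-in here handles
only dictionary lookups:

```python
def do_dots(value, *dots):
    """Simplified stand-in for the engine's dot handling (dict lookup only)."""
    for dot in dots:
        value = value[dot]
    return value


def render_function(context, do_dots):
    # Hand-written approximation of compiler output for:
    #   {% if user.is_logged_in %}<p>Welcome, {{ user.name }}!</p>{% endif %}
    c_user = context['user']
    result = []
    append_result = result.append
    extend_result = result.extend
    to_str = str
    if do_dots(c_user, 'is_logged_in'):
        extend_result([
            '<p>Welcome, ',
            to_str(do_dots(c_user, 'name')),
            '!</p>',
        ])
    return ''.join(result)


print(render_function({'user': {'is_logged_in': True, 'name': 'Ned'}}, do_dots))
print(repr(render_function({'user': {'is_logged_in': False, 'name': 'Ned'}}, do_dots)))
```

The tag's expression becomes the condition of a Python \texttt{if}
statement, and the template text between the tags becomes its indented
body.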
\aosasecti{Writing the Engine}\label{writing-the-engine}
Now that we understand what the engine will do, let's walk through the
implementation.
\aosasectii{The Templite class}\label{the-templite-class}
The heart of the template engine is the Templite class. (Get it? It's a
template, but it's lite!)
The Templite class has a small interface. You construct a Templite
object with the text of the template, then later you can use the
\texttt{render} method on it to render a particular context, the
dictionary of data, through the template:
\begin{verbatim}
# Make a Templite object.
templite = Templite('''
    <h1>Hello {{name|upper}}!</h1>
    {% for topic in topics %}
        <p>You are interested in {{topic}}.</p>
    {% endfor %}
    ''',
    {'upper': str.upper},
)

# Later, use it to render some data.
text = templite.render({
    'name': "Ned",
    'topics': ['Python', 'Geometry', 'Juggling'],
})
\end{verbatim}
We pass the text of the template when the object is created so that we
can do the compile step just once, and later call \texttt{render} many
times to reuse the compiled results.
The constructor also accepts a dictionary of values, an initial context.
These are stored in the Templite object, and will be available when the
template is later rendered. These are good for defining functions or
constants we want to be available everywhere, like our \texttt{upper}
function in the previous example.
Before we discuss the implementation of Templite, we have a helper to
define first: CodeBuilder.
\aosasectii{CodeBuilder}\label{codebuilder}
The bulk of the work in our engine is parsing the template and producing
the necessary Python code. To help with producing the Python, we have
the CodeBuilder class, which handles the bookkeeping for us as we
construct the Python code. It adds lines of code, manages indentation,
and finally gives us values from the compiled Python.
One CodeBuilder object is responsible for a complete chunk of Python
code. As used by our template engine, the chunk of Python is always a
single complete function definition. But the CodeBuilder class makes no
assumption that it will only be one function. This keeps the CodeBuilder
code more general, and less coupled to the rest of the template engine
code.
As we'll see, we also use nested CodeBuilders to make it possible to put
code at the beginning of the function even though we don't know what it
will be until we are nearly done.
A CodeBuilder object keeps a list of strings that will together be the
final Python code. The only other state it needs is the current
indentation level:
\begin{verbatim}
class CodeBuilder(object):
    """Build source code conveniently."""

    def __init__(self, indent=0):
        self.code = []
        self.indent_level = indent
\end{verbatim}
CodeBuilder doesn't do much. Let's take a method-by-method look at the
interface and implementation.
\texttt{add\_line} adds a new line of code, which automatically indents
the text to the current indentation level, and supplies a newline:
\begin{verbatim}
    def add_line(self, line):
        """Add a line of source to the code.

        Indentation and newline will be added for you, don't provide them.

        """
        self.code.extend([" " * self.indent_level, line, "\n"])
\end{verbatim}
\texttt{indent} and \texttt{dedent} increase or decrease the indentation
level:
\begin{verbatim}
    INDENT_STEP = 4      # PEP8 says so!

    def indent(self):
        """Increase the current indent for following lines."""
        self.indent_level += self.INDENT_STEP

    def dedent(self):
        """Decrease the current indent for following lines."""
        self.indent_level -= self.INDENT_STEP
\end{verbatim}
\texttt{add\_section} adds a section, which is managed by another
CodeBuilder object. This lets us keep a reference to a place in the code,
and add text to it later. The \texttt{self.code} list is mostly a list of
strings, but will also hold references to these sections:
\begin{verbatim}
    def add_section(self):
        """Add a section, a sub-CodeBuilder."""
        section = CodeBuilder(self.indent_level)
        self.code.append(section)
        return section
\end{verbatim}
\texttt{\_\_str\_\_} produces a single string with all the code. This
simply joins together all the strings in \texttt{self.code}. Note that
because \texttt{self.code} can contain sections, this might call other
\texttt{CodeBuilder} objects recursively:
\begin{verbatim}
    def __str__(self):
        return "".join(str(c) for c in self.code)
\end{verbatim}
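To see how a section reserves a place that is filled in later, the sketch
below condenses the methods shown so far into one runnable class and
builds a small function with it. The generated \texttt{greet} function
and the \texttt{prologue} name are made up for this example:

```python
class CodeBuilder(object):
    """Condensed copy of the methods shown so far."""
    INDENT_STEP = 4

    def __init__(self, indent=0):
        self.code = []
        self.indent_level = indent

    def add_line(self, line):
        self.code.extend([" " * self.indent_level, line, "\n"])

    def indent(self):
        self.indent_level += self.INDENT_STEP

    def add_section(self):
        section = CodeBuilder(self.indent_level)
        self.code.append(section)
        return section

    def __str__(self):
        return "".join(str(c) for c in self.code)


code = CodeBuilder()
code.add_line("def greet(context):")
code.indent()
prologue = code.add_section()          # reserve a spot near the top
code.add_line("return 'Hello, ' + c_name")
# Only now do we "discover" that the body needs c_name.
prologue.add_line("c_name = context['name']")
print(code)
```

Although the \texttt{c\_name} line was added last, it appears before the
\texttt{return} line in the output, at the indentation level that was
current when the section was created.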
\texttt{get\_globals} yields the final values by executing the code.
This stringifies the object, executes it to get its definitions, and
returns the resulting values:
\begin{verbatim}
    def get_globals(self):
        """Execute the code, and return a dict of globals it defines."""
        # A check that the caller really finished all the blocks they started.
        assert self.indent_level == 0
        # Get the Python source as a single string.
        python_source = str(self)
        # Execute the source, defining globals, and return them.
        global_namespace = {}
        exec(python_source, global_namespace)
        return global_namespace
\end{verbatim}
This last method uses some exotic features of Python. The \texttt{exec}
function executes a string containing Python code. The second argument
to \texttt{exec} is a dictionary that will collect up the globals
defined by the code. So for example, if we do this:
\begin{verbatim}
python_source = """\
SEVENTEEN = 17

def three():
    return 3
"""
global_namespace = {}
exec(python_source, global_namespace)
\end{verbatim}
then \texttt{global\_namespace{[}'SEVENTEEN'{]}} is 17, and
\texttt{global\_namespace{[}'three'{]}} is an actual function named
\texttt{three}.
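A runnable version of this example, with the lookups spelled out, might
look like the sketch below. One detail worth knowing is that
\texttt{exec} also inserts a \texttt{\_\_builtins\_\_} entry into the
dictionary it is given, so the namespace holds slightly more than the
names the source defines:

```python
python_source = """\
SEVENTEEN = 17

def three():
    return 3
"""

global_namespace = {}
exec(python_source, global_namespace)

# Names defined by the executed source are ordinary dictionary entries.
print(global_namespace['SEVENTEEN'])      # 17
print(global_namespace['three']())        # 3

# exec also inserts a '__builtins__' entry, so filter it out when listing.
defined = sorted(k for k in global_namespace if k != '__builtins__')
print(defined)                            # ['SEVENTEEN', 'three']
```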
Although we only use CodeBuilder to produce one function, there's
nothing here that limits it to that use. This makes the class simpler to
implement, and easier to understand.
CodeBuilder lets us create a chunk of Python source code, and has no
specific knowledge about our template engine at all. We could use it in
such a way that three different functions would be defined in the
Python, and then \texttt{get\_globals} would return a dict of three
values, the three functions. As it happens, our template engine only
needs to define one function. But it's better software design to keep
that implementation detail in the template engine code, and out of our
CodeBuilder class.
Even as we're actually using it --- to define a single function ---
having \texttt{get\_globals} return the dictionary keeps the code more
modular because it doesn't need to know the name of the function we've
defined. Whatever function name we define in our Python source, we can
retrieve that name from the dict returned by \texttt{get\_globals}.
Now we can get into the implementation of the Templite class itself, and
see how CodeBuilder is used.
\aosasectii{The Templite class
implementation}\label{the-templite-class-implementation}
Most of our code is in the Templite class. As we've discussed, it has
two phases: compilation and rendering.
\aosasectiii{Compiling}\label{compiling}
All of the work to compile the template into a Python function happens
in the Templite constructor. First the contexts are saved away:
\begin{verbatim}
    def __init__(self, text, *contexts):
        """Construct a Templite with the given `text`.

        `contexts` are dictionaries of values to use for future renderings.
        These are good for filters and global values.

        """
        self.context = {}
        for context in contexts:
            self.context.update(context)
\end{verbatim}
Notice we used \texttt{*contexts} as the parameter. The asterisk denotes
that any number of positional arguments will be packed into a tuple and
passed in as \texttt{contexts}. This is commonly called argument packing
(unpacking is the reverse operation, at a call site), and it means that
the caller can provide any number of context dictionaries. Now any of
these calls are valid:
\begin{verbatim}
t = Templite(template_text)
t = Templite(template_text, context1)
t = Templite(template_text, context1, context2)
\end{verbatim}
The context arguments (if any) are supplied to the constructor as a
tuple of contexts. We can then iterate over the \texttt{contexts} tuple,
dealing with each of them in turn. We simply create one combined
dictionary called \texttt{self.context} which has the contents of all of
the supplied contexts. If duplicate names are provided in the contexts,
the last one wins.
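The merging behavior can be tried in isolation. The sketch below pulls
the same loop out into a standalone function; the name
\texttt{merge\_contexts} and the sample dictionaries are hypothetical:

```python
def merge_contexts(*contexts):
    """Combine any number of context dicts; later ones override earlier ones."""
    merged = {}
    for context in contexts:
        merged.update(context)
    return merged


defaults = {'upper': str.upper, 'site': 'Example'}
overrides = {'site': 'Shop'}
print(merge_contexts())                              # {}
print(merge_contexts(defaults, overrides)['site'])   # Shop
```

Calling it with no arguments gives an empty dictionary, which matches the
first of the constructor calls shown above.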
To make our compiled function as fast as possible, we extract context
variables into Python locals. We'll get those names by keeping a set of
variable names we encounter, but we also need to track the names of
variables defined in the template, the loop variables:
\begin{verbatim}
        self.all_vars = set()
        self.loop_vars = set()
\end{verbatim}
Later we'll see how these get used to help construct the prologue of our
function. First, we'll use the CodeBuilder class we wrote earlier to
start to build our compiled function:
\begin{verbatim}
        code = CodeBuilder()

        code.add_line("def render_function(context, do_dots):")
        code.indent()
        vars_code = code.add_section()
        code.add_line("result = []")
        code.add_line("append_result = result.append")
        code.add_line("extend_result = result.extend")
        code.add_line("to_str = str")
\end{verbatim}
Here we construct our CodeBuilder object, and start writing lines into
it. Our Python function will be called \texttt{render\_function}, and
will take two arguments: \texttt{context} is the data dictionary it
should use, and \texttt{do\_dots} is a function implementing dot
attribute access.
The context here is the combination of the data context passed to the
Templite constructor, and the data context passed to the render
function. It's the complete set of data available to the template that
we made in the Templite constructor.
Notice that CodeBuilder is very simple: it doesn't ``know'' about
function definitions, just lines of code. This keeps CodeBuilder simple,
both in its implementation, and in its use. We can read our generated
code here without having to mentally interpolate too many specialized
CodeBuilder methods.
We create a section called \texttt{vars\_code}. Later we'll write the
variable extraction lines into that section. The \texttt{vars\_code}
object lets us save a place in the function that can be filled in later
when we have the information we need.
Then four fixed lines are written, defining a result list, shortcuts for
the methods to append to or extend that list, and a shortcut for the
\texttt{str()} builtin. As we discussed earlier, this odd step squeezes
just a little bit more performance out of our rendering function.
The reason we have both the \texttt{append} and the \texttt{extend}
shortcut is so we can use the most effective method, depending on
whether we have one line to add to our result, or more than one.
Next we define an inner function to help us with buffering output
strings:
\begin{verbatim}
        buffered = []
        def flush_output():
            """Force `buffered` to the code builder."""
            if len(buffered) == 1:
                code.add_line("append_result(%s)" % buffered[0])
            elif len(buffered) > 1:
                code.add_line("extend_result([%s])" % ", ".join(buffered))
            del buffered[:]
\end{verbatim}
As we create chunks of output that need to go into our compiled
function, we need to turn them into function calls that append to our
result. We'd like to combine repeated append calls into one extend call.
This is another micro-optimization. To make this possible, we buffer the
chunks.
The \texttt{buffered} list holds strings that are yet to be written to
our function source code. As our template compilation proceeds, we'll
append strings to \texttt{buffered}, and flush them to the function
source when we reach control flow points, such as \texttt{if}
statements or the beginnings and ends of loops.
The \texttt{flush\_output} function is a \emph{closure}, which is a
fancy word for a function that refers to variables outside of itself.
Here \texttt{flush\_output} refers to \texttt{buffered} and
\texttt{code}. This simplifies our calls to the function: we don't have
to tell \texttt{flush\_output} what buffer to flush, or where to flush
it; it knows all that implicitly.
If only one string has been buffered, then the \texttt{append\_result}
shortcut is used to append it to the result. If more than one is
buffered, then the \texttt{extend\_result} shortcut is used, with all of
them, to add them to the result. Then the buffered list is cleared so
more strings can be buffered.
The rest of the compiling code will add output to the function by
appending source strings to \texttt{buffered}, and will eventually call
\texttt{flush\_output} to write them to the CodeBuilder.
With this function in place, we can have a line of code in our compiler
like this:
\begin{verbatim}
buffered.append("'hello'")
\end{verbatim}
which will mean that our compiled Python function will have this line:
\begin{verbatim}
append_result('hello')
\end{verbatim}
which will add the string \texttt{hello} to the rendered output of the
template. We have multiple levels of abstraction here, which can be
difficult to keep straight. The compiler uses
\texttt{buffered.append("'hello'")}, which creates
\texttt{append\_result('hello')} in the compiled Python function, which,
when run, appends \texttt{hello} to the template result.
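To watch the buffering work in isolation, we can run
\texttt{flush\_output} against a stand-in for the code builder. The
\texttt{LineCollector} class below is a hypothetical substitute for
CodeBuilder that only records the lines it is given, and
\texttt{c\_name} is just text inside a string here, never evaluated.
\begin{verbatim}
class LineCollector:
    """Hypothetical stand-in for CodeBuilder: it only records lines."""
    def __init__(self):
        self.lines = []

    def add_line(self, line):
        self.lines.append(line)

code = LineCollector()
buffered = []

def flush_output():
    """Force `buffered` to the code builder."""
    if len(buffered) == 1:
        code.add_line("append_result(%s)" % buffered[0])
    elif len(buffered) > 1:
        code.add_line("extend_result([%s])" % ", ".join(buffered))
    del buffered[:]

# One buffered chunk becomes an append call.
buffered.append("'hello'")
flush_output()

# Several buffered chunks become a single extend call.
buffered.append("'Hi, '")
buffered.append("to_str(c_name)")
flush_output()

# Flushing an empty buffer writes nothing.
flush_output()

generated = code.lines
\end{verbatim}
After this runs, \texttt{generated} holds two lines of source:
\texttt{append\_result('hello')} and
\texttt{extend\_result(['Hi, ', to\_str(c\_name)])}.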
Back to our Templite class. As we parse control structures, we want to
check that they are properly nested. The \texttt{ops\_stack} list is a
stack of strings:
\begin{verbatim}
ops_stack = []
\end{verbatim}
When we encounter an \texttt{\{\% if .. \%\}} tag (for example), we'll
push \texttt{'if'} onto the stack. When we find an
\texttt{\{\% endif \%\}} tag, we can pop the stack and report an error
if there was no \texttt{'if'} at the top of the stack.
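The stack discipline can be sketched on its own, apart from the
compiler. The helper \texttt{check\_nesting} below is hypothetical: it
takes just the first word of each tag and returns an error message
instead of raising, and its final check for tags left open at the end
is an assumption about how a complete checker would finish.
\begin{verbatim}
def check_nesting(tag_words):
    """Return None if tags nest properly, else an error message.

    `tag_words` is a list such as ['if', 'for', 'endfor', 'endif'].
    """
    ops_stack = []
    for word in tag_words:
        if word.startswith('end'):
            if not ops_stack:
                return "Too many ends: %r" % word
            start_what = ops_stack.pop()
            if start_what != word[3:]:
                return "Mismatched end tag: %r" % word
        else:
            # An opening tag: remember it until its end tag arrives.
            ops_stack.append(word)
    if ops_stack:
        # Assumed final check: something was opened but never closed.
        return "Unmatched action tag: %r" % ops_stack[-1]
    return None

ok = check_nesting(['if', 'for', 'endfor', 'endif'])
crossed = check_nesting(['if', 'for', 'endif', 'endfor'])
\end{verbatim}
Here \texttt{ok} is \texttt{None}, while \texttt{crossed} reports a
mismatched end tag, because \texttt{'for'} is on top of the stack when
\texttt{endif} arrives.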
Now the real parsing begins. We split the template text into a number of
tokens using a regular expression, or \emph{regex}. Regexes can be
daunting: they are a very compact notation for complex pattern matching.
They are also very efficient, since the complexity of matching the
pattern is implemented in C in the regular expression engine, rather
than in your own Python code. Here's our regex:
\begin{verbatim}
tokens = re.split(r"(?s)({{.*?}}|{%.*?%}|{#.*?#})", text)
\end{verbatim}
This looks complicated; let's break it down.
The \texttt{re.split} function will split a string using a regex. Our
pattern matches our tag syntaxes, so the string will be split at the
tags. Because the pattern is parenthesized, the matched tags are not
discarded: they are also returned as pieces in the split list.
The \texttt{(?s)} flag in the regex means that a dot should match even a
newline. Next we have our parenthesized group of three alternatives:
\texttt{\{\{.*?\}\}} matches an expression, \texttt{\{\%.*?\%\}} matches
a tag, and \texttt{\{\#.*?\#\}} matches a comment. In all of these, we
use \texttt{.*?} to match any number of characters, but the shortest
sequence that matches.
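Each of these three ingredients can be checked by changing it and
watching the result. The short strings below are made up for
illustration; only the first pattern is the one we actually use.
\begin{verbatim}
import re

text = "a{{x}}b{{y}}c"

# Capturing group (ours): the tags are kept in the result.
with_group = re.split(r"(?s)({{.*?}}|{%.*?%}|{#.*?#})", text)

# Non-capturing group: the tags act as separators and are dropped.
without_group = re.split(r"(?s)(?:{{.*?}}|{%.*?%}|{#.*?#})", text)

# Greedy .* instead of .*?: the first {{ pairs with the last }}.
greedy = re.split(r"(?s)({{.*}})", text)

# Thanks to (?s), a tag may span a line break.
multiline = re.split(r"(?s)({{.*?}}|{%.*?%}|{#.*?#})", "a{# one\ntwo #}b")
\end{verbatim}
With the capturing group we get
\texttt{['a', '\{\{x\}\}', 'b', '\{\{y\}\}', 'c']}; without it, just
\texttt{['a', 'b', 'c']}. The greedy version swallows everything from
the first opening braces to the last closing braces as one piece, which
is why we need \texttt{.*?}.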
The result of \texttt{re.split} is a list of strings. For example, this
template text:
\begin{verbatim}
<p>Topics for {{name}}: {% for t in topics %}{{t}}, {% endfor %}</p>
\end{verbatim}
would be split into these pieces:
\begin{verbatim}
[
'<p>Topics for ', # literal
'{{name}}', # expression
': ', # literal
'{% for t in topics %}', # tag
'', # literal (empty)
'{{t}}', # expression
', ', # literal
'{% endfor %}', # tag
'</p>' # literal
]
\end{verbatim}
Once the text is split into tokens like this, we can loop over the
tokens, and deal with each in turn. By splitting them according to their
type, we can handle each type separately.
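As a quick check of the split and of the idea of sorting tokens by
type, the sketch below runs the regex on the example template and
labels each piece. The \texttt{kind} helper is hypothetical and exists
only for this illustration.
\begin{verbatim}
import re

template = (
    "<p>Topics for {{name}}: "
    "{% for t in topics %}{{t}}, {% endfor %}</p>"
)
tokens = re.split(r"(?s)({{.*?}}|{%.*?%}|{#.*?#})", template)

def kind(token):
    """Classify a token by its first two characters."""
    if token.startswith('{#'):
        return 'comment'
    elif token.startswith('{{'):
        return 'expression'
    elif token.startswith('{%'):
        return 'tag'
    else:
        return 'literal'

kinds = [kind(token) for token in tokens]
\end{verbatim}
The nine tokens alternate between literal text and tag syntax, and the
fifth one is the empty literal between the \texttt{for} tag and the
\texttt{\{\{t\}\}} expression.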
The compilation code is a loop over these tokens:
\begin{verbatim}
for token in tokens:
\end{verbatim}
Each token is examined to see which of the four cases it is. Just
looking at the first two characters is enough. The first case is a
comment, which is easy to handle: just ignore it and move on to the next
token:
\begin{verbatim}
    if token.startswith('{#'):
        # Comment: ignore it and move on.
        continue
\end{verbatim}
For the case of \texttt{\{\{...\}\}} expressions, we cut off the two
braces at the front and back, strip off the white space, and pass the
entire expression to \texttt{\_expr\_code}:
\begin{verbatim}
    elif token.startswith('{{'):
        # An expression to evaluate.
        expr = self._expr_code(token[2:-2].strip())
        buffered.append("to_str(%s)" % expr)
\end{verbatim}
The \texttt{\_expr\_code} method will compile the template expression
into a Python expression. We'll see that method later. We use the
\texttt{to\_str} function to force the expression's value to be a
string, and add that to our result.
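The conversion matters because the pieces in the result list eventually
have to be joined into one string. The sketch below assumes the
compiled function finishes with \texttt{''.join(result)}; the variable
\texttt{c\_count} and its value are invented for illustration.
\begin{verbatim}
result = []
append_result = result.append
to_str = str

c_count = 3     # hypothetical non-string value from the context
append_result("You have ")
append_result(to_str(c_count))    # converted, so the join will work
append_result(" messages")
rendered = "".join(result)        # assumed final step of rendering

# The same join without the conversion raises TypeError.
try:
    "".join(["You have ", c_count, " messages"])
    join_failed = False
except TypeError:
    join_failed = True
\end{verbatim}
Here \texttt{rendered} is \texttt{You have 3 messages}, and
\texttt{join\_failed} is \texttt{True} because \texttt{str.join}
accepts only strings.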
The third case is the big one: \texttt{\{\% ... \%\}} tags. These are
control structures that will become Python control structures. First we
have to flush our buffered output lines, then we extract a list of words
from the tag:
\begin{verbatim}
    elif token.startswith('{%'):
        # Action tag: split into words and parse further.
        flush_output()
        words = token[2:-2].strip().split()
\end{verbatim}
Now we have three sub-cases, based on the first word in the tag:
\texttt{if}, \texttt{for}, or \texttt{end}. The \texttt{if} case shows
our simple error handling and code generation:
\begin{verbatim}
        if words[0] == 'if':
            # An if statement: evaluate the expression to determine if.
            if len(words) != 2:
                self._syntax_error("Don't understand if", token)
            ops_stack.append('if')
            code.add_line("if %s:" % self._expr_code(words[1]))
            code.indent()
\end{verbatim}
The \texttt{if} tag should have a single expression, so the
\texttt{words} list should have only two elements in it. If it doesn't,
we use the \texttt{\_syntax\_error} helper method to raise a syntax
error exception. We push \texttt{'if'} onto \texttt{ops\_stack} so that
we can check the \texttt{endif} tag. The expression part of the
\texttt{if} tag is compiled to a Python expression with
\texttt{\_expr\_code}, and is used as the conditional expression in a
Python \texttt{if} statement.
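To see the source this produces, we can replay the steps by hand. In
the sketch below, \texttt{TinyBuilder} is a hypothetical, simplified
stand-in for CodeBuilder, and \texttt{expr\_code} is a stand-in for
\texttt{\_expr\_code} that handles only bare names. Neither is the real
implementation.
\begin{verbatim}
class TinyBuilder:
    """Hypothetical, simplified stand-in for CodeBuilder."""
    INDENT_STEP = 4

    def __init__(self):
        self.lines = []
        self.level = 0

    def add_line(self, line):
        self.lines.append(" " * self.level + line)

    def indent(self):
        self.level += self.INDENT_STEP

    def dedent(self):
        self.level -= self.INDENT_STEP

def expr_code(expr):
    """Stand-in for _expr_code: handles bare names only."""
    return "c_%s" % expr

code = TinyBuilder()
words = "{% if user %}"[2:-2].strip().split()    # ['if', 'user']
code.add_line("if %s:" % expr_code(words[1]))
code.indent()
code.add_line("append_result('Hello!')")   # body of the if
code.dedent()                              # what the end tag will do
code.add_line("append_result('Bye')")
source = "\n".join(code.lines)
\end{verbatim}
The resulting \texttt{source} is an \texttt{if c\_user:} line, an
indented \texttt{append\_result('Hello!')}, and then
\texttt{append\_result('Bye')} back at the outer level.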
The second tag type is \texttt{for}, which will be compiled to a Python
\texttt{for} statement:
\begin{verbatim}
        elif words[0] == 'for':
            # A loop: iterate over expression result.
            if len(words) != 4 or words[2] != 'in':
                self._syntax_error("Don't understand for", token)
            ops_stack.append('for')
            self._variable(words[1], self.loop_vars)
            code.add_line(
                "for c_%s in %s:" % (
                    words[1],
                    self._expr_code(words[3])
                )
            )
            code.indent()
\end{verbatim}
We do a check of the syntax and push \texttt{'for'} onto the stack. The
\texttt{\_variable} method checks the syntax of the variable, and adds
it to the set we provide. This is how we collect up the names of all the
variables during compilation. Later we'll need to write the prologue of
our function, where we'll unpack all the variable names we get from the
context. To do that correctly, we need to know the names of all the
variables we encountered, \texttt{self.all\_vars}, and the names of all
the variables defined by loops, \texttt{self.loop\_vars}.
We add one line to our function source, a \texttt{for} statement. All of
our template variables are turned into Python variables by prepending
\texttt{c\_} to them, so that we know they won't collide with other
names we're using in our Python function. We use \texttt{\_expr\_code}
to compile the iteration expression from the template into an iteration
expression in Python.
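Putting the pieces so far together, the function below is a
hand-written approximation of what we would expect the compiler to
produce for the ``Topics for'' template shown earlier. The real
generated source may differ in detail, and the context values are
invented for illustration.
\begin{verbatim}
def render_function(context, do_dots):
    # Prologue: unpack the context names that are not loop variables.
    c_name = context['name']
    c_topics = context['topics']
    result = []
    append_result = result.append
    extend_result = result.extend
    to_str = str
    extend_result(['<p>Topics for ', to_str(c_name), ': '])
    for c_t in c_topics:
        # c_t is defined by the loop, so it is not unpacked above.
        extend_result([to_str(c_t), ', '])
    append_result('</p>')
    return ''.join(result)

# do_dots is unused here because the template has no dotted names.
html = render_function(
    {'name': 'Ned', 'topics': ['Python', 'Juggling']}, None
)
\end{verbatim}
Here \texttt{html} is
\texttt{<p>Topics for Ned: Python, Juggling, </p>}. The loop variable
\texttt{t} becomes \texttt{c\_t} and is the one name the prologue does
not take from the context, which is why we track loop variables
separately.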
The last kind of tag we handle is an \texttt{end} tag; either
\texttt{\{\% endif \%\}} or \texttt{\{\% endfor \%\}}. The effect on our
compiled function source is the same: simply unindent to end the
\texttt{if} or \texttt{for} statement that was started earlier:
\begin{verbatim}
        elif words[0].startswith('end'):
            # Endsomething. Pop the ops stack.
            if len(words) != 1:
                self._syntax_error("Don't understand end", token)
            end_what = words[0][3:]
            if not ops_stack:
                self._syntax_error("Too many ends", token)
            start_what = ops_stack.pop()
            if start_what != end_what:
                self._syntax_error("Mismatched end tag", end_what)
            code.dedent()
\end{verbatim}
Notice here that the actual work needed for the end tag is one line:
unindent the function source. The rest of this clause is all error
checking to make sure that the template is properly formed. This isn't
unusual in program translation code.
Speaking of error handling, if the tag isn't an \texttt{if}, a
\texttt{for}, or an \texttt{end}, then we don't know what it is, so we
raise a syntax error: