-
Notifications
You must be signed in to change notification settings - Fork 6
/
sml97.txt
3191 lines (1595 loc) · 210 KB
/
sml97.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
The Definition of Standard ML
Revised 1996
August 13, 1996
Version: 7.4.0
Robin Milner, Mads Tofte, Robert Harper and David MacQueen
Prelude to the Revised Definition Standard ML is an industrial strength programming language, one of very few with a fully formal definition. The Definition of Standard ML was published in 1990. Since then the implementation technology of the language has advanced enormously, and its users have multiplied. The language and its Definition have therefore incited close scrutiny, evaluation, much approval, sometimes strong criticism.
The originators of the language have sifted this response, and found that there are inadequacies in the original language and its formal Definition. They are of three kinds: missing features which many users want; complex and little-used features which most users can do without; and mistakes of definition. What is remarkable is that these inadequacies are rather few, and that they are rather uncontroversial.
This new version of the Definition addresses the three kinds of inadequacy respectively by additions, subtractions and corrections. But we have only made such amendments when one or more aspects of SML - the language itself, its usage, its implementation, its formal Definition - have thus become simpler, without complicating the other aspects. It is worth noting that even the additions meet this criterion; for example we have introduced type abbreviations in signatures to simplify the use of the language, but the way we have done it has even simplified the Definition too. In fact, after our changes the formal Definition has fewer rules.
In the 1990 Definition it was predicted that further versions of the Definition would be produced as the language develops, with the intention to minimise the number of versions. This is the first revised version, and we foresee no others.
The shape of this new version of the Definition is as follows. The 1990 Definition, fully revised to deal with the amendments, has been sandwiched between this Prelude at the front, and a postlude (Appendix G) at the back which enumerates all the amendments which have been done, giving the rationale for each, and outlining the changes which it implies for the language and for the 1990 Definition.
Robin Milner Mads Tofte Robert Harper David MacQueen
July 1996
iii
Preface A precise description of a programming language is a prerequisite for its implementation and for its use. The description can take many forms, each suited to a different purpose. A common form is a reference manual, which is usually a careful narrative description of the meaning of each construction in the language, often backed up with a formal presentation of the grammar (for example, in Backus-Naur form). This gives the programmer enough understanding for many of his purposes. But it is ill-suited for use by an implementer, or by someone who wants to formulate laws for equivalence of programs, or by a programmer who wants to design programs with mathematical rigour.
This document is a formal description of both the grammar and the meaning of a language which is both designed for large projects and widely used. As such, it aims to serve the whole community of people seriously concerned with the language. At a time when it is increasingly understood that programs must withstand rigorous analysis, particular for systems where safety is critical, a rigorous language presentation is even important for negotiators and contractors; for a robust program written in an insecure language is like a house built upon sand.
Most people have not looked at a rigorous language presentation before. To help them particularly, but also to put the present work in perspective for those more theoretically prepared, it will be useful here to say something about three things: the nature of Standard ML, the task of language definition in general, and the form of the present Definition.
Standard ML Standard ML is a functional programming language, in the sense that the full power of mathematical functions is present. But it grew in response to a particular programming task, for which it was equipped also with full imperative power, and a sophisticated exception mechanism. It has an advanced form of parametric modules, aimed at organised development of large programs. Finally it is strongly typed, and it was the first language to provide a particular form of polymorphic type which makes the strong typing remarkably flexible. This combination of ingredients has not made it unduly large, but their novelty has been a fascinating challenge to semantic method (of which we say more below).
ML has evolved over twenty years as a fusion of many ideas from many people. This evolution is described in some detail in Appendix F of the book, where also we acknowledge all those who have contributed to it, both in design and in implementation.
`ML' stands for meta language; this is the term logicians use for a language in which other (formal or informal) languages are discussed and analysed. Originally ML was conceived as a medium for finding and performing proofs in a logical language. Conducting rigorous argument as dialogue between person and machine has been a strong research interest at Edinburgh and elsewhere, throughout these fourteen years. The difficulties are enormous, and make stern demands upon the programming language which is used for this dialogue. Those who are not familiar with computer-assisted reasoning may be surprised that a programming language, which was designed for this rather esoteric activity, should
iv
ever lay claim to being generally useful. On reflection, they should not be surprised. LISP is a prime example of a language invented for esoteric purposes and becoming widely used. LISP was invented for use in artificial intelligence (AI); the important thing about AI here is not that it is esoteric, but that it is difficult and varied; so much so, that anything which works well for it must work well for many other applications too.
The same can be said about the initial purpose of ML, but with a different emphasis. Rigorous proofs are complex things, which need varied and sophisticated presentation - particularly on the screen in interactive mode. Furthermore the proof methods, or strategies, involved are some of the most complex algorithms which we know. This all applies equally to AI, but one demand is made more strongly by proof than perhaps by any other application: the demand for rigour.
This demand established the character of ML. In order to be sure that, when the user and the computer claim to have together performed a rigorous argument, their claim is justified, it was seen that the language must be strongly typed. On the other hand, to be useful in a difficult application, the type system had to be rather flexible, and permit the machine to guide the user rather than impose a burden upon him. A reasonable solution was found, in which the machine helps the user significantly by inferring his types for him. Thereby the machine also confers complete reliability on his programs, in this sense: If a program claims that a certain result follows from the rules of reasoning which the user has supplied, then the claim may be fully trusted.
The principle of inferring useful structural information about programs is also represented, at the level of program modules, by the inference of signatures. Signatures describe the interfaces between modules, and are vital for robust large-scale programs. When the user combines modules, the signature discipline prevents him from mismatching their interfaces. By programming with interfaces and parametric modules, it becomes possible to focus on the structure of a large system, and to compile parts of it in isolation from one another - even when the system is incomplete.
This emphasis on types and signatures has had a profound effect on the language Definition. Over half this document is devoted to inferring types and signatures for programs. But the method used is exactly the same as for inferring what values a program delivers; indeed, a type or signature is the result of a kind of abstract evaluation of a program phrase.
In designing ML, the interplay among three activities - language design, definition and implementation - was extremely close. This was particularly true for the newest part, the parametric modules. This part of the language grew from an initial proposal by David MacQueen, itself highly developed; but both formal definition and implementation had a strong influence on the detailed design. In general, those who took part in the three activities cannot now imagine how they could have been properly done separately.
Language Definition Every programming language presents its own conceptual view of computation. This view is usually indicated by the names used for the phrase classes of the language, or by its
v
keywords: terms like package, module, structure, exception, channel, type, procedure, reference, sharing, . . . . These terms also have their abstract counterparts, which may be called semantic objects; these are what people really have in mind when they use the language, or discuss it, or think in it. Also, it is these objects, not the syntax, which represent the particular conceptual view of each language; they are the character of the language. Therefore a definition of the language must be in terms of these objects.
As is commonly done in programming language semantics, we shall loosely talk of these semantic objects as meanings. Of course, it is perfectly possible to understand the semantic theory of a language, and yet be unable to understand the meaning of a particular program, in the sense of its intention or purpose. The aim of a language definition is not to formalise everything which could possibly be called the meaning of a program, but to establish a theory of semantic objects upon which the understanding of particular programs may rest.
The job of a language-definer is twofold. First - as we have already suggested - he must create a world of meanings appropriate for the language, and must find a way of saying what these meanings precisely are. Here, he meets a problem; notation of some kind must be used to denote and describe these meanings - but not a programming language notation, unless he is passing the buck and defining one programming language in terms of another. Given a concern for rigour, mathematical notation is an obvious choice. Moreover, it is not enough just to write down mathematical definitions. The world of meanings only becomes meaningful if the objects possess nice properties, which make them tractable. So the language-definer really has to develop a small theory of his meanings, in the same way that a mathematician develops a theory. Typically, after initially defining some objects, the mathematician goes on to verify properties which indicate that they are objects worth studying. It is this part, a kind of scene-setting, which the language-definer shares with the mathematician. Of course he can take many objects and their theories directly from mathematics, such as functions, relations, trees, sequences, . . . . But he must also give some special theory for the objects which make his language particular, as we do for types, structures and signatures in this book; otherwise his language definition may be formal but will give no insight.
The second part of the definer's job is to define evaluation precisely. This means that he must define at least what meaning, M , results from evaluating any phrase P of his language (though he need not explain exactly how the meaning results; that is he need not give the full detail of every computation). This part of his job must be formal to some extent, if only because the phrases P of his language are indeed formal objects. But there is another reason for formality. The task is complex and error-prone, and therefore demands a high level of explicit organisation (which is, largely, the meaning of `formality'); moreover, it will be used to specify an equally complex, error-prone and formal construction: an implementation.
We shall now explain the keystone of our semantic method. First, we need a slight but important refinement. A phrase P is never evaluated in vacuo to a meaning M , but always against a background; this background - call it B - is itself a semantic object, being a distillation of the meanings preserved from evaluation of earlier phrases (typically variable
vi
declarations, procedure declarations, etc.). In fact evaluation is background-dependent - M depends upon B as well as upon P .
The keystone of the method, then, is a certain kind of assertion about evaluation; it takes the form
B ` P ) M
and may be pronounced: `Against the background B, the phrase P evaluates to the meaning M '. The formal purpose of this Definition is no more, and no less, than to decree exactly which assertions of this form are true. This could be achieved in many ways. We have chosen to do it in a structured way, as others have, by giving rules which allow assertions about a compound phrase P to be inferred from assertions about its constituent phrases P1; . . . ; Pn.
The form of the Definition We have written the Definition in a form suggested by the previous remarks. That is, we have defined our semantic objects in mathematical notation which is completely independent of Standard ML, and we have developed just enough of their theory to give sense to our rules of evaluation. Following another suggestion above, we have factored our task by describing abstract evaluation - the inference and checking of types and signatures (which can be done at compile-time) - completely separately from concrete evaluation. It really is a factorisation, because a full value in all its glory - you can think of it as a concrete object with a type attached - never has to be presented.
The resulting document is, we hope, valuable as the essential point of reference for Standard ML. If it is to play this role well, it must be supplemented by other literature. Many expository books have already been written, and this Definition will be useful as a background reference for their readers. We became convinced, while writing the 1990 Definition, that we could not discuss many questions without making it far too long. Such questions are: Why were certain design choices made? What are their implications for programming? Was there a good alternative meaning for some constructs, or was our hand forced? What different forms of phrase are equivalent? What is the proof of certain claims? Many of these questions are not answered by pedagogic texts either. We therefore wrote a Commentary on the 1990 Definition to assist people in reading it, and to serve as a bridge between the Definition and other texts. Though in part outdated by the present revision, the Commentary still fulfils its purpose.
There exist several textbooks on programming with Standard ML[44,43,55,49]. The second edition of Pauson's book[44] conforms with the present revision.
vii
Contents 1 Introduction 1 2 Syntax of the Core 3
2.1 Reserved Words : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 3 2.2 Special constants : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 3 2.3 Comments : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 4 2.4 Identifiers : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 4 2.5 Lexical analysis : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 5 2.6 Infixed operators : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 6 2.7 Derived Forms : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 7 2.8 Grammar : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 7 2.9 Syntactic Restrictions : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 8
3 Syntax of Modules 11
3.1 Reserved Words : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 11 3.2 Identifiers : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 11 3.3 Infixed operators : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 11 3.4 Grammar for Modules : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 11 3.5 Syntactic Restrictions : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 12
4 Static Semantics for the Core 15
4.1 Simple Objects : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 15 4.2 Compound Objects : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 15 4.3 Projection, Injection and Modification : : : : : : : : : : : : : : : : : : : : 17 4.4 Types and Type functions : : : : : : : : : : : : : : : : : : : : : : : : : : : 17 4.5 Type Schemes : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 18 4.6 Scope of Explicit Type Variables : : : : : : : : : : : : : : : : : : : : : : : 18 4.7 Non-expansive Expressions : : : : : : : : : : : : : : : : : : : : : : : : : : : 19 4.8 Closure : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 19 4.9 Type Structures and Type Environments : : : : : : : : : : : : : : : : : : : 20 4.10 Inference Rules : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 20 4.11 Further Restrictions : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 27
5 Static Semantics for Modules 29
5.1 Semantic Objects : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 29 5.2 Type Realisation : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 29 5.3 Signature Instantiation : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 30 5.4 Functor Signature Instantiation : : : : : : : : : : : : : : : : : : : : : : : : 30 5.5 Enrichment : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 30 5.6 Signature Matching : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 30 5.7 Inference Rules : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 31
viii
6 Dynamic Semantics for the Core 37
6.1 Reduced Syntax : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 37 6.2 Simple Objects : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 37 6.3 Compound Objects : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 37 6.4 Basic Values : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 38 6.5 Basic Exceptions : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 39 6.6 Function Closures : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 39 6.7 Inference Rules : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 40
7 Dynamic Semantics for Modules 48
7.1 Reduced Syntax : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 48 7.2 Compound Objects : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 48 7.3 Inference Rules : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 49
8 Programs 54 A Appendix: Derived Forms 56 B Appendix: Full Grammar 61 C Appendix: The Initial Static Basis 66 D Appendix: The Initial Dynamic Basis 67 E Overloading 68
E.1 Overloaded special constants : : : : : : : : : : : : : : : : : : : : : : : : : : 68 E.2 Overloaded value identifiers : : : : : : : : : : : : : : : : : : : : : : : : : : 69
F Appendix: The Development of ML 70 G Appendix: What is New? 77
G.1 Type Abbreviations in Signatures : : : : : : : : : : : : : : : : : : : : : : : 77 G.2 Opaque Signature Matching : : : : : : : : : : : : : : : : : : : : : : : : : : 78 G.3 Sharing : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 79
G.3.1 Type Sharing : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 79 G.3.2 The equality attribute of specified types : : : : : : : : : : : : : : : 80 G.3.3 Structure Sharing : : : : : : : : : : : : : : : : : : : : : : : : : : : : 81 G.4 Value Polymorphism : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 82 G.5 Identifier Status : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 82 G.6 Replication of Datatypes : : : : : : : : : : : : : : : : : : : : : : : : : : : : 83 G.7 Local Datatypes : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 84 G.8 Principal Environments : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 85 G.9 Consistency and Admissibility : : : : : : : : : : : : : : : : : : : : : : : : : 86 G.10 Special Constants : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 86 G.11 Comments : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 87
ix
G.12 Infixed Operators : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 87 G.13 Non-expansive Expressions : : : : : : : : : : : : : : : : : : : : : : : : : : : 87 G.14 Rebinding of built-in identifiers : : : : : : : : : : : : : : : : : : : : : : : : 87 G.15 Grammar for Modules : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 87 G.16 Closure Restrictions : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 87 G.17 Specifications : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 87 G.18 Scope of Explicit Type Variables : : : : : : : : : : : : : : : : : : : : : : : 88 G.19 The Initial Basis : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 88 G.20 Overloading : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 88
References 89 Index 94
x
1 1 Introduction This document formally defines Standard ML.
To understand the method of definition, at least in broad terms, it helps to consider how an implementation of ML is naturally organised. ML is an interactive language, and a program consists of a sequence of top-level declarations; the execution of each declaration modifies the top-level environment, which we call a basis, and reports the modification to the user.
In the execution of a declaration there are three phases: parsing, elaboration, and evaluation. Parsing determines the grammatical form of a declaration. Elaboration, the static phase, determines whether it is well-typed and well-formed in other ways, and records relevant type or form information in the basis. Finally evaluation, the dynamic phase, determines the value of the declaration and records relevant value information in the basis. Corresponding to these phases, our formal definition divides into three parts: grammatical rules, elaboration rules, and evaluation rules. Furthermore, the basis is divided into the static basis and the dynamic basis; for example, a variable which has been declared is associated with a type in the static basis and with a value in the dynamic basis.
In an implementation, the basis need not be so divided. But for the purpose of formal definition, it eases presentation and understanding to keep the static and dynamic parts of the basis separate. This is further justified by programming experience. A large proportion of errors in ML programs are discovered during elaboration, and identified as errors of type or form, so it follows that it is useful to perform the elaboration phase separately. In fact, elaboration without evaluation is part of what is normally called compilation; once a declaration (or larger entity) is compiled one wishes to evaluate it - repeatedly - without re-elaboration, from which it follows that it is useful to perform the evaluation phase separately.
A further factoring of the formal definition is possible, because of the structure of the language. ML consists of a lower level called the Core language (or Core for short), a middle level concerned with programming-in-the-large called Modules, and a very small upper level called Programs. With the three phases described above, there is therefore a possibility of nine components in the complete language definition. We have allotted one section to each of these components, except that we have combined the parsing, elaboration and evaluation of Programs in one section. The scheme for the ensuing seven sections is therefore as follows:
Core Modules Programs Syntax Section 2 Section 3 Static Semantics Section 4 Section 5 Section 8 Dynamic Semantics Section 6 Section 7
The Core provides many phrase classes, for programming convenience. But about half of these classes are derived forms, whose meaning can be given by translation into the other half which we call the Bare language. Thus each of the three parts for the
2 1 INTRODUCTION Core treats only the bare language; the derived forms are treated in Appendix A. This appendix also contains a few derived forms for Modules. A full grammar for the language is presented in Appendix B.
In Appendices C and D the initial basis is detailed. This basis, divided into its static and dynamic parts, contains the static and dynamic meanings of a small set of predefined identifiers. A richer basis is defined in a separate document[17].
The semantics is presented in a form known as Natural Semantics. It consists of a set of rules allowing sentences of the form
A ` phrase ) A0 to be inferred, where A is often a basis (static or dynamic) and A0 a semantic object - often a type in the static semantics and a value in the dynamic semantics. One should read such a sentence as follows: "against the background provided by A, the phrase phrase elaborates - or evaluates - to the object A0". Although the rules themselves are formal the semantic objects, particularly the static ones, are the subject of a mathematical theory which is presented in a succinct form in the relevant sections.
The robustness of the semantics depends upon theorems. Usually these have been proven, but the proof is not included.
3 2 Syntax of the Core 2.1 Reserved Words The following are the reserved words used in the Core. They may not (except = ) be used as identifiers.
abstype and andalso as case datatype do else end exception fn fun handle if in infix infixr let local nonfix of op open orelse raise rec then type val with withtype while ( ) [ ] - "" , : ; ... . -- = =? -? #
2.2 Special constants An integer constant (in decimal notation) is an optional negation symbol (~) followed by a non-empty sequence of decimal digits (0-9). An integer constant (in hexadecimal notation) is an optional negation symbol followed by 0x followed by a non-empty sequence of hexadecimal digits (0-9a-fA-F, where A-F are alternatives for a-f, respectively).
A word constant (in decimal notation) is 0w followed by a non-empty sequence of decimal digits. A word constant (in hexadecimal notation) is 0wx followed by a non-empty sequence of hexadecimal digits. A real constant is an integer constant in decimal notation, possibly followed by a point (.) and one or more decimal digits, possibly followed by an exponent symbol (E or e) and an integer constant in decimal notation; at least one of the optional parts must occur, hence no integer constant is a real constant. Examples: 0.7 3.32E5 3E~7 . Non-examples: 23 .3 4.E5 1E2.0 .
We assume an underlying alphabet of N characters (N * 256), numbered 0 to N \Gamma 1, which agrees with the ASCII character set on the characters numbered 0 to 127. The interval [0; N \Gamma 1] is called the ordinal range of the alphabet. A string constant is a sequence, between quotes ("), of zero or more printable characters (i.e., numbered 33- 126), spaces or escape sequences. Each escape sequence starts with the escape character " , and stands for a character sequence. The escape sequences are:
"a A single character interpreted by the system as alert (ASCII 7) "b Backspace (ASCII 8) "t Horizontal tab (ASCII 9) "n Linefeed, also known as newline (ASCII 10) "v Vertical tab (ASCII 11) "f Form feed (ASCII 12) "r Carriage return (ASCII 13) "^c The control character c, where c may be any character with number
64-95. The number of "^c is 64 less than the number of c. "ddd The single character with number ddd (3 decimal digits denoting
an integer in the ordinal range of the alphabet).
4 2 SYNTAX OF THE CORE
"uxxxx The single character with number xxxx (4 hexadecimal digits de
noting an integer in the ordinal range of the alphabet). "" " "" " "f \Delta \Delta f " This sequence is ignored, where f \Delta \Delta f stands for a sequence of one
or more formatting characters.
The formatting characters are a subset of the non-printable characters including at least space, tab, newline, formfeed. The last form allows long strings to be written on more than one line, by writing " at the end of one line and at the start of the next.
A character constant is a sequence of the form #s, where s is a string constant denoting a string of size one character.
Libraries may provide multiple numeric types and multiple string types. To each string type corresponds an alphabet with ordinal range [0; N \Gamma 1] for some N * 256; each alphabet must agree with the ASCII character set on the characters numbered 0 to 127. When multiple alphabets are supported, all characters of a given string constant are interpreted over the same alphabet. For each special constant, overloading resolution is used for determining the type of the constant (see Appendix E).
We denote by SCon the class of special constants, i.e., the integer, real, word, character and string constants; we shall use scon to range over SCon.
2.3 Comments A comment is any character sequence within comment brackets (* *) in which comment brackets are properly nested. No space is allowed between the two characters which make up a comment bracket (* or *). An unmatched (* should be detected by the compiler.
2.4 Identifiers The classes of identifiers for the Core are shown in Figure 1. We use vid , tyvar to range over VId, TyVar etc. For each class X marked "long" there is a class longX of long identifiers; if x ranges over X then longx ranges over longX. The syntax of these long identifiers is given by the following:
longx ::= x identifier
strid 1:\Delta \Delta \Delta :strid n:x qualified identifier (n * 1)
VId (value identifiers ) long TyVar (type variables ) TyCon (type constructors ) long Lab (record labels ) StrId (structure identifiers ) long
Figure 1: Identifiers
2.5 Lexical analysis 5 The qualified identifiers constitute a link between the Core and the Modules. Throughout this document, the term "identifier", occurring without an adjective, refers to nonqualified identifiers only.
An identifier is either alphanumeric: any sequence of letters, digits, primes (') and underbars ( ) starting with a letter or prime, or symbolic: any non-empty sequence of the following symbols
! % & $ # + - / : ! = ? ? @ " ~ ` ^ -- * In either case, however, reserved words are excluded. This means that for example # and -- are not identifiers, but ## and --=-- are identifiers. The only exception to this rule is that the symbol = , which is a reserved word, is also allowed as an identifier to stand for the equality predicate. The identifier = may not be re-bound; this precludes any syntactic ambiguity.
A type variable tyvar may be any alphanumeric identifier starting with a prime; the subclass EtyVar of TyVar, the equality type variables, consists of those which start with two or more primes. The other four classes (VId, TyCon, Lab and StrId) are represented by identifiers not starting with a prime. However, * is excluded from TyCon, to avoid confusion with the derived form of tuple type (see Figure 23). The class Lab is extended to include the numeric labels 1 2 3 \Delta \Delta \Delta , i.e. any numeral not starting with 0.
TyVar is therefore disjoint from the other four classes. Otherwise, the syntax class of an occurrence of identifier id in a Core phrase (ignoring derived forms, Section 2.7) is determined thus:
1. Immediately before "." - i.e. in a long identifier - or in an open declaration, id is
a structure identifier. The following rules assume that all occurrences of structure identifiers have been removed.
2. At the start of a component in a record type, record pattern or record expression,
id is a record label.
3. Elsewhere in types id is a type constructor, and must be within the scope of the
type binding or datatype binding which introduced it.
4. Elsewhere, id is a value identifier.
By means of the above rules a compiler can determine the class to which each identifier occurrence belongs; for the remainder of this document we shall therefore assume that the classes are all disjoint.
2.5 Lexical analysis Each item of lexical analysis is either a reserved word, a numeric label, a special constant or a long identifier. Comments and formatting characters separate items (except within string constants; see Section 2.2) and are otherwise ignored. At each stage the longest next item is taken.
6 2 SYNTAX OF THE CORE 2.6 Infixed operators An identifier may be given infix status by the infix or infixr directive, which may occur as a declaration; this status only pertains to its use as a vid within the scope (see below) of the directive, and in these uses it is called an infixed operator. (Note that qualified identifiers never have infix status.) If vid has infix status, then "exp1 vid exp2" (resp. "pat1 vid pat2") may occur - in parentheses if necessary - wherever the application "vid -1=exp1,2=exp2""" or its derived form "vid (exp1,exp2)" (resp "vid (pat1,pat2)") would otherwise occur. On the other hand, an occurrence of any long identifier (qualified or not) prefixed by op is treated as non-infixed. The only required use of op is in prefixing a non-infixed occurrence of an identifier vid which has infix status; elsewhere op, where permitted, has no effect. Infix status is cancelled by the nonfix directive. We refer to the three directives collectively as fixity directives.
The form of the fixity directives is as follows (n * 1):
infix hdi vid 1 \Delta \Delta \Delta vid n infixr hdi vid 1 \Delta \Delta \Delta vid n
nonfix vid 1 \Delta \Delta \Delta vid n where hdi is an optional decimal digit d indicating binding precedence. A higher value of d indicates tighter binding; the default is 0. infix and infixr dictate left and right associativity respectively. In an expression of the form exp1 vid 1 exp2 vid 2 exp3, where vid 1 and vid 2 are infixed operators with the same precedence, either both must associate to the left or both must associate to the right. For example, suppose that !! and ?? have equal precedence, but associate to the left and right respectively; then
x !! y !! z parses as (x !! y) !! z x ?? y ?? z parses as x ?? (y ?? z) x !! y ?? z is illegal x ?? y !! z is illegal
The precedence of infixed operators relative to other expression and pattern constructions is given in Appendix B.
The scope of a fixity directive dir is the ensuing program text, except that if dir occurs in a declaration dec in either of the phrases
let dec in \Delta \Delta \Delta end local dec in \Delta \Delta \Delta end then the scope of dir does not extend beyond the phrase. Further scope limitations are imposed for Modules.
These directives and op are omitted from the semantic rules, since they affect only parsing.
2.7 Derived Forms 7
AtExp atomic expressions ExpRow expression rows Exp expressions Match matches Mrule match rules
Dec declarations ValBind value bindings TypBind type bindings DatBind datatype bindings ConBind constructor bindings ExBind exception bindings
AtPat atomic patterns PatRow pattern rows Pat patterns
Ty type expressions TyRow type-expression rows
Figure 2: Core Phrase Classes
2.7 Derived Forms There are many standard syntactic forms in ML whose meaning can be expressed in terms of a smaller number of syntactic forms, called the bare language. These derived forms, and their equivalent forms in the bare language, are given in Appendix A.
2.8 Grammar The phrase classes for the Core are shown in Figure 2. We use the variable atexp to range over AtExp, etc. The grammatical rules for the Core are shown in Figures 3 and 4.
The following conventions are adopted in presenting the grammatical rules, and in their interpretation:
ffl The brackets h i enclose optional phrases. ffl For any syntax class X (over which x ranges) we define the syntax class Xseq (over
which xseq ranges) as follows:
xseq ::= x (singleton sequence)
(empty sequence) (x1,\Delta \Delta \Delta ,xn) (sequence, n * 1)
(Note that the "\Delta \Delta \Delta " used here, meaning syntactic iteration, must not be confused with "..." which is a reserved word of the language.)
8 2 SYNTAX OF THE CORE
ffl Alternative forms for each phrase class are in order of decreasing precedence; this
resolves ambiguity in parsing, as explained in Appendix B.
ffl L (resp. R) means left (resp. right) association. ffl The syntax of types binds more tightly than that of expressions. ffl Each iterated construct (e.g. match, \Delta \Delta \Delta ) extends as far right as possible; thus,
parentheses may be needed around an expression which terminates with a match, e.g. "fn match", if this occurs within a larger match.
atpat ::= wildcard
scon special constant hopilongvid value identifier - hpatrow i "" record ( pat )
patrow ::= ... wildcard
lab = pat h , patrowi pattern row
pat ::= atpat atomic
hopilongvid atpat constructed pattern pat1 vid pat2 infixed value construction pat : ty typed hopividh: tyi as pat layered
ty ::= tyvar type variable
- htyrow i "" record type expression tyseq longtycon type construction ty -? ty0 function type expression (R) ( ty )
tyrow ::= lab : ty h , tyrowi type-expression row
Figure 3: Grammar: Patterns and Type expressions
2.9 Syntactic Restrictions
ffl No expression row, pattern row or type-expression row may bind the same lab twice. ffl No binding valbind , typbind , datbind or exbind may bind the same identifier twice;
this applies also to value identifiers within a datbind .
ffl In the left side tyvarseq tycon of any typbind or datbind , tyvarseq must not contain
the same tyvar twice. Any tyvar occurring within the right side must occur in tyvarseq .
2.9 Syntactic Restrictions 9
ffl For each dec of the form datatype tyvarseq tycon = datatype tyvarseq 0 longtycon , the
sequences tyvarseq and tyvarseq0 must be equal and neither may contain the same type variable twice.
ffl For each value binding pat = exp within rec, exp must be of the form fn match,
possibly constrained by one or more type expressions. The derived form of functionvalue binding given in Appendix A, page 58, necessarily obeys this restriction.
ffl No datbind or exbind may bind true, false, it, nil, :: or ref.
10 2 SYNTAX OF THE CORE
atexp ::= scon special constant
hopilongvid value identifier - hexprow i "" record let dec in exp end local declaration ( exp )
exprow ::= lab = exp h , exprowi expression row exp ::= atexp atomic
exp atexp application (L) exp1 vid exp2 infixed application exp : ty typed (L) exp handle match handle exception raise exp raise exception fn match function
match ::= mrule h -- matchi mrule ::= pat =? exp dec ::= val tyvarseq valbind value declaration
type typbind type declaration datatype datbind datatype declaration datatype tyvarseq tycon =
datatype tyvarseq longtycon datatype replication abstype datbind with dec end abstype declaration exception exbind exception declaration local dec1 in dec2 end local declaration open longstrid 1 \Delta \Delta \Delta longstrid n open declaration (n * 1)
empty declaration dec1 h;i dec2 sequential declaration infix hdi vid 1 \Delta \Delta \Delta vid n infix (L) directive infixr hdi vid 1 \Delta \Delta \Delta vid n infix (R) directive nonfix vid 1 \Delta \Delta \Delta vid n nonfix directive
valbind ::= pat = exp hand valbind i
rec valbind
typbind ::= tyvarseq tycon = ty hand typbind i datbind ::= tyvarseq tycon = conbind hand datbind i conbind ::= hopivid hof tyi h -- conbindi exbind ::= hopivid hof tyi hand exbind i
hopivid = hopilongvid hand exbind i
Figure 4: Grammar: Expressions, Matches, Declarations and Bindings
11 3 Syntax of Modules For Modules there are further reserved words, identifier classes and derived forms. There are no further special constants; comments and lexical analysis are as for the Core. The derived forms for modules appear in Appendix A.
3.1 Reserved Words The following are the additional reserved words used in Modules.
eqtype functor include sharing sig signature struct structure where :?
3.2 Identifiers The additional identifier classes for Modules are SigId (signature identifiers) and FunId (functor identifiers); they may be either alphanumeric - not starting with a prime - or symbolic. The class of each identifier occurrence is determined by the grammatical rules which follow. Henceforth, therefore, we consider all identifier classes to be disjoint.
3.3 Infixed operators In addition to the scope rules for fixity directives given for the Core syntax, there is a further scope limitation: if dir occurs in a structure-level declaration strdec in any of the phrases
let strdec in \Delta \Delta \Delta end
local strdec in \Delta \Delta \Delta end
struct strdec end then the scope of dir does not extend beyond the phrase.
One effect of this limitation is that fixity is local to a basic structure expression -- in particular, to such an expression occurring as a functor body.
3.4 Grammar for Modules The phrase classes for Modules are shown in Figure 5. We use the variable strexp to range over StrExp, etc. The conventions adopted in presenting the grammatical rules for Modules are the same as for the Core. The grammatical rules are shown in Figures 6, 7 and 8.
12 3 SYNTAX OF MODULES
StrExp structure expressions StrDec structure-level declarations StrBind structure bindings
SigExp signature expressions SigDec signature declarations SigBind signature bindings
Spec specifications ValDesc value descriptions TypDesc type descriptions DatDesc datatype descriptions ConDesc constructor descriptions ExDesc exception descriptions StrDesc structure descriptions
FunDec functor declarations FunBind functor bindings TopDec top-level declarations
Figure 5: Modules Phrase Classes
3.5 Syntactic Restrictions
ffl No binding strbind , sigbind , or funbind may bind the same identifier twice.
ffl No description valdesc , typdesc, datdesc, exdesc or strdesc may describe the same
identifier twice; this applies also to value identifiers within a datdesc.
ffl No tyvarseq may contain the same tyvar twice. ffl For each spec of the form datatype tyvarseq tycon = datatype tyvarseq 0 longtycon ,
the sequences tyvarseq and tyvarseq0 must be equal.
ffl Any tyvar occurring on the right side of a datdesc of the form
tyvarseq tycon = \Delta \Delta \Delta must occur in the tyvarseq; similarly, in signature expressions of the form sigexp where type tyvarseq longtycon = ty, any tyvar occurring in ty must occur in tyvarseq.
ffl No datdesc or exdesc may describe true, false, it, nil, :: or ref.
3.5 Syntactic Restrictions 13
strexp ::= struct strdec end basic
longstrid structure identifier strexp:sigexp transparent constraint strexp:?sigexp opaque constraint funid ( strexp ) functor application let strdec in strexp end local declaration
strdec ::= dec declaration
structure strbind structure local strdec1 in strdec2 end local
empty strdec1 h;i strdec2 sequential
strbind ::= strid = strexp hand strbind i sigexp ::= sig spec end basic
sigid signature identifier sigexp where type tyvarseq longtycon = ty type realisation
sigdec ::= signature sigbind sigbind ::= sigid = sigexp hand sigbind i
Figure 6: Grammar: Structure and Signature Expressions
14 3 SYNTAX OF MODULES
spec ::= val valdesc value
type typdesc type eqtype typdesc eqtype datatype datdesc datatype datatype tyvarseq tycon =
datatype tyvarseq longtycon replication exception exdesc exception structure strdesc structure include sigexp include
empty spec1 h;i spec2 sequential spec sharing type longtycon 1 = \Delta \Delta \Delta = longtycon n sharing
(n * 2)
valdesc ::= vid : ty hand valdesc i typdesc ::= tyvarseq tycon hand typdesci datdesc ::= tyvarseq tycon = condesc hand datdesci condesc ::= vid hof tyi h -- condesci exdesc ::= vid hof tyi hand exdesc i strdesc ::= strid : sigexp hand strdesci
Figure 7: Grammar: Specifications
fundec ::= functor funbind funbind ::= funid ( strid : sigexp ) = strexp functor binding
hand funbind i
topdec ::= strdec htopdeci structure-level declaration
sigdec htopdeci signature declaration fundec htopdeci functor declaration
Note: No topdec may contain, as an initial segment, a strdec followed by a semicolon.
Figure 8: Grammar: Functors and Top-level Declarations
15 4 Static Semantics for the Core Our first task in presenting the semantics - whether for Core or Modules, static or dynamic - is to define the objects concerned. In addition to the class of syntactic objects, which we have already defined, there are classes of so-called semantic objects used to describe the meaning of the syntactic objects. Some classes contain simple semantic objects; such objects are usually identifiers or names of some kind. Other classes contain compound semantic objects, such as types or environments, which are constructed from component objects.
4.1 Simple Objects All semantic objects in the static semantics of the entire language are built from identifiers and two further kinds of simple objects: type constructor names and identifier status descriptors. Type constructor names are the values taken by type constructors; we shall usually refer to them briefly as type names, but they are to be clearly distinguished from type variables and type constructors. The simple object classes, and the variables ranging over them, are shown in Figure 9. We have included TyVar in the table to make visible the use of ff in the semantics to range over TyVar.
ff or tyvar 2 TyVar type variables
t 2 TyName type names is 2 IdStatus = fc; e; vg identifier status descriptors
Figure 9: Simple Semantic Objects Each ff 2 TyVar possesses a boolean equality attribute, which determines whether or not it admits equality, i.e. whether it is a member of EtyVar (defined on page 5).
Each t 2 TyName has an arity k * 0, and also possesses an equality attribute. We denote the class of type names with arity k by TyName(k).
With each special constant scon we associate a type name type(scon) which is either int, real, word, char or string as indicated by Section 2.2. (However, see Appendix E concerning types of overloaded special constants.)
4.2 Compound Objects When A and B are sets Fin A denotes the set of finite subsets of A, and A fin! B denotes the set of finite maps (partial functions with finite domain) from A to B. The domain and range of a finite map, f , are denoted Dom f and Ran f . A finite map will often be written explicitly in the form fa1 7! b1; \Delta \Delta \Delta ; ak 7! bkg; k * 0; in particular the empty map is fg. We shall use the form fx 7! e ; OEg - a form of set comprehension - to stand for
16 4 STATIC SEMANTICS FOR THE CORE the finite map f whose domain is the set of values x which satisfy the condition OE, and whose value on this domain is given by f (x) = e.
When f and g are finite maps the map f + g, called f modified by g, is the finite map with domain Dom f [ Dom g and values
(f + g)(a) = if a 2 Dom g then g(a) else f (a). The compound objects for the static semantics of the Core Language are shown in Figure 10. We take [ to mean disjoint union over semantic object classes. We also understand all the defined object classes to be disjoint.
o/ 2 Type = TyVar [ RowType [ FunType [ ConsType (o/1; \Delta \Delta \Delta ; o/k) or o/ (k) 2 Typek (ff1; \Delta \Delta \Delta ; ffk) or ff(k) 2 TyVark
% 2 RowType = Lab fin! Type o/ ! o/ 0 2 FunType = Type \Theta Type
ConsType = [k*0ConsType(k) o/ (k)t 2 ConsType(k) = Typek \Theta TyName(k) ` or \Lambda ff(k):o/ 2 TypeFcn = [k*0TyVark \Theta Type
oe or 8ff(k):o/ 2 TypeScheme = [k*0TyVark \Theta Type
(`; VE) 2 TyStr = TypeFcn \Theta ValEnv
SE 2 StrEnv = StrId fin! Env TE 2 TyEnv = TyCon fin! TyStr VE 2 ValEnv = VId fin! TypeScheme \Theta IdStatus E or (SE; TE ; VE ) 2 Env = StrEnv \Theta TyEnv \Theta ValEnv
T 2 TyNameSet = Fin(TyName) U 2 TyVarSet = Fin(TyVar) C or T; U; E 2 Context = TyNameSet \Theta TyVarSet \Theta Env
Figure 10: Compound Semantic Objects Note that \Lambda and 8 bind type variables. For any semantic object A, tynames A and tyvars A denote respectively the set of type names and the set of type variables occurring free in A.
Also note that a value environment maps value identifiers to a pair of a type scheme and an identifier status. If VE(vid ) = (oe; is), we say that vid has status is in VE. An occurrence of a value identifier which is elaborated in VE is referred to as a value variable, a value constructor or an exception constructor, depending on whether its status in VE is v, c or e, respectively.
4.3 Projection, Injection and Modification 17 4.3 Projection, Injection and Modification Projection: We often need to select components of tuples - for example, the valueenvironment component of a context. In such cases we rely on metavariable names to indicate which component is selected. For instance "VE of E" means "the value-environment component of E".
Moreover, when a tuple contains a finite map we shall "apply" the tuple to an argument, relying on the syntactic class of the argument to determine the relevant function. For instance C(tycon) means (TE of C)tycon and C(vid ) means (VE of (E of C))(vid ).
Finally, environments may be applied to long identifiers. For instance if longvid = strid 1:\Delta \Delta \Delta :strid k:vid then E(longvid ) means
(VE of (SE of \Delta \Delta \Delta (SE of (SE of E)strid1)strid 2\Delta \Delta \Delta )stridk)vid :
Injection: Components may be injected into tuple classes; for example, "VE in Env" means the environment (fg; fg; VE).
Modification: The modification of one map f by another map g, written f + g, has already been mentioned. It is commonly used for environment modification, for example E + E0. Often, empty components will be left implicit in a modification; for example E + VE means E + (fg; fg; VE). For set components, modification means union, so that C + (T; VE) means
( (T of C) [ T; U of C; (E of C) + VE )
Finally, we frequently need to modify a context C by an environment E (or a type environment TE say), at the same time extending T of C to include the type names of E (or of TE say). We therefore define C \Phi TE, for example, to mean C + (tynames TE; TE).
4.4 Types and Type functions A type o/ is an equality type, or admits equality, if it is of one of the forms
ffl ff, where ff admits equality; ffl flab1 7! o/1; \Delta \Delta \Delta ; labn 7! o/ng, where each o/i admits equality; ffl o/ (k)t, where t and all members of o/ (k) admit equality; ffl (o/ 0)ref. A type function ` = \Lambda ff(k):o/ has arity k; it must be closed - i.e. tyvars(o/ ) ` ff(k) - and the bound variables must be distinct. Two type functions are considered equal if they only differ in their choice of bound variables (alpha-conversion). In particular, the equality attribute has no significance in a bound variable of a type function; for example, \Lambda ff:ff ! ff and \Lambda fi:fi ! fi are equal type functions even if ff admits equality but fi does not. If t has arity k, then we write t to mean \Lambda ff(k):ff(k)t (eta-conversion); thus
18 4 STATIC SEMANTICS FOR THE CORE TyName ` TypeFcn. ` = \Lambda ff(k):o/ is an equality type function, or admits equality, if when the type variables ff(k) are chosen to admit equality then o/ also admits equality.
We write the application of a type function ` to a vector o/ (k) of types as o/ (k)`. If ` = \Lambda ff(k):o/ we set o/ (k)` = o/ fo/ (k)=ff(k)g (beta-conversion).
We write o/ f`(k)=t(k)g for the result of substituting type functions `(k) for type names t(k) in o/ . We assume that all beta-conversions are carried out after substitution, so that for example
(o/ (k)t)f\Lambda ff(k):o/ =tg = o/ fo/ (k)=ff(k)g:
4.5 Type Schemes A type scheme oe = 8ff(k):o/ generalises a type o/ 0, written oe O/ o/ 0, if o/ 0 = o/ fo/ (k)=ff(k)g for some o/ (k), where each member o/i of o/ (k) admits equality if ffi does. If oe0 = 8fi(l):o/ 0 then oe generalises oe0, written oe O/ oe0, if oe O/ o/ 0 and fi(l) contains no free type variable of oe. It can be shown that oe O/ oe0 iff, for all o/ 00, whenever oe0 O/ o/ 00 then also oe O/ o/ 00.
Two type schemes oe and oe0 are considered equal if they can be obtained from each other by renaming and reordering of bound type variables, and deleting type variables from the prefix which do not occur in the body. Here, in contrast to the case for type functions, the equality attribute must be preserved in renaming; for example 8ff:ff ! ff and 8fi:fi ! fi are only equal if either both ff and fi admit equality, or neither does. It can be shown that oe = oe0 iff oe O/ oe0 and oe0 O/ oe.
We consider a type o/ to be a type scheme, identifying it with 8():o/ .
4.6 Scope of Explicit Type Variables In the Core language, a type or datatype binding can explicitly introduce type variables whose scope is that binding. Moreover, in a value declaration val tyvarseq valbind , the sequence tyvarseq binds type variables: a type variable occurs free in val tyvarseq valbind iff it occurs free in valbind and is not in the sequence tyvarseq. However, explicit binding of type variables at val is optional, so we still have to account for the scope of an explicit type variable occurring in the ": ty" of a typed expression or pattern or in the "of ty" of an exception binding. For the rest of this section, we consider such free occurrences of type variables only.
Every occurrence of a value declaration is said to scope a set of explicit type variables determined as follows.
First, a free occurrence of ff in a value declaration val tyvarseq valbind is said to be unguarded if the occurrence is not part of a smaller value declaration within valbind . In this case we say that ff occurs unguarded in the value declaration.
Then we say that ff is implicitly scoped at a particular value declaration val tyvarseq valbind in a program if (1) ff occurs unguarded in this value declaration, and (2) ff does not occur unguarded in any larger value declaration containing the given one.
4.7 Non-expansive Expressions 19
Henceforth, we assume that for every value declaration val tyvarseq \Delta \Delta \Delta occurring in the program, every explicit type variable implicitly scoped at the val has been added to tyvarseq . Thus for example, in the two declarations
val x = let val id:'a-?'a = fn z=?z in id id end val x = (let val id:'a-?'a = fn z=?z in id id end; fn z=?z:'a)
the type variable 'a is scoped differently; they become respectively
val x = let val 'a id:'a-?'a = fn z=?z in id id end val 'a x = (let val id:'a-?'a = fn z=?z in id id end; fn z=?z:'a)
Then, according to the inference rules in Section 4.10 the first example can be elaborated, but the second cannot since 'a is bound at the outer value declaration leaving no possibility of two different instantiations of the type of id in the application id id.
4.7 Non-expansive Expressions In order to treat polymorphic references and exceptions, the set Exp of expressions is partitioned into two classes, the expansive and the non-expansive expressions. An expression is non-expansive in context C if, after replacing infixed forms by their equivalent prefixed forms, and derived forms by their equivalent forms, it can be generated by the following grammar from the non-terminal nexp:
nexp ::= scon
hopilongvid -hnexprow i"" (nexp) conexp nexp nexp:ty fn match nexprow ::= lab = nexph, nexprow i
conexp ::= (conexph:tyi)
hopilongvid Restriction: longvid 6= ref and
is of C(longvid ) 2 fc; eg
All other expressions are said to be expansive (in C). The idea is that the dynamic evaluation of a non-expansive expression will neither generate an exception nor extend the domain of the memory, while the evaluation of an expansive expression might.
4.8 Closure Let o/ be a type and A a semantic object. Then ClosA(o/ ), the closure of o/ with respect to A, is the type scheme 8ff(k):o/ , where ff(k) = tyvars(o/ ) n tyvars A. Commonly, A will be a context C. We abbreviate the total closure Closfg(o/ ) to Clos(o/ ). If the range of a value environment VE contains only types (rather than arbitrary type schemes) we set
ClosAVE = fvid 7! (ClosA(o/ ); is) ; VE(vid ) = (o/; is)g
20 4 STATIC SEMANTICS FOR THE CORE
Closing a value environment VE that stems from the elaboration of a value binding valbind requires extra care to ensure type security of references and exceptions and correct scoping of explicit type variables. Recall that valbind is not allowed to bind the same variable twice. Thus, for each vid 2 Dom VE there is a unique pat = exp in valbind which binds vid . If VE(vid ) = (o/; is), let ClosC;valbind VE(vid ) = (8ff(k):o/; is), where
ff(k) = ( tyvars o/ n tyvars C; if exp is non-expansive in C;(); if exp is expansive in C.
4.9 Type Structures and Type Environments A type structure (`; VE) is well-formed if either VE = fg, or ` is a type name t. (The latter case arises, with VE 6= fg, in datatype declarations.) An object or assembly A of semantic objects is well-formed if every type structure occurring in A is well-formed. All type structures occurring in elaborations are required to be well-formed.
A type structure (t; VE) is said to respect equality if, whenever t admits equality, then either t = ref (see Appendix C) or, for each VE(vid ) of the form (8ff(k):(o/ ! ff(k)t); is), the type function \Lambda ff(k):o/ also admits equality. (This ensures that the equality predicate = will be applicable to a constructed value (vid ; v) of type o/ (k)t only when it is applicable to the value v itself, whose type is o/ fo/ (k)=ff(k)g.) A type environment TE respects equality if all its type structures do so.
Let TE be a type environment, and let T be the set of type names t such that (t; VE) occurs in TE for some VE 6= fg. Then TE is said to maximise equality if (a) TE respects equality, and also (b) if any larger subset of T were to admit equality (without any change in the equality attribute of any type names not in T ) then TE would cease to respect equality.
For any TE of the form
TE = ftyconi 7! (ti; VEi) ; 1 ^ i ^ kg; where no VEi is the empty map, and for any E we define Abs(TE; E) to be the environment obtained from E and TE as follows. First, let Abs(TE) be the type environment ftyconi 7! (ti; fg) ; 1 ^ i ^ kg in which all value environments VEi have been replaced by the empty map. Let t01; \Delta \Delta \Delta ; t0k be new distinct type names none of which admit equality. Then Abs(TE; E) is the result of simultaneously substituting t0i for ti, 1 ^ i ^ k, throughout Abs(TE) + E. (The effect of the latter substitution is to ensure that the use of equality on an abstype is restricted to the with part.)
4.10 Inference Rules Each rule of the semantics allows inferences among sentences of the form
A ` phrase ) A0
4.10 Inference Rules 21 where A is usually an environment or a context, phrase is a phrase of the Core, and A0 is a semantic object - usually a type or an environment. It may be pronounced "phrase elaborates to A0 in (context or environment) A". Some rules have extra hypotheses not of this form; they are called side conditions.
In the presentation of the rules, phrases within single angle brackets h i are called first options, and those within double angle brackets hh ii are called second options. To reduce the number of rules, we have adopted the following convention:
In each instance of a rule, the first options must be either all present or all absent; similarly the second options must be either all present or all absent.
Although not assumed in our definitions, it is intended that every context C = T; U; E has the property that tynames E ` T . Thus T may be thought of, loosely, as containing all type names which "have been generated". It is necessary to include T as a separate component in a context, since tynames E may not contain all the type names which have been generated; one reason is that a context T; ;; E is a projection of the basis B = T; F; G; E whose other components F and G could contain other such names - recorded in T but not present in E. Of course, remarks about what "has been generated" are not precise in terms of the semantic rules. But the following precise result may easily be demonstrated:
Let S be a sentence T; U; E ` phrase ) A such that tynames E ` T , and let S0 be a sentence T 0; U 0; E0 ` phrase0 ) A0 occurring in a proof of S; then also tynames E0 ` T 0.
Atomic Expressions C ` atexp ) o/
C ` scon ) type(scon) (1) C(longvid ) = (oe; is) oe O/ o/
C ` longvid ) o/ (2)
hC ` exprow ) %i C ` - hexprow i "" ) fgh+ %i in Type (3)
C ` dec ) E C \Phi E ` exp ) o/ tynames o/ ` T of C
C ` let dec in exp end ) o/ (4)
C ` exp ) o/ C ` ( exp ) ) o/ (5) Comments:
(2) The instantiation of type schemes allows different occurrences of a single longvid to
assume different types. Note that the identifier status is not used in this rule.
22 4 STATIC SEMANTICS FOR THE CORE
(4) The use of \Phi , here and elsewhere, ensures that type names generated by the first
sub-phrase are different from type names generated by the second sub-phrase.The side condition prevents type names generated by dec from escaping outside the local declaration.
Expression Rows C ` exprow ) %
C ` exp ) o/ hC ` exprow ) %i C ` lab = exp h , exprowi ) flab 7! o/ gh+ %i (6)
Expressions C ` exp ) o/
C ` atexp ) o/ C ` atexp ) o/ (7)
C ` exp ) o/ 0 ! o/ C ` atexp ) o/ 0
C ` exp atexp ) o/ (8)
C ` exp ) o/ C ` ty ) o/
C ` exp : ty ) o/ (9)
C ` exp ) o/ C ` match ) exn ! o/
C ` exp handle match ) o/ (10)
C ` exp ) exn C ` raise exp ) o/ (11)
C ` match ) o/ C ` fn match ) o/ (12) Comments:
(7) The relational symbol ` is overloaded for all syntactic classes (here atomic expressions
and expressions).
(9) Here o/ is determined by C and ty . Notice that type variables in ty cannot be instan
tiated in obtaining o/ ; thus the expression 1:'a will not elaborate successfully, nor will the expression (fn x=?x):'a-?'b. The effect of type variables in an explicitly typed expression is to indicate exactly the degree of polymorphism present in the expression.
(11) Note that o/ does not occur in the premise; thus a raise expression has "arbitrary"
type.
Matches C ` match ) o/
C ` mrule ) o/ hC ` match ) o/ i
C ` mrule h -- matchi ) o/ (13)
4.10 Inference Rules 23 Match Rules C ` mrule ) o/
C ` pat ) (VE; o/ ) C + VE ` exp ) o/ 0 tynames VE ` T of C
C ` pat =? exp ) o/ ! o/ 0 (14)
Comment: This rule allows new free type variables to enter the context. These new type variables will be chosen, in effect, during the elaboration of pat (i.e., in the inference of the first hypothesis). In particular, their choice may have to be made to agree with type variables present in any explicit type expression occurring within exp (see rule 9).
Declarations C ` dec ) E
U = tyvars(tyvarseq ) C + U ` valbind ) VE VE0 = ClosC;valbind VE U " tyvars VE0 = ;
C ` val tyvarseq valbind ) VE0 in Env (15)
C ` typbind ) TE C ` type typbind ) TE in Env (16)
C \Phi TE ` datbind ) VE; TE 8(t; VE0) 2 Ran TE; t =2 (T of C)
TE maximises equality
C ` datatype datbind ) (VE; TE) in Env (17)
C(longtycon) = (`; VE) ` = \Lambda ff(k):o/ tyvarseq = ff(k) TE = ftycon 7! (`; VE)g
C ` datatype tyvarseq tycon = datatype tyvarseq longtycon ) (VE; TE) in Env (18)
C \Phi TE ` datbind ) VE; TE 8(t; VE0) 2 Ran TE; t =2 (T of C)
C \Phi (VE; TE) ` dec ) E TE maximises equality
C ` abstype datbind with dec end ) Abs(TE; E) (19)
C ` exbind ) VE C ` exception exbind ) VE in Env (20)
C ` dec1 ) E1 C \Phi E1 ` dec2 ) E2
C ` local dec1 in dec2 end ) E2 (21)
C(longstrid 1) = E1 \Delta \Delta \Delta C(longstrid n) = En C ` open longstrid 1 \Delta \Delta \Delta longstrid n ) E1 + \Delta \Delta \Delta + En (22)
C ` ) fg in Env (23)
24 4 STATIC SEMANTICS FOR THE CORE
C ` dec1 ) E1 C \Phi E1 ` dec2 ) E2
C ` dec1 h;i dec2 ) E1 + E2 (24) Comments:
(15) Here VE will contain types rather than general type schemes. The closure of VE
allows value identifiers to be used polymorphically, via rule 2.
The side-condition on U ensures that the type variables in tyvarseq are bound by the closure operation, if they occur free in the range of VE.
On the other hand, if the phrase val tyvarseq valbind occurs inside some larger value binding val tyvarseq0 valbind 0 then no type variable ff listed in tyvarseq0 will become bound by the ClosC;valbind VE operation; for ff must be in U of C and hence excluded from closure by the definition of the closure operation (Section 4.8, page 20) since U of C ` tyvars C.
(17),(19) The side conditions express that the elaboration of each datatype binding
generates new type names and that as many of these new names as possible admit equality. Adding TE to the context on the left of the ` captures the recursive nature of the binding.
(18) Note that no new type name is generated (i.e., datatype replication is not genera
tive). By the syntactic restriction in Section 2.9 the two type variable sequences in the conclusion must be equal.
(19) The Abs operation was defined in Section 4.9, page 20. (20) No closure operation is used here, since VE maps exception constructors to types
rather than to general type schemes.
Value Bindings C ` valbind ) VE
C ` pat ) (VE; o/ ) C ` exp ) o/ hC ` valbind ) VE0i
C ` pat = exp hand valbind i ) VE h+ VE0i (25)
C + VE ` valbind ) VE tynames VE ` T of C
C ` rec valbind ) VE (26) Comments:
(25) When the option is present we have Dom VE " Dom VE0 = ; by the syntactic
restrictions.
(26) Modifying C by VE on the left captures the recursive nature of the binding. From
rule 25 we see that any type scheme occurring in VE will have to be a type. Thus each use of a recursive function in its own body must be ascribed the same type. Also note that C + VE may overwrite identifier status. For example, the program datatype t = f; val rec f = fn x =? x; is legal.
4.10 Inference Rules 25 Type Bindings C ` typbind ) TE
tyvarseq = ff(k) C ` ty ) o/ hC ` typbind ) TEi
C ` tyvarseq tycon = ty hand typbind i )
ftycon 7! (\Lambda ff(k):o/; fg)g h+ TEi
(27)
Comment: The syntactic restrictions ensure that the type function \Lambda ff(k):o/ satisfies the well-formedness constraints of Section 4.4 and they ensure tycon =2 Dom TE.
Datatype Bindings C ` datbind ) VE; TE
tyvarseq = ff(k) C; ff(k)t ` conbind ) VE hC ` datbind 0 ) VE0; TE0 8(t0; VE00) 2 Ran TE0; t 6= t0i
C ` tyvarseq tycon = conbind hand datbind 0i )
(ClosVEh+ VE0i; ftycon 7! (t; ClosVE)g h+ TE0i
(28)
Comment: The syntactic restrictions ensure Dom VE " Dom VE0 = ; and tycon =2 Dom TE0.
Constructor Bindings C; o/ ` conbind ) VE
hC ` ty ) o/ 0i hhC; o/ ` conbind ) VEii
C; o/ ` vid hof ty i hh -- conbindii ) fvid 7! (o/; c)g h+ fvid 7! (o/ 0 ! o/; c)g i hh+ VEii
(29)
Comment: By the syntactic restrictions vid =2 Dom VE. Exception Bindings C ` exbind ) VE
hC ` ty ) o/ i hhC ` exbind ) VEii C ` vid hof tyi hhand exbind ii )
fvid 7! (exn; e)g h+ fvid 7! (o/ ! exn; e)g i hh+ VEii
(30)
C(longvid ) = (o/; e) hC ` exbind ) VEi C ` vid = longvid hand exbind i ) fvid 7! (o/; e)g h+ VEi (31)
Comments:
(30) Notice that o/ may contain type variables. (30),(31) For each C and exbind , there is at most one VE satisfying C ` exbind ) VE.
26 4 STATIC SEMANTICS FOR THE CORE Atomic Patterns C ` atpat ) (VE; o/ )
C ` ) (fg; o/ ) (32) C ` scon ) (fg; type(scon)) (33) vid =2 Dom(C) or is of C(vid ) = v
C ` vid ) (fvid 7! (o/; v)g; o/ ) (34)
C(longvid ) = (oe; is) is 6= v oe O/ o/ (k)t
C ` longvid ) (fg; o/ (k)t) (35)
hC ` patrow ) (VE; %)i C ` - hpatrow i "" ) ( fgh+ VEi; fgh+ %i in Type ) (36)
C ` pat ) (VE; o/ ) C ` ( pat ) ) (VE; o/ ) (37) Comments:
(34), (35) The context C determines which of these two rules applies. In rule 34, note
that vid can assume a type, not a general type scheme.
Pattern Rows C ` patrow ) (VE; %)
C ` ... ) (fg; %) (38) C ` pat ) (VE; o/ ) hC ` patrow ) (VE0; %) Dom VE " Dom VE0 = ; lab =2 Dom %i
C ` lab = pat h , patrowi ) (VEh+ VE0i; flab 7! o/ gh+ %i) (39)
Patterns C ` pat ) (VE; o/ )
C ` atpat ) (VE; o/ ) C ` atpat ) (VE; o/ ) (40)
C(longvid ) = (oe; is) is 6= v oe O/ o/ 0 ! o/ C ` atpat ) (VE; o/ 0)
C ` longvid atpat ) (VE; o/ ) (41)
C ` pat ) (VE; o/ ) C ` ty ) o/
C ` pat : ty ) (VE; o/ ) (42)
4.11 Further Restrictions 27
vid =2 Dom(C) or is of C(vid ) = v hC ` ty ) o/ i C ` pat ) (VE; o/ ) vid =2 Dom VE
C ` vidh: tyi as pat ) (fvid 7! (o/; v)g + VE; o/ ) (43)
Type Expressions C ` ty ) o/
tyvar = ff C ` tyvar ) ff (44)
hC ` tyrow ) %i C ` - htyrow i "" ) fgh+ %i in Type (45)
tyseq = ty1\Delta \Delta \Delta tyk C ` tyi ) o/i (1 ^ i ^ k)
C(longtycon) = (`; VE)
C ` tyseq longtycon ) o/ (k)` (46)
C ` ty ) o/ C ` ty 0 ) o/ 0
C ` ty -? ty0 ) o/ ! o/ 0 (47)
C ` ty ) o/ C ` ( ty ) ) o/ (48)
Comments:
(46) Recall that for o/ (k)` to be defined, ` must have arity k.
Type-expression Rows C ` tyrow ) %
C ` ty ) o/ hC ` tyrow ) %i C ` lab : ty h , tyrowi ) flab 7! o/ gh+ %i (49)
Comment: The syntactic constraints ensure lab =2 Dom %.
4.11 Further Restrictions There are a few restrictions on programs which should be enforced by a compiler, but are better expressed apart from the preceding Inference Rules. They are:
1. For each occurrence of a record pattern containing a record wildcard, i.e. of the
form -lab1=pat1,\Delta \Delta \Delta ,labm=patm,..."" the program context must determine uniquely the domain flab1; \Delta \Delta \Delta ; labng of its row type, where m ^ n; thus, the context must determine the labels flabm+1; \Delta \Delta \Delta ; labng of the fields to be matched by the wildcard. For this purpose, an explicit type constraint may be needed.
28 4 STATIC SEMANTICS FOR THE CORE
2. In a match of the form pat 1 =? exp1 -- \Delta \Delta \Delta -- patn =? expn the pattern sequence
pat1; . . . ; patn should be irredundant; that is, each patj must match some value (of the right type) which is not matched by pati for any i ! j. In the context fn match, the match must also be exhaustive; that is, every value (of the right type) must be matched by some pati. The compiler must give warning on violation of these restrictions, but should still compile the match. The restrictions are inherited by derived forms; in particular, this means that in the function-value binding vid atpat1 \Delta \Delta \Delta atpat nh: tyi = exp (consisting of one clause only), each separate atpat i should be exhaustive by itself.
29 5 Static Semantics for Modules 5.1 Semantic Objects The simple objects for Modules static semantics are exactly as for the Core. The compound objects are those for the Core, augmented by those in Figure 11.
\Sigma or (T)E 2 Sig = TyNameSet \Theta Env \Phi or (T)(E; (T 0)E0) 2 FunSig = TyNameSet \Theta (Env \Theta Sig)
G 2 SigEnv = SigId fin! Sig F 2 FunEnv = FunId fin! FunSig B or T; F; G; E 2 Basis = TyNameSet \Theta FunEnv \Theta SigEnv \Theta Env
Figure 11: Further Compound Semantic Objects The prefix (T ), in signatures and functor signatures, binds type names. Certain operations require a change of bound names in semantic objects; see for example Section 5.2. When bound type names are changed, we demand that all of their attributes (i.e. equality and arity) are preserved.
The operations of projection, injection and modification are as for the Core. Moreover, we define C of B to be the context (T of B; ;; E of B), i.e. with an empty set of explicit type variables. Also, we frequently need to modify a basis B by an environment E (or a structure environment SE say), at the same time extending T of B to include the type names of E (or of SE say). We therefore define B \Phi SE, for example, to mean B + (tynames SE; SE).
There is no separate kind of semantic object to represent structures: structure expressions elaborate to environments, just as structure-level declarations do. Thus, notions which are commonly associated with structures (for example the notion of matching a structure against a signature) are defined in terms of environments.
5.2 Type Realisation A (type) realisation is a map ' : TyName ! TypeFcn such that t and '(t) have the same arity, and if t admits equality then so does '(t).
The support Supp ' of a type realisation ' is the set of type names t for which '(t) 6= t. The yield Yield ' of a realisation ' is the set of type names which occur in some '(t) for which t 2 Supp '.
Realisations ' are extended to apply to all semantic objects; their effect is to replace each name t by '(t). In applying ' to an object with bound names, such as a signature (T)E, first bound names must be changed so that, for each binding prefix (T ),
T " (Supp ' [ Yield ') = ; :
30 5 STATIC SEMANTICS FOR MODULES 5.3 Signature Instantiation An environment E2 is an instance of a signature \Sigma 1 = (T1)E1, written \Sigma 1*E2, if there exists a realisation ' such that '(E1) = E2 and Supp ' ` T1.
5.4 Functor Signature Instantiation A pair (E; (T 0)E0) is called a functor instance. Given \Phi = (T1)(E1; (T 01)E01), a functor instance (E2; (T 02)E02) is an instance of \Phi , written \Phi *(E2; (T 02)E02), if there exists a realisation ' such that '(E1; (T 01)E01) = (E2; (T 02)E02) and Supp ' ` T1.
5.5 Enrichment In matching an environment to a signature, the environment will be allowed both to have more components, and to be more polymorphic, than (an instance of) the signature. Precisely, we define enrichment of environments and type structures recursively as follows.
An environment E1 = (SE1; TE1; VE1) enriches another environment E2 = (SE2; TE2; VE2), written E1 O/ E2, if
1. Dom SE1 ' Dom SE2, and SE1(strid ) O/ SE2(strid ) for all strid 2 Dom SE2 2. Dom TE1 ' Dom TE2, and TE1(tycon) O/ TE2(tycon) for all tycon 2 Dom TE2 3. Dom VE1 ' Dom VE2, and VE1(vid ) O/ VE2(vid ) for all vid 2 Dom VE2, where
(oe1; is1) O/ (oe2; is2) means oe1 O/ oe2 and
is1 = is2 or is2 = v
Finally, a type structure (`1; VE1) enriches another type structure (`2; VE2), written (`1; VE1) O/ (`2; VE2), if
1. `1 = `2 2. Either VE1 = VE2 or VE2 = fg
5.6 Signature Matching An environment E matches a signature \Sigma 1 if there exists an environment E\Gamma such that \Sigma 1 * E\Gamma OE E. Thus matching is a combination of instantiation and enrichment. There is at most one such E\Gamma , given \Sigma 1 and E.
5.7 Inference Rules 31 5.7 Inference Rules As for the Core, the rules of the Modules static semantics allow sentences of the form
A ` phrase ) A0 to be inferred, where in this case A is either a basis, a context or an environment and A0 is a semantic object. The convention for options is as in the Core semantics.
Although not assumed in our definitions, it is intended that every basis B = T; F; G; E in which a topdec is elaborated has the property that tynames F [tynames G[tynames E ` T . The following Theorem can be proved:
Let S be an inferred sentence B ` topdec ) B0 in which B satisfies the above condition. Then B0 also satisfies the condition.
Moreover, if S0 is a sentence of the form B00 ` phrase ) A occurring in a proof of S, where phrase is any Modules phrase, then B00 also satisfies the condition.
Finally, if T; U; E ` phrase ) A occurs in a proof of S, where phrase is a phrase of Modules or of the Core, then tynames E ` T .
Structure Expressions B ` strexp ) E
B ` strdec ) E B ` struct strdec end ) E (50)
B(longstrid ) = E B ` longstrid ) E (51)
B ` strexp ) E B ` sigexp ) \Sigma \Sigma * E0 OE E
B ` strexp:sigexp ) E0 (52)
B ` strexp ) E B ` sigexp ) (T 0)E0
(T 0)E0 * E00 OE E T 0 " (T of B) = ;
B ` strexp:?sigexp ) E0 (53)
B ` strexp ) E B(funid )*(E00; (T 0)E0) ; E O/ E00
(tynames E [ T of B) " T 0 = ;
B ` funid ( strexp ) ) E0 (54)
B ` strdec ) E1 B \Phi E1 ` strexp ) E2
B ` let strdec in strexp end ) E2 (55)
32 5 STATIC SEMANTICS FOR MODULES
Comments:
(54) The side condition (tynames E [T of B)"T 0 = ; can always be satisfied by renaming
bound names in (T 0)E0; it ensures that the generated datatypes receive new names.
Let B(funid ) = (T )(Ef ; (T 0)E0f ). Let ' be a realisation such that '(Ef ; (T 0)E0f ) = (E00; (T 0)E0). Sharing between argument and result specified in the declaration of the functor funid is represented by the occurrence of the same name in both Ef and E0f , and this repeated occurrence is preserved by ', yielding sharing between the argument structure E and the result structure E0 of this functor application.
(55) The use of \Phi , here and elsewhere, ensures that type names generated by the first
sub-phrase are distinct from names generated by the second sub-phrase.
Structure-level Declarations B ` strdec ) E
C of B ` dec ) E
B ` dec ) E (56)
B ` strbind ) SE B ` structure strbind ) SE in Env (57)
B ` strdec1 ) E1 B \Phi E1 ` strdec2 ) E2
B ` local strdec1 in strdec2 end ) E2 (58)
B ` ) fg in Env (59) B ` strdec1 ) E1 B \Phi E1 ` strdec2 ) E2
B ` strdec1 h;i strdec2 ) E1 + E2 (60)
Structure Bindings B ` strbind ) SE
B ` strexp ) E hB + tynames E ` strbind ) SEi B ` strid = strexp hand strbind i ) fstrid 7! Eg h+ SEi (61)
Signature Expressions B ` sigexp ) E
B ` spec ) E B ` sig spec end ) E (62)
5.7 Inference Rules 33
B(sigid) = (T )E T " (T of B) = ;
B ` sigid ) E (63)
B ` sigexp ) E tyvarseq = ff(k) C of B ` ty ) o/
E(longtycon ) = (t; VE) t =2 (T of B) [ tynames o/ ' = ft 7! \Lambda ff(k):o/ g \Lambda ff(k):o/ admits equality, if t does '(E) well-formed
B ` sigexp where type tyvarseq longtycon = ty ) '(E) (64) Comments: