


The Definition of Standard ML

Revised 1996

August 13, 1996

Version: 7.4.0

Robin Milner, Mads Tofte, Robert Harper and David MacQueen

Prelude to the Revised Definition

Standard ML is an industrial strength programming language, one of very few with a fully formal definition. The Definition of Standard ML was published in 1990. Since then the implementation technology of the language has advanced enormously, and its users have multiplied. The language and its Definition have therefore incited close scrutiny, evaluation, much approval, sometimes strong criticism.

The originators of the language have sifted this response, and found that there are inadequacies in the original language and its formal Definition. They are of three kinds: missing features which many users want; complex and little-used features which most users can do without; and mistakes of definition. What is remarkable is that these inadequacies are rather few, and that they are rather uncontroversial.

This new version of the Definition addresses the three kinds of inadequacy respectively by additions, subtractions and corrections. But we have only made such amendments when one or more aspects of SML - the language itself, its usage, its implementation, its formal Definition - have thus become simpler, without complicating the other aspects. It is worth noting that even the additions meet this criterion; for example we have introduced type abbreviations in signatures to simplify the use of the language, but the way we have done it has even simplified the Definition too. In fact, after our changes the formal Definition has fewer rules.

In the 1990 Definition it was predicted that further versions of the Definition would be produced as the language develops, with the intention to minimise the number of versions. This is the first revised version, and we foresee no others.

The shape of this new version of the Definition is as follows. The 1990 Definition, fully revised to deal with the amendments, has been sandwiched between this Prelude at the front, and a postlude (Appendix G) at the back which enumerates all the amendments which have been done, giving the rationale for each, and outlining the changes which it implies for the language and for the 1990 Definition.

Robin Milner Mads Tofte Robert Harper David MacQueen

July 1996


Preface

A precise description of a programming language is a prerequisite for its implementation and for its use. The description can take many forms, each suited to a different purpose. A common form is a reference manual, which is usually a careful narrative description of the meaning of each construction in the language, often backed up with a formal presentation of the grammar (for example, in Backus-Naur form). This gives the programmer enough understanding for many of his purposes. But it is ill-suited for use by an implementer, or by someone who wants to formulate laws for equivalence of programs, or by a programmer who wants to design programs with mathematical rigour.

This document is a formal description of both the grammar and the meaning of a language which is both designed for large projects and widely used. As such, it aims to serve the whole community of people seriously concerned with the language. At a time when it is increasingly understood that programs must withstand rigorous analysis, particularly for systems where safety is critical, a rigorous language presentation is even important for negotiators and contractors; for a robust program written in an insecure language is like a house built upon sand.

Most people have not looked at a rigorous language presentation before. To help them particularly, but also to put the present work in perspective for those more theoretically prepared, it will be useful here to say something about three things: the nature of Standard ML, the task of language definition in general, and the form of the present Definition.

Standard ML

Standard ML is a functional programming language, in the sense that the full power of mathematical functions is present. But it grew in response to a particular programming task, for which it was equipped also with full imperative power, and a sophisticated exception mechanism. It has an advanced form of parametric modules, aimed at organised development of large programs. Finally it is strongly typed, and it was the first language to provide a particular form of polymorphic type which makes the strong typing remarkably flexible. This combination of ingredients has not made it unduly large, but their novelty has been a fascinating challenge to semantic method (of which we say more below).

ML has evolved over twenty years as a fusion of many ideas from many people. This evolution is described in some detail in Appendix F of the book, where also we acknowledge all those who have contributed to it, both in design and in implementation.

'ML' stands for meta language; this is the term logicians use for a language in which other (formal or informal) languages are discussed and analysed. Originally ML was conceived as a medium for finding and performing proofs in a logical language. Conducting rigorous argument as dialogue between person and machine has been a strong research interest at Edinburgh and elsewhere, throughout these twenty years. The difficulties are enormous, and make stern demands upon the programming language which is used for this dialogue. Those who are not familiar with computer-assisted reasoning may be surprised that a programming language, which was designed for this rather esoteric activity, should ever lay claim to being generally useful. On reflection, they should not be surprised. LISP is a prime example of a language invented for esoteric purposes and becoming widely used. LISP was invented for use in artificial intelligence (AI); the important thing about AI here is not that it is esoteric, but that it is difficult and varied; so much so, that anything which works well for it must work well for many other applications too.

The same can be said about the initial purpose of ML, but with a different emphasis. Rigorous proofs are complex things, which need varied and sophisticated presentation - particularly on the screen in interactive mode. Furthermore the proof methods, or strategies, involved are some of the most complex algorithms which we know. This all applies equally to AI, but one demand is made more strongly by proof than perhaps by any other application: the demand for rigour.

This demand established the character of ML. In order to be sure that, when the user and the computer claim to have together performed a rigorous argument, their claim is justified, it was seen that the language must be strongly typed. On the other hand, to be useful in a difficult application, the type system had to be rather flexible, and permit the machine to guide the user rather than impose a burden upon him. A reasonable solution was found, in which the machine helps the user significantly by inferring his types for him. Thereby the machine also confers complete reliability on his programs, in this sense: If a program claims that a certain result follows from the rules of reasoning which the user has supplied, then the claim may be fully trusted.

The principle of inferring useful structural information about programs is also represented, at the level of program modules, by the inference of signatures. Signatures describe the interfaces between modules, and are vital for robust large-scale programs. When the user combines modules, the signature discipline prevents him from mismatching their interfaces. By programming with interfaces and parametric modules, it becomes possible to focus on the structure of a large system, and to compile parts of it in isolation from one another - even when the system is incomplete.

This emphasis on types and signatures has had a profound effect on the language Definition. Over half this document is devoted to inferring types and signatures for programs. But the method used is exactly the same as for inferring what values a program delivers; indeed, a type or signature is the result of a kind of abstract evaluation of a program phrase.

In designing ML, the interplay among three activities - language design, definition and implementation - was extremely close. This was particularly true for the newest part, the parametric modules. This part of the language grew from an initial proposal by David MacQueen, itself highly developed; but both formal definition and implementation had a strong influence on the detailed design. In general, those who took part in the three activities cannot now imagine how they could have been properly done separately.

Language Definition

Every programming language presents its own conceptual view of computation. This view is usually indicated by the names used for the phrase classes of the language, or by its keywords: terms like package, module, structure, exception, channel, type, procedure, reference, sharing, ... . These terms also have their abstract counterparts, which may be called semantic objects; these are what people really have in mind when they use the language, or discuss it, or think in it. Also, it is these objects, not the syntax, which represent the particular conceptual view of each language; they are the character of the language. Therefore a definition of the language must be in terms of these objects.

As is commonly done in programming language semantics, we shall loosely talk of these semantic objects as meanings. Of course, it is perfectly possible to understand the semantic theory of a language, and yet be unable to understand the meaning of a particular program, in the sense of its intention or purpose. The aim of a language definition is not to formalise everything which could possibly be called the meaning of a program, but to establish a theory of semantic objects upon which the understanding of particular programs may rest.

The job of a language-definer is twofold. First - as we have already suggested - he must create a world of meanings appropriate for the language, and must find a way of saying what these meanings precisely are. Here, he meets a problem; notation of some kind must be used to denote and describe these meanings - but not a programming language notation, unless he is passing the buck and defining one programming language in terms of another. Given a concern for rigour, mathematical notation is an obvious choice. Moreover, it is not enough just to write down mathematical definitions. The world of meanings only becomes meaningful if the objects possess nice properties, which make them tractable. So the language-definer really has to develop a small theory of his meanings, in the same way that a mathematician develops a theory. Typically, after initially defining some objects, the mathematician goes on to verify properties which indicate that they are objects worth studying. It is this part, a kind of scene-setting, which the language-definer shares with the mathematician. Of course he can take many objects and their theories directly from mathematics, such as functions, relations, trees, sequences, . . . . But he must also give some special theory for the objects which make his language particular, as we do for types, structures and signatures in this book; otherwise his language definition may be formal but will give no insight.

The second part of the definer's job is to define evaluation precisely. This means that he must define at least what meaning, M , results from evaluating any phrase P of his language (though he need not explain exactly how the meaning results; that is he need not give the full detail of every computation). This part of his job must be formal to some extent, if only because the phrases P of his language are indeed formal objects. But there is another reason for formality. The task is complex and error-prone, and therefore demands a high level of explicit organisation (which is, largely, the meaning of `formality'); moreover, it will be used to specify an equally complex, error-prone and formal construction: an implementation.

We shall now explain the keystone of our semantic method. First, we need a slight but important refinement. A phrase P is never evaluated in vacuo to a meaning M, but always against a background; this background - call it B - is itself a semantic object, being a distillation of the meanings preserved from evaluation of earlier phrases (typically variable declarations, procedure declarations, etc.). In fact evaluation is background-dependent - M depends upon B as well as upon P.

The keystone of the method, then, is a certain kind of assertion about evaluation; it takes the form

$B \vdash P \Rightarrow M$

and may be pronounced: 'Against the background B, the phrase P evaluates to the meaning M'. The formal purpose of this Definition is no more, and no less, than to decree exactly which assertions of this form are true. This could be achieved in many ways. We have chosen to do it in a structured way, as others have, by giving rules which allow assertions about a compound phrase P to be inferred from assertions about its constituent phrases $P_1, \ldots, P_n$.

The form of the Definition

We have written the Definition in a form suggested by the previous remarks. That is, we have defined our semantic objects in mathematical notation which is completely independent of Standard ML, and we have developed just enough of their theory to give sense to our rules of evaluation. Following another suggestion above, we have factored our task by describing abstract evaluation - the inference and checking of types and signatures (which can be done at compile-time) - completely separately from concrete evaluation. It really is a factorisation, because a full value in all its glory - you can think of it as a concrete object with a type attached - never has to be presented.

The resulting document is, we hope, valuable as the essential point of reference for Standard ML. If it is to play this role well, it must be supplemented by other literature. Many expository books have already been written, and this Definition will be useful as a background reference for their readers. We became convinced, while writing the 1990 Definition, that we could not discuss many questions without making it far too long. Such questions are: Why were certain design choices made? What are their implications for programming? Was there a good alternative meaning for some constructs, or was our hand forced? What different forms of phrase are equivalent? What is the proof of certain claims? Many of these questions are not answered by pedagogic texts either. We therefore wrote a Commentary on the 1990 Definition to assist people in reading it, and to serve as a bridge between the Definition and other texts. Though in part outdated by the present revision, the Commentary still fulfils its purpose.

There exist several textbooks on programming with Standard ML [44,43,55,49]. The second edition of Paulson's book [44] conforms with the present revision.


Contents

1 Introduction

2 Syntax of the Core
  2.1 Reserved Words
  2.2 Special constants
  2.3 Comments
  2.4 Identifiers
  2.5 Lexical analysis
  2.6 Infixed operators
  2.7 Derived Forms
  2.8 Grammar
  2.9 Syntactic Restrictions

3 Syntax of Modules
  3.1 Reserved Words
  3.2 Identifiers
  3.3 Infixed operators
  3.4 Grammar for Modules
  3.5 Syntactic Restrictions

4 Static Semantics for the Core
  4.1 Simple Objects
  4.2 Compound Objects
  4.3 Projection, Injection and Modification
  4.4 Types and Type functions
  4.5 Type Schemes
  4.6 Scope of Explicit Type Variables
  4.7 Non-expansive Expressions
  4.8 Closure
  4.9 Type Structures and Type Environments
  4.10 Inference Rules
  4.11 Further Restrictions

5 Static Semantics for Modules
  5.1 Semantic Objects
  5.2 Type Realisation
  5.3 Signature Instantiation
  5.4 Functor Signature Instantiation
  5.5 Enrichment
  5.6 Signature Matching
  5.7 Inference Rules

6 Dynamic Semantics for the Core
  6.1 Reduced Syntax
  6.2 Simple Objects
  6.3 Compound Objects
  6.4 Basic Values
  6.5 Basic Exceptions
  6.6 Function Closures
  6.7 Inference Rules

7 Dynamic Semantics for Modules
  7.1 Reduced Syntax
  7.2 Compound Objects
  7.3 Inference Rules

8 Programs

A Appendix: Derived Forms

B Appendix: Full Grammar

C Appendix: The Initial Static Basis

D Appendix: The Initial Dynamic Basis

E Overloading
  E.1 Overloaded special constants
  E.2 Overloaded value identifiers

F Appendix: The Development of ML

G Appendix: What is New?
  G.1 Type Abbreviations in Signatures
  G.2 Opaque Signature Matching
  G.3 Sharing
    G.3.1 Type Sharing
    G.3.2 The equality attribute of specified types
    G.3.3 Structure Sharing
  G.4 Value Polymorphism
  G.5 Identifier Status
  G.6 Replication of Datatypes
  G.7 Local Datatypes
  G.8 Principal Environments
  G.9 Consistency and Admissibility
  G.10 Special Constants
  G.11 Comments
  G.12 Infixed Operators
  G.13 Non-expansive Expressions
  G.14 Rebinding of built-in identifiers
  G.15 Grammar for Modules
  G.16 Closure Restrictions
  G.17 Specifications
  G.18 Scope of Explicit Type Variables
  G.19 The Initial Basis
  G.20 Overloading

References

Index

1 Introduction

This document formally defines Standard ML.

To understand the method of definition, at least in broad terms, it helps to consider how an implementation of ML is naturally organised. ML is an interactive language, and a program consists of a sequence of top-level declarations; the execution of each declaration modifies the top-level environment, which we call a basis, and reports the modification to the user.

In the execution of a declaration there are three phases: parsing, elaboration, and evaluation. Parsing determines the grammatical form of a declaration. Elaboration, the static phase, determines whether it is well-typed and well-formed in other ways, and records relevant type or form information in the basis. Finally evaluation, the dynamic phase, determines the value of the declaration and records relevant value information in the basis. Corresponding to these phases, our formal definition divides into three parts: grammatical rules, elaboration rules, and evaluation rules. Furthermore, the basis is divided into the static basis and the dynamic basis; for example, a variable which has been declared is associated with a type in the static basis and with a value in the dynamic basis.
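As an informal illustration of how the static and dynamic bases differ, consider the short sketch below. The declarations (`n`, `double`, `m`) are assumptions chosen for this example and do not come from the Definition; the comments describe what one would expect elaboration and evaluation to record.

```sml
(* Elaboration records a type for each variable in the static basis;
   evaluation records a value for it in the dynamic basis. *)
val n = 3 + 4            (* static: n : int              dynamic: n = 7  *)
fun double x = 2 * x     (* static: double : int -> int  dynamic: a function value *)
val m = double n         (* static: m : int              dynamic: m = 14 *)

(* A declaration that adds an integer to a string would still parse,
   but it would be rejected during elaboration and so never evaluated. *)
```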

In an implementation, the basis need not be so divided. But for the purpose of formal definition, it eases presentation and understanding to keep the static and dynamic parts of the basis separate. This is further justified by programming experience. A large proportion of errors in ML programs are discovered during elaboration, and identified as errors of type or form, so it follows that it is useful to perform the elaboration phase separately. In fact, elaboration without evaluation is part of what is normally called compilation; once a declaration (or larger entity) is compiled one wishes to evaluate it - repeatedly - without re-elaboration, from which it follows that it is useful to perform the evaluation phase separately.

A further factoring of the formal definition is possible, because of the structure of the language. ML consists of a lower level called the Core language (or Core for short), a middle level concerned with programming-in-the-large called Modules, and a very small upper level called Programs. With the three phases described above, there is therefore a possibility of nine components in the complete language definition. We have allotted one section to each of these components, except that we have combined the parsing, elaboration and evaluation of Programs in one section. The scheme for the ensuing seven sections is therefore as follows:

                     Core         Modules      Programs
Syntax               Section 2    Section 3
Static Semantics     Section 4    Section 5    Section 8
Dynamic Semantics    Section 6    Section 7

The Core provides many phrase classes, for programming convenience. But about half of these classes are derived forms, whose meaning can be given by translation into the other half which we call the Bare language. Thus each of the three parts for the Core treats only the bare language; the derived forms are treated in Appendix A. This appendix also contains a few derived forms for Modules. A full grammar for the language is presented in Appendix B.

In Appendices C and D the initial basis is detailed. This basis, divided into its static and dynamic parts, contains the static and dynamic meanings of a small set of predefined identifiers. A richer basis is defined in a separate document[17].

The semantics is presented in a form known as Natural Semantics. It consists of a set of rules allowing sentences of the form

$A \vdash phrase \Rightarrow A'$

to be inferred, where A is often a basis (static or dynamic) and A' a semantic object - often a type in the static semantics and a value in the dynamic semantics. One should read such a sentence as follows: "against the background provided by A, the phrase phrase elaborates - or evaluates - to the object A'". Although the rules themselves are formal, the semantic objects, particularly the static ones, are the subject of a mathematical theory which is presented in a succinct form in the relevant sections.
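To give a feel for sentences of this shape, here is a minimal sketch of a toy evaluator written in Standard ML itself. It is not part of the Definition: the datatype `phrase`, the type `background`, and the functions `lookup` and `eval` are hypothetical names invented for this illustration, and the toy language has only numbers, variables and addition. Each clause of `eval` plays the role of one inference rule.

```sml
(* A toy phrase class: arithmetic expressions with variables. *)
datatype phrase = Num of int
                | Var of string
                | Add of phrase * phrase

(* The background: bindings preserved from earlier declarations. *)
type background = (string * int) list

exception Unbound of string

fun lookup ([], x) = raise Unbound x
  | lookup ((y, v) :: rest, x) = if x = y then v else lookup (rest, x)

(* eval (b, p) returns the object that p evaluates to against b.
   The result for a compound phrase is obtained from the results
   for its constituent phrases, as in an inference rule. *)
fun eval (_ : background, Num n) = n
  | eval (b, Var x) = lookup (b, x)
  | eval (b, Add (p1, p2)) = eval (b, p1) + eval (b, p2)

(* Against a background binding x to 3, the phrase x + 4 evaluates to 7. *)
val m = eval ([("x", 3)], Add (Var "x", Num 4))
```

The same phrase evaluated against a different background gives a different result, which is the background-dependence the sentence form expresses.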

The robustness of the semantics depends upon theorems. Usually these have been proven, but the proof is not included.

2 Syntax of the Core

2.1 Reserved Words

The following are the reserved words used in the Core. They may not (except =) be used as identifiers.

abstype and andalso as case datatype do else
end exception fn fun handle if in infix
infixr let local nonfix of op open orelse
raise rec then type val with withtype while
( ) [ ] { } , : ; ... _ | = => -> #

2.2 Special constants

An integer constant (in decimal notation) is an optional negation symbol (~) followed by a non-empty sequence of decimal digits (0-9). An integer constant (in hexadecimal notation) is an optional negation symbol followed by 0x followed by a non-empty sequence of hexadecimal digits (0-9a-fA-F, where A-F are alternatives for a-f, respectively).

A word constant (in decimal notation) is 0w followed by a non-empty sequence of decimal digits. A word constant (in hexadecimal notation) is 0wx followed by a non-empty sequence of hexadecimal digits. A real constant is an integer constant in decimal notation, possibly followed by a point (.) and one or more decimal digits, possibly followed by an exponent symbol (E or e) and an integer constant in decimal notation; at least one of the optional parts must occur, hence no integer constant is a real constant. Examples: 0.7 3.32E5 3E~7 . Non-examples: 23 .3 4.E5 1E2.0 .
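The constant forms described above can be seen side by side in the following sketch. The variable names are arbitrary and chosen only for this illustration; the types noted in the comments assume the usual default resolution of overloaded constants.

```sml
val d  = ~42       (* decimal integer constant with the negation symbol *)
val h  = 0xFF      (* hexadecimal integer constant, equal to 255 *)
val nh = ~0x10     (* negated hexadecimal integer constant, equal to ~16 *)
val w  = 0w255     (* word constant in decimal notation *)
val wx = 0wxff     (* word constant in hexadecimal notation, same value as w *)
val r1 = 0.7       (* real constant: fractional part only *)
val r2 = 3.32E5    (* real constant: fractional part and exponent *)
val r3 = 3E~7      (* real constant: exponent only, itself an integer constant *)
```

By contrast, the non-examples 23, .3, 4.E5 and 1E2.0 are not real constants: the first has neither optional part, and the others lack digits where the rule requires them or give the exponent a fractional part.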

We assume an underlying alphabet of N characters ($N \ge 256$), numbered 0 to $N - 1$, which agrees with the ASCII character set on the characters numbered 0 to 127. The interval $[0, N - 1]$ is called the ordinal range of the alphabet. A string constant is a sequence, between quotes ("), of zero or more printable characters (i.e., numbered 33-126), spaces or escape sequences. Each escape sequence starts with the escape character \ , and stands for a character sequence. The escape sequences are:

\a        A single character interpreted by the system as alert (ASCII 7)
\b        Backspace (ASCII 8)
\t        Horizontal tab (ASCII 9)
\n        Linefeed, also known as newline (ASCII 10)
\v        Vertical tab (ASCII 11)
\f        Form feed (ASCII 12)
\r        Carriage return (ASCII 13)
\^c       The control character c, where c may be any character with number 64-95. The number of \^c is 64 less than the number of c.
\ddd      The single character with number ddd (3 decimal digits denoting an integer in the ordinal range of the alphabet).
\uxxxx    The single character with number xxxx (4 hexadecimal digits denoting an integer in the ordinal range of the alphabet).
\"        "
\\        \
\f...f\   This sequence is ignored, where f...f stands for a sequence of one or more formatting characters.

The formatting characters are a subset of the non-printable characters including at least space, tab, newline, formfeed. The last form allows long strings to be written on more than one line, by writing \ at the end of one line and at the start of the next.

A character constant is a sequence of the form #s, where s is a string constant denoting a string of size one character.
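A few string and character constants that exercise the escape sequences are sketched below. The bindings are illustrative assumptions rather than text from the Definition, and the `\uxxxx` form assumes an implementation that accepts it for characters within its alphabet.

```sml
val tab     = "a\tb"           (* horizontal tab between a and b *)
val ctrl    = "\^G"            (* control character: G is 71, so this is character 7 *)
val decimal = "\065"           (* three decimal digits: character number 65 *)
val hex     = "\u0041"         (* four hexadecimal digits: character number 65 again *)
val quoted  = "say \"hi\""     (* escaped quote characters inside a string *)

(* A long string split over two lines; the backslash, line break,
   indentation and second backslash are ignored together. *)
val long    = "first half, \
              \second half"

val c       = #"A"             (* character constant: # followed by a string of size one *)
```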

Libraries may provide multiple numeric types and multiple string types. To each string type corresponds an alphabet with ordinal range $[0, N - 1]$ for some $N \ge 256$; each alphabet must agree with the ASCII character set on the characters numbered 0 to 127. When multiple alphabets are supported, all characters of a given string constant are interpreted over the same alphabet. For each special constant, overloading resolution is used for determining the type of the constant (see Appendix E).

We denote by SCon the class of special constants, i.e., the integer, real, word, character and string constants; we shall use scon to range over SCon.

2.3 Comments

A comment is any character sequence within comment brackets (* *) in which comment brackets are properly nested. No space is allowed between the two characters which make up a comment bracket (* or *). An unmatched (* should be detected by the compiler.
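A small sketch of proper nesting follows; the binding `answer` is an arbitrary name used only to show that the surrounding code is unaffected.

```sml
(* An outer comment
   (* with a properly nested inner comment *)
   which only ends at the matching closing bracket. *)
val answer = 42  (* a comment may also follow code on the same line *)
```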

2.4 Identifiers

The classes of identifiers for the Core are shown in Figure 1. We use vid, tyvar to range over VId, TyVar etc. For each class X marked "long" there is a class longX of long identifiers; if x ranges over X then longx ranges over longX. The syntax of these long identifiers is given by the following:

longx ::= x                              identifier
          $strid_1.\cdots.strid_n.x$     qualified identifier ($n \ge 1$)

VId      (value identifiers)        long
TyVar    (type variables)
TyCon    (type constructors)        long
Lab      (record labels)
StrId    (structure identifiers)    long

Figure 1: Identifiers

The qualified identifiers constitute a link between the Core and the Modules. Throughout this document, the term "identifier", occurring without an adjective, refers to non-qualified identifiers only.
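As a sketch of how qualified identifiers reach into structures, consider the nested structures below. The names `Geometry`, `Circle`, `pi`, `area` and `one` are hypothetical and exist only for this example.

```sml
structure Geometry =
struct
  structure Circle =
  struct
    val pi = 3.14159
    fun area r = pi * r * r
  end
  val one = 1.0
end

(* Geometry.Circle.area is a qualified identifier with two structure
   identifiers before the final value identifier. *)
val a = Geometry.Circle.area Geometry.one
```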

An identifier is either alphanumeric: any sequence of letters, digits, primes (') and underbars (_) starting with a letter or prime, or symbolic: any non-empty sequence of the following symbols

! % & $ # + - / : < = > ? @ \ ~ ` ^ | *

In either case, however, reserved words are excluded. This means that for example # and | are not identifiers, but ## and |=| are identifiers. The only exception to this rule is that the symbol = , which is a reserved word, is also allowed as an identifier to stand for the equality predicate. The identifier = may not be re-bound; this precludes any syntactic ambiguity.

A type variable tyvar may be any alphanumeric identifier starting with a prime; the subclass EtyVar of TyVar, the equality type variables, consists of those which start with two or more primes. The other four classes (VId, TyCon, Lab and StrId) are represented by identifiers not starting with a prime. However, * is excluded from TyCon, to avoid confusion with the derived form of tuple type (see Figure 23). The class Lab is extended to include the numeric labels 1 2 3 ..., i.e. any numeral not starting with 0.
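The following sketch shows ordinary and equality type variables and numeric record labels in use. The function and value names are assumptions made for this illustration.

```sml
fun id (x : 'a) = x                       (* 'a is an ordinary type variable *)
fun same (x : ''a, y : ''a) = (x = y)     (* ''a is an equality type variable *)

val pair  = {1 = "one", 2 = 2}            (* numeric labels 1 and 2 *)
val first = #1 pair                       (* select the component labelled 1 *)

type 'a box = {contents : 'a}             (* contents is an alphanumeric label *)
val b : int box = {contents = 5}
```

A record whose labels are exactly 1 and 2 is the same value as the corresponding pair, which is why `pair` can be compared with `("one", 2)`.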

TyVar is therefore disjoint from the other four classes. Otherwise, the syntax class of an occurrence of identifier id in a Core phrase (ignoring derived forms, Section 2.7) is determined thus:

1. Immediately before "." - i.e. in a long identifier - or in an open declaration, id is a structure identifier. The following rules assume that all occurrences of structure identifiers have been removed.

2. At the start of a component in a record type, record pattern or record expression, id is a record label.

3. Elsewhere in types id is a type constructor, and must be within the scope of the type binding or datatype binding which introduced it.

4. Elsewhere, id is a value identifier.

By means of the above rules a compiler can determine the class to which each identifier occurrence belongs; for the remainder of this document we shall therefore assume that the classes are all disjoint.
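The identifier classes can be seen in a short Standard ML fragment. This is only an illustrative sketch; every name in it (x', a_1, ##, |=|, eq, r) is hypothetical and chosen to exercise the lexical rules above.

```sml
(* Alphanumeric identifiers: letters, digits, primes and underbars. *)
val x' = 1
val a_1 = 2

(* Symbolic identifiers: ## and |=| are legal, although # and | are reserved. *)
val ## = x' + a_1
fun |=| (m, n) = m * n

(* ''a starts with two primes, so it is an equality type variable. *)
fun eq (p : ''a, q : ''a) = p = q

(* 1 and 2 are numeric record labels; this record is the pair (10, true). *)
val r = {1 = 10, 2 = true}
```

Because = is reserved, each binding above is written with spaces around it so that the symbolic identifier is lexed as a separate item.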

2.5 Lexical analysis

Each item of lexical analysis is either a reserved word, a numeric label, a special constant or a long identifier. Comments and formatting characters separate items (except within string constants; see Section 2.2) and are otherwise ignored. At each stage the longest next item is taken.

2.6 Infixed operators

An identifier may be given infix status by the infix or infixr directive, which may occur as a declaration; this status only pertains to its use as a vid within the scope (see below) of the directive, and in these uses it is called an infixed operator. (Note that qualified identifiers never have infix status.) If vid has infix status, then "exp1 vid exp2" (resp. "pat1 vid pat2") may occur - in parentheses if necessary - wherever the application "vid {1=exp1,2=exp2}" or its derived form "vid (exp1,exp2)" (resp. "vid (pat1,pat2)") would otherwise occur. On the other hand, an occurrence of any long identifier (qualified or not) prefixed by op is treated as non-infixed. The only required use of op is in prefixing a non-infixed occurrence of an identifier vid which has infix status; elsewhere op, where permitted, has no effect. Infix status is cancelled by the nonfix directive. We refer to the three directives collectively as fixity directives.

The form of the fixity directives is as follows (n ≥ 1):

    infix ⟨d⟩ vid1 ··· vidn
    infixr ⟨d⟩ vid1 ··· vidn
    nonfix vid1 ··· vidn

where ⟨d⟩ is an optional decimal digit d indicating binding precedence. A higher value of d indicates tighter binding; the default is 0. infix and infixr dictate left and right associativity respectively. In an expression of the form exp1 vid1 exp2 vid2 exp3, where vid1 and vid2 are infixed operators with the same precedence, either both must associate to the left or both must associate to the right. For example, suppose that << and >> have equal precedence, but associate to the left and right respectively; then

    x << y << z    parses as    (x << y) << z
    x >> y >> z    parses as    x >> (y >> z)
    x << y >> z    is illegal
    x >> y << z    is illegal

The precedence of infixed operators relative to other expression and pattern constructions is given in Appendix B.

The scope of a fixity directive dir is the ensuing program text, except that if dir occurs in a declaration dec in either of the phrases

    let dec in ··· end
    local dec in ··· end

then the scope of dir does not extend beyond the phrase. Further scope limitations are imposed for Modules.

These directives and op are omitted from the semantic rules, since they affect only parsing.
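The following sketch illustrates associativity, op and the scope of a directive. The operators <<, >> and +++ are hypothetical and are defined here only for the example; subtraction is used because it makes the grouping visible in the result.

```sml
(* Two operators with equal precedence 5 but opposite associativity. *)
infix 5 <<
infixr 5 >>
fun a << b = a - b
fun a >> b = a - b

val left = 10 << 3 << 2      (* grouped as (10 << 3) << 2 *)
val right = 10 >> 3 >> 2     (* grouped as 10 >> (3 >> 2) *)

(* op gives a non-infixed occurrence of an identifier with infix status. *)
val viaOp = op << (10, 3)

(* The directive inside local does not extend beyond the phrase, so
   afterwards +++ is an ordinary nonfix identifier. *)
local
  infix 6 +++
in
  fun a +++ b = a + 2 * b
end
val nonfixUse = +++ (1, 2)
```

Mixing the two operators without parentheses, as in 10 << 3 >> 2, is the illegal case described above and is deliberately left out.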

AtExp    atomic expressions
ExpRow   expression rows
Exp      expressions
Match    matches
Mrule    match rules

Dec      declarations
ValBind  value bindings
TypBind  type bindings
DatBind  datatype bindings
ConBind  constructor bindings
ExBind   exception bindings

AtPat    atomic patterns
PatRow   pattern rows
Pat      patterns

Ty       type expressions
TyRow    type-expression rows

Figure 2: Core Phrase Classes

2.7 Derived Forms

There are many standard syntactic forms in ML whose meaning can be expressed in terms of a smaller number of syntactic forms, called the bare language. These derived forms, and their equivalent forms in the bare language, are given in Appendix A.
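A few familiar derived forms are shown next to more basic phrases with the same meaning. This is only a sketch with hypothetical names (xs, ys, sign1, sign2, sign3); Appendix A remains the authority on the exact translations.

```sml
(* List notation abbreviates repeated use of :: ending in nil. *)
val xs = [1, 2, 3]
val ys = 1 :: 2 :: 3 :: nil

(* if-then-else abbreviates a case over true and false, and case in turn
   abbreviates the application of a fn match. *)
fun sign1 n = if n < 0 then ~1 else 1
fun sign2 n = case n < 0 of true => ~1 | false => 1
fun sign3 n = (fn true => ~1 | false => 1) (n < 0)
```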

2.8 Grammar

The phrase classes for the Core are shown in Figure 2. We use the variable atexp to range over AtExp, etc. The grammatical rules for the Core are shown in Figures 3 and 4.

The following conventions are adopted in presenting the grammatical rules, and in their interpretation:

• The brackets ⟨ ⟩ enclose optional phrases.

• For any syntax class X (over which x ranges) we define the syntax class Xseq (over which xseq ranges) as follows:

    xseq ::= x              (singleton sequence)
                            (empty sequence)
             (x1,···,xn)    (sequence, n ≥ 1)

  (Note that the "···" used here, meaning syntactic iteration, must not be confused with "..." which is a reserved word of the language.)

• Alternative forms for each phrase class are in order of decreasing precedence; this resolves ambiguity in parsing, as explained in Appendix B.

• L (resp. R) means left (resp. right) association.

• The syntax of types binds more tightly than that of expressions.

• Each iterated construct (e.g. match, ···) extends as far right as possible; thus, parentheses may be needed around an expression which terminates with a match, e.g. "fn match", if this occurs within a larger match.
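The last convention is the one most often met in practice. In the sketch below (the function describe is hypothetical) the inner case expression ends with a match and sits inside a larger match, so it is parenthesised.

```sml
(* Without the parentheses the inner match would extend as far right as
   possible and absorb the final rule of the outer case. *)
fun describe (x, y) =
  case x of
    0 => (case y of
            0 => "both zero"
          | _ => "only x zero")
  | _ => "x nonzero"
```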

atpat  ::= _                        wildcard
           scon                     special constant
           ⟨op⟩longvid              value identifier
           { ⟨patrow⟩ }             record
           ( pat )

patrow ::= ...                      wildcard
           lab = pat ⟨ , patrow⟩    pattern row

pat    ::= atpat                    atomic
           ⟨op⟩longvid atpat        constructed pattern
           pat1 vid pat2            infixed value construction
           pat : ty                 typed
           ⟨op⟩vid⟨: ty⟩ as pat     layered

ty     ::= tyvar                    type variable
           { ⟨tyrow⟩ }              record type expression
           tyseq longtycon          type construction
           ty -> ty'                function type expression (R)
           ( ty )

tyrow  ::= lab : ty ⟨ , tyrow⟩      type-expression row

Figure 3: Grammar: Patterns and Type expressions

2.9 Syntactic Restrictions

• No expression row, pattern row or type-expression row may bind the same lab twice.

• No binding valbind, typbind, datbind or exbind may bind the same identifier twice; this applies also to value identifiers within a datbind.

• In the left side tyvarseq tycon of any typbind or datbind, tyvarseq must not contain the same tyvar twice. Any tyvar occurring within the right side must occur in tyvarseq.

• For each dec of the form datatype tyvarseq tycon = datatype tyvarseq' longtycon, the sequences tyvarseq and tyvarseq' must be equal and neither may contain the same type variable twice.

• For each value binding pat = exp within rec, exp must be of the form fn match, possibly constrained by one or more type expressions. The derived form of function-value binding given in Appendix A, page 58, necessarily obeys this restriction.

• No datbind or exbind may bind true, false, it, nil, :: or ref.
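Two of these restrictions are easy to see in code. The names even, odd, fact and assoc below are hypothetical; the sketch shows bindings that satisfy the restrictions, not a complete catalogue.

```sml
(* Each right-hand side within rec is of the form fn match. *)
val rec even = fn 0 => true | n => odd (n - 1)
and odd = fn 0 => false | n => even (n - 1)

(* The derived form fun yields a binding of the same shape. *)
fun fact 0 = 1
  | fact n = n * fact (n - 1)

(* The tyvarseq on the left lists each type variable once, and every type
   variable on the right occurs in it. *)
type ('a, 'b) assoc = ('a * 'b) list
```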

atexp   ::= scon                                special constant
            ⟨op⟩longvid                         value identifier
            { ⟨exprow⟩ }                        record
            let dec in exp end                  local declaration
            ( exp )

exprow  ::= lab = exp ⟨ , exprow⟩               expression row

exp     ::= atexp                               atomic
            exp atexp                           application (L)
            exp1 vid exp2                       infixed application
            exp : ty                            typed (L)
            exp handle match                    handle exception
            raise exp                           raise exception
            fn match                            function

match   ::= mrule ⟨ | match⟩

mrule   ::= pat => exp

dec     ::= val tyvarseq valbind                value declaration
            type typbind                        type declaration
            datatype datbind                    datatype declaration
            datatype tyvarseq tycon = datatype tyvarseq longtycon
                                                datatype replication
            abstype datbind with dec end        abstype declaration
            exception exbind                    exception declaration
            local dec1 in dec2 end              local declaration
            open longstrid1 ··· longstridn      open declaration (n ≥ 1)
                                                empty declaration
            dec1 ⟨;⟩ dec2                       sequential declaration
            infix ⟨d⟩ vid1 ··· vidn             infix (L) directive
            infixr ⟨d⟩ vid1 ··· vidn            infix (R) directive
            nonfix vid1 ··· vidn                nonfix directive

valbind ::= pat = exp ⟨and valbind⟩
            rec valbind

typbind ::= tyvarseq tycon = ty ⟨and typbind⟩

datbind ::= tyvarseq tycon = conbind ⟨and datbind⟩

conbind ::= ⟨op⟩vid ⟨of ty⟩ ⟨ | conbind⟩

exbind  ::= ⟨op⟩vid ⟨of ty⟩ ⟨and exbind⟩
            ⟨op⟩vid = ⟨op⟩longvid ⟨and exbind⟩

Figure 4: Grammar: Expressions, Matches, Declarations and Bindings

3 Syntax of Modules

For Modules there are further reserved words, identifier classes and derived forms. There are no further special constants; comments and lexical analysis are as for the Core. The derived forms for modules appear in Appendix A.

3.1 Reserved Words

The following are the additional reserved words used in Modules.

    eqtype functor include sharing sig signature struct structure where :>

3.2 Identifiers

The additional identifier classes for Modules are SigId (signature identifiers) and FunId (functor identifiers); they may be either alphanumeric - not starting with a prime - or symbolic. The class of each identifier occurrence is determined by the grammatical rules which follow. Henceforth, therefore, we consider all identifier classes to be disjoint.

3.3 Infixed operators

In addition to the scope rules for fixity directives given for the Core syntax, there is a further scope limitation: if dir occurs in a structure-level declaration strdec in any of the phrases

    let strdec in ··· end
    local strdec in ··· end
    struct strdec end

then the scope of dir does not extend beyond the phrase.

One effect of this limitation is that fixity is local to a basic structure expression - in particular, to such an expression occurring as a functor body.
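A small sketch of this limitation follows. The structure S and the operator +++ are hypothetical names used only for illustration.

```sml
structure S = struct
  infix 6 +++
  fun a +++ b = 10 * a + b
  val inside = 1 +++ 2 +++ 3     (* infix and left associative within S *)
end

(* Outside S the directive has no effect, and a qualified identifier never
   has infix status, so S.+++ is applied as an ordinary function. *)
val outside = S.+++ (S.+++ (1, 2), 3)
```

A client that wants infix use of the operator has to bring it into scope (for instance with open) and issue its own fixity directive.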

3.4 Grammar for Modules

The phrase classes for Modules are shown in Figure 5. We use the variable strexp to range over StrExp, etc. The conventions adopted in presenting the grammatical rules for Modules are the same as for the Core. The grammatical rules are shown in Figures 6, 7 and 8.

StrExp   structure expressions
StrDec   structure-level declarations
StrBind  structure bindings

SigExp   signature expressions
SigDec   signature declarations
SigBind  signature bindings

Spec     specifications
ValDesc  value descriptions
TypDesc  type descriptions
DatDesc  datatype descriptions
ConDesc  constructor descriptions
ExDesc   exception descriptions
StrDesc  structure descriptions

FunDec   functor declarations
FunBind  functor bindings
TopDec   top-level declarations

Figure 5: Modules Phrase Classes

3.5 Syntactic Restrictions

• No binding strbind, sigbind, or funbind may bind the same identifier twice.

• No description valdesc, typdesc, datdesc, exdesc or strdesc may describe the same identifier twice; this applies also to value identifiers within a datdesc.

• No tyvarseq may contain the same tyvar twice.

• For each spec of the form datatype tyvarseq tycon = datatype tyvarseq' longtycon, the sequences tyvarseq and tyvarseq' must be equal.

• Any tyvar occurring on the right side of a datdesc of the form tyvarseq tycon = ··· must occur in the tyvarseq; similarly, in signature expressions of the form sigexp where type tyvarseq longtycon = ty, any tyvar occurring in ty must occur in tyvarseq.

• No datdesc or exdesc may describe true, false, it, nil, :: or ref.

strexp  ::= struct strdec end               basic
            longstrid                       structure identifier
            strexp:sigexp                   transparent constraint
            strexp:>sigexp                  opaque constraint
            funid ( strexp )                functor application
            let strdec in strexp end        local declaration

strdec  ::= dec                             declaration
            structure strbind               structure
            local strdec1 in strdec2 end    local
                                            empty
            strdec1 ⟨;⟩ strdec2             sequential

strbind ::= strid = strexp ⟨and strbind⟩

sigexp  ::= sig spec end                    basic
            sigid                           signature identifier
            sigexp where type tyvarseq longtycon = ty
                                            type realisation

sigdec  ::= signature sigbind

sigbind ::= sigid = sigexp ⟨and sigbind⟩

Figure 6: Grammar: Structure and Signature Expressions

spec    ::= val valdesc                     value
            type typdesc                    type
            eqtype typdesc                  eqtype
            datatype datdesc                datatype
            datatype tyvarseq tycon = datatype tyvarseq longtycon
                                            replication
            exception exdesc                exception
            structure strdesc               structure
            include sigexp                  include
                                            empty
            spec1 ⟨;⟩ spec2                 sequential
            spec sharing type longtycon1 = ··· = longtyconn
                                            sharing (n ≥ 2)

valdesc ::= vid : ty ⟨and valdesc⟩

typdesc ::= tyvarseq tycon ⟨and typdesc⟩

datdesc ::= tyvarseq tycon = condesc ⟨and datdesc⟩

condesc ::= vid ⟨of ty⟩ ⟨ | condesc⟩

exdesc  ::= vid ⟨of ty⟩ ⟨and exdesc⟩

strdesc ::= strid : sigexp ⟨and strdesc⟩

Figure 7: Grammar: Specifications

fundec  ::= functor funbind

funbind ::= funid ( strid : sigexp ) = strexp ⟨and funbind⟩
                                            functor binding

topdec  ::= strdec ⟨topdec⟩                 structure-level declaration
            sigdec ⟨topdec⟩                 signature declaration
            fundec ⟨topdec⟩                 functor declaration

Note: No topdec may contain, as an initial segment, a strdec followed by a semicolon.

Figure 8: Grammar: Functors and Top-level Declarations
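The Modules grammar of Figures 6 to 8 can be exercised with a small program. Everything named below (ORDERED, INT_ORDERED, IntOrd, Max, IntMax, Counter) is hypothetical. The sketch also assumes the derived form that places a signature constraint directly after the structure identifier, which belongs to Appendix A rather than to the bare grammar.

```sml
signature ORDERED = sig
  type t
  val leq : t * t -> bool
end

(* Type realisation: "where type" fixes t in a copy of ORDERED. *)
signature INT_ORDERED = ORDERED where type t = int

(* Transparent constraint: IntOrd.t is known to be int. *)
structure IntOrd : INT_ORDERED = struct
  type t = int
  fun leq (a : int, b) = a <= b
end

(* A functor binding of the form funid ( strid : sigexp ) = strexp. *)
functor Max (O : ORDERED) = struct
  fun max (a, b) = if O.leq (a, b) then b else a
end

structure IntMax = Max (IntOrd)

(* Opaque constraint: outside Counter the type t is abstract. *)
structure Counter :> sig
  type t
  val zero : t
  val next : t -> t
  val read : t -> int
end = struct
  type t = int
  val zero = 0
  fun next n = n + 1
  fun read n = n
end
```

With the opaque constraint, an expression such as Counter.zero + 1 is not expected to elaborate, whereas IntMax.max can be applied to integers because the constraint on IntOrd is transparent.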

4 Static Semantics for the Core

Our first task in presenting the semantics - whether for Core or Modules, static or dynamic - is to define the objects concerned. In addition to the class of syntactic objects, which we have already defined, there are classes of so-called semantic objects used to describe the meaning of the syntactic objects. Some classes contain simple semantic objects; such objects are usually identifiers or names of some kind. Other classes contain compound semantic objects, such as types or environments, which are constructed from component objects.

4.1 Simple Objects

All semantic objects in the static semantics of the entire language are built from identifiers and two further kinds of simple objects: type constructor names and identifier status descriptors. Type constructor names are the values taken by type constructors; we shall usually refer to them briefly as type names, but they are to be clearly distinguished from type variables and type constructors. The simple object classes, and the variables ranging over them, are shown in Figure 9. We have included TyVar in the table to make visible the use of α in the semantics to range over TyVar.

α or tyvar ∈ TyVar           type variables
t ∈ TyName                   type names
is ∈ IdStatus = {c, e, v}    identifier status descriptors

Figure 9: Simple Semantic Objects

Each α ∈ TyVar possesses a boolean equality attribute, which determines whether or not it admits equality, i.e. whether it is a member of EtyVar (defined on page 5).

Each t ∈ TyName has an arity k ≥ 0, and also possesses an equality attribute. We denote the class of type names with arity k by TyName^(k).

With each special constant scon we associate a type name type(scon) which is either int, real, word, char or string as indicated by Section 2.2. (However, see Appendix E concerning types of overloaded special constants.)

4.2 Compound Objects

When A and B are sets Fin A denotes the set of finite subsets of A, and A →fin B denotes the set of finite maps (partial functions with finite domain) from A to B. The domain and range of a finite map, f, are denoted Dom f and Ran f. A finite map will often be written explicitly in the form {a1 ↦ b1, ···, ak ↦ bk}, k ≥ 0; in particular the empty map is {}. We shall use the form {x ↦ e ; φ} - a form of set comprehension - to stand for the finite map f whose domain is the set of values x which satisfy the condition φ, and whose value on this domain is given by f(x) = e.

When f and g are finite maps the map f + g, called f modified by g, is the finite map with domain Dom f ∪ Dom g and values

    (f + g)(a) = if a ∈ Dom g then g(a) else f(a).

The compound objects for the static semantics of the Core Language are shown in Figure 10. We take ∪ to mean disjoint union over semantic object classes. We also understand all the defined object classes to be disjoint.

τ ∈ Type = TyVar ∪ RowType ∪ FunType ∪ ConsType
(τ1, ···, τk) or τ^(k) ∈ Type^k
(α1, ···, αk) or α^(k) ∈ TyVar^k
ϱ ∈ RowType = Lab →fin Type
τ → τ' ∈ FunType = Type × Type
ConsType = ∪(k≥0) ConsType^(k)
τ^(k) t ∈ ConsType^(k) = Type^k × TyName^(k)
θ or Λα^(k).τ ∈ TypeFcn = ∪(k≥0) TyVar^k × Type
σ or ∀α^(k).τ ∈ TypeScheme = ∪(k≥0) TyVar^k × Type
(θ, VE) ∈ TyStr = TypeFcn × ValEnv
SE ∈ StrEnv = StrId →fin Env
TE ∈ TyEnv = TyCon →fin TyStr
VE ∈ ValEnv = VId →fin TypeScheme × IdStatus
E or (SE, TE, VE) ∈ Env = StrEnv × TyEnv × ValEnv
T ∈ TyNameSet = Fin(TyName)
U ∈ TyVarSet = Fin(TyVar)
C or T, U, E ∈ Context = TyNameSet × TyVarSet × Env

Figure 10: Compound Semantic Objects

Note that Λ and ∀ bind type variables. For any semantic object A, tynames A and tyvars A denote respectively the set of type names and the set of type variables occurring free in A.

Also note that a value environment maps value identifiers to a pair of a type scheme and an identifier status. If VE(vid) = (σ, is), we say that vid has status is in VE. An occurrence of a value identifier which is elaborated in VE is referred to as a value variable, a value constructor or an exception constructor, depending on whether its status in VE is v, c or e, respectively.

4.3 Projection, Injection and Modification

Projection: We often need to select components of tuples - for example, the value-environment component of a context. In such cases we rely on metavariable names to indicate which component is selected. For instance "VE of E" means "the value-environment component of E".

Moreover, when a tuple contains a finite map we shall "apply" the tuple to an argument, relying on the syntactic class of the argument to determine the relevant function. For instance C(tycon) means (TE of C)tycon and C(vid) means (VE of (E of C))(vid).

Finally, environments may be applied to long identifiers. For instance if longvid = strid1.···.stridk.vid then E(longvid) means

    (VE of (SE of ···(SE of (SE of E)strid1)strid2···)stridk)vid.

Injection: Components may be injected into tuple classes; for example, "VE in Env" means the environment ({}, {}, VE).

Modification: The modification of one map f by another map g, written f + g, has already been mentioned. It is commonly used for environment modification, for example E + E'. Often, empty components will be left implicit in a modification; for example E + VE means E + ({}, {}, VE). For set components, modification means union, so that C + (T, VE) means

    ( (T of C) ∪ T, U of C, (E of C) + VE )

Finally, we frequently need to modify a context C by an environment E (or a type environment TE say), at the same time extending T of C to include the type names of E (or of TE say). We therefore define C ⊕ TE, for example, to mean C + (tynames TE, TE).
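Modification of one finite map by another can be sketched in Standard ML itself. The representation (an association list with distinct keys) and the names lookup, apply, inDom and modify are assumptions made for this illustration; they are not part of the Definition.

```sml
(* Result of applying a finite map: absent, or present with a value. *)
datatype 'b lookup = Absent | Present of 'b

(* A finite map as an association list with distinct keys. *)
fun apply (nil, _) = Absent
  | apply ((a, b) :: rest, x) = if a = x then Present b else apply (rest, x)

fun inDom (f, x) = case apply (f, x) of Absent => false | Present _ => true

(* f + g: entries of f survive only outside Dom g; all of g is kept. *)
fun modify (f, g) =
  let
    fun keep nil = g
      | keep ((a, b) :: rest) =
          if inDom (g, a) then keep rest else (a, b) :: keep rest
  in
    keep f
  end

(* Two toy value environments, with types written as strings. *)
val ve = [("x", "int"), ("y", "bool")]
val ve' = [("y", "string"), ("z", "real")]
val both = modify (ve, ve')
```

Here both plays the role of VE + VE': its domain is the union of the two domains, and on the shared key y the second map wins.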

4.4 Types and Type functions

A type τ is an equality type, or admits equality, if it is of one of the forms

• α, where α admits equality;
• {lab1 ↦ τ1, ···, labn ↦ τn}, where each τi admits equality;
• τ^(k) t, where t and all members of τ^(k) admit equality;
• (τ') ref.

A type function θ = Λα^(k).τ has arity k; it must be closed - i.e. tyvars(τ) ⊆ α^(k) - and the bound variables must be distinct. Two type functions are considered equal if they only differ in their choice of bound variables (alpha-conversion). In particular, the equality attribute has no significance in a bound variable of a type function; for example, Λα.α → α and Λβ.β → β are equal type functions even if α admits equality but β does not. If t has arity k, then we write t to mean Λα^(k).α^(k) t (eta-conversion); thus TyName ⊆ TypeFcn. θ = Λα^(k).τ is an equality type function, or admits equality, if when the type variables α^(k) are chosen to admit equality then τ also admits equality.

We write the application of a type function θ to a vector τ^(k) of types as τ^(k) θ. If θ = Λα^(k).τ we set τ^(k) θ = τ{τ^(k)/α^(k)} (beta-conversion).

We write τ{θ^(k)/t^(k)} for the result of substituting type functions θ^(k) for type names t^(k) in τ. We assume that all beta-conversions are carried out after substitution, so that for example

    (τ^(k) t){Λα^(k).τ/t} = τ{τ^(k)/α^(k)}.
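The four forms of equality type given at the start of this section can be tried out with an equality type variable. The function same and the other names are hypothetical; the sketch also relies on the list type name admitting equality, which comes from the initial basis rather than from this section.

```sml
(* ''a ranges over equality types only. *)
fun same (x : ''a, y : ''a) = x = y

(* A record type whose component types all admit equality. *)
val onRecords = same ({name = "a", age = 1}, {name = "a", age = 1})

(* A constructed type: list applied to int. *)
val onLists = same ([1, 2], [1, 2])

(* A ref type admits equality whatever its argument is, here a function type. *)
val r = ref (fn n : int => n)
val onRefs = (same (r, r), same (r, ref (fn n : int => n)))

(* Not expected to elaborate, since a function type is none of the four forms:
   same (fn n : int => n, fn n : int => n) *)
```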

4.5 Type Schemes

A type scheme σ = ∀α^(k).τ generalises a type τ', written σ ≻ τ', if τ' = τ{τ^(k)/α^(k)} for some τ^(k), where each member τi of τ^(k) admits equality if αi does. If σ' = ∀β^(l).τ' then σ generalises σ', written σ ≻ σ', if σ ≻ τ' and β^(l) contains no free type variable of σ. It can be shown that σ ≻ σ' iff, for all τ'', whenever σ' ≻ τ'' then also σ ≻ τ''.

Two type schemes σ and σ' are considered equal if they can be obtained from each other by renaming and reordering of bound type variables, and deleting type variables from the prefix which do not occur in the body. Here, in contrast to the case for type functions, the equality attribute must be preserved in renaming; for example ∀α.α → α and ∀β.β → β are only equal if either both α and β admit equality, or neither does. It can be shown that σ = σ' iff σ ≻ σ' and σ' ≻ σ.

We consider a type τ to be a type scheme, identifying it with ∀().τ.

4.6 Scope of Explicit Type Variables

In the Core language, a type or datatype binding can explicitly introduce type variables whose scope is that binding. Moreover, in a value declaration val tyvarseq valbind, the sequence tyvarseq binds type variables: a type variable occurs free in val tyvarseq valbind iff it occurs free in valbind and is not in the sequence tyvarseq. However, explicit binding of type variables at val is optional, so we still have to account for the scope of an explicit type variable occurring in the ": ty" of a typed expression or pattern or in the "of ty" of an exception binding. For the rest of this section, we consider such free occurrences of type variables only.

Every occurrence of a value declaration is said to scope a set of explicit type variables determined as follows.

First, a free occurrence of α in a value declaration val tyvarseq valbind is said to be unguarded if the occurrence is not part of a smaller value declaration within valbind. In this case we say that α occurs unguarded in the value declaration.

Then we say that α is implicitly scoped at a particular value declaration val tyvarseq valbind in a program if (1) α occurs unguarded in this value declaration, and (2) α does not occur unguarded in any larger value declaration containing the given one.

Henceforth, we assume that for every value declaration val tyvarseq ··· occurring in the program, every explicit type variable implicitly scoped at the val has been added to tyvarseq. Thus for example, in the two declarations

    val x = let val id:'a->'a = fn z=>z in id id end
    val x = (let val id:'a->'a = fn z=>z in id id end; fn z=>z:'a)

the type variable 'a is scoped differently; they become respectively

    val x = let val 'a id:'a->'a = fn z=>z in id id end
    val 'a x = (let val id:'a->'a = fn z=>z in id id end; fn z=>z:'a)

Then, according to the inference rules in Section 4.10 the first example can be elaborated, but the second cannot since 'a is bound at the outer value declaration leaving no possibility of two different instantiations of the type of id in the application id id.

4.7 Non-expansive Expressions

In order to treat polymorphic references and exceptions, the set Exp of expressions is partitioned into two classes, the expansive and the non-expansive expressions. An expression is non-expansive in context C if, after replacing infixed forms by their equivalent prefixed forms, and derived forms by their equivalent forms, it can be generated by the following grammar from the non-terminal nexp:

    nexp    ::= scon
                ⟨op⟩longvid
                {⟨nexprow⟩}
                (nexp)
                conexp nexp
                nexp:ty
                fn match

    nexprow ::= lab = nexp⟨, nexprow⟩

    conexp  ::= (conexp⟨:ty⟩)
                ⟨op⟩longvid

Restriction: Within a conexp, we require longvid ≠ ref and is of C(longvid) ∈ {c, e}.

All other expressions are said to be expansive (in C). The idea is that the dynamic evaluation of a non-expansive expression will neither generate an exception nor extend the domain of the memory, while the evaluation of an expansive expression might.
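The practical effect of this classification shows up when a bound identifier is used at two different types. The names in this sketch are hypothetical, and the rejected phrase is left in a comment so that the fragment as a whole remains acceptable.

```sml
(* fn match is non-expansive, so id is generalised and can be used at
   both int and bool. *)
val usedTwice =
  let val id = fn x => x
  in (id 1, id true) end

(* A record of non-expansive expressions is itself non-expansive. *)
val fromPair =
  let val pair = (fn x => x, nil)
  in (#1 pair 1, #1 pair true, 1 :: #2 pair, true :: #2 pair) end

(* Not expected to elaborate: the right-hand side is an application, hence
   expansive, so f is not generalised and cannot be used at two types:
   let val f = (fn x => x) (fn y => y) in (f 1, f true) end *)
```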

4.8 Closure

Let τ be a type and A a semantic object. Then Clos_A(τ), the closure of τ with respect to A, is the type scheme ∀α^(k).τ, where α^(k) = tyvars(τ) \ tyvars A. Commonly, A will be a context C. We abbreviate the total closure Clos_{}(τ) to Clos(τ). If the range of a value environment VE contains only types (rather than arbitrary type schemes) we set

    Clos_A VE = {vid ↦ (Clos_A(τ), is) ; VE(vid) = (τ, is)}

Closing a value environment VE that stems from the elaboration of a value binding valbind requires extra care to ensure type security of references and exceptions and correct scoping of explicit type variables. Recall that valbind is not allowed to bind the same variable twice. Thus, for each vid ∈ Dom VE there is a unique pat = exp in valbind which binds vid. If VE(vid) = (τ, is), let Clos_{C,valbind} VE(vid) = (∀α^(k).τ, is), where

    α^(k) = tyvars τ \ tyvars C,   if exp is non-expansive in C;
    α^(k) = (),                    if exp is expansive in C.

4.9 Type Structures and Type Environments

A type structure (θ, VE) is well-formed if either VE = {}, or θ is a type name t. (The latter case arises, with VE ≠ {}, in datatype declarations.) An object or assembly A of semantic objects is well-formed if every type structure occurring in A is well-formed. All type structures occurring in elaborations are required to be well-formed.

A type structure (t, VE) is said to respect equality if, whenever t admits equality, then either t = ref (see Appendix C) or, for each VE(vid) of the form (∀α^(k).(τ → α^(k) t), is), the type function Λα^(k).τ also admits equality. (This ensures that the equality predicate = will be applicable to a constructed value (vid, v) of type τ^(k) t only when it is applicable to the value v itself, whose type is τ{τ^(k)/α^(k)}.) A type environment TE respects equality if all its type structures do so.

Let TE be a type environment, and let T be the set of type names t such that (t, VE) occurs in TE for some VE ≠ {}. Then TE is said to maximise equality if (a) TE respects equality, and also (b) if any larger subset of T were to admit equality (without any change in the equality attribute of any type names not in T) then TE would cease to respect equality.

For any TE of the form

    TE = {tyconi ↦ (ti, VEi) ; 1 ≤ i ≤ k},

where no VEi is the empty map, and for any E we define Abs(TE, E) to be the environment obtained from E and TE as follows. First, let Abs(TE) be the type environment {tyconi ↦ (ti, {}) ; 1 ≤ i ≤ k} in which all value environments VEi have been replaced by the empty map. Let t'1, ···, t'k be new distinct type names none of which admit equality. Then Abs(TE, E) is the result of simultaneously substituting t'i for ti, 1 ≤ i ≤ k, throughout Abs(TE) + E. (The effect of the latter substitution is to ensure that the use of equality on an abstype is restricted to the with part.)

4.10 Inference Rules

Each rule of the semantics allows inferences among sentences of the form

    A ⊢ phrase ⇒ A'

where A is usually an environment or a context, phrase is a phrase of the Core, and A' is a semantic object - usually a type or an environment. It may be pronounced "phrase elaborates to A' in (context or environment) A". Some rules have extra hypotheses not of this form; they are called side conditions.

In the presentation of the rules, phrases within single angle brackets ⟨ ⟩ are called first options, and those within double angle brackets ⟨⟨ ⟩⟩ are called second options. To reduce the number of rules, we have adopted the following convention:

    In each instance of a rule, the first options must be either all present or all absent; similarly the second options must be either all present or all absent.

Although not assumed in our definitions, it is intended that every context C = T, U, E has the property that tynames E ⊆ T. Thus T may be thought of, loosely, as containing all type names which "have been generated". It is necessary to include T as a separate component in a context, since tynames E may not contain all the type names which have been generated; one reason is that a context T, ∅, E is a projection of the basis B = T, F, G, E whose other components F and G could contain other such names - recorded in T but not present in E. Of course, remarks about what "has been generated" are not precise in terms of the semantic rules. But the following precise result may easily be demonstrated:

    Let S be a sentence T, U, E ⊢ phrase ⇒ A such that tynames E ⊆ T, and let S' be a sentence T', U', E' ⊢ phrase' ⇒ A' occurring in a proof of S; then also tynames E' ⊆ T'.

Atomic Expressions    C ⊢ atexp ⇒ τ

    -------------------------  (1)
    C ⊢ scon ⇒ type(scon)

    C(longvid) = (σ, is)    σ ≻ τ
    ------------------------------  (2)
    C ⊢ longvid ⇒ τ

    ⟨C ⊢ exprow ⇒ ϱ⟩
    --------------------------------------  (3)
    C ⊢ { ⟨exprow⟩ } ⇒ {}⟨+ ϱ⟩ in Type

    C ⊢ dec ⇒ E    C ⊕ E ⊢ exp ⇒ τ    tynames τ ⊆ T of C
    ------------------------------------------------------  (4)
    C ⊢ let dec in exp end ⇒ τ

    C ⊢ exp ⇒ τ
    -------------------  (5)
    C ⊢ ( exp ) ⇒ τ

Comments:

(2) The instantiation of type schemes allows different occurrences of a single longvid to assume different types. Note that the identifier status is not used in this rule.

(4) The use of ⊕, here and elsewhere, ensures that type names generated by the first sub-phrase are different from type names generated by the second sub-phrase. The side condition prevents type names generated by dec from escaping outside the local declaration.
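The side condition of rule 4 can be illustrated as follows. The names n, t, A, B and toInt are hypothetical. The rejected phrase is shown only in a comment, and some implementations are known to be more permissive here than the rule.

```sml
(* Accepted: t is used inside the let, but the result type int does not
   mention it. *)
val n =
  let
    datatype t = A | B
    fun toInt A = 0
      | toInt B = 1
  in
    toInt B
  end

(* Excluded by the side condition of rule 4: the result would have type t,
   whose type name is generated by dec and is not in T of C:
   let datatype t = A in A end *)
```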

Expression Rows    C ⊢ exprow ⇒ ϱ

    C ⊢ exp ⇒ τ    ⟨C ⊢ exprow ⇒ ϱ⟩
    ----------------------------------------------  (6)
    C ⊢ lab = exp ⟨ , exprow⟩ ⇒ {lab ↦ τ}⟨+ ϱ⟩

Expressions    C ⊢ exp ⇒ τ

    C ⊢ atexp ⇒ τ
    ----------------  (7)
    C ⊢ atexp ⇒ τ

    C ⊢ exp ⇒ τ' → τ    C ⊢ atexp ⇒ τ'
    ------------------------------------  (8)
    C ⊢ exp atexp ⇒ τ

    C ⊢ exp ⇒ τ    C ⊢ ty ⇒ τ
    ---------------------------  (9)
    C ⊢ exp : ty ⇒ τ

    C ⊢ exp ⇒ τ    C ⊢ match ⇒ exn → τ
    ------------------------------------  (10)
    C ⊢ exp handle match ⇒ τ

    C ⊢ exp ⇒ exn
    --------------------  (11)
    C ⊢ raise exp ⇒ τ

    C ⊢ match ⇒ τ
    --------------------  (12)
    C ⊢ fn match ⇒ τ

Comments:

(7) The relational symbol ⊢ is overloaded for all syntactic classes (here atomic expressions and expressions).

(9) Here τ is determined by C and ty. Notice that type variables in ty cannot be instantiated in obtaining τ; thus the expression 1:'a will not elaborate successfully, nor will the expression (fn x=>x):'a->'b. The effect of type variables in an explicitly typed expression is to indicate exactly the degree of polymorphism present in the expression.

(11) Note that τ does not occur in the premise; thus a raise expression has "arbitrary" type.

Matches    C ⊢ match ⇒ τ

    C ⊢ mrule ⇒ τ    ⟨C ⊢ match ⇒ τ⟩
    ----------------------------------  (13)
    C ⊢ mrule ⟨ | match⟩ ⇒ τ

Match Rules    C ⊢ mrule ⇒ τ

    C ⊢ pat ⇒ (VE, τ)    C + VE ⊢ exp ⇒ τ'    tynames VE ⊆ T of C
    ---------------------------------------------------------------  (14)
    C ⊢ pat => exp ⇒ τ → τ'

Comment: This rule allows new free type variables to enter the context. These new type variables will be chosen, in effect, during the elaboration of pat (i.e., in the inference of the first hypothesis). In particular, their choice may have to be made to agree with type variables present in any explicit type expression occurring within exp (see rule 9).

Declarations    C ⊢ dec ⇒ E

    U = tyvars(tyvarseq)    C + U ⊢ valbind ⇒ VE
    VE' = Clos_{C,valbind} VE    U ∩ tyvars VE' = ∅
    ------------------------------------------------  (15)
    C ⊢ val tyvarseq valbind ⇒ VE' in Env

    C ⊢ typbind ⇒ TE
    --------------------------------  (16)
    C ⊢ type typbind ⇒ TE in Env

    C ⊕ TE ⊢ datbind ⇒ VE, TE    ∀(t, VE') ∈ Ran TE, t ∉ (T of C)
    TE maximises equality
    ----------------------------------------------------------------  (17)
    C ⊢ datatype datbind ⇒ (VE, TE) in Env

    C(longtycon) = (θ, VE)    θ = Λα^(k).τ
    tyvarseq = α^(k)    TE = {tycon ↦ (θ, VE)}
    ---------------------------------------------------------------------------  (18)
    C ⊢ datatype tyvarseq tycon = datatype tyvarseq longtycon ⇒ (VE, TE) in Env

    C ⊕ TE ⊢ datbind ⇒ VE, TE    ∀(t, VE') ∈ Ran TE, t ∉ (T of C)
    C ⊕ (VE, TE) ⊢ dec ⇒ E    TE maximises equality
    ----------------------------------------------------------------  (19)
    C ⊢ abstype datbind with dec end ⇒ Abs(TE, E)

    C ⊢ exbind ⇒ VE
    -------------------------------------  (20)
    C ⊢ exception exbind ⇒ VE in Env

    C ⊢ dec1 ⇒ E1    C ⊕ E1 ⊢ dec2 ⇒ E2
    --------------------------------------  (21)
    C ⊢ local dec1 in dec2 end ⇒ E2

    C(longstrid1) = E1    ···    C(longstridn) = En
    -------------------------------------------------  (22)
    C ⊢ open longstrid1 ··· longstridn ⇒ E1 + ··· + En

    -------------------  (23)
    C ⊢   ⇒ {} in Env

    C ⊢ dec1 ⇒ E1    C ⊕ E1 ⊢ dec2 ⇒ E2
    --------------------------------------  (24)
    C ⊢ dec1 ⟨;⟩ dec2 ⇒ E1 + E2

Comments:

(15) Here $VE$ will contain types rather than general type schemes. The closure of $VE$ allows value identifiers to be used polymorphically, via rule 2.

The side-condition on $U$ ensures that the type variables in tyvarseq are bound by the closure operation, if they occur free in the range of $VE$.

On the other hand, if the phrase val tyvarseq valbind occurs inside some larger value binding val tyvarseq' valbind' then no type variable $\alpha$ listed in tyvarseq' will become bound by the $\mathrm{Clos}_{C,\mathit{valbind}}\,VE$ operation; for $\alpha$ must be in $U$ of $C$ and hence excluded from closure by the definition of the closure operation (Section 4.8, page 20) since $U \text{ of } C \subseteq \mathrm{tyvars}\ C$.

(17),(19) The side conditions express that the elaboration of each datatype binding generates new type names and that as many of these new names as possible admit equality. Adding $TE$ to the context on the left of the $\vdash$ captures the recursive nature of the binding.

(18) Note that no new type name is generated (i.e., datatype replication is not generative). By the syntactic restriction in Section 2.9 the two type variable sequences in the conclusion must be equal.

(19) The Abs operation was defined in Section 4.9, page 20.

(20) No closure operation is used here, since $VE$ maps exception constructors to types rather than to general type schemes.
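The contrast between the generative rule 17 and the non-generative rule 18 can be seen in a small sketch. The type and constructor names below (colour, shade, colour', Red, Green, Red', Green') are hypothetical.

```sml
datatype colour = Red | Green
datatype shade = datatype colour     (* rule 18: shade denotes the same type as colour *)

val c : colour = Red
val s : shade = c                    (* accepted: replication generated no new type name *)
val same = (s = Green)               (* the constructors are usable at both names *)

(* A fresh datatype binding is generative (rule 17). *)
datatype colour' = Red' | Green'
(* val bad : colour = Red'              rejected: colour' has a new type name *)
```

Because both colour and shade are bound to the same type structure, values move freely between the two names without any conversion.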

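Value Bindings    $C \vdash \mathit{valbind} \Rightarrow VE$

$$\frac{C \vdash \mathit{pat} \Rightarrow (VE, \tau) \quad C \vdash \mathit{exp} \Rightarrow \tau \quad \langle C \vdash \mathit{valbind} \Rightarrow VE' \rangle}{C \vdash \mathit{pat}\ \texttt{=}\ \mathit{exp}\ \langle \texttt{and}\ \mathit{valbind} \rangle \Rightarrow VE\ \langle + VE' \rangle} \tag{25}$$

$$\frac{C + VE \vdash \mathit{valbind} \Rightarrow VE \quad \mathrm{tynames}\ VE \subseteq T \text{ of } C}{C \vdash \texttt{rec}\ \mathit{valbind} \Rightarrow VE} \tag{26}$$

Comments:

(25) When the option is present we have $\mathrm{Dom}\ VE \cap \mathrm{Dom}\ VE' = \emptyset$ by the syntactic restrictions.

(26) Modifying $C$ by $VE$ on the left captures the recursive nature of the binding. From rule 25 we see that any type scheme occurring in $VE$ will have to be a type. Thus each use of a recursive function in its own body must be ascribed the same type. Also note that $C + VE$ may overwrite identifier status. For example, the program datatype t = f; val rec f = fn x => x; is legal.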
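Both observations in comment (26) can be tried out directly. The sketch below reuses the program quoted in the comment; the names three, len and n are hypothetical additions. It assumes an implementation that follows the Definition on this point; one that deviates may reject the second line.

```sml
(* C + VE may overwrite identifier status: f is first a constructor,
   then val rec rebinds it as an ordinary value variable. *)
datatype t = f;
val rec f = fn x => x;
val three = f 3                      (* f is now the identity function *)

(* Inside its own body a recursive function is monomorphic ... *)
fun len [] = 0
  | len (_ :: xs) = 1 + len xs
(* ... but once the binding has been closed (rule 15) it is polymorphic. *)
val n = len [1, 2, 3] + len ["a", "b"]

(* fun bad x = (bad 1; bad true; x)     rejected: bad is used at two types in its body *)
```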

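Type Bindings    $C \vdash \mathit{typbind} \Rightarrow TE$

$$\frac{\mathit{tyvarseq} = \alpha^{(k)} \quad C \vdash \mathit{ty} \Rightarrow \tau \quad \langle C \vdash \mathit{typbind} \Rightarrow TE \rangle}{C \vdash \mathit{tyvarseq}\ \mathit{tycon}\ \texttt{=}\ \mathit{ty}\ \langle \texttt{and}\ \mathit{typbind} \rangle \Rightarrow \{\mathit{tycon} \mapsto (\Lambda\alpha^{(k)}.\tau, \{\})\}\ \langle + TE \rangle} \tag{27}$$

Comment: The syntactic restrictions ensure that the type function $\Lambda\alpha^{(k)}.\tau$ satisfies the well-formedness constraints of Section 4.4 and they ensure $\mathit{tycon} \notin \mathrm{Dom}\ TE$.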

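Datatype Bindings    $C \vdash \mathit{datbind} \Rightarrow VE, TE$

$$\frac{\mathit{tyvarseq} = \alpha^{(k)} \quad C, \alpha^{(k)}t \vdash \mathit{conbind} \Rightarrow VE \quad \langle C \vdash \mathit{datbind}' \Rightarrow VE', TE' \quad \forall (t', VE'') \in \mathrm{Ran}\ TE',\ t \neq t' \rangle}{C \vdash \mathit{tyvarseq}\ \mathit{tycon}\ \texttt{=}\ \mathit{conbind}\ \langle \texttt{and}\ \mathit{datbind}' \rangle \Rightarrow (\mathrm{Clos}\,VE\ \langle + VE' \rangle,\ \{\mathit{tycon} \mapsto (t, \mathrm{Clos}\,VE)\}\ \langle + TE' \rangle)} \tag{28}$$

Comment: The syntactic restrictions ensure $\mathrm{Dom}\ VE \cap \mathrm{Dom}\ VE' = \emptyset$ and $\mathit{tycon} \notin \mathrm{Dom}\ TE'$.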

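Constructor Bindings    $C, \tau \vdash \mathit{conbind} \Rightarrow VE$

$$\frac{\langle C \vdash \mathit{ty} \Rightarrow \tau' \rangle \quad \langle\langle C, \tau \vdash \mathit{conbind} \Rightarrow VE \rangle\rangle}{C, \tau \vdash \mathit{vid}\ \langle \texttt{of}\ \mathit{ty} \rangle\ \langle\langle\, \texttt{|}\ \mathit{conbind} \rangle\rangle \Rightarrow \{\mathit{vid} \mapsto (\tau, \mathtt{c})\}\ \langle + \{\mathit{vid} \mapsto (\tau' \to \tau, \mathtt{c})\} \rangle\ \langle\langle + VE \rangle\rangle} \tag{29}$$

Comment: By the syntactic restrictions $\mathit{vid} \notin \mathrm{Dom}\ VE$.

Exception Bindings    $C \vdash \mathit{exbind} \Rightarrow VE$

$$\frac{\langle C \vdash \mathit{ty} \Rightarrow \tau \rangle \quad \langle\langle C \vdash \mathit{exbind} \Rightarrow VE \rangle\rangle}{C \vdash \mathit{vid}\ \langle \texttt{of}\ \mathit{ty} \rangle\ \langle\langle \texttt{and}\ \mathit{exbind} \rangle\rangle \Rightarrow \{\mathit{vid} \mapsto (\mathtt{exn}, \mathtt{e})\}\ \langle + \{\mathit{vid} \mapsto (\tau \to \mathtt{exn}, \mathtt{e})\} \rangle\ \langle\langle + VE \rangle\rangle} \tag{30}$$

$$\frac{C(\mathit{longvid}) = (\tau, \mathtt{e}) \quad \langle C \vdash \mathit{exbind} \Rightarrow VE \rangle}{C \vdash \mathit{vid}\ \texttt{=}\ \mathit{longvid}\ \langle \texttt{and}\ \mathit{exbind} \rangle \Rightarrow \{\mathit{vid} \mapsto (\tau, \mathtt{e})\}\ \langle + VE \rangle} \tag{31}$$

Comments:

(30) Notice that $\tau$ may contain type variables.

(30),(31) For each $C$ and exbind, there is at most one $VE$ satisfying $C \vdash \mathit{exbind} \Rightarrow VE$.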

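Atomic Patterns    $C \vdash \mathit{atpat} \Rightarrow (VE, \tau)$

$$C \vdash \texttt{\_} \Rightarrow (\{\}, \tau) \tag{32}$$

$$C \vdash \mathit{scon} \Rightarrow (\{\}, \mathrm{type}(\mathit{scon})) \tag{33}$$

$$\frac{\mathit{vid} \notin \mathrm{Dom}(C) \text{ or } is \text{ of } C(\mathit{vid}) = \mathtt{v}}{C \vdash \mathit{vid} \Rightarrow (\{\mathit{vid} \mapsto (\tau, \mathtt{v})\}, \tau)} \tag{34}$$

$$\frac{C(\mathit{longvid}) = (\sigma, is) \quad is \neq \mathtt{v} \quad \sigma \succ \tau^{(k)}t}{C \vdash \mathit{longvid} \Rightarrow (\{\}, \tau^{(k)}t)} \tag{35}$$

$$\frac{\langle C \vdash \mathit{patrow} \Rightarrow (VE, \varrho) \rangle}{C \vdash \texttt{\{}\ \langle \mathit{patrow} \rangle\ \texttt{\}} \Rightarrow (\{\} \langle + VE \rangle,\ \{\} \langle + \varrho \rangle\ \text{in Type})} \tag{36}$$

$$\frac{C \vdash \mathit{pat} \Rightarrow (VE, \tau)}{C \vdash \texttt{(}\ \mathit{pat}\ \texttt{)} \Rightarrow (VE, \tau)} \tag{37}$$

Comments:

(34), (35) The context $C$ determines which of these two rules applies. In rule 34, note that vid can assume a type, not a general type scheme.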

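Pattern Rows    $C \vdash \mathit{patrow} \Rightarrow (VE, \varrho)$

$$C \vdash \texttt{...} \Rightarrow (\{\}, \varrho) \tag{38}$$

$$\frac{C \vdash \mathit{pat} \Rightarrow (VE, \tau) \quad \langle C \vdash \mathit{patrow} \Rightarrow (VE', \varrho) \quad \mathrm{Dom}\ VE \cap \mathrm{Dom}\ VE' = \emptyset \quad \mathit{lab} \notin \mathrm{Dom}\ \varrho \rangle}{C \vdash \mathit{lab}\ \texttt{=}\ \mathit{pat}\ \langle \texttt{,}\ \mathit{patrow} \rangle \Rightarrow (VE \langle + VE' \rangle,\ \{\mathit{lab} \mapsto \tau\} \langle + \varrho \rangle)} \tag{39}$$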

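Patterns    $C \vdash \mathit{pat} \Rightarrow (VE, \tau)$

$$\frac{C \vdash \mathit{atpat} \Rightarrow (VE, \tau)}{C \vdash \mathit{atpat} \Rightarrow (VE, \tau)} \tag{40}$$

$$\frac{C(\mathit{longvid}) = (\sigma, is) \quad is \neq \mathtt{v} \quad \sigma \succ \tau' \to \tau \quad C \vdash \mathit{atpat} \Rightarrow (VE, \tau')}{C \vdash \mathit{longvid}\ \mathit{atpat} \Rightarrow (VE, \tau)} \tag{41}$$

$$\frac{C \vdash \mathit{pat} \Rightarrow (VE, \tau) \quad C \vdash \mathit{ty} \Rightarrow \tau}{C \vdash \mathit{pat}\ \texttt{:}\ \mathit{ty} \Rightarrow (VE, \tau)} \tag{42}$$

$$\frac{\mathit{vid} \notin \mathrm{Dom}(C) \text{ or } is \text{ of } C(\mathit{vid}) = \mathtt{v} \quad \langle C \vdash \mathit{ty} \Rightarrow \tau \rangle \quad C \vdash \mathit{pat} \Rightarrow (VE, \tau) \quad \mathit{vid} \notin \mathrm{Dom}\ VE}{C \vdash \mathit{vid} \langle \texttt{:}\ \mathit{ty} \rangle\ \texttt{as}\ \mathit{pat} \Rightarrow (\{\mathit{vid} \mapsto (\tau, \mathtt{v})\} + VE,\ \tau)} \tag{43}$$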

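Type Expressions    $C \vdash \mathit{ty} \Rightarrow \tau$

$$\frac{\mathit{tyvar} = \alpha}{C \vdash \mathit{tyvar} \Rightarrow \alpha} \tag{44}$$

$$\frac{\langle C \vdash \mathit{tyrow} \Rightarrow \varrho \rangle}{C \vdash \texttt{\{}\ \langle \mathit{tyrow} \rangle\ \texttt{\}} \Rightarrow \{\} \langle + \varrho \rangle\ \text{in Type}} \tag{45}$$

$$\frac{\mathit{tyseq} = \mathit{ty}_1 \cdots \mathit{ty}_k \quad C \vdash \mathit{ty}_i \Rightarrow \tau_i\ (1 \leq i \leq k) \quad C(\mathit{longtycon}) = (\theta, VE)}{C \vdash \mathit{tyseq}\ \mathit{longtycon} \Rightarrow \tau^{(k)}\theta} \tag{46}$$

$$\frac{C \vdash \mathit{ty} \Rightarrow \tau \quad C \vdash \mathit{ty}' \Rightarrow \tau'}{C \vdash \mathit{ty}\ \texttt{->}\ \mathit{ty}' \Rightarrow \tau \to \tau'} \tag{47}$$

$$\frac{C \vdash \mathit{ty} \Rightarrow \tau}{C \vdash \texttt{(}\ \mathit{ty}\ \texttt{)} \Rightarrow \tau} \tag{48}$$

Comments:

(46) Recall that for $\tau^{(k)}\theta$ to be defined, $\theta$ must have arity $k$.

Type-expression Rows    $C \vdash \mathit{tyrow} \Rightarrow \varrho$

$$\frac{C \vdash \mathit{ty} \Rightarrow \tau \quad \langle C \vdash \mathit{tyrow} \Rightarrow \varrho \rangle}{C \vdash \mathit{lab}\ \texttt{:}\ \mathit{ty}\ \langle \texttt{,}\ \mathit{tyrow} \rangle \Rightarrow \{\mathit{lab} \mapsto \tau\} \langle + \varrho \rangle} \tag{49}$$

Comment: The syntactic constraints ensure $\mathit{lab} \notin \mathrm{Dom}\ \varrho$.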

4.11 Further Restrictions

There are a few restrictions on programs which should be enforced by a compiler, but are better expressed apart from the preceding Inference Rules. They are:

1. For each occurrence of a record pattern containing a record wildcard, i.e. of the form {lab1=pat1, ..., labm=patm, ...}, the program context must determine uniquely the domain $\{\mathit{lab}_1, \cdots, \mathit{lab}_n\}$ of its row type, where $m \leq n$; thus, the context must determine the labels $\{\mathit{lab}_{m+1}, \cdots, \mathit{lab}_n\}$ of the fields to be matched by the wildcard. For this purpose, an explicit type constraint may be needed.

2. In a match of the form pat1 => exp1 | ... | patn => expn the pattern sequence pat1, ..., patn should be irredundant; that is, each patj must match some value (of the right type) which is not matched by pati for any $i < j$. In the context fn match, the match must also be exhaustive; that is, every value (of the right type) must be matched by some pati. The compiler must give warning on violation of these restrictions, but should still compile the match. The restrictions are inherited by derived forms; in particular, this means that in the function-value binding vid atpat1 ... atpatn <: ty> = exp (consisting of one clause only), each separate atpati should be exhaustive by itself.
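The two restrictions can be seen in a few lines of Standard ML. This is a sketch only; the names getX, colour, name, isRed and r are hypothetical.

```sml
(* Restriction 1: the context must fix every label of a flexible record pattern. *)
fun getX ({x, ...} : {x : int, y : int}) = x
(* fun getX' {x, ...} = x                rejected: the fields matched by ... are undetermined *)

(* Restriction 2: matches should be irredundant and, under fn, exhaustive. *)
datatype colour = Red | Green | Blue

fun name Red = "red"
  | name Green = "green"
  | name Blue = "blue"                   (* exhaustive and irredundant: no warning *)

fun isRed Red = true                     (* not exhaustive: warning, but still compiled *)
val r = isRed Green handle Match => false
```

The last line shows the consequence of compiling a non-exhaustive match anyway: the failure surfaces at run time as the exception Match.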

5 Static Semantics for Modules

5.1 Semantic Objects

The simple objects for Modules static semantics are exactly as for the Core. The compound objects are those for the Core, augmented by those in Figure 11.

$\Sigma$ or $(T)E \in \mathrm{Sig} = \mathrm{TyNameSet} \times \mathrm{Env}$
$\Phi$ or $(T)(E, (T')E') \in \mathrm{FunSig} = \mathrm{TyNameSet} \times (\mathrm{Env} \times \mathrm{Sig})$
$G \in \mathrm{SigEnv} = \mathrm{SigId} \xrightarrow{\mathrm{fin}} \mathrm{Sig}$
$F \in \mathrm{FunEnv} = \mathrm{FunId} \xrightarrow{\mathrm{fin}} \mathrm{FunSig}$
$B$ or $T, F, G, E \in \mathrm{Basis} = \mathrm{TyNameSet} \times \mathrm{FunEnv} \times \mathrm{SigEnv} \times \mathrm{Env}$

Figure 11: Further Compound Semantic Objects

The prefix $(T)$, in signatures and functor signatures, binds type names. Certain operations require a change of bound names in semantic objects; see for example Section 5.2. When bound type names are changed, we demand that all of their attributes (i.e. equality and arity) are preserved.

The operations of projection, injection and modification are as for the Core. Moreover, we define $C$ of $B$ to be the context $(T \text{ of } B, \emptyset, E \text{ of } B)$, i.e. with an empty set of explicit type variables. Also, we frequently need to modify a basis $B$ by an environment $E$ (or a structure environment $SE$ say), at the same time extending $T$ of $B$ to include the type names of $E$ (or of $SE$ say). We therefore define $B \oplus SE$, for example, to mean $B + (\mathrm{tynames}\ SE, SE)$.

There is no separate kind of semantic object to represent structures: structure expressions elaborate to environments, just as structure-level declarations do. Thus, notions which are commonly associated with structures (for example the notion of matching a structure against a signature) are defined in terms of environments.

5.2 Type Realisation

A (type) realisation is a map $\varphi : \mathrm{TyName} \to \mathrm{TypeFcn}$ such that $t$ and $\varphi(t)$ have the same arity, and if $t$ admits equality then so does $\varphi(t)$.

The support $\mathrm{Supp}\ \varphi$ of a type realisation $\varphi$ is the set of type names $t$ for which $\varphi(t) \neq t$. The yield $\mathrm{Yield}\ \varphi$ of a realisation $\varphi$ is the set of type names which occur in some $\varphi(t)$ for which $t \in \mathrm{Supp}\ \varphi$.

Realisations $\varphi$ are extended to apply to all semantic objects; their effect is to replace each name $t$ by $\varphi(t)$. In applying $\varphi$ to an object with bound names, such as a signature $(T)E$, first bound names must be changed so that, for each binding prefix $(T)$,

$$T \cap (\mathrm{Supp}\ \varphi \cup \mathrm{Yield}\ \varphi) = \emptyset.$$

5.3 Signature Instantiation

An environment $E_2$ is an instance of a signature $\Sigma_1 = (T_1)E_1$, written $\Sigma_1 \geq E_2$, if there exists a realisation $\varphi$ such that $\varphi(E_1) = E_2$ and $\mathrm{Supp}\ \varphi \subseteq T_1$.

5.4 Functor Signature Instantiation

A pair $(E, (T')E')$ is called a functor instance. Given $\Phi = (T_1)(E_1, (T_1')E_1')$, a functor instance $(E_2, (T_2')E_2')$ is an instance of $\Phi$, written $\Phi \geq (E_2, (T_2')E_2')$, if there exists a realisation $\varphi$ such that $\varphi(E_1, (T_1')E_1') = (E_2, (T_2')E_2')$ and $\mathrm{Supp}\ \varphi \subseteq T_1$.

5.5 Enrichment

In matching an environment to a signature, the environment will be allowed both to have more components, and to be more polymorphic, than (an instance of) the signature. Precisely, we define enrichment of environments and type structures recursively as follows.

An environment $E_1 = (SE_1, TE_1, VE_1)$ enriches another environment $E_2 = (SE_2, TE_2, VE_2)$, written $E_1 \succ E_2$, if

1. $\mathrm{Dom}\ SE_1 \supseteq \mathrm{Dom}\ SE_2$, and $SE_1(\mathit{strid}) \succ SE_2(\mathit{strid})$ for all $\mathit{strid} \in \mathrm{Dom}\ SE_2$
2. $\mathrm{Dom}\ TE_1 \supseteq \mathrm{Dom}\ TE_2$, and $TE_1(\mathit{tycon}) \succ TE_2(\mathit{tycon})$ for all $\mathit{tycon} \in \mathrm{Dom}\ TE_2$
3. $\mathrm{Dom}\ VE_1 \supseteq \mathrm{Dom}\ VE_2$, and $VE_1(\mathit{vid}) \succ VE_2(\mathit{vid})$ for all $\mathit{vid} \in \mathrm{Dom}\ VE_2$, where $(\sigma_1, is_1) \succ (\sigma_2, is_2)$ means $\sigma_1 \succ \sigma_2$ and $is_1 = is_2$ or $is_2 = \mathtt{v}$

Finally, a type structure $(\theta_1, VE_1)$ enriches another type structure $(\theta_2, VE_2)$, written $(\theta_1, VE_1) \succ (\theta_2, VE_2)$, if

1. $\theta_1 = \theta_2$
2. Either $VE_1 = VE_2$ or $VE_2 = \{\}$

5.6 Signature Matching

An environment $E$ matches a signature $\Sigma_1$ if there exists an environment $E^-$ such that $\Sigma_1 \geq E^- \prec E$. Thus matching is a combination of instantiation and enrichment. There is at most one such $E^-$, given $\Sigma_1$ and $E$.
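Matching as "instantiation followed by enrichment" can be observed on a small example. In the sketch below the identifiers S, A, B, f, x, extra and y are hypothetical: the signature S is instantiated by realising t as int, and the structure A enriches that instance because it has an extra component and a more polymorphic f.

```sml
signature S =
sig
  type t
  val f : t -> t
  val x : t
end

structure A =
struct
  type t = int
  fun f y = y              (* 'a -> 'a: more polymorphic than t -> t *)
  val x = 0
  val extra = "hidden"     (* an extra component, not mentioned in S *)
end

structure B : S = A        (* A matches S *)
val y = B.f B.x + 1        (* under transparent ascription B.t = int stays visible *)
(* val bad = B.extra          rejected: extra is not part of the matched instance *)
```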

5.7 Inference Rules

As for the Core, the rules of the Modules static semantics allow sentences of the form

$$A \vdash \mathit{phrase} \Rightarrow A'$$

to be inferred, where in this case $A$ is either a basis, a context or an environment and $A'$ is a semantic object. The convention for options is as in the Core semantics.

Although not assumed in our definitions, it is intended that every basis $B = T, F, G, E$ in which a topdec is elaborated has the property that $\mathrm{tynames}\ F \cup \mathrm{tynames}\ G \cup \mathrm{tynames}\ E \subseteq T$. The following Theorem can be proved:

Let $S$ be an inferred sentence $B \vdash \mathit{topdec} \Rightarrow B'$ in which $B$ satisfies the above condition. Then $B'$ also satisfies the condition.

Moreover, if $S'$ is a sentence of the form $B'' \vdash \mathit{phrase} \Rightarrow A$ occurring in a proof of $S$, where phrase is any Modules phrase, then $B''$ also satisfies the condition.

Finally, if $T, U, E \vdash \mathit{phrase} \Rightarrow A$ occurs in a proof of $S$, where phrase is a phrase of Modules or of the Core, then $\mathrm{tynames}\ E \subseteq T$.

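Structure Expressions    $B \vdash \mathit{strexp} \Rightarrow E$

$$\frac{B \vdash \mathit{strdec} \Rightarrow E}{B \vdash \texttt{struct}\ \mathit{strdec}\ \texttt{end} \Rightarrow E} \tag{50}$$

$$\frac{B(\mathit{longstrid}) = E}{B \vdash \mathit{longstrid} \Rightarrow E} \tag{51}$$

$$\frac{B \vdash \mathit{strexp} \Rightarrow E \quad B \vdash \mathit{sigexp} \Rightarrow \Sigma \quad \Sigma \geq E' \prec E}{B \vdash \mathit{strexp}\texttt{:}\mathit{sigexp} \Rightarrow E'} \tag{52}$$

$$\frac{B \vdash \mathit{strexp} \Rightarrow E \quad B \vdash \mathit{sigexp} \Rightarrow (T')E' \quad (T')E' \geq E'' \prec E \quad T' \cap (T \text{ of } B) = \emptyset}{B \vdash \mathit{strexp}\texttt{:>}\mathit{sigexp} \Rightarrow E'} \tag{53}$$

$$\frac{B \vdash \mathit{strexp} \Rightarrow E \quad B(\mathit{funid}) \geq (E'', (T')E'),\ E \succ E'' \quad (\mathrm{tynames}\ E \cup T \text{ of } B) \cap T' = \emptyset}{B \vdash \mathit{funid}\ \texttt{(}\ \mathit{strexp}\ \texttt{)} \Rightarrow E'} \tag{54}$$

$$\frac{B \vdash \mathit{strdec} \Rightarrow E_1 \quad B \oplus E_1 \vdash \mathit{strexp} \Rightarrow E_2}{B \vdash \texttt{let}\ \mathit{strdec}\ \texttt{in}\ \mathit{strexp}\ \texttt{end} \Rightarrow E_2} \tag{55}$$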

Comments:

(54) The side condition $(\mathrm{tynames}\ E \cup T \text{ of } B) \cap T' = \emptyset$ can always be satisfied by renaming bound names in $(T')E'$; it ensures that the generated datatypes receive new names.

Let $B(\mathit{funid}) = (T)(E_f, (T')E_f')$. Let $\varphi$ be a realisation such that $\varphi(E_f, (T')E_f') = (E'', (T')E')$. Sharing between argument and result specified in the declaration of the functor funid is represented by the occurrence of the same name in both $E_f$ and $E_f'$, and this repeated occurrence is preserved by $\varphi$, yielding sharing between the argument structure $E$ and the result structure $E'$ of this functor application.

(55) The use of $\oplus$, here and elsewhere, ensures that type names generated by the first sub-phrase are distinct from names generated by the second sub-phrase.
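The difference between rules 52 and 53, and the generativity noted in comment (54), can be sketched as follows. All identifiers (COUNTER, Impl, Transparent, Opaque, MkTag, T1, T2 and the value names) are hypothetical.

```sml
signature COUNTER =
sig
  type t
  val zero : t
  val next : t -> t
  val toInt : t -> int
end

structure Impl =
struct
  type t = int
  val zero = 0
  fun next n = n + 1
  fun toInt n = n
end

structure Transparent : COUNTER = Impl    (* rule 52 *)
structure Opaque :> COUNTER = Impl        (* rule 53 *)

val a = Transparent.next 41               (* accepted: Transparent.t = int *)
val b = Opaque.toInt (Opaque.next Opaque.zero)
(* val bad = Opaque.next 41                  rejected: Opaque.t is abstract *)

(* Rule 54: each functor application gives the generated datatype a new name. *)
functor MkTag (X : sig end) = struct datatype tag = Tag end
structure T1 = MkTag (struct end)
structure T2 = MkTag (struct end)
val ok = (T1.Tag = T1.Tag)
(* val bad' = (T1.Tag = T2.Tag)              rejected: T1.tag and T2.tag are distinct types *)
```

Under transparent ascription the result keeps the type names of the structure; under opaque ascription the result is the signature's own environment with its bound names kept distinct from those of the basis.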

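Structure-level Declarations    $B \vdash \mathit{strdec} \Rightarrow E$

$$\frac{C \text{ of } B \vdash \mathit{dec} \Rightarrow E}{B \vdash \mathit{dec} \Rightarrow E} \tag{56}$$

$$\frac{B \vdash \mathit{strbind} \Rightarrow SE}{B \vdash \texttt{structure}\ \mathit{strbind} \Rightarrow SE\ \text{in Env}} \tag{57}$$

$$\frac{B \vdash \mathit{strdec}_1 \Rightarrow E_1 \quad B \oplus E_1 \vdash \mathit{strdec}_2 \Rightarrow E_2}{B \vdash \texttt{local}\ \mathit{strdec}_1\ \texttt{in}\ \mathit{strdec}_2\ \texttt{end} \Rightarrow E_2} \tag{58}$$

$$B \vdash\ \Rightarrow \{\}\ \text{in Env} \tag{59}$$

$$\frac{B \vdash \mathit{strdec}_1 \Rightarrow E_1 \quad B \oplus E_1 \vdash \mathit{strdec}_2 \Rightarrow E_2}{B \vdash \mathit{strdec}_1\ \langle \texttt{;} \rangle\ \mathit{strdec}_2 \Rightarrow E_1 + E_2} \tag{60}$$

Structure Bindings    $B \vdash \mathit{strbind} \Rightarrow SE$

$$\frac{B \vdash \mathit{strexp} \Rightarrow E \quad \langle B + \mathrm{tynames}\ E \vdash \mathit{strbind} \Rightarrow SE \rangle}{B \vdash \mathit{strid}\ \texttt{=}\ \mathit{strexp}\ \langle \texttt{and}\ \mathit{strbind} \rangle \Rightarrow \{\mathit{strid} \mapsto E\}\ \langle + SE \rangle} \tag{61}$$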

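Signature Expressions    $B \vdash \mathit{sigexp} \Rightarrow E$

$$\frac{B \vdash \mathit{spec} \Rightarrow E}{B \vdash \texttt{sig}\ \mathit{spec}\ \texttt{end} \Rightarrow E} \tag{62}$$

$$\frac{B(\mathit{sigid}) = (T)E \quad T \cap (T \text{ of } B) = \emptyset}{B \vdash \mathit{sigid} \Rightarrow E} \tag{63}$$

$$\frac{\begin{array}{c} B \vdash \mathit{sigexp} \Rightarrow E \quad \mathit{tyvarseq} = \alpha^{(k)} \quad C \text{ of } B \vdash \mathit{ty} \Rightarrow \tau \quad E(\mathit{longtycon}) = (t, VE) \quad t \notin (T \text{ of } B) \cup \mathrm{tynames}\ \tau \\ \varphi = \{t \mapsto \Lambda\alpha^{(k)}.\tau\} \quad \Lambda\alpha^{(k)}.\tau \text{ admits equality, if } t \text{ does} \quad \varphi(E) \text{ well-formed} \end{array}}{B \vdash \mathit{sigexp}\ \texttt{where type}\ \mathit{tyvarseq}\ \mathit{longtycon}\ \texttt{=}\ \mathit{ty} \Rightarrow \varphi(E)} \tag{64}$$

Comments:

(63) The bound names of $B(\mathit{sigid})$ can always be renamed to satisfy $T \cap (T \text{ of } B) = \emptyset$, if necessary.

$B \vdash \mathit{sigexp} \Rightarrow \Sigma$

$$\frac{B \vdash \mathit{sigexp} \Rightarrow E \quad T = \mathrm{tynames}\ E \setminus (T \text{ of } B)}{B \vdash \mathit{sigexp} \Rightarrow (T)E} \tag{65}$$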

Comment: A signature expression sigexp which is an immediate constituent of a signature binding, a signature constraint, or a functor binding is elaborated to a signature, see rules 52, 53, 67 and 86.
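Rule 64 (where type) applies a realisation to an already elaborated signature. A minimal sketch, with hypothetical names ORD, INT_ORD, IntOrd, le and ok:

```sml
signature ORD =
sig
  type t
  val le : t * t -> bool
end

(* The flexible type name bound to t is realised by int. *)
signature INT_ORD = ORD where type t = int

structure IntOrd :> INT_ORD =
struct
  type t = int
  fun le (a, b) = a <= b
end

val ok = IntOrd.le (1, 2)     (* accepted even under :>, because INT_ORD reveals t = int *)
```

Since t is no longer a bound name of INT_ORD, the opaque ascription has nothing to hide for it, and integer literals may be passed to IntOrd.le.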

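Signature Declarations    $B \vdash \mathit{sigdec} \Rightarrow G$

$$\frac{B \vdash \mathit{sigbind} \Rightarrow G}{B \vdash \texttt{signature}\ \mathit{sigbind} \Rightarrow G} \tag{66}$$

Signature Bindings    $B \vdash \mathit{sigbind} \Rightarrow G$

$$\frac{B \vdash \mathit{sigexp} \Rightarrow \Sigma \quad \langle B \vdash \mathit{sigbind} \Rightarrow G \rangle}{B \vdash \mathit{sigid}\ \texttt{=}\ \mathit{sigexp}\ \langle \texttt{and}\ \mathit{sigbind} \rangle \Rightarrow \{\mathit{sigid} \mapsto \Sigma\}\ \langle + G \rangle} \tag{67}$$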

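Specifications    $B \vdash \mathit{spec} \Rightarrow E$

$$\frac{C \text{ of } B \vdash \mathit{valdesc} \Rightarrow VE}{B \vdash \texttt{val}\ \mathit{valdesc} \Rightarrow \mathrm{Clos}\,VE\ \text{in Env}} \tag{68}$$

$$\frac{C \text{ of } B \vdash \mathit{typdesc} \Rightarrow TE \quad \forall (t, VE) \in \mathrm{Ran}\ TE,\ t \text{ does not admit equality}}{B \vdash \texttt{type}\ \mathit{typdesc} \Rightarrow TE\ \text{in Env}} \tag{69}$$

$$\frac{C \text{ of } B \vdash \mathit{typdesc} \Rightarrow TE \quad \forall (t, VE) \in \mathrm{Ran}\ TE,\ t \text{ admits equality}}{B \vdash \texttt{eqtype}\ \mathit{typdesc} \Rightarrow TE\ \text{in Env}} \tag{70}$$

$$\frac{C \text{ of } B \oplus TE \vdash \mathit{datdesc} \Rightarrow VE, TE \quad \forall (t, VE') \in \mathrm{Ran}\ TE,\ t \notin T \text{ of } B \quad TE \text{ maximises equality}}{B \vdash \texttt{datatype}\ \mathit{datdesc} \Rightarrow (VE, TE)\ \text{in Env}} \tag{71}$$

$$\frac{B(\mathit{longtycon}) = (\theta, VE) \quad \theta = \Lambda\alpha^{(k)}.\tau \quad \mathit{tyvarseq} = \alpha^{(k)} \quad TE = \{\mathit{tycon} \mapsto (\theta, VE)\}}{B \vdash \texttt{datatype}\ \mathit{tyvarseq}\ \mathit{tycon}\ \texttt{=}\ \texttt{datatype}\ \mathit{tyvarseq}\ \mathit{longtycon} \Rightarrow (VE, TE)\ \text{in Env}} \tag{72}$$

$$\frac{C \text{ of } B \vdash \mathit{exdesc} \Rightarrow VE}{B \vdash \texttt{exception}\ \mathit{exdesc} \Rightarrow VE\ \text{in Env}} \tag{73}$$

$$\frac{B \vdash \mathit{strdesc} \Rightarrow SE}{B \vdash \texttt{structure}\ \mathit{strdesc} \Rightarrow SE\ \text{in Env}} \tag{74}$$

$$\frac{B \vdash \mathit{sigexp} \Rightarrow E}{B \vdash \texttt{include}\ \mathit{sigexp} \Rightarrow E} \tag{75}$$

$$B \vdash\ \Rightarrow \{\}\ \text{in Env} \tag{76}$$

$$\frac{B \vdash \mathit{spec}_1 \Rightarrow E_1 \quad B \oplus E_1 \vdash \mathit{spec}_2 \Rightarrow E_2 \quad \mathrm{Dom}(E_1) \cap \mathrm{Dom}(E_2) = \emptyset}{B \vdash \mathit{spec}_1\ \langle \texttt{;} \rangle\ \mathit{spec}_2 \Rightarrow E_1 + E_2} \tag{77}$$

$$\frac{\begin{array}{c} B \vdash \mathit{spec} \Rightarrow E \quad E(\mathit{longtycon}_i) = (t_i, VE_i),\ i = 1..n \quad t \in \{t_1, \ldots, t_n\} \quad t \text{ admits equality, if some } t_i \text{ does} \\ \{t_1, \ldots, t_n\} \cap T \text{ of } B = \emptyset \quad \varphi = \{t_1 \mapsto t, \ldots, t_n \mapsto t\} \end{array}}{B \vdash \mathit{spec}\ \texttt{sharing type}\ \mathit{longtycon}_1\ \texttt{=} \cdots \texttt{=}\ \mathit{longtycon}_n \Rightarrow \varphi(E)} \tag{78}$$

Comments:

(68) $VE$ is determined by $B$ and valdesc.

(69)-(71) The type names in $TE$ are new.

(73) $VE$ is determined by $B$ and exdesc and contains monotypes only.

(77) Note that no sequential specification is allowed to specify the same identifier twice.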
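Rule 78 identifies several flexible type names by a realisation that maps all of them to one representative. The sketch below shows the practical effect; PAIR, Use, IntPair, U, mk, show, run and s are hypothetical names.

```sml
signature PAIR =
sig
  structure A : sig type t val mk : int -> t end
  structure B : sig type t val show : t -> string end
  sharing type A.t = B.t       (* A.t and B.t become the same type name *)
end

functor Use (P : PAIR) =
struct
  (* Well-typed only because of the sharing constraint. *)
  fun run n = P.B.show (P.A.mk n)
end

structure IntPair =
struct
  structure A = struct type t = int  fun mk n = n end
  structure B =
  struct
    type t = int
    fun show n = if n < 0 then "negative" else "non-negative"
  end
end

structure U = Use (IntPair)
val s = U.run 7
```

Without the sharing specification the body of Use would be rejected, since P.A.t and P.B.t would be two unrelated type names.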

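Value Descriptions    $C \vdash \mathit{valdesc} \Rightarrow VE$

$$\frac{C \vdash \mathit{ty} \Rightarrow \tau \quad \langle C \vdash \mathit{valdesc} \Rightarrow VE \rangle}{C \vdash \mathit{vid}\ \texttt{:}\ \mathit{ty}\ \langle \texttt{and}\ \mathit{valdesc} \rangle \Rightarrow \{\mathit{vid} \mapsto (\tau, \mathtt{v})\}\ \langle + VE \rangle} \tag{79}$$

Type Descriptions    $C \vdash \mathit{typdesc} \Rightarrow TE$

$$\frac{\mathit{tyvarseq} = \alpha^{(k)} \quad t \notin T \text{ of } C \quad \mathrm{arity}\ t = k \quad \langle C \vdash \mathit{typdesc} \Rightarrow TE \quad t \notin \mathrm{tynames}\ TE \rangle}{C \vdash \mathit{tyvarseq}\ \mathit{tycon}\ \langle \texttt{and}\ \mathit{typdesc} \rangle \Rightarrow \{\mathit{tycon} \mapsto (t, \{\})\}\ \langle + TE \rangle} \tag{80}$$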

Comment: Note that the value environment in the resulting type structure must be empty. For example, datatype s=C type t sharing type t=s is a legal specification, but the type structure bound to t does not bind any value constructors.

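Datatype Descriptions    $C \vdash \mathit{datdesc} \Rightarrow VE, TE$

$$\frac{\mathit{tyvarseq} = \alpha^{(k)} \quad C, \alpha^{(k)}t \vdash \mathit{condesc} \Rightarrow VE \quad \mathrm{arity}\ t = k \quad \langle C \vdash \mathit{datdesc}' \Rightarrow VE', TE' \quad \forall (t', VE'') \in \mathrm{Ran}\ TE',\ t \neq t' \rangle}{C \vdash \mathit{tyvarseq}\ \mathit{tycon}\ \texttt{=}\ \mathit{condesc}\ \langle \texttt{and}\ \mathit{datdesc}' \rangle \Rightarrow \mathrm{Clos}\,VE\ \langle + VE' \rangle,\ \{\mathit{tycon} \mapsto (t, \mathrm{Clos}\,VE)\}\ \langle + TE' \rangle} \tag{81}$$

Constructor Descriptions    $C, \tau \vdash \mathit{condesc} \Rightarrow VE$

$$\frac{\langle C \vdash \mathit{ty} \Rightarrow \tau' \rangle \quad \langle\langle C, \tau \vdash \mathit{condesc} \Rightarrow VE \rangle\rangle}{C, \tau \vdash \mathit{vid}\ \langle \texttt{of}\ \mathit{ty} \rangle\ \langle\langle\, \texttt{|}\ \mathit{condesc} \rangle\rangle \Rightarrow \{\mathit{vid} \mapsto (\tau, \mathtt{c})\}\ \langle + \{\mathit{vid} \mapsto (\tau' \to \tau, \mathtt{c})\} \rangle\ \langle\langle + VE \rangle\rangle} \tag{82}$$

Exception Descriptions    $C \vdash \mathit{exdesc} \Rightarrow VE$

$$\frac{\langle C \vdash \mathit{ty} \Rightarrow \tau \quad \mathrm{tyvars}(\tau) = \emptyset \rangle \quad \langle\langle C \vdash \mathit{exdesc} \Rightarrow VE \rangle\rangle}{C \vdash \mathit{vid}\ \langle \texttt{of}\ \mathit{ty} \rangle\ \langle\langle \texttt{and}\ \mathit{exdesc} \rangle\rangle \Rightarrow \{\mathit{vid} \mapsto (\mathtt{exn}, \mathtt{e})\}\ \langle + \{\mathit{vid} \mapsto (\tau \to \mathtt{exn}, \mathtt{e})\} \rangle\ \langle\langle + VE \rangle\rangle} \tag{83}$$

Structure Descriptions    $B \vdash \mathit{strdesc} \Rightarrow SE$

$$\frac{B \vdash \mathit{sigexp} \Rightarrow E \quad \langle B + \mathrm{tynames}\ E \vdash \mathit{strdesc} \Rightarrow SE \rangle}{B \vdash \mathit{strid}\ \texttt{:}\ \mathit{sigexp}\ \langle \texttt{and}\ \mathit{strdesc} \rangle \Rightarrow \{\mathit{strid} \mapsto E\}\ \langle + SE \rangle} \tag{84}$$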

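Functor Declarations    $B \vdash \mathit{fundec} \Rightarrow F$

$$\frac{B \vdash \mathit{funbind} \Rightarrow F}{B \vdash \texttt{functor}\ \mathit{funbind} \Rightarrow F} \tag{85}$$

Functor Bindings    $B \vdash \mathit{funbind} \Rightarrow F$

$$\frac{\begin{array}{c} B \vdash \mathit{sigexp} \Rightarrow (T)E \quad B \oplus \{\mathit{strid} \mapsto E\} \vdash \mathit{strexp} \Rightarrow E' \\ T \cap (T \text{ of } B) = \emptyset \quad T' = \mathrm{tynames}\ E' \setminus ((T \text{ of } B) \cup T) \quad \langle B \vdash \mathit{funbind} \Rightarrow F \rangle \end{array}}{B \vdash \mathit{funid}\ \texttt{(}\ \mathit{strid}\ \texttt{:}\ \mathit{sigexp}\ \texttt{)}\ \texttt{=}\ \mathit{strexp}\ \langle \texttt{and}\ \mathit{funbind} \rangle \Rightarrow \{\mathit{funid} \mapsto (T)(E, (T')E')\}\ \langle + F \rangle} \tag{86}$$

Comment: Since $\oplus$ is used, any type name $t$ in $E$ acts like a constant in the functor body; in particular, it ensures that further names generated during elaboration of the body are distinct from $t$. The set $T'$ is chosen such that every name free in $(T)E$ or $(T)(E, (T')E')$ is free in $B$.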

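Top-level Declarations    $B \vdash \mathit{topdec} \Rightarrow B'$

$$\frac{B \vdash \mathit{strdec} \Rightarrow E \quad \langle B \oplus E \vdash \mathit{topdec} \Rightarrow B' \rangle \quad B'' = (\mathrm{tynames}\ E, E)\ \text{in Basis}\ \langle + B' \rangle \quad \mathrm{tyvars}\ B'' = \emptyset}{B \vdash \mathit{strdec}\ \langle \mathit{topdec} \rangle \Rightarrow B''} \tag{87}$$

$$\frac{B \vdash \mathit{sigdec} \Rightarrow G \quad \langle B \oplus G \vdash \mathit{topdec} \Rightarrow B' \rangle \quad B'' = (\mathrm{tynames}\ G, G)\ \text{in Basis}\ \langle + B' \rangle}{B \vdash \mathit{sigdec}\ \langle \mathit{topdec} \rangle \Rightarrow B''} \tag{88}$$

$$\frac{B \vdash \mathit{fundec} \Rightarrow F \quad \langle B \oplus F \vdash \mathit{topdec} \Rightarrow B' \rangle \quad B'' = (\mathrm{tynames}\ F, F)\ \text{in Basis}\ \langle + B' \rangle \quad \mathrm{tyvars}\ B'' = \emptyset}{B \vdash \mathit{fundec}\ \langle \mathit{topdec} \rangle \Rightarrow B''} \tag{89}$$

Comments:

(87)-(89) No free type variables enter the basis: if $B \vdash \mathit{topdec} \Rightarrow B'$ then $\mathrm{tyvars}(B') = \emptyset$.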

6 Dynamic Semantics for the Core

6.1 Reduced Syntax

Since types are mostly dealt with in the static semantics, the Core syntax is reduced by the following transformations, for the purpose of the dynamic semantics:

• All explicit type ascriptions ": ty" are omitted, and qualifications "of ty" are omitted from constructor and exception bindings.

• The Core phrase classes Ty and TyRow are omitted.

6.2 Simple Objects

All objects in the dynamic semantics are built from identifier classes together with the simple object classes shown (with the variables which range over them) in Figure 12.

$a \in \mathrm{Addr}$    addresses
$en \in \mathrm{ExName}$    exception names
$b \in \mathrm{BasVal}$    basic values
$sv \in \mathrm{SVal}$    special values
$\{\mathrm{FAIL}\}$    failure

Figure 12: Simple Semantic Objects

Addr and ExName are infinite sets. BasVal is described below. SVal is the class of values denoted by the special constants SCon. Each integer, word or real constant denotes a value according to normal mathematical conventions; each string or character constant denotes a sequence of characters as explained in Section 2.2. The value denoted by scon is written val(scon). FAIL is the result of a failing attempt to match a value and a pattern. Thus FAIL is neither a value nor an exception, but simply a semantic object used in the rules to express operationally how matching proceeds.

Exception constructors evaluate to exception names. This is to accommodate the generative nature of exception bindings; each evaluation of a declaration of an exception constructor binds it to a new unique name.

6.3 Compound Objects

The compound objects for the dynamic semantics are shown in Figure 13. Many conventions and notations are adopted as in the static semantics; in particular projection, injection and modification all retain their meaning. We generally omit the injection functions taking VId, VId $\times$ Val etc into Val. For records $r \in \mathrm{Record}$ however, we write this injection explicitly as "in Val"; this accords with the fact that there is a separate phrase class ExpRow, whose members evaluate to records.

$v \in \mathrm{Val} = \{\texttt{:=}\} \cup \mathrm{SVal} \cup \mathrm{BasVal} \cup \mathrm{VId} \cup (\mathrm{VId} \times \mathrm{Val}) \cup \mathrm{ExVal} \cup \mathrm{Record} \cup \mathrm{Addr} \cup \mathrm{FcnClosure}$
$r \in \mathrm{Record} = \mathrm{Lab} \xrightarrow{\mathrm{fin}} \mathrm{Val}$
$e \in \mathrm{ExVal} = \mathrm{ExName} \cup (\mathrm{ExName} \times \mathrm{Val})$
$[e]$ or $p \in \mathrm{Pack} = \mathrm{ExVal}$
$(\mathit{match}, E, VE) \in \mathrm{FcnClosure} = \mathrm{Match} \times \mathrm{Env} \times \mathrm{ValEnv}$
$mem \in \mathrm{Mem} = \mathrm{Addr} \xrightarrow{\mathrm{fin}} \mathrm{Val}$
$ens \in \mathrm{ExNameSet} = \mathrm{Fin}(\mathrm{ExName})$
$(mem, ens)$ or $s \in \mathrm{State} = \mathrm{Mem} \times \mathrm{ExNameSet}$
$(SE, TE, VE)$ or $E \in \mathrm{Env} = \mathrm{StrEnv} \times \mathrm{TyEnv} \times \mathrm{ValEnv}$
$SE \in \mathrm{StrEnv} = \mathrm{StrId} \xrightarrow{\mathrm{fin}} \mathrm{Env}$
$TE \in \mathrm{TyEnv} = \mathrm{TyCon} \xrightarrow{\mathrm{fin}} \mathrm{ValEnv}$
$VE \in \mathrm{ValEnv} = \mathrm{VId} \xrightarrow{\mathrm{fin}} \mathrm{Val} \times \mathrm{IdStatus}$

Figure 13: Compound Semantic Objects

We take $\cup$ to mean disjoint union over semantic object classes. We also understand all the defined object classes to be disjoint. A particular case deserves mention; ExVal and Pack (exception values and packets) are isomorphic classes, but the latter class corresponds to exceptions which have been raised, and therefore has different semantic significance from the former, which is just a subclass of values.

Although the same names, e.g. E for an environment, are used as in the static semantics, the objects denoted are different. This need cause no confusion since the static and dynamic semantics are presented separately.

6.4 Basic Values

The basic values in BasVal are values bound to predefined value variables. In this document, we take BasVal to be the singleton set $\{\texttt{=}\}$; however, libraries may define a larger set of basic values. The meaning of basic values is represented by a function

$$\mathrm{APPLY} : \mathrm{BasVal} \times \mathrm{Val} \to \mathrm{Val} \cup \mathrm{Pack}$$

which satisfies that $\mathrm{APPLY}(\texttt{=}, \{1 \mapsto v_1, 2 \mapsto v_2\})$ is true or false according as the values $v_1$ and $v_2$ are, or are not, identical values.

6.5 Basic Exceptions

A subset $\mathrm{BasExName} \subset \mathrm{ExName}$ of the exception names are bound to predefined exception constructors in the initial dynamic basis (see Appendix D). These names are denoted by the identifiers to which they are bound in the initial basis, and are as follows:

Match    Bind

The exceptions Match and Bind are raised upon failure of pattern-matching in evaluating a function fn match or a valbind, as detailed in the rules to follow. Recall from Section 4.11 that in the context fn match, the match must be irredundant and exhaustive and that the compiler should flag the match if it violates these restrictions. The exception Match can only be raised for a match which is not exhaustive, and has therefore been flagged by the compiler.

For each value binding pat = exp the compiler must issue a report (but still compile) if either pat is not exhaustive or pat contains no variable. This will (on both counts) detect a mistaken declaration like val nil = exp in which the user expects to declare a new variable nil (whereas the language dictates that nil is here a constant pattern, so no variable gets declared). However, these warnings should not be given when the binding is a component of a top-level declaration val valbind; e.g. val x::l = exp1 and y = exp2 is not faulted by the compiler at top level, but may of course generate a Bind exception.
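The two basic exceptions can be provoked with deliberately non-exhaustive patterns. In this sketch head, r1 and r2 are hypothetical names; a compiler is expected to flag both the function and the inner binding while still compiling them.

```sml
(* Non-exhaustive fn match: applying it to [] raises Match. *)
fun head (x :: _) = x
val r1 = head ([] : int list) handle Match => ~1

(* Non-exhaustive value binding: evaluating it on [] raises Bind. *)
val r2 =
  let
    val x :: _ = ([] : int list)
  in
    x
  end
  handle Bind => ~2
```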

6.6 Function Closures

The informal understanding of a function closure $(\mathit{match}, E, VE)$ is as follows: when the function closure is applied to a value $v$, match will be evaluated against $v$, in the environment $E$ modified in a special sense by $VE$. The domain $\mathrm{Dom}\ VE$ of this third component contains those identifiers to be treated recursively in the evaluation. To achieve this effect, the evaluation of match will take place not in $E + VE$ but in $E + \mathrm{Rec}\ VE$, where

$$\mathrm{Rec} : \mathrm{ValEnv} \to \mathrm{ValEnv}$$

is defined as follows:

• $\mathrm{Dom}(\mathrm{Rec}\ VE) = \mathrm{Dom}\ VE$

• If $VE(\mathit{vid}) \notin \mathrm{FcnClosure} \times \{\mathtt{v}\}$, then $(\mathrm{Rec}\ VE)(\mathit{vid}) = VE(\mathit{vid})$

• If $VE(\mathit{vid}) = ((\mathit{match}', E', VE'), \mathtt{v})$ then $(\mathrm{Rec}\ VE)(\mathit{vid}) = ((\mathit{match}', E', VE), \mathtt{v})$

The effect is that, before application of $(\mathit{match}, E, VE)$ to $v$, the function closures in $\mathrm{Ran}\ VE$ are "unrolled" once, to prepare for their possible recursive application during the evaluation of match upon $v$.

This device is adopted to ensure that all semantic objects are finite (by controlling the unrolling of recursion). The operator Rec is invoked in just two places in the semantic rules: in the rule for recursive value bindings of the form "rec valbind ", and in the rule for evaluating an application expression "exp atexp" in the case that exp evaluates to a function closure.
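The definition of Rec can be transcribed almost literally. The following is a sketch under simplifying assumptions, not a faithful model of the semantic objects: a match is abbreviated to a string, the Env component of a closure is reduced to a value environment, and finite maps are association lists. All names (idstatus, value, valenv, recVE, ve0, ve1) are hypothetical.

```sml
datatype idstatus = V | C | E

datatype value
  = Closure of string * valenv * valenv   (* (match, E, VE), heavily simplified *)
  | Other of int                          (* stands for every non-closure value *)
withtype valenv = (string * (value * idstatus)) list

(* Rec VE has the same domain as VE. A closure with status V gets VE itself as
   its third component; every other binding is returned unchanged. *)
fun recVE (ve : valenv) : valenv =
  map (fn (vid, (Closure (m, e, _), V)) => (vid, (Closure (m, e, ve), V))
        | binding => binding)
      ve

val ve0 : valenv =
  [("f", (Closure ("fn x => f x", [], []), V)), ("k", (Other 1, V))]
val ve1 = recVE ve0
```

Note that the new third component is ve0, not ve1: the closures are unrolled exactly once, so the resulting object stays finite.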

6.7 Inference Rules

The semantic rules allow sentences of the form

$$s, A \vdash \mathit{phrase} \Rightarrow A', s'$$

to be inferred, where $A$ is usually an environment, $A'$ is some semantic object and $s$, $s'$ are the states before and after the evaluation represented by the sentence. Some hypotheses in rules are not of this form; they are called side-conditions. The convention for options is the same as for the Core static semantics.

In most rules the states s and s0 are omitted from sentences; they are only included for those rules which are directly concerned with the state - either referring to its contents or changing it. When omitted, the convention for restoring them is as follows. If the rule is presented in the form

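$$\frac{A_1 \vdash \mathit{phrase}_1 \Rightarrow A_1' \quad A_2 \vdash \mathit{phrase}_2 \Rightarrow A_2' \quad \cdots \quad A_n \vdash \mathit{phrase}_n \Rightarrow A_n'}{A \vdash \mathit{phrase} \Rightarrow A'}$$

then the full form is intended to be

$$\frac{s_0, A_1 \vdash \mathit{phrase}_1 \Rightarrow A_1', s_1 \quad s_1, A_2 \vdash \mathit{phrase}_2 \Rightarrow A_2', s_2 \quad \cdots \quad s_{n-1}, A_n \vdash \mathit{phrase}_n \Rightarrow A_n', s_n}{s_0, A \vdash \mathit{phrase} \Rightarrow A', s_n}$$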

(Any side-conditions are left unaltered). Thus the left-to-right order of the hypotheses indicates the order of evaluation. Note that in the case $n = 0$, when there are no hypotheses (except possibly side-conditions), we have $s_n = s_0$; this implies that the rule causes no side effect. The convention is called the state convention, and must be applied to each version of a rule obtained by inclusion or omission of its options.

A second convention, the exception convention, is adopted to deal with the propagation of exception packets p. For each rule whose full form (ignoring side-conditions) is

$$\frac{s_1, A_1 \vdash \mathit{phrase}_1 \Rightarrow A_1', s_1' \quad \cdots \quad s_n, A_n \vdash \mathit{phrase}_n \Rightarrow A_n', s_n'}{s, A \vdash \mathit{phrase} \Rightarrow A', s'}$$

and for each $k$, $1 \leq k \leq n$, for which the result $A_k'$ is not a packet $p$, an extra rule is added of the form

$$\frac{s_1, A_1 \vdash \mathit{phrase}_1 \Rightarrow A_1', s_1' \quad \cdots \quad s_k, A_k \vdash \mathit{phrase}_k \Rightarrow p', s'}{s, A \vdash \mathit{phrase} \Rightarrow p', s'}$$

where $p'$ does not occur in the original rule.[1] This indicates that evaluation of phrases in the hypothesis terminates with the first whose result is a packet (other than one already treated in the rule), and this packet is the result of the phrase in the conclusion.

[1] There is one exception to the exception convention; no extra rule is added for rule 104 which deals with handlers, since a handler is the only means by which propagation of an exception can be arrested.

A third convention is that we allow compound variables (variables built from the variables in Figure 13 and the symbol "/") to range over unions of semantic objects. For instance the compound variable $v/p$ ranges over $\mathrm{Val} \cup \mathrm{Pack}$. We also allow $x/\mathrm{FAIL}$ to range over $X \cup \{\mathrm{FAIL}\}$ where $x$ ranges over $X$; furthermore, we extend environment modification to allow for failure as follows:

$$VE + \mathrm{FAIL} = \mathrm{FAIL}.$$

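Atomic Expressions    $E \vdash \mathit{atexp} \Rightarrow v/p$

$$E \vdash \mathit{scon} \Rightarrow \mathrm{val}(\mathit{scon}) \tag{90}$$

$$\frac{E(\mathit{longvid}) = (v, is)}{E \vdash \mathit{longvid} \Rightarrow v} \tag{91}$$

$$\frac{\langle E \vdash \mathit{exprow} \Rightarrow r \rangle}{E \vdash \texttt{\{}\ \langle \mathit{exprow} \rangle\ \texttt{\}} \Rightarrow \{\} \langle + r \rangle\ \text{in Val}} \tag{92}$$

$$\frac{E \vdash \mathit{dec} \Rightarrow E' \quad E + E' \vdash \mathit{exp} \Rightarrow v}{E \vdash \texttt{let}\ \mathit{dec}\ \texttt{in}\ \mathit{exp}\ \texttt{end} \Rightarrow v} \tag{93}$$

$$\frac{E \vdash \mathit{exp} \Rightarrow v}{E \vdash \texttt{(}\ \mathit{exp}\ \texttt{)} \Rightarrow v} \tag{94}$$

Comments:

(91) As in the static semantics, value identifiers are looked up in the environment and the identifier status is not used.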

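Expression Rows    $E \vdash \mathit{exprow} \Rightarrow r/p$

$$\frac{E \vdash \mathit{exp} \Rightarrow v \quad \langle E \vdash \mathit{exprow} \Rightarrow r \rangle}{E \vdash \mathit{lab}\ \texttt{=}\ \mathit{exp}\ \langle \texttt{,}\ \mathit{exprow} \rangle \Rightarrow \{\mathit{lab} \mapsto v\} \langle + r \rangle} \tag{95}$$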

Comment: We may think of components as being evaluated from left to right, because of the state and exception conventions.
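The left-to-right evaluation of record components can be made visible with a side effect. In this sketch log, note, r and order are hypothetical names.

```sml
val log = ref ([] : string list)

(* Records that the field named s was evaluated, then returns v. *)
fun note s v = (log := s :: !log; v)

(* Fields are written in the order b, a; evaluation follows the written order,
   not the alphabetical order of the labels. *)
val r = {b = note "b" 1, a = note "a" 2}
val order = rev (!log)
```

After evaluation, order lists the labels in the textual order of the expression row, which is the order imposed by the state convention on the hypotheses of rule 95.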

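Expressions    $E \vdash \mathit{exp} \Rightarrow v/p$

$$\frac{E \vdash \mathit{atexp} \Rightarrow v}{E \vdash \mathit{atexp} \Rightarrow v} \tag{96}$$

$$\frac{E \vdash \mathit{exp} \Rightarrow \mathit{vid} \quad \mathit{vid} \neq \texttt{ref} \quad E \vdash \mathit{atexp} \Rightarrow v}{E \vdash \mathit{exp}\ \mathit{atexp} \Rightarrow (\mathit{vid}, v)} \tag{97}$$

$$\frac{E \vdash \mathit{exp} \Rightarrow en \quad E \vdash \mathit{atexp} \Rightarrow v}{E \vdash \mathit{exp}\ \mathit{atexp} \Rightarrow (en, v)} \tag{98}$$

$$\frac{s, E \vdash \mathit{exp} \Rightarrow \texttt{ref}, s' \quad s', E \vdash \mathit{atexp} \Rightarrow v, s'' \quad a \notin \mathrm{Dom}(mem \text{ of } s'')}{s, E \vdash \mathit{exp}\ \mathit{atexp} \Rightarrow a,\ s'' + \{a \mapsto v\}} \tag{99}$$

$$\frac{s, E \vdash \mathit{exp} \Rightarrow \texttt{:=}, s' \quad s', E \vdash \mathit{atexp} \Rightarrow \{1 \mapsto a, 2 \mapsto v\}, s''}{s, E \vdash \mathit{exp}\ \mathit{atexp} \Rightarrow \{\}\ \text{in Val},\ s'' + \{a \mapsto v\}} \tag{100}$$

$$\frac{E \vdash \mathit{exp} \Rightarrow b \quad E \vdash \mathit{atexp} \Rightarrow v \quad \mathrm{APPLY}(b, v) = v'/p}{E \vdash \mathit{exp}\ \mathit{atexp} \Rightarrow v'/p} \tag{101}$$

$$\frac{E \vdash \mathit{exp} \Rightarrow (\mathit{match}, E', VE) \quad E \vdash \mathit{atexp} \Rightarrow v \quad E' + \mathrm{Rec}\ VE, v \vdash \mathit{match} \Rightarrow v'}{E \vdash \mathit{exp}\ \mathit{atexp} \Rightarrow v'} \tag{102}$$

$$\frac{E \vdash \mathit{exp} \Rightarrow (\mathit{match}, E', VE) \quad E \vdash \mathit{atexp} \Rightarrow v \quad E' + \mathrm{Rec}\ VE, v \vdash \mathit{match} \Rightarrow \mathrm{FAIL}}{E \vdash \mathit{exp}\ \mathit{atexp} \Rightarrow [\texttt{Match}]} \tag{103}$$

$$\frac{E \vdash \mathit{exp} \Rightarrow v}{E \vdash \mathit{exp}\ \texttt{handle}\ \mathit{match} \Rightarrow v} \tag{104}$$

$$\frac{E \vdash \mathit{exp} \Rightarrow [e] \quad E, e \vdash \mathit{match} \Rightarrow v}{E \vdash \mathit{exp}\ \texttt{handle}\ \mathit{match} \Rightarrow v} \tag{105}$$

$$\frac{E \vdash \mathit{exp} \Rightarrow [e] \quad E, e \vdash \mathit{match} \Rightarrow \mathrm{FAIL}}{E \vdash \mathit{exp}\ \texttt{handle}\ \mathit{match} \Rightarrow [e]} \tag{106}$$

$$\frac{E \vdash \mathit{exp} \Rightarrow e}{E \vdash \texttt{raise}\ \mathit{exp} \Rightarrow [e]} \tag{107}$$

$$E \vdash \texttt{fn}\ \mathit{match} \Rightarrow (\mathit{match}, E, \{\}) \tag{108}$$

Comments: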

(99) The side condition ensures that a new address is chosen. There are no rules concerning disposal of inaccessible addresses.

(97)-(103) Note that none of the rules for function application has a premise in which the operator evaluates to a constructed value, a record or an address. This is because we are interested in the evaluation of well-typed programs only, and in such programs exp will always have a functional type.

(104) This is the only rule to which the exception convention does not apply. If the operator evaluates to a packet then rule 105 or rule 106 must be used.

(106) Packets that are not handled by the match propagate.

(108) The third component of the function closure is empty because the match does not introduce new recursively defined values.
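Rules 104 to 106 cover the three possible outcomes of a handle expression, and rule 99 guarantees a new address for every evaluation of ref. A brief sketch with hypothetical names (A, B, try, r1, r2, r3, p, q, distinct, alias):

```sml
exception A
exception B

fun try (f : unit -> unit) = (f (); "no exception") handle A => "handled A"

val r1 = try (fn () => ())                                  (* rule 104: no packet *)
val r2 = try (fn () => raise A)                             (* rule 105: packet handled *)
val r3 = try (fn () => raise B) handle B => "propagated"    (* rule 106 inside try *)

(* Rule 99: each evaluation of ref chooses an address not yet in the memory. *)
val p = ref 0
val q = ref 0
val distinct = (p <> q)       (* two different addresses with equal contents *)
val alias = (p = p)
```

In r3 the handler inside try fails to match B, so the packet leaves try unchanged and is caught only by the outer handler.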

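Matches    $E, v \vdash \mathit{match} \Rightarrow v'/p/\mathrm{FAIL}$

$$\frac{E, v \vdash \mathit{mrule} \Rightarrow v'}{E, v \vdash \mathit{mrule}\ \langle\, \texttt{|}\ \mathit{match} \rangle \Rightarrow v'} \tag{109}$$

$$\frac{E, v \vdash \mathit{mrule} \Rightarrow \mathrm{FAIL}}{E, v \vdash \mathit{mrule} \Rightarrow \mathrm{FAIL}} \tag{110}$$

$$\frac{E, v \vdash \mathit{mrule} \Rightarrow \mathrm{FAIL} \quad E, v \vdash \mathit{match} \Rightarrow v'/\mathrm{FAIL}}{E, v \vdash \mathit{mrule}\ \texttt{|}\ \mathit{match} \Rightarrow v'/\mathrm{FAIL}} \tag{111}$$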

Comment: A value v occurs on the left of the turnstile, in evaluating a match. We may think of a match as being evaluated against a value; similarly, we may think of a pattern as being evaluated against a value. Alternative match rules are tried from left to right.

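Match Rules    $E, v \vdash \mathit{mrule} \Rightarrow v'/p/\mathrm{FAIL}$

$$\frac{E, v \vdash \mathit{pat} \Rightarrow VE \quad E + VE \vdash \mathit{exp} \Rightarrow v'}{E, v \vdash \mathit{pat}\ \texttt{=>}\ \mathit{exp} \Rightarrow v'} \tag{112}$$

$$\frac{E, v \vdash \mathit{pat} \Rightarrow \mathrm{FAIL}}{E, v \vdash \mathit{pat}\ \texttt{=>}\ \mathit{exp} \Rightarrow \mathrm{FAIL}} \tag{113}$$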

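Declarations    $E \vdash \mathit{dec} \Rightarrow E'/p$

$$\frac{E \vdash \mathit{valbind} \Rightarrow VE}{E \vdash \texttt{val}\ \mathit{tyvarseq}\ \mathit{valbind} \Rightarrow VE\ \text{in Env}} \tag{114}$$

$$\frac{\vdash \mathit{typbind} \Rightarrow TE}{E \vdash \texttt{type}\ \mathit{typbind} \Rightarrow TE\ \text{in Env}} \tag{115}$$

$$\frac{\vdash \mathit{datbind} \Rightarrow VE, TE}{E \vdash \texttt{datatype}\ \mathit{datbind} \Rightarrow (VE, TE)\ \text{in Env}} \tag{116}$$

$$\frac{E(\mathit{longtycon}) = VE}{E \vdash \texttt{datatype}\ \mathit{tyvarseq}\ \mathit{tycon}\ \texttt{=}\ \texttt{datatype}\ \mathit{tyvarseq}\ \mathit{longtycon} \Rightarrow (VE, \{\mathit{tycon} \mapsto VE\})\ \text{in Env}} \tag{117}$$

$$\frac{\vdash \mathit{datbind} \Rightarrow VE, TE \quad E + VE \vdash \mathit{dec} \Rightarrow E'}{E \vdash \texttt{abstype}\ \mathit{datbind}\ \texttt{with}\ \mathit{dec}\ \texttt{end} \Rightarrow E'} \tag{118}$$

$$\frac{E \vdash \mathit{exbind} \Rightarrow VE}{E \vdash \texttt{exception}\ \mathit{exbind} \Rightarrow VE\ \text{in Env}} \tag{119}$$

$$\frac{E \vdash \mathit{dec}_1 \Rightarrow E_1 \quad E + E_1 \vdash \mathit{dec}_2 \Rightarrow E_2}{E \vdash \texttt{local}\ \mathit{dec}_1\ \texttt{in}\ \mathit{dec}_2\ \texttt{end} \Rightarrow E_2} \tag{120}$$

$$\frac{E(\mathit{longstrid}_1) = E_1 \quad \cdots \quad E(\mathit{longstrid}_n) = E_n}{E \vdash \texttt{open}\ \mathit{longstrid}_1 \cdots \mathit{longstrid}_n \Rightarrow E_1 + \cdots + E_n} \tag{121}$$

$$E \vdash\ \Rightarrow \{\}\ \text{in Env} \tag{122}$$

$$\frac{E \vdash \mathit{dec}_1 \Rightarrow E_1 \quad E + E_1 \vdash \mathit{dec}_2 \Rightarrow E_2}{E \vdash \mathit{dec}_1\ \langle \texttt{;} \rangle\ \mathit{dec}_2 \Rightarrow E_1 + E_2} \tag{123}$$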

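Value Bindings    $E \vdash \mathit{valbind} \Rightarrow VE/p$

$$\frac{E \vdash \mathit{exp} \Rightarrow v \quad E, v \vdash \mathit{pat} \Rightarrow VE \quad \langle E \vdash \mathit{valbind} \Rightarrow VE' \rangle}{E \vdash \mathit{pat}\ \texttt{=}\ \mathit{exp}\ \langle \texttt{and}\ \mathit{valbind} \rangle \Rightarrow VE\ \langle + VE' \rangle} \tag{124}$$

$$\frac{E \vdash \mathit{exp} \Rightarrow v \quad E, v \vdash \mathit{pat} \Rightarrow \mathrm{FAIL}}{E \vdash \mathit{pat}\ \texttt{=}\ \mathit{exp}\ \langle \texttt{and}\ \mathit{valbind} \rangle \Rightarrow [\texttt{Bind}]} \tag{125}$$

$$\frac{E \vdash \mathit{valbind} \Rightarrow VE}{E \vdash \texttt{rec}\ \mathit{valbind} \Rightarrow \mathrm{Rec}\ VE} \tag{126}$$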

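Type Bindings    $\vdash \mathit{typbind} \Rightarrow TE$

$$\frac{\langle \vdash \mathit{typbind} \Rightarrow TE \rangle}{\vdash \mathit{tyvarseq}\ \mathit{tycon}\ \texttt{=}\ \mathit{ty}\ \langle \texttt{and}\ \mathit{typbind} \rangle \Rightarrow \{\mathit{tycon} \mapsto \{\}\} \langle + TE \rangle} \tag{127}$$

Datatype Bindings    $\vdash \mathit{datbind} \Rightarrow VE, TE$

$$\frac{\vdash \mathit{conbind} \Rightarrow VE \quad \langle \vdash \mathit{datbind}' \Rightarrow VE', TE' \rangle}{\vdash \mathit{tyvarseq}\ \mathit{tycon}\ \texttt{=}\ \mathit{conbind}\ \langle \texttt{and}\ \mathit{datbind}' \rangle \Rightarrow VE \langle + VE' \rangle,\ \{\mathit{tycon} \mapsto VE\} \langle + TE' \rangle} \tag{128}$$

Constructor Bindings    $\vdash \mathit{conbind} \Rightarrow VE$

$$\frac{\langle \vdash \mathit{conbind} \Rightarrow VE \rangle}{\vdash \mathit{vid}\ \langle\, \texttt{|}\ \mathit{conbind} \rangle \Rightarrow \{\mathit{vid} \mapsto (\mathit{vid}, \mathtt{c})\} \langle + VE \rangle} \tag{129}$$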

Exception Bindings E ` exbind ) VE

en =2 ens of s s0 = s + feng hs0; E ` exbind ) VE; s00i

s; E ` vid hand exbind i ) fvid 7! (en; e)gh+ VEi; s0h0i (130)

E(longvid ) = (en; e) hE ` exbind ) VEi E ` vid = longvid hand exbind i ) fvid 7! (en; e)gh+ VEi (131)

Comments:

(130) The two side conditions ensure that a new exception name is generated and recorded as "used" in subsequent states.
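The generative behaviour of rule 130, and the sharing of names in rule 131, can be seen in a short Standard ML sketch. All identifiers below (`fresh`, `Local`, `Original`, `Alias` and so on) are invented for the illustration.

```sml
(* Each evaluation of the body of fresh evaluates the exception binding
   again, so rule 130 generates a new exception name every time. *)
fun fresh () =
  let
    exception Local
    fun throw () : unit = raise Local
    fun catches (thunk : unit -> unit) =
      (thunk (); false) handle Local => true
  in
    (throw, catches)
  end

val (throw1, catches1) = fresh ()
val (throw2, _) = fresh ()

val own = catches1 throw1                       (* same exception name *)
val other = catches1 throw2 handle _ => false   (* different name: not caught inside *)

(* Rule 131: the new identifier is bound to the existing exception name. *)
exception Original
exception Alias = Original
val aliased = (raise Alias) handle Original => true
```

Although both packets come from the same source text `exception Local`, only the one raised under the same evaluation of `fresh` is handled by `catches1`.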

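Atomic Patterns    $E, v \vdash \mathit{atpat} \Rightarrow VE/\mathrm{FAIL}$

$$\frac{}{E, v \vdash \mathtt{\_} \Rightarrow \{\}} \tag{132}$$

$$\frac{v = \mathrm{val}(\mathit{scon})}{E, v \vdash \mathit{scon} \Rightarrow \{\}} \tag{133}$$

$$\frac{v \neq \mathrm{val}(\mathit{scon})}{E, v \vdash \mathit{scon} \Rightarrow \mathrm{FAIL}} \tag{134}$$

$$\frac{\mathit{vid} \notin \mathrm{Dom}(E)\ \text{ or }\ \mathit{is}\ \mathrm{of}\ E(\mathit{vid}) = \mathsf{v}}{E, v \vdash \mathit{vid} \Rightarrow \{\mathit{vid} \mapsto (v, \mathsf{v})\}} \tag{135}$$

$$\frac{E(\mathit{longvid}) = (v, \mathit{is}) \qquad \mathit{is} \neq \mathsf{v}}{E, v \vdash \mathit{longvid} \Rightarrow \{\}} \tag{136}$$

$$\frac{E(\mathit{longvid}) = (v', \mathit{is}) \qquad \mathit{is} \neq \mathsf{v} \qquad v \neq v'}{E, v \vdash \mathit{longvid} \Rightarrow \mathrm{FAIL}} \tag{137}$$

$$\frac{v = \{\}\langle + r \rangle\ \mathrm{in\ Val} \qquad \langle E, r \vdash \mathit{patrow} \Rightarrow VE/\mathrm{FAIL} \rangle}{E, v \vdash \mathtt{\{}\ \langle \mathit{patrow} \rangle\ \mathtt{\}} \Rightarrow \{\}\langle + VE/\mathrm{FAIL} \rangle} \tag{138}$$

$$\frac{E, v \vdash \mathit{pat} \Rightarrow VE/\mathrm{FAIL}}{E, v \vdash \mathtt{(}\ \mathit{pat}\ \mathtt{)} \Rightarrow VE/\mathrm{FAIL}} \tag{139}$$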

Comments:

(134), (137) Any evaluation resulting in FAIL must do so because rule 134, rule 137, rule 145, or rule 147 has been applied.

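Pattern Rows    $E, r \vdash \mathit{patrow} \Rightarrow VE/\mathrm{FAIL}$

$$\frac{}{E, r \vdash \mathtt{...} \Rightarrow \{\}} \tag{140}$$

$$\frac{E, r(\mathit{lab}) \vdash \mathit{pat} \Rightarrow \mathrm{FAIL}}{E, r \vdash \mathit{lab}\ \mathtt{=}\ \mathit{pat}\ \langle \mathtt{,}\ \mathit{patrow} \rangle \Rightarrow \mathrm{FAIL}} \tag{141}$$

$$\frac{E, r(\mathit{lab}) \vdash \mathit{pat} \Rightarrow VE \qquad \langle E, r \vdash \mathit{patrow} \Rightarrow VE'/\mathrm{FAIL} \rangle}{E, r \vdash \mathit{lab}\ \mathtt{=}\ \mathit{pat}\ \langle \mathtt{,}\ \mathit{patrow} \rangle \Rightarrow VE\ \langle + VE'/\mathrm{FAIL} \rangle} \tag{142}$$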

Comments:

(141),(142) For well-typed programs lab will be in the domain of r.
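A small Standard ML sketch of record patterns may help here. The type `person` and both functions are hypothetical; the type annotation is what lets the static semantics resolve the `...` row.

```sml
type person = {name : string, age : int}

(* Rule 140: "..." matches whatever fields remain and binds nothing. *)
fun nameOf ({name = n, ...} : person) = n

(* Rules 141-142: each labelled pattern is evaluated against the field r(lab);
   a FAIL in any field makes the whole row FAIL, and the next clause is tried. *)
fun isNewborn ({age = 0, ...} : person) = true
  | isNewborn _ = false
```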

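Patterns    $E, v \vdash \mathit{pat} \Rightarrow VE/\mathrm{FAIL}$

$$\frac{E, v \vdash \mathit{atpat} \Rightarrow VE/\mathrm{FAIL}}{E, v \vdash \mathit{atpat} \Rightarrow VE/\mathrm{FAIL}} \tag{143}$$

$$\frac{E(\mathit{longvid}) = (\mathit{vid}, \mathsf{c}) \qquad \mathit{vid} \neq \mathtt{ref} \qquad v = (\mathit{vid}, v') \qquad E, v' \vdash \mathit{atpat} \Rightarrow VE/\mathrm{FAIL}}{E, v \vdash \mathit{longvid}\ \mathit{atpat} \Rightarrow VE/\mathrm{FAIL}} \tag{144}$$

$$\frac{E(\mathit{longvid}) = (\mathit{vid}, \mathsf{c}) \qquad \mathit{vid} \neq \mathtt{ref} \qquad v \notin \{\mathit{vid}\} \times \mathrm{Val}}{E, v \vdash \mathit{longvid}\ \mathit{atpat} \Rightarrow \mathrm{FAIL}} \tag{145}$$

$$\frac{E(\mathit{longvid}) = (\mathit{en}, \mathsf{e}) \qquad v = (\mathit{en}, v') \qquad E, v' \vdash \mathit{atpat} \Rightarrow VE/\mathrm{FAIL}}{E, v \vdash \mathit{longvid}\ \mathit{atpat} \Rightarrow VE/\mathrm{FAIL}} \tag{146}$$

$$\frac{E(\mathit{longvid}) = (\mathit{en}, \mathsf{e}) \qquad v \notin \{\mathit{en}\} \times \mathrm{Val}}{E, v \vdash \mathit{longvid}\ \mathit{atpat} \Rightarrow \mathrm{FAIL}} \tag{147}$$

$$\frac{s(a) = v \qquad s, E, v \vdash \mathit{atpat} \Rightarrow VE/\mathrm{FAIL}, s}{s, E, a \vdash \mathtt{ref}\ \mathit{atpat} \Rightarrow VE/\mathrm{FAIL}, s} \tag{148}$$

$$\frac{\mathit{vid} \notin \mathrm{Dom}(E)\ \text{ or }\ \mathit{is}\ \mathrm{of}\ E(\mathit{vid}) = \mathsf{v} \qquad E, v \vdash \mathit{pat} \Rightarrow VE/\mathrm{FAIL}}{E, v \vdash \mathit{vid}\ \mathtt{as}\ \mathit{pat} \Rightarrow \{\mathit{vid} \mapsto (v, \mathsf{v})\} + VE/\mathrm{FAIL}} \tag{149}$$

Comments:

(145),(147) Any evaluation resulting in FAIL must do so because rule 134, rule 137, rule 145, or rule 147 has been applied.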
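The pattern rules above correspond to familiar programming idioms. The following sketch uses invented names (`shape`, `Code`, `side`, `codeOf`, `peek`, `dup`) and is meant only to connect each rule to a piece of concrete syntax.

```sml
datatype shape = Circle of int | Square of int
exception Code of int

(* Rules 144-145: a constructor pattern matches only values built with
   the same constructor; otherwise it FAILs and the next clause is tried. *)
fun side (Square s) = s
  | side (Circle _) = 0

(* Rules 146-147: exception constructors are compared by exception name. *)
fun codeOf (Code n) = n
  | codeOf _ = ~1

(* Rule 148: a ref pattern is matched against the current contents s(a). *)
fun peek (ref (x : int)) = x

(* Rule 149: a layered pattern binds the whole value as well as its parts. *)
fun dup (whole as (first :: _)) = first :: whole
  | dup [] = []
```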

7 Dynamic Semantics for Modules

7.1 Reduced Syntax

Since signature expressions are mostly dealt with in the static semantics, the dynamic semantics need only take limited account of them. However, they cannot be ignored completely; the reason is that an explicit signature ascription plays the rôle of restricting the "view" of a structure - that is, restricting the domains of its component environments and imposing identifier status on value identifiers. The syntax is therefore reduced by the following transformations (in addition to those for the Core), for the purpose of the dynamic semantics of Modules:

• Qualifications "of ty" are omitted from constructor and exception descriptions.

• Any qualification sharing type ··· on a specification or where type ··· on a signature expression is omitted.

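7.2 Compound Objects

The compound objects for the Modules dynamic semantics, extra to those for the Core dynamic semantics, are shown in Figure 14.

$$\begin{array}{rcl}
(\mathit{strid} : I, \mathit{strexp}, B) \in \mathrm{FunctorClosure} & = & (\mathrm{StrId} \times \mathrm{Int}) \times \mathrm{StrExp} \times \mathrm{Basis} \\
I \text{ or } (SI, TI, VI) \in \mathrm{Int} & = & \mathrm{StrInt} \times \mathrm{TyInt} \times \mathrm{ValInt} \\
SI \in \mathrm{StrInt} & = & \mathrm{StrId} \xrightarrow{\mathrm{fin}} \mathrm{Int} \\
TI \in \mathrm{TyInt} & = & \mathrm{TyCon} \xrightarrow{\mathrm{fin}} \mathrm{ValInt} \\
VI \in \mathrm{ValInt} & = & \mathrm{VId} \xrightarrow{\mathrm{fin}} \mathrm{IdStatus} \\
G \in \mathrm{SigEnv} & = & \mathrm{SigId} \xrightarrow{\mathrm{fin}} \mathrm{Int} \\
F \in \mathrm{FunEnv} & = & \mathrm{FunId} \xrightarrow{\mathrm{fin}} \mathrm{FunctorClosure} \\
(F, G, E) \text{ or } B \in \mathrm{Basis} & = & \mathrm{FunEnv} \times \mathrm{SigEnv} \times \mathrm{Env} \\
(G, I) \text{ or } IB \in \mathrm{IntBasis} & = & \mathrm{SigEnv} \times \mathrm{Int}
\end{array}$$

Figure 14: Compound Semantic Objects

An interface $I \in \mathrm{Int}$ represents a "view" of a structure. Specifications and signature expressions will evaluate to interfaces; moreover, during the evaluation of a specification or signature expression, structures (to which a specification or signature expression may refer via datatype replicating specifications) are represented only by their interfaces. To extract a value interface from a dynamic value environment we define the operation $\mathrm{Inter} : \mathrm{ValEnv} \to \mathrm{ValInt}$ as follows:

$$\mathrm{Inter}(VE) = \{\mathit{vid} \mapsto \mathit{is} \;;\; VE(\mathit{vid}) = (v, \mathit{is})\}$$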

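In other words, $\mathrm{Inter}(VE)$ is the value interface obtained from $VE$ by removing all values from $VE$. We then extend Inter to a function $\mathrm{Inter} : \mathrm{Env} \to \mathrm{Int}$ as follows:

$$\mathrm{Inter}(SE, TE, VE) = (SI, TI, VI)$$

where $VI = \mathrm{Inter}(VE)$ and

$$SI = \{\mathit{strid} \mapsto \mathrm{Inter}\ E \;;\; SE(\mathit{strid}) = E\}$$

$$TI = \{\mathit{tycon} \mapsto \mathrm{Inter}\ VE' \;;\; TE(\mathit{tycon}) = VE'\}$$

An interface basis $IB = (G, I)$ is a value-free part of a basis, sufficient to evaluate signature expressions and specifications. The function Inter is extended to create an interface basis from a basis $B$ as follows:

$$\mathrm{Inter}(F, G, E) = (G, \mathrm{Inter}\ E)$$

A further operation

$$\downarrow\ : \mathrm{Env} \times \mathrm{Int} \to \mathrm{Env}$$

is required, to cut down an environment $E$ to a given interface $I$, representing the effect of an explicit signature ascription. We first define $\downarrow\ : \mathrm{ValEnv} \times \mathrm{ValInt} \to \mathrm{ValEnv}$ by

$$VE \downarrow VI = \{\mathit{vid} \mapsto (v, \mathit{is}) \;;\; VE(\mathit{vid}) = (v, \mathit{is}') \text{ and } VI(\mathit{vid}) = \mathit{is}\}$$

(Note that the identifier status is taken from $VI$.) We then define $\downarrow\ : \mathrm{StrEnv} \times \mathrm{StrInt} \to \mathrm{StrEnv}$, $\downarrow\ : \mathrm{TyEnv} \times \mathrm{TyInt} \to \mathrm{TyEnv}$ and $\downarrow\ : \mathrm{Env} \times \mathrm{Int} \to \mathrm{Env}$ simultaneously as follows:

$$SE \downarrow SI = \{\mathit{strid} \mapsto E \downarrow I \;;\; SE(\mathit{strid}) = E \text{ and } SI(\mathit{strid}) = I\}$$

$$TE \downarrow TI = \{\mathit{tycon} \mapsto VE' \downarrow VI' \;;\; TE(\mathit{tycon}) = VE' \text{ and } TI(\mathit{tycon}) = VI'\}$$

$$(SE, TE, VE) \downarrow (SI, TI, VI) = (SE \downarrow SI,\ TE \downarrow TI,\ VE \downarrow VI)$$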

It is important to note that an interface can also be obtained from the static value $\Sigma$ of a signature expression; it is obtained by first replacing every type structure $(\theta, VE)$ in the range of every type environment $TE$ by $VE$ and then replacing each pair $(\sigma, \mathit{is})$ in the range of every value environment $VE$ by $\mathit{is}$. Thus in an implementation interfaces would naturally be obtained from the static elaboration; we choose to give separate rules here for obtaining them in the dynamic semantics since we wish to maintain our separation of the static and dynamic semantics, for reasons of presentation.
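At the level of programs, the cut-down operation is what a signature ascription does to a structure. The following sketch uses hypothetical names (`Full`, `VIEW`, `Cut`); it shows a component being dropped and a constructor being re-exported with value status.

```sml
structure Full =
struct
  datatype colour = Red | Green
  val favourite = Green
  val hidden = 42
end

signature VIEW =
sig
  type colour
  val Red : colour          (* specified as a value: its status becomes v *)
  val favourite : colour
end

(* The result is the environment of Full cut down by the interface of VIEW. *)
structure Cut = Full : VIEW

(* The values themselves are unchanged by the cut-down operation. *)
val unchanged = (Cut.favourite = Full.Green)

(* Cut.hidden and Cut.Green are no longer in the domain of the environment,
   and Cut.Red can no longer be used as a constructor in patterns. *)
```

Because the ascription here is transparent, `Cut.colour` and `Full.colour` remain the same type statically, which is why the comparison is well-typed.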

7.3 Inference Rules

The semantic rules allow sentences of the form

$$s, A \vdash \mathit{phrase} \Rightarrow A', s'$$

to be inferred, where $A$ is either a basis, a signature environment or empty, $A'$ is some semantic object and $s$, $s'$ are the states before and after the evaluation represented by the sentence. Some hypotheses in rules are not of this form; they are called side-conditions. The convention for options is the same as for the Core static semantics.

The state and exception conventions are adopted as in the Core dynamic semantics. However, it may be shown that the only Modules phrases whose evaluation may cause a side-effect or generate an exception packet are of the form strexp, strdec, strbind or topdec.

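Structure Expressions    $B \vdash \mathit{strexp} \Rightarrow E/p$

$$\frac{B \vdash \mathit{strdec} \Rightarrow E}{B \vdash \mathtt{struct}\ \mathit{strdec}\ \mathtt{end} \Rightarrow E} \tag{150}$$

$$\frac{B(\mathit{longstrid}) = E}{B \vdash \mathit{longstrid} \Rightarrow E} \tag{151}$$

$$\frac{B \vdash \mathit{strexp} \Rightarrow E \qquad \mathrm{Inter}\ B \vdash \mathit{sigexp} \Rightarrow I}{B \vdash \mathit{strexp}\ \mathtt{:}\ \mathit{sigexp} \Rightarrow E \downarrow I} \tag{152}$$

$$\frac{B \vdash \mathit{strexp} \Rightarrow E \qquad \mathrm{Inter}\ B \vdash \mathit{sigexp} \Rightarrow I}{B \vdash \mathit{strexp}\ \mathtt{:>}\ \mathit{sigexp} \Rightarrow E \downarrow I} \tag{153}$$

$$\frac{B(\mathit{funid}) = (\mathit{strid} : I, \mathit{strexp}', B') \qquad B \vdash \mathit{strexp} \Rightarrow E \qquad B' + \{\mathit{strid} \mapsto E \downarrow I\} \vdash \mathit{strexp}' \Rightarrow E'}{B \vdash \mathit{funid}\ \mathtt{(}\ \mathit{strexp}\ \mathtt{)} \Rightarrow E'} \tag{154}$$

$$\frac{B \vdash \mathit{strdec} \Rightarrow E \qquad B + E \vdash \mathit{strexp} \Rightarrow E'}{B \vdash \mathtt{let}\ \mathit{strdec}\ \mathtt{in}\ \mathit{strexp}\ \mathtt{end} \Rightarrow E'} \tag{155}$$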

Comments:

(154) Before the evaluation of the functor body strexp', the actual argument E is cut down by the formal parameter interface I, so that any opening of strid resulting from the evaluation of strexp' will produce no more components than anticipated during the static elaboration.
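A functor application in which the actual argument has more components than the formal parameter asks for shows the effect of rule 154. The names `ORD`, `MaxFn`, `IntOrd` and `IntMax` are invented for this sketch.

```sml
signature ORD =
sig
  type t
  val le : t * t -> bool
end

functor MaxFn (X : ORD) =
struct
  fun max (a, b) = if X.le (a, b) then b else a
end

structure IntOrd =
struct
  type t = int
  val le : int * int -> bool = op <=
  val extra = "dropped when the argument is cut down to ORD"
end

(* The body of MaxFn is evaluated with X bound to IntOrd cut down by the
   interface of ORD, so X.extra does not exist inside the functor body. *)
structure IntMax = MaxFn (IntOrd)
val largest = IntMax.max (3, 7)
```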

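Structure-level Declarations    $B \vdash \mathit{strdec} \Rightarrow E/p$

$$\frac{E\ \mathrm{of}\ B \vdash \mathit{dec} \Rightarrow E'}{B \vdash \mathit{dec} \Rightarrow E'} \tag{156}$$

$$\frac{B \vdash \mathit{strbind} \Rightarrow SE}{B \vdash \mathtt{structure}\ \mathit{strbind} \Rightarrow SE\ \mathrm{in\ Env}} \tag{157}$$

$$\frac{B \vdash \mathit{strdec}_1 \Rightarrow E_1 \qquad B + E_1 \vdash \mathit{strdec}_2 \Rightarrow E_2}{B \vdash \mathtt{local}\ \mathit{strdec}_1\ \mathtt{in}\ \mathit{strdec}_2\ \mathtt{end} \Rightarrow E_2} \tag{158}$$

$$\frac{}{B \vdash\ \ \Rightarrow \{\}\ \mathrm{in\ Env}} \tag{159}$$

$$\frac{B \vdash \mathit{strdec}_1 \Rightarrow E_1 \qquad B + E_1 \vdash \mathit{strdec}_2 \Rightarrow E_2}{B \vdash \mathit{strdec}_1\ \langle \mathtt{;} \rangle\ \mathit{strdec}_2 \Rightarrow E_1 + E_2} \tag{160}$$

Structure Bindings    $B \vdash \mathit{strbind} \Rightarrow SE/p$

$$\frac{B \vdash \mathit{strexp} \Rightarrow E \qquad \langle B \vdash \mathit{strbind} \Rightarrow SE \rangle}{B \vdash \mathit{strid}\ \mathtt{=}\ \mathit{strexp}\ \langle \mathtt{and}\ \mathit{strbind} \rangle \Rightarrow \{\mathit{strid} \mapsto E\}\ \langle + SE \rangle} \tag{161}$$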

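Signature Expressions    $IB \vdash \mathit{sigexp} \Rightarrow I$

$$\frac{IB \vdash \mathit{spec} \Rightarrow I}{IB \vdash \mathtt{sig}\ \mathit{spec}\ \mathtt{end} \Rightarrow I} \tag{162}$$

$$\frac{IB(\mathit{sigid}) = I}{IB \vdash \mathit{sigid} \Rightarrow I} \tag{163}$$

Signature Declarations    $IB \vdash \mathit{sigdec} \Rightarrow G$

$$\frac{IB \vdash \mathit{sigbind} \Rightarrow G}{IB \vdash \mathtt{signature}\ \mathit{sigbind} \Rightarrow G} \tag{164}$$

Signature Bindings    $IB \vdash \mathit{sigbind} \Rightarrow G$

$$\frac{IB \vdash \mathit{sigexp} \Rightarrow I \qquad \langle IB \vdash \mathit{sigbind} \Rightarrow G \rangle}{IB \vdash \mathit{sigid}\ \mathtt{=}\ \mathit{sigexp}\ \langle \mathtt{and}\ \mathit{sigbind} \rangle \Rightarrow \{\mathit{sigid} \mapsto I\}\ \langle + G \rangle} \tag{165}$$

Specifications    $IB \vdash \mathit{spec} \Rightarrow I$

$$\frac{\vdash \mathit{valdesc} \Rightarrow VI}{IB \vdash \mathtt{val}\ \mathit{valdesc} \Rightarrow VI\ \mathrm{in\ Int}} \tag{166}$$

$$\frac{\vdash \mathit{typdesc} \Rightarrow TI}{IB \vdash \mathtt{type}\ \mathit{typdesc} \Rightarrow TI\ \mathrm{in\ Int}} \tag{167}$$

$$\frac{\vdash \mathit{typdesc} \Rightarrow TI}{IB \vdash \mathtt{eqtype}\ \mathit{typdesc} \Rightarrow TI\ \mathrm{in\ Int}} \tag{168}$$

$$\frac{\vdash \mathit{datdesc} \Rightarrow VI, TI}{IB \vdash \mathtt{datatype}\ \mathit{datdesc} \Rightarrow (VI, TI)\ \mathrm{in\ Int}} \tag{169}$$

$$\frac{IB(\mathit{longtycon}) = VI \qquad TI = \{\mathit{tycon} \mapsto VI\}}{IB \vdash \mathtt{datatype}\ \mathit{tyvarseq}\ \mathit{tycon}\ \mathtt{=}\ \mathtt{datatype}\ \mathit{tyvarseq}\ \mathit{longtycon} \Rightarrow (VI, TI)\ \mathrm{in\ Int}} \tag{170}$$

$$\frac{\vdash \mathit{exdesc} \Rightarrow VI}{IB \vdash \mathtt{exception}\ \mathit{exdesc} \Rightarrow VI\ \mathrm{in\ Int}} \tag{171}$$

$$\frac{IB \vdash \mathit{strdesc} \Rightarrow SI}{IB \vdash \mathtt{structure}\ \mathit{strdesc} \Rightarrow SI\ \mathrm{in\ Int}} \tag{172}$$

$$\frac{IB \vdash \mathit{sigexp} \Rightarrow I}{IB \vdash \mathtt{include}\ \mathit{sigexp} \Rightarrow I} \tag{173}$$

$$\frac{}{IB \vdash\ \ \Rightarrow \{\}\ \mathrm{in\ Int}} \tag{174}$$

$$\frac{IB \vdash \mathit{spec}_1 \Rightarrow I_1 \qquad IB + I_1 \vdash \mathit{spec}_2 \Rightarrow I_2}{IB \vdash \mathit{spec}_1\ \langle \mathtt{;} \rangle\ \mathit{spec}_2 \Rightarrow I_1 + I_2} \tag{175}$$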

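Value Descriptions    $\vdash \mathit{valdesc} \Rightarrow VI$

$$\frac{\langle \vdash \mathit{valdesc} \Rightarrow VI \rangle}{\vdash \mathit{vid}\ \langle \mathtt{and}\ \mathit{valdesc} \rangle \Rightarrow \{\mathit{vid} \mapsto \mathsf{v}\}\ \langle + VI \rangle} \tag{176}$$

Type Descriptions    $\vdash \mathit{typdesc} \Rightarrow TI$

$$\frac{\langle \vdash \mathit{typdesc} \Rightarrow TI \rangle}{\vdash \mathit{tyvarseq}\ \mathit{tycon}\ \langle \mathtt{and}\ \mathit{typdesc} \rangle \Rightarrow \{\mathit{tycon} \mapsto \{\}\}\ \langle + TI \rangle} \tag{177}$$

Datatype Descriptions    $\vdash \mathit{datdesc} \Rightarrow VI, TI$

$$\frac{\vdash \mathit{condesc} \Rightarrow VI \qquad \langle \vdash \mathit{datdesc}' \Rightarrow VI', TI' \rangle}{\vdash \mathit{tyvarseq}\ \mathit{tycon}\ \mathtt{=}\ \mathit{condesc}\ \langle \mathtt{and}\ \mathit{datdesc}' \rangle \Rightarrow VI\ \langle + VI' \rangle,\ \{\mathit{tycon} \mapsto VI\}\ \langle + TI' \rangle} \tag{178}$$

Constructor Descriptions    $\vdash \mathit{condesc} \Rightarrow VI$

$$\frac{\langle \vdash \mathit{condesc} \Rightarrow VI \rangle}{\vdash \mathit{vid}\ \langle \mathtt{|}\ \mathit{condesc} \rangle \Rightarrow \{\mathit{vid} \mapsto \mathsf{c}\}\ \langle + VI \rangle} \tag{179}$$

Exception Descriptions    $\vdash \mathit{exdesc} \Rightarrow VI$

$$\frac{\langle \vdash \mathit{exdesc} \Rightarrow VI \rangle}{\vdash \mathit{vid}\ \langle \mathtt{and}\ \mathit{exdesc} \rangle \Rightarrow \{\mathit{vid} \mapsto \mathsf{e}\}\ \langle + VI \rangle} \tag{180}$$

Structure Descriptions    $IB \vdash \mathit{strdesc} \Rightarrow SI$

$$\frac{IB \vdash \mathit{sigexp} \Rightarrow I \qquad \langle IB \vdash \mathit{strdesc} \Rightarrow SI \rangle}{IB \vdash \mathit{strid}\ \mathtt{:}\ \mathit{sigexp}\ \langle \mathtt{and}\ \mathit{strdesc} \rangle \Rightarrow \{\mathit{strid} \mapsto I\}\ \langle + SI \rangle} \tag{181}$$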

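Functor Bindings    $B \vdash \mathit{funbind} \Rightarrow F$

$$\frac{\mathrm{Inter}\ B \vdash \mathit{sigexp} \Rightarrow I \qquad \langle B \vdash \mathit{funbind} \Rightarrow F \rangle}{B \vdash \mathit{funid}\ \mathtt{(}\ \mathit{strid}\ \mathtt{:}\ \mathit{sigexp}\ \mathtt{)}\ \mathtt{=}\ \mathit{strexp}\ \langle \mathtt{and}\ \mathit{funbind} \rangle \Rightarrow \{\mathit{funid} \mapsto (\mathit{strid} : I, \mathit{strexp}, B)\}\ \langle + F \rangle} \tag{182}$$

Functor Declarations    $B \vdash \mathit{fundec} \Rightarrow F$

$$\frac{B \vdash \mathit{funbind} \Rightarrow F}{B \vdash \mathtt{functor}\ \mathit{funbind} \Rightarrow F} \tag{183}$$

Top-level Declarations    $B \vdash \mathit{topdec} \Rightarrow B'/p$

$$\frac{B \vdash \mathit{strdec} \Rightarrow E \qquad B' = E\ \mathrm{in\ Basis} \qquad \langle B + B' \vdash \mathit{topdec} \Rightarrow B'' \rangle}{B \vdash \mathit{strdec}\ \langle \mathit{topdec} \rangle \Rightarrow B'\langle ' \rangle} \tag{184}$$

$$\frac{\mathrm{Inter}\ B \vdash \mathit{sigdec} \Rightarrow G \qquad B' = G\ \mathrm{in\ Basis} \qquad \langle B + B' \vdash \mathit{topdec} \Rightarrow B'' \rangle}{B \vdash \mathit{sigdec}\ \langle \mathit{topdec} \rangle \Rightarrow B'\langle ' \rangle} \tag{185}$$

$$\frac{B \vdash \mathit{fundec} \Rightarrow F \qquad B' = F\ \mathrm{in\ Basis} \qquad \langle B + B' \vdash \mathit{topdec} \Rightarrow B'' \rangle}{B \vdash \mathit{fundec}\ \langle \mathit{topdec} \rangle \Rightarrow B'\langle ' \rangle} \tag{186}$$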

8 Programs

The phrase class Program of programs is defined as follows

    program ::= topdec ; ⟨program⟩

Hitherto, the semantic rules have not exposed the interactive nature of the language. During an ML session the user can type in a phrase, more precisely a phrase of the form topdec as defined in Figure 8, page 14. Upon the following semicolon, the machine will then attempt to parse, elaborate and evaluate the phrase returning either a result or, if any of the phases fail, an error message. The outcome is significant for what the user subsequently types, so we need to answer questions such as: if the elaboration of a toplevel declaration succeeds, but its evaluation fails, then does the result of the elaboration get recorded in the static basis?

In practice, ML implementations may provide a directive as a form of top-level declaration for including programs from files rather than directly from the terminal. In case a file consists of a sequence of top-level declarations (separated by semicolons) and the machine detects an error in one of these, it is probably sensible to abort the execution of the directive. Rather than introducing a distinction between, say, batch programs and interactive programs, we shall tacitly regard all programs as interactive, and leave to implementers to clarify how the inclusion of files, if provided, affects the updating of the static and dynamic basis. Moreover, we shall focus on elaboration and evaluation and leave the handling of parse errors to implementers (since it naturally depends on the kind of parser being employed). Hence, in this section the execution of a program means the combined elaboration and evaluation of the program.

So far, for simplicity, we have used the same notation $B$ to stand for both a static and a dynamic basis, and this has been possible because we have never needed to discuss static and dynamic semantics at the same time. In giving the semantics of programs, however, let us rename as StaticBasis the class Basis defined in the static semantics of modules, Section 5.1, and let us use $B_{\mathrm{STAT}}$ to range over StaticBasis. Similarly, let us rename as DynamicBasis the class Basis defined in the dynamic semantics of modules, Section 7.2, and let us use $B_{\mathrm{DYN}}$ to range over DynamicBasis. We now define

$$B \text{ or } (B_{\mathrm{STAT}}, B_{\mathrm{DYN}}) \in \mathrm{Basis} = \mathrm{StaticBasis} \times \mathrm{DynamicBasis}.$$

Further, we shall use $\vdash_{\mathrm{STAT}}$ for elaboration as defined in Section 5, and $\vdash_{\mathrm{DYN}}$ for evaluation as defined in Section 7. Then $\vdash$ will be reserved for the execution of programs, which thus is expressed by a sentence of the form

$$s, B \vdash \mathit{program} \Rightarrow B', s'$$

This may be read as follows: starting in basis $B$ with state $s$ the execution of program results in a basis $B'$ and a state $s'$.

It must be understood that executing a program never results in an exception. If the evaluation of a topdec yields an exception (for instance because of a raise expression) then the result of executing the program "topdec ;" is the original basis together with the state which is in force when the exception is generated. In particular, the exception convention of Section 6.7 is not applicable to the ensuing rules.

We represent the non-elaboration of a top-level declaration by $\ldots \vdash_{\mathrm{STAT}} \mathit{topdec} \not\Rightarrow$. (This covers also the case in which a user interrupts the elaboration.)

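Programs    $s, B \vdash \mathit{program} \Rightarrow B', s'$

$$\frac{B_{\mathrm{STAT}}\ \mathrm{of}\ B \vdash_{\mathrm{STAT}} \mathit{topdec} \not\Rightarrow \qquad \langle s, B \vdash \mathit{program} \Rightarrow B', s' \rangle}{s, B \vdash \mathit{topdec}\ \mathtt{;}\ \langle \mathit{program} \rangle \Rightarrow B\langle ' \rangle,\ s\langle ' \rangle} \tag{187}$$

$$\frac{B_{\mathrm{STAT}}\ \mathrm{of}\ B \vdash_{\mathrm{STAT}} \mathit{topdec} \Rightarrow B^{(1)}_{\mathrm{STAT}} \qquad s, B_{\mathrm{DYN}}\ \mathrm{of}\ B \vdash_{\mathrm{DYN}} \mathit{topdec} \Rightarrow p, s' \qquad \langle s', B \vdash \mathit{program} \Rightarrow B', s'' \rangle}{s, B \vdash \mathit{topdec}\ \mathtt{;}\ \langle \mathit{program} \rangle \Rightarrow B\langle ' \rangle,\ s'\langle ' \rangle} \tag{188}$$

$$\frac{B_{\mathrm{STAT}}\ \mathrm{of}\ B \vdash_{\mathrm{STAT}} \mathit{topdec} \Rightarrow B^{(1)}_{\mathrm{STAT}} \qquad s, B_{\mathrm{DYN}}\ \mathrm{of}\ B \vdash_{\mathrm{DYN}} \mathit{topdec} \Rightarrow B^{(1)}_{\mathrm{DYN}}, s' \qquad B' = B \oplus (B^{(1)}_{\mathrm{STAT}}, B^{(1)}_{\mathrm{DYN}}) \qquad \langle s', B' \vdash \mathit{program} \Rightarrow B'', s'' \rangle}{s, B \vdash \mathit{topdec}\ \mathtt{;}\ \langle \mathit{program} \rangle \Rightarrow B'\langle ' \rangle,\ s'\langle ' \rangle} \tag{189}$$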

Comments:

(187) A failing elaboration has no effect whatever.

(188) An evaluation which yields an exception nullifies the change in the static basis, but does not nullify side-effects on the state which may have occurred before the exception was raised.
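The interplay of rules 188 and 189 can be mimicked by a toy interpreter loop. This is only a hypothetical model, not part of the Definition: a "basis" is represented as an association list, a "topdec" as a function from the current basis to its new bindings, and the state is ordinary ML state. Rule 187 concerns static elaboration and is not modelled.

```sml
(* Hypothetical model types; none of these names occur in the Definition. *)
type basis = (string * int) list
type topdec = basis -> basis

fun exec (b : basis, [] : topdec list) : basis = b
  | exec (b, topdec :: rest) =
      let
        (* Rule 189: on success the new bindings extend the basis.
           Rule 188: on an exception the basis is left unchanged. *)
        val b' = (topdec b @ b) handle _ => b
      in
        exec (b', rest)
      end

val state = ref 0

val program : topdec list =
  [ fn _ => [("x", 1)],
    fn _ => (state := !state + 1; raise Fail "boom"),  (* side effect, then exception *)
    fn b => [("y", length b)] ]

val final = exec ([], program)
```

After the run, the second "topdec" has contributed no bindings, yet its assignment to `state` persists, which is the behaviour described in the comment on rule 188.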

Core language Programs A program is called a core language program if it can be parsed in the reduced grammar defined as follows:

1. Replace the definition of top-level declarations by

topdec ::= strdec

2. Replace the definition of structure-level declarations by

strdec ::= dec

A Appendix: Derived Forms

Several derived grammatical forms are provided in the Core; they are presented in Figures 15, 16 and 17. Each derived form is given with its equivalent form. Thus, each row of the tables should be considered as a rewriting rule

    Derived form ⟹ Equivalent form

and these rules may be applied repeatedly to a phrase until it is transformed into a phrase of the bare language. See Appendix B for the full Core grammar, including all the derived forms.

In the derived forms for tuples, in terms of records, we use n to mean the ML numeral which stands for the natural number n.
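A few of the expression rewritings can be checked by writing a derived form next to a hand-made version of its equivalent form. The sketch below uses invented names; in `sumRec` the identifier `loop` plays the part of the new identifier vid.

```sml
(* Tuples are records whose labels are the numerals 1, 2, ... *)
val tupleIsRecord = ((1, "a") = {1 = 1, 2 = "a"})

(* #lab is a function selecting a field; on a pair, #2 selects the second. *)
val second = #2 (1, "a")

(* if rewrites to case, and case rewrites to the application of a fn. *)
fun viaIf b = if b then "yes" else "no"
fun viaFn b = (fn true => "yes" | false => "no") b

(* while, as a derived form ... *)
fun sumWhile n =
  let
    val i = ref 0
    val acc = ref 0
  in
    while !i < n do (i := !i + 1; acc := !acc + !i);
    !acc
  end

(* ... and its equivalent form, a recursive function of unit. *)
fun sumRec n =
  let
    val i = ref 0
    val acc = ref 0
    val rec loop = fn () =>
      if !i < n then ((i := !i + 1; acc := !acc + !i); loop ()) else ()
  in
    loop ();
    !acc
  end
```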

Note that a new phrase class FvalBind of function-value bindings is introduced, accompanied by a new declaration form fun tyvarseq fvalbind . The mixed forms val tyvarseq rec fvalbind , val tyvarseq fvalbind and fun tyvarseq valbind are not allowed - though the first form arises during translation into the bare language.

The following notes refer to Figure 17:

• There is a version of the derived form for function-value binding which allows the function identifier to be infixed; see Figure 21 in Appendix B.

• In the two forms involving withtype, the identifiers bound by datbind and by typbind must be distinct. Then the transformed binding datbind' in the equivalent form is obtained from datbind by expanding out all the definitions made by typbind. More precisely, if typbind is

$$\mathit{tyvarseq}_1\ \mathit{tycon}_1\ \mathtt{=}\ \mathit{ty}_1\ \ \mathtt{and}\ \cdots\ \mathtt{and}\ \ \mathit{tyvarseq}_n\ \mathit{tycon}_n\ \mathtt{=}\ \mathit{ty}_n$$

then datbind' is the result of simultaneous replacement (in datbind) of every type expression $\mathit{tyseq}_i\ \mathit{tycon}_i$ ($1 \leq i \leq n$) by the corresponding defining expression

$$\mathit{ty}_i\{\mathit{tyseq}_i/\mathit{tyvarseq}_i\}$$
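For the withtype note, a sketch with hypothetical types `tree` and `forest` shows the derived form beside a hand-expanded version. The primed names exist only so that both versions can live in one program.

```sml
(* Derived form: forest abbreviates tree list inside the datatype. *)
datatype tree = Node of int * forest
withtype forest = tree list

(* Equivalent form: the abbreviation is expanded in the datatype binding,
   and the type binding is then declared separately. *)
datatype tree' = Node' of int * tree' list
type forest' = tree' list

fun size (Node (_, kids)) = 1 + foldl (fn (k, n) => size k + n) 0 kids
fun size' (Node' (_, kids)) = 1 + foldl (fn (k, n) => size' k + n) 0 kids

val t = Node (1, [Node (2, []), Node (3, [])])
val t' = Node' (1, [Node' (2, []), Node' (3, [])])
```

The top-level `foldl` is assumed from a Basis Library; it is not part of the minimal initial basis defined in this document.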

Figure 18 shows derived forms for functors. They allow functors to take, say, a single type or value as a parameter, in cases where it would seem clumsy to "wrap up" the argument as a structure expression.
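As a sketch of these functor forms, the following program defines a functor whose parameter is a bare specification and applies it to a bare declaration, then repeats both by hand in the equivalent form. The names `Pair`, `Pair'` and `Arg` are invented; `Arg` stands in for the new structure identifier.

```sml
(* Derived form: the parameter is a specification rather than strid : sigexp. *)
functor Pair (type t  val x : t) =
struct
  val pair = (x, x)
end

(* Derived form: the argument is a strdec, implicitly wrapped in struct ... end. *)
structure IntPair = Pair (type t = int  val x = 3)

(* Equivalent forms, written out by hand. *)
functor Pair' (Arg : sig type t  val x : t end) =
  let open Arg in struct val pair = (x, x) end end

structure IntPair' = Pair' (struct type t = int  val x = 3 end)
```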

Derived Form                          Equivalent Form

Expressions exp

()                                    { }
(exp1 , ··· , expn)                   {1=exp1, ···, n=expn}                     (n ≥ 2)
# lab                                 fn {lab=vid,...} => vid                   (vid new)
case exp of match                     (fn match)(exp)
if exp1 then exp2 else exp3           case exp1 of true => exp2
                                                 | false => exp3
exp1 orelse exp2                      if exp1 then true else exp2
exp1 andalso exp2                     if exp1 then exp2 else false
(exp1 ; ··· ; expn ; exp)             case exp1 of (_) =>                       (n ≥ 1)
                                        ···
                                        case expn of (_) => exp
let dec in                            let dec in                                (n ≥ 2)
  exp1 ; ··· ; expn end                 (exp1 ; ··· ; expn) end
while exp1 do exp2                    let val rec vid = fn () =>                (vid new)
                                        if exp1 then (exp2;vid()) else ()
                                        in vid() end
[exp1 , ··· , expn]                   exp1 :: ··· :: expn :: nil                (n ≥ 0)

Figure 15: Derived forms of Expressions

Finally, Figure 19 shows the derived forms for specifications. The last derived form for specifications allows sharing between structure identifiers as a shorthand for type sharing specifications. The phrase

    spec sharing longstrid1 = ··· = longstridk

is a derived form whose equivalent form is

    spec
      sharing type longtycon1 = longtycon'1
      ···
      sharing type longtyconm = longtycon'm

determined as follows. First, note that spec specifies a set of (possibly long) type constructors and structure identifiers, either directly or via signature identifiers and include specifications. Then the equivalent form contains all type-sharing constraints of the form

    sharing type longstridi.longtycon = longstridj.longtycon

(1 ≤ i < j ≤ k), such that both sides of the equation are long type constructors specified by spec.

The meaning of the derived form does not depend on the order of the type-sharing constraints in the equivalent form.
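To make the shorthand concrete, here is a small signature that uses structure sharing, together with a functor that depends on the resulting type equation. The names `LINKED`, `Apply` and `Inst` are hypothetical, and `String.size` is assumed from a Basis Library.

```sml
signature LINKED =
sig
  structure A : sig type t  val x : t end
  structure B : sig type t  val f : t -> int end
  sharing A = B            (* shorthand for: sharing type A.t = B.t *)
end

functor Apply (M : LINKED) =
struct
  (* Well-typed only because A.t and B.t are known to be the same type. *)
  val result = M.B.f M.A.x
end

structure Inst = Apply (struct
  structure A = struct type t = string  val x = "four" end
  structure B = struct type t = string  val f = String.size end
end)
```

Here spec specifies exactly one type constructor common to both structures, `t`, so the equivalent form contains the single constraint `sharing type A.t = B.t`.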

Derived Form                          Equivalent Form

Patterns pat

()                                    { }
(pat1 , ··· , patn)                   {1=pat1, ··· , n=patn}                    (n ≥ 2)
[pat1 , ··· , patn]                   pat1 :: ··· :: patn :: nil                (n ≥ 0)

Pattern Rows patrow

vid⟨:ty⟩ ⟨as pat⟩ ⟨, patrow⟩          vid = vid⟨:ty⟩ ⟨as pat⟩ ⟨, patrow⟩

Type Expressions ty

ty1 * ··· * tyn                       {1:ty1, ··· , n:tyn}                      (n ≥ 2)

Figure 16: Derived forms of Patterns and Type Expressions

Function-value Bindings fvalbind

Derived form:

      ⟨op⟩vid atpat11 ··· atpat1n ⟨:ty⟩ = exp1
    | ⟨op⟩vid atpat21 ··· atpat2n ⟨:ty⟩ = exp2
    | ···
    | ⟨op⟩vid atpatm1 ··· atpatmn ⟨:ty⟩ = expm
      ⟨and fvalbind⟩

Equivalent form:

      ⟨op⟩vid = fn vid1 => ··· fn vidn =>
                case (vid1, ··· , vidn) of
                  (atpat11, ··· , atpat1n) => exp1⟨:ty⟩
                | (atpat21, ··· , atpat2n) => exp2⟨:ty⟩
                | ···
                | (atpatm1, ··· , atpatmn) => expm⟨:ty⟩
      ⟨and fvalbind⟩

(m, n ≥ 1; vid1, ··· , vidn distinct and new)

Derived Form                          Equivalent Form

Declarations dec

fun tyvarseq fvalbind                 val tyvarseq rec fvalbind
datatype datbind withtype typbind     datatype datbind' ; type typbind
abstype datbind withtype typbind      abstype datbind'
  with dec end                          with type typbind ; dec end

(see note in text concerning datbind')

Figure 17: Derived forms of Function-value Bindings and Declarations
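The rewriting of function-value bindings in Figure 17 can be followed on a two-argument example. In this sketch `zip` is written in the derived form and `zip'` is its expansion by hand; `a` and `b` stand in for the new identifiers vid1 and vid2.

```sml
(* Derived form: a clausal, curried function definition. *)
fun zip (x :: xs) (y :: ys) = (x, y) :: zip xs ys
  | zip _ _ = []

(* Equivalent form: one fn per argument, then a case on the tuple of arguments. *)
val rec zip' = fn a => fn b =>
  case (a, b) of
    (x :: xs, y :: ys) => (x, y) :: zip' xs ys
  | (_, _) => []
```

The combination `val rec` followed by `fn` is exactly the mixed form that arises during translation into the bare language.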

Derived Form                              Equivalent Form

Structure Bindings strbind

strid : sigexp = strexp ⟨and strbind⟩     strid = strexp : sigexp ⟨and strbind⟩
strid :> sigexp = strexp ⟨and strbind⟩    strid = strexp :> sigexp ⟨and strbind⟩

Structure Expressions strexp

funid ( strdec )                          funid ( struct strdec end )

Functor Bindings funbind

funid (strid : sigexp) : sigexp' =        funid (strid : sigexp) =
  strexp ⟨and funbind⟩                      strexp : sigexp' ⟨and funbind⟩
funid (strid : sigexp) :> sigexp' =       funid (strid : sigexp) =
  strexp ⟨and funbind⟩                      strexp :> sigexp' ⟨and funbind⟩
funid ( spec ) ⟨: sigexp⟩ =               funid ( stridν : sig spec end ) =
  strexp ⟨and funbind⟩                      let open stridν in strexp⟨: sigexp⟩
                                            end ⟨and funbind⟩
funid ( spec ) ⟨:> sigexp⟩ =              funid ( stridν : sig spec end ) =
  strexp ⟨and funbind⟩                      let open stridν in strexp⟨:> sigexp⟩
                                            end ⟨and funbind⟩

(stridν new)

Programs program

exp ; ⟨program⟩                           val it = exp ; ⟨program⟩

Figure 18: Derived forms of Functors, Structure Bindings and Programs

Derived Form                              Equivalent Form

Specifications spec

type tyvarseq tycon = ty                  include
                                            sig type tyvarseq tycon end
                                            where type tyvarseq tycon = ty
type tyvarseq1 tycon1 = ty1               type tyvarseq1 tycon1 = ty1
  and ···                                 type ···
  ···                                     ···
  and tyvarseqn tyconn = tyn              type tyvarseqn tyconn = tyn
include sigid1 ··· sigidn      (n ≥ 2)    include sigid1 ; ··· ; include sigidn
spec sharing longstrid1 = ···             spec
  = longstridk                              sharing type longtycon1 = longtycon'1
                                            ···
                                            sharing type longtyconm = longtycon'm

(see note in text concerning longtycon1, ..., longtycon'm)

Figure 19: Derived forms of Specifications and Signature Expressions

B Appendix: Full Grammar

The full grammar of programs is exactly as given at the start of Section 8.

The full grammar of Modules consists of the grammar of Figures 5-8 in Section 3, together with the derived forms of Figures 18 and 19 in Appendix A.

The remainder of this Appendix is devoted to the full grammar of the Core. Roughly, it consists of the grammar of Section 2 augmented by the derived forms of Appendix A. But there is a further difference: two additional subclasses of the phrase class Exp are introduced, namely AppExp (application expressions) and InfExp (infix expressions). The inclusion relation among the four classes is as follows:

$$\mathrm{AtExp} \subset \mathrm{AppExp} \subset \mathrm{InfExp} \subset \mathrm{Exp}$$

The effect is that certain phrases, such as "2 + while ··· do ···", are now disallowed.

The grammatical rules are displayed in Figures 20, 21, 22 and 23. The grammatical conventions are exactly as in Section 2, namely:

• The brackets ⟨ ⟩ enclose optional phrases.

• For any syntax class X (over which x ranges) we define the syntax class Xseq (over which xseq ranges) as follows:

    xseq ::= x                  (singleton sequence)
                                (empty sequence)
             (x1,···,xn)        (sequence, n ≥ 1)

  (Note that the "···" used here, a meta-symbol indicating syntactic repetition, must not be confused with "..." which is a reserved word of the language.)

• Alternative forms for each phrase class are in order of decreasing precedence. This precedence resolves ambiguity in parsing in the following way. Suppose that a phrase class - we take exp as an example - has two alternative forms F1 and F2, such that F1 ends with an exp and F2 starts with an exp. A specific case is

    F1: if exp1 then exp2 else exp3
    F2: exp handle match

  It will be enough to see how ambiguity is resolved in this specific case. Suppose that the lexical sequence

    ··· ··· if ··· then ··· else exp handle ··· ···

  is to be parsed, where exp stands for a lexical sequence which is already determined as a subphrase (if necessary by applying the precedence rule). Then the higher precedence of F2 (in this case) dictates that exp associates to the right, i.e. that the correct parse takes the form

    ··· ··· if ··· then ··· else (exp handle ···) ···

  not the form

    ··· (··· if ··· then ··· else exp) handle ··· ···

  Note particularly that the use of precedence does not decrease the class of admissible phrases; it merely rejects alternative ways of parsing certain phrases. In particular, the purpose is not to prevent a phrase, which is an instance of a form with higher precedence, having a constituent which is an instance of a form with lower precedence. Thus for example

    if ··· then while ··· do ··· else while ··· do ···

  is quite admissible, and will be parsed as

    if ··· then (while ··· do ···) else (while ··· do ···)

• L (resp. R) means left (resp. right) association.

• The syntax of types binds more tightly than that of expressions.

• Each iterated construct (e.g. match, ···) extends as far right as possible; thus, parentheses may be needed around an expression which terminates with a match, e.g. "fn match", if this occurs within a larger match.
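The effect of these conventions on an actual program can be seen in the following sketch; the exception `Oops` and the function names are invented for the illustration.

```sml
exception Oops

(* Parsed as: if c then raise Oops else (0 handle Oops => 1).
   The handler therefore does not cover the then-branch. *)
fun inner c = if c then raise Oops else 0 handle Oops => 1

(* Parentheses make the handler cover the whole conditional. *)
fun outer c = (if c then raise Oops else 0) handle Oops => 1

(* A match extends as far right as possible, so a fn expression that sits
   inside a larger match is wrapped in parentheses. *)
fun pick n =
  case n of
    0 => (fn 0 => "zero-zero" | _ => "zero-other")
  | _ => (fn _ => "other")
```

Without the parentheses in `pick`, the rule beginning `| _ =>` would be absorbed into the match of the first `fn` instead of belonging to the `case`.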

atexp  ::= scon                                 special constant
           ⟨op⟩longvid                          value identifier
           { ⟨exprow⟩ }                         record
           # lab                                record selector
           ()                                   0-tuple
           (exp1 , ··· , expn)                  n-tuple, n ≥ 2
           [exp1 , ··· , expn]                  list, n ≥ 0
           (exp1 ; ··· ; expn)                  sequence, n ≥ 2
           let dec in exp1 ; ··· ; expn end     local declaration, n ≥ 1
           ( exp )

exprow ::= lab = exp ⟨ , exprow⟩                expression row

appexp ::= atexp
           appexp atexp                         application expression

infexp ::= appexp
           infexp1 vid infexp2                  infix expression

exp    ::= infexp
           exp : ty                             typed (L)
           exp1 andalso exp2                    conjunction
           exp1 orelse exp2                     disjunction
           exp handle match                     handle exception
           raise exp                            raise exception
           if exp1 then exp2 else exp3          conditional
           while exp1 do exp2                   iteration
           case exp of match                    case analysis
           fn match                             function

match  ::= mrule ⟨ | match⟩

mrule  ::= pat => exp

Figure 20: Grammar: Expressions and Matches

dec      ::= val tyvarseq valbind                       value declaration
             fun tyvarseq fvalbind                      function declaration
             type typbind                               type declaration
             datatype datbind ⟨withtype typbind⟩        datatype declaration
             datatype tyvarseq tycon =
               datatype tyvarseq longtycon              datatype replication
             abstype datbind ⟨withtype typbind⟩
               with dec end                             abstype declaration
             exception exbind                           exception declaration
             local dec1 in dec2 end                     local declaration
             open longstrid1 ··· longstridn             open declaration, n ≥ 1
                                                        empty declaration
             dec1 ⟨;⟩ dec2                              sequential declaration
             infix ⟨d⟩ vid1 ··· vidn                    infix (L) directive, n ≥ 1
             infixr ⟨d⟩ vid1 ··· vidn                   infix (R) directive, n ≥ 1
             nonfix vid1 ··· vidn                       nonfix directive, n ≥ 1

valbind  ::= pat = exp ⟨and valbind⟩
             rec valbind

fvalbind ::=   ⟨op⟩vid atpat11 ··· atpat1n ⟨:ty⟩ = exp1     m, n ≥ 1
             | ⟨op⟩vid atpat21 ··· atpat2n ⟨:ty⟩ = exp2     See also note below
             | ···
             | ⟨op⟩vid atpatm1 ··· atpatmn ⟨:ty⟩ = expm
               ⟨and fvalbind⟩

typbind  ::= tyvarseq tycon = ty ⟨and typbind⟩

datbind  ::= tyvarseq tycon = conbind ⟨and datbind⟩

conbind  ::= ⟨op⟩vid ⟨of ty⟩ ⟨ | conbind⟩

exbind   ::= ⟨op⟩vid ⟨of ty⟩ ⟨and exbind⟩
             ⟨op⟩vid = ⟨op⟩longvid ⟨and exbind⟩

Note: In the fvalbind form, if vid has infix status then either op must be present, or vid must be infixed. Thus, at the start of any clause, "op vid (atpat,atpat') ···" may be written "(atpat vid atpat') ···"; the parentheses may also be dropped if ":ty" or "=" follows immediately.

Figure 21: Grammar: Declarations and Bindings
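The note on infixed function identifiers covers three ways of writing a clause, sketched below with made-up operators `|+|`, `|-|` and `between`.

```sml
infix 6 |+| |-|
infix 5 between

(* Infixed clause; the parentheses are dropped because "=" follows immediately. *)
fun (a, b) |+| (c, d) = (a + c, b + d)

(* The same kind of identifier written in prefix position with op. *)
fun op |-| ((a, b), (c, d)) = (a - c, b - d)

(* Parentheses are kept when further argument patterns follow. *)
fun (lo between hi) x = lo <= x andalso x <= hi
```

With no type annotations present, the overloaded operators `+`, `-` and `<=` take their default type, so all three functions work on `int`.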

atpat  ::= _                                    wildcard
           scon                                 special constant
           ⟨op⟩longvid                          value identifier
           { ⟨patrow⟩ }                         record
           ()                                   0-tuple
           (pat1 , ··· , patn)                  n-tuple, n ≥ 2
           [pat1 , ··· , patn]                  list, n ≥ 0
           ( pat )

patrow ::= ...                                  wildcard
           lab = pat ⟨ , patrow⟩                pattern row
           vid⟨:ty⟩ ⟨as pat⟩ ⟨, patrow⟩         label as variable

pat    ::= atpat                                atomic
           ⟨op⟩longvid atpat                    constructed value
           pat1 vid pat2                        constructed value (infix)
           pat : ty                             typed
           ⟨op⟩vid⟨: ty⟩ as pat                 layered

Figure 22: Grammar: Patterns

ty     ::= tyvar                                type variable
           { ⟨tyrow⟩ }                          record type expression
           tyseq longtycon                      type construction
           ty1 * ··· * tyn                      tuple type, n ≥ 2
           ty -> ty'                            function type expression (R)
           ( ty )

tyrow  ::= lab : ty ⟨ , tyrow⟩                  type-expression row

Figure 23: Grammar: Type expressions

C Appendix: The Initial Static Basis

In this appendix (and the next) we define a minimal initial basis for execution. Richer bases may be provided by libraries. We shall indicate components of the initial basis by the subscript 0. The initial static basis is $B_0 = T_0, F_0, G_0, E_0$, where $F_0 = \{\}$, $G_0 = \{\}$ and

$$T_0 = \{\mathtt{bool}, \mathtt{int}, \mathtt{real}, \mathtt{string}, \mathtt{char}, \mathtt{word}, \mathtt{list}, \mathtt{ref}, \mathtt{exn}\}$$

The members of $T_0$ are type names, not type constructors; for convenience we have used type-constructor identifiers to stand also for the type names which are bound to them in the initial static type environment $TE_0$. Of these type names, list and ref have arity 1, the rest have arity 0; all except exn admit equality. Finally, $E_0 = (SE_0, TE_0, VE_0)$, where $SE_0 = \{\}$, while $TE_0$ and $VE_0$ are shown in Figures 24 and 25, respectively.

tycon  ↦ ( θ, {vid1 ↦ (σ1, is1), ..., vidn ↦ (σn, isn)} )    (n ≥ 0)

unit   ↦ ( Λ().{ }, { } )
bool   ↦ ( bool, {true ↦ (bool, c), false ↦ (bool, c)} )
int    ↦ ( int, { } )
word   ↦ ( word, { } )
real   ↦ ( real, { } )
string ↦ ( string, { } )
char   ↦ ( char, { } )
list   ↦ ( list, {nil ↦ (∀'a . 'a list, c),
                  :: ↦ (∀'a . 'a * 'a list → 'a list, c)} )
ref    ↦ ( ref, {ref ↦ (∀'a . 'a → 'a ref, c)} )
exn    ↦ ( exn, { } )

Figure 24: Static TE0

NONFIX                                INFIX
vid ↦ (σ, is)                         vid ↦ (σ, is)

ref   ↦ (∀'a . 'a → 'a ref, c)        Precedence 5, right associative:
nil   ↦ (∀'a . 'a list, c)              :: ↦ (∀'a . 'a * 'a list → 'a list, c)
true  ↦ (bool, c)                     Precedence 4, left associative:
false ↦ (bool, c)                       =  ↦ (∀''a . ''a * ''a → bool, v)
Match ↦ (exn, e)                      Precedence 3, left associative:
Bind  ↦ (exn, e)                        := ↦ (∀'a . 'a ref * 'a → unit, v)

Note: In type schemes we have taken the liberty of writing ty1 * ty2 in place of {1 ↦ ty1, 2 ↦ ty2}.

Figure 25: Static VE0
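The fixity information in Figure 25 determines how unparenthesised expressions are read. A brief sketch, using only identifiers of the initial basis plus invented variable names:

```sml
(* :: has precedence 5 and associates to the right; = has precedence 4,
   so this is read as (1 :: (2 :: nil)) = [1, 2]. *)
val sameList = 1 :: 2 :: nil = [1, 2]

(* := has precedence 3, lower than =, so this is read as flag := (1 = 1). *)
val flag = ref false
val () = flag := 1 = 1

(* = has an equality type scheme, so it applies to bool lists but not to
   values of function type. *)
val sameBools = op = ([true], [true])
```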

D Appendix: The Initial Dynamic Basis

We shall indicate components of the initial basis by the subscript 0. The initial dynamic basis is $B_0 = F_0, G_0, E_0$, where $F_0 = \{\}$, $G_0 = \{\}$ and $E_0 = (SE_0, TE_0, VE_0)$, where $SE_0 = \{\}$, $TE_0$ is shown in Figure 26 and

VE0 = { = ↦ (=, v), := ↦ (:=, v), Match ↦ (Match, e), Bind ↦ (Bind, e),
        true ↦ (true, c), false ↦ (false, c), nil ↦ (nil, c), :: ↦ (::, c), ref ↦ (ref, c) }.

tycon  ↦ {vid1 ↦ v1, ..., vidn ↦ vn}    (n ≥ 0)

unit   ↦ { }
bool   ↦ {true ↦ true, false ↦ false}
int    ↦ { }
word   ↦ { }
real   ↦ { }
string ↦ { }
char   ↦ { }
list   ↦ {nil ↦ nil, :: ↦ ::}
ref    ↦ {ref ↦ ref}
exn    ↦ { }

Figure 26: Dynamic TE0

E Overloading

Two forms of overloading are available:

• Certain special constants are overloaded. For example, 0w5 may have type word or some other type, depending on the surrounding program text;

• Certain operators are overloaded. For example, + may have type int * int → int or real * real → real, depending on the surrounding program text;

Programmers cannot define their own overloaded constants or operators.

Although a formal treatment of overloading is outside the scope of this document, we do give a complete list of the overloaded operators and of types with overloaded special constants. This list is consistent with the Basis Library[17].

Every overloaded constant and value identifier has among its types a default type, which is ascribed to it when the surrounding text does not resolve the overloading. For this purpose, the surrounding text is no larger than the smallest enclosing structure-level declaration; an implementation may require that a smaller context determines the type.

E.1 Overloaded special constants

Libraries may extend the set T0 of Appendix C with additional type names. Thereafter, certain subsets of T0 have a special significance; they are called overloading classes and they are:

    Int     ⊇ {int}
    Real    ⊇ {real}
    Word    ⊇ {word}
    String  ⊇ {string}
    Char    ⊇ {char}
    WordInt = Word ∪ Int
    RealInt = Real ∪ Int
    Num     = Word ∪ Real ∪ Int
    NumTxt  = Word ∪ Real ∪ Int ∪ String ∪ Char

Among these, the first five (Int, Real, Word, String and Char) are said to be basic; the remaining are said to be composite. The reason that the basic classes are specified using ⊇ rather than = is that libraries may extend each of the basic overloading classes with further type names. Special constants are overloaded within each of the basic overloading classes. However, the basic overloading classes must be arranged so that every special constant can be ascribed types from at most one of the basic overloading classes. For example, 0w5 may be ascribed type word, or some other member of Word, depending on the surrounding text. If the surrounding text does not determine the type of the constant, a default type is used. The default types for the five sets are int, real, word, string and char respectively.
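As an illustration of default types for special constants, consider the following sketch. It assumes a library that adds Word8.word to the class Word, as the Basis Library is understood to do; the value names are hypothetical.

```sml
(* With no constraining context, each constant receives its default type *)
val i = 5            (* int,    the default for Int    *)
val x = 5.0          (* real,   the default for Real   *)
val w = 0w5          (* word,   the default for Word   *)
val s = "five"       (* string, the default for String *)
val c = #"5"         (* char,   the default for Char   *)

(* A type constraint in the surrounding text resolves the overloading.
   Assumption: the library provides Word8 and places Word8.word in Word. *)
val b : Word8.word = 0w255
```

Note that 0w5 can only be resolved within Word and 5 only within Int, since each special constant belongs to at most one basic class.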


    NONFIX                          INFIX
    var ↦ set of monotypes          var ↦ set of monotypes
    abs ↦ realint → realint         Precedence 7, left associative:
    ~   ↦ realint → realint           div ↦ wordint * wordint → wordint
                                      mod ↦ wordint * wordint → wordint
                                      *   ↦ num * num → num
                                      /   ↦ Real * Real → Real
                                    Precedence 6, left associative:
                                      +   ↦ num * num → num
                                      -   ↦ num * num → num
                                    Precedence 4, left associative:
                                      <   ↦ numtxt * numtxt → bool
                                      >   ↦ numtxt * numtxt → bool
                                      <=  ↦ numtxt * numtxt → bool
                                      >=  ↦ numtxt * numtxt → bool

Figure 27: Overloaded identifiers

Once overloading resolution has determined the type of a special constant, it is a compile-time error if the constant does not make sense or does not denote a value within the machine representation chosen for the type. For example, an escape sequence of the form \uxxxx in a string constant of 8-bit characters only makes sense if xxxx denotes a number in the range [0, 255].

E.2 Overloaded value identifiers

Overloaded identifiers all have identifier status v. An overloaded identifier may be rebound with any status (v, c or e) but then it is not overloaded within the scope of the binding.

The overloaded identifiers are given in Figure 27. For example, the entry

    abs ↦ realint → realint

states that abs may assume one of the types {t → t | t ∈ RealInt}. In general, the same type name must be chosen throughout the entire type of the overloaded operator; thus abs does not have type real → int.

The operator / is overloaded on all members of Real, with default type real * real → real. The default type of any other identifier is that one of its types which contains the type name int. For example, the program fun double(x) = x + x; declares a function of type int → int, while fun double(x:real) = x + x; declares a function of type real → real.
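The rules for overloaded identifiers can be exercised with a short sketch. The function and value names below are illustrative only; the local rebinding of + is one possible way to show that a rebound identifier is no longer overloaded in the scope of the binding.

```sml
(* No constraint: + takes its default type, so double : int -> int *)
fun double x = x + x

(* A constraint on x selects the real instance: doubleReal : real -> real *)
fun doubleReal (x : real) = x + x

(* abs uses the same type name throughout its type *)
val a = abs ~3         (* int  *)
val b = abs ~3.5       (* real *)

(* Rebinding + (here with status v) removes the overloading in its scope;
   inside the local body, + only accepts strings. *)
local
  fun op + (s : string, t : string) = s ^ t
in
  val joined = "ab" + "cd"
end

(* Outside the local declaration the overloaded + is visible again *)
val three = 1 + 2
```

The rebound + keeps its infix status, since the binding changes only the value, not the fixity.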

The dynamic semantics of the overloaded operators is defined in [17].

F Appendix: The Development of ML

This Appendix records the main stages in the development of ML, and the people principally involved. The main emphasis is upon the design of the language; there is also a section devoted to implementation. On the other hand, no attempt is made to record work on applications of the language.

Origins

ML and its semantic description have evolved over a period of about twenty years. It is a fusion of many ideas from many people; in this appendix we try to record and to acknowledge the important precursors of its ideas, the important influences upon it, and the important contributions to its design, implementation and semantic description.

ML, which stands for meta language, was conceived as a medium for finding and performing proofs in a formal logical system. This application was the focus of the initial design effort, by Robin Milner in collaboration first with Malcolm Newey and Lockwood Morris, then with Michael Gordon and Christopher Wadsworth [19]. The intended application to proof affected the design considerably. Higher order functions in full generality seemed necessary for programming proof tactics and strategies, and also a robust type system (see below). At the same time, imperative features were important for practical reasons; no-one had experience of large useful programs written in a pure functional style. In particular, an exception-raising mechanism was highly desirable for the natural presentation of tactics.

The full definition of this first version of ML was included in a book [18] which describes LCF, the proof system which ML was designed to support. The details of how the proof application exerted an influence on design are reported by Milner [38]. Other early influences were the applicative languages already in use in Artificial Intelligence, principally LISP [35], ISWIM [27] and POP2 [9].

Polymorphic types

The polymorphic type discipline and the associated type-assignment algorithm were prompted by the need for security; it is vital to know that when a program produces an object which it claims to be a theorem, then it is indeed a theorem. A type discipline provides the security, but a polymorphic discipline also permits considerable flexibility.

The key ideas of the type discipline were evolved in combinatory logic by Haskell Curry and Roger Hindley, who arrived at different but equivalent algorithms for computing principal type schemes. Curry's [13] algorithm was by equation-solving; Hindley [25] used the unification algorithm of Alan Robinson [47] and also presented the precursor of our type inference system. James Morris [42] independently gave an equation-solving algorithm very similar to Curry's. The idea of an algorithm for finding principal type schemes is very natural and may well have been known earlier. Roger Hindley has pointed out that Carew Meredith's inference rule for propositional logic called Condensed Detachment, defined in the early 1950s, clearly suggests that he knew such an algorithm [36].

Milner [37], during the design of ML, rediscovered principal types and their calculation by unification, for a language (slightly richer than combinatory logic) containing local declarations. He and Damas [14] presented the ML type inference systems following Hindley's style. Damas [15], using ideas from Michael Gordon, also devised the first mathematical treatment of polymorphism in the presence of references and assignment. Tofte [53] produced a different scheme employing so-called imperative types, which was adopted in the original version of the language. This approach has been superseded in the present language by a simpler scheme, suggested by Tofte [53], Andrew Wright [56], and Xavier Leroy [28], according to which polymorphic bindings are restricted to nonexpansive expressions.

Refinement of the Core Language

Two movements led to the re-design of ML. One was the work of Rod Burstall and his group on specifications, crystallised in the specification language CLEAR [10] and in the functional programming language HOPE [11]; the latter was for expressing executable specifications. The outcome of this work which is relevant here was twofold. First, there were elegant programming features in HOPE, particularly pattern matching and clausal function definitions; second, there were ideas on modular construction of specifications, using signatures in the interfaces. A smaller but significant movement was by Luca Cardelli, who extended the data-type repertoire in ML by adding named records and variant types.

In 1983, Milner (prompted by Bernard Sufrin) wrote the first draft of a standard form of ML attempting to unite these ideas; over the next three years it evolved into the Standard ML core language. Notable here was the harmony found among polymorphism, HOPE patterns and Cardelli records, and the nice generalisations of ML exceptions due to ideas from Alan Mycroft, Brian Monahan and Don Sannella. A simple stream-based I/O mechanism was developed from ideas of Cardelli by Milner and Harper. The Standard ML core language is described in detail in a composite report [22] which also contains a description of the I/O mechanism and MacQueen's proposal for program modules (see later for discussion of this). Since then only few changes to the core language have occurred. Milner proposed equality types, and these were added, together with a few minor adjustments [39]. The last development before the 1990 Definition was in the exception mechanism, by MacQueen using an idea from Burstall [2]; it harmonized the ideas of exception and data type construction.

Modules

Besides contributory ideas to the core language, HOPE [11] contained a simple notion of program module. The most important and original feature of ML modules, however, stems from the work on parameterised specifications in CLEAR [10]. MacQueen, who was a member of Burstall's group at the time, designed [33] a new parametric module feature for HOPE inspired by the CLEAR work. He later extended the parameterisation ideas by a novel method of specifying sharing of components among the structure parameters of a functor, and produced a draft design which accommodated features already present in ML - in particular the polymorphic type system. This design was discussed in detail at Edinburgh, leading to MacQueen's first report on modules [22].

Thereafter, the design came under close scrutiny through a draft operational static semantics and prototype implementation of it by Harper, through Kevin Mitchell's implementation of the evaluation, through a denotational semantics written by Don Sannella, and then through further work on operational semantics by Harper, Milner, and Tofte. (More is said about this in the later section on Semantics.) In all of this work the central ideas withstood scrutiny, while it also became clear that there were gaps in the design and ambiguities in interpretation. (An example of a gap was the inability to specify sharing between a functor argument structure and its result structure; an example of an ambiguity was the question of whether sharing exists in a structure over and above what is specified in the signature expression which accompanies its declaration.)

Much discussion ensued; it was possible for a wider group to comment on modules through using Harper's prototype implementation, while Harper, Milner and Tofte gained understanding during development of this semantics. In parallel, Sannella and Tarlecki explored the implications of modules for the methodology of program development [48]. Tofte, in his thesis [52], proved several technical properties of modules in a skeletal language, which generated considerable confidence in this design. Key points in this development were the proof of the existence of principal signatures, and the careful distinction between the notion of enrichment of structures, which allows more polymorphism and more components, and realisation, which allows more sharing.

At a meeting in Edinburgh in 1987 a choice of two designs was presented, hinging upon whether or not a functor application should coerce its actual argument to its argument signature. The meeting chose coercion, and thereafter the production of Section 5 of this report - the static semantics of modules - was a matter of detailed care. That section is undoubtedly the most original and demanding part of this semantics, just as the ideas of MacQueen upon which it is based are the most far-reaching extension to the original design of ML.

Considerable experience was gained in implementing, programming with, and teaching the language during the nearly ten years since the definition was first published. Based on this experience a number of design decisions were revisited at a meeting of the authors in Cambridge at the end of 1995. At this meeting it was decided to make several modest, but significant, changes to the language in order to simplify the semantics and to correct some shortcomings that had come to light. The most important of these changes were the replacement of the imperative type discipline by the so-called value restriction (discussed above), the elimination of structure sharing as a separate concept from type sharing, and the introduction of the closely connected mechanisms of opaque signature matching and type abbreviations in signatures. An important impetus for these changes to the modules language was the work of Leroy [29], and Harper and Lillibridge [20] on the type-theoretic interpretation of modules (described below).

Implementation

The first implementation of ML was by Malcolm Newey, Lockwood Morris and Robin Milner in 1974, for the DEC10. Later Mike Gordon and Chris Wadsworth joined; their work was mainly in specialising ML towards machine-assisted reasoning. Around 1980 Luca Cardelli implemented a version on VAX; his work was later extended by Alan Mycroft, Kevin Mitchell and John Scott. This version contained one or two new data-type features, and was based upon the Functional Abstract Machine (FAM), a virtual machine which has been a considerable stimulus to later implementation. By providing a reasonably efficient implementation, this work enabled the language to be taught to students; this, in turn, prompted the idea that it could become a useful general purpose language.

In Gothenburg, an implementation was developed by Lennart Augustsson and Thomas Johnsson in 1982, using lazy evaluation rather than call-by-value; the result was called Lazy ML and is described in [5]. This work is part of continuing research in many places on implementation of lazy evaluation in pure functional languages. But for ML, which includes exceptions and assignment, the emphasis has been mainly upon strict evaluation (call-by-value).

In Cambridge, in the early 1980s, Larry Paulson made considerable improvements to the Edinburgh ML compiler, as part of his wider programme of improving Edinburgh LCF to become Cambridge LCF [45]. This system has supported larger proofs than the Edinburgh system, and with greater convenience; in particular, the compiled ML code ran four to five times faster.

Around the same time Gérard Huet at INRIA (Versailles) adapted ML to Maclisp on Multics, again for use in machine-assisted proof. There was close collaboration between INRIA and Cambridge in this period. ML has undergone a separate development in the group at INRIA on the CAML language [12]. Work on CAML included the development of several extensions to the core language, notably updatable fields in record types, values with dynamic types, support for lazy evaluation, and handling of embedded languages with user-defined syntax. It did not, however, include modules.

The first implementation of the Standard ML core language was by Mitchell, Mycroft and Scott at Edinburgh, around 1984. The prototype implementation of modules, before that part of the language settled down, was done in 1985-6; Mitchell dealt with evaluation, while Harper tackled the elaboration (or 'signature checking') which raised problems of a kind not previously encountered. Harper's implementation employed a form of unification that was later adopted in the static semantics of modules.

At around the same time the Poly/ML implementation began with a suggestion from Mike Gordon that an interesting application of Matthews' Poly language would be to implement Standard ML. Important experience was gained through Matthews' early implementation of the core language, followed by several versions of the modules language as they were devised. Poly/ML features arbitrary precision arithmetic, a process package, and a windowing system. Considerable experience has been gained with the compiler, notably by Larry Paulson at Cambridge and by Abstract Hardware Limited (AHL).

In 1986 Andrew Appel and David MacQueen began work on the Standard ML of New Jersey (SML/NJ) compiler [4]. SML/NJ is a robust and complete environment for Standard ML that supports the implementation of large software systems and generates efficient code for a number of different hardware and software platforms. SML/NJ also serves as a laboratory for compiler research: in implementations of module systems for ML; code optimization based on continuation-passing style; efficient pattern matching; and very fast heap allocation and garbage collection. Dozens of researchers have contributed to the development of the compiler, in such areas as efficient closure representations, first-class continuations, type-directed compilation, concurrent programming, portable code generators, separate compilation, and register allocation.

In 1989, Mads Tofte, Nick Rothwell and David N. Turner started work on the ML Kit Compiler in Edinburgh. The ML Kit is a direct translation of the 1990 Definition into a collection of Standard ML modules, emphasis being on clarity rather than efficiency. During 1992 and 1993, Version 1 of the ML Kit was completed, mostly through the work of Nick Rothwell at Edinburgh and Lars Birkedal at DIKU[8]. In 1994, region inference was added to the ML Kit, by Mads Tofte. Lars Birkedal wrote a region-based C-code generator and a runtime system in C. In 1995, Martin Elsman and Niels Hallenberg extended this work to generate native code for the HP PA-RISC architecture.

Harlequin Ltd. began the implementation of a commercial compiler in 1990. The MLWorks system is a fully-featured graphical programming environment, including an interactive debugger, inspector, browser, extensive profiling facilities, separate compilation and delivery, a foreign-language interface, and libraries for threads and windowing systems.

Caml Light, a lightweight reimplementation of CAML released in 1991, added a simple module system in the style of Modula-2, targeted towards separate compilation of modules: structures and signatures are identified with files; functors and multiple views of a structure are not supported. These were added in the Caml Special Light implementation in 1995, while preserving the support for separate compilation. Caml Special Light and the present version of Standard ML share several important simplifications, such as the value restriction on polymorphism, type definitions in signatures, and the lack of support for structure sharing. The static semantics for Caml Special Light is based on the type-theoretic properties of dependent function types (functor signatures) and manifest types (type definitions in signatures) [29].

Moscow ML is an implementation of core Standard ML, created in 1994 by Sergei Romanenko in Moscow and Peter Sestoft in Copenhagen. The Caml Light system was used to implement the dynamic semantics, and the ML Kit guided the implementation of the static semantics. The result is a compact and robust implementation, suitable for teaching.

The TIL (Typed Intermediate Languages) compiler developed at Carnegie Mellon University by Greg Morrisett, David Tarditi, Perry Cheng, Chris Stone, Robert Harper, and Peter Lee demonstrates the use of types in compilation. All but the last few stages of TIL are expressed as type-directed and type-preserving transforms. Types are used at run time to support unboxed, untagged data representations and natural calling conventions in the presence of variable types and garbage collection. TIL employs a wide variety of conventional functional language optimizations found in other SML compilers, as well as a set of loop-oriented optimizations. A description of the compiler and an analysis of its performance appears in [51].

Other currently active implementations are by Michael Hedlund at the Rutherford Appleton Laboratory, by Robert Duncan, Simon Nichols and Aaron Sloman at the University of Sussex (POPLOG) and by Malcolm Newey and his group at the Australian National University.

Semantics

The description of the first version of ML [18] was informal, and in an operational style; around the same time a denotational semantics was written, but never published, by Mike Gordon and Robin Milner. Meanwhile structured operational semantics, presented as an inference system, was gaining credence as a tractable medium. This originates with the reduction rules of λ-calculus, but was developed more widely through the work of Plotkin [46], and also by Milner. This was at first only used for dynamic semantics, but later the benefit of using inference systems for both static and dynamic semantics became apparent. This advantage was realised when Gilles Kahn and his group at INRIA were able to execute early versions of both forms of semantics for the ML core language using their Typol system [16]. The static and dynamic semantics of the core language reached a final form mostly through work by Tofte and Milner.

The modules of ML presented little difficulty as far as dynamic semantics is concerned, but the static semantics of modules was a concerted effort by several people. MacQueen's original informal description [22] was the starting point; Sannella wrote a denotational semantics for several versions, which showed that several issues had not been settled by the informal description. Robert Harper, while writing the first implementation of modules, made the first draft of the static semantics. Harper's version made clear the importance of structure names; work by Milner and Tofte introduced further ideas including realisation; thereafter a concerted effort by all three led to several suggestions for modification of the language, and a small range of alternative interpretations; these were assessed in discussion with MacQueen, and more widely with the principal users of the language, and an agreed form was reached.

Concurrently with the formulation of the Definition of Standard ML, Harper and Mitchell took up the challenge adumbrated by MacQueen [32] to find a type-theoretic interpretation of Standard ML [24]. This work led to the formulation of the XML language, an explicitly-typed λ-calculus that captured many aspects of Standard ML. Although incomplete, their approach formed the basis for a number of subsequent studies, including the work of Harper and Lillibridge [20] and Leroy [29] on the type-theoretic interpretation of modules. This work influenced the decision to revise the language, and culminated in a type-theoretic interpretation of the present language by Harper and Stone [50]. The TIL/ML compiler (described above) is based directly on this interpretation.


There is no doubt that the interaction between design and semantic description of modules has been one of the most striking phases in the entire language development, leading (in the opinion of those involved) to a high degree of confidence both in the language and in the semantics.

Program Libraries

During 1989-1991, Dave Berry produced the first program library for Standard ML [6, 7]. Subsequently, a partnership between the originators of SML/NJ, MLWorks and Moscow ML was formed, with the goal of creating an industrial strength initial basis for Standard ML. The resulting SML Basis Library [17] is a much improved and extended replacement of the initial basis defined in the 1990 Definition of Standard ML.

G Appendix: What is New?

This appendix gives an overview of how the present Definition differs from the 1990 Definition of Standard ML [41]. For the purpose of this appendix, we write SML '90 for the language defined by the 1990 Definition and SML '96 for the present language. For each major change, we give its rationale and an overview of its practical implications. Also, the index (page 94 ff.) may be used for locating changes.

G.1 Type Abbreviations in Signatures

There are cases of type sharing which cannot be expressed in SML '90 signatures although they arise in structures. For example, there is no SML '90 signature which precisely describes the relationship between s and t in

    structure a =
      struct
        datatype s = C
        type t = s * s
      end

In SML '96, one can write type abbreviations in signatures, e.g.,

    signature A =
      sig
        type s
        type t = s * s
      end

The need for type abbreviations in signatures was clear when SML '90 was defined. However, type abbreviations were not included since, in the presence of both structure sharing and type abbreviations, principal signatures do not exist[40] - and the SML '90 Definition depended strongly upon the notion of principal signature. Subsequently, Harper's and Lillibridge's work on translucent sums[21] and Leroy's work on modules[29] showed that, in the absence of structure sharing and certain other features of the SML '90 signatures, type abbreviations in signatures are possible. Indeed, Leroy provides type abbreviations in signatures in his CAML Special Light[30].

In SML '96, structure sharing has been removed (see Section G.3 below). Type abbreviations are not included directly, but they arise as a derived form, as follows. First, a new form of signature expression is allowed:

    sigexp where type tyvarseq longtycon = ty

Here longtycon has to be specified by sigexp. The type expression ty may refer to type constructors which are present in the basis in which the whole signature expression is elaborated, but not to type constructors specified in sigexp.

The effect of the where type is, roughly speaking, to instantiate longtycon to ty. For example, the following sequence of declarations is legal:


    signature SIG1 = sig type t; val x: t end;
    signature SIG2 = SIG1 where type t = int*int;
    structure S1: SIG1 = struct type t = real; val x = 1.0 end;
    structure S2: SIG2 = struct type t = int*int; val x = (5, 7) end;

Next, a type abbreviation is a derived form. For example, type u = t*t is equivalent to include sig type u end where type u = t*t . In SML '96 it is allowed to include an arbitrary signature expression, not just a signature identifier.
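To make the two mechanisms concrete, here is a small sketch combining a type abbreviation in a signature with a where type qualification. The names POINT, INT_POINT and IntPoint are hypothetical and serve only as illustration.

```sml
signature POINT =
  sig
    type coord
    type point = coord * coord       (* type abbreviation in a signature *)
    val origin : point
    val xOf : point -> coord
  end

(* where type instantiates the specified type coord to int *)
signature INT_POINT = POINT where type coord = int

structure IntPoint : INT_POINT =
  struct
    type coord = int
    type point = coord * coord
    val origin = (0, 0)
    fun xOf (x, _) = x
  end

(* Both identities are visible to clients: point = coord * coord = int * int *)
val p : int * int = IntPoint.origin
val x = IntPoint.xOf (3, 4) + 1
```

Because coord is instantiated to int, the abbreviation point unfolds to int * int, which is why the last two declarations elaborate.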

G.2 Opaque Signature Matching

In imposing a signature on a structure, one often wants the types of the resulting structure to be "abstract" in order to hide their implementation. (Signature matching in SML '90 hides components, but does not hide type sharing.) MacQueen originally suggested an abstraction declaration for this purpose [31]. In the Commentary [40] it was pointed out that the issue is the semantics of matching. SML '96 provides two kinds of matching, as new forms of structure expression:

    strexp : sigexp
    strexp :> sigexp

The first (:) is the SML '90 signature matching; the second (:>) is opaque matching. Opaque matching can be applied to the result structure of a functor; thus it is more general than MacQueen's abstraction declaration. In CAML Special Light, all signature matching is opaque.

With opaque matching, types in the resulting structure will be abstract, to precisely the degree expressed in sigexp. Thus

    signature Sig =
      sig
        type t = int
        val x: t
        type u
        val y: u
      end;
    structure S1 :> Sig =
      struct
        type t = int
        val x = 3
        type u = real
        val y = 3.0
      end
    val r = S1.x + 1

is legal, but a subsequent declaration val s = S1.y + 1.5 will fail to elaborate. Similarly, consider the functor declaration:

    functor Dict(type t; val leq: t*t->int) :>
      sig
        type u = t*t
        type 'a dict
      end =
      struct
        type u = t*t
        type 'a dict = (t * 'a) list
      end

When applied, Dict will propagate the identity of the type t from argument to result, but it will produce a fresh dict type upon each application.
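The contrast between the two kinds of matching can be seen in a minimal sketch. The signature COUNTER and the structures Transparent and Opaque are hypothetical names introduced here for illustration.

```sml
signature COUNTER =
  sig
    type t
    val zero : t
    val next : t -> t
    val toInt : t -> int
  end

(* Transparent matching (:) hides components but not the identity t = int *)
structure Transparent : COUNTER =
  struct
    type t = int
    val zero = 0
    fun next n = n + 1
    fun toInt n = n
  end

(* Opaque matching (:>) makes t abstract, as COUNTER says nothing about it *)
structure Opaque :> COUNTER =
  struct
    type t = int
    val zero = 0
    fun next n = n + 1
    fun toInt n = n
  end

val a = Transparent.next 41                       (* legal: Transparent.t = int *)
val b = Opaque.toInt (Opaque.next Opaque.zero)    (* legal: uses only the interface *)
(* val c = Opaque.next 41  would fail to elaborate, since Opaque.t is abstract *)
```

The two structure bodies are identical; only the form of matching decides whether clients may rely on t being int.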

G.3 Sharing

Structure sharing is a key idea in MacQueen's original Modules design [31]. The theoretical aspects of structure sharing have been the subject of considerable research attention [23, 52, 1, 54, 34]. However, judging from experience, structure sharing is not often used in its full generality, namely to ensure identity of values. Furthermore, experience from teaching suggests that the structure sharing concept is somewhat hard to grasp. Finally, the semantic accounts of structure sharing that have been proposed are rather complicated.

The static semantics of SML '96 has no notion of structure sharing. However, SML '96 does provide a weaker form of structure sharing constraints, in which structure sharing is regarded as a derived form, equivalent to a collection of type sharing constraints.

G.3.1 Type Sharing

In SML '90, a type sharing constraint sharing type longtycon_1 = ··· = longtycon_n was an admissible form of specification. In SML '96 such a constraint does not stand by itself as a specification, but may be used to qualify a specification. Thus there is a new form of specification, which we shall call a qualified specification:

    spec sharing type longtycon_1 = ··· = longtycon_n

Here the long type constructors have to be specified by spec. The type constructors may have been specified by type, eqtype or datatype specifications, or indirectly through signature identifiers and include. In order for the specification to be legal, all the type constructors must denote flexible type names. More precisely, let B be the basis in which the qualified specification is elaborated. Let us say that a type name t is rigid (in B) if t ∈ T of B and that t is flexible (in B) otherwise. For example int is rigid in the initial basis and every datatype declaration introduces additional rigid type names into the basis. For the qualified specification to elaborate in basis B, it is required that each longtycon_i denotes a type name which is flexible in B. In particular, no longtycon_i may denote a type function which is not also a type name (e.g., a longtycon must not denote Λ().s * s).

For example, the two signature expressions

    sig                        sig
      type s                     type s
      type t                     datatype t = C
      sharing type s = t         sharing type s = t
    end                        end

are both legal. By contrast, the signature expressions

    sig                        sig
      type s                     type s = int
      type t = s*s               datatype t = C
      sharing type s = t         sharing type s = t
    end                        end

are both illegal.
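A typical use of a qualified specification is to relate the parameters of a functor. The following sketch assumes the Basis Library functions List.foldl and Int.compare; the names KEY, STORE, Largest, IntKey and IntStore are hypothetical.

```sml
signature KEY =
  sig
    type t
    val compare : t * t -> order
  end

signature STORE =
  sig
    type key
    val keys : key list
  end

(* K.t and S.key are both flexible here, so the sharing qualification is legal *)
functor Largest (structure K : KEY
                 structure S : STORE
                 sharing type K.t = S.key) =
  struct
    (* K.compare may be applied to elements of S.keys only because of the sharing *)
    fun pick (k, NONE) = SOME k
      | pick (k, SOME m) =
          if K.compare (k, m) = GREATER then SOME k else SOME m
    val largest = List.foldl pick NONE S.keys
  end

structure IntKey = struct type t = int  val compare = Int.compare end
structure IntStore = struct type key = int  val keys = [3, 9, 4] end

structure L = Largest (structure K = IntKey  structure S = IntStore)
val result = L.largest
```

Without the sharing qualification the body of Largest would fail to elaborate, since K.t and S.key would denote distinct type names.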

G.3.2 The equality attribute of specified types

If spec sharing type longtycon_1 = ··· = longtycon_n elaborates successfully, then all n type constructors will thereafter denote the same type name. This type name will admit equality, if spec associates an equality type name with one of the type constructors. Thus

    eqtype t
    type u
    sharing type t = u

is legal and both t and u are equality types after the sharing qualification. The mechanism for inferring equality attributes for datatype specifications is the same as for inferring equality attributes for datatype declarations. Thus the specification

    datatype answer = YES | NO
    datatype 'a option = Some of 'a | None

specifies two equality types. Every specification of the form datatype datdesc introduces one type name for each type constructor described by datdesc. The equality attribute of such a type name is determined at the point where the specification occurs. Thus, in

    type s
    datatype t = C of s

the type name associated with t will not admit equality, even if s later is instantiated to an equality type. Type names associated with datatype specifications can be instantiated to other type names by subsequent type sharing or where type qualifications. In this case, no effort is made to ban type environments that do not respect equality. For example,

G.3 Sharing 81

sig

eqtype s datatype t = C of int -? int sharing type s = t end

is legal in SML '96, even though it cannot be matched by any real structure.
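The effect of the eqtype/type sharing example above can be observed from inside a functor body. The following is a minimal sketch only; the names EQ_SHARE, Same and R are illustrative assumptions and do not occur in the Definition.

```sml
(* Hypothetical names throughout: EQ_SHARE, Same, R. *)
signature EQ_SHARE =
sig
  eqtype t
  type u
  sharing type t = u   (* u now denotes the same (equality) type name as t *)
  val x : u
  val y : u
end

functor Same (X : EQ_SHARE) =
struct
  (* Equality is applied at type X.u; this elaborates only because
     the shared type name admits equality. *)
  val same = (X.x = X.y)
end

structure R = Same (struct
  type t = int
  type u = int
  val x = 1
  val y = 1
end)
```

Without the sharing qualification, u would be specified by type alone and the use of = in the functor body should be rejected.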

G.3.3 Structure Sharing For convenience, structure sharing constraints are provided, but only as a shorthand for type sharing constraints. There is a derived form of specification

    spec sharing longstrid_1 = ··· = longstrid_k        (k ≥ 2)

Here spec must specify longstrid_1, ..., longstrid_k. The equivalent form consists of spec qualified by all the type sharing constraints

    sharing type longstrid_i.longtycon = longstrid_j.longtycon        (1 ≤ i < j ≤ k)

such that both longstrid_i.longtycon and longstrid_j.longtycon are specified by spec.
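As an illustration of the derived form, the sketch below uses a structure sharing constraint between two substructures that both specify a type t; under the expansion just described it should be equivalent to sharing type A.t = B.t. The names PAIR, Use and R are assumptions made for the example.

```sml
(* Hypothetical names: PAIR, Use, R. *)
signature PAIR =
sig
  structure A : sig type t val x : t end
  structure B : sig type t val f : t -> int end
  sharing A = B          (* shorthand for: sharing type A.t = B.t *)
end

functor Use (P : PAIR) =
struct
  (* P.A.x has type P.A.t, while P.B.f expects P.B.t;
     the application type-checks because the two types share. *)
  val r = P.B.f P.A.x
end

structure R = Use (struct
  structure A = struct type t = int val x = 41 end
  structure B = struct type t = int fun f n = n + 1 end
end)
```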

In SML '90, structure sharing constraints are transitive, but in SML '96 they are not. For example,

    structure A: sig type t end
    structure B: sig end
    structure C: sig type t end
    sharing A=B=C

induces type sharing on t, whereas

    structure A: sig type t end
    structure B: sig end
    structure C: sig type t end
    sharing A=B
    sharing B=C

induces no type sharing. Thus a structure sharing constraint in some cases induces less sharing in SML '96 than in SML '90.

Next, SML '96 does not allow structure sharing equations which refer to "external" structures. For example, the program

    structure A= struct end;
    signature SIG =
    sig
      structure B : sig end
      sharing A = B
    end;

is not legal in SML '96, because the sharing constraint now only qualifies the specification structure B: sig end, which does not specify A. Thus not all legal SML '90 signatures are legal in SML '96.

The removal of structure sharing has a dramatic simplifying effect on the semantics. Most importantly, the elaboration rules can be made monogenic (i.e., "deterministic"), up to renaming of new type names. The need for the notion of principal signature (and even equality-principal signature) disappears. The notions of structure name, structure consistency and well-formed signature are no longer required. The notion of cover can be deleted. Only one kind of realisation, namely type realisation, remains. The notion of type-explication has been removed, since it can be proved that signatures automatically are type-explicit in the revised language.

G.4 Value Polymorphism Imperative types are somewhat subtle and they propagate into signatures in an unpleasant way. Experiments on existing code suggest that the power of imperative types is rarely used fully and that value polymorphism, which can in fact be seen as a restriction of the imperative type discipline, usually suffices[56]. With value polymorphism, there is only one kind of type variable. The definition of non-expansive expressions (see G.13 below) is relaxed to admit more expressions. In a declaration

    val x = exp

the variable x will only be given a non-trivial polymorphic type scheme (i.e., a type scheme which is not also a type) if exp is non-expansive. This applies even if there is no application of ref in the entire program.
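A small sketch of the consequences follows; the identifiers id, three, str, seven, g, eight and text are chosen for illustration only.

```sml
(* fn x => x is non-expansive, so id receives the scheme 'a -> 'a. *)
val id = fn x => x
val three = id 3
val str = id "three"

(* An application is expansive. Inside the let, f is therefore not
   generalised; its type is fixed to int -> int by the use f 7. *)
val seven = let val f = (fn x => x) (fn x => x) in f 7 end

(* Wrapping the application in a fn-expression (eta-expansion) makes
   the right-hand side non-expansive again, so g is polymorphic. *)
val g = fn y => (fn x => x) (fn x => x) y
val eight = g 8
val text = g "eight"
```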

G.5 Identifier Status The 1990 Definition treated identifier status informally (in Section 2.4); a fuller definition was given in the Commentary[40, Appendix B]. However, some problems with the handling of exception constructors remained[26, Sect. 10.3].

In the present document, we have collapsed the three identifier classes Var, ExCon and Con into a single class, VId, of value identifiers. The semantic objects VE previously called variable environments are replaced by value environments. A value environment maps value identifiers to pairs of the form (o, is), where o is some semantic object and is is an identifier status (is ∈ {v, c, e}) indicating whether the identifier should be regarded as a value variable (v), a value constructor (c) or an exception constructor (e). These changes have been carried out both in the static and in the dynamic semantics, for both Core and Modules.

The definition of enrichment has been modified so as to allow that an identifier which has been specified as a value can be matched by a value constructor or an exception constructor. However, a specification of a value or exception constructor must be matched by a value or exception constructor, respectively.


Thus, the status descriptor says more than just what the lexical status of the identifier is: it is a statement about the value in the corresponding dynamic environment. If the status of id in the static environment is c, then the value in a matching dynamic environment must be a value constructor. Similarly, if the status of id in the static environment is e, then the value in a matching dynamic environment must be an exception name. If the status of id is just v, however, the corresponding value in the dynamic environment can be any kind of value (of the appropriate type), including a value constructor and an exception name.
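The modified enrichment can be sketched as follows: identifiers that the signature specifies as plain values are matched by value constructors and by an exception constructor. The names S, Leaf, Node, Oops, tree and caught are illustrative assumptions.

```sml
(* Hypothetical names throughout. *)
structure S : sig
  type t
  val Leaf : t             (* specified with status v ...       *)
  val Node : t * t -> t
  val Oops : exn
end = struct
  datatype t = Leaf | Node of t * t   (* ... matched by status c *)
  exception Oops                      (* ... matched by status e *)
end

(* Outside S the identifiers are ordinary values: they can be applied
   and raised, but they should not be usable as constructors in
   patterns, since their status in the signature is v. *)
val tree = S.Node (S.Leaf, S.Leaf)
val caught = (raise S.Oops) handle _ => true
```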

The exception environment (EE) has been deleted from the semantics, since it is no longer required for the definition of enrichment. Also, the constructor environment CE in the static semantics has been turned into a value environment.

The new handling of identifier status admits some val rec declarations that were illegal in SML '90 (see the comment to Rule 26).

G.6 Replication of Datatypes SML '96 allows datatype replication, i.e. declarations and specifications of the form

    datatype tyvarseq tycon = datatype tyvarseq longtycon

When elaborated, this binds type constructor tycon to the entire type structure (value constructors included) to which longtycon is bound in the context. Datatype replication does not generate a new datatype: the original and the replicated datatype share.

Here is an example of a use of the new construct:

    signature MYBOOL =
    sig
      type bool
      val xor: bool * bool -> bool
    end;
    structure MyBool: MYBOOL =
    struct
      datatype bool = datatype bool (* from the initial basis *)
      fun xor(true, false) = true
        | xor(false, true) = true
        | xor _ = false
    end;
    val x = MyBool.xor(true, false);

Here MyBool.xor(true, false) evaluates to true. Note the use of transparent signature matching; had opaque matching been used instead, the declaration of x would not have elaborated.

A datatype replication implicitly introduces the value constructors of longtycon into the current scope. This is significant for signature matching. For example, the following program is legal:

    datatype t0 = C;
    structure A : sig type t val C: t end =
    struct
      datatype t = datatype t0
    end;

Note that C is specified as a value in the signature; the datatype replication copies the value environment of t0 into the structure and that is why the structure contains the required C value.

To make it possible for datatype replication to copy value environments associated with type constructors, the dynamic semantics has been modified so that environments now contain a TE component (see Figure 13, page 38). Further, in the dynamic semantics of modules, the ↓ operation, which is used for cutting down structures when they are matched against signatures, has been extended to cover the TE component (see page 49). In the above example, the value environment ascribed to A.t will be empty, signifying that the type has no value constructors. Had the signature instead been

    sig datatype t val C: t end

then the signature matching would have ascribed A.t a value environment with domain {C}, indicating that A.t has value constructor C.

When the datatype replication is used as a specification, longtycon can refer to a datatype which has been introduced either by declaration or by specification. Here is an example of the former:

    datatype t = C | D;
    signature SIG =
    sig
      datatype t = datatype t (* replication is not recursive! *)
      val f: t -> t
    end
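For the latter case, where longtycon refers to a datatype introduced by specification, a possible sketch is given below. It is not taken from the Definition; the names S, M, u and flip are assumptions.

```sml
(* Hypothetical names: S, M, u, flip. *)
signature S =
sig
  structure A : sig datatype t = C | D end
  datatype u = datatype A.t    (* replicates a *specified* datatype *)
  val flip : u -> u
end

structure M : S =
struct
  structure A = struct datatype t = C | D end
  datatype u = datatype A.t    (* brings C and D into scope *)
  fun flip C = D
    | flip D = C
end
```

Since replication copies the value constructors, both M.C and M.A.C should denote the same constructor of the same type.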

G.7 Local Datatypes This change is concerned with expressions of the form let dec in exp end in which dec contains a datatype declaration. Let us refer to such a datatype declaration as a local datatype declaration. There are two reasons why changes to the handling of local datatype declarations are necessary.

The first is that the rule given for elaboration of let-expressions in the 1990 Definition is unsound[26]; the problem has to do with the ability to export type names of locally declared datatypes out of scope.

The second is that the static semantics relies on the following invariant about all contexts, C, which arise in elaboration from the initial basis:

    tynames C ⊆ T of C

This invariant is used, for example, in the rule for elaborating datatype declarations, where type names are picked "fresh" with respect to T of C. As pointed out by Kahrs, the second premise of rule 16 in the 1990 Definition violates the above invariant.

To solve the first problem, the rule for elaborating let-expressions (rule 4 in the present document) has been provided with a side-condition which prevents the type of exp from containing type names generated by dec. For example,

    let datatype t = C in C end

was legal SML '90 but is not legal SML '96.
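By contrast, a local datatype remains usable as long as the type of the let-expression does not mention it. A minimal sketch (the names n and code are illustrative):

```sml
val n =
  let
    datatype t = C | D
    fun code C = 0
      | code D = 1
  in
    code D        (* result type is int, so t does not escape *)
  end
```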

To solve the second problem, a side-condition has been added in the rule for matches and the rule for val rec (rules 14 and 26 of the present document). As a consequence, again fewer programs elaborate. For example, the expression

    fn x => let datatype t = C
                val _ = if true then x else C
            in 5 end

is not legal SML '96, although it was legal SML '90.

G.8 Principal Environments In SML '90, the elaboration rule for the production strdec ::= dec is

    C of B ⊢ dec ⇒ E      E principal for dec in (C of B)
    -----------------------------------------------------
                        B ⊢ dec ⇒ E

The side-condition forces the type scheme in E to be as general as possible. However, this side-condition would be undesirably restrictive in SML '96, since the new definition of the Clos operation admits less polymorphism than the one used in SML '90. For example, neither

    val f = (fn x => x)(fn x => x)
    structure A = struct end
    val y = f 7

nor

    structure A: sig val f: int -> int end =
    struct
      val f = (fn x => x)(fn x => x)
    end

would be legal in SML '96, if the side-condition were enforced. (A type-checker may at first infer the type 'a -> 'a from the declaration of f, but since (fn x => x)(fn x => x) is expansive, the generalisation to ∀'a.'a -> 'a is not allowed.) By dropping the side-condition, it becomes possible to have the textual context of a structure-level declaration constrain free type variables to monotypes. Thus both the above examples can be elaborated.

Rather than lifting the notion of principal environments to the modules level, we have chosen to drop the requirement of principality. Since the notion of principal environments is no longer used in the rules, even the definition of principal environments has been removed. In practice, however, type checkers still have to infer types that are as general as possible, since implementations should not reject programs for which successful elaboration is possible.

In order to avoid reporting free type variables to users, rule 87 requires that the environment to which a topdec elaborates must not contain free type variables. It is possible to satisfy this side-condition by replacing such type variables by arbitrary monotypes; however, implementers may instead choose to refuse elaboration in such situations.

G.9 Consistency and Admissibility The primary purpose of consistency in SML '90 was to allow a very simple elaboration rule for structure sharing. A secondary purpose was to ban any signature which, because it specifies a datatype in inconsistent ways (e.g. with different constructors), can never be matched. With the removal of structure sharing, the primary purpose of consistency has gone away. In our experience, the secondary purpose has turned out not to be very significant in practice. Textual copying of datatype specifications in different signatures is best avoided, since changes in the datatype will have to be made in several places. In practice, it is better to specify a datatype in one signature and then access it elsewhere using structure specifications or include. In SML '90 one could specify sharing between a datatype specification and an external (i.e., declared) datatype, and a consistency check was useful in this case. But in SML '96 this form of sharing is not allowed, so there remains no strong reason for preserving consistency; therefore it has been dropped.

In SML '90, admissibility was imposed partly to ensure the existence of principal signatures (which are no longer needed) and partly to ban certain unmatchable signatures. In SML '90, admissibility was the conjunction of well-formedness, cycle-freedom and consistency. Cycle-freedom is no longer relevant, since there is no structure sharing. We have already discussed consistency. Well-formedness of signatures is no longer relevant, but the notion of well-formed type structures is still relevant. It turns out that well-formedness only needs to be checked in one place (in rule 64). Otherwise, well-formedness is preserved by the rules (in a sense which can be made precise). Thus one can avoid a global well-formedness requirement and dispense with admissibility. This we have done.

G.10 Special Constants The class of special constants has been extended with word and char constants and with hexadecimal notation. Also, there are additional escape sequences in strings and support for UNICODE characters. See Section 2.2.
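A few constants in the extended notation are sketched below. The bindings are illustrative; in particular, that \a (alert) is among the additional escape sequences is an assumption to be checked against Section 2.2.

```sml
val w  = 0w255        (* word constant, decimal notation      *)
val wx = 0wxFF        (* word constant, hexadecimal notation  *)
val i  = 0xFF         (* integer constant, hexadecimal        *)
val c  = #"a"         (* character constant                   *)
val s  = "\065\t\a"   (* decimal escape for A, tab, alert     *)
```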

G.11 Comments A clarification concerning unmatched comment brackets was presented in the Commentary; subsequently, Stefan Kahrs discovered a problem with demanding that an unmatched *) be reported by the compiler. In SML '96, we therefore simply demand that an unmatched (* must be reported by the compiler.

G.12 Infixed Operators The rules for associativity of infix operators at the same level of precedence have been modified, to avoid confusion between right- and left-associative operators with the same binding precedence (see Section 2.6).

G.13 Non-expansive Expressions The class of non-expansive expressions (Section 4.7) has been extended, to compensate for the loss of polymorphism which value polymorphism entails.
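As a sketch of what the extension buys, assume (as we read Section 4.7) that a tuple of non-expansive expressions is itself non-expansive. Then the pair below is generalised and both components can be used at several types. The names pair, n, s, ns and ss are illustrative.

```sml
(* A tuple whose components are a fn-expression and a constructor. *)
val pair = (fn x => x, nil)

val n  = #1 pair 3           (* first component used at int    *)
val s  = #1 pair "three"     (* ... and at string              *)
val ns = 1 :: #2 pair        (* second component as int list   *)
val ss = "a" :: #2 pair      (* ... and as string list         *)
```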

G.14 Rebinding of built-in identifiers In SML '96, no datbind or exbind may bind true, false, it, nil, :: or ref (Section 2.9). Similarly, no datdesc or exdesc may describe any of these identifiers (Section 3.5). These changes are made in order to fix the meaning of derived forms and to avoid ambiguity in the handling of ref in the dynamic semantics of the Core.

G.15 Grammar for Modules There are several new derived forms for modules, see Appendix A (Figures 18 and 19). The grammar for topdec has been modified, so that there is no longer any need to put semicolons at the end of signature and functor declarations. Empty and sequential signature and functor declarations have been removed, as they no longer serve any purpose. SML '96 has neither functor signature expressions nor functor specifications, since they could not occur in programs and did not gain wide acceptance.

G.16 Closure Restrictions Section 3.6 of the 1990 Definition has been deleted.

G.17 Specifications open and local specifications have been criticised on the grounds of programming methodology[3]. Also, they are no longer needed for defining the derived forms for functors and they conflict with a desire to have all signatures be type-explicit.


SML '96 therefore admits neither open nor local in specifications. Moreover, sequential specifications must not specify the same identifier twice. As a consequence, the definition of type-explication has been removed: type-explication is automatically preserved by elaboration (if one starts in the initial basis) so there is no need to impose type-explicitness explicitly.

G.18 Scope of Explicit Type Variables A binding construct for explicit type variables has been introduced at val and fun (see Figure 21). For example, one can declare the polymorphic identity function by

    fun 'a id(x:'a) = x

There is no requirement that all explicit type variables be bound by this binding construct. For those that are not, the scope rules of the 1990 Definition apply. The explicit binding construct has no impact on the dynamic semantics. In particular, there are no explicit type abstractions or applications in the dynamic semantics.
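The binding construct can be used at val as well as at fun, and with more than one type variable. A short sketch (swap is an illustrative name):

```sml
fun 'a id (x : 'a) = x

(* Two explicitly bound type variables at a val declaration. *)
val ('a, 'b) swap = fn (x : 'a, y : 'b) => (y, x)
```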

G.19 The Initial Basis To achieve a clean interface to the new Standard ML Basis Library[17], the initial basis (Appendices C and D) has been cut down to a bare minimum. The present Definition only provides what is necessary in order to define the derived forms and special constants of type int, real, word, char and string. The following identifiers are no longer defined in the initial basis: <>, ^, !, @, Abs, arctan, chr, Chr, close_in, close_out, cos, Diff, Div, end_of_stream, exp, Exp, explode, floor, Floor, implode, input, instream, Interrupt, Io, ln, Ln, lookahead, map, Mod, Neg, not, real (the coercion function), rev, sin, size, sqrt, Sqrt, std_in, std_out, Sum, output, outstream, Prod, Quot. The corresponding basic values have also been deleted.

G.20 Overloading The Standard ML Basis Library[17] rests on an overloading scheme for special constants and pre-defined identifiers. We have adopted this scheme (see Appendix E).
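The following sketch shows the kind of overloading meant here, assuming that + is overloaded at int, word and real and that int is the default when the context does not determine a type (see Appendix E). The identifiers are illustrative.

```sml
val i = 1 + 2                    (* + at type int  *)
val w = 0w1 + 0w2                (* + at type word *)
val r = 1.5 + 2.5                (* + at type real *)

fun double x = x + x             (* no constraint: defaults to int -> int *)
fun doubleR (x : real) = x + x   (* constrained to real -> real *)
```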

References

[1] Maria Virginia Aponte. Extending record typing to type parametric modules with sharing. In Proc. of the Twentieth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), pages 465-478. ACM Press, January 1993.

[2] Andrew Appel, David MacQueen, Robin Milner, and Mads Tofte. Unifying exceptions with constructors in Standard ML. LFCS Report Series ECS-LFCS-88-55, Laboratory for Foundations of Computer Science, Edinburgh University, Mayfield Rd., EH9 3JZ Edinburgh, U.K., June 1988.

[3] Andrew W. Appel. A critique of Standard ML. Journal of Functional Programming, 3(4):391-429, October 1993.

[4] Andrew W. Appel and David B. MacQueen. A Standard ML compiler. In Gilles Kahn, editor, Functional Programming Languages and Computer Architecture. ACM, Springer-Verlag, Sept 1987.

[5] Lennart Augustsson and Thomas Johnsson. Lazy ML user's manual. Technical report, Department of Computer Science, Chalmers University of Technology, 1987.

[6] Dave Berry. The Edinburgh SML Library. Technical Report ECS-LFCS-91-148, Laboratory for Foundations of Computer Science, Department of Computer Science, Edinburgh University, April 1991.

[7] Dave Berry. Lessons from the design of a Standard ML library. Journal of Functional Programming, 3(4):527-552, October 1993.

[8] Lars Birkedal, Nick Rothwell, Mads Tofte, and David N. Turner. The ML Kit (Version 1). Technical Report DIKU-report 93/14, Department of Computer Science, University of Copenhagen, Universitetsparken 1, DK-2100 Copenhagen, 1993.

[9] R. M. Burstall and R. Popplestone. POP-2 reference manual. In Dale and Michie, editors, Machine Intelligence 2. Oliver and Boyd, 1968.

[10] Rod Burstall and Joseph A. Goguen. Putting theories together to make specifications. In Proc. Fifth Int'l Joint Conf. on Artificial Intelligence, pages 1045-1058, 1977.

[11] Rod Burstall, David MacQueen, and Donald Sannella. HOPE: An experimental applicative language. In Proc. 1980 LISP Conference, pages 136-143, Stanford, California, 1980. Stanford University.

[12] Guy Cousineau, Pierre-Louis Curien, and Michel Mauny. The categorical abstract machine. Science of Computer Programming, 8, May 1987.

[13] H. B. Curry. Modified basic functionality in combinatory logic. Dialectica, 23:83-92, 1969.

[14] Luis Damas and Robin Milner. Principal type schemes for functional programs. In Proc. Ninth ACM Symposium on Principles of Programming Languages, pages 207-212, 1982.

[15] Luis Manuel Martins Damas. Type Assignment in Programming Languages. PhD thesis, Edinburgh University, 1985.

[16] Thierry Despeyroux. Executable specifications of static semantics. In Gilles Kahn, David MacQueen, and Gordon Plotkin, editors, Semantics of Data Types, volume 173 of Lecture Notes in Computer Science. Springer Verlag, June 1984.

[17] E.R. Gansner and J.H. Reppy (eds.). The Standard ML Basis Library reference manual. (In preparation).

[18] Michael Gordon, Robin Milner, and Christopher Wadsworth. Edinburgh LCF: A Mechanized Logic of Computation, volume 78 of Lecture Notes in Computer Science. Springer Verlag, 1979.

[19] M.J.C. Gordon, R. Milner, L. Morris, M.C. Newey, and C.P. Wadsworth. A metalanguage for interactive proof in LCF. In Proc. Fifth ACM Symposium on Principles of Programming Languages, Tucson, AZ, 1978.

[20] Robert Harper and Mark Lillibridge. A type-theoretic approach to higher-order modules with sharing. In Proc. Twenty-First ACM Symposium on Principles of Programming Languages, pages 123-137, Portland, OR, January 1994.

[21] Robert Harper and Mark Lillibridge. A type-theoretic approach to higher-order modules with sharing. In Conference Record of POPL '94: 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 123-137. ACM Press, January 1994.

[22] Robert Harper, David MacQueen, and Robin Milner. Standard ML. Technical Report ECS-LFCS-86-2, Laboratory for Foundations of Computer Science, Edinburgh University, March 1986.

[23] Robert Harper, Robin Milner, and Mads Tofte. A type discipline for program modules. In Proc. Int'l Joint Conf. on Theory and Practice of Software Development (TAPSOFT), pages 308-319. Springer-Verlag, Mar. 1987. Lecture Notes in Computer Science, Vol. 250.

[24] Robert Harper and John C. Mitchell. On the type structure of Standard ML. ACM Trans. on Prog. Lang. and Sys., 15(2):211-252, April 1993.

[25] J. Roger Hindley. The principal type scheme of an object in combinatory logic. Transactions of the American Mathematical Society, 146:29-40, 1969.

[26] Stefan Kahrs. Mistakes and ambiguities in the Definition of Standard ML. Technical Report ECS-LFCS-93-257, Dept. of Computer Science, University of Edinburgh, 1993.

[27] Peter J. Landin. The next 700 programming languages. Comm. ACM, 9(3):157-164, 1966.

[28] Xavier Leroy. Polymorphism by name. In Proc. Twentieth ACM Symposium on Principles of Programming Languages, January 1993.

[29] Xavier Leroy. Manifest types, modules and separate compilation. In Conference Record of POPL '94: 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 109-122. ACM Press, January 1994.

[30] Xavier Leroy. The Caml Special Light system. Software and documentation available on the Web, http://pauillac.inria.fr/csl/, 1995.

[31] D. MacQueen. Modules for Standard ML. In Conf. Rec. of the 1984 ACM Symp. on LISP and Functional Programming, pages 198-207, Aug. 1984.

[32] David MacQueen. Using dependent types to express modular structure. In Proc. Thirteenth ACM Symposium on Principles of Programming Languages, 1986.

[33] David B. MacQueen. Structures and parameterisation in a typed functional language. In Proc. Symposium on Functional Programming and Computer Architecture, Aspinas, Sweden, 1981.

[34] David B. MacQueen and Mads Tofte. A semantics for higher-order functors. In Donald Sannella, editor, Proceedings of the 5th European Symposium on Programming (ESOP), volume 788 of Lecture Notes in Computer Science, pages 409-423. Springer-Verlag, 1994.

[35] John McCarthy. LISP 1.5 Programmer's Manual. MIT Press, 1956.

[36] D. Meredith. In memoriam Carew Arthur Meredith. Notre Dame Journal of Formal Logic, 18:513-516, 1977.

[37] Robin Milner. A theory of type polymorphism in programming languages. J. Computer and Systems Sciences, 17:348-375, 1978.

[38] Robin Milner. How ML evolved. Polymorphism: The ML/LCF/Hope Newsletter, 1(1), 1983.

[39] Robin Milner. Changes to the Standard ML core language. Technical Report ECS-LFCS-87-33, Laboratory for Foundations of Computer Science, Edinburgh University, 1987.

[40] Robin Milner and Mads Tofte. Commentary on Standard ML. MIT Press, 1991.

[41] Robin Milner, Mads Tofte, and Robert Harper. The Definition of Standard ML. MIT Press, 1990.

[42] James H. Morris. Lambda Calculus Models of Programming Languages. PhD thesis, MIT, 1968.

[43] Colin Myers, Chris Clack, and Ellen Poon. Programming with Standard ML. Prentice Hall, 1993.

[44] Lawrence C. Paulson. ML for the Working Programmer (2nd edition). Cambridge University Press, 1996.

[45] Lawrence C. Paulson. Logic and Computation: Interactive Proof with LCF. Cambridge Tracts in Theoretical Computer Science. Cambridge University Press, 1987.

[46] Gordon Plotkin. A structural approach to operational semantics. Technical Report DAIMI-FN-19, Computer Science Department, Aarhus University, 1981.

[47] John A. Robinson. A machine-oriented logic based on the resolution principle. J. ACM, 12(1):23-41, 1965.

[48] Donald Sannella and Andrzej Tarlecki. Program specification and development in Standard ML. In Proc. Twelfth ACM Symposium on Principles of Programming Languages, New Orleans, 1985.

[49] Ryan Stansifer. ML Primer. Prentice Hall, 1992.

[50] Chris Stone and Robert Harper. A type-theoretic account of Standard ML 1996. Technical Report CMU-CS-96-136, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213-3891, May 1996.

[51] David Tarditi, Greg Morrisett, Perry Cheng, Chris Stone, Robert Harper, and Peter Lee. TIL: A type-directed optimizing compiler for ML. In Proc. ACM SIGPLAN Symposium on Programming Language Design and Implementation, Philadelphia, PA, May 1996.

[52] Mads Tofte. Operational Semantics and Polymorphic Type Inference. PhD thesis, Edinburgh University, Department of Computer Science, Edinburgh University, Mayfield Rd., EH9 3JZ Edinburgh, May 1988. Available as Technical Report CST-52-88.

[53] Mads Tofte. Type inference for polymorphic references. Information and Computation, 89(1), November 1990.

[54] Mads Tofte. Principal signatures for higher-order program modules. Journal of Functional Programming, 4(3):285-335, July 1994.

[55] Jeffrey D. Ullman. Elements of ML Programming. Prentice Hall, 1994.

[56] Andrew Wright. Simple imperative polymorphism. Journal of Lisp and Symbolic Computation, 8(4):343-355, December 1995.

Index

Concepts that are associated with a special name (e.g., a meta-variable or a mathematical symbol) are listed under both the special name and the full name, often with the former entry being a see reference to the latter. For example, "value binding" is indexed as follows:

    valbind. see value binding
    ValBind (value bindings), 7
    ···
    value binding (valbind), 7, 8, 10, 24, 44
        recursive (rec), 3, 9, 10, 24, 44, 56
        simple, 10, 24, 39, 44

Items are ordered lexicographically, by extending the usual alphabetical ordering on letters using the rule space = hyphen < symbol or greek letter < digit < latin letter. Upper and lower case letters are regarded as equal and font does not affect ordering. Adjective-noun compounds are normally found under the noun. For example, "nonexpansive expression" is found under "expression". The index includes concepts and identifiers which have been removed from the 1990 Definition.

( )
    around expression, 10, 21, 41, 63
    around pattern, 8, 26, 46, 65
    in sequence, 57, 63
    in tuple expression, 57, 63
    in tuple pattern, 58, 65
    in type expression, 8, 27
[ ], 57, 58, 63, 65
{ }
    in atomic expression, 10, 21, 41, 63
    in pattern, 8, 26, 46, 65
    in record type expression, 8, 27, 65
(* *) (comment brackets), 4, 5, 87
⟨ ⟩. see option
() (0-tuple), 57, 58, 63, 65
{} (empty map), 15
, (comma), 3, 7, 57, 58, 61, 63, 65
; (semicolon), 3, 10, 13, 14, 54, 57, 63, 64, 87
. (period)
    in real constants, 3
    in long identifiers, 4
... (wildcard pattern row), 8, 26, 27, 46, 65
_ (underbar)
    in identifier, 5
    wildcard pattern, 8, 26, 45, 65
|
    in identifier, 5
    reserved word, 3, 10, 63, 64, 65
=
    basic value (value equality), 38, 67
    in identifier, 5
    reserved word, 3, 5, 8, 10, 13, 14, 58, 59, 60, 63, 64, 65
    the identifier, 5, 66, 67
⇒, vii, 2, 20, 31, 40, 49, 54
→. see function type
fin→ (finite map), 16
=>, 3
    in a match rule, 10, 63
->, 3, 8, 27, 65
~, 3, 69
", 3 "", 4 ", 3, 5 "", 4
!, 5, 88
%, 5
&, 5
$, 5
#
    in character constant, 4
    in record field selection, 57, 63
    in identifier, 5
+, 5, 69
-, 5, 69
/, 5, 69
:
    in identifier, 5
    reserved word, 3, 10, 13, 14, 58, 59, 63, 64, 65
    see also structure expression; expression
::, 9, 12, 57, 66, 67, 87
:=. see assignment
:>, 11, 13, 59
    see also structure expression
<, 5, 69
>, 5, 69
<=, 69
>=, 69
<>, 88
?, 5
@, 5, 88
', 5
`, 5
^, 5, 88
*, 5
    in type expression, 58, 65
    multiplication, 69
+. see modification
⊕, 17
*. see expression, fn
Λ (in type function), 16, 17, 18, 23, 25, 29, 33, 34
∀ (in type scheme), 17, 18, 19
    see also generalisation; polymorphism
α. see type variable
ϱ. see row type
τ. see type
τ^(k). see type vector
σ. see type scheme
∀α^(k).τ. see type scheme
↓. see restriction
θ. see type function
(θ, VE). see type structure
Λα^(k).τ. see type function
Σ. see signature
(T)E. see signature
Φ. see functor signature
(T)(E, (T')E'). see functor signature
φ_Ty (formerly type realisation), 82
φ_Str (formerly structure realisation), 82
φ. see type realisation
≥. see instance
≻. see generalisation; enrichment
⊢ (turnstile), vii, 2, 20, 31, 40, 49, 54
⊢_DYN (evaluation), 54
⊢_STAT (elaboration), 54
0w. see word constant
0wx. see word constant
0x. see integer constant

A

a. see address
\a, 3
Abs (abstype operation), 20, 23
abs, 69
Abs, 88
abstraction, 78
abstype, 3
    see also declaration
addition of numbers (+), 5, 69
Addr (addresses), 37
address (a), 37, 47
    fresh, 42
admissibility, 86
alphabets (in string and character constants), 3, 4
and, 3, 10, 13, 14, 59, 60, 64
andalso, 3
    see also expression
appending lists (@), 88
appexp (application expression), 61, 63
application
    infixed, 6, 61, 63, 87
    of :=, 42
    of basic value (APPLY), 42
    of exception name, 41
    of function closure, 42
    of functor, 13, 31, 32, 50, 59
    of ref, 42
    of type function, 18, 27, 34
    of value constructor, 41
APPLY. see application
arctan, 88
arity
    of type function, 17
    of type name, 15
arrow type. see function type
as, 3
    see also atomic pattern, layered
assignment (:=), 38, 42, 66, 67
atexp. see atomic expression
atomic expression (atexp), 7, 10, 41, 63
    () (0-tuple, unit value), 57, 63
    as expression, 10, 22, 41, 63
    let, 10, 21, 41, 63, 84
    list expression ([ ,···, ]), 57, 63
    long value identifier, 10, 21, 41, 63
    parenthesised (( )), 10, 21, 41, 63
    record expression, 10, 21, 41, 63
    record selector (#lab), 57, 63
    sequence expression (( ;···; )), 57, 63
    tuple expression (( ,···, )), 57, 63
    see also special constant
atomic pattern (atpat), 7, 8, 26, 45-46, 65
    () (0-tuple), 58, 65
    as pattern, 8, 26, 46, 65
    list pattern ([ ,···, ]), 58, 65
    long value identifier, 8, 26, 45, 65
    parenthesised (( )), 8, 26, 46, 65
    record pattern, 8, 26, 46, 65
    tuple pattern (( ,···, )), 58, 65
    wildcard (_), 8, 26, 45, 65
    see also special constant
atpat. see atomic pattern

B

b. see value
\b, 3
B. see basis
B0 (initial basis)
    dynamic, 67, 88
    static, 66, 88
bare language, 1
BasExName (basic exception names), 39
basis (B), 1
    combined, 1, 54
    dynamic, 1, 48, 54, 67
    initial. see B0
    static, 1, 21, 29, 31, 54, 66
Basis (bases), 29, 48
Basis Library, 2, 4, 76, 88
BasVal (basic values), 37, 88
B_DYN (dynamic basis), 54
Bind (exception), 39, 44, 66, 67
bool, 66, 67
bound names, 29, 30
B_STAT (static basis), 54

C c (value constructor status), 15, 16

see also identifier status descriptor;

value identifier C. see context CAML, 73 Caml Light, 74, 77 case, 3

see also expression CE (constructor environment), 83 char (the type), 15, 66, 67 Char (overloading class), 68 character constant, 4

see also special constant chr, 88 Chr, 88 CLEAR, 71, 72 Clos (closure of types etc.), 19, 23, 25, 33,

35

close_in, 88 close_out, 88 closure. see function closure; functor closure; Clos; closure rules

closure rules (signatures and functors), 87 coercion of numbers (real), 88 comment in program, 4, 5, 87 Commentary on Standard ML, vii composition of functions (o), 88 con (value constructor), 82 Con (value constructors), 82 conbind . see constructor binding ConBind (constructor bindings), 7 concatenating strings (^), 88 condesc. see constructor description ConDesc (constructor descriptions), 12 ConEnv (constructor environments), 83 "consing" an element onto a list. see :: consistency, 86 constant, special. see special constant constructor binding (conbind ), 7, 10, 25,

44, 64 constructor description (condesc), 12, 14,

35, 53 constructor environment (CE), 83 ConsType (constructed types), 16 context (C), 16, 17, 19, 20, 21, 32 Context (contexts), 16 control character, 3 Core Language, 1

dynamic semantics, 37-47 static semantics, 15-28 syntax, 3-10, 55 Core Language Programs, 55 cos, 88 cover, 82 cycle-freedom, 86

D datatype, 3

see also declaration; specification datatype binding (datbind ), 7, 8, 10, 25,

44, 64

datatype declaration. see declaration datatype description (datdesc), 12, 14, 35,

52 datatype replication. see declaration; specification

datatype specification. see specification datbind . see datatype binding DatBind (datatype bindings), 7 datdesc. see datatype description DatDesc (datatype descriptions), 12 dec. see declaration Dec (declarations), 7 decimal notation. see integer constant;

word constant declaration (dec), 7, 10, 23-24, 43-44, 64

abstype, 3, 10, 23, 44, 58, 64 as structure-level declaration, 13, 32,

50 datatype replication, 10, 23, 44, 64,

83-84 datatype, 3, 10, 23, 43, 58, 64 empty, 10, 23, 44, 64 exception, 3, 10, 23, 44, 64 function (fun), 3, 56, 58, 64 infix, 3, 6, 10, 64 infixr, 3, 6, 10, 64 local, 3, 10, 23, 44, 64 nonfix, 3, 6, 10, 64 open, 3, 10, 23, 44, 64 sequential (;), 3, 10, 24, 44, 64 type, 3, 10, 23, 43, 64 value (val), 3, 10, 23, 43, 64 see also structure-level declaration; top-level declaration

dereferencing (!), 88 derived forms, 1, 7, 56-60 Diff, 88 digit

in identifier, 5 in integer- and real constants, 3 div, 69 Div, 88 division of reals (/), 69

do, 3

see also expression Dom (domain), 15

E e. see exception value e (exception constructor status), 15, 16 see also identifier status descriptor; value

identifier [e]. see packet e (exponent), 3 E (exponent), 3 E. see environment Edinburgh ML, 73 EE (exception constructor environment),

83 elaboration, 1, 2, 21 else, 3

see also expression en. see exception name end, 3, 10, 13, 57, 58, 63, 64 end_of_stream, 88 enrichment (≻), 30, 31, 82 ens. see exception name set Env (environments), 16, 38 environment (E)

dynamic, 38, 39, 41-47, 48, 49, 50-51,

67 principal, 85-86 static, 16, 17, 21, 23-24, 29, 30, 31-36 eqtype, 11

see also specification equality

admit equality, 15, 17, 20, 29, 33, 34,

66 maximise equality, 20, 23, 34 on abstract types, 20 of structures (sharing), 79 of values. see = respect equality, 20 equality attribute

of type name, 15, 17, 20, 24, 33, 34,

80

of type variable, 5, 15, 17, 18, 19, 66 equality type, 17, 66 equality type function, 18 equality type specification. see specification

equality type variable, 5, 15, 17, 18, 19,

66 escape sequence, 3-4 EtyVar (equality type variables), 5, 15 evaluation, vi, 1, 2, 40, 50 exbind . see exception binding ExBind (exception bindings), 7 exception, 3

see also declaration; specification exception binding (exbind ), 7, 8, 10, 25,

44, 64 exception constructor, 16, 82

see also value identifier exception constructor environment (EE),

83 exception convention, 40 exception declaration. see declaration. exception description (exdesc ), 12, 14, 35,

53 exception name (en), 37

fresh, 37, 45 as value, 38 exception name set (ens), 38, 45 exception packet. see packet exception specification. see specification exception value (e), 38, 42 excon (exception constructor), 82 ExCon (exception constructors), 82 ExConEnv (exception constructor environments), 83

exdesc. see exception description ExDesc (exception descriptions), 12 execution, 1, 54 exn, 22, 66, 67 ExName (exception names), 37 ExNameSet (exception name sets), 38 exp. see expression Exp (expressions), 7

exp (exponential), 88 Exp (exception), 88 explode (operation on strings), 88 expression (exp), 7, 10, 63

application, 10, 22, 63 case, 57, 63 conditional (if ··· then ··· else), 57, 63 expansive, 19, 20, 82 explicitly typed (:), 10, 22, 63 handle, 10, 22, 40, 42, 63 infixed, 10, 63 lambda-abstraction (fn), 10, 22, 42,

63 non-expansive, 19, 20, 82, 87 raise, 10, 22, 42, 63 sequential and (andalso), 57, 63 sequential or (orelse), 57, 63 while ··· do, 57, 63 see also atomic expression expression row (exprow ), 7, 8, 10, 22, 41,

63 exprow . see expression row ExpRow (expression rows), 7 ExVal (exception values), 38

F F. see functor environment \f, 3 FAIL (failure in pattern matching), 37,

45, 46 false, 9, 12, 57, 66, 67, 87 FAM (Functional Abstract Machine), 73 FcnClosure (function closures), 38, 39

→fin (finite map), 16

Fin (finite subset), 16 floor, 88 Floor, 88 fn(*), 3

see also expression formatting character, 4 fun, 3

see also declaration

funbind . see functor binding FunBind (functor bindings), 12 function closure, 38, 39, 42 function declaration. see declaration function type (→), 16, 22, 25, 26, 27, 35,

66 function type expression (->), 3, 8, 27, 65 function-value binding (fvalbind ), 56, 58,

64 exhaustive, 28 functor, 11 see also functor declaration functor application, 13, 31, 32, 50, 59 functor binding (funbind ), 12, 14, 36, 53,

59 functor closure, 48 functor declaration (fundec), 12, 14, 35,

53, 59 in top-level declaration, 14, 36, 53 functor description, 87 functor environment (F )

dynamic, 48, 50, 53 static, 29, 31, 35, 36 functor identifier (funid ), 11 functor signature (Φ), 29, 30, 31, 32, 36 functor signature expression, 87 functor signature matching, 87 functor specification, 87 FunctorClosure (functor closures), 48 fundec (functor declaration), 12, 14, 35,

53 FunDec (functor declarations), 12 fundesc (functor description), 87 FunDesc (functor descriptions), 87 FunEnv (functor environments), 29, 48 funid . see functor identifier FunId (functor identifiers), 11 FunSig (functor signatures), 29 funsigexp (functor signature expression),

87 FunSigExp (functor signature expressions),

87 funspec (functor specification), 87

FunSpec (functor specifications), 87 FunType (function types), 16 FvalBind (function-value bindings), 56 fvalbind . see function-value binding

G G. see signature environment generalisation (≻), 18, 21, 26, 30 grammar,

for the Core, 7-10 for Modules, 11-14

H handle, 3

see also expression hexadecimal notation. see integer constant; word constant

HOPE, 71, 72

I I. see interface IB. see interface basis identifier (id ), 4-5, 11

alphanumeric, 5 long, 4 qualified, 4, 5 rebinding of, 8-9, 12, 24 symbolic, 5 see also atomic expression; atomic pattern; value identifier

IdStatus (identifier status descriptors), 15 identifier status descriptor (v, c, e),

defined, 15, 16 dynamic semantics, 39, 41, 45, 46, 47,

48, 49, 53, 67 motivated, 82-83 signature matching, 30, 82-83 static semantics, 19, 21, 25, 26, 27,

30, 35 if, 3

see also expression implementation, 73-75

implode (string operation), 88 in (injection), 17 in, 3, 13, 63, 64 include, 11

see also specification inference rules

dynamic semantics (Core), 40-47 dynamic semantics (Modules), 49-53 programs, 55 static semantics (Core), 21-27 static semantics (Modules), 31-36 infexp (infix expression), 61, 63 InfExp (infix expressions), 61, 63 infix, 3

see also declaration infix identifier, 6, 61, 63, 87

associativity, 6, 87 precedence, 6 scope of fixity directive, 6, 11 see also expression; pattern infixr, 3

see also declaration injection (in), 17 input, 88 input/output, 88 instance (≥)

in matching, 30, 31 of functor signature, 30, 31, 32 of signature, 30, 31, instream, 88 int, 15, 66, 67 Int (overloading class), 68 Int (interfaces), 48 IntBasis (interface bases), 48 integer constant, 3

decimal notation, 3 hexadecimal notation, 3 see also special constant Inter, 48-49 interaction, 1, 54-55 interface (I), 48, 49, 50, 51, 52, 53 interface basis (IB), 48, 49, 51, 52, 53 Interrupt, 88

Io, 88 ISWIM, 70 it, 9, 12, 59, 87

K keyword. see reserved word

L L (left associative), 6, 8 lab. see record label Lab (record labels), 4 lambda-abstraction (fn). see expression Lazy ML, 73 LCF, 70, 73 let, 3

see also atomic expression; structure

expression letter in identifier, 5 lexical analysis, 5 libraries, 66, 76

see also Basis Library LISP, 70 list, 66, 67 list reversal (rev), 88 literal. see special constant ln, 88 Ln, 88 local, 3

see also declaration; structure-level declaration

lookahead, 88

M m (structure name), 82 M (structure name set), 82 map, 88 match (match), 7, 10, 23, 43, 63

exhaustive, 28, 39 in function closure, 38, 39, 42 irredundant, 28, 39 Match (matches), 7 Match (exception), 39, 42, 66, 67

match rule (mrule), 7, 10, 23, 43, 63 matching

signatures. see signature matching functor signatures, 87 maximise equality. see type environment mem. see memory Mem (memories), 38 memory (mem), 38, 42, 47 ML Kit, 74 MLWorks, 74 mod, 69 Mod, 88 modification (+)

of finite maps, 16 of environments, 17 module, v

see also structure-level declaration; functor declaration

Modules, 1 Moscow ML, 74 mrule. see match rule. Mrule (match rules), 7 multiplication of numbers (*), 69

N \n (newline), 3 name. see exception name; type name;

structure name Natural Semantics, 2 Neg, 88 negation of booleans (not), 88 negation of numbers (~), 3, 69 nil, 9, 12, 66, 67, 87 nonfix, 3

see also declaration not, 88 Num (overloading class), 68 NumTxt (overloading class), 68

O o (function composition), 88 of (projection), 17, 29

of, 3

in case expression, 58, 63 in constructor binding, 10 in constructor description, 14 in exception binding, 10 in exception description, 14 op,

before value identifier, 6, 10, 64 in constructor binding, 10 opaque signature constraint. see structure expression

open, 3

see also declaration; specification open_in, 88 open_out, 88 option, 7, 21

first (⟨ ⟩), 21 second (⟨⟨ ⟩⟩), 21 ord (of string), 88 Ord, 88 orelse, 3

see also expression output, 88 outstream, 88 overloading, 68-69, 88

P p. see packet Pack (packets), 38 packet (p), 38, 40, 41, 42, 43, 44, 55 parsing, 1, 61, 62 pat . see pattern. Pat (patterns), 7 patrow . see pattern row. PatRow (pattern rows), 7 pattern (pat), 7, 8, 46-47, 65

constructed, 8, 26, 46, 65 infix, 8, 65 typed (:), 8, 26, 65 layered (as), 8, 27, 47, 58, 65 see also atomic pattern pattern matching, 27, 28, 45-47

with ref, 47

pattern row (patrow), 7, 8, 26, 46, 65

wildcard(...), 8, 26, 27, 46, 65 Poly/ML, 73 polymorphism, 18, 19, 20, 21, 23, 24, 26,

30, 70-71, 72, 82 POP2, 70 POPLOG, 75 precedence, 6 principal types, v, 85, 86 printable character, 3 Prod, 88 program (program), 1, 54, 59 Program (programs), 1, 54 projection (of), 17, 29

Q Quot, 88

R r. see record \r, 3 R (right associative), 6, 8 raise, 3

see also expression Ran (range), 15 Real (overloading class), 68 real

the type, 15, 66, 67 coercion, 88 real constant, 3 realisation. see type realisation RealNum (overloading class), 68 rec, 3, 9, 10, 24, 44, 56

see also value binding Rec (recursion operator), 39, 42, 44 Record (records), 38 record (r), 38, 41, 46

see also row type; expression row; pattern row

record label (lab), 4, 5, 8, 65 record selector (# lab), 57, 63 record type. see row type

record type expression, 8, 27, 65 recursion. see rec; Rec; fun ref

the type constructor, 9, 12, 66, 67, 87 the type name, 17, 20, 66 the value constructor, 9, 12, 19, 42,

47, 66, 67, 87 references

creation. see ref dereferencing. see ref polymorphic, 19, 71, 82 renaming

of type names, 29, 32 of type variables, 18 reserved word, 3, 11 respect equality (see equality) restrictions

closure rules, 87 syntactic (Core), 8-9, 27-28 syntactic (Modules), 12 rev, 88 row type (ϱ), 16, 22, 26, 27 RowType, 16

S s. see state SCon (special constants), 4 scon. see special constant scope

of fixity directive, 6, 11 of explicit type variable, 18, 22, 56,

88 of constructorhood. see identifier status descriptor

SE. see structure environment semantic object, 6, 15

compound (Core, Dynamic), 37-38 compound (Core, Static), 15-16 compound (Modules, Dynamic), 48 compound (Modules, Static), 29 simple (Dynamic), 37 simple (Static), 15 well-formed, 20, 33

semantics

of Core, 15-28, 37-47 of Modules, 29-36, 48-53 of Programs, 54-55 sentence, vii, 2, 20, 31, 40, 49, 54 sharing, 3

see also specification SI. see structure interface side-condition, 21 side-effect, 50, 55

see also assignment; ref sig, 11

see also signature expression Sig (signatures), 29 sigbind . see signature binding SigBind (signature bindings), 12 sigdec. see signature declaration SigDec (signature declarations), 12 SigEnv (signature environments), 29, 48 sigexp. see signature expression SigExp (signature expressions), 12 sigid . see signature identifier SigId (signature identifiers), 11 signature (Σ), 29, 30, 31, 32, 33, 36, 49

principal, 77, 86 type-explicit, 82 signature, 11

see also signature declaration signature binding (sigbind ), 12, 13, 33, 51 signature declaration (sigdec), 12, 13, 33,

51 in top-level declaration, 14, 36, 53 signature environment (G)

dynamic, 48, 49, 51, 53 static, 29, 33, 36 signature expression (sigexp), 12, 13, 32-33, 51

basic (sig ··· end), 13, 32, 51 signature identifier, 13, 33, 51 type realisation (where type), 13, 33,

48, 77-78 signature identifier (sigid ), 11

as signature expression, 13, 33, 51

signature instantiation. see instance signature matching, 30, 31, 32

see also structure expression (: and :>)

sin, 88 size (of strings), 88 spec. see specification Spec (specifications), 12 special constant (scon), 4, 15, 86

as atomic expression, 8, 21, 41, 63 as atomic pattern, 8, 26, 45, 65 special value (sv), 37 specification (spec), 12, 14, 33-34, 51-52

datatype, 14, 33, 51 datatype replication (datatype ··· = datatype), 14, 33, 52

empty, 14, 34, 52 equality type (eqtype), 14, 33, 51 exception, 14, 34, 52 include, 14, 34, 52, 60, 78 local, 87 open, 87 sequential (;), 14, 34, 52, 88 structure, 14, 34, 52 structure sharing (sharing), 56-57,

60, 81 type, 14, 33, 51 type abbreviations in, 12, 13, 60, 77-78

type sharing (sharing type), 14, 34,

48, 79-80 value (val), 14, 33, 51 sqrt (square root), 88 Sqrt, 88 Standard ML (SML), iv, 70-76

Bare language, 1 Core language, 1, 55, 70-71 history of, 70-76 implementations, 73-75 learning about, vii Modules, 1, 71-72 origins of, 70 Programs, 1, 55

revision of, iii, 77-78 semantics, 75-76 Standard ML of New Jersey, 74 state (s), 38, 40, 42, 47, 50, 54, 55 State, 38 state convention, 40, 42 std_in, 88 std_out, 88 strbind . see structure binding StrBind (structure bindings), 12 strdec. see structure-level declaration StrDec (structure-level declarations), 12 strdesc. see structure description StrDesc (structure descriptions), 12 stream (input/output), 88 StrEnv (structure environments), 16, 38 strexp. see structure expression StrExp (structure expressions), 12 strid . see structure identifier StrId (structure identifiers), 4 String (overloading class), 68 string, 15, 66, 67 string constant, 3

see also special constant StrInt (structure interfaces), 48 struct, 11

see also structure expression structure (semantic object), 29 structure, 11

see also structure-level declaration; specification

structure binding (strbind ), 12, 32, 51, 59 structure declaration. see structure-level

declaration structure description (strdesc), 12, 14, 35,

53 structure environment (SE)

dynamic, 38, 49, 51, 67 static, 16, 17, 29, 30, 32, 35, 66 structure expression (strexp), 12, 13, 31-32, 50

basic (struct ··· end), 13, 31, 50 functor application, 13, 31, 32, 50, 59


let, 13, 31, 50 long structure identifier, 13, 31, 50 opaque signature constraint (:>), 13,

31, 50, 59, 78-79 transparent signature constraint (:),

13, 31, 50, 59, 78-79 structure identifier (strid ), 4

as structure expression, 13, 31, 50 structure interface (SI), 48, 49, 53 structure-level declaration (strdec), 12, 13,

32, 50-51 empty, 13, 32, 51 in top-level declaration, 14, 36, 53 local (local ··· in ··· end), 13, 32, 51,

87 of structure (structure), 13, 32, 50 sequential (;), 13, 32, 51 see also declaration; structure binding

structure name (m), 82 structure realisation (φStr), 82 structure specification. see specification subtraction of numbers (-), 69 Sum, 88 SVal (special values), 37 Supp (support), 29 sv. see special value syntactic sugar. see derived form symbol, 5 syntax,

of Core, 3-10, 37 of Modules, 11-14 of Programs, 54

T t. see type name T . see type name set TE. see type environment textbooks, vii then, 3

see also expression TI. see type interface TIL, 74

topdec. see top-level declaration TopDec (top-level declarations), 12 top-level declaration (topdec),

in program, 54-55 sequential (no ;), 14, 36, 53 true, 9, 12, 57, 66, 67, 87 truncation of reals (floor), 88 tuple. see atomic expression; atomic pattern

ty. see type expression Ty (type expressions), 7 tycon. see type constructor TyCon (type constructors), 4 TyEnv (type environments), 16, 38 TyInt (type interfaces), 48 TyName (type names), 15 tynames (free type names), 16, 21, 23, 24,

31, 32, 33, 35, 36 TyNameSet (type name sets), 16 typbind . see type binding TypBind (type bindings), 7 typdesc. see type description TypDesc (type descriptions), 12 type (τ), 16, 17, 18, 19, 21-23, 25-27, 33,

34, 35, 66 constructed (τ(k) t), 16, 17, 18, 20, 25,

26 as type scheme, 18 default, 68 imperative, 70 principal, v, 85, 86 product, 58, 65 see also type expression; function type Type (types), 16 type, 3

see also declaration; specification type (function on special constants), 15,

21 type abbreviation in signature, 12, 13, 60,

77-78 type binding (typbind ), 7, 8, 10, 25, 44,

64

type constraint (:). see expression; pattern

type constructor (tycon), 4 type constructor name. see type name type declaration. see declaration type description (typdesc), 12, 14, 35, 52 type environment (TE)

dynamic, 38, 84, 43, 44, 45, 49, 84 respect equality, 20, 80 maximise equality, 20, 23, 34 static, 16, 20, 23, 24, 25, 34, 35, 66 type explication, 82 type expression (ty), 7, 8, 27, 58, 65 type-expression row (tyrow ), 7, 8, 27, 65 type function (θ), 16, 17, 18, 23, 25, 29,

33, 34 application of, 18, 27, 34 equality of, 17 type inference, 5

see also elaboration type interface (TI), 48, 49, 51, 52, 84 type name (t), 15

equality attribute of, 15, 17, 20, 24,

33, 34, 80 flexible, 33, 34, 79 fresh, 23, 24, 32, 34, 35 in constructed type, 26 in initial basis, 66 in type structure, 20, 23, 33, 34, 35 substitution for, 18, 29, 30, 33, 34 within functor body, 36 type name set (T ), 16, 17, 21, 24

in signatures, 29, 30, 31, 32, 33, 36 see also type name, fresh type realisation (φ), 29, 30, 32, 33, 34, 82 type scheme (σ), 16

generalising a type (σ ≻ τ), 18, 21,

26 generalising a type scheme (σ ≻ σ′),

18, 30 equality of, 18 type sharing. see specification type specification. see specification

type structure (θ, VE), 16, 23, 25, 27, 33,

34, 35, 49 enrichment, 30 in initial basis, 66 respect equality, 20, 80 well-formed, 20 type variable (tyvar, α), 4, 5, 15

applicative, 82 equality, 5, 15, 17, 18, 19, 66 explicit, 18, 22, 56, 88 imperative, 82 implicitly scoped, 18 in type expression, 8, 27 unguarded, 18 type variable set (U ), 16, 23 type vector (τ(k)), 16 TypeFcn (type functions), 16 TypeScheme (type schemes), 16 tyrow. see type-expression row TyRow (type-expression rows), 7 TyStr (type structures), 16 tyvar . see type variable TyVar (type variables), 4, 15, 16 tyvars (free type variables), 16, 20, 36 tyvarseq (type variable sequence), 7, 8, 12 TyVarSet, 16

U U. see type variable set \u, 4 unit, 66, 67

V v. see value \v, 3 v (value constructor status), 15, 16

see also identifier status descriptor;

value identifier val (function on special constants), 37 Val (values), 38 val, 3, 18

see also declaration; specification

valbind . see value binding ValBind (value bindings), 7 valdesc . see value description ValDesc (value descriptions), 12 value (v), 38, 41-47, 48, 67

basic (b), 37, 38, 42 ValEnv (value environments), 16, 38 ValInt (value interfaces), 48 value binding (valbind ), 7, 8, 10, 24, 44

recursive (rec), 3, 9, 10, 24, 44, 56 simple, 10, 24, 39, 44 value constructor, 16, 82

see also value identifier value declaration. see declaration value description (valdesc), 12, 14, 34, 52 value environment (VE),

dynamic, 38, 39, 41, 44-47, 49, 67 static, 16, 17, 20, 23-27, 30, 33-35,

49, 66 value identifier (vid , longvid ), 4, 8, 82

as atomic expression, 10, 21, 41, 63 as atomic pattern, 8, 26, 45, 65 as value, 38, 41, 45, 46, 67 status of (c, v, e), see identifier status

descriptor value interface (VI), 48, 49, 51, 52, 53,

82-83 value variable, 16, 82

see also value identifier value specification. see specification var (value variable) , 82 Var (value variables), 82 VarEnv (variable environments), 82 variable. see value variable variable environment (VE), 82 VE (variable environment), 82 VI. see value interface vid . see value identifier VId (value identifiers), 4, 82 view of a structure, 48

W well-formedness

of assembly, 20, 33, 86 of functor signature, 86 of signature, 86 of type structure, 20, 86 while, 3

see also expression wildcard pattern (_), 8, 26, 45, 65 wildcard pattern row (...), 8, 26, 27, 46,

65 with, 3

see also declaration, abstype withtype, 3, 56, 58 Word (overloading class), 68 word (the type), 15, 66, 67 word constant, 3

decimal notation, 3 hexadecimal notation, 3 see also special constant WordInt (overloading class), 68

Y Yield, 29