<!DOCTYPE html>
<html class="split chapter"><head><meta charset="utf-8"><title>6 Source Text # Ⓣ Ⓔ ① Ⓐ — Annotated ES5</title><link rel="stylesheet" href="style.css"><link href="x5.html" title="5 Notational Conventions " rel="prev">
<link href="spec.html" title="TOC" rel="index">
<link href="x7.html" title="7 Lexical Conventions " rel="next">
</head><body><div class="head">
<h2 id="top">Annotated ECMAScript 5.1 <span id="timestamp"></span></h2>
<div id="mascot-treehouse">
<img id="mascot" align="left" src="js-mascot.svg" alt=""><img id="bubble" src="bubble.svg" alt=""></div>
<p id="slogan">‟Ex igne vita”</p>
<div id="annotations"></div>
<script src="timestamp.js"></script></div>
<nav>
<a href="x5.html">← 5 Notational Conventions </a> –
<a href="spec.html" class="toc-nav">TOC</a> –
<a href="x7.html">7 Lexical Conventions →</a>
</nav>
<h2 id="x6">6 Source Text <a href="#x6">#</a> <a href="#x6-toc" class="bak">Ⓣ</a> <b class="erra">Ⓔ</b> <b class="rev1">①</b> <b class="anno">Ⓐ</b></h2>
<p>
ECMAScript
source text is represented as a sequence of characters in the
Unicode character encoding, version 3.0 or later. The text is
expected to have been normalised to Unicode Normalised Form C
(canonical composition), as described in Unicode Technical Report
#15. Conforming ECMAScript implementations are not required to
perform any normalisation of text, or behave as though they were
performing normalisation of text, themselves. ECMAScript source
text is assumed to be a sequence of 16-bit code units for the
purposes of this specification. Such a source text may include
sequences of 16-bit code units that are not valid UTF-16 character
encodings. If an actual source text is encoded in a form other than
16-bit code units it must be processed as if it was first converted to
UTF-16.</p>
<p class="def1">
<i>SourceCharacter </i><b>::</b></p>
<p class="def2-alt">
any
Unicode code unit</p>
<p>
Throughout
the rest of this document, the phrase “code unit” and the word
“character” will be used to refer to a 16-bit unsigned value
used to represent a single 16-bit unit of text. The phrase “Unicode
character” will be used to refer to the abstract linguistic or
typographical unit represented by a single Unicode scalar value
(which may be longer than 16 bits and thus may be represented by
more than one code unit). The phrase “code point” refers to such
a Unicode scalar value. “Unicode character” only refers to
entities represented by single Unicode scalar values: the components
of a combining character sequence are still individual “Unicode
characters,” even though a user might think of the whole sequence
as a single character.</p>
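<p>
The distinction between code units and Unicode characters can be
observed directly in String values. The following is a minimal sketch,
not part of the specification; the helper <code>codePointFromPair</code>
and all variable names are hypothetical and chosen only for
illustration.</p>

```javascript
// U+1D11E MUSICAL SYMBOL G CLEF lies outside the 16-bit range, so it is
// written here as a surrogate pair: two code units, one Unicode character.
var clef = "\uD834\uDD1E";
var clefUnits = clef.length; // 2

// Hypothetical helper: combine a high and low surrogate into a code point.
function codePointFromPair(high, low) {
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}
var clefCodePoint = codePointFromPair(clef.charCodeAt(0), clef.charCodeAt(1)); // 0x1D11E

// "e" followed by U+0301 COMBINING ACUTE ACCENT is a combining character
// sequence: a user may see one character, but it is two Unicode characters
// and two code units.
var eAcute = "e\u0301";
var eAcuteUnits = eAcute.length; // 2
```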
<p>
In
string literals, regular expression literals, and identifiers, any
character (code unit) may also be expressed as a Unicode escape
sequence consisting of six characters, namely <code><b>\u</b></code>
plus four hexadecimal digits. Within a comment, such an escape
sequence is effectively ignored as part of the comment. Within a
string literal or regular expression literal, the Unicode escape
sequence contributes one character to the value of the literal.
Within an identifier, the escape sequence contributes one character
to the identifier.</p>
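<p>
As an illustration of the three contexts named above, the sketch below
uses a Unicode escape sequence in an identifier, a string literal, and
a regular expression literal. The variable names are hypothetical and
are not taken from the specification.</p>

```javascript
// In an identifier, \u0061 contributes the character "a", so this
// declaration introduces the same binding as "var abc = 1;".
var \u0061bc = 1;
var sameBinding = (abc === 1);

// In a string literal, \u0041 contributes the single character "A".
var letter = "\u0041";

// In a regular expression literal, \u0041 matches the character "A".
var matches = /\u0041/.test("A");
```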
<p><b class="note">NOTE</b> Although
this document sometimes refers to a “transformation” between a
“character” within a “string” and the 16-bit unsigned
integer that is the code unit of that character, there is actually
no transformation because a “character” within a “string” is
actually represented using that 16-bit unsigned value.</p>
<p>
ECMAScript
differs from the Java programming language in the behaviour of
Unicode escape sequences. In a Java program, if the Unicode escape
sequence <code><b>\u000A</b></code>,
for example, occurs within a single-line comment, it is interpreted
as a line terminator (Unicode character <code><b>000A</b></code>
is line feed) and therefore the next character is not part of the
comment. Similarly, if the Unicode escape sequence <code><b>\u000A</b></code>
occurs within a string literal in a Java program, it is likewise
interpreted as a line terminator, which is not allowed within a
string literal—one must write <code><b>\n</b></code>
instead of <code><b>\u000A</b></code>
to cause a line feed to be part of the string value of a string
literal. In an ECMAScript program, a Unicode escape sequence
occurring within a comment is never interpreted and therefore cannot
contribute to termination of the comment. Similarly, a Unicode
escape sequence occurring within a string literal in an ECMAScript
program always contributes a character to the String value of the
literal and is never interpreted as a line terminator or as a quote
mark that might terminate the string literal.</p>
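<p>
The ECMAScript side of this comparison can be checked with a short
sketch. It is illustrative only, the variable names are hypothetical,
and the Java behaviour described above is not exercised here.</p>

```javascript
// The escape inside this single-line comment is not interpreted, so the
// comment runs to the real end of the line and the assignment is never
// executed.
var flag = true; // \u000A flag = false;

// Inside a string literal, \u000A contributes a line feed character to the
// String value and does not terminate the literal.
var text = "a\u000Ab";
var lineFeedCode = text.charCodeAt(1); // 10

// Likewise, \u0022 contributes a quotation mark character instead of
// closing the string literal.
var quote = "\u0022";
```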
</body><script src="anno.js"></script></html>