Commit c5147d3

IDNA 17 alpha & docs
1 parent 84bf4cb commit c5147d3

4 files changed

+194
-120
lines changed

docs/data-workflow.md

Lines changed: 27 additions & 26 deletions
@@ -19,8 +19,10 @@ List of these files (see https://www.unicode.org/Public/UCD/latest/ucd/):
 
 Process:
 * The “source of truth” is the Unihan database maintained by the CJK/Unihan group, including data maintained by Michel.
-* The CJK/Unihan group posts data files into an internal location.
-* KenW vets these files and posts them to https://www.unicode.org/Public/draft/UCD/ucd/ .
+* The CJK/Unihan group maintains the data files in the Unicode-internal unihan-tools repo
+and creates GitHub releases with the /Public data files.
+* These include RSIndex.txt and RSIndex.pdf which are published in the charts folder, not in the ucd folder.
+* An infrastructure person copies these files to /Public/draft/ucd, /Public/draft/charts, /Public/{version}/... as appropriate.
 * A unicodetools GitHub contributor fetches these files, preprocesses the contents of Unihan.zip,
 and creates a pull request as for “regular” data files.
 (The processed data files go into .../unicodetools/data/ucd/dev/Unihan.)
@@ -48,6 +50,7 @@ Changes are made in a GitHub pull request.
 * Updated files could be shared in various ways including via email or via private FTP areas.
 * Updated files should be based on the latest (or fairly recent) data in the unicodetools repo.
 * Updated files should not be posted directly to https://www.unicode.org/Public/...
+* We work with an infra person to publish whole UCD/alpha/beta/final data file drops into /Public .
 
 Pull request cycle:
 * One commit for manual or contributed data changes.
@@ -81,8 +84,8 @@ https://github.com/unicode-org/unicodetools/tree/main/unicodetools/data/emoji/de
 
 Certain snapshots of the .../dev/ files are copied into https://www.unicode.org/Public/draft/
 for Unicode alpha, beta, and final releases, and more as appropriate.
-* UCD files go into https://www.unicode.org/Public/draft/UCD/
-* UCA files go into https://www.unicode.org/Public/draft/UCA/
+* UCD files go into https://www.unicode.org/Public/draft/ucd/
+* UCA files go into https://www.unicode.org/Public/draft/uca/
 * emoji files go into https://www.unicode.org/Public/draft/emoji/
 * etc.
 * Inside “draft” there are no folder levels with version numbers.
@@ -104,18 +107,16 @@ script from an up-to-date repo workspace.
 The script copies the set of the .../dev/ data files for an alpha snapshot
 from a unicodetools workspace to a target folder with the layout of https://www.unicode.org/Public/draft/ .
 
-Send the resulting zip file to Rick for posting to https://www.unicode.org/Public/draft/ .
-Ask Rick to add other files that are not tracked in the unicodetools repo:
-* Unihan.zip to .../draft/UCD/ucd
-
-TODO: Figure out new process & people replacing Rick in 2025.
+Send the resulting zip file to an infra person for posting to https://www.unicode.org/Public/draft/ .
+Ask the infra person to add other files that are not tracked in the unicodetools repo:
+* Unihan.zip to .../draft/ucd
 
 Note: No version/delta infixes in names of data files.
 We simply use the “draft” folder and the file-internal time stamps for versioning.
 
 ### Publish an alpha snapshot
 
-For the alpha review, publish (at least) the UCD and emoji files, and the charts.
+For the alpha review, publish (at least) the UCD and emoji files, the IDNA files, and the charts.
 
 Review/edit the pub/*.sh scripts and advance the version numbers and copyright years.
 
@@ -124,10 +125,10 @@ script from an up-to-date repo workspace.
 The script copies the set of the .../dev/ data files for an alpha snapshot
 from a unicodetools workspace to a target folder with the layout of https://www.unicode.org/Public/draft/ .
 
-Send the resulting zip file to Rick for posting to https://www.unicode.org/Public/draft/ .
-Ask Rick to add other files that are not tracked in the unicodetools repo:
-* Unihan.zip to .../draft/UCD/ucd
-* alpha charts to .../draft/UCD/charts
+Send the resulting zip file to an infra person for posting to https://www.unicode.org/Public/draft/ .
+Ask the infra person to add other files that are not tracked in the unicodetools repo:
+* Unihan.zip to .../draft/ucd
+* alpha charts to .../draft/charts
 
 Note: No version/delta infixes in names of data files.
 We simply use the “draft” folder and the file-internal time stamps for versioning.
@@ -141,11 +142,11 @@ script from an up-to-date repo workspace.
 The script copies the set of the .../dev/ data files for a beta snapshot
 from a unicodetools workspace to a target folder with the layout of https://www.unicode.org/Public/draft/ .
 
-Send the resulting zip file to Rick for posting to https://www.unicode.org/Public/draft/ .
-Ask Rick to add other files that are not tracked in the unicodetools repo:
-* Unihan.zip to .../draft/UCD/ucd
-* UCDXML files to .../draft/UCD/ucdxml
-* beta charts to .../draft/UCD/charts
+Send the resulting zip file to an infra person for posting to https://www.unicode.org/Public/draft/ .
+Ask the infra person to add other files that are not tracked in the unicodetools repo:
+* Unihan.zip to .../draft/ucd
+* UCDXML files to .../draft/ucdxml
+* beta charts to .../draft/charts
 
 ### Publish a release snapshot
 
@@ -158,19 +159,19 @@ Verify the final set of files in the draft folder.
 Run the [pub/copy-final.sh](https://github.com/unicode-org/unicodetools/blob/main/pub/copy-final.sh)
 script from an up-to-date repo workspace.
 
-Send the resulting zip file to Rick for posting to https://www.unicode.org/Public/ (not .../Public/draft/).
-Ask Rick to add other files that are not tracked in the unicodetools repo:
+Send the resulting zip file to an infra person for posting to https://www.unicode.org/Public/ (not .../Public/draft/).
+Ask the infra person to add other files that are not tracked in the unicodetools repo:
 * Unihan.zip to .../{version}/ucd
 * UCDXML files to .../{version}/ucdxml
 * final charts to .../{version}/charts
 
-This script works much like the beta script, except it:
-* assembles all of the files for Public/ in their release folder structure,
-rather than for Public/draft/
-* creates a zipped/{version} folder with UCD.zip
+TODO: Starting with 17.0, the folder structure of /Public/draft is the same as that of /Public/{version} .
+Consider moving the final files from /Public/draft to /Public/{version} rather than running another script.
 
 ### Before a release
 
+TODO: Review this section, and merge it into the previous one.
+
 When the data files are supposed to be final, about a week or two before the release:
 
 Verify once more that the unicodetools repo .../dev/ files match the released/published files.
@@ -182,7 +183,7 @@ https://github.com/unicode-org/unicodetools/releases/tag/final-15.1-20230908
 ### After a release
 
 Copy a snapshot of the unicodetools repo .../dev/ files to a versioned unicodetools folder;
-for example: .../unicodetools/data/ucd/16.0.0/ .
+for example: .../unicodetools/data/ucd/17.0.0/ .
 (We no longer append a “-Update” suffix to the folder name.)
 List: emoji, idna, security, uca, ucd, ucdxml
 Watch for different naming conventions: emoji versions use only two fields, not three.
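
Reviewer note on the "After a release" step: the folder-naming caveat is easy to miss when copying six components by hand. Below is a minimal sketch of the naming rule, not code from the unicodetools repo. The helper name `versioned_folder` is hypothetical, and it assumes that every listed component other than emoji uses three version fields, which the doc implies but does not state outright.

```python
# Hypothetical helper, not part of unicodetools: builds the relative name of
# the versioned snapshot folder for one data component.
COMPONENTS = ("emoji", "idna", "security", "uca", "ucd", "ucdxml")


def versioned_folder(component: str, major: int, minor: int = 0, update: int = 0) -> str:
    """Return a relative folder name such as 'ucd/17.0.0' or 'emoji/17.0'."""
    if component not in COMPONENTS:
        raise ValueError(f"unknown component: {component!r}")
    if component == "emoji":
        # Per the doc: emoji versions use only two fields, not three.
        return f"{component}/{major}.{minor}"
    # Assumption: all other listed components use three fields.
    return f"{component}/{major}.{minor}.{update}"


if __name__ == "__main__":
    for name in COMPONENTS:
        print(versioned_folder(name, 17))
```

The result is only the tail of the path; the elided `.../unicodetools/data/` prefix from the doc is deliberately left out.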

docs/idna.md

Lines changed: 8 additions & 20 deletions
@@ -8,16 +8,11 @@
 2. Run GenerateIdna.java
 * It will generate
 {Generated}/idna/{version}/**IdnaMappingTable.txt**
-* Before [UTS #46 table 4](https://www.unicode.org/reports/tr46/#Table_IDNA_Comparisons)
-was fixed at Unicode 11:
-* The data for the last 4 columns (h, i, j, k) of Table 4 IDNA
-Comparisons for the UTR are listed at the bottom of the console output.
-* Fix Table 4 (h, i, j, k) with that data, and check into the repo.
 * Diff with the previous version, and make sure everything is understood,
 then copy back into the dev folder.
 ```
-Generated$ meld ../src/unicodetools/data/idna/dev/IdnaMappingTable.txt idna/15.1.0/IdnaMappingTable.txt
-Generated$ cp idna/15.1.0/IdnaMappingTable.txt ../src/unicodetools/data/idna/dev/IdnaMappingTable.txt
+Generated$ meld ../src/unicodetools/data/idna/dev/IdnaMappingTable.txt idna/17.0.0/IdnaMappingTable.txt
+Generated$ cp idna/17.0.0/IdnaMappingTable.txt ../src/unicodetools/data/idna/dev/IdnaMappingTable.txt
 ```
 * *Important:* The mapping table file must be copied into the dev folder
 before running GenerateIdnaTest.java!
@@ -27,18 +22,11 @@
 2. Diff with the previous version, and make sure everything is understood,
 then copy back into the dev folder.
 ```
-Generated$ meld ../src/unicodetools/data/idna/dev/IdnaTestV2.txt idna/15.1.0/IdnaTestV2.txt
-Generated$ cp idna/15.1.0/IdnaTestV2.txt ../src/unicodetools/data/idna/dev/IdnaTestV2.txt
+Generated$ meld ../src/unicodetools/data/idna/dev/IdnaTestV2.txt idna/17.0.0/IdnaTestV2.txt
+Generated$ cp idna/17.0.0/IdnaTestV2.txt ../src/unicodetools/data/idna/dev/IdnaTestV2.txt
 ```
-4. Edit the ReadMe.txt if necessary.
-1. Fix the copyright date
-2. Add or remove "draft" in front of "data files", according to the status
-of the data.
-5. Run TestIdna.java as a JUnit test.
+4. Run TestIdna.java as a JUnit test.
+5. Diff the old and new files, and sanity check. Pull request...
+6. The idna files are published in the alpha, beta, and final drops.
+See [Data Files Workflow](data-workflow.md)
 
-To push to production
-
-1. Diff the old and new files, and sanity check.
-2. Copy the files
-* from {Generated}/idna/{version}/*
-* to https://www.unicode.org/Public/idna/{version}/*
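
Reviewer note on the "diff, then copy back into the dev folder" steps in docs/idna.md: the doc uses `meld` and `cp` interactively. For anyone who prefers a scriptable check, here is a minimal sketch of the same two steps using only the Python standard library. The function names `diff_lines` and `copy_if_approved` are hypothetical and not part of unicodetools, and the `approved` flag stands in for the human review that the doc requires ("make sure everything is understood").

```python
import difflib
import shutil
from pathlib import Path


def diff_lines(old: Path, new: Path) -> list[str]:
    """Return a unified diff between two text files; empty if they match."""
    old_lines = old.read_text(encoding="utf-8").splitlines(keepends=True)
    new_lines = new.read_text(encoding="utf-8").splitlines(keepends=True)
    return list(
        difflib.unified_diff(old_lines, new_lines, fromfile=str(old), tofile=str(new))
    )


def copy_if_approved(generated: Path, dev: Path, approved: bool) -> bool:
    """Copy the generated file over the dev file only after a reviewer approves.

    Returns True if the copy happened, False otherwise.
    """
    if not approved:
        return False
    shutil.copyfile(generated, dev)
    return True
```

A caller would print the output of `diff_lines(dev_file, generated_file)`, review it, and only then call `copy_if_approved(generated_file, dev_file, approved=True)`. The order of arguments mirrors the doc: dev file first for the diff, generated file first for the copy.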
