Uri encode / decode #1733
Merged
93 commits
4f7bc3d MaxGraey: init (wip)
b0fbf21 MaxGraey: wip
15a9e91 MaxGraey: Merge branch 'master' into uri-encodes
82da348 MaxGraey: refactor
f8308f8 MaxGraey: more
68e8cc9 MaxGraey: wip
2f6a92c MaxGraey: wip
76f3ac8 MaxGraey: more
0f99ff1 MaxGraey: refactor
d403720 MaxGraey: more
890e65d MaxGraey: Merge branch 'master' into uri-encodes
ac22e36 MaxGraey: remove some verbosity
8a30361 MaxGraey: update heap methods
342da70 MaxGraey: fix
7051dc4 MaxGraey: add more refs
514a4c7 MaxGraey: simplify
6b01c7d MaxGraey: store utf8 code points
bf5e656 MaxGraey: simplify names
6f0d5fa MaxGraey: refactor
9abda8f MaxGraey: more
651702f MaxGraey: fixes
f6c5051 MaxGraey: refactors
2408cf2 MaxGraey: progress
1e4eed8 MaxGraey: refactor
2810119 MaxGraey: more
92ba20e MaxGraey: fix leaks
27f881b MaxGraey: wip
ece370c MaxGraey: wip
e603cd6 MaxGraey: wip
c551f94 MaxGraey: fixes (wip)
4c0555f MaxGraey: more (wip)
a0c49f2 MaxGraey: more (wip)
6c13459 MaxGraey: better size estimation
d011a38 MaxGraey: add URIError
163ff90 MaxGraey: wip
44b7705 MaxGraey: minor opt
632db8a MaxGraey: refactor
aa26e2a MaxGraey: minor opt
4c812f3 MaxGraey: refactoring
20e2065 MaxGraey: better
b1fe142 MaxGraey: more tests
67c1b91 MaxGraey: fix
1ed516d MaxGraey: more tests
cbd5ea4 MaxGraey: invert tables for better memory packing
953c43b MaxGraey: even more smaller
2174b69 MaxGraey: more detailed comments
3a4cdd4 MaxGraey: add encodeURI tests
4938263 MaxGraey: wip
d8814fe MaxGraey: fixes (wip)
a350c00 MaxGraey: fixes (wip)
43f90cb MaxGraey: opt (experimental)
16cfeb3 MaxGraey: refactor
5b9ba29 MaxGraey: more tests
54fa6dd MaxGraey: fix
f6a9c89 MaxGraey: wip
8204e0e MaxGraey: refactor
62d0fef MaxGraey: fix
c94869e MaxGraey: add assert
4efd052 MaxGraey: opt
96d413e MaxGraey: refactoring
8050ce7 MaxGraey: more
4b4fdc8 MaxGraey: optimize utf8 byte count
69a45b9 MaxGraey: opt utf8_len table
29ed2a8 MaxGraey: refactor
f599d38 MaxGraey: comment
ed44562 MaxGraey: more tests
00561ee MaxGraey: opt
9475f6b MaxGraey: opt more
abec3f0 MaxGraey: fix
3f3c1b8 MaxGraey: more
62e0ee2 MaxGraey: fix
b4da6ce MaxGraey: better comments
620e5d2 MaxGraey: more opts
afa90fd MaxGraey: revert more precise hex char checks
daa7296 MaxGraey: refactor
a42b0c8 MaxGraey: more tests
e00b7e4 MaxGraey: align shinked unicode range for 4 bytes range with table
b3952b0 MaxGraey: simplify utf8LenFromUpperByte
81e5a13 MaxGraey: better comments
3250ef8 MaxGraey: refactor
71a40ce MaxGraey: Merge branch 'master' into uri-encodes
91be0b2 MaxGraey: refactoring
3ca72e4 MaxGraey: more
ed60c38 MaxGraey: Merge branch 'master' into uri-encodes
1132f6c MaxGraey: remove fast pathes
94041a5 MaxGraey: Merge branch 'master' into uri-encodes
b5ef31b MaxGraey: update fixture
0c5a7bb MaxGraey: Merge branch 'master' into uri-encodes
e23fa80 MaxGraey: upd fixture
48a032a MaxGraey: more comments
d10f016 MaxGraey: fix typos
1e78551 MaxGraey: more typos
a544405 MaxGraey: fix
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
import { encode, decode, URI_UNSAFE, URL_UNSAFE } from "./util/uri"; | ||
|
||
export function encodeURI(str: string): string { | ||
return changetype<string>(encode(changetype<usize>(str), str.length, URI_UNSAFE)); | ||
} | ||
|
||
export function decodeURI(str: string): string { | ||
return changetype<string>(decode(changetype<usize>(str), str.length, false)); | ||
} | ||
|
||
export function encodeURIComponent(str: string): string { | ||
return changetype<string>(encode(changetype<usize>(str), str.length, URL_UNSAFE)); | ||
} | ||
|
||
export function decodeURIComponent(str: string): string { | ||
return changetype<string>(decode(changetype<usize>(str), str.length, true)); | ||
} |
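The four wrappers above differ only in what they pass to `encode` and `decode`: a lookup table of unsafe characters for encoding, and a `component` flag for decoding. The following is a rough TypeScript sketch of table-driven percent-encoding, not the PR's implementation. The names `buildUnsafeTable`, `percentEncode`, and `hex` are hypothetical, and the safe-character sets are assumed from the ECMAScript definitions of `encodeURI` and `encodeURIComponent` rather than read from this diff.

```typescript
// Characters copied verbatim. Assumption: these sets follow the ECMAScript
// definitions of encodeURI and encodeURIComponent.
const ALNUM = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
const URI_SAFE = ALNUM + "-_.!~*'();/?:@&=+$,#";
const COMPONENT_SAFE = ALNUM + "-_.!~*'()";

// Build a 94-entry table covering char codes 33..126; 1 means "must be escaped".
// Codes below 33 and above 126 are always escaped, so they need no slots.
function buildUnsafeTable(safe: string): Uint8Array {
  const table = new Uint8Array(94);
  for (let k = 0; k < 94; k++) table[k] = 1;
  for (let k = 0; k < safe.length; k++) table[safe.charCodeAt(k) - 33] = 0;
  return table;
}

// Format one byte as %XX with uppercase hex digits.
function hex(byte: number): string {
  return "%" + (byte < 16 ? "0" : "") + byte.toString(16).toUpperCase();
}

function percentEncode(str: string, unsafe: Uint8Array): string {
  let out = "";
  for (let i = 0; i < str.length; i++) {
    let c = str.charCodeAt(i);
    // Printable ASCII that the table marks as safe is copied as-is.
    if (c >= 33 && c <= 126 && unsafe[c - 33] === 0) {
      out += str.charAt(i);
      continue;
    }
    // Combine a surrogate pair into one code point; a lone surrogate is an error.
    if (c >= 0xD800 && c <= 0xDFFF) {
      const c1 = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
      if (c >= 0xDC00 || c1 < 0xDC00 || c1 > 0xDFFF) {
        throw new URIError("URI malformed");
      }
      c = (((c & 0x3FF) << 10) | (c1 & 0x3FF)) + 0x10000;
      i++;
    }
    // Emit the UTF-8 bytes of the code point, each as %XX.
    if (c < 0x80) {
      out += hex(c);
    } else if (c < 0x800) {
      out += hex((c >> 6) | 0xC0) + hex((c & 0x3F) | 0x80);
    } else if (c < 0x10000) {
      out += hex((c >> 12) | 0xE0) + hex(((c >> 6) & 0x3F) | 0x80) +
             hex((c & 0x3F) | 0x80);
    } else {
      out += hex((c >> 18) | 0xF0) + hex(((c >> 12) & 0x3F) | 0x80) +
             hex(((c >> 6) & 0x3F) | 0x80) + hex((c & 0x3F) | 0x80);
    }
  }
  return out;
}

console.log(percentEncode("a b/ü", buildUnsafeTable(COMPONENT_SAFE))); // a%20b%2F%C3%BC
```

Passing a table built from `URI_SAFE` instead leaves `/` unescaped, which is the only difference between the two encoding wrappers in this sketch.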
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,276 @@ | ||
import { E_URI_MALFORMED } from "./error"; | ||
import { CharCode } from "./string"; | ||
|
||
// Truncated lookup boolean table that helps us quickly determine | ||
// if a char needs to be escaped for URIs (RFC 2396). | ||
// @ts-ignore: decorator | ||
@lazy export const URI_UNSAFE = memory.data<u8>([ | ||
/* skip 32 + 1 always set to '1' head slots | ||
*/ 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, | ||
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, | ||
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, | ||
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, | ||
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, | ||
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, /* | ||
skip 128 + 1 always set to '1' tail slots */ | ||
]); | ||
|
||
// Truncated lookup boolean table that helps us quickly determine | ||
// if a char needs to be escaped for URLs (RFC 3986). | ||
// @ts-ignore: decorator | ||
@lazy export const URL_UNSAFE = memory.data<u8>([ | ||
/* skip 32 + 1 always set to '1' head slots | ||
*/ 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, | ||
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, | ||
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, | ||
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, | ||
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, | ||
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, /* | ||
skip 128 + 1 always set to '1' tail slots */ | ||
]); | ||
|
||
// Truncated lookup boolean table for determining reserved chars: ;/?:@&=+$,# | ||
// @ts-ignore: decorator | ||
@lazy export const URI_RESERVED = memory.data<u8>([ | ||
/* skip 32 + 3 always set to '0' head slots | ||
*/ 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, | ||
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, | ||
1, /* skip 191 always set to '0' tail slots */ | ||
]); | ||
|
||
export function encode(src: usize, len: usize, table: usize): usize { | ||
if (!len) return src; | ||
|
||
var i: usize = 0, offset: usize = 0, outSize = len << 1; | ||
|
||
var dst = __new(outSize, idof<String>()); | ||
|
||
while (i < len) { | ||
let org = i; | ||
let c: u32, c1: u32; | ||
// fast scan: keep advancing while chars are valid ASCII | ||
// and safe to copy without escaping. | ||
do { | ||
c = <u32>load<u16>(src + (i << 1)); | ||
// is it valid ASCII and safe? | ||
if (c - 33 < 94) { // 127 - 33 | ||
if (load<u8>(table + (c - 33))) break; | ||
} else break; | ||
} while (++i < len); | ||
|
||
// if we scanned a run of safe chars, copy it without encoding | ||
if (i > org) { | ||
let size = i - org << 1; | ||
if (offset + size > outSize) { | ||
outSize = offset + size; | ||
dst = __renew(dst, outSize); | ||
} | ||
// TODO: should we optimize for short cases like 2 byte size? | ||
memory.copy( | ||
dst + offset, | ||
src + (org << 1), | ||
size | ||
); | ||
offset += size; | ||
// stop if we reached the end of the input string | ||
if (i >= len) break; | ||
} | ||
|
||
// decode UTF-16, checking for unpaired surrogates | ||
if (c >= 0xD800) { | ||
if (c >= 0xDC00 && c <= 0xDFFF) { | ||
throw new URIError(E_URI_MALFORMED); | ||
} | ||
if (c <= 0xDBFF) { | ||
if (i >= len) { | ||
throw new URIError(E_URI_MALFORMED); | ||
} | ||
c1 = <u32>load<u16>(src + (++i << 1)); | ||
if (c1 < 0xDC00 || c1 > 0xDFFF) { | ||
throw new URIError(E_URI_MALFORMED); | ||
} | ||
c = (((c & 0x3FF) << 10) | (c1 & 0x3FF)) + 0x10000; | ||
} | ||
} | ||
|
||
let estSize = offset + (c < 0x80 ? 1 * 6 : 4 * 6); | ||
if (estSize > outSize) { | ||
// double the estimated size, but only for inputs longer than one char; | ||
// for a single char the estimate already covers the worst case | ||
outSize = len > 1 ? estSize << 1 : estSize; | ||
dst = __renew(dst, outSize); | ||
} | ||
|
||
if (c < 0x80) { | ||
// encode ASCII unsafe code point | ||
storeHex(dst, offset, c); | ||
offset += 6; | ||
} else { | ||
// encode UTF-8 unsafe code point | ||
if (c < 0x800) { | ||
storeHex(dst, offset, (c >> 6) | 0xC0); | ||
offset += 6; | ||
} else { | ||
if (c < 0x10000) { | ||
storeHex(dst, offset, (c >> 12) | 0xE0); | ||
offset += 6; | ||
} else { | ||
storeHex(dst, offset, (c >> 18) | 0xF0); | ||
offset += 6; | ||
storeHex(dst, offset, (c >> 12 & 0x3F) | 0x80); | ||
offset += 6; | ||
} | ||
storeHex(dst, offset, (c >> 6 & 0x3F) | 0x80); | ||
offset += 6; | ||
} | ||
storeHex(dst, offset, (c & 0x3F) | 0x80); | ||
offset += 6; | ||
} | ||
++i; | ||
} | ||
// shrink output string buffer if necessary | ||
if (outSize > offset) { | ||
dst = __renew(dst, offset); | ||
} | ||
return dst; | ||
} | ||
|
||
export function decode(src: usize, len: usize, component: bool): usize { | ||
if (!len) return src; | ||
|
||
var i: usize = 0, offset: usize = 0, ch: u32 = 0; | ||
var dst = __new(len << 1, idof<String>()); | ||
|
||
while (i < len) { | ||
let org = i; | ||
while (i < len && (ch = load<u16>(src + (i << 1))) != CharCode.PERCENT) i++; | ||
|
||
if (i > org) { | ||
let size = i - org << 1; | ||
// TODO: should we optimize for short cases like 2 byte size? | ||
memory.copy( | ||
dst + offset, | ||
src + (org << 1), | ||
size | ||
); | ||
offset += size; | ||
if (i >= len) break; | ||
} | ||
|
||
// decode hex | ||
if ( | ||
i + 2 >= len || | ||
ch != CharCode.PERCENT || | ||
(ch = loadHex(src, i + 1 << 1)) == -1 | ||
) throw new URIError(E_URI_MALFORMED); | ||
|
||
i += 3; | ||
if (ch < 0x80) { | ||
if (!component && isReserved(ch)) { | ||
ch = CharCode.PERCENT; | ||
i -= 2; | ||
} | ||
} else { | ||
// decode UTF-8 sequence | ||
let nb = utf8LenFromUpperByte(ch); | ||
// minimal code point: 2 => 0x80, 3 => 0x800, 4 => 0x10000, _ => -1 | ||
let lo: u32 = 1 << (17 * nb >> 2) - 1; | ||
// mask: 2 => 31, 3 => 15, 4 => 7, _ => 0 | ||
ch &= nb ? (0x80 >> nb) - 1 : 0; | ||
|
||
while (--nb != 0) { | ||
let c1: u32; | ||
// decode hex | ||
if ( | ||
i + 2 >= len || | ||
load<u16>(src + (i << 1)) != CharCode.PERCENT || | ||
(c1 = loadHex(src, i + 1 << 1)) == -1 | ||
) throw new URIError(E_URI_MALFORMED); | ||
|
||
i += 3; | ||
if ((c1 & 0xC0) != 0x80) { | ||
ch = 0; | ||
break; | ||
} | ||
ch = (ch << 6) | (c1 & 0x3F); | ||
} | ||
|
||
// reject overlong, out-of-range, and surrogate code points that cannot be encoded as valid UTF-16 | ||
if (ch < lo || lo == -1 || ch > 0x10FFFF || (ch >= 0xD800 && ch < 0xE000)) { | ||
throw new URIError(E_URI_MALFORMED); | ||
} | ||
|
||
// encode UTF16 | ||
if (ch >= 0x10000) { | ||
ch -= 0x10000; | ||
let lo = ch >> 10 | 0xD800; | ||
let hi = (ch & 0x03FF) | 0xDC00; | ||
store<u32>(dst + offset, lo | (hi << 16)); | ||
offset += 4; | ||
continue; | ||
} | ||
} | ||
store<u16>(dst + offset, ch); | ||
offset += 2; | ||
} | ||
|
||
assert(offset <= (len << 1)); | ||
// shrink output string buffer if necessary | ||
if ((len << 1) > offset) { | ||
dst = __renew(dst, offset); | ||
} | ||
return dst; | ||
} | ||
|
||
function storeHex(dst: usize, offset: usize, ch: u32): void { | ||
// @ts-ignore: decorator | ||
const HEX_CHARS = memory.data<u8>([ | ||
0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, | ||
0x38, 0x39, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46 | ||
]); | ||
|
||
store<u16>(dst + offset, CharCode.PERCENT, 0); // % | ||
store<u32>( | ||
dst + offset, | ||
<u32>load<u8>(HEX_CHARS + (ch >> 4 & 0x0F)) | | ||
<u32>load<u8>(HEX_CHARS + (ch & 0x0F)) << 16, | ||
2 | ||
); // XX | ||
} | ||
|
||
function loadHex(src: usize, offset: usize): u32 { | ||
let c0 = <u32>load<u16>(src + offset, 0); | ||
let c1 = <u32>load<u16>(src + offset, 2); | ||
return isHex(c0) && isHex(c1) | ||
? fromHex(c0) << 4 | fromHex(c1) | ||
: -1; | ||
} | ||
|
||
// @ts-ignore: decorator | ||
@inline function fromHex(ch: u32): u32 { | ||
return (ch | 32) % 39 - 9; | ||
} | ||
|
||
// @ts-ignore: decorator | ||
@inline function utf8LenFromUpperByte(c0: u32): u32 { | ||
// same as | ||
// if (c0 - 0xC0 <= 0xDF - 0xC0) return 2; | ||
// if (c0 - 0xE0 <= 0xEF - 0xE0) return 3; | ||
// if (c0 - 0xF0 <= 0xF7 - 0xF0) return 4; | ||
// return 0; | ||
return c0 - 0xC0 < 56 | ||
? clz(~(c0 << 24)) | ||
: 0; | ||
} | ||
|
||
// @ts-ignore: decorator | ||
@inline function isReserved(ch: u32): bool { | ||
return ch - 35 < 30 | ||
? <bool>load<u8>(URI_RESERVED + (ch - 35)) | ||
: false; | ||
} | ||
|
||
// @ts-ignore: decorator | ||
@inline function isHex(ch: u32): bool { | ||
// @ts-ignore | ||
return (ch - CharCode._0 < 10) | ((ch | 32) - CharCode.a < 6); | ||
} |
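The small helpers at the end of this file rely on unsigned wrap-around and a few arithmetic shortcuts that are easy to misread. Below is a hedged TypeScript port for checking the arithmetic by hand. It uses `>>> 0` and `Math.clz32` as stand-ins for AssemblyScript's `u32` arithmetic and `clz`, and the names `minCodePoint` and `leadMask` are hypothetical labels for the inline expressions in `decode`.

```typescript
// Hex digit test: '0'..'9', or 'a'..'f' after folding case with `| 32`.
// `>>> 0` emulates the unsigned wrap-around: values below the range start
// become very large and fail the `<` comparison.
function isHex(ch: number): boolean {
  return ((ch - 0x30) >>> 0) < 10 || (((ch | 32) - 0x61) >>> 0) < 6;
}

// Hex digit value. After `| 32`, '0'..'9' stay 48..57 and letters become 97..102.
// 48 % 39 = 9 and 97 % 39 = 19, so subtracting 9 yields 0..9 and 10..15.
function fromHex(ch: number): number {
  return (ch | 32) % 39 - 9;
}

// UTF-8 sequence length from the lead byte: count the leading one bits by
// inverting the byte (moved to the top of a 32-bit word) and counting zeros.
// Lead bytes outside 0xC0..0xF7 yield 0.
function utf8LenFromUpperByte(c0: number): number {
  return ((c0 - 0xC0) >>> 0) < 56 ? Math.clz32(~(c0 << 24)) : 0;
}

// Smallest code point that legitimately needs `nb` bytes (rejects overlong forms):
// 2 => 0x80, 3 => 0x800, 4 => 0x10000.
function minCodePoint(nb: number): number {
  return 1 << (((17 * nb) >> 2) - 1);
}

// Payload bits of the lead byte: 2 => 31, 3 => 15, 4 => 7.
function leadMask(nb: number): number {
  return (0x80 >> nb) - 1;
}

console.log(fromHex(0x46));              // 15 ('F')
console.log(utf8LenFromUpperByte(0xE2)); // 3
console.log(minCodePoint(3) === 0x800);  // true
```

Writing the shortcuts next to their plain-range equivalents makes it straightforward to test them exhaustively over all 256 byte values.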
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
{ | ||
"asc_flags": [ | ||
], | ||
"asc_rtrace": true | ||
} |