Support for trailing whitespaces & newlines #34

Merged
merged 6 commits into from
May 28, 2022
fix comments for utf8 situation
complected committed May 27, 2022
commit c50e8b84104bceb1143210d4b644f5adc470db2f
3 changes: 1 addition & 2 deletions jams.js
@@ -14,8 +14,7 @@ ANY ::= (SAFE | WS | SYN | #x5C)
SAFE ::= #x21 | [#x23-#x5A] | [#x5E-#x7A] | #x7C | #x7E
`)

- // Expect a well-formed JSON string, already decoded, not UTF-8/UTF-16 encoded bytes etc.
- // Ref: https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
+ // Expects a JAMS string, the leaves/strings within must be valid JSON strings.
export const jams =s=> {
const ast = read(s)
if (ast === null) throw new Error('Syntax error')
31 changes: 30 additions & 1 deletion test/test.js
@@ -1,6 +1,6 @@
import { test } from 'tapzero'

- import { jams, read } from '../jams.js'
+ import { jams } from '../jams.js'

import { readFileSync, readdirSync } from 'fs'

@@ -40,3 +40,32 @@ test('failing files', t=> {
})
})
})

/*
A humble note on how a newline (and escaped sequences in general) is kept in-file vs in-memory.

---

> r = encoding => require('fs').readFileSync('patch.jams', encoding)

// Let's assume the file contains "\n", i.e. a backslash followed by the letter "n".
// Reading it yields 2 bytes: 0x5c (backslash) and 0x6e ("n").
// Thus, reading alone keeps the backslash character.
> r(null)
<Buffer 5c 6e>
> let x = r('utf8'); [x, x.length] // Same content, but the bytes get decoded according to utf8 to a string of length 2
['\\n', 2]

// In contrast, JSON.parse removes the backslash, because the JSON spec states that "\\" followed by "n" is interpreted as a newline.
> let j = JSON.parse(String.raw`"${x}"`); [j, j.length]
['\n', 1]

---

Therefore,
- reading a file with UTF-8 encoding keeps the backslash as a standalone character (it only looks "escaped" when printed, e.g. '\\n').
- JSON.parse "un-escapes" a valid escape sequence (e.g. "\\" + "n") into 1 character.
- More?
- json.org
- https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
*/
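
The note above can be reproduced without a file on disk. The following is a minimal sketch, not part of the commit: it assumes Node.js (for the global `Buffer`), and the names `bytes`, `decoded`, and `parsed` are hypothetical. It stands in for `patch.jams` with an in-memory buffer holding the same two bytes.

```javascript
// The two bytes a file would hold for a backslash followed by "n".
// (Assumption: an in-memory Buffer replaces reading patch.jams from disk.)
const bytes = Buffer.from([0x5c, 0x6e])

// Decoding as UTF-8 maps bytes to characters only; the backslash is not interpreted.
const decoded = bytes.toString('utf8')

// Wrapping the text in double quotes makes it a JSON string literal,
// so JSON.parse applies the JSON escape rules and yields a real newline.
const parsed = JSON.parse('"' + decoded + '"')

console.log(decoded.length)  // 2
console.log(parsed.length)   // 1
console.log(parsed === '\n') // true
```

Plain string concatenation is used here instead of `String.raw`; both produce the same four-character JSON text, because `decoded` already contains a literal backslash.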