-
Notifications
You must be signed in to change notification settings - Fork 878
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add standard newline/quoting behavior to dotenv store #622
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #622 +/- ##
===========================================
+ Coverage 36.98% 39.13% +2.14%
===========================================
Files 21 22 +1
Lines 2893 3023 +130
===========================================
+ Hits 1070 1183 +113
- Misses 1728 1735 +7
- Partials 95 105 +10
Continue to review full report at Codecov.
|
ebc316d
to
ea7acae
Compare
Alright, this is now to a point where I'm content with the quality of it and I promise to not push anymore unless someone requests changes. 😄 |
Okay so I lied. I added whitespace trimming to comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments, but overall this looks awesome @scjudd! Thank you very much.
06098a7
to
43df2c9
Compare
Rationale ========= The dotenv store as it exists right now performs splitting on newlines to determine where a new key-value pair or comment begins. This works remarkably well, up until you need to handle values that contain newlines. While I couldn't find an offical dotenv file format spec, I sampled a number of open-source dotenv parsers and it seems that they typically apply the following rules: Comments: * Comments may be written by starting a line with the `#` character. Newline handling: * If a value is unquoted or single-quoted and contains the character sequence `\n` (`0x5c6e`), it IS NOT decoded to a line feed (`0x0a`). * If a value is double-quoted and contains the character sequence `\n` (`0x5c6e`), it IS decoded to a line feed (`0x0a`). Whitespace trimming: * For comments, the whitespace immediately after the `#` character and any trailing whitespace is trimmed. * If a value is unquoted and contains any leading or trailing whitespace, it is trimmed. * If a value is either single- or double-quoted and contains any leading or trailing whitespace, it is left untrimmed. Quotation handling: * If a value is surrounded by single- or double-quotes, the quotation marks are interpreted and not included in the value. * Any number of single-quote characters may appear in a double-quoted value, or within a single-quoted value if they are escaped (i.e., `'foo\'bar'`). * Any number of double-quote characters may appear in a single-quoted value, or within a double-quoted value if they are escaped (i.e., `"foo\"bar"`). Because single- and double-quoted values may contain actual newlines, we cannot split our input data on newlines as this may be in the middle of a quoted value. This, along with the other rules around handling quoted values, prompted me to try and implement a more robust parsing solution. This commit is my first stab at that. Special Considerations ====================== This is _not_ a backwards-compatible change: * The `dotenv` files produced by this version of SOPS _cannot_ be read by an earlier version. * The `dotenv` files produced by an earlier version of SOPS _can_ be read by this version, with the understanding that the semantics around quotations and newlines have changed. Examples ======== The below examples show how double-quoted values are passed to the running environment: ```console $ echo 'FOO="foo\\nbar\\nbaz"' > plaintext.env $ sops -e --output ciphertext.env plaintext.env $ sops exec-env ciphertext.env 'env | grep FOO | xxd' 00000000: 464f 4f3d 666f 6f5c 6e62 6172 5c6e 6261 FOO=foo\nbar\nba 00000010: 7a0a z. ``` ```console $ echo 'FOO="foo\nbar\nbaz"' > plaintext.env $ sops -e --output ciphertext.env plaintext.env $ sops exec-env ciphertext.env 'env | grep -A2 FOO | xxd' 00000000: 464f 4f3d 666f 6f0a 6261 720a 6261 7a0a FOO=foo.bar.baz. ```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code itself LGTM.
What parsers did you look at? Could we have the parser in a separate repository, perhaps? It might be useful to other people.
Hey @autrilla, I can't remember all of the different parsers I looked at, but the rules defined for https://github.com/motdotla/dotenv#rules are what I based a lot of this on. As far as breaking this out into a separate repo, I'm not opposed to that, but right now it is very purpose-built for SOPS. If you have any suggestions on what changes you'd like to see, I'm down to push this forward a little further. We need to land something along these lines to fix the issue for newlines and |
Hey @autrilla, just checking in on this. Is there anything I can do to help get this landed? |
Sorry, I forgot about this. LGTM, thanks for the patch! |
Rationale ========= The dotenv store as it exists right now performs splitting on newlines to determine where a new key-value pair or comment begins. This works remarkably well, up until you need to handle values that contain newlines. While I couldn't find an offical dotenv file format spec, I sampled a number of open-source dotenv parsers and it seems that they typically apply the following rules: Comments: * Comments may be written by starting a line with the `#` character. Newline handling: * If a value is unquoted or single-quoted and contains the character sequence `\n` (`0x5c6e`), it IS NOT decoded to a line feed (`0x0a`). * If a value is double-quoted and contains the character sequence `\n` (`0x5c6e`), it IS decoded to a line feed (`0x0a`). Whitespace trimming: * For comments, the whitespace immediately after the `#` character and any trailing whitespace is trimmed. * If a value is unquoted and contains any leading or trailing whitespace, it is trimmed. * If a value is either single- or double-quoted and contains any leading or trailing whitespace, it is left untrimmed. Quotation handling: * If a value is surrounded by single- or double-quotes, the quotation marks are interpreted and not included in the value. * Any number of single-quote characters may appear in a double-quoted value, or within a single-quoted value if they are escaped (i.e., `'foo\'bar'`). * Any number of double-quote characters may appear in a single-quoted value, or within a double-quoted value if they are escaped (i.e., `"foo\"bar"`). Because single- and double-quoted values may contain actual newlines, we cannot split our input data on newlines as this may be in the middle of a quoted value. This, along with the other rules around handling quoted values, prompted me to try and implement a more robust parsing solution. This commit is my first stab at that. Special Considerations ====================== This is _not_ a backwards-compatible change: * The `dotenv` files produced by this version of SOPS _cannot_ be read by an earlier version. * The `dotenv` files produced by an earlier version of SOPS _can_ be read by this version, with the understanding that the semantics around quotations and newlines have changed. Examples ======== The below examples show how double-quoted values are passed to the running environment: ```console $ echo 'FOO="foo\\nbar\\nbaz"' > plaintext.env $ sops -e --output ciphertext.env plaintext.env $ sops exec-env ciphertext.env 'env | grep FOO | xxd' 00000000: 464f 4f3d 666f 6f5c 6e62 6172 5c6e 6261 FOO=foo\nbar\nba 00000010: 7a0a z. ``` ```console $ echo 'FOO="foo\nbar\nbaz"' > plaintext.env $ sops -e --output ciphertext.env plaintext.env $ sops exec-env ciphertext.env 'env | grep -A2 FOO | xxd' 00000000: 464f 4f3d 666f 6f0a 6261 720a 6261 7a0a FOO=foo.bar.baz. ```
Turns out this is a pseudo-breaking change for people using the dotenv format (including me), because the previous version of sops doesn't know what to do with the new quoted values (so in our build pipeline, which uses sops 3.5.0, it's unable to parse the new format coming from our local environments as people start upgrading to 3.6.0) I'm not sure if there's a good place to put these notes. Ideally they wouldn't have been pointed out more prominently in the changelog - maybe I'll put in an issue to request a flag for --omit-dotenv-quotes or something in case others are encountering this issue... |
Is this at all related to my issue at #784? In my use case, I was loading environment variables which contain multiple lines (stored in encrypted |
Don't think it's related to this (this was only present in 3.5.0) - except that it's probably related to the dotenv encoder. Commented on your issue. |
I think you're right @mltsy, I see the operative line that affects my use case were reverted in 5d32d9a. However, I will say that investigating my issue which is related via the shared use of the Basically, I recommend using something like joho/godotenv to encode/decode |
Cherry-picked from: Add standard newline/quoting behavior to dotenv store (getsops#622) 4507019 Closes issue: https://github.com/arrikto/rok/issues/5563 Rationale ========= The dotenv store as it exists right now performs splitting on newlines to determine where a new key-value pair or comment begins. This works remarkably well, up until you need to handle values that contain newlines. While I couldn't find an offical dotenv file format spec, I sampled a number of open-source dotenv parsers and it seems that they typically apply the following rules: Comments: * Comments may be written by starting a line with the `#` character. Newline handling: * If a value is unquoted or single-quoted and contains the character sequence `\n` (`0x5c6e`), it IS NOT decoded to a line feed (`0x0a`). * If a value is double-quoted and contains the character sequence `\n` (`0x5c6e`), it IS decoded to a line feed (`0x0a`). Whitespace trimming: * For comments, the whitespace immediately after the `#` character and any trailing whitespace is trimmed. * If a value is unquoted and contains any leading or trailing whitespace, it is trimmed. * If a value is either single- or double-quoted and contains any leading or trailing whitespace, it is left untrimmed. Quotation handling: * If a value is surrounded by single- or double-quotes, the quotation marks are interpreted and not included in the value. * Any number of single-quote characters may appear in a double-quoted value, or within a single-quoted value if they are escaped (i.e., `'foo\'bar'`). * Any number of double-quote characters may appear in a single-quoted value, or within a double-quoted value if they are escaped (i.e., `"foo\"bar"`). Because single- and double-quoted values may contain actual newlines, we cannot split our input data on newlines as this may be in the middle of a quoted value. This, along with the other rules around handling quoted values, prompted me to try and implement a more robust parsing solution. This commit is my first stab at that. Special Considerations ====================== This is _not_ a backwards-compatible change: * The `dotenv` files produced by this version of SOPS _cannot_ be read by an earlier version. * The `dotenv` files produced by an earlier version of SOPS _can_ be read by this version, with the understanding that the semantics around quotations and newlines have changed. Examples ======== The below examples show how double-quoted values are passed to the running environment: ```console $ echo 'FOO="foo\\nbar\\nbaz"' > plaintext.env $ sops -e --output ciphertext.env plaintext.env $ sops exec-env ciphertext.env 'env | grep FOO | xxd' 00000000: 464f 4f3d 666f 6f5c 6e62 6172 5c6e 6261 FOO=foo\nbar\nba 00000010: 7a0a z. ``` ```console $ echo 'FOO="foo\nbar\nbaz"' > plaintext.env $ sops -e --output ciphertext.env plaintext.env $ sops exec-env ciphertext.env 'env | grep -A2 FOO | xxd' 00000000: 464f 4f3d 666f 6f0a 6261 720a 6261 7a0a FOO=foo.bar.baz. ```
Rationale
The dotenv store as it exists right now performs splitting on newlines to determine where a new key-value pair or comment begins. This works remarkably well, up until you need to handle values that contain newlines.
While I couldn't find an offical dotenv file format spec, I sampled a number of open-source dotenv parsers and it seems that they typically apply the following rules:
Newline handling:
If a value is unquoted and contains a literal
\n
(0x5c6e
), it is interpretted literally and NOT converted to an actual newline (0x0a
).If a value is single-quoted and contains a literal
\n
(0x5c6e
), it is interpretted literally and NOT converted to an actual newline (0x0a
).If a value is double-quoted and contains a literal
\n
(0x5c6e
), it is converted to an actual newline (0x0a
).If a value is either single- or double-quoted, it may contain an actual newline (
0x0a
).Whitespace trimming:
If a value is unquoted and contains any leading or trailing whitespace, it is trimmed.
If a value is either single- or double-quoted and contains any leading or trailing whitespace, it is left untrimmed.
Quotation handling:
Because single- and double-quoted values may contain actual newlines, we cannot split our input data on newlines as this may be in the middle of a quoted value. This, along with the other rules around handling quoted values, prompted me to try and implement a more robust parsing solution. This commit is my first stab at that.
Special Considerations
This is not a backwards-compatible change:
The
dotenv
files produced by this version of SOPS cannot be read by an earlier version.The
dotenv
files produced by an earlier version of SOPS can be read by this version, with the understanding that the semantics around quotations and newlines have changed.Examples
The below examples show how double-quoted values are passed to the running environment: