Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add standard newline/quoting behavior to dotenv store #622

Merged
merged 1 commit into from
Mar 20, 2020

Conversation

scjudd
Copy link
Contributor

@scjudd scjudd commented Feb 4, 2020

Rationale

The dotenv store as it exists right now performs splitting on newlines to determine where a new key-value pair or comment begins. This works remarkably well, up until you need to handle values that contain newlines.

While I couldn't find an offical dotenv file format spec, I sampled a number of open-source dotenv parsers and it seems that they typically apply the following rules:

Newline handling:

  • If a value is unquoted and contains a literal \n (0x5c6e), it is interpretted literally and NOT converted to an actual newline (0x0a).

  • If a value is single-quoted and contains a literal \n (0x5c6e), it is interpretted literally and NOT converted to an actual newline (0x0a).

  • If a value is double-quoted and contains a literal \n (0x5c6e), it is converted to an actual newline (0x0a).

  • If a value is either single- or double-quoted, it may contain an actual newline (0x0a).

Whitespace trimming:

  • If a value is unquoted and contains any leading or trailing whitespace, it is trimmed.

  • If a value is either single- or double-quoted and contains any leading or trailing whitespace, it is left untrimmed.

Quotation handling:

  • Because quotations around values have special meaning, they are interpretted and are not included in the parsed value. Literal quotes may be included within a quoted string either by escaping them or using the opposite quotation mark.

Because single- and double-quoted values may contain actual newlines, we cannot split our input data on newlines as this may be in the middle of a quoted value. This, along with the other rules around handling quoted values, prompted me to try and implement a more robust parsing solution. This commit is my first stab at that.

Special Considerations

This is not a backwards-compatible change:

  • The dotenv files produced by this version of SOPS cannot be read by an earlier version.

  • The dotenv files produced by an earlier version of SOPS can be read by this version, with the understanding that the semantics around quotations and newlines have changed.

Examples

The below examples show how double-quoted values are passed to the running environment:

$ echo 'FOO="foo\\nbar\\nbaz"' > plaintext.env
$ sops -e --output ciphertext.env plaintext.env
$ sops exec-env ciphertext.env 'env | grep FOO | xxd'
00000000: 464f 4f3d 666f 6f5c 6e62 6172 5c6e 6261  FOO=foo\nbar\nba
00000010: 7a0a                                     z.
$ echo 'FOO="foo\nbar\nbaz"' > plaintext.env
$ sops -e --output ciphertext.env plaintext.env
$ sops exec-env ciphertext.env 'env | grep -A2 FOO | xxd'
00000000: 464f 4f3d 666f 6f0a 6261 720a 6261 7a0a  FOO=foo.bar.baz.

@codecov-io
Copy link

codecov-io commented Feb 4, 2020

Codecov Report

Merging #622 into develop will increase coverage by 2.14%.
The diff coverage is 84.1%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #622      +/-   ##
===========================================
+ Coverage    36.98%   39.13%   +2.14%     
===========================================
  Files           21       22       +1     
  Lines         2893     3023     +130     
===========================================
+ Hits          1070     1183     +113     
- Misses        1728     1735       +7     
- Partials        95      105      +10
Impacted Files Coverage Δ
stores/dotenv/store.go 25% <100%> (-7.46%) ⬇️
stores/dotenv/parser.go 83.33% <83.33%> (ø)
stores/flatten.go 91.52% <0%> (+4.23%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1634350...fe789bf. Read the comment docs.

@scjudd scjudd force-pushed the develop branch 17 times, most recently from ebc316d to ea7acae Compare February 5, 2020 17:17
@scjudd
Copy link
Contributor Author

scjudd commented Feb 5, 2020

Alright, this is now to a point where I'm content with the quality of it and I promise to not push anymore unless someone requests changes. 😄

@scjudd
Copy link
Contributor Author

scjudd commented Feb 10, 2020

Okay so I lied. I added whitespace trimming to comments.

@scjudd scjudd mentioned this pull request Feb 11, 2020
@scjudd scjudd requested a review from ajvb February 11, 2020 16:34
@ajvb ajvb requested a review from autrilla February 11, 2020 20:15
Copy link
Contributor

@ajvb ajvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments, but overall this looks awesome @scjudd! Thank you very much.

cmd/sops/subcommand/exec/exec.go Show resolved Hide resolved
stores/dotenv/parser.go Show resolved Hide resolved
stores/dotenv/parser.go Show resolved Hide resolved
@scjudd scjudd force-pushed the develop branch 2 times, most recently from 06098a7 to 43df2c9 Compare February 12, 2020 02:37
Rationale
=========

The dotenv store as it exists right now performs splitting on newlines
to determine where a new key-value pair or comment begins. This works
remarkably well, up until you need to handle values that contain
newlines.

While I couldn't find an offical dotenv file format spec, I sampled a
number of open-source dotenv parsers and it seems that they typically
apply the following rules:

Comments:

* Comments may be written by starting a line with the `#` character.

Newline handling:

* If a value is unquoted or single-quoted and contains the character
  sequence `\n` (`0x5c6e`), it IS NOT decoded to a line feed (`0x0a`).

* If a value is double-quoted and contains the character sequence `\n`
  (`0x5c6e`), it IS decoded to a line feed (`0x0a`).

Whitespace trimming:

* For comments, the whitespace immediately after the `#` character and any
  trailing whitespace is trimmed.

* If a value is unquoted and contains any leading or trailing whitespace, it
  is trimmed.

* If a value is either single- or double-quoted and contains any leading or
  trailing whitespace, it is left untrimmed.

Quotation handling:

* If a value is surrounded by single- or double-quotes, the quotation marks
  are interpreted and not included in the value.

* Any number of single-quote characters may appear in a double-quoted
  value, or within a single-quoted value if they are escaped (i.e.,
  `'foo\'bar'`).

* Any number of double-quote characters may appear in a single-quoted
  value, or within a double-quoted value if they are escaped (i.e.,
  `"foo\"bar"`).

Because single- and double-quoted values may contain actual newlines,
we cannot split our input data on newlines as this may be in the middle
of a quoted value. This, along with the other rules around handling
quoted values, prompted me to try and implement a more robust parsing
solution. This commit is my first stab at that.

Special Considerations
======================

This is _not_ a backwards-compatible change:

* The `dotenv` files produced by this version of SOPS _cannot_ be read
  by an earlier version.

* The `dotenv` files produced by an earlier version of SOPS _can_ be
  read by this version, with the understanding that the semantics around
  quotations and newlines have changed.

Examples
========

The below examples show how double-quoted values are passed to the
running environment:

```console
$ echo 'FOO="foo\\nbar\\nbaz"' > plaintext.env
$ sops -e --output ciphertext.env plaintext.env
$ sops exec-env ciphertext.env 'env | grep FOO | xxd'
00000000: 464f 4f3d 666f 6f5c 6e62 6172 5c6e 6261  FOO=foo\nbar\nba
00000010: 7a0a                                     z.
```

```console
$ echo 'FOO="foo\nbar\nbaz"' > plaintext.env
$ sops -e --output ciphertext.env plaintext.env
$ sops exec-env ciphertext.env 'env | grep -A2 FOO | xxd'
00000000: 464f 4f3d 666f 6f0a 6261 720a 6261 7a0a  FOO=foo.bar.baz.
```
Copy link
Contributor

@autrilla autrilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code itself LGTM.

What parsers did you look at? Could we have the parser in a separate repository, perhaps? It might be useful to other people.

@scjudd
Copy link
Contributor Author

scjudd commented Feb 27, 2020

Hey @autrilla, I can't remember all of the different parsers I looked at, but the rules defined for https://github.com/motdotla/dotenv#rules are what I based a lot of this on.

As far as breaking this out into a separate repo, I'm not opposed to that, but right now it is very purpose-built for SOPS. If you have any suggestions on what changes you'd like to see, I'm down to push this forward a little further. We need to land something along these lines to fix the issue for newlines and exec-env.

@scjudd
Copy link
Contributor Author

scjudd commented Mar 19, 2020

Hey @autrilla, just checking in on this. Is there anything I can do to help get this landed?

@autrilla
Copy link
Contributor

Sorry, I forgot about this. LGTM, thanks for the patch!

@autrilla autrilla merged commit 4507019 into getsops:develop Mar 20, 2020
rochaporto pushed a commit to rochaporto/sops that referenced this pull request Jun 22, 2020
Rationale
=========

The dotenv store as it exists right now performs splitting on newlines
to determine where a new key-value pair or comment begins. This works
remarkably well, up until you need to handle values that contain
newlines.

While I couldn't find an offical dotenv file format spec, I sampled a
number of open-source dotenv parsers and it seems that they typically
apply the following rules:

Comments:

* Comments may be written by starting a line with the `#` character.

Newline handling:

* If a value is unquoted or single-quoted and contains the character
  sequence `\n` (`0x5c6e`), it IS NOT decoded to a line feed (`0x0a`).

* If a value is double-quoted and contains the character sequence `\n`
  (`0x5c6e`), it IS decoded to a line feed (`0x0a`).

Whitespace trimming:

* For comments, the whitespace immediately after the `#` character and any
  trailing whitespace is trimmed.

* If a value is unquoted and contains any leading or trailing whitespace, it
  is trimmed.

* If a value is either single- or double-quoted and contains any leading or
  trailing whitespace, it is left untrimmed.

Quotation handling:

* If a value is surrounded by single- or double-quotes, the quotation marks
  are interpreted and not included in the value.

* Any number of single-quote characters may appear in a double-quoted
  value, or within a single-quoted value if they are escaped (i.e.,
  `'foo\'bar'`).

* Any number of double-quote characters may appear in a single-quoted
  value, or within a double-quoted value if they are escaped (i.e.,
  `"foo\"bar"`).

Because single- and double-quoted values may contain actual newlines,
we cannot split our input data on newlines as this may be in the middle
of a quoted value. This, along with the other rules around handling
quoted values, prompted me to try and implement a more robust parsing
solution. This commit is my first stab at that.

Special Considerations
======================

This is _not_ a backwards-compatible change:

* The `dotenv` files produced by this version of SOPS _cannot_ be read
  by an earlier version.

* The `dotenv` files produced by an earlier version of SOPS _can_ be
  read by this version, with the understanding that the semantics around
  quotations and newlines have changed.

Examples
========

The below examples show how double-quoted values are passed to the
running environment:

```console
$ echo 'FOO="foo\\nbar\\nbaz"' > plaintext.env
$ sops -e --output ciphertext.env plaintext.env
$ sops exec-env ciphertext.env 'env | grep FOO | xxd'
00000000: 464f 4f3d 666f 6f5c 6e62 6172 5c6e 6261  FOO=foo\nbar\nba
00000010: 7a0a                                     z.
```

```console
$ echo 'FOO="foo\nbar\nbaz"' > plaintext.env
$ sops -e --output ciphertext.env plaintext.env
$ sops exec-env ciphertext.env 'env | grep -A2 FOO | xxd'
00000000: 464f 4f3d 666f 6f0a 6261 720a 6261 7a0a  FOO=foo.bar.baz.
```
@mltsy
Copy link

mltsy commented Jul 23, 2020

Turns out this is a pseudo-breaking change for people using the dotenv format (including me), because the previous version of sops doesn't know what to do with the new quoted values (so in our build pipeline, which uses sops 3.5.0, it's unable to parse the new format coming from our local environments as people start upgrading to 3.6.0)

I'm not sure if there's a good place to put these notes. Ideally they wouldn't have been pointed out more prominently in the changelog - maybe I'll put in an issue to request a flag for --omit-dotenv-quotes or something in case others are encountering this issue...

@patricknelson
Copy link

Is this at all related to my issue at #784?

In my use case, I was loading environment variables which contain multiple lines (stored in encrypted JSON and YAML). It appears to be interpreted properly when viewing/editing directly within sops itself but not when loaded into environment variables via exec-env.

@mltsy
Copy link

mltsy commented Dec 10, 2020

Don't think it's related to this (this was only present in 3.5.0) - except that it's probably related to the dotenv encoder. Commented on your issue.

@patricknelson
Copy link

patricknelson commented Dec 10, 2020

I think you're right @mltsy, I see the operative line that affects my use case were reverted in 5d32d9a. However, I will say that investigating my issue which is related via the shared use of the dotenv output formatting, I found a possible issue with how the dotenv output format encodes/decodes environment variables (i.e. it'll add \n's even though the variable in it's raw form may already contain \n's, meaning it is mixing unencoded content with encoded content and potentially mangling things). I have more details in my other comment.

Basically, I recommend using something like joho/godotenv to encode/decode dotenv values and then separating that process from the process used for calling os.StartProcess which may accept an entirely different environment variable encoding (something I don't know, since I'm totally new to golang).

karlschriek pushed a commit to arrikto/sops that referenced this pull request Mar 16, 2022
Cherry-picked from:

Add standard newline/quoting behavior to dotenv store (getsops#622)
4507019

Closes issue:
https://github.com/arrikto/rok/issues/5563

Rationale
=========

The dotenv store as it exists right now performs splitting on newlines
to determine where a new key-value pair or comment begins. This works
remarkably well, up until you need to handle values that contain
newlines.

While I couldn't find an offical dotenv file format spec, I sampled a
number of open-source dotenv parsers and it seems that they typically
apply the following rules:

Comments:

* Comments may be written by starting a line with the `#` character.

Newline handling:

* If a value is unquoted or single-quoted and contains the character
  sequence `\n` (`0x5c6e`), it IS NOT decoded to a line feed (`0x0a`).

* If a value is double-quoted and contains the character sequence `\n`
  (`0x5c6e`), it IS decoded to a line feed (`0x0a`).

Whitespace trimming:

* For comments, the whitespace immediately after the `#` character and any
  trailing whitespace is trimmed.

* If a value is unquoted and contains any leading or trailing whitespace, it
  is trimmed.

* If a value is either single- or double-quoted and contains any leading or
  trailing whitespace, it is left untrimmed.

Quotation handling:

* If a value is surrounded by single- or double-quotes, the quotation marks
  are interpreted and not included in the value.

* Any number of single-quote characters may appear in a double-quoted
  value, or within a single-quoted value if they are escaped (i.e.,
  `'foo\'bar'`).

* Any number of double-quote characters may appear in a single-quoted
  value, or within a double-quoted value if they are escaped (i.e.,
  `"foo\"bar"`).

Because single- and double-quoted values may contain actual newlines,
we cannot split our input data on newlines as this may be in the middle
of a quoted value. This, along with the other rules around handling
quoted values, prompted me to try and implement a more robust parsing
solution. This commit is my first stab at that.

Special Considerations
======================

This is _not_ a backwards-compatible change:

* The `dotenv` files produced by this version of SOPS _cannot_ be read
  by an earlier version.

* The `dotenv` files produced by an earlier version of SOPS _can_ be
  read by this version, with the understanding that the semantics around
  quotations and newlines have changed.

Examples
========

The below examples show how double-quoted values are passed to the
running environment:

```console
$ echo 'FOO="foo\\nbar\\nbaz"' > plaintext.env
$ sops -e --output ciphertext.env plaintext.env
$ sops exec-env ciphertext.env 'env | grep FOO | xxd'
00000000: 464f 4f3d 666f 6f5c 6e62 6172 5c6e 6261  FOO=foo\nbar\nba
00000010: 7a0a                                     z.
```

```console
$ echo 'FOO="foo\nbar\nbaz"' > plaintext.env
$ sops -e --output ciphertext.env plaintext.env
$ sops exec-env ciphertext.env 'env | grep -A2 FOO | xxd'
00000000: 464f 4f3d 666f 6f0a 6261 720a 6261 7a0a  FOO=foo.bar.baz.
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants