This repository contains a relatively simple dotenv file parser. This project is being developed for educational purposes and should not be used in production. Also note that the library on crates.io can be found here--this was the development repository for that library.
While the concept of environment variables is much older, the .env file syntax was introduced in 2012 and popularized in 2013 as a way for developers to store important environment variables, secrets, and keys outside of source control (like Git) (at least, according to dotenv).
An example file looks like the following:
# example .env file
STRIPE_API_KEY=scr_12345
TWILIO_API_KEY=abcd1234
The .env syntax consists of:
- a set of keys and values, where
- the keys are assigned to values via the = assignment operator, such that
- there should be no spaces between the key, the assignment operator, and the value, and
- each key-value pair is separated by a new line,
- comments are indicated by a # sign and end at the end of a line, and
- if there are spaces, equal signs, or newlines (' ', =, or \n) used in the key or value, the entire key or value must be enclosed in quotation marks (see the example just after this list), and
- other typical escape character rules apply within quoted keys and values.
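For instance, under those rules a value that contains spaces or an equals sign would need to be quoted. A made-up example (note that this module does not yet handle quoted values):
# hypothetical example with a quoted value
GREETING="hello world"
FORMULA="a = b + c"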
This module breaks down the process of reading the .env file into two steps: (1) lexing the .env file and (2) parsing the lexed content.
When most people say 'parsing', they mean transforming some raw input into a well-defined, often pre-defined, structure--like a struct, array, or other similar thing. Another step in that transformation, though, is lexing: the process of transforming raw input into meaningful chunks. This is also known as tokenizing, although lexing is subtly different: tokenizing breaks raw input into tokens, while lexing does the same but attaches additional meaning to those tokens.
Once the input is lexed, it can be parsed. This involves making sure that the tokens follow the pre-defined syntax rules of the structure that the content is being parsed into. For example, above I outlined the basic syntax of .env files.
In my lexing function, I read the contents into the following tokens, using Rust's enums:
enum EnvToken {
    Character(char),    // an ordinary character, carried as data
    AssignmentOperator, // the '=' between a key and a value
    NewLine,            // the end of a line
    EOF,                // the end of the file
    Comment,            // the '#' that begins a comment
    Whitespace,         // a whitespace character
}
Supposing we had an .env file like this:
HELLO=WORLD
The array of lexed tokens would look like this:
[Character('H'), Character('E'), Character('L'), Character('L'), Character('O'), AssignmentOperator, Character('W'), Character('O'), Character('R'), Character('L'), Character('D'), NewLine, EOF]
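The lexing function itself isn't shown here, but a pass that produces tokens like the ones above might look roughly like this. This is a sketch under my own assumptions (the function name lex included), not the repository's actual implementation:
fn lex(contents: &str) -> Vec<EnvToken> {
    let mut tokens = Vec::new();
    for c in contents.chars() {
        let token = match c {
            '=' => EnvToken::AssignmentOperator,
            '\n' => EnvToken::NewLine,
            '#' => EnvToken::Comment,
            ' ' | '\t' => EnvToken::Whitespace,
            // Every other character is carried along as data.
            other => EnvToken::Character(other),
        };
        tokens.push(token);
    }
    // Mark the end of the input so the parser knows when to stop.
    tokens.push(EnvToken::EOF);
    tokens
}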
From there, the parsing function takes in this array of tokens and checks that they follow the syntax rules. The parser ensures that the ordering of the tokens goes something like:
Line Start -> Key
Line Start -> Comment
Line Start -> Line Start
Key -> AssignmentOperator
AssignmentOperator -> Value
Value -> NewLine
Value -> Comment
Value -> Whitespace
Value -> EOF
Where any sequence that breaks these rules returns an error.
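As a drastically simplified sketch of that checking logic, something like the following would enforce the transitions above; the function name, the error strings, and the omission of quoting support are my own assumptions, not the module's real parser:
use std::collections::HashMap;

fn parse(tokens: Vec<EnvToken>) -> Result<HashMap<String, String>, String> {
    let mut map = HashMap::new();
    let mut key = String::new();
    let mut value = String::new();
    let mut seen_assignment = false;
    let mut in_comment = false;

    for token in tokens {
        match token {
            // Everything after a '#' is ignored until the end of the line.
            EnvToken::Comment => in_comment = true,
            EnvToken::Character(c) if !in_comment => {
                if seen_assignment {
                    value.push(c);
                } else {
                    key.push(c);
                }
            }
            EnvToken::AssignmentOperator if !in_comment => {
                if key.is_empty() || seen_assignment {
                    return Err("unexpected '='".to_string());
                }
                seen_assignment = true;
            }
            // A newline or the end of the file completes the current pair.
            EnvToken::NewLine | EnvToken::EOF => {
                if !key.is_empty() {
                    if !seen_assignment {
                        return Err("found a key with no assignment".to_string());
                    }
                    map.insert(key.clone(), value.clone());
                }
                key.clear();
                value.clear();
                seen_assignment = false;
                in_comment = false;
            }
            // Whitespace, and anything inside a comment, is skipped here.
            _ => {}
        }
    }

    Ok(map)
}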
The following code is an example of using my module:
use std::collections::HashMap;
use std::fs;

fn main() {
    // process_dot_env is the parsing entry point exposed by this module.
    let contents = fs::read_to_string("Test.env").expect("unable to read file");
    let new_env_map: HashMap<String, String> =
        process_dot_env(contents).expect("unable to parse env file");

    for (k, v) in new_env_map.iter() {
        println!("{} : {}", k, v);
    }
}
Only after all that lexing and parsing are the resulting key-value contents returned to the programmer in the form of a regular hash map.
This repository's main.rs file opens, parses, and prints the variables in the Test.env file at the directory's root.
The module cannot parse multi-line or quoted keys/values. I may add this to complete the project, but my main goal of learning more basic Rust concepts has been met.
This is the first real parser I've written. I have written regular expressions before, and done some rudimentary parsing, but nothing that fully tokenized and then parsed content in this way. Despite the fact that I did not choose Rust because I wanted to build an .env parser (I chose to write an .env parser because I wanted to write a project in Rust), it ended up being the right language for the job.
Rust's enums were an excellent language feature that allowed me to lex the .env content. Enums are basically exhaustive lists of the possible values of a given type. For example, an enum for the state of a door might have the variants Open and Closed. Similarly, .env files can only have Keys, Values, or Comments (forgetting, for a moment, whitespace and newlines). Rust's enums not only allowed me to turn the character a into the variant EnvToken::Character but also to attach the value a to it (like EnvToken::Character('a')).
This is quite different from, for example, Go, which (1) doesn't have proper enums at all and (2) has no way of attaching values to them. There, a would have turned into a bare Character token, and I would have needed a different strategy for lexing the content. (Instead, I would have had to lex the content into full keys and full values, rather than characters. Right now, the lexer generates many Character() tokens and the parser puts the keys back together from them.)
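As a small illustration (not taken from the module itself) of attaching a value to a variant and recovering it later with pattern matching:
fn main() {
    // Attach the character to the variant, then get it back out by matching.
    let token = EnvToken::Character('a');
    if let EnvToken::Character(c) = token {
        println!("lexed the character '{}'", c);
    }
}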
I enjoyed this project and may try to build other lexers and parsers with Rust down the road. I also have not set up this project as a proper module yet, so I may learn more about the Rust toolchain by publishing it as a crate.
As I am learning, if I've said anything incorrect, please feel free to open an issue on the issue tracker.
After implementing the core parsing, it was pointed out that I wasn't returning real error types, just strings containing error information. Instead, it's more idiomatic to enumerate (more enums!) the variety of errors that one might encounter while parsing the tokens. These are the errors that I implemented:
pub enum EnvError {
    UnexpectedToken {
        expected: String,
        found: String,
        line: i64,
        character: i64,
    },
    MissingAssignmentOperator {
        key: String,
        line: i64,
        character: i64,
    },
    ExpectedValueButFoundAssignment {
        line: i64,
        character: i64,
    },
    MissingKey {
        line: i64,
    },
    MissingValue {
        line: i64,
    },
    FoundOnlyKey {
        line: i64,
    },
}
Now, those errors can still carry the important information that I was packing into the strings (namely, the position where an error was encountered, like 'line 2, character 3') by implementing the fmt::Display trait. This looks like:
impl fmt::Display for EnvError {
    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        match self {
            EnvError::UnexpectedToken {
                expected,
                found,
                line,
                character,
            } => write!(
                f,
                "Unexpected token: expected {expected} but found '{found}' at line {line}, character {character}",
            ),
            ...
...
The parser then returns those errors like so:
...
return Err(EnvError::UnexpectedToken {
    expected: "comment or new line".to_string(),
    found: c.to_string(),
    line: line_counter,
    character: character_counter,
});
Now, we're preserving the valuable debugging information for the developer, gaining control over our error state, and still formatting custom string debugging messages. Coming from doing a lot of Go programming recently, I never knew enums could be so powerful!
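Because the errors are now a proper enum, callers can also branch on the specific failure instead of inspecting a string. Here is a hedged sketch, assuming process_dot_env now returns Result<HashMap<String, String>, EnvError> (the match arms are illustrative, not part of the module's actual API):
match process_dot_env(contents) {
    Ok(map) => println!("parsed {} variables", map.len()),
    // React to one specific, known failure mode...
    Err(EnvError::MissingValue { line }) => eprintln!("no value was given on line {}", line),
    // ...and fall back to the Display implementation for everything else.
    Err(other) => eprintln!("unable to parse env file: {}", other),
}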
Although I hand-wrote all of the code, as I am still learning, I used large language models to help me understand some of the syntax of Rust.
All source code in this repository is licensed under the Apache 2.0 License.