
Add xonsh history #1678

Merged · 6 commits · Feb 12, 2024

Conversation

@jfmontanaro (Contributor) commented Feb 6, 2024

Following up on #1375 by @Matthieu-LAURENT39, this PR adds history importers for Xonsh. Since Xonsh has two different history storage backends (one using JSON files and another using a SQLite db), there are two importers.

I followed the example set by Nu and Zsh and named the JSON one simply xonsh, since it's the default. I named the other one xonsh_sqlite rather than xonsh_histdb, since sqlite is the name Xonsh uses for it (i.e. you enable it by setting $XONSH_HISTORY_BACKEND = 'sqlite'). Happy to change that, though, if it would be better to maintain consistency.

I'm marking this PR as a draft because I have a couple of questions that should be answered before it gets merged. I realized after I wrote them all up that it's a bit of a wall-o-text, so here's the TLDR:

  • A lot of my tests depend on env vars, which is a problem because they run in parallel and env vars are global state. What to do about this?
  • I added some files for test data in atuin-client/tests/data, is that ok?
  • Due to limited float precision, timestamps and durations might be off by a few tens of nanoseconds. I could probably find a way to fix this, but it would require a fair bit of fiddling and possibly some extra dependencies. Is it worth it?
  • I'm assuming that the user/hostname for all commands in the Xonsh history match the current user/hostname. None of the other shell importers do this, is that ok?

Env vars in tests

All of my tests wound up depending on various env vars, which turns out to be dicey under cargo test because it runs tests in parallel and the environment is global state. So right now the tests are semi-randomly failing because they wind up fighting over the same env vars.

atuin_common::utils has a similar issue, which it solves by grouping all the env-var-dependent tests into a single super-test so that they run sequentially. I tried doing the same thing, but it didn't help much because I'm running two very similar sets of tests across the two modules, and they both want to mess with the same env vars.
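
For context, a minimal sketch of what that super-test pattern looks like (the function names are made up for illustration, not taken from atuin_common::utils):

```rust
// Group the env-var-dependent cases into one #[test] so cargo's parallel
// runner executes them on a single thread, one after another.
#[test]
fn env_var_cases() {
    xonsh_data_dir_is_respected();
    xdg_data_home_fallback_works();
}

fn xonsh_data_dir_is_respected() {
    std::env::set_var("XONSH_DATA_DIR", "/tmp/xonsh-test");
    // ... assertions against the resolved history path ...
    std::env::remove_var("XONSH_DATA_DIR");
}

fn xdg_data_home_fallback_works() {
    std::env::set_var("XDG_DATA_HOME", "/tmp/xdg-test");
    // ... assertions against the resolved history path ...
    std::env::remove_var("XDG_DATA_HOME");
}
```

Each module's super-test still runs on its own thread, though, so two of them in different modules can still race on the same variables, which is exactly the cross-module problem described above.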

I could extract the tests and run them as integration tests instead, but that's difficult because they need to access non-public struct fields and things, which is no problem when they're in a submodule but does constitute a problem when you want to run them as integration tests.

The ideal way to solve this would be some sort of thread-local way of setting env vars that's specifically for testing, but I don't think that exists. Barring that, I could just ditch most of the tests (since most of them are only testing what happens when various env vars are set) - I'd still have to figure out what to do about the import tests, since I'm using the env vars there to force a certain hostname so that it matches the expected one, but I guess I don't have to compare hostnames in the tests.

Test data location

Also related to tests: I really wanted to test the import logic against actual real-world history files generated by actual real-world Xonsh, which meant that I had to find somewhere to put them. I chose atuin-client/tests/data/xonsh/ for the JSON files and atuin-client/tests/data/xonsh-history.sqlite for the SQLite db, but I wasn't sure about this since there aren't any other test data files in the repo. I'm happy to move these, or take them out altogether if it doesn't seem worth the hassle.
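
For what it's worth, here is roughly how the tests can reach those files regardless of where cargo test is invoked from (the helper name is made up; only the CARGO_MANIFEST_DIR idiom matters):

```rust
use std::path::PathBuf;

// Resolve a fixture relative to the crate root (atuin-client/), so the test
// works both from the workspace root and from inside the crate directory.
fn fixture(relative: &str) -> PathBuf {
    PathBuf::from(env!("CARGO_MANIFEST_DIR"))
        .join("tests/data")
        .join(relative)
}

#[test]
fn xonsh_fixtures_are_present() {
    assert!(fixture("xonsh").is_dir());
    assert!(fixture("xonsh-history.sqlite").is_file());
}
```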

Nanoseconds and floating-point precision

Xonsh stores its timestamps as decimal numbers of seconds, e.g. the current timestamp as I'm writing this is 1707249574.752465. This gets stored as either a JSON number or a SQLite REAL. The obvious target when parsing from either of these formats is an f64, but unfortunately the Xonsh timestamps are sometimes represented with more digits than can properly fit into an f64, so you end up losing some precision. I'm not sure how big of a deal this is - the error is on the order of tens of nanoseconds, compared to durations that usually number in the milliseconds. But it is a deviation from what's technically recorded in the Xonsh history, so I thought I'd ask.
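
To make the scale of the error concrete, here is a small standalone sketch (not the importer's actual code) using the timestamp quoted above:

```rust
// An f64 near 1.7e9 seconds has a spacing of about 2^-22 s (~240 ns), and
// multiplying by 1e9 pushes the value past the 2^53 integer-precision limit,
// so the round trip can land tens to a few hundred nanoseconds away from the
// decimal value xonsh actually recorded.
fn main() {
    let recorded = "1707249574.752465";
    let secs: f64 = recorded.parse().unwrap();
    let nanos = (secs * 1e9).round() as i64;
    println!("parsed seconds:  {secs:.9}");
    println!("as nanoseconds:  {nanos}");
}
```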

It's also possible that these timestamps were never that precise to begin with: The Python docs indicate that there is a limit to how much precision can be faithfully represented with a float, and on my machine that limit is 15 decimal digits. Some of the timestamps I'm getting from Xonsh history files have more digits than that, so I'm not sure what is going on - it's possible that Xonsh is using a different type than the native Python float to represent its timestamps, but I'd have to dig into it to be sure.

If we want to fix this, it will require parsing those numbers into a different type from f64. The easiest thing would be to find a crate somewhere that defines a larger floating-point type, such as f128, and use that, if there exists one that plays nicely with deserialization from JSON and from SQLite. If not, I could probably figure something out, but it would definitely take some fiddling. There would also be a question about where to put it: since both modules will need it, the logical place would be the parent module, I guess? Anyway, probably not an issue unless we are particularly concerned about nanosecond precision.

Hostname inference

Xonsh doesn't explicitly store the user/hostname in either of its history formats. It seems reasonable to me to assume that the history was all generated on the current device, and so just use the current user/hostname when importing. Does that make sense?

I briefly discussed this with @ellie in #1375, but wanted to check back about it because I realized that none of the other history importers do it. If you want I could add similar functionality to the other importers, and maybe even add a CLI option for it so that you could explicitly set the user/hostname when importing? That might be a little feature-creep-y, though.
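
Concretely, the assumption amounts to something like the following rough sketch. The $ATUIN_HOST_USER / $ATUIN_HOST_NAME overrides are the env vars mentioned later in this thread, and the whoami calls are just one plausible way to read the current values, not necessarily what the PR ends up using:

```rust
// Fall back to the current user and hostname when the history format
// doesn't record them, allowing env-var overrides (useful in tests).
fn current_user_and_host() -> (String, String) {
    let user = std::env::var("ATUIN_HOST_USER").unwrap_or_else(|_| whoami::username());
    let host = std::env::var("ATUIN_HOST_NAME").unwrap_or_else(|_| whoami::hostname());
    (user, host)
}
```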

Checks

  • I am happy for maintainers to push small adjustments to this PR, to speed up the review cycle
  • I have checked that there are no existing pull requests for the same thing

@ellie (Member) commented Feb 8, 2024

Firstly - thank you for all the work here!

I realized after I wrote them all up that it's a bit of a wall-o-text, so here's the TLDR:

I appreciate it! The extra detail is great, but listing the questions makes life much easier 🙏

A lot of my tests depend on env vars, which is a problem because they run in parallel and env vars are global state. What to do about this?

If it's possible/easy to abstract away the env-var-reading, so we can test the underlying function (which can then be passed an env var, static string, whatever), then that would be great.

We don't need to test whether env-vars are read OK. I'd like to rework what we already have with env-vars to do this as well.

I added some files for test data in atuin-client/tests/data, is that ok?

Definitely ok! So long as they're small (I checked, they're OK). My only concern with this is people adding huge test artifacts.

Due to limited float precision, timestamps and durations might be off by a few tens of nanoseconds. I could probably find a way to fix this, but it would require a fair bit of fiddling and possibly some extra dependencies. Is it worth it?

Short answer: Yes, this is fine. I'm happy to move forward without f128/whatever.

Long answer:

It's... almost certainly fine.

Sync, as it works by default now, depends on timestamps (why did I do this 🙃). If there are too many history records with the same timestamp, sync will not complete properly. This is because the Postgres timestamp type only has microsecond precision, not nanosecond. If an entire "page" of history has the same timestamp (when converted nano -> micro), then it won't work correctly.

This would only happen if we end up with >1000 history items that occurred in the same microsecond. In real life, this doesn't happen. It can only happen if an importer imports history incorrectly, or with the wrong assumptions. I really don't see this happening due to a loss of precision.

Even if it does, v18 is going to release both this and a new, opt-in version of sync without this limitation.

So it's very unlikely to be an issue, and if it is, we have a workaround.

I'm assuming that the user/hostname for all commands in the Xonsh history match the current user/hostname. None of the other shell importers do this, is that ok?

That's ok - really it would make sense to change the other importers.

@jfmontanaro (Contributor, Author) commented Feb 9, 2024

If it's possible/easy to abstract away the env-var-reading, so we can test the underlying function (which can then be passed an env var, static string, whatever), then that would be great.

In the process of working on this, I realized that most of the test cases that depended on env vars (primarily testing the different ways it might determine where to find the xonsh history to import) were actually just testing the behavior of the directories crate: it's responsible for looking at $XDG_DATA_HOME, $HOME, etc. Since it (presumably) has its own test cases covering that, all I was really testing was:

  1. My code's use of $XONSH_DATA_DIR, and
  2. My code's use of $ATUIN_HOST_NAME and $ATUIN_HOST_USER.

So I removed the extraneous test cases, refactored the history-path-resolver function to accept a passed-in value for xonsh_data_dir instead of looking it up directly, and added a hostname field to the importers so that it could be overridden in testing.
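
In outline, the shape is roughly this (names and signatures are illustrative, not the exact code in the PR):

```rust
use std::path::PathBuf;

// The resolver takes whatever the caller read from $XONSH_DATA_DIR (or a
// fixture path in tests) instead of touching the environment itself.
fn resolve_data_dir(xonsh_data_dir: Option<String>, fallback: PathBuf) -> PathBuf {
    xonsh_data_dir.map(PathBuf::from).unwrap_or(fallback)
}

// The importer carries the hostname as a field, so tests can set it directly
// rather than fighting over $ATUIN_HOST_NAME / $ATUIN_HOST_USER.
#[allow(dead_code)]
struct XonshImporter {
    data_dir: PathBuf,
    hostname: String,
}

#[test]
fn explicit_data_dir_wins_over_fallback() {
    let dir = resolve_data_dir(Some("/tmp/fixture".into()), PathBuf::from("/default"));
    assert_eq!(dir, PathBuf::from("/tmp/fixture"));
}
```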

I think this means everything is done on my end, so I've rebased on upstream and am ready to go!

@ellie (Member) previously approved these changes and left a comment on Feb 12, 2024:

Looks great! Thank you for working on this 🙏

It would be great if we could also get an update to the docs - https://github.com/atuinsh/docs

@ellie merged commit 87e19df into atuinsh:main on Feb 12, 2024
15 checks passed
ellie pushed a commit to atuinsh/docs that referenced this pull request Feb 21, 2024
Documentation for Xonsh importers, as discussed in atuinsh/atuin#1678
and atuinsh/atuin#1711.

I also noticed that the TOC wasn't generating properly on the page, so I
changed the h1's to h2's and that seems to have fixed it. This does
slightly change the appearance of the page, so I'm happy to revert that
if you'd prefer to keep it the way it was.