
Add xonsh history #1678

Merged · 6 commits · Feb 12, 2024

Conversation

@jfmontanaro (Contributor) commented Feb 6, 2024

Following up on #1375 by @Matthieu-LAURENT39, this PR adds history importers for Xonsh. Since Xonsh has two different history storage backends (one using JSON files and another using a SQLite db), there are two importers.

I followed the example set by Nu and Zsh and named the JSON one simply xonsh, since it's the default. I named the other one xonsh_sqlite rather than xonsh_histdb, since sqlite is the name Xonsh uses for it (i.e. you enable it by setting $XONSH_HISTORY_BACKEND = 'sqlite'). Happy to change that, though, if it would be better to maintain consistency.

I'm marking this PR as a draft because I have a couple of questions that should be answered before it gets merged. I realized after I wrote them all up that it's a bit of a wall-o-text, so here's the TLDR:

  • A lot of my tests depend on env vars, which is a problem because they run in parallel and env vars are global state. What to do about this?
  • I added some files for test data in atuin-client/tests/data, is that ok?
  • Due to limited float precision, timestamps and durations might be off by a few tens of nanoseconds. I could probably find a way to fix this, but it would require a fair bit of fiddling and possibly some extra dependencies. Is it worth it?
  • I'm assuming that the user/hostname for all commands in the Xonsh history match the current user/hostname. None of the other shell importers do this, is that ok?

Env vars in tests

All of my tests wound up depending on various env vars, which turns out to be dicey under cargo test because it runs tests in parallel and the environment is global state. So right now the tests are semi-randomly failing because they wind up fighting over the same env vars.

atuin_common::utils has a similar issue, which it solves by grouping all the env-var-dependent tests into a single super-test so that they run sequentially. I tried doing the same thing, but it didn't help much because I'm running two very similar sets of tests across the two modules, and they both want to mess with the same env vars.
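
For context, a minimal sketch of what that super-test pattern looks like (the function names are made up for illustration, not taken from atuin_common::utils):

```rust
// Group the env-var-dependent cases into one #[test] so cargo's parallel
// runner executes them on a single thread, one after another.
#[test]
fn env_var_cases() {
    xonsh_data_dir_is_respected();
    xdg_data_home_fallback_works();
}

fn xonsh_data_dir_is_respected() {
    std::env::set_var("XONSH_DATA_DIR", "/tmp/xonsh-test");
    // ... assertions against the resolved history path ...
    std::env::remove_var("XONSH_DATA_DIR");
}

fn xdg_data_home_fallback_works() {
    std::env::set_var("XDG_DATA_HOME", "/tmp/xdg-test");
    // ... assertions against the resolved history path ...
    std::env::remove_var("XDG_DATA_HOME");
}
```

Each module's super-test still runs on its own thread, though, so two of them in different modules can still race on the same variables, which is exactly the cross-module problem described above.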

I could extract the tests and run them as integration tests instead, but that's difficult because they need to access non-public struct fields and things, which is no problem when they're in a submodule but does constitute a problem when you want to run them as integration tests.

The ideal way to solve this would be some sort of thread-local way of setting env vars that's specifically for testing, but I don't think that exists. Barring that, I could just ditch most of the tests (since most of them are only testing what happens when various env vars are set) - I'd still have to figure out what to do about the import tests, since I'm using the env vars there to force a certain hostname so that it matches the expected one, but I guess I don't have to compare hostnames in the tests.

Test data location

Also related to tests: I really wanted to test the import logic against actual real-world history files generated by actual real-world Xonsh, which meant that I had to find somewhere to put them. I chose atuin-client/tests/data/xonsh/ for the JSON files and atuin-client/tests/data/xonsh-history.sqlite for the SQLite db, but I wasn't sure about this since there aren't any other test data files in the repo. I'm happy to move these, or take them out altogether if it doesn't seem worth the hassle.
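
For what it's worth, here is roughly how the tests can reach those files regardless of where cargo test is invoked from (the helper name is made up; only the CARGO_MANIFEST_DIR idiom matters):

```rust
use std::path::PathBuf;

// Resolve a fixture relative to the crate root (atuin-client/), so the test
// works both from the workspace root and from inside the crate directory.
fn fixture(relative: &str) -> PathBuf {
    PathBuf::from(env!("CARGO_MANIFEST_DIR"))
        .join("tests/data")
        .join(relative)
}

#[test]
fn xonsh_fixtures_are_present() {
    assert!(fixture("xonsh").is_dir());
    assert!(fixture("xonsh-history.sqlite").is_file());
}
```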

Nanoseconds and floating-point precision

Xonsh stores its timestamps as decimal numbers of seconds, e.g. the current timestamp as I'm writing this is 1707249574.752465. This gets stored as either a JSON number or a SQLite REAL. The obvious target when parsing from either of these formats is an f64, but unfortunately the Xonsh timestamps are sometimes represented with more digits than can properly fit into an f64, so you end up losing some precision. I'm not sure how big of a deal this is - the error is on the order of tens of nanoseconds, compared to durations that usually number in the milliseconds. But it is a deviation from what's technically recorded in the Xonsh history, so I thought I'd ask.
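
To make the scale of the error concrete, here is a small standalone sketch (not the importer's actual code) using the timestamp quoted above:

```rust
// An f64 near 1.7e9 seconds has a spacing of about 2^-22 s (~240 ns), and
// multiplying by 1e9 pushes the value past the 2^53 integer-precision limit,
// so the round trip can land tens to a few hundred nanoseconds away from the
// decimal value xonsh actually recorded.
fn main() {
    let recorded = "1707249574.752465";
    let secs: f64 = recorded.parse().unwrap();
    let nanos = (secs * 1e9).round() as i64;
    println!("parsed seconds:  {secs:.9}");
    println!("as nanoseconds:  {nanos}");
}
```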

It's also possible that these timestamps were never that precise to begin with: The Python docs indicate that there is a limit to how much precision can be faithfully represented with a float, and on my machine that limit is 15 decimal digits. Some of the timestamps I'm getting from Xonsh history files have more digits than that, so I'm not sure what is going on - it's possible that Xonsh is using a different type than the native Python float to represent its timestamps, but I'd have to dig into it to be sure.

If we want to fix this, it will require parsing those numbers into a different type from f64. The easiest thing would be to find a crate somewhere that defines a larger floating-point type, such as f128, and use that, if there exists one that plays nicely with deserialization from JSON and from SQLite. If not, I could probably figure something out, but it would definitely take some fiddling. There would also be a question about where to put it: since both modules will need it, the logical place would be the parent module, I guess? Anyway, probably not an issue unless we are particularly concerned about nanosecond precision.

Hostname inference

Xonsh doesn't explicitly store the user/hostname in either of its history formats. It seems reasonable to me to assume that the history was all generated on the current device, and so just use the current user/hostname when importing. Does that make sense?

I briefly discussed this with @ellie in #1375, but wanted to check back about it because I realized that none of the other history importers do it. If you want I could add similar functionality to the other importers, and maybe even add a CLI option for it so that you could explicitly set the user/hostname when importing? That might be a little feature-creep-y, though.
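
Concretely, the assumption amounts to something like the following rough sketch. The $ATUIN_HOST_USER / $ATUIN_HOST_NAME overrides are the env vars mentioned later in this thread, and the whoami calls are just one plausible way to read the current values, not necessarily what the PR ends up using:

```rust
// Fall back to the current user and hostname when the history format
// doesn't record them, allowing env-var overrides (useful in tests).
fn current_user_and_host() -> (String, String) {
    let user = std::env::var("ATUIN_HOST_USER").unwrap_or_else(|_| whoami::username());
    let host = std::env::var("ATUIN_HOST_NAME").unwrap_or_else(|_| whoami::hostname());
    (user, host)
}
```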

Checks

  • I am happy for maintainers to push small adjustments to this PR, to speed up the review cycle
  • I have checked that there are no existing pull requests for the same thing

@ellie (Member) commented Feb 8, 2024

Firstly - thank you for all the work here!

I realized after I wrote them all up that it's a bit of a wall-o-text, so here's the TLDR:

I appreciate it! The extra detail is great, but listing the questions makes life much easier 🙏

A lot of my tests depend on env vars, which is a problem because they run in parallel and env vars are global state. What to do about this?

If it's possible/easy to abstract away the env-var-reading, so we can test the underlying function (which can then be passed an env var, static string, whatever), then that would be great.

We don't need to test whether env-vars are read OK. I'd like to rework what we already have with env-vars to do this as well.

I added some files for test data in atuin-client/tests/data, is that ok?

Definitely ok! So long as they're small (I checked, they're OK). My only concern with this is people adding huge test artifacts.

Due to limited float precision, timestamps and durations might be off by a few tens of nanoseconds. I could probably find a way to fix this, but it would require a fair bit of fiddling and possibly some extra dependencies. Is it worth it?

Short answer: Yes, this is fine. I'm happy to move forward without f128/whatever.

Long answer:

It's... almost certainly fine.

Sync, as it works by default now, depends on timestamps (why did I do this 🙃). If there are too many history records with the same timestamp, sync will not complete properly. This is because the Postgres timestamp type only has microsecond precision, not nanosecond. If an entire "page" of history has the same timestamp (when converted nano -> micro), then it won't work correctly.

This would only happen if we end up with >1000 history items that occurred in the same microsecond. In real life, this doesn't happen. It can only happen if an importer imports history incorrectly, or with the wrong assumptions. I really don't see this happening due to a loss of precision.

Even if it does, v18 is going to release both this and a new, opt-in version of sync without this limitation.

So it's very unlikely to be an issue, and if it is, we have a workaround.

I'm assuming that the user/hostname for all commands in the Xonsh history match the current user/hostname. None of the other shell importers do this, is that ok?

That's ok - really it would make sense to change the other importers.

@jfmontanaro (Contributor, Author) commented Feb 9, 2024

If it's possible/easy to abstract away the env-var-reading, so we can test the underlying function (which can then be passed an env var, static string, whatever), then that would be great.

In the process of working on this, I realized that most of the test cases that depended on env vars (primarily testing the different ways it might determine where to find the xonsh history to import) were actually just testing the behavior of the directories crate: it's responsible for looking at $XDG_DATA_HOME, $HOME, etc. Since it (presumably) has its own test cases covering that, all I was really testing was:

  1. My code's use of $XONSH_DATA_DIR, and
  2. My code's use of $ATUIN_HOST_NAME and $ATUIN_HOST_USER.

So I removed the extraneous test cases, refactored the history-path-resolver function to accept a passed-in value for xonsh_data_dir instead of looking it up directly, and added a hostname field to the importers so that it could be overridden in testing.
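
In outline, the shape is roughly this (names and signatures are illustrative, not the exact code in the PR):

```rust
use std::path::PathBuf;

// The resolver takes whatever the caller read from $XONSH_DATA_DIR (or a
// fixture path in tests) instead of touching the environment itself.
fn resolve_data_dir(xonsh_data_dir: Option<String>, fallback: PathBuf) -> PathBuf {
    xonsh_data_dir.map(PathBuf::from).unwrap_or(fallback)
}

// The importer carries the hostname as a field, so tests can set it directly
// rather than fighting over $ATUIN_HOST_NAME / $ATUIN_HOST_USER.
#[allow(dead_code)]
struct XonshImporter {
    data_dir: PathBuf,
    hostname: String,
}

#[test]
fn explicit_data_dir_wins_over_fallback() {
    let dir = resolve_data_dir(Some("/tmp/fixture".into()), PathBuf::from("/default"));
    assert_eq!(dir, PathBuf::from("/tmp/fixture"));
}
```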

I think this means everything is done on my end, so I've rebased on upstream and am ready to go!

@ellie (Member) previously approved these changes and left a comment on Feb 12, 2024:

Looks great! Thank you for working on this 🙏

It would be great if we could also get an update to the docs - https://github.com/atuinsh/docs

@ellie merged commit 87e19df into atuinsh:main on Feb 12, 2024
15 checks passed
ellie pushed a commit to atuinsh/docs that referenced this pull request Feb 21, 2024
Documentation for Xonsh importers, as discussed in atuinsh/atuin#1678
and atuinsh/atuin#1711.

I also noticed that the TOC wasn't generating properly on the page, so I
changed the h1's to h2's and that seems to have fixed it. This does
slightly change the appearance of the page, so I'm happy to revert that
if you'd prefer to keep it the way it was.