Add xonsh history #1678
Conversation
Firstly - thank you for all the work here!
I appreciate it! The extra detail is great, and listing the questions makes life much easier 🙏
If it's possible/easy to abstract away the env-var-reading, so we can test the underlying function (which can then be passed an env var, static string, whatever), then that would be great. We don't need to test whether env-vars are read OK. I'd like to rework what we already have with env-vars to do this as well.
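A minimal sketch of what that abstraction could look like (the names and the env var here are hypothetical; the real resolver may differ):

```rust
use std::path::PathBuf;

/// Hypothetical resolver: the caller supplies the (possibly absent) value,
/// so unit tests never need to touch the process environment.
fn histpath(histfile: Option<&str>) -> Option<PathBuf> {
    histfile.map(PathBuf::from)
}

/// Thin production wrapper that actually reads the env var.
fn histpath_from_env() -> Option<PathBuf> {
    let value = std::env::var("XONSH_HISTORY_FILE").ok();
    histpath(value.as_deref())
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn resolves_explicit_path() {
        // No env-var mutation, so this is safe under parallel `cargo test`.
        assert_eq!(
            histpath(Some("/tmp/history.json")),
            Some(PathBuf::from("/tmp/history.json"))
        );
    }
}
```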
Definitely ok! So long as they're small (I checked, they're OK). My only concerns with this are when people add huge test artifacts.
Short answer: Yes, this is fine. I'm happy to move forward without f128/whatever.

Long answer: It's... almost certainly fine. Sync, as it is now by default, depends on timestamps (why did I do this 🙃). If there are too many history records with the same timestamp, sync will not complete properly. This is because the Postgres timestamp type only has microsecond precision, not nanosecond. If an entire "page" of history has the same timestamp (when converted nano -> micro), then it won't work correctly. This would only happen if we end up with >1000 history items that occurred in the same microsecond. In real life, this doesn't happen. It can only happen if an importer imports history incorrectly, or with the wrong assumptions. I really don't see this happening due to a loss of precision.

Even if it does, v18 is going to release both this and a new version of sync (opt-in), without this limitation. So it's very unlikely to be an issue, and if it is, we have a workaround.
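To illustrate the failure mode being described (an illustration only, not code from the sync implementation): two records whose nanosecond timestamps differ become indistinguishable once truncated to microsecond precision.

```rust
/// Truncate a nanosecond timestamp to microsecond precision, as happens
/// when it is stored in a Postgres `timestamp` column.
fn to_micros(nanos: u64) -> u64 {
    nanos / 1_000
}

fn main() {
    let a = 1_707_249_574_752_465_100_u64; // nanoseconds
    let b = 1_707_249_574_752_465_900_u64; // 800ns later
    assert_ne!(a, b);
    // After truncation the two records share a timestamp; if a whole
    // sync page (>1000 items) collides like this, paging can't advance.
    assert_eq!(to_micros(a), to_micros(b));
}
```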
That's ok - really it would make sense to change the other importers.
In the process of working on this, I realized that most of the test cases that depended on env vars (primarily testing the different ways it might determine where to find the xonsh history to import) are actually just testing the behavior of the env-var lookup itself.

So I removed the extraneous test cases and refactored the history-path-resolver function to accept a passed-in value for the path. I think this means everything is done on my end, so I've rebased on upstream and am ready to go!
Looks great! Thank you for working on this 🙏
It would be great if we could also get an update to the docs - https://github.com/atuinsh/docs
Documentation for Xonsh importers, as discussed in atuinsh/atuin#1678 and atuinsh/atuin#1711. I also noticed that the TOC wasn't generating properly on the page, so I changed the h1's to h2's and that seems to have fixed it. This does slightly change the appearance of the page, so I'm happy to revert that if you'd prefer to keep it the way it was.
Following up on #1375 by @Matthieu-LAURENT39, this PR adds history importers for Xonsh. Since Xonsh has two different history storage backends (one using JSON files and another using a SQLite db), there are two importers.
I followed the examples set by Nu and Zsh and named the JSON one simply `xonsh`, since it's the default. I named the other one `xonsh_sqlite`, though, rather than `xonsh_histdb`, since `sqlite` is the name Xonsh uses for it (i.e. you enable it by setting `$XONSH_HISTORY_BACKEND = 'sqlite'`). Happy to change that, though, if it would be better to maintain consistency.

The reason I'm marking this PR as a draft is that I have a couple of questions that should be answered before it gets merged. I realized after I wrote them all up that it's a bit of a wall-o-text, so here's the TLDR:

- My tests depend on env vars, which makes them flaky under parallel `cargo test` - what's the best way to handle that?
- I added real-world test data files under `atuin-client/tests/data`, is that ok?
- Parsing Xonsh's timestamps into an `f64` loses a few tens of nanoseconds of precision, is that ok?
- Xonsh doesn't store the user/hostname, so the importers assume the current one, is that ok?

Env vars in tests
All of my tests wound up depending on various env vars, which it turns out is dicey in a `cargo test`-ing context, because it runs tests in parallel and the environment is global state. So right now the tests are semi-randomly failing because they wind up fighting over the same env vars.

`atuin_common::utils` has a similar issue, which it solves by grouping all the env-var-dependent tests into a single super-test so that they run sequentially. I tried doing the same thing, but it didn't help much, because I'm running two very similar sets of tests across the two modules, and they both want to mess with the same env vars.

I could extract the tests and run them as integration tests instead, but that's difficult because they need to access non-public struct fields and things, which is no problem when they're in a submodule but does constitute a problem when you want to run them as integration tests.

The ideal way to solve this would be some sort of thread-local way of setting env vars that's specifically for testing, but I don't think that exists. Barring that, I could just ditch most of the tests (since most of them are only testing what happens when various env vars are set) - I'd still have to figure out what to do about the import tests, since I'm using the env vars there to force a certain hostname so that it matches the expected one, but I guess I don't have to compare hostnames in the tests.
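For what it's worth, one common workaround (a sketch, not something this repo necessarily uses) is to serialize every env-var-touching test behind a shared lock, so they can't race even under parallel `cargo test`:

```rust
use std::sync::{Mutex, MutexGuard, OnceLock};

/// Global lock shared by every test that mutates the process environment.
fn env_lock() -> MutexGuard<'static, ()> {
    static LOCK: OnceLock<Mutex<()>> = OnceLock::new();
    // Recover the guard even if another test panicked while holding it.
    LOCK.get_or_init(|| Mutex::new(()))
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner())
}

#[test]
fn reads_history_file_var() {
    let _guard = env_lock(); // held until the end of the test
    // Note: set_var is safe in edition 2021 but `unsafe` in edition 2024.
    std::env::set_var("XONSH_HISTORY_FILE", "/tmp/history.json");
    // ... exercise the code that reads the var ...
    std::env::remove_var("XONSH_HISTORY_FILE");
}
```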
Test data location
Also related to tests: I really wanted to test the import logic against actual real-world history files generated by actual real-world Xonsh, which meant that I had to find somewhere to put them. I chose `atuin-client/tests/data/xonsh/` for the JSON files and `atuin-client/tests/data/xonsh-history.sqlite` for the SQLite db, but I wasn't sure about this since there aren't any other test data files in the repo. I'm happy to move these, or take them out altogether if it doesn't seem worth the hassle.

Nanoseconds and floating-point precision
Xonsh stores its timestamps as decimal numbers of seconds, e.g. the current timestamp as I'm writing this is `1707249574.752465`. This gets stored as either a JSON number or a SQLite `REAL`. The obvious target when parsing from either of these formats is an `f64`, but unfortunately the Xonsh timestamps are sometimes represented with more digits than can properly fit into an `f64`, so you end up losing some precision. I'm not sure how big of a deal this is - the error is on the order of tens of nanoseconds, compared to durations that usually number in the milliseconds. But it is a deviation from what's technically recorded in the Xonsh history, so I thought I'd ask.

It's also possible that these timestamps were never that precise to begin with: the Python docs indicate that there is a limit to how much precision can be faithfully represented with a float, and on my machine that limit is 15 decimal digits. Some of the timestamps I'm getting from Xonsh history files have more digits than that, so I'm not sure what is going on - it's possible that Xonsh is using a different type than the native Python float to represent its timestamps, but I'd have to dig into it to be sure.
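To make the precision loss concrete, here's a small self-contained example (the extra trailing digits in the literal are invented for illustration):

```rust
fn main() {
    // A Xonsh-style timestamp with more digits than an f64 holds exactly.
    let raw = "1707249574.752465123";
    let secs: f64 = raw.parse().unwrap();

    // Convert to whole nanoseconds the way an importer might.
    let nanos = (secs * 1e9).round() as i64;

    // Near 1.7e9 seconds, adjacent f64 values are roughly 240ns apart, so
    // the printed value differs from the original in its last few digits.
    println!("parsed back: {nanos}"); // not exactly 1707249574752465123
}
```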
If we want to fix this, it will require parsing those numbers into a different type from `f64`. The easiest thing would be to find a crate somewhere that defines a larger floating-point type, e.g. `f128`, and use that, if there exists one that plays nicely with deserialization from JSON and from SQLite. If not, I could probably figure something out, but it would definitely take some fiddling. Also there would be a question about where to put it - since both modules will need it, the logical place would be the parent module, I guess? Anyway, probably not an issue unless we are particularly concerned about nanosecond precision.

Hostname inference
Xonsh doesn't explicitly store the user/hostname in either of its history formats. It seems reasonable to me to assume that the history was all generated on the current device, and so just use the current user/hostname when importing. Does that make sense?
I briefly discussed this with @ellie in #1375, but wanted to check back about it because I realized that none of the other history importers do it. If you want I could add similar functionality to the other importers, and maybe even add a CLI option for it so that you could explicitly set the user/hostname when importing? That might be a little feature-creep-y, though.
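For context, inferring the current identity is a one-liner with something like the `whoami` crate (a sketch of the assumption being proposed, not necessarily how the existing importers do it, and the "host:user" format is illustrative):

```rust
/// Sketch: build a host/user identity for imported history, assuming the
/// history was generated on the machine running the import.
fn infer_host_user() -> String {
    // whoami 1.x; newer releases prefer `whoami::fallible::hostname()`.
    format!("{}:{}", whoami::hostname(), whoami::username())
}
```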