-
Notifications
You must be signed in to change notification settings - Fork 414
Add an option to make .tasproj files smaller. #4331
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
The dialog says this should be set as high as possible, and that it should not be "overly cumbersome to replay this many frames". Thus, the user should never have to replay more than this many frames (since that would be overly cumbersome) unless states were invalidated.
Thanks! Finally someone figured this out!
IMO yes. |
That test class' purpose is to ensure that no unexpected JSON structures end up in the config, and it's using hardcoded strings because that was easier than trying to recursively check class fields. You can just add the new prop to the end of the string to match the new value. |
Having an option for only reserved states sounds useful, but I'm not sure "none" is ever useful. However, I don't quite understand what you mean with |
Some of the previous talk for the reference #2613 |
src/BizHawk.Client.Common/movie/tasproj/ZwinderStateManagerSettings.cs
Outdated
Show resolved
Hide resolved
The vast majority of space in a tasproj for the case of DS (and other "large" state cores) is not from "normal" states, but rather the reserved states. This is not even from ancient states here, but rather states nearby branches and markers (and of course branches themselves). This happens due to the fact reserved states do not have a size limit (besides hitting OOM), while other states are limited to a set buffer size (the defaults of which are rather small for large state cores but that's besides the point). By only saving reserved states, you're not fixing any problem here (you will still have very large tasprojs for large state cores). Note anyways for small state cores, most of space might be in non-reserve states, but the space taken it up rather small, the largest is typically 200MBs (and more often it's around a 100MBs for a "large" case, depends on how well zstd compresses it, often times it's way smaller) Also if you have an 8GB tasproj, then loading that is going to be taking an absurd amount of memory regardless because that's the compressed size, when it's loaded into memory it's uncompressed. A typical user is going to be hitting into OOM sooner than getting an 8GB tasproj (again, all from reserved states). Recent changes have made it so the user can delete branches and markers to reclaim memory anyways (which they probably should be doing if they have 100s of branches/markers) |
The ancient interval would matter because it will affect how many states are saved with "reserved only". So, in my case, saving a tasproj with reserved only means I can open it and be effectively back where I was in <10 seconds, counting file load time and time to seek to the project's current frame (restoring the state buffers along the way, at least partially since they might have previously had states older than the closest reserved state). Yes, the file size still will be large if there are a lot of ancient states. But it is still far smaller than keeping the current and recent buffers! For some context, here's my usual settings when doing a TAS of Mario Kart DS: When I'm not doing a full-game TAS, there are only a handful of "ancient" states and they take far less space than the "current" buffer, which needs to be HUGE for me to have a smooth experience. I can see how a full-game TAS could be different. Perhaps a "markers only" option would be good for that? The usefulness of "none" is questionable... I personally don't plan on using it as-is. But I would like to have an option to also remove states for branches at some point, and then "none" could make actually small files and I would use it then. |
I have spend a LOT of time making TASes with TAStudio. And you're telling me the thing I want, and worked to implement, is useless? That's rude. But here, have some numbers. I just opened a real project (not some toy one) that I have, and saved it with all three options: So that's a 82.5% reduction in file size plus I gain the ability to save without significantly interrupting my work. You say that "largest is typically 200MBs". Your experience obviously does not represent mine. See my file size above. That is my normal. When I said 8GB I meant the un-compressed size, the size of my "current" buffer. I will admit I could have clarified that. My .tasproj files themselves are quite a bit smaller than 8GB but have been over 2GB for a full-game TAS (which "reserved only" would not help quite so much with, but still would be nice to have). |
Another option to consider having, perhaps more useful than "none" would be to not keep "ancient" but do keep marker states. I'd use that for a full-game TAS over the current reserved only. |
If you put 8GB as the current buffer then those numbers make more sense. Even so that seems kind of overkill, generally 1GB (or maybe 2GB) is good enough in this case. The result is typically then what I described. In my experience, the largest issue ends up being markers and branches, since those again don't have a size limit (although even then they don't become problematic until 100-200 or so branches/markers which is a lot). |
…ause branches have their own state anyway (separate from the state manager) and we use that when loading.
Force push, now based on ancient_interval branch to make use of its code. Reserved Only is now the default. Branches do not reserve states. Saving with reserved only now includes states not yet in the reserved list but that should be. New option to only save marker frame states. |
I'm quite against doing this as doing this would only be useful in the case the user is using non-default settings in the first place (i.e. changing the buffer sizes to much much much higher numbers). If Reserved Only is useful to the user they can change it in the config themselves while they're also changing the buffer sizes.
This should not be done. At most, make it an option, do not remove this feature outright entirely. |
I just now looked in the commit history to try to understand why this feature exists. I found that this change was already made a month ago, and then reverted by you with no commit or comments explanation. The discussion in Discord around this leads me to believe that the value you place on this feature comes from a misunderstanding of how things actually work. If this issue is really that important to you, you can revert it or block this PR. BizHawk isn't my personal project after all, and this is why I make pull requests instead of committing to master. I'll keep it in my own fork either way. |
re: defaults, generally space is effectively free, while time is incredibly valuable. RAM and disk space are there to be used. |
I wouldn't say disk space is free, though it is rather cheap. |
Auto-save complaints have historically largely been before zstd compression was added. After that, save times have been reduced enormously. Zstd compression is much much faster than deflate compression (not to mention reducing the size drastically, although this is highly variable depending on the core; the 200MiB is a very high end guess, it's sometimes as low as ~10MiBs, maybe even ~2MiBs for some cores, of course before you get into weeds with branches and such growing the movie size), and as such largely reduced the time, now simply a minor hitch (although this can grow in length, mainly due to the reserved states again along with branches, especially for large state cores, since those do not have a cap in size while the main greenzone states do have a cap). If anything, if we want to reduce the autosave length here, it probably would be better to default to autosaving backup files (which already do not save greenzone, perhaps this could be changed to only save reserve states; although granted this still will grow in length with more reserved states and branches added, so perhaps not), or even autosaving backup bk2 files (the absolute fastest autosave, with of course obvious caveats; but considering autosave is for worst case crash scenarios this might be fine to do). |
TASEditor has this nice thing that we never replicated Increasing state gaps when saving by a fixed number was our solution and it worked 10 years ago before cores like PSX became the norm. Then with lots of help from r57shell I switched to smart gaps that follow playback and increase on states that are farther back (and ahead), we called it state decay. It was very complicated and fragile, but it worked. Then WBX was updated to only save changed memory and all the states got lumped together hard. So our hack around that was dropping them all from backups, and recommending users to use backups, even tho we never ended up defaulting to that. Maybe we should have. If we can have some granularity on what gets saved, that's better than saving all or nothing. How much exactly should be the default is debatable, but I'd argue that anything that distracts you regularly is worth reducing - either in duration or in frequency. We know that simply autosaving less is not a great solution to it being slow. So for those who want autosave to happen often, it should be possible to configure saving to be quick. And who doesn't mind the time would just not use those options. I personally prefer autosaving more often and faster. But regular saving should not take 10 seconds either! So ideally we'd have a dialog that lets us check which parts of the projects to save for which save type, then maybe everyone would be happy. |
Tasproj files from TAStudio can easily be gigabytes in size. The vast majority of that comes from savestates that don't need to be saved. The increased storage space used, increased time spend saving, and increased time spend loading are far worse than having to emulate some frames after loading, at least in my case. (8+ GB files for Nintendo DS projects, with an ancient interval than can be emulated in a few seconds)
Should we make "Reserved Only" be the default?
Check if completed:
I ran the tests after committing, and a serialization test fails. It seems to expect that the actual string output of serialization exactly matches an input string. This test makes no sense to me. Why would one ever care about that? (I would understand doing it the other way: validating that serializing then deserializing results in a matching object... but not comparing serialized strings.)
Because I don't understand why the test exists, I have not updated it. It currently fails, so this PR is a draft for now. Why does this test exist, or what should be done to get it passing again?