Skip to content

Add an option to make .tasproj files smaller. #4331

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Conversation

SuuperW
Copy link
Contributor

@SuuperW SuuperW commented May 24, 2025

Tasproj files from TAStudio can easily be gigabytes in size. The vast majority of that comes from savestates that don't need to be saved. The increased storage space used, increased time spend saving, and increased time spend loading are far worse than having to emulate some frames after loading, at least in my case. (8+ GB files for Nintendo DS projects, with an ancient interval than can be emulated in a few seconds)

Should we make "Reserved Only" be the default?

Check if completed:

I ran the tests after committing, and a serialization test fails. It seems to expect that the actual string output of serialization exactly matches an input string. This test makes no sense to me. Why would one ever care about that? (I would understand doing it the other way: validating that serializing then deserializing results in a matching object... but not comparing serialized strings.)

Because I don't understand why the test exists, I have not updated it. It currently fails, so this PR is a draft for now. Why does this test exist, or what should be done to get it passing again?

SuuperW added 2 commits May 24, 2025 01:36
The dialog says this should be set as high as possible, and that it should not be "overly cumbersome to replay this many frames". Thus, the user should never have to replay more than this many frames (since that would be overly cumbersome) unless states were invalidated.
@vadosnaprimer
Copy link
Contributor

Thanks! Finally someone figured this out!

Should we make "Reserved Only" be the default?

IMO yes.

@YoshiRulz
Copy link
Member

That test class' purpose is to ensure that no unexpected JSON structures end up in the config, and it's using hardcoded strings because that was easier than trying to recursively check class fields. You can just add the new prop to the end of the string to match the new value.

@Morilli
Copy link
Collaborator

Morilli commented May 24, 2025

Having an option for only reserved states sounds useful, but I'm not sure "none" is ever useful.

However, I don't quite understand what you mean with (8+ GB files for Nintendo DS projects, with an ancient interval than can be emulated in a few seconds). How does the ancient interval matter here? If you're saving reserved only, you would still get a large file size if there are many ancient states, and if not the interval doesn't matter, no?

@vadosnaprimer
Copy link
Contributor

Some of the previous talk for the reference #2613

@CasualPokePlayer
Copy link
Member

CasualPokePlayer commented May 24, 2025

The vast majority of space in a tasproj for the case of DS (and other "large" state cores) is not from "normal" states, but rather the reserved states. This is not even from ancient states here, but rather states nearby branches and markers (and of course branches themselves).

This happens due to the fact reserved states do not have a size limit (besides hitting OOM), while other states are limited to a set buffer size (the defaults of which are rather small for large state cores but that's besides the point). By only saving reserved states, you're not fixing any problem here (you will still have very large tasprojs for large state cores).

Note anyways for small state cores, most of space might be in non-reserve states, but the space taken it up rather small, the largest is typically 200MBs (and more often it's around a 100MBs for a "large" case, depends on how well zstd compresses it, often times it's way smaller)

Also if you have an 8GB tasproj, then loading that is going to be taking an absurd amount of memory regardless because that's the compressed size, when it's loaded into memory it's uncompressed. A typical user is going to be hitting into OOM sooner than getting an 8GB tasproj (again, all from reserved states). Recent changes have made it so the user can delete branches and markers to reclaim memory anyways (which they probably should be doing if they have 100s of branches/markers)

@SuuperW
Copy link
Contributor Author

SuuperW commented May 24, 2025

Having an option for only reserved states sounds useful, but I'm not sure "none" is ever useful.

However, I don't quite understand what you mean with (8+ GB files for Nintendo DS projects, with an ancient interval than can be emulated in a few seconds). How does the ancient interval matter here? If you're saving reserved only, you would still get a large file size if there are many ancient states, and if not the interval doesn't matter, no?

The ancient interval would matter because it will affect how many states are saved with "reserved only". So, in my case, saving a tasproj with reserved only means I can open it and be effectively back where I was in <10 seconds, counting file load time and time to seek to the project's current frame (restoring the state buffers along the way, at least partially since they might have previously had states older than the closest reserved state).

Yes, the file size still will be large if there are a lot of ancient states. But it is still far smaller than keeping the current and recent buffers!

For some context, here's my usual settings when doing a TAS of Mario Kart DS:
Current size: 8,000 (I will make it bigger once I start TASing with this commit in effect!)
Current frame length: 1,200 (same, it will get bigger)
Recent and gaps: Both are minimum size (64) and short frame length, since I hardly use them anyway. I personally prefer just using the current buffer.
Ancient state interval: 500

When I'm not doing a full-game TAS, there are only a handful of "ancient" states and they take far less space than the "current" buffer, which needs to be HUGE for me to have a smooth experience.

I can see how a full-game TAS could be different. Perhaps a "markers only" option would be good for that?

The usefulness of "none" is questionable... I personally don't plan on using it as-is. But I would like to have an option to also remove states for branches at some point, and then "none" could make actually small files and I would use it then.

@SuuperW
Copy link
Contributor Author

SuuperW commented May 24, 2025

The vast majority of space in a tasproj for the case of DS (and other "large" state cores) is not from "normal" states, but rather the reserved states. This is not even from ancient states here, but rather states nearby branches and markers (and of course branches themselves).

This happens due to the fact reserved states do not have a size limit (besides hitting OOM), while other states are limited to a set buffer size (the defaults of which are rather small for large state cores but that's besides the point). By only saving reserved states, you're not fixing any problem here (you will still have very large tasprojs for large state cores).

Note anyways for small state cores, most of space might be in non-reserve states, but the space taken it up rather small, the largest is typically 200MBs (and more often it's around a 100MBs for a "large" case, depends on how well zstd compresses it, often times it's way smaller)

Also if you have an 8GB tasproj, then loading that is going to be taking an absurd amount of memory regardless because that's the compressed size, when it's loaded into memory it's uncompressed. A typical user is going to be hitting into OOM sooner than getting an 8GB tasproj (again, all from reserved states). Recent changes have made it so the user can delete branches and markers to reclaim memory anyways (which they probably should be doing if they have 100s of branches/markers)

I have spend a LOT of time making TASes with TAStudio. And you're telling me the thing I want, and worked to implement, is useless? That's rude. But here, have some numbers. I just opened a real project (not some toy one) that I have, and saved it with all three options:
All: 838,758 KB, and takes a long enough time to save the file that I am discouraged from regularly saving.
Reserved only: 147,025 KB, and saving takes an amount of time I consider reasonable.
None: 107,470 KB, a bit faster saving than reserved only but not a lot.

So that's a 82.5% reduction in file size plus I gain the ability to save without significantly interrupting my work.

You say that "largest is typically 200MBs". Your experience obviously does not represent mine. See my file size above. That is my normal.

When I said 8GB I meant the un-compressed size, the size of my "current" buffer. I will admit I could have clarified that. My .tasproj files themselves are quite a bit smaller than 8GB but have been over 2GB for a full-game TAS (which "reserved only" would not help quite so much with, but still would be nice to have).

@SuuperW
Copy link
Contributor Author

SuuperW commented May 24, 2025

Another option to consider having, perhaps more useful than "none" would be to not keep "ancient" but do keep marker states. I'd use that for a full-game TAS over the current reserved only.

@CasualPokePlayer
Copy link
Member

If you put 8GB as the current buffer then those numbers make more sense. Even so that seems kind of overkill, generally 1GB (or maybe 2GB) is good enough in this case. The result is typically then what I described. In my experience, the largest issue ends up being markers and branches, since those again don't have a size limit (although even then they don't become problematic until 100-200 or so branches/markers which is a lot).

@SuuperW
Copy link
Contributor Author

SuuperW commented May 28, 2025

Force push, now based on ancient_interval branch to make use of its code. Reserved Only is now the default. Branches do not reserve states. Saving with reserved only now includes states not yet in the reserved list but that should be. New option to only save marker frame states.

@SuuperW SuuperW marked this pull request as ready for review May 28, 2025 02:40
@CasualPokePlayer
Copy link
Member

CasualPokePlayer commented Jun 1, 2025

Reserved Only is now the default.

I'm quite against doing this as doing this would only be useful in the case the user is using non-default settings in the first place (i.e. changing the buffer sizes to much much much higher numbers). If Reserved Only is useful to the user they can change it in the config themselves while they're also changing the buffer sizes.

Branches do not reserve states.

This should not be done. At most, make it an option, do not remove this feature outright entirely.

@SuuperW
Copy link
Contributor Author

SuuperW commented Jun 1, 2025

This should not be done. At most, make it an option, do not remove this feature outright entirely.

I just now looked in the commit history to try to understand why this feature exists. I found that this change was already made a month ago, and then reverted by you with no commit or comments explanation. The discussion in Discord around this leads me to believe that the value you place on this feature comes from a misunderstanding of how things actually work.

If this issue is really that important to you, you can revert it or block this PR. BizHawk isn't my personal project after all, and this is why I make pull requests instead of committing to master. I'll keep it in my own fork either way.

@YoshiRulz
Copy link
Member

re: defaults, generally space is effectively free, while time is incredibly valuable. RAM and disk space are there to be used.

@SuuperW
Copy link
Contributor Author

SuuperW commented Jun 1, 2025

I wouldn't say disk space is free, though it is rather cheap.
Either way, it sounds to me like you are arguing against reserve only being the default. But valuing time over all else is actually a reason to make reserve only (or even "none") the default. This is because auto-save is turned on at a 2-minute interval by default. A user cannot work on a tasproj during an autosave (as it freezes the emulator and UI), so making this faster is highly valuable. Complaints about how long auto-save takes are pretty common, from my experience, which backs up the idea that we should not default to saving all states.

@CasualPokePlayer
Copy link
Member

CasualPokePlayer commented Jun 1, 2025

Auto-save complaints have historically largely been before zstd compression was added. After that, save times have been reduced enormously. Zstd compression is much much faster than deflate compression (not to mention reducing the size drastically, although this is highly variable depending on the core; the 200MiB is a very high end guess, it's sometimes as low as ~10MiBs, maybe even ~2MiBs for some cores, of course before you get into weeds with branches and such growing the movie size), and as such largely reduced the time, now simply a minor hitch (although this can grow in length, mainly due to the reserved states again along with branches, especially for large state cores, since those do not have a cap in size while the main greenzone states do have a cap).

If anything, if we want to reduce the autosave length here, it probably would be better to default to autosaving backup files (which already do not save greenzone, perhaps this could be changed to only save reserve states; although granted this still will grow in length with more reserved states and branches added, so perhaps not), or even autosaving backup bk2 files (the absolute fastest autosave, with of course obvious caveats; but considering autosave is for worst case crash scenarios this might be fine to do).

@vadosnaprimer
Copy link
Contributor

TASEditor has this nice thing that we never replicated

изображение

Increasing state gaps when saving by a fixed number was our solution and it worked 10 years ago before cores like PSX became the norm. Then with lots of help from r57shell I switched to smart gaps that follow playback and increase on states that are farther back (and ahead), we called it state decay. It was very complicated and fragile, but it worked.

Then WBX was updated to only save changed memory and all the states got lumped together hard. So our hack around that was dropping them all from backups, and recommending users to use backups, even tho we never ended up defaulting to that. Maybe we should have.

If we can have some granularity on what gets saved, that's better than saving all or nothing. How much exactly should be the default is debatable, but I'd argue that anything that distracts you regularly is worth reducing - either in duration or in frequency. We know that simply autosaving less is not a great solution to it being slow. So for those who want autosave to happen often, it should be possible to configure saving to be quick. And who doesn't mind the time would just not use those options.

I personally prefer autosaving more often and faster. But regular saving should not take 10 seconds either!

So ideally we'd have a dialog that lets us check which parts of the projects to save for which save type, then maybe everyone would be happy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants