Why is my chaindata size up to 400GB? #15797

Closed
xuzhiping7 opened this issue Jan 3, 2018 · 14 comments

@xuzhiping7

I am confused about the chaindata size; it seems to be growing unreasonably.

[screenshot: chaindata directory size]

I have set up several servers running Geth full nodes, and the chaindata is not the same size on each: 394GB, 212GB, etc.

I tried the commands:
geth removedb
nohup ./geth --fast --cache=1024 --rpc --rpcapi "db,eth,net,web3,personal" &

and downloaded the chaindata again. It took several hours and ended up around 50GB, and everything is OK; my wallet data and the block data are up to date.

So, how can I cut down the chaindata size? Is 400GB normal?

Thanks.

@karalabe
Member

karalabe commented Jan 3, 2018

After its initial sync, Geth switches to "full sync" mode, where all historical state from that point onward is retained. If you resync, only the latest state is downloaded. The latest state plus the blockchain data is worth about 50GB, but since we don't have state pruning yet, after a sync the data just keeps accumulating.
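
For reference, a quick way to check the on-disk footprint yourself (this assumes the default datadir location on Linux; adjust the path if you started Geth with --datadir):

du -sh ~/.ethereum/geth/chaindata    # total on-disk size of the chain database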

@xuzhiping7
Author

So the longer it runs, the larger the chaindata gets, much larger than the fast-sync chaindata size?

And how can I get the real "full sync" chaindata size from the beginning until now?

Thanks @karalabe ~

@MysticRyuujin
Contributor

@karalabe What do you recommend doing to reduce the size but stay in sync?

I have limited SSD storage and an always-on node. I let it run 24/7, but it keeps getting too big for my SSD, so I have to removedb and re-sync from scratch every once in a while... which of course takes at least a few hours... Is there a better method? Like some kind of Export -> Quick Import we could do? I have storage space available, just not SSD storage 😄
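
(A hedged aside: Geth does ship an export/import pair, but import re-executes every block rather than copying state, so it is not the quick restore being asked about here; the file name and datadir below are only placeholders.)

geth export chain-backup.rlp                                      # dump the chain to an RLP file
geth --datadir /path/to/fresh/datadir import chain-backup.rlp     # replays and re-validates every block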

@karalabe
Member

karalabe commented Jan 11, 2018

@MysticRyuujin We're working on a memory cache to reduce database writes quite significantly (PoC tests show about 60-70% less data written to disk). That will hopefully land in Geth 1.8.0 and make this problem rarer: #15857.

Geth also supports "fast syncing" with itself, which you can use to synchronize an existing chain into a fresh data directory and then swap out the old one with the fresh one:

geth --datadir=/my/temp/datadir copydb --cache=512 /my/main/datadir/geth/chaindata/
rm -rf /my/main/datadir/geth/chaindata/
mv /my/temp/datadir/geth/chaindata /my/main/datadir/geth/

Please do the above manually; I just wrote out the rm and mv commands to make it clearer what I mean.
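
One practical caution worth adding: stop the running node before the copy and the swap, and sanity-check the fresh copy before deleting the old chaindata, e.g.:

du -sh /my/temp/datadir/geth/chaindata    # should be roughly the pruned ~50GB mentioned above, not a few MB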

@xuzhiping7
Author

thx @karalabe ~

@didil

didil commented Apr 30, 2018

Any workaround so far? This makes it hard to run a production node with high availability without a large disk (unnecessarily expensive ...)

@Shazam14

Hi,
has this been resolved?
Thank you, I have the same issue with limited SSD space.

@xuzhiping7
Author

xuzhiping7 commented Jun 13, 2018

@Shazam14

Hi, as far as I know, no.
You can only upgrade your SSD ~

@karalabe
Member

Partially. The chain still grows, but at a much, much slower rate than before. Doing a full sync on mainnet (i.e. not fast) results in about a 2x database size compared to a pruned node. We're still working on getting final pruning implemented to slow growth even further.

@Querzel

Querzel commented Jun 20, 2018

Hi,
why can't we just split the data directory like in Bitcoin? https://en.bitcoin.it/wiki/Splitting_the_data_directory
So the old, less-used states & blocks go on the cheap HDD and the newer ones with high load on the SSD ...
And Parity already has "auto" pruning implemented in its warp mode.

@oleksiikoshlatyi

oleksiikoshlatyi commented Mar 1, 2020

Starting from Geth 1.9.x there is a new feature called the ancient folder.

According to https://github.com/ethereum/go-ethereum/wiki/command-line-options, you can specify the flag:

--datadir.ancient value Data directory for ancient chain segments (default = inside chaindata)

This folder contains infrequently accessed data; the details are quoted below:

Freezer basics: By default Geth will place your freezer inside your chaindata folder, into the ancient subfolder. The reason for using a sub-folder was to avoid breaking any automated tooling that might be moving the database around or across instances. You can explicitly place the freezer in a different location via the --datadir.ancient CLI flag.

When you update to v1.9.0 from an older version, Geth will automatically begin migrating blocks and receipts from the LevelDB database into the freezer. If you haven't specified --datadir.ancient at that time, but would like to move it later, you will need to copy the existing ancient folder manually and then start Geth with --datadir.ancient set to the correct path.

Reference
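
For illustration, a hedged example of splitting the database across devices with this flag (the mount points below are placeholders; the hot state database stays on the SSD while the freezer lives on the HDD; stop the node before moving anything):

mv /ssd/ethereum/geth/chaindata/ancient /hdd/ethereum/ancient    # relocate an existing freezer, if one already sits inside chaindata
geth --datadir /ssd/ethereum --datadir.ancient /hdd/ethereum/ancient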

If you do not want to spend time syncing an Ethereum node with --syncmode full, you can download the whole chaindata folder from https://www.chaindata.club and continue syncing the node.

@david-drinn

david-drinn commented Jan 21, 2021

I was led here for the copydb help. Just for future reference for people landing here, the copydb interface has apparently also been updated with the ancient folder feature. You have to specify the ancient folder as the second argument. For example:

geth --datadir=/my/temp/datadir copydb --cache=512 /my/main/datadir/geth/chaindata/ \
    /my/main/datadir/geth/chaindata/ancient/

I was assuming that it would search for the ancient folder inside the chaindata folder if it wasn't specified, so I was hung up on this for a minute. #22203

@dorianhenning

Hi @oleksiikoshlatyi, thanks very much for your explanation!
I was wondering: with the new --datadir.ancient flag pointing to a location on an HDD, but --datadir being on the SSD, is synchronisation still fast enough?

I'm referring to @karalabe's reply here, where he points out that syncing the chain on an HDD is extremely slow, if not impossible.

@MysticRyuujin
Contributor

@dorianhenning - Yes, it's fast enough
