Open
Description
I'm archiving this data:
std::vector<std::shared_ptr<Lemma>> lemmas;
std::map<std::string, std::vector<std::shared_ptr<Form>>> forms;
The lemmas vector and the forms map each hold ~100,000 entries. The problem is that deserializing from portable binary takes 30 secs on my Core i5!
std::ifstream is("dict.cereal", std::ios::binary);
cereal::PortableBinaryInputArchive iarchive(is); // Create an input archive
iarchive(lemmas, forms);
Is that normal or what?
Activity
AzothAmmo commented on Jun 15, 2017
What is the structure of Lemma and Form? If you can't post actual code, can you just describe what they are serializing (and also the sizeof)? Are you using polymorphism?
I'll try and see if I can reproduce this. Our binary serialization should be very fast.
temehi commented on Sep 7, 2017
I am also experiencing a similar problem.
I have the following data to serialize:
my_map contains about 8 billion elements, and the binary file saved is around 33 GB on disk. When I deserialize it, it takes about 2500 secs. Isn't that a bit slow?
erichkeane commented on Sep 7, 2017
I would say that depends. Loading that much data into memory is going to be time consuming either way. Having to send that to swap is going to be quite time consuming.
Additionally, with that much data, the unordered_map is going to be re-indexing near-constantly. With that much data indexed by a uint64_t, you are likely better off choosing a different data structure (depending on your distribution of keys).
temehi commented on Sep 7, 2017
Thanks for your reply.
No need to send to swap; for my particular problem, having enough memory is not an issue.
One way to avoid re-indexing/hashing is to call the reserve(size_type count) function on the unordered_map object. If I do that, the loading time goes down to ~1000 secs.
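To illustrate the effect of reserve described above, here is a minimal sketch using only the standard library. The helper name count_rehashes and the uint64_t value type are assumptions made for illustration; they are not taken from the code in this thread. It counts how often the bucket array is rebuilt while n keys are inserted, with and without an up-front reserve.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical helper: inserts n sequential keys and counts how many times
// the bucket count changes, i.e. how many rehashes the inserts triggered.
std::size_t count_rehashes(std::size_t n, bool reserve_first) {
    std::unordered_map<std::uint64_t, std::uint64_t> m;
    if (reserve_first) {
        m.reserve(n);  // allocate enough buckets for n elements up front
    }
    std::size_t rehashes = 0;
    std::size_t buckets = m.bucket_count();
    for (std::uint64_t k = 0; k < n; ++k) {
        m.emplace(k, k);
        if (m.bucket_count() != buckets) {
            ++rehashes;
            buckets = m.bucket_count();
        }
    }
    assert(m.size() == n);
    return rehashes;
}
```

With reserve_first set to true, the count is zero for any n, because reserve(n) guarantees room for n elements without exceeding the maximum load factor. Without it, the exact count depends on the standard library's growth policy.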
erichkeane commented on Sep 7, 2017
Well, ifstream seems to do an additional copy as a part of it as well, so you're copying the data at least 2x. Perhaps consider using something like boost::iostreams::mapped_file. That'll probably save you another few hundred seconds.
Additionally, are you compiling with optimizations on? The cereal code is pretty template heavy, so it benefits extremely well from higher optimization levels. Particularly setting things like -march=native (if that is acceptable).
AzothAmmo commented on Sep 7, 2017
We can definitely add a call to reserve for unordered_map loads.
Rinkss commented on Jul 31, 2018
Serialization and deserialization of map<int, vector > is very slow. I am passing this as an object to the archives. It takes around 3 secs to deserialize a 113 MB file.
Rinkss commented on Jul 31, 2018
A similar problem also arises when I try using map<int,string>.
Batodalaev commented on Jul 14, 2024
Can we add
map.reserve(size);
here - https://github.com/USCiLab/cereal/blob/master/include/cereal/types/concepts/pair_associative_container.hpp#L56 ?
UPD. also:
set.reserve(size);
https://github.com/USCiLab/cereal/blob/master/include/cereal/types/set.hpp#L58
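As a rough sketch of the idea, not cereal's actual implementation: a loader that reads the element count first can reserve before inserting. The format here (a count followed by raw key/value pairs in host byte order) and the names save_map, load_map and round_trip are assumptions made for illustration. Note that reserve exists only on the unordered containers; std::map and std::set have no such member, so the sketch uses std::unordered_map.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <unordered_map>

using Map = std::unordered_map<std::uint64_t, std::uint64_t>;

// Hypothetical format: element count, then raw key/value pairs.
void save_map(std::ostream& os, const Map& m) {
    const std::uint64_t n = m.size();
    os.write(reinterpret_cast<const char*>(&n), sizeof n);
    for (const auto& kv : m) {
        os.write(reinterpret_cast<const char*>(&kv.first), sizeof kv.first);
        os.write(reinterpret_cast<const char*>(&kv.second), sizeof kv.second);
    }
}

// Reads the count, reserves once, then inserts without further rehashing.
// A real loader should validate n before trusting it with reserve.
bool load_map(std::istream& is, Map& m) {
    std::uint64_t n = 0;
    if (!is.read(reinterpret_cast<char*>(&n), sizeof n)) return false;
    m.clear();
    m.reserve(static_cast<Map::size_type>(n));
    for (std::uint64_t i = 0; i < n; ++i) {
        std::uint64_t k = 0, v = 0;
        if (!is.read(reinterpret_cast<char*>(&k), sizeof k)) return false;
        if (!is.read(reinterpret_cast<char*>(&v), sizeof v)) return false;
        m.emplace(k, v);
    }
    return true;
}

// Saves to an in-memory stream and loads the result back.
Map round_trip(const Map& in) {
    std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
    save_map(ss, in);
    Map out;
    const bool ok = load_map(ss, out);
    assert(ok);
    (void)ok;  // silence unused-variable warnings when NDEBUG is defined
    return out;
}
```

Calling round_trip on a small map should return an equal map; the only difference from a naive loader is the single reserve call made before the insert loop.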