Skip to content

Conversation

maxnoe
Copy link
Member

@maxnoe maxnoe commented Oct 7, 2025

Fixes #2839

@maxnoe maxnoe force-pushed the remove-index-tables branch from da5ff99 to e6b629d Compare October 7, 2025 15:11
Copy link

ctao-sonarqube bot commented Oct 7, 2025

Quality Gate passed Quality Gate passed

Issues
0 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarQube

@maxnoe maxnoe requested review from TjarkMiener and kosack October 8, 2025 14:52
@maxnoe maxnoe added this to the v0.27.0 milestone Oct 9, 2025
@kosack
Copy link
Member

kosack commented Oct 9, 2025

That is strange, since that is part of pytables, so shouldn't cause any corruption - I wonder if there is an upstream bug? Nominally adding the index tables should increase the speed of doing merges, since it basically pre-computes the event locations. However as we now do the merging with astropy.table and not via the HDF5 interface directly, I'm not sure it has any effect for us, though I didn't check. Did anyone look to see if removing this has a performance hit? It was added specifially to speed up io.

My question is more: should we just turn it off by default until we see if there is an upstream bug, before we just delete this feature?

There was no test for this, since I wasn't sure really how to test it. However, we also don'thave a test for the compression settings we use, etc, and those were determined from a set of IO studies I did a long time ago, when the HDF5TableWriter was first developed. Adding the index tables did show some increase in read speed in these early tests, but as I said, maybe with the way we read data now, it's moot.

@maxnoe
Copy link
Member Author

maxnoe commented Oct 9, 2025

It's already turned off by default, and it wasn't tested at all.

You can see here that I completely removed it and didn't modify any tests.

We also didn't activate it e.g. for grid productions.

@maxnoe
Copy link
Member Author

maxnoe commented Oct 9, 2025

I also don't see how it would speed up merges using tables, as that doesn't perform any joins or similar on columns, it just appends data

@maxnoe
Copy link
Member Author

maxnoe commented Oct 9, 2025

And yes, I suspect an upstream bug and will try to produce a minimal example for a report

@kosack
Copy link
Member

kosack commented Oct 10, 2025

Ok, fine to remove it then -I did check and there is no real need since we do the merging in-memory with astropy.table, so these indices are never used. I think pytables doesn't even have a join operation, which is a bit odd, since that is where these would be useful...

@kosack kosack merged commit fd1fb78 into main Oct 10, 2025
13 checks passed
@kosack kosack deleted the remove-index-tables branch October 10, 2025 09:15
@maxnoe
Copy link
Member Author

maxnoe commented Oct 10, 2025

I prefer to remove now and eventually add it back later once we understood why the corruption is happening and if these indices are actually useful

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Event IDs duplicated/mismatched between tables when writing simtel to HDF5 with ctapipe-processor

3 participants