Closed
Labels: question (Further information is requested)
Description
In #419 we introduced storing tables as parquet files and created a custom hashing method for them, because audformat.utils.hash() does not consider column names and does not provide consistent hashes across different pandas versions.
The hashing steps, which could be moved into a public function, are:
audformat/audformat/core/table.py
Lines 1137 to 1142 in c132807
    table = pa.Table.from_pandas(self.df.reset_index(), preserve_index=False)
    # Create hash of table
    table_hash = hashlib.md5()
    table_hash.update(_schema_hash(table))
    table_hash.update(_dataframe_hash(self.df))
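To make the discussion concrete, here is a rough sketch of what a public function with the same two-part structure (schema first, then data) might look like. It is not the code from #419: the name `table_hash` is hypothetical, the schema part is approximated with column names and pandas dtype strings rather than a pyarrow schema, and the data part uses `pandas.util.hash_pandas_object()`, so it does not address the cross-version consistency issue mentioned above.

```python
import hashlib

import pandas as pd


def table_hash(df: pd.DataFrame) -> str:
    """Hypothetical public hash covering column names, dtypes, and values."""
    # Move the index into regular columns,
    # so that index level names and values are part of the hash
    flat = df.reset_index()
    md5 = hashlib.md5()
    # Schema part (assumption: names and dtype strings stand in
    # for the pyarrow schema used in the excerpt above)
    schema = ",".join(f"{name}:{dtype}" for name, dtype in flat.dtypes.items())
    md5.update(schema.encode("utf-8"))
    # Data part: one uint64 hash per row, fed to md5 as raw bytes
    row_hashes = pd.util.hash_pandas_object(flat, index=False)
    md5.update(row_hashes.to_numpy().tobytes())
    return md5.hexdigest()


df = pd.DataFrame(
    {"speaker": ["a", "b"]},
    index=pd.Index(["f1.wav", "f2.wav"], name="file"),
)
renamed = df.rename(columns={"speaker": "gender"})
# Same values, different column name: the two hashes differ
print(table_hash(df) != table_hash(renamed))
```

Because the schema string is hashed before the data, renaming a column changes the result even when all values stay the same, which is the behaviour audformat.utils.hash() lacks according to the description above.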
The only downside is that we already have audformat.utils.hash(), which currently supports index, series, and dataframe objects. In our own tools, we only use it to hash an index. So the question is what to name the new hash function and how to position it relative to audformat.utils.hash().