Provide table dataframe hashing in public API?

In https://github.com/audeering/audformat/pull/419 we introduced storing tables as parquet files and created a custom hashing method for them, because `audformat.utils.hash()` does not consider column names and does not provide consistent hashes across different `pandas` versions.

The hashing steps, which could be moved into a public function, are:

https://github.com/audeering/audformat/blob/c1328079be136a1801759891d851ca03953f4dad/audformat/core/table.py#L1137-L1142

The only downside might be that we already have `audformat.utils.hash()`, which currently supports index, series, and dataframe objects. In our own tools, we only use it to hash an index. So the question is how we should name the new hash function, and how it should be positioned with regard to `audformat.utils.hash()`.

	table = pa.Table.from_pandas(self.df.reset_index(), preserve_index=False)

	# Create hash of table
	table_hash = hashlib.md5()
	table_hash.update(_schema_hash(table))
	table_hash.update(_dataframe_hash(self.df))
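
To make the discussion more concrete, here is a rough sketch of what a public function combining a schema hash and a data hash could look like. The name `hash_table()` is hypothetical, and the body is not the code from `audformat/core/table.py`: it derives the schema part from `pandas` column names and dtypes instead of a `pyarrow` schema, and it hashes the string form of the column values. Whether this gives stable hashes across `pandas` versions would still need to be tested.

```python
import hashlib

import pandas as pd


def hash_table(df: pd.DataFrame) -> str:
    """Hypothetical sketch: MD5 hash over column names, dtypes, and values."""
    # Move the index into regular columns,
    # so that index names and values are part of the hash
    flat = df.reset_index()

    # Schema part: column names and dtypes
    # (assumption: dtype strings stand in for the pyarrow schema)
    schema = ",".join(
        f"{name}:{series.dtype}" for name, series in flat.items()
    )
    schema_hash = hashlib.md5(schema.encode("utf-8")).digest()

    # Data part: string form of the values of every column
    data_md5 = hashlib.md5()
    for _, series in flat.items():
        data_md5.update(str(series.tolist()).encode("utf-8"))

    # Combine both parts into a single hash
    table_hash = hashlib.md5()
    table_hash.update(schema_hash)
    table_hash.update(data_md5.digest())
    return table_hash.hexdigest()


df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
renamed = df.rename(columns={"a": "c"})
print(hash_table(df) == hash_table(df.copy()))  # same content, same hash
print(hash_table(df) == hash_table(renamed))  # column name differs
```

With a function like this, renaming a column changes the hash, which is the behavior we are missing in `audformat.utils.hash()`.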

Provide table dataframe hashing in public API? #447
