Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Persistence matrix module #669

Merged
merged 156 commits into from
Jun 20, 2024
Merged

Conversation

hschreiber
Copy link
Collaborator

The matrix module is not finished yet, but I would like to have feedback on the structure, in particular on the way the different options are handled.
The options are listed in options.h and they are processed in matrix.h.
The basic functions of a matrix can also be found in matrix.h, and the more specialized functions are in their respective header (*_pairing, *_rep_cycles, *_vine_swap).

There are three different types of matrices:

  • the basic boundary matrix which can be reduced to R,
  • the boundary matrix but decomposed in RU (R is the usual reduced boundary matrix and U is such that R times U is the original boundary matrix)
  • and the matrix representing the base of a chain complex (as defined in Clement's and Steve's paper about zigzag persistence).

In example/comp_test.cpp is a temporary example of how to call all different options implemented yet. The idea was to call all possible functions with trivial parameters to verify that everything compiles correctly, but not to test the functions.

Copy link
Member

@mglisse mglisse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started looking at this PR. I don't have a global view of the design yet (there are many files!), but here are minor details I noted while looking at the code.

src/Persistence_matrix/include/gudhi/utilities/Z2_field.h Outdated Show resolved Hide resolved
src/Persistence_matrix/include/gudhi/utilities/Zp_field.h Outdated Show resolved Hide resolved
}

template<unsigned int characteristic>
inline void Zp_field_element<characteristic>::_multiply(unsigned int v)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't look very efficient. Good thing it is a negligible part of the computation anyway...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the most efficient version I could find that computes the modulo multiplication without any overflow. But I agree that this class will probably never be used with such high numbers that an overflow is possible, so it is kind of useless. I will test at some point if this creates a real overhead or not.

src/Persistence_matrix/include/gudhi/utilities/Zp_field.h Outdated Show resolved Hide resolved
src/Persistence_matrix/include/gudhi/utilities/Zp_field.h Outdated Show resolved Hide resolved
src/Persistence_matrix/include/gudhi/matrix.h Outdated Show resolved Hide resolved
src/Persistence_matrix/include/gudhi/matrix.h Outdated Show resolved Hide resolved
@VincentRouvreau VincentRouvreau merged commit e4419b6 into GUDHI:master Jun 20, 2024
6 of 7 checks passed
Comment on lines +348 to +351
* @warning As the member is static, they can eventually be problems if the matrix is duplicated in several threads.
* If this become necessary, the static should be removed in the future.
*/
inline static Simple_object_pool<Column_type> columnPool_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like a big problem if we cannot even have 2 threads that both compute persistence on completely independent filtrations.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I forgot about that. I think, we can just remove the static from it and it should work just fine as it is only used inside the compressed matrix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC you have a comment about that elsewhere in the file (near swap maybe?) saying that it may be a problem.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but it is not that much of a problem. It was just to remind me that if I remove the static, I should also swap the pools in swap as the pointers won't belong to the right pool otherwise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the right thing will happen if

  • I swap 2 non-empty matrices
  • I swap columns that don't belong to the same matrix

? Good then.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If two non empty matrices are swapped, you just have to be careful that the matrices own the right pool at the end. If two columns from different matrices are swapped, their content are swapped but not their addresses, so it is also fine. Were problem can arise when swapping two columns from different matrices, is at the row access, as the cells in the rows will not be swapped (you should first unlink a column and relink it after swap). But this has nothing to do with this pool. Nevertheless, I realized that I never mentioned this in the doc. I could also just add the linking/unlinking in the swap operator.

* @ref set_characteristic before calling for the first time a method needing it.
* Ignored if @ref PersistenceMatrixOptions::is_z2 is true.
*/
Matrix(unsigned int numberOfColumns,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should all the unsigned int used for columns be index_type or some related typedef?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to use a native type here to indicate that numberOfColumns is not something stored (and therefore doesn't really need to be optimized in size). And the chances are not very high that someone wants to use more than max unsigned int columns. Otherwise, index would be the right type to use.
I am more bothered by the fact that I use unsigned int in one constructor and int in another. This makes no sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to use a native type here to indicate that numberOfColumns is not something stored (and therefore doesn't really need to be optimized in size).

As a user, that's not the message I understand when I read unsigned int, almost the opposite. std::size_t or std::intmax_t (or maybe ptrdiff_t) would be closer.

the chances are not very high that someone wants to use more than max unsigned int columns.

"640K [of RAM] ought to be enough for anybody" 😉
A few years after starting Gudhi, we received a bug report because the code failed on a simplicial complex with 3 billion simplices. Sure, that required an awful lot of memory to reproduce, but I don't see a good reason to hardcode a limit of 4 billions (the default options can still limit the computation to 4G, that's fine).

Comment on lines +1580 to +1584
if constexpr (isNonBasic && !PersistenceMatrixOptions::is_of_boundary_type &&
PersistenceMatrixOptions::column_indexation_type == Column_indexation_types::CONTAINER)
return matrix_.insert_boundary(boundary, dim);
else
matrix_.insert_boundary(boundary, dim);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If matrix_.insert_boundary(boundary, dim); returns void exactly in the right cases, we could just write return matrix_.insert_boundary(boundary, dim); without need for 2 cases.
(if sometimes matrix_.insert_boundary(boundary, dim); returns a vector but we don't want to return that vector, then forget this comment)

Comment on lines +486 to +490
using tmp_column_type = typename std::conditional<
Master_matrix::Option_list::is_z2,
std::set<id_index>,
std::map<id_index, Field_element_type>
>::type;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is surprising that this is not one of the types defined in Persistence_matrix/columns.

*
* @brief List of column types.
*/
enum Column_types {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for a user to use their own column type? There is a concept of what such a column type must look like, but from the use of this enum, it doesn't look like a user has any way of telling matrix to use it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.10.0 GUDHI version 3.10.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants