
Added Entanglement Concentration Dataset for 3 and 4 qubits for Benchmarking Binary Classifiers #915


Open: wants to merge 11 commits into base: main


RishiNandha
Contributor

Summary

Classification dataset for 3 and 4 qubits based on the concentration of entanglement (CE) in quantum states. Two pre-trained circuits are used to generate states with a given amount of CE. Users can use this dataset to benchmark their binary classification pipelines.
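For readers unfamiliar with CE, here is a minimal NumPy sketch of how it can be computed for a pure n-qubit state. It assumes that "CE" here is the concentratable entanglement of Beckey et al., C(ψ) = 1 − 2^(−n) Σ_α Tr[ρ_α²], where α ranges over all subsets of the qubits (including the empty and full sets). We understand this to be the measure the NTangled datasets are built around. The function name is hypothetical and not part of the PR.

```python
import itertools

import numpy as np


def concentratable_entanglement(state: np.ndarray) -> float:
    """Hypothetical helper: CE of a pure state given as a 2**n amplitude vector."""
    psi = np.asarray(state, dtype=complex)
    n = int(round(np.log2(psi.size)))
    if 2**n != psi.size:
        raise ValueError("state length must be a power of 2")
    psi = psi / np.linalg.norm(psi)
    tensor = psi.reshape((2,) * n)  # one axis per qubit

    total_purity = 0.0
    # Sum Tr[rho_alpha^2] over every subset alpha of the n qubits.
    for k in range(n + 1):
        for alpha in itertools.combinations(range(n), k):
            rest = [q for q in range(n) if q not in alpha]
            # Group the alpha axes as rows and the remaining axes as columns.
            m = np.transpose(tensor, list(alpha) + rest).reshape(2**k, -1)
            rho = m @ m.conj().T  # reduced density matrix on alpha
            total_purity += float(np.real(np.trace(rho @ rho)))
    return 1.0 - total_purity / 2**n


ghz = np.zeros(8)
ghz[[0, 7]] = 1 / np.sqrt(2)
print(concentratable_entanglement(ghz))
```

Product states give 0. The 3-qubit GHZ state gives 3/8, and the 3-qubit W state gives 1/3. The sum runs over all 2^n subsets, which is cheap for the 3- and 4-qubit cases targeted here.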

Pre-trained weights courtesy of https://github.com/LSchatzki/NTangled_Datasets. The CE values reported in that repository showed a mismatch for 8 qubits, so we've left other qubit counts for future development.

I've verified that the mypy, spell, lint and black checks pass. For some reason, make html breaks the other modules; I need some help resolving that.

Details and comments

  • The order and default values of the parameters match the existing ad hoc data generator for consistency. As a result, an equal number of data points is generated for each class.
  • There are two sampling options for the input states fed to the circuit before it acts on them: each qubit's state can be set to one of the axes of the Bloch sphere ("cardinal"), or the states can be sampled randomly ("isotropic").
  • Each qubit count has an easy and a hard mode. In easy mode the CE values of the two classes are further apart than in hard mode. This is meant to make benchmarking of pipelines more standardized: easy mode can be used to verify that an algorithm works, while hard mode can be used to test the best performance the algorithm can achieve.
  • There are two formatting options: x_train and x_test can be returned either as a NumPy array or as a list of quantum states.
  • Reference supporting the relevance of such a dataset: https://arxiv.org/abs/2109.03400. The authors show that QCNNs can learn effectively from these datasets.
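The two sampling options above can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the PR's implementation. "cardinal" picks each qubit's state uniformly from the six Bloch-axis states |0⟩, |1⟩, |+⟩, |−⟩, |+i⟩, |−i⟩. "isotropic" is assumed to draw each qubit uniformly over the Bloch sphere, using a normalized complex Gaussian 2-vector. Both options yield a product input state. The function name sample_input_state is hypothetical.

```python
import numpy as np

# The six single-qubit states along the +/-z, +/-x and +/-y axes of the Bloch sphere.
_CARDINAL_STATES = np.array(
    [[1, 0], [0, 1], [1, 1], [1, -1], [1, 1j], [1, -1j]], dtype=complex
)
_CARDINAL_STATES /= np.linalg.norm(_CARDINAL_STATES, axis=1, keepdims=True)


def sample_input_state(n: int, method: str, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical sampler: an n-qubit product state as a 2**n amplitude vector."""
    if method not in ("cardinal", "isotropic"):
        raise ValueError(f"unknown sampling method: {method!r}")
    state = np.array([1.0 + 0j])
    for _ in range(n):
        if method == "cardinal":
            # Assumption: each axis state is equally likely.
            qubit = _CARDINAL_STATES[rng.integers(len(_CARDINAL_STATES))]
        else:
            # A normalized complex Gaussian vector is uniform on the Bloch sphere.
            v = rng.normal(size=2) + 1j * rng.normal(size=2)
            qubit = v / np.linalg.norm(v)
        state = np.kron(state, qubit)  # append this qubit to the product state
    return state


rng = np.random.default_rng(seed=0)
print(sample_input_state(3, "cardinal", rng).round(3))
```

Cardinal inputs come from a small discrete set, which makes runs easy to reproduce. Isotropic inputs cover the input space more evenly.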

- Classification dataset for 3 and 4 qubits
- Pre-trained weights courtesy of https://github.com/LSchatzki/NTangled_Datasets
- The CE values reported in the above repository showed a mismatch for 8 qubits, so we've left other qubit counts for future development
- make html breaks for some reason; this needs fixing in upcoming commits

Co-Authored-By: Nishant Vasan <69106567+rockywick@users.noreply.github.com>
Co-Authored-By: rogue-infinity <116993419+rogue-infinity@users.noreply.github.com>
@RishiNandha
Contributor Author

Oh, the init file seems to have been missed. I'll recommit.

@RishiNandha
Contributor Author

I'm not sure why the Python 3.9 tests pass while the 3.11 and 3.12 ones fail. The CI run also seems to raise errors mostly in files I haven't touched. Any ideas about what might be happening?

@woodsp-ibm
Member

If you look at the Actions tab at the top of the page, i.e. here https://github.com/qiskit-community/qiskit-machine-learning/actions, you will see scheduled runs of Machine Learning Tests. These are the same tests, but run nightly so that any changes in dependencies that cause problems or failures are caught. The scheduled tests have been failing for a while. The main branch code needs updating in some way so that things pass again, for example by pinning an earlier version of a dependency or by changing the code to suit. Once that is done and merged with the code in your PR, only changes made by your PR could cause failures. While the base (i.e. main) is failing, that is not the case.

@woodsp-ibm
Member

The CI issues have been fixed, so I updated the branch (via the update button that was here). Any failures now should be down to this PR alone, unless it hits the known random failure tracked in issue #903.

@coveralls

coveralls commented May 14, 2025

Pull Request Test Coverage Report for Build 15622602924

Details

  • 95 of 119 (79.83%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.3%) to 90.542%

Changes Missing Coverage:
  • qiskit_machine_learning/datasets/entanglement_concentration.py: 93 of 117 changed/added lines covered (79.49%)

Totals:
  • Change from base Build 15622557464: -0.3%
  • Covered Lines: 4624
  • Relevant Lines: 5107

💛 - Coveralls

@edoaltamura added the labels "type: enhancement ✨" (Features or aspects to improve) and "short project" (A task amounting to a small project, but larger than a "good first issue") on Jun 12, 2025
Collaborator

@edoaltamura left a comment


I think this PR is nearly ready. I'd suggest not using .npy files, though, because decoding the binaries might require a specific version of NumPy, which will likely change in the future. Since the files are relatively small, we could convert the arrays in the .npy files to JSON or text files.
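A minimal sketch of the suggested conversion. It assumes the pre-trained weights are real-valued numeric arrays, since JSON has no complex type. The helper names npy_to_json and load_json_array are hypothetical. The dtype and shape are stored alongside the data so the array can be rebuilt exactly.

```python
import json
from pathlib import Path

import numpy as np


def npy_to_json(npy_path, json_path=None) -> Path:
    """Hypothetical helper: rewrite a numeric .npy file as portable JSON."""
    npy_path = Path(npy_path)
    json_path = Path(json_path) if json_path is not None else npy_path.with_suffix(".json")
    array = np.load(npy_path, allow_pickle=False)  # plain numeric arrays only
    if np.iscomplexobj(array):
        raise TypeError("complex arrays need real/imag parts stored separately")
    payload = {"dtype": str(array.dtype), "shape": list(array.shape), "data": array.tolist()}
    json_path.write_text(json.dumps(payload))
    return json_path


def load_json_array(json_path) -> np.ndarray:
    """Rebuild the array written by npy_to_json."""
    payload = json.loads(Path(json_path).read_text())
    return np.asarray(payload["data"], dtype=payload["dtype"]).reshape(payload["shape"])
```

Python's json module writes floats using their shortest round-trip representation, so float64 weights survive the conversion unchanged. The resulting files are also readable without NumPy.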

training_size: int,
test_size: int,
n: int,
mode: str = "easy",
Member

@woodsp-ibm Jul 11, 2025


My take: instead of suppressing the warning about too many positional arguments, add *, after the n parameter. That allows the first 3 arguments to be positional. The following arguments have defaults and need not be provided, but when they are, they must be passed as keyword arguments.
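A minimal sketch of the suggestion. The function name entanglement_concentration_data and the sampling parameter with its default are hypothetical placeholders; only training_size, test_size, n and mode appear in the excerpt above. The key part is the bare *. It makes every later parameter keyword-only, so the function has just three positional parameters. That should satisfy pylint's too-many-positional-arguments check, assuming that is the warning being suppressed.

```python
def entanglement_concentration_data(
    training_size: int,
    test_size: int,
    n: int,
    *,  # everything after this marker must be passed by keyword
    mode: str = "easy",
    sampling: str = "cardinal",  # hypothetical parameter name and default
) -> dict:
    """Hypothetical stub that only validates and echoes its arguments."""
    if n not in (3, 4):
        raise ValueError("only 3 and 4 qubits are supported")
    if mode not in ("easy", "hard"):
        raise ValueError(f"unknown mode: {mode!r}")
    if sampling not in ("cardinal", "isotropic"):
        raise ValueError(f"unknown sampling: {sampling!r}")
    return {
        "training_size": training_size,
        "test_size": test_size,
        "n": n,
        "mode": mode,
        "sampling": sampling,
    }


# Allowed: the first three arguments positionally, the rest by keyword.
settings = entanglement_concentration_data(20, 10, 3, mode="hard")

# Rejected: mode passed positionally raises TypeError.
try:
    entanglement_concentration_data(20, 10, 3, "hard")
except TypeError as err:
    print("TypeError:", err)
```

Besides silencing the warning, keyword-only options keep call sites readable, e.g. mode="hard" instead of a bare "hard". They also let later options be reordered without breaking callers.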


4 participants