The following are some examples of how to use the qad
package to train and test quantum models, and to reproduce results from the paper.
The data used for training and testing all the quantum machine learning models is published on Zenodo.
The training and testing of the unsupervised kernel machine is
accomplished using the
. The
configuration parameters of the model, e.g., quantum or classical
version, feature map, number of training samples, backend used for the
quantum computation, etc., are defined through the arguments of the
scripts. For instance, to train the model:
python --sig_path /path/to/signal/data --bkg_path /path/to/background/data --test_bkg_path /path/to/test_background/data --unsup --nqubits 8 --feature_map u_dense_encoding --run_type ideal --output_folder quantum_test --nu_param 0.01 --ntrain 600 --quantum
To test the saved model:
python --sig_path /path/to/signal/data --bkg_path /path/to/background/data --test_bkg_path /path/to/test_background/data --model trained_qsvms/quantum_test_nu\=0.01_ideal/
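To illustrate the kind of object a quantum kernel machine works with, the sketch below builds a fidelity-type kernel, k(x, y) = |<phi(x)|phi(y)>|^2, for a toy product feature map in which each feature is loaded with a single RY rotation on its own qubit. This toy map is an assumption made purely for illustration: it is not the u_dense_encoding feature map of the package, and the names toy_fidelity_kernel and kernel_matrix are hypothetical.

```python
import math


def toy_fidelity_kernel(x, y):
    """Fidelity kernel |<phi(x)|phi(y)>|^2 for a toy product encoding.

    Assumption: feature x_j is encoded as RY(x_j)|0> on qubit j, so the
    single-qubit overlap is cos((x_j - y_j) / 2) and the full overlap is
    the product over all qubits.
    """
    overlap = 1.0
    for xj, yj in zip(x, y):
        overlap *= math.cos((xj - yj) / 2.0)
    return overlap ** 2


def kernel_matrix(samples_a, samples_b):
    """Kernel matrix with entry [i][j] = k(samples_a[i], samples_b[j])."""
    return [[toy_fidelity_kernel(a, b) for b in samples_b] for a in samples_a]


if __name__ == "__main__":
    train = [[0.1, 0.2], [1.0, 2.0], [0.5, 0.4]]
    for row in kernel_matrix(train, train):
        print(["%.3f" % value for value in row])
```

In the package itself the kernel is evaluated with the feature map and backend selected through the script arguments; the closed-form product above only holds for this unentangled toy encoding.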
For a small-scale demo that can be run on a normal personal computer in a reasonable amount of time (5-10 minutes), consider using ntrain
on the order of 50 to 200 data points for the
script, and ntest
at around 1000 to 10000 data points for the
After the unsupervised quantum and classical kernel machines have been trained and the test scores have been saved, their performance can be summarised with a ROC curve plot. First, following our convention, the test scores are prepared for plotting using scripts/kernel_machines/scripts/
, and by running
python --classical_folder trained_qsvms/c_test_nu\=0.01/ --quantum_folder trained_qsvms/q_test_nu\=0.01_ideal/ --out_path test_plot --name_suffix n<n_test>_k<k_folds>
Then, we load the score values from the saved files using our convention, e.g., for the case of three different signals with eight latent dimensions, 600 training datapoints, 100k testing datapoints, and k=5 folds:
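import h5py

n_folds = 5
latent_dim = '8'
n_samples_train = '600'  # number of training datapoints, as stated above
read_dir = '/path/to/prepared/scores'  # placeholder: folder with the prepared score files
mass = ['35', '35', '15']
br_na = ['NA', '', 'BR']  # narrow (NA) or broad (BR)
signal_name = ['RSGraviton_WW', 'AtoHZ_to_ZZZ', 'RSGraviton_WW']
ntest = ['100', '100', '100']  # in thousands of testing datapoints
q_loss_qcd = []; q_loss_sig = []; c_loss_qcd = []; c_loss_sig = []
for i in range(len(signal_name)):
    # Both string pieces need the f prefix, otherwise the second one is not interpolated.
    with h5py.File(f'{read_dir}/Latent_{latent_dim}_trainsize_{n_samples_train}_{signal_name[i]}'
                   f'{mass[i]}{br_na[i]}_n{ntest[i]}k_kfold{n_folds}.h5', 'r') as file:
        # Append the scores stored in `file` to the four lists above.
        ...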
The final ROC plot, as it appears in Fig. 3 of the paper, can be obtained with:
colors = ['forestgreen', '#EC4E20', 'darkorchid']
legend_signal_names=['Narrow 'r'G $\to$ WW 3.5 TeV', r'A $\to$ HZ $\to$ ZZZ 3.5 TeV', 'Broad 'r'G $\to$ WW 1.5 TeV']
pl.plot_ROC_kfold_mean(q_loss_qcd, q_loss_sig, c_loss_qcd, c_loss_sig, legend_signal_names, n_folds,
                       legend_title=r'Anomaly signature', save_dir='../jupyter_plots', pic_id='test',
                       palette=colors, xlabel=r'$TPR$', ylabel=r'$FPR^{-1}$')
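For readers who want to see what the two axes of such a plot contain, the following is a minimal sketch of how TPR and FPR^{-1} can be computed from two lists of test scores. It assumes that a higher score means "more anomalous", which may differ from the sign convention of the saved scores, and roc_points is a hypothetical helper, not a function of the qad package.

```python
def roc_points(bkg_scores, sig_scores):
    """Return (tpr, inv_fpr), one entry per threshold with non-zero FPR.

    Assumption: an event is flagged as anomalous when score >= threshold.
    Thresholds are the distinct observed scores, scanned from high to low.
    """
    thresholds = sorted(set(bkg_scores) | set(sig_scores), reverse=True)
    tpr, inv_fpr = [], []
    for threshold in thresholds:
        true_pos = sum(score >= threshold for score in sig_scores)
        false_pos = sum(score >= threshold for score in bkg_scores)
        if false_pos == 0:
            continue  # 1/FPR is undefined when no background event passes
        tpr.append(true_pos / len(sig_scores))
        inv_fpr.append(len(bkg_scores) / false_pos)
    return tpr, inv_fpr


if __name__ == "__main__":
    background = [0.1, 0.2, 0.3, 0.4]
    signal = [0.35, 0.5, 0.6, 0.7]
    for t, inv_f in zip(*roc_points(background, signal)):
        print(f"TPR = {t:.2f}, 1/FPR = {inv_f:.2f}")
```

This scan is quadratic in the number of events, which is fine for a demonstration but too slow for 100k test points; a sorted cumulative count would be the natural replacement there.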
Example for the unsupervised kernel machine performance on different anomalies:
The metrics are calculated via sampling the circuit parameters from
three different distributions as depicted in the legends: the uniform
distribution in [0,2π], the QCD background data distribution, and the
signal (anomaly) scalar boson data distribution. (a) The expressibility
(Expr) as a function of the different circuit architectures. (b) The
entanglement capability of the data encoding circuit
Given a data encoding quantum circuit we can compute its expressibility
and entanglement capability. Additionally, we can also compute, as a
function of the number of qubits, the variance of the quantum kernel
that is constructed from the given quantum circuit. The different
properties of the quantum feature map and the corresponding quantum
kernel can be computed using the script
. The
desired computation can be chosen through the argparse arguments of the script.
For instance, to compute the expressibility and entanglement capability of the circuits discussed in the paper run:
python --n_shots 10000 --n_exp 20 --out_path test --compute expr_ent_vs_circ
where n_shots
defines the number of fidelity samples to generate per
expressibility and entanglement capability evaluation, and n_exp
is the
number of evaluations ('experiments') of the expressibility and
entanglement capability needed to estimate the mean and standard deviation around
the true value. For more details, please check the
repo of the triple_e
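As a rough illustration of what a single expressibility evaluation involves, the sketch below assumes the commonly used definition: the KL divergence between the histogram of n_shots sampled state fidelities and the fidelity distribution of Haar-random states, P(F) = (N-1)(1-F)^(N-2) with N = 2^n_qubits. Whether triple_e uses exactly this binning and estimator is an assumption, and the helper names are hypothetical.

```python
import math


def haar_bin_probs(n_qubits, n_bins):
    """Probability of each fidelity bin under the Haar distribution.

    Integrating P(F) = (N-1)(1-F)^(N-2) over a bin [a, b] gives
    (1-a)^(N-1) - (1-b)^(N-1), with N = 2**n_qubits.
    """
    dim = 2 ** n_qubits
    edges = [i / n_bins for i in range(n_bins + 1)]
    return [(1 - a) ** (dim - 1) - (1 - b) ** (dim - 1)
            for a, b in zip(edges, edges[1:])]


def expressibility_kl(fidelities, n_qubits, n_bins=50):
    """KL divergence D(P_circuit || P_Haar) estimated from fidelity samples."""
    counts = [0] * n_bins
    for fidelity in fidelities:
        index = min(int(fidelity * n_bins), n_bins - 1)  # F = 1 goes in the last bin
        counts[index] += 1
    total = len(fidelities)
    kl = 0.0
    for count, haar_prob in zip(counts, haar_bin_probs(n_qubits, n_bins)):
        if count > 0:
            p = count / total
            # Floor the Haar probability: it underflows to 0 for many qubits.
            kl += p * math.log(p / max(haar_prob, 1e-300))
    return kl


if __name__ == "__main__":
    # Evenly spread fidelities mimic a Haar-random single-qubit ensemble.
    uniform_like = [(i + 0.5) / 1000 for i in range(1000)]
    print(expressibility_kl(uniform_like, n_qubits=1))
```

A smaller value means the sampled states are closer to Haar-random. Repeating such an evaluation n_exp times on fresh fidelity samples yields the mean and standard deviation described above.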
To compute the expressibility as a function of the number of qubits in a data-dependent setting (i.e., sampling the circuit parameters from a data distribution instead of the uniform distribution in [0,2π]), run:
python --n_qubits 8 --n_shots 100000 --n_exp 20 --out_path test --compute expr_vs_qubits --data_path dataset1_path dataset2_path dataset3_path --data_dependent
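The entanglement capability of a circuit is commonly quantified with the Meyer-Wallach measure Q, averaged over sampled circuit parameters. The sketch below computes Q = 2 (1 - mean_k Tr(rho_k^2)) for a single statevector, where rho_k is the reduced density matrix of qubit k. It assumes this definition, which the package may implement differently in its details, and meyer_wallach is a hypothetical helper name.

```python
import math


def meyer_wallach(state):
    """Meyer-Wallach entanglement measure of a normalized statevector.

    `state` is a list of 2**n amplitudes; qubit k corresponds to bit k of
    the basis-state index. Returns 0 for product states and 1 for, e.g.,
    Bell or GHZ states.
    """
    n_qubits = int(round(math.log2(len(state))))
    purity_sum = 0.0
    for k in range(n_qubits):
        mask = 1 << k
        rho00, rho11, rho01 = 0.0, 0.0, 0j
        for index, amplitude in enumerate(state):
            if index & mask:
                continue  # visit each (bit k = 0, bit k = 1) pair once
            partner = state[index | mask]
            rho00 += abs(amplitude) ** 2
            rho11 += abs(partner) ** 2
            rho01 += amplitude * partner.conjugate()
        # Tr(rho_k^2) for a 2x2 Hermitian matrix
        purity_sum += rho00 ** 2 + rho11 ** 2 + 2 * abs(rho01) ** 2
    return 2.0 * (1.0 - purity_sum / n_qubits)


if __name__ == "__main__":
    bell = [1 / math.sqrt(2), 0.0, 0.0, 1 / math.sqrt(2)]
    product = [1.0, 0.0, 0.0, 0.0]
    print(meyer_wallach(bell), meyer_wallach(product))
```

Averaging this quantity over states prepared with parameters drawn from the uniform distribution, or from a data distribution in the data-dependent setting, gives one estimate of the entanglement capability.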