CMS Open Data Dimuon Analysis - Student Lab Guide

Using Jupyter Notebook and CSV Data

Duration: 3 hours
Objective: Analyze real particle physics data from CERN to identify J/ψ, Υ, and Z boson resonances

Part 1: Setup (15 minutes)

Step 1: Install Required Packages

Open your terminal/command prompt and navigate to your working directory:

# Windows PowerShell
cd C:\Users\YourName\Downloads\physics_lab

# Mac/Linux
cd ~/Downloads/physics_lab

Activate your empty virtual environment, then install packages:

pip install jupyter numpy pandas matplotlib scipy

Step 2: Download the Dataset

Download the CSV file from CERN Open Data Portal:

Go to: https://opendata.cern.ch/record/5202
Download the CSV file (approximately 100,000 dimuon events)
Save it as DoubleMu.csv in your working directory

Step 3: Launch Jupyter Notebook

jupyter notebook

This will open Jupyter in your web browser. Create a new notebook called Dimuon_Analysis.ipynb.

Part 2: Data Loading and Exploration (20 minutes)

Cell 1: Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# Set plot style
plt.style.use('seaborn-v0_8-darkgrid')
%matplotlib inline

Run this cell (Shift + Enter).

Cell 2: Load the Data

# Load the CSV file
df = pd.read_csv("DoubleMu.csv")

print(f"Dataset loaded successfully!")
print(f"Total events: {len(df)}")
print(f"\nColumn names:")
print(df.columns.tolist())
print(f"\nFirst 5 events:")
df.head()

Expected output: You should see 100,000 events with columns including Run, Event, pt1, eta1, phi1, pt2, eta2, phi2, Q1, Q2, M.[2]

Cell 3: Understand the Data

# Display statistics
print("Data Summary:")
print(df[['pt1', 'pt2', 'eta1', 'eta2', 'M']].describe())

# Check for missing values
print(f"\nMissing values: {df.isnull().sum().sum()}")

Discussion Point:

pt = transverse momentum (GeV/c)
eta = pseudorapidity (direction)
phi = azimuthal angle (radians)
M = invariant mass (pre-calculated in CSV)

Part 3: Calculate Invariant Mass (30 minutes)

Cell 4: Extract Kinematic Variables

# Extract muon kinematic data
pt1 = df['pt1'].values
eta1 = df['eta1'].values
phi1 = df['phi1'].values

pt2 = df['pt2'].values
eta2 = df['eta2'].values
phi2 = df['phi2'].values

# Muon mass (PDG value)
mu_mass = 0.105658  # GeV/c²

print(f"Extracted kinematic data for {len(pt1)} events")

Cell 5: Convert to Cartesian Coordinates

# Muon 1: Convert (pt, eta, phi) → (px, py, pz)
px1 = pt1 * np.cos(phi1)
py1 = pt1 * np.sin(phi1)
pz1 = pt1 * np.sinh(eta1)
E1 = np.sqrt(px1**2 + py1**2 + pz1**2 + mu_mass**2)

# Muon 2: Convert (pt, eta, phi) → (px, py, pz)
px2 = pt2 * np.cos(phi2)
py2 = pt2 * np.sin(phi2)
pz2 = pt2 * np.sinh(eta2)
E2 = np.sqrt(px2**2 + py2**2 + pz2**2 + mu_mass**2)

print("Coordinate conversion complete")
print(f"Example - Event 0:")
print(f"  Muon 1: E={E1[0]:.3f} GeV, px={px1[0]:.3f}, py={py1[0]:.3f}, pz={pz1[0]:.3f}")
print(f"  Muon 2: E={E2[0]:.3f} GeV, px={px2[0]:.3f}, py={py2[0]:.3f}, pz={pz2[0]:.3f}")

Cell 6: Calculate Invariant Mass

# Sum four-vectors
E_sum = E1 + E2
Px_sum = px1 + px2
Py_sum = py1 + py2
Pz_sum = pz1 + pz2

# Calculate invariant mass: M² = E² - p²
mass2 = E_sum**2 - (Px_sum**2 + Py_sum**2 + Pz_sum**2)
inv_mass = np.sqrt(np.clip(mass2, 0, None))  # Ensure non-negative

print("Invariant mass calculation complete")
print(f"Mass range: {inv_mass.min():.2f} - {inv_mass.max():.2f} GeV/c²")
print(f"Mean mass: {inv_mass.mean():.2f} GeV/c²")
print(f"Median mass: {np.median(inv_mass):.2f} GeV/c²")

Discussion Point: The invariant mass formula $$ M^2 = E^2 - \vec{p}^2 $$ is Lorentz invariant - the same in all reference frames.

Part 4: Visualize the Full Spectrum (20 minutes)

Cell 7: Plot Full Mass Spectrum

plt.figure(figsize=(12, 6))
plt.hist(inv_mass, bins=200, range=(0, 12), histtype='step', 
         linewidth=1.5, color='blue', label='Data')

# Mark expected particle masses
plt.axvline(x=3.0969, color='red', linestyle='--', alpha=0.6, label='J/ψ (3.097 GeV)')
plt.axvline(x=9.4603, color='green', linestyle='--', alpha=0.6, label='Υ(1S) (9.460 GeV)')

plt.xlabel("Invariant mass (GeV/c²)", fontsize=13)
plt.ylabel("Events per bin", fontsize=13)
plt.title("Dimuon Invariant Mass Spectrum - CMS Open Data 2011", fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(alpha=0.3)
plt.tight_layout()
plt.savefig("full_spectrum.png", dpi=300)
plt.show()

Observation Questions:

Can you see the J/ψ peak around 3 GeV?
Can you see the Υ family peaks around 9-10 GeV?
Why does the background decrease at higher masses?

Part 5: Analyze J/ψ Meson (40 minutes)

Cell 8: Zoom into J/ψ Region

# Select events in J/ψ region
mask_jpsi = (inv_mass > 2.8) & (inv_mass < 3.4)
data_jpsi = inv_mass[mask_jpsi]

print(f"Events in J/ψ region (2.8-3.4 GeV): {len(data_jpsi)}")

# Plot
plt.figure(figsize=(10, 6))
plt.hist(data_jpsi, bins=60, range=(2.8, 3.4), histtype='step', 
         linewidth=1.5, color='blue', label='Data')
plt.axvline(x=3.0969, color='red', linestyle='--', alpha=0.6, label='PDG J/ψ mass')
plt.xlabel("Invariant mass (GeV/c²)", fontsize=13)
plt.ylabel("Events per bin", fontsize=13)
plt.title("J/ψ → μ⁺μ⁻ Resonance", fontsize=14, fontweight='bold')
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.savefig("jpsi_peak.png", dpi=300)
plt.show()

Cell 9: Fit J/ψ Peak with Gaussian

# Define fit function: Gaussian signal + linear background
def fit_func(x, amplitude, mean, sigma, bg_const, bg_slope):
    gaussian = amplitude * np.exp(-0.5 * ((x - mean) / sigma)**2)
    background = bg_const + bg_slope * x
    return gaussian + background

# Create histogram for fitting
hist, bin_edges = np.histogram(data_jpsi, bins=60, range=(2.8, 3.4))
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2

# Initial parameter guesses: [amplitude, mean, sigma, bg_const, bg_slope]
p0 = [hist.max() - hist.min(), 3.097, 0.05, hist.min(), 0]

# Perform fit
params, covariance = curve_fit(fit_func, bin_centers, hist, p0=p0, maxfev=10000)
amp, mean, sigma, bg_c, bg_s = params
errors = np.sqrt(np.diag(covariance))

# Display results
print("=" * 50)
print("J/ψ PEAK FIT RESULTS")
print("=" * 50)
print(f"Fitted mass:  {mean:.4f} ± {errors[1]:.4f} GeV/c²")
print(f"PDG mass:     3.0969 GeV/c²")
print(f"Difference:   {abs(mean - 3.0969):.4f} GeV/c²")
print(f"Width (σ):    {sigma:.4f} ± {errors[2]:.4f} GeV/c²")
print(f"FWHM:         {2.355 * sigma:.4f} GeV/c²")
print(f"Signal yield: {amp * sigma * np.sqrt(2*np.pi):.0f} events")
print("=" * 50)

Cell 10: Plot Fitted J/ψ Peak

plt.figure(figsize=(10, 6))
plt.hist(data_jpsi, bins=60, range=(2.8, 3.4), histtype='step', 
         linewidth=1.5, color='blue', label='Data')

# Plot fit components
x_fit = np.linspace(2.8, 3.4, 300)
plt.plot(x_fit, fit_func(x_fit, *params), 'r-', linewidth=2, label='Total fit')
plt.plot(x_fit, bg_c + bg_s * x_fit, 'g--', linewidth=1.5, label='Background')

# Add text box with fit results
textstr = f'Mass = {mean:.4f} ± {errors[1]:.4f} GeV/c²\n'
textstr += f'σ = {sigma:.4f} GeV/c²\n'
textstr += f'FWHM = {2.355*sigma:.4f} GeV/c²'
plt.text(0.05, 0.95, textstr, transform=plt.gca().transAxes, 
         fontsize=11, verticalalignment='top', 
         bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

plt.xlabel("Invariant mass (GeV/c²)", fontsize=13)
plt.ylabel("Events per bin", fontsize=13)
plt.title("J/ψ → μ⁺μ⁻ with Gaussian + Background Fit", fontsize=14, fontweight='bold')
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.savefig("jpsi_fit.png", dpi=300)
plt.show()

Discussion Questions:

How close is your fitted mass to the PDG value (3.0969 GeV)?
What does the width (σ) tell us about detector resolution?
Why do we need a background term in the fit?

Part 6: Analyze Υ (Upsilon) Family (25 minutes)

Cell 11: Υ Family Resonances

# Select Υ region
mask_upsilon = (inv_mass > 9.0) & (inv_mass < 11.0)
data_upsilon = inv_mass[mask_upsilon]

print(f"Events in Υ region (9-11 GeV): {len(data_upsilon)}")

plt.figure(figsize=(12, 6))
plt.hist(data_upsilon, bins=100, range=(9.0, 11.0), histtype='step', 
         linewidth=1.5, color='blue', label='Data')

# Mark the three Υ states
plt.axvline(x=9.4603, color='red', linestyle='--', alpha=0.6, 
            linewidth=2, label='Υ(1S) - 9.460 GeV')
plt.axvline(x=10.0233, color='green', linestyle='--', alpha=0.6, 
            linewidth=2, label='Υ(2S) - 10.023 GeV')
plt.axvline(x=10.3552, color='orange', linestyle='--', alpha=0.6, 
            linewidth=2, label='Υ(3S) - 10.355 GeV')

plt.xlabel("Invariant mass (GeV/c²)", fontsize=13)
plt.ylabel("Events per bin", fontsize=13)
plt.title("Υ (Upsilon) Family: Bottomonium States", fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(alpha=0.3)
plt.tight_layout()
plt.savefig("upsilon_family.png", dpi=300)
plt.show()

Discussion Questions:

Can you distinguish the three Υ states?
Why are the Υ peaks narrower than the J/ψ peak?
What quark composition do Υ mesons have? (bottom-antibottom)

Part 7: Apply Selection Cuts (25 minutes)

Cell 12: Compare With and Without pT Cuts

# Apply momentum cut: both muons must have pT > 3 GeV
pt_threshold = 3.0
mask_cut = (pt1 > pt_threshold) & (pt2 > pt_threshold)

inv_mass_cut = inv_mass[mask_cut]

print(f"Events before cuts: {len(inv_mass)}")
print(f"Events after pT > {pt_threshold} GeV cut: {len(inv_mass_cut)}")
print(f"Selection efficiency: {len(inv_mass_cut)/len(inv_mass)*100:.1f}%")

# Side-by-side comparison
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

ax1.hist(inv_mass, bins=200, range=(0, 12), histtype='step', 
         linewidth=1.5, color='blue')
ax1.set_xlabel("Invariant mass (GeV/c²)", fontsize=12)
ax1.set_ylabel("Events per bin", fontsize=12)
ax1.set_title(f"No Selection Cuts\n({len(inv_mass)} events)", fontsize=13)
ax1.grid(alpha=0.3)

ax2.hist(inv_mass_cut, bins=200, range=(0, 12), histtype='step', 
         linewidth=1.5, color='red')
ax2.set_xlabel("Invariant mass (GeV/c²)", fontsize=12)
ax2.set_ylabel("Events per bin", fontsize=12)
ax2.set_title(f"With pT > {pt_threshold} GeV Cuts\n({len(inv_mass_cut)} events)", fontsize=13)
ax2.grid(alpha=0.3)

plt.tight_layout()
plt.savefig("selection_cuts_comparison.png", dpi=300)
plt.show()

Discussion Questions:

How does the signal-to-background ratio change with cuts?
Why do we lose events at low mass more than high mass?
What other cuts could improve the analysis?

Part 8: Search for Z Boson (20 minutes)

Cell 13: Z Boson Analysis

# Z boson region (60-120 GeV)
mask_z = (inv_mass > 60) & (inv_mass < 120)
data_z = inv_mass[mask_z]

print(f"Events in Z region (60-120 GeV): {len(data_z)}")

plt.figure(figsize=(10, 6))
plt.hist(data_z, bins=60, range=(60, 120), histtype='step', 
         linewidth=1.5, color='green')
plt.axvline(x=91.1876, color='red', linestyle='--', linewidth=2, 
            label='Z boson mass (91.19 GeV)')
plt.xlabel("Invariant mass (GeV/c²)", fontsize=13)
plt.ylabel("Events per bin", fontsize=13)
plt.title("Z Boson Search: Z → μ⁺μ⁻", fontsize=14, fontweight='bold')
plt.legend(fontsize=11)
plt.grid(alpha=0.3)
plt.tight_layout()
plt.savefig("z_boson_search.png", dpi=300)
plt.show()

Discussion Questions:

Is there evidence for a Z boson peak?
Why are there fewer events at this mass range?
What is the significance of the Z boson in the Standard Model?

Part 9: Summary and Report (15 minutes)

Cell 14: Generate Summary Statistics

print("=" * 60)
print("ANALYSIS SUMMARY")
print("=" * 60)
print(f"Total events analyzed: {len(inv_mass):,}")
print(f"\nParticle Resonances Observed:")
print(f"  • J/ψ meson (~3.1 GeV)")
print(f"  • Υ(1S, 2S, 3S) family (~9-10 GeV)")
print(f"  • Z boson (~91 GeV)")
print(f"\nGenerated Plots:")
print(f"  1. full_spectrum.png")
print(f"  2. jpsi_peak.png")
print(f"  3. jpsi_fit.png")
print(f"  4. upsilon_family.png")
print(f"  5. selection_cuts_comparison.png")
print(f"  6. z_boson_search.png")
print("=" * 60)

Submission Requirements

Students should submit:

Completed Jupyter Notebook (Dimuon_Analysis.ipynb)
All generated plots (6 PNG files)
Short report (1-2 pages) discussing:
- Observed resonance peaks
- Fitted J/ψ mass and comparison to PDG value
- Effect of selection cuts
- Physical interpretation of results

Troubleshooting

Problem: CSV won't load
Solution: Check file path, ensure CSV is in same directory as notebook

Problem: Plots don't show
Solution: Make sure %matplotlib inline is in first cell

Problem: Fit fails
Solution: Check initial parameter guesses in p0 array

Problem: Import errors
Solution: Run pip install jupyter numpy pandas matplotlib scipy again

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
.ipynb_checkpoints		.ipynb_checkpoints
Dimuon_Analysis.ipynb		Dimuon_Analysis.ipynb
DoubleMu.csv		DoubleMu.csv
README.md		README.md
instructions.pdf		instructions.pdf

hpawl/CERN-Lab

Folders and files

Latest commit

History

Repository files navigation

CMS Open Data Dimuon Analysis - Student Lab Guide

Using Jupyter Notebook and CSV Data

Part 1: Setup (15 minutes)

Step 1: Install Required Packages

Step 2: Download the Dataset

Step 3: Launch Jupyter Notebook

Part 2: Data Loading and Exploration (20 minutes)

Cell 1: Import Libraries

Cell 2: Load the Data

Cell 3: Understand the Data

Part 3: Calculate Invariant Mass (30 minutes)

Cell 4: Extract Kinematic Variables

Cell 5: Convert to Cartesian Coordinates

Cell 6: Calculate Invariant Mass

Part 4: Visualize the Full Spectrum (20 minutes)

Cell 7: Plot Full Mass Spectrum

Part 5: Analyze J/ψ Meson (40 minutes)

Cell 8: Zoom into J/ψ Region

Cell 9: Fit J/ψ Peak with Gaussian

Cell 10: Plot Fitted J/ψ Peak

Part 6: Analyze Υ (Upsilon) Family (25 minutes)

Cell 11: Υ Family Resonances

Part 7: Apply Selection Cuts (25 minutes)

Cell 12: Compare With and Without pT Cuts

Part 8: Search for Z Boson (20 minutes)

Cell 13: Z Boson Analysis

Part 9: Summary and Report (15 minutes)

Cell 14: Generate Summary Statistics

Submission Requirements

Troubleshooting

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages