🔬 Activity Cliff Calculator

A Python tool to identify and analyze activity cliffs — pairs of structurally similar compounds with large potency differences.

It takes as input an .xls or .xlsx file containing molecular data and automatically detects activity cliffs based on the differences in activity (pIC50) and chemical similarity (Tanimoto coefficient). It computes Tanimoto similarity, ΔpIC50, and disparity using multiple RDKit fingerprints, and supports filtering, Excel export with molecule images, and inline SVG visualization directly in Jupyter or VS Code.

📘 Overview

This tool detects activity cliffs, defined as compound pairs (or groups) that share high structural similarity but exhibit large differences in potency.
It automates the generation of molecular pairs, similarity computation, filtering, visualization, and export.

📂 Input requirements

The input file must contain at least the following columns:

Column	Description
`standardize_smiles`	Canonical SMILES string (used to compute fingerprints)
`PubMedID`	Identifier used to group compounds from the same publication
`pIC50`	Biological activity value (−log₁₀(IC₅₀))

Additional columns (e.g. Compound, Target, etc.) are preserved in the output.

⚙️ Main features

🧬 Supports 7 RDKit fingerprint types:morgan, feature_morgan, rdk, maccs, pattern, atompair, torsion
Fully compatible with RDKit 2025.01+, but adapts the fingerprint generation for several RDKit versions
🔁 Pair generation grouped by PubMedID (or other columns)
📊 Flexible filtering by Tanimoto, ΔpIC50, and Disparity
💾 Excel export with embedded molecule images
🧠 Inline molecule preview (SVG) in VS Code / Jupyter

🔁 Pair Generation

Generates all compound pairs within the same group (default: PubMedID)
Computes:
- Tanimoto similarity
- Activity difference (ΔpIC50)
- Disparity = {Delta pIC50}/(1 - Tanimoto)
- Handles identical molecules (Tanimoto = 1) gracefully

🎚️ Filtering Options

You can filter the resulting pairs using any combination of:

tanimoto_min → minimum similarity
activity_diff_min → minimum ΔpIC50
disparity_min → minimum disparity value

Each filter is optional — if set to None, it is ignored.

📊 Visualization and Export

📈 Interactive Plot

Creates an interactive scatter plot (ΔpIC50 vs. Tanimoto) using Plotly
One plot per selected fingerprint type

💾 Excel Export

Exports all results to .xlsx format with molecule images embedded (PNG)
Automatically adjusts row height and column width to match image size
Configurable image size via image_size parameter (e.g., 75, 100, 150 px)

🧠 Inline Visualization

show_molecule_table() displays molecules as inline SVGs directly in VS Code or Jupyter Notebook
Automatically limits drawing to the first n rows (default: 20) for performance
Adjustable preview size with img_size=(width, height)

🧰 Example Workflow

It uses as input file a .xls file containing a dataset of SARS-CoV-2 M-Pro inhibitors from Macip G, Garcia-Segura P, Mestres-Truyol J, Saldivar-Espinoza B, Pujadas G, Garcia-Vallvé S. A Review of the Current Landscape of SARS-CoV-2 Main Protease Inhibitors: Have We Hit the Bullseye Yet? Int J Mol Sci. 2021 Dec 27;23(1):259. doi: 10.3390/ijms23010259 available at https://www.mdpi.com/article/10.3390/ijms23010259/s1

from activity_cliffs_utils import (
    fp_as_bitvect, generate_pairs,
    export_activity_cliffs_to_excel, show_molecule_table, mol_to_image_bytes, smiles_to_svg
)
from rdkit import Chem
import pandas as pd
import plotly.express as px


# 1️⃣ Load input data
df = pd.read_excel("M-pro_Inhibitors.xls")

# 2️⃣ Compute fingerprints (feature_morgan by default)
df["fp_feature_morgan"] = df["standardize_smiles"].apply(
    lambda s: fp_as_bitvect(Chem.MolFromSmiles(s)) if Chem.MolFromSmiles(s) else None
)

# 3️⃣ Generate compound pairs (grouped by PubMedID)
result_df = generate_pairs(df, group_col="PubMedID")

# 4️⃣  Create the interactive chart
fig = px.scatter(
    result_df,
    x='Tanimoto',
    y='pIC50_diff',
    title='Activity difference vs Tanimoto Similarity',
    labels={'Tanimoto': f'Tanimoto Similarity (feature_morgan)', 'pIC50_diff': 'Activity difference (ΔpIC50)'},
    hover_data=['Compound1', 'Compound2', 'Disparity']
)
# Adjust the chart dimensions
fig.update_layout(
    height=700  # You can change this value to adjust the height
)
# Show the graph
fig.show()

# 5️⃣ Apply filters
tanimoto_min = 0.8
activity_diff_min = 1.5
mask = (result_df["Tanimoto"] >= tanimoto_min) & (result_df["pIC50_diff"] >= activity_diff_min)
filtered_df = result_df[mask].copy()

# 6️⃣  Add molecule images for Excel export and preview
filtered_df["Mol1_img"] = filtered_df["SMILES1"].apply(lambda s: mol_to_image_bytes(s, (150,150)))
filtered_df["Mol2_img"] = filtered_df["SMILES2"].apply(lambda s: mol_to_image_bytes(s, (150,150)))
filtered_df["Mol1_svg"] = filtered_df["SMILES1"].apply(lambda s: smiles_to_svg(s, size=(120,120)))
filtered_df["Mol2_svg"] = filtered_df["SMILES2"].apply(lambda s: smiles_to_svg(s, size=(120,120)))

# 7️⃣  Export to Excel
export_activity_cliffs_to_excel(filtered_df, "disparity_results_feature_morgan.xlsx", image_size=150)

# 8️⃣ Display molecule preview
show_molecule_table(filtered_df, max_rows=30, img_size=(120,120))

🧾 Output Columns

Column	Description
`PubMedID`	Group identifier
`Compound1`, `Compound2`	Compound names
`pIC50_1`, `pIC50_2`	Activity values
`pIC50_diff`	Absolute difference in activity
`Tanimoto`	Structural similarity
`Disparity`	Activity cliff magnitude
`Fingerprint`	Fingerprint type used
`Mol1_img`, `Mol2_img`	Molecule images (PNG, Excel export only)

🧩 Highlights

✅ Compatible with RDKit 2025+

✅ Supports 7 fingerprint types

✅ Flexible filtering and visualization

✅ Integrated Excel export with molecule images

✅ Inline SVG previews for quick inspection

✅ Clear, English docstrings and maintainable code

🧑‍💻 Authors and License

Developed by Santi Garcia-Vallvé. Universitat Rovira i Virgili. Department of Biochemistry and Biotechnology. Cheminformatics and Nutrition (QiN) Research Group

Parts of the code structure and documentation were drafted with assistance from ChatGPT (OpenAI), under human supervision and subsequent review.

Licensed under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
Activity_Cliffs.ipynb		Activity_Cliffs.ipynb
LICENSE		LICENSE
M-pro_Inhibitors.xls		M-pro_Inhibitors.xls
README.md		README.md
activity_cliffs_utils.py		activity_cliffs_utils.py
environment.yml		environment.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🔬 Activity Cliff Calculator

A Python tool to identify and analyze activity cliffs — pairs of structurally similar compounds with large potency differences.

📘 Overview

⚙️ Main features

🔁 Pair Generation

🎚️ Filtering Options

📊 Visualization and Export

📈 Interactive Plot

💾 Excel Export

🧠 Inline Visualization

🧰 Example Workflow

🧑‍💻 Authors and License

About

Uh oh!

Releases

Packages

Languages

License

URV-cheminformatics/Activity_Cliffs_Calculator

Folders and files

Latest commit

History

Repository files navigation

🔬 Activity Cliff Calculator

A Python tool to identify and analyze activity cliffs — pairs of structurally similar compounds with large potency differences.

📘 Overview

⚙️ Main features

🔁 Pair Generation

🎚️ Filtering Options

📊 Visualization and Export

📈 Interactive Plot

💾 Excel Export

🧠 Inline Visualization

🧰 Example Workflow

🧑‍💻 Authors and License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages