A Python tool to identify and analyze activity cliffs — pairs of structurally similar compounds with large potency differences.
It takes as input an .xls or .xlsx file containing molecular data and automatically detects activity cliffs based on the differences in activity (pIC50) and chemical similarity (Tanimoto coefficient). It computes Tanimoto similarity, ΔpIC50, and disparity using multiple RDKit fingerprints, and supports filtering, Excel export with molecule images, and inline SVG visualization directly in Jupyter or VS Code.
This tool detects activity cliffs, defined as compound pairs (or groups) that share high structural similarity but exhibit large differences in potency.
It automates the generation of molecular pairs, similarity computation, filtering, visualization, and export.
📂 Input requirements
The input file must contain at least the following columns:
| Column | Description |
|---|---|
| `standardize_smiles` | Canonical SMILES string (used to compute fingerprints) |
| `PubMedID` | Identifier used to group compounds from the same publication |
| `pIC50` | Biological activity value (−log₁₀(IC₅₀)) |
Additional columns (e.g. Compound, Target, etc.) are preserved in the output.
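Checking the input before running the analysis can save a confusing error later. The sketch below is not part of the tool: `missing_columns` and `ic50_to_pic50` are hypothetical helpers that only illustrate the column requirements and the pIC50 definition from the table, assuming IC50 is expressed in mol/L.

```python
import math

# Column names taken from the input requirements table.
REQUIRED_COLUMNS = ("standardize_smiles", "PubMedID", "pIC50")


def missing_columns(columns):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]


def ic50_to_pic50(ic50_molar):
    """Convert an IC50 in mol/L to pIC50 = -log10(IC50)."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)


print(missing_columns(["Compound", "standardize_smiles", "pIC50"]))  # ['PubMedID']
print(round(ic50_to_pic50(1e-6), 2))  # 1 µM gives 6.0
```

With a pandas DataFrame, the same check could be called as `missing_columns(df.columns)` right after loading the file.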
- 🧬 Supports 7 RDKit fingerprint types: `morgan`, `feature_morgan`, `rdk`, `maccs`, `pattern`, `atompair`, `torsion`
- Fully compatible with RDKit 2025.01+, with fingerprint generation adapted for several RDKit versions
- 🔁 Pair generation grouped by `PubMedID` (or other columns)
- 📊 Flexible filtering by Tanimoto, ΔpIC50, and Disparity
- 💾 Excel export with embedded molecule images
- 🧠 Inline molecule preview (SVG) in VS Code / Jupyter
- Generates all compound pairs within the same group (default: `PubMedID`)
- Computes:
  - Tanimoto similarity
  - Activity difference (ΔpIC50)
  - Disparity = ΔpIC50 / (1 - Tanimoto)
- Handles identical molecules (`Tanimoto = 1`) gracefully
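The pairing and scoring steps can be illustrated without RDKit. The following is a minimal sketch, not the tool's implementation: fingerprints are represented as plain Python sets of on-bit indices, the `bits` key and all function names are hypothetical, and returning infinity for identical fingerprints with different activities is an assumption about what handling `Tanimoto = 1` could look like.

```python
import math
from itertools import combinations


def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(bits_a | bits_b)
    if union == 0:
        return 0.0  # assumption: two empty fingerprints count as dissimilar
    return len(bits_a & bits_b) / union


def disparity(delta_pic50, similarity):
    """Disparity = ΔpIC50 / (1 - Tanimoto), guarding the Tanimoto = 1 case."""
    if similarity >= 1.0:
        # Assumption: identical fingerprints give an infinite disparity
        # unless the activities are also identical.
        return math.inf if delta_pic50 > 0 else 0.0
    return delta_pic50 / (1.0 - similarity)


def pairs_within_groups(records, group_key="PubMedID"):
    """Build all unordered pairs of records that share the same group value."""
    groups = {}
    for record in records:
        groups.setdefault(record[group_key], []).append(record)

    rows = []
    for group_id, members in groups.items():
        for first, second in combinations(members, 2):
            similarity = tanimoto(first["bits"], second["bits"])
            diff = abs(first["pIC50"] - second["pIC50"])
            rows.append({
                group_key: group_id,
                "Compound1": first["Compound"],
                "Compound2": second["Compound"],
                "pIC50_diff": diff,
                "Tanimoto": similarity,
                "Disparity": disparity(diff, similarity),
            })
    return rows


# Toy records; the bit sets are made up for illustration.
records = [
    {"Compound": "A", "PubMedID": 1, "pIC50": 8.0, "bits": {1, 2, 3, 4}},
    {"Compound": "B", "PubMedID": 1, "pIC50": 6.0, "bits": {1, 2, 3, 5}},
    {"Compound": "C", "PubMedID": 2, "pIC50": 7.0, "bits": {7, 8, 9}},
    {"Compound": "D", "PubMedID": 2, "pIC50": 5.5, "bits": {7, 8, 9}},
]

for row in pairs_within_groups(records):
    print(row)
```

With these toy records only two pairs are produced (A/B and C/D), because compounds from different publications are never paired. The C/D pair has identical bit sets, so it exercises the `Tanimoto = 1` branch.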
You can filter the resulting pairs using any combination of:
- `tanimoto_min` → minimum similarity
- `activity_diff_min` → minimum ΔpIC50
- `disparity_min` → minimum disparity value

Each filter is optional: if set to `None`, it is ignored.
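The "ignored when `None`" behaviour can be sketched in a few lines. This is an illustration rather than the tool's own code: `filter_pairs` and `passes` are hypothetical names, and the rows are plain dictionaries keyed by the output column names `Tanimoto`, `pIC50_diff`, and `Disparity`.

```python
def passes(value, minimum):
    """A threshold of None means the filter is switched off."""
    return minimum is None or value >= minimum


def filter_pairs(rows, tanimoto_min=None, activity_diff_min=None, disparity_min=None):
    """Keep the rows that satisfy every threshold that is not None."""
    return [
        row for row in rows
        if passes(row["Tanimoto"], tanimoto_min)
        and passes(row["pIC50_diff"], activity_diff_min)
        and passes(row["Disparity"], disparity_min)
    ]


# Made-up pair rows for illustration.
pairs = [
    {"Tanimoto": 0.85, "pIC50_diff": 2.0, "Disparity": 13.3},
    {"Tanimoto": 0.90, "pIC50_diff": 0.5, "Disparity": 5.0},
    {"Tanimoto": 0.40, "pIC50_diff": 3.0, "Disparity": 5.0},
]

cliffs = filter_pairs(pairs, tanimoto_min=0.8, activity_diff_min=1.5)
print(len(cliffs))  # 1
print(len(filter_pairs(pairs)))  # 3, no filter active
```

Combining the conditions with `and` means a pair must clear every active threshold, which matches the usual reading of an activity cliff as both highly similar and strongly different in potency.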
- Creates an interactive scatter plot (ΔpIC50 vs. Tanimoto) using Plotly
- One plot per selected fingerprint type
- Exports all results to `.xlsx` format with molecule images embedded (PNG)
- Automatically adjusts row height and column width to match image size
- Configurable image size via `image_size` parameter (e.g., 75, 100, 150 px)
- `show_molecule_table()` displays molecules as inline SVGs directly in VS Code or Jupyter Notebook
- Automatically limits drawing to the first n rows (default: 20) for performance
- Adjustable preview size with `img_size=(width, height)`
The example below uses as input an .xls file containing a dataset of SARS-CoV-2 M-Pro inhibitors from Macip G, Garcia-Segura P, Mestres-Truyol J, Saldivar-Espinoza B, Pujadas G, Garcia-Vallvé S. A Review of the Current Landscape of SARS-CoV-2 Main Protease Inhibitors: Have We Hit the Bullseye Yet? Int J Mol Sci. 2021 Dec 27;23(1):259. doi: 10.3390/ijms23010259, available at https://www.mdpi.com/article/10.3390/ijms23010259/s1
```python
from activity_cliffs_utils import (
    fp_as_bitvect, generate_pairs,
    export_activity_cliffs_to_excel, show_molecule_table,
    mol_to_image_bytes, smiles_to_svg,
)
from rdkit import Chem
import pandas as pd
import plotly.express as px

# 1️⃣ Load input data
df = pd.read_excel("M-pro_Inhibitors.xls")

# 2️⃣ Compute fingerprints (feature_morgan by default)
def smiles_to_fp(smiles):
    # Parse each SMILES once; unparsable structures get no fingerprint
    mol = Chem.MolFromSmiles(smiles)
    return fp_as_bitvect(mol) if mol is not None else None

df["fp_feature_morgan"] = df["standardize_smiles"].apply(smiles_to_fp)

# 3️⃣ Generate compound pairs (grouped by PubMedID)
result_df = generate_pairs(df, group_col="PubMedID")

# 4️⃣ Create the interactive chart
fig = px.scatter(
    result_df,
    x='Tanimoto',
    y='pIC50_diff',
    title='Activity difference vs Tanimoto Similarity',
    labels={
        'Tanimoto': 'Tanimoto Similarity (feature_morgan)',
        'pIC50_diff': 'Activity difference (ΔpIC50)',
    },
    hover_data=['Compound1', 'Compound2', 'Disparity'],
)

# Adjust the chart dimensions
fig.update_layout(
    height=700  # You can change this value to adjust the height
)

# Show the graph
fig.show()

# 5️⃣ Apply filters
tanimoto_min = 0.8
activity_diff_min = 1.5
mask = (result_df["Tanimoto"] >= tanimoto_min) & (result_df["pIC50_diff"] >= activity_diff_min)
filtered_df = result_df[mask].copy()

# 6️⃣ Add molecule images for Excel export and preview
filtered_df["Mol1_img"] = filtered_df["SMILES1"].apply(lambda s: mol_to_image_bytes(s, (150, 150)))
filtered_df["Mol2_img"] = filtered_df["SMILES2"].apply(lambda s: mol_to_image_bytes(s, (150, 150)))
filtered_df["Mol1_svg"] = filtered_df["SMILES1"].apply(lambda s: smiles_to_svg(s, size=(120, 120)))
filtered_df["Mol2_svg"] = filtered_df["SMILES2"].apply(lambda s: smiles_to_svg(s, size=(120, 120)))

# 7️⃣ Export to Excel
export_activity_cliffs_to_excel(filtered_df, "disparity_results_feature_morgan.xlsx", image_size=150)

# 8️⃣ Display molecule preview
show_molecule_table(filtered_df, max_rows=30, img_size=(120, 120))
```
🧾 Output Columns
| Column | Description |
|---|---|
| `PubMedID` | Group identifier |
| `Compound1`, `Compound2` | Compound names |
| `pIC50_1`, `pIC50_2` | Activity values |
| `pIC50_diff` | Absolute difference in activity |
| `Tanimoto` | Structural similarity |
| `Disparity` | Activity cliff magnitude |
| `Fingerprint` | Fingerprint type used |
| `Mol1_img`, `Mol2_img` | Molecule images (PNG, Excel export only) |
🧩 Highlights
✅ Compatible with RDKit 2025+
✅ Supports 7 fingerprint types
✅ Flexible filtering and visualization
✅ Integrated Excel export with molecule images
✅ Inline SVG previews for quick inspection
✅ Clear, English docstrings and maintainable code
Developed by Santi Garcia-Vallvé. Universitat Rovira i Virgili. Department of Biochemistry and Biotechnology. Cheminformatics and Nutrition (QiN) Research Group
Parts of the code structure and documentation were drafted with assistance from ChatGPT (OpenAI), under human supervision and subsequent review.
Licensed under the MIT License.