
Commit

feat(feat:-refactor-api-and-add-data-generator-for-multi-backend-support): feat: refactor core api and add systematic data generator

- implement new core api to support multiple backends (pandas, polars, modin)
- add synthetic_data_generator for systematic testing across backends (see the illustrative sketch below)
- refactor core modules: core_utils, exceptions, temporal_data_loader, temporal_target_shifter
- add new temporal_core_processing module
- restructure and update test files to align with new api design
- enhance functionality to support both single-step and multi-step operations
- update pyproject.toml to reflect new structure and dependencies
- merge changes from main branch to integrate latest updates and resolve conflicts
philip-ndikum committed Oct 6, 2024
2 parents b3320d0 + 766d0a6 commit 803024a
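
As a rough illustration of the multi-backend data-generation work described in the commit message above, a systematic cross-backend test might build one set of synthetic columns and materialise it on each supported backend. All names below are hypothetical sketches, not TemporalScope's actual API:

```python
# Illustrative sketch only: helper and column names are invented for this example
# and are not taken from TemporalScope's synthetic_data_generator module.
import numpy as np
import pandas as pd
import polars as pl


def synthetic_columns(n_rows: int = 100, seed: int = 42) -> dict[str, np.ndarray]:
    """Return backend-agnostic column arrays for a toy time series."""
    rng = np.random.default_rng(seed)
    return {
        "time": np.arange(n_rows),                    # integer time index
        "feature_1": rng.normal(size=n_rows),         # noisy feature
        "target": rng.normal(size=n_rows).cumsum(),   # drifting target
    }


cols = synthetic_columns()
pandas_df = pd.DataFrame(cols)   # pandas backend
polars_df = pl.DataFrame(cols)   # polars backend (modin.pandas.DataFrame accepts the same dict)

assert pandas_df.shape == polars_df.shape == (100, 3)
```

Generating from a single backend-agnostic source keeps fixtures identical across pandas, Polars, and Modin runs, which is what makes the cross-backend testing systematic.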
Showing 11 changed files with 114 additions and 119 deletions.
15 changes: 3 additions & 12 deletions .github/dependabot.yml
@@ -2,21 +2,12 @@ version: 2
updates:
# Pip dependencies (via pyproject.toml)
- package-ecosystem: "pip"
directory: "/" # Root directory where your pyproject.toml is located
directory: "/"
schedule:
interval: "weekly" # Check for updates weekly
open-pull-requests-limit: 5

# Conda dependencies
- package-ecosystem: "conda"
directory: "/" # If your conda environment.yml is in the root directory
schedule:
interval: "weekly" # Check for updates weekly
open-pull-requests-limit: 5
interval: "weekly"

# GitHub Actions (for any workflows using specific actions or dependencies)
- package-ecosystem: "github-actions"
directory: "/" # Root directory of the repository
directory: "/"
schedule:
interval: "weekly"
open-pull-requests-limit: 5
2 changes: 1 addition & 1 deletion .github/workflows/test.yml
@@ -46,6 +46,6 @@ jobs:
# Ensure that the coverage is only uploaded once (if statement)
- name: Upload coverage to codecov.io
if: matrix.os == 'ubuntu-latest' && matrix.python-version == '3.11'
uses: codecov/codecov-action@v4.2.0
uses: codecov/codecov-action@v4.5.0
env:
CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
14 changes: 5 additions & 9 deletions .pre-commit-config.yaml
@@ -22,15 +22,11 @@ repos:
args: [--markdown-linebreak-ext=md]

- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.6.5
rev: v0.6.8
hooks:
- id: ruff
# Exclude tests and tutorials
exclude: "^(test/|tutorial_notebooks/)"
# No args needed, uses pyproject.toml settings
- id: ruff-format
args: ["--line-length=120"]
# No need for --ignore options here, as ruff-format is for applying automatic fixes.

- repo: https://github.com/codespell-project/codespell
rev: v2.3.0
@@ -42,12 +38,12 @@ repos:


- repo: https://github.com/rhysd/actionlint
rev: v1.7.1
rev: v1.7.3
hooks:
- id: actionlint

- repo: https://github.com/PyCQA/bandit
rev: '1.7.9'
rev: '1.7.10'
hooks:
- id: bandit
args: ["-c", "pyproject.toml"]
@@ -71,12 +67,12 @@ repos:
- id: shellcheck

- repo: https://github.com/gitleaks/gitleaks
rev: v8.19.2
rev: v8.19.3
hooks:
- id: gitleaks

- repo: https://github.com/commitizen-tools/commitizen
rev: v3.29.0
rev: v3.29.1
hooks:
- id: commitizen
- id: commitizen-branch
22 changes: 16 additions & 6 deletions .readthedocs.yaml
@@ -1,10 +1,20 @@
# Read the Docs configuration file for Sphinx projects
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

version: 2

build:
os: ubuntu-22.04
tools:
python: "3.12"
commands:
- pip install hatch
- hatch shell docs
- cd docs
- hatch run docs:readthedocs
python: "3.10"

python:
install:
- method: pip
path: .
extra_requirements:
- docs

sphinx:
configuration: docs/conf.py
fail_on_warning: true
34 changes: 32 additions & 2 deletions CONTRIBUTING.md
@@ -1,3 +1,25 @@
## Table of Contents
- [Contributing to TemporalScope](#contributing-to-temporalscope)
- [Contribution Guidelines](#contribution-guidelines)
- [How to Contribute to TemporalScope](#how-to-contribute-to-temporalscope)
- [Issue Tracking](#issue-tracking)
- [Contributing Code](#contributing-code)
- [Fork the Repository](#fork-the-repository)
- [Setup Development Environment](#setup-development-environment)
- [Install Pre-commit Hooks](#install-pre-commit-hooks)
- [Create a New Branch](#create-a-new-branch)
- [Make Your Changes](#make-your-changes)
- [Ensure Code Quality](#ensure-code-quality)
- [Commit Your Changes](#commit-your-changes)
- [Submit a Pull Request](#submit-a-pull-request)
- [After Submitting](#after-submitting)
- [Documentation](#documentation)
- [Test Policy](#test-policy)
- [Development Roadmap \& Changelog](#development-roadmap--changelog)
- [Workflow for Releasing New Versions](#workflow-for-releasing-new-versions)
- [Code Style](#code-style)
- [Reporting Issues \& Requesting Features](#reporting-issues--requesting-features)

# Contributing to TemporalScope

Thank you for your interest in contributing to TemporalScope! Contributions of all kinds are welcome and appreciated.
@@ -113,10 +135,13 @@ TemporalScope employs various tools to maintain consistent code style, quality,

Before submitting your changes, perform the following steps:

1. Run the test suite and type checking:
1. Run the test suite:

```console
$ hatch run test:type
$ hatch run test:unit
```
```console
$ hatch run test:integration
```

2. Check your code format:
@@ -143,6 +168,11 @@ $ hatch run check
$ hatch run fix
```

6. Run all checks (managed by pre-commit):
```console
$ hatch run quality-assurance
```

> [!NOTE]
> Running these checks locally will help identify and resolve issues before submitting your changes, streamlining the review process.
7 changes: 5 additions & 2 deletions README.md
@@ -27,6 +27,7 @@
</p>

---

<!-- SPHINX-START -->

<div align="center">
@@ -60,6 +61,7 @@
<td>
<a href="https://results.pre-commit.ci/latest/github/philip-ndikum/TemporalScope/main"><img src="https://results.pre-commit.ci/badge/github/philip-ndikum/TemporalScope/main.svg" alt="pre-commit.ci status"></a><br>
<a href="https://codecov.io/gh/philip-ndikum/TemporalScope"><img src="https://codecov.io/gh/philip-ndikum/TemporalScope/branch/main/graph/badge.svg" alt="codecov"></a>
<a href="https://github.com/philip-ndikum/TemporalScope/actions/workflows/test.yml"> <img src="https://github.com/philip-ndikum/TemporalScope/actions/workflows/test.yml/badge.svg"></a>
</td>
<td>
<a href="https://www.bestpractices.dev/projects/9424"><img src="https://www.bestpractices.dev/projects/9424/badge" alt="OpenSSF Best Practices"></a><br>
@@ -71,10 +73,11 @@
</div>

---
**TemporalScope** is an open-source Python package designed to bridge the gap between scientific research and practical industry applications for analyzing the temporal dynamics of feature importance in AI & ML time series models. Developed in alignment with Linux Foundation standards and licensed under Apache 2.0, it builds on tools such as Boruta-SHAP and SHAP, using modern window partitioning algorithms to tackle challenges like non-stationarity and concept drift. The tool is flexible and extensible, allowing for bespoke enhancements and algorithms, and supports frameworks like Pandas, Polars, and Modin. Additionally, the optional *Clara LLM* modules (etymology from the word _Clarity_) are intended to serve as a model-validation tool to support explainability efforts (XAI). **Note**: TemporalScope is currently in **beta and pre-release** phase so some installation methods may not work as expected on all platforms. Please check the `CONTRIBUTIONS.md` for the full roadmap.

**TemporalScope** is an open-source Python package designed to bridge the gap between scientific research and practical industry applications for analyzing the temporal dynamics of feature importance in AI & ML time series models. Developed in alignment with Linux Foundation standards and licensed under Apache 2.0, it builds on tools such as Boruta-SHAP and SHAP, using modern window partitioning algorithms to tackle challenges like non-stationarity and concept drift. The tool is flexible and extensible, allowing for bespoke enhancements and algorithms, and supports frameworks like Pandas, Polars, and Modin. Additionally, the optional _Clara LLM_ modules (etymology from the word _Clarity_) are intended to serve as a model-validation tool to support explainability efforts (XAI). **Note**: TemporalScope is currently in a **beta and pre-release** phase, so some installation methods may not work as expected on all platforms. Please check `CONTRIBUTING.md` for the full roadmap.

<!-- SPHINX-END -->
---

### **Table of Contents**

- [**Installation**](#installation)
5 changes: 0 additions & 5 deletions docs/conf.py
@@ -24,10 +24,6 @@

import datetime
import importlib.metadata
import os
import sys

sys.path.insert(0, os.path.abspath("../.."))

project = "TemporalScope"
author = "Philip Ndikum, Serge Ndikum, Kane Norman"
@@ -63,7 +59,6 @@
autosummary_generate = True

html_theme = "pydata_sphinx_theme"
html_static_path = ["_static"]
html_title = project
html_theme_options = {
"icon_links": [
59 changes: 24 additions & 35 deletions pyproject.toml
@@ -9,7 +9,7 @@ description = "TemporalScope: Model-Agnostic Temporal Feature Importance Analysi
authors = [
{ name = "Philip Ndikum", email = "philip-ndikum@users.noreply.github.com" },
{ name = "Serge Ndikum" },
{ name = "Kane Norman" },
{ name = "Kane Norman", email = "kanenorman@fas.harvard.edu" },
]
license = "Apache-2.0"
readme = "README.md"
@@ -54,28 +54,8 @@ keywords = [
"Time-Series",
]

[project.urls]
"Source code" = "https://github.com/philip-ndikum/TemporalScope"
Documentation = "https://temporalscope.readthedocs.io/en/latest/"

[tool.hatch.envs.default]
dependencies = [
"pre-commit",
"ruff",
"jupyterlab",
"notebook",
"commitizen==3.29.0",
"mypy", # Include dependencies for QA scripts
"bandit", # Include dependencies for QA scripts
"black", # Include dependencies for QA scripts
"pytest", # Include pytest for running tests
"pytest-cov", # Include pytest-cov for coverage if needed
"docformatter", # Add docformatter for docstring formatting
"commitizen",
]

[tool.hatch.envs.docs]
dependencies = [
[project.optional-dependencies]
docs = [
"pydata-sphinx-theme",
"myst-parser",
"sphinx >=4.0",
@@ -85,14 +65,23 @@ dependencies = [
'sphinx-autoapi',
]

[tool.hatch.envs.docs.scripts]
build = "sphinx-build -WTb html . _build"
readthedocs = "sphinx-build -WTb html . $READTHEDOCS_OUTPUT/html"
serve = "python -m http.server --directory _build"
[project.urls]
"Source code" = "https://github.com/philip-ndikum/TemporalScope"
Documentation = "https://temporalscope.readthedocs.io/en/latest/"

[tool.hatch.envs.default]
dependencies = ["pre-commit", "ruff", "jupyterlab", "notebook", "commitizen"]

[tool.hatch.envs.docs]
features = ["docs"]

[tool.hatch.envs.test]
extra-dependencies = ["pytest", "pytest-cov", "pytest-custom_exit_code", "pytest-mock"]

[tool.hatch.envs.docs.scripts]
build = "sphinx-build -WTb html . _build"
serve = "python -m http.server --directory _build"

[tool.hatch.envs.test.scripts]
unit = 'pytest --cov-report xml:coverage.xml --cov="temporalscope" -m "not integration" {args:test}'
integration = 'pytest --cov-report xml:coverage.xml --cov="temporalscope" -m "integration" {args:test}'
@@ -107,13 +96,13 @@ log_date_format = "%Y-%m-%d %H:%M:%S"
minversion = "6.0"
filterwarnings = "ignore"

[tool.black]
line-length = 120 # Set Black's line length to 120 for consistency
[tool.ruff.format]
docstring-code-format = true

[tool.ruff]
extend-exclude = ["*.pyc", "test/*", "tutorial_notebooks/*"]
extend-exclude = ["*.pyc", "tutorial_notebooks/*"]
target-version = "py310"
line-length = 120 # Consistent line length across all tools
line-length = 120

[tool.ruff.lint]
select = [
@@ -141,10 +130,10 @@ select = [
]

ignore = [
"D400", # Ignore "First line should end with a period" for docstrings.
"D401", # Ignore "First line should be in imperative mood" for docstrings.
"D415", # Ignore "First line should end with a period, question mark, or exclamation point."
"E501", # Ignore "Line too long" in docstrings/comments for exceeding 120 characters.
"D400", # Ignore "First line should end with a period" for docstrings.
"D401", # Ignore "First line should be in imperative mood" for docstrings.
"D415", # Ignore "First line should end with a period, question mark, or exclamation point."
"E501", # Ignore "Line too long" in docstrings/comments for exceeding 120 characters.
"PERF203", # `try`-`except` within a loop incurs performance overhead
"PERF401", # Use a list comprehension to create a transformed list
"PLR1714", # repeated-equality-comparison
8 changes: 5 additions & 3 deletions src/temporalscope/metrics/masv.py
@@ -29,10 +29,12 @@
import pandas as pd
from shap import Explainer

from temporalscope.partition.base import BaseTemporalPartitioner
from temporalscope.partition.base_protocol import TemporalPartitionerProtocol


def calculate_masv(model: Callable, data: pd.DataFrame, partitioner: BaseTemporalPartitioner) -> dict[str, list[float]]:
def calculate_masv(
model: Callable, data: pd.DataFrame, partitioner: TemporalPartitionerProtocol
) -> dict[str, list[float]]:
r"""Calculate Mean Absolute SHAP Values (MASV).
Calculate MASV for temporal feature importance across partitions.
@@ -45,7 +47,7 @@ def calculate_masv(model: Callable, data: pd.DataFrame, partitioner: BaseTempora
:type data: pd.DataFrame
:param partitioner: The partitioner object to divide the data into phases.
:type partitioner: BaseTemporalPartitioner
:type partitioner: TemporalPartitionerProtocol
:return: A dictionary where keys are feature names and values are lists of
MASV scores across partitions.
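
To make the metric concrete, here is a toy, self-contained illustration of the MASV idea described in the docstring above (this is not TemporalScope code, and the SHAP values are invented): for each partition, average the absolute SHAP values per feature, yielding one score per feature per partition.

```python
import numpy as np

features = ["feature_1", "feature_2"]
# Invented SHAP values for two partitions, each of shape (n_samples, n_features).
shap_per_partition = [
    np.array([[0.25, -0.75], [-0.25, 0.25]]),  # partition 1
    np.array([[0.5, -0.25], [1.0, 0.25]]),     # partition 2
]

# MASV: mean absolute SHAP value of each feature, computed within each partition.
masv = {
    feature: [float(np.abs(values[:, i]).mean()) for values in shap_per_partition]
    for i, feature in enumerate(features)
}
print(masv)  # {'feature_1': [0.25, 0.75], 'feature_2': [0.5, 0.25]}
```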
26 changes: 10 additions & 16 deletions src/temporalscope/partition/base_protocol.py
@@ -37,7 +37,7 @@
and memory efficiency.
"""

from typing import Dict, Iterator, Protocol, Tuple, Union
from typing import Any, Dict, Iterator, Protocol, Tuple, Union

import modin.pandas as mpd
import pandas as pd
@@ -102,18 +102,8 @@ def fit(
.. code-block:: python
{
"partition_1": {
"full": (0, 10),
"train": (0, 8),
"test": (8, 10),
"validation": None
},
"partition_2": {
"full": (5, 15),
"train": (5, 13),
"test": (13, 15),
"validation": None
}
"partition_1": {"full": (0, 10), "train": (0, 8), "test": (8, 10), "validation": None},
"partition_2": {"full": (5, 15), "train": (5, 13), "test": (13, 15), "validation": None},
}
"""
pass
@@ -151,14 +141,14 @@ def transform(
"full": DataFrame(...),
"train": DataFrame(...),
"test": DataFrame(...),
"validation": None
"validation": None,
},
"partition_2": {
"full": DataFrame(...),
"train": DataFrame(...),
"test": DataFrame(...),
"validation": None
}
"validation": None,
},
}
"""
pass
@@ -170,3 +160,7 @@ def check_data(self) -> None:
checking for window overlaps, or validating the feature count.
"""
pass

def get_partition_data(self) -> Any:
"""Return the partitioned data."""
pass
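
For orientation, here is a minimal sketch (not code from this commit) of a class that structurally satisfies TemporalPartitionerProtocol as documented above. The single whole-frame partition and the 80/20 train/test split are arbitrary illustration choices, and the real protocol may expect iterators rather than plain dicts.

```python
import pandas as pd


class SingleWindowPartitioner:
    """Toy partitioner: one partition over the whole frame with an 80/20 train/test split."""

    def __init__(self, df: pd.DataFrame) -> None:
        self.df = df
        self._partitions: dict[str, dict[str, pd.DataFrame | None]] = {}

    def fit(self) -> dict[str, dict[str, tuple[int, int] | None]]:
        """Return index bounds per partition, mirroring the documented structure."""
        n = len(self.df)
        cut = int(n * 0.8)
        return {"partition_1": {"full": (0, n), "train": (0, cut), "test": (cut, n), "validation": None}}

    def transform(self) -> dict[str, dict[str, pd.DataFrame | None]]:
        """Materialise each split as a DataFrame slice (None splits stay None)."""
        partitions: dict[str, dict[str, pd.DataFrame | None]] = {}
        for name, splits in self.fit().items():
            partitions[name] = {
                split: None if bounds is None else self.df.iloc[bounds[0] : bounds[1]]
                for split, bounds in splits.items()
            }
        self._partitions = partitions
        return partitions

    def check_data(self) -> None:
        """Basic validation hook, as required by the protocol."""
        if self.df.empty:
            raise ValueError("DataFrame must not be empty.")

    def get_partition_data(self) -> dict[str, dict[str, pd.DataFrame | None]]:
        """Return the partitioned data produced by transform()."""
        return self._partitions


df = pd.DataFrame({"time": range(10), "target": range(10)})
partitioner = SingleWindowPartitioner(df)
partitioner.check_data()
print(partitioner.fit())  # {'partition_1': {'full': (0, 10), 'train': (0, 8), 'test': (8, 10), 'validation': None}}
print(partitioner.transform()["partition_1"]["train"].shape)  # (8, 2)
```

Because Protocol classes rely on structural typing, the sketch needs no import from TemporalScope to satisfy the interface; any object exposing these four methods with compatible signatures would be accepted wherever TemporalPartitionerProtocol is required.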
