
Commit

feat(feat:-refactor-api-and-add-data-generator-for-multi-backend-support): feat: refactor core api and add systematic data generator

- implement new core api to support multiple backends (pandas, polars, modin)
- add synthetic_data_generator for systematic testing across backends (see the illustrative sketch below)
- refactor core modules: core_utils, exceptions, temporal_data_loader, temporal_target_shifter
- add new temporal_core_processing module
- restructure and update test files to align with new api design
- enhance functionality to support both single-step and multi-step operations
- update pyproject.toml to reflect new structure and dependencies
- merge changes from main branch to integrate latest updates and resolve conflicts
philip-ndikum committed Oct 6, 2024
2 parents b3320d0 + 766d0a6 commit 803024a
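
As a rough illustration of the multi-backend data-generation work described in the commit message above, a systematic cross-backend test might build one set of synthetic columns and materialise it on each supported backend. All names below are hypothetical sketches, not TemporalScope's actual API:

```python
# Illustrative sketch only: helper and column names are invented for this example
# and are not taken from TemporalScope's synthetic_data_generator module.
import numpy as np
import pandas as pd
import polars as pl


def synthetic_columns(n_rows: int = 100, seed: int = 42) -> dict[str, np.ndarray]:
    """Return backend-agnostic column arrays for a toy time series."""
    rng = np.random.default_rng(seed)
    return {
        "time": np.arange(n_rows),                    # integer time index
        "feature_1": rng.normal(size=n_rows),         # noisy feature
        "target": rng.normal(size=n_rows).cumsum(),   # drifting target
    }


cols = synthetic_columns()
pandas_df = pd.DataFrame(cols)   # pandas backend
polars_df = pl.DataFrame(cols)   # polars backend (modin.pandas.DataFrame accepts the same dict)

assert pandas_df.shape == polars_df.shape == (100, 3)
```

Generating from a single backend-agnostic source keeps fixtures identical across pandas, Polars, and Modin runs, which is what makes the cross-backend testing systematic.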
Showing 11 changed files with 114 additions and 119 deletions.
15 changes: 3 additions & 12 deletions .github/dependabot.yml
@@ -2,21 +2,12 @@ version: 2
updates:
# Pip dependencies (via pyproject.toml)
- package-ecosystem: "pip"
directory: "/" # Root directory where your pyproject.toml is located
directory: "/"
schedule:
interval: "weekly" # Check for updates weekly
open-pull-requests-limit: 5

# Conda dependencies
- package-ecosystem: "conda"
directory: "/" # If your conda environment.yml is in the root directory
schedule:
interval: "weekly" # Check for updates weekly
open-pull-requests-limit: 5
interval: "weekly"

# GitHub Actions (for any workflows using specific actions or dependencies)
- package-ecosystem: "github-actions"
directory: "/" # Root directory of the repository
directory: "/"
schedule:
interval: "weekly"
open-pull-requests-limit: 5
2 changes: 1 addition & 1 deletion .github/workflows/test.yml
@@ -46,6 +46,6 @@ jobs:
# Ensure that the coverage is only uploaded once (if statement)
- name: Upload coverage to codecov.io
if: matrix.os == 'ubuntu-latest' && matrix.python-version == '3.11'
uses: codecov/codecov-action@v4.2.0
uses: codecov/codecov-action@v4.5.0
env:
CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
14 changes: 5 additions & 9 deletions .pre-commit-config.yaml
@@ -22,15 +22,11 @@ repos:
args: [--markdown-linebreak-ext=md]

- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.6.5
rev: v0.6.8
hooks:
- id: ruff
# Exclude tests and tutorials
exclude: "^(test/|tutorial_notebooks/)"
# No args needed, uses pyproject.toml settings
- id: ruff-format
args: ["--line-length=120"]
# No need for --ignore options here, as ruff-format is for applying automatic fixes.

- repo: https://github.com/codespell-project/codespell
rev: v2.3.0
@@ -42,12 +38,12 @@ repos:


- repo: https://github.com/rhysd/actionlint
rev: v1.7.1
rev: v1.7.3
hooks:
- id: actionlint

- repo: https://github.com/PyCQA/bandit
rev: '1.7.9'
rev: '1.7.10'
hooks:
- id: bandit
args: ["-c", "pyproject.toml"]
@@ -71,12 +67,12 @@ repos:
- id: shellcheck

- repo: https://github.com/gitleaks/gitleaks
rev: v8.19.2
rev: v8.19.3
hooks:
- id: gitleaks

- repo: https://github.com/commitizen-tools/commitizen
rev: v3.29.0
rev: v3.29.1
hooks:
- id: commitizen
- id: commitizen-branch
22 changes: 16 additions & 6 deletions .readthedocs.yaml
@@ -1,10 +1,20 @@
# Read the Docs configuration file for Sphinx projects
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

version: 2

build:
os: ubuntu-22.04
tools:
python: "3.12"
commands:
- pip install hatch
- hatch shell docs
- cd docs
- hatch run docs:readthedocs
python: "3.10"

python:
install:
- method: pip
path: .
extra_requirements:
- docs

sphinx:
configuration: docs/conf.py
fail_on_warning: true
34 changes: 32 additions & 2 deletions CONTRIBUTING.md
@@ -1,3 +1,25 @@
## Table of Contents
- [Contributing to TemporalScope](#contributing-to-temporalscope)
- [Contribution Guidelines](#contribution-guidelines)
- [How to Contribute to TemporalScope](#how-to-contribute-to-temporalscope)
- [Issue Tracking](#issue-tracking)
- [Contributing Code](#contributing-code)
- [Fork the Repository](#fork-the-repository)
- [Setup Development Environment](#setup-development-environment)
- [Install Pre-commit Hooks](#install-pre-commit-hooks)
- [Create a New Branch](#create-a-new-branch)
- [Make Your Changes](#make-your-changes)
- [Ensure Code Quality](#ensure-code-quality)
- [Commit Your Changes](#commit-your-changes)
- [Submit a Pull Request](#submit-a-pull-request)
- [After Submitting](#after-submitting)
- [Documentation](#documentation)
- [Test Policy](#test-policy)
- [Development Roadmap \& Changelog](#development-roadmap--changelog)
- [Workflow for Releasing New Versions](#workflow-for-releasing-new-versions)
- [Code Style](#code-style)
- [Reporting Issues \& Requesting Features](#reporting-issues--requesting-features)

# Contributing to TemporalScope

Thank you for your interest in contributing to TemporalScope! Contributions of all kinds are welcome and appreciated.
@@ -113,10 +135,13 @@ TemporalScope employs various tools to maintain consistent code style, quality,

Before submitting your changes, perform the following steps:

1. Run the test suite and type checking:
1. Run the test suite:

```console
$ hatch run test:type
$ hatch run test:unit
```
```console
$ hatch run test:integration
```

2. Check your code format:
@@ -143,6 +168,11 @@ $ hatch run check
$ hatch run fix
```

6. Run all checks (managed by pre-commit):
```console
$ hatch run quality-assurance
```

> [!NOTE]
> Running these checks locally will help identify and resolve issues before submitting your changes, streamlining the review process.
7 changes: 5 additions & 2 deletions README.md
@@ -27,6 +27,7 @@
</p>

---

<!-- SPHINX-START -->

<div align="center">
@@ -60,6 +61,7 @@
<td>
<a href="https://results.pre-commit.ci/latest/github/philip-ndikum/TemporalScope/main"><img src="https://results.pre-commit.ci/badge/github/philip-ndikum/TemporalScope/main.svg" alt="pre-commit.ci status"></a><br>
<a href="https://codecov.io/gh/philip-ndikum/TemporalScope"><img src="https://codecov.io/gh/philip-ndikum/TemporalScope/branch/main/graph/badge.svg" alt="codecov"></a>
<a href="https://github.com/philip-ndikum/TemporalScope/actions/workflows/test.yml"> <img src="https://github.com/philip-ndikum/TemporalScope/actions/workflows/test.yml/badge.svg"></a>
</td>
<td>
<a href="https://www.bestpractices.dev/projects/9424"><img src="https://www.bestpractices.dev/projects/9424/badge" alt="OpenSSF Best Practices"></a><br>
@@ -71,10 +73,11 @@
</div>

---
**TemporalScope** is an open-source Python package designed to bridge the gap between scientific research and practical industry applications for analyzing the temporal dynamics of feature importance in AI & ML time series models. Developed in alignment with Linux Foundation standards and licensed under Apache 2.0, it builds on tools such as Boruta-SHAP and SHAP, using modern window partitioning algorithms to tackle challenges like non-stationarity and concept drift. The tool is flexible and extensible, allowing for bespoke enhancements and algorithms, and supports frameworks like Pandas, Polars, and Modin. Additionally, the optional *Clara LLM* modules (etymology from the word _Clarity_) are intended to serve as a model-validation tool to support explainability efforts (XAI). **Note**: TemporalScope is currently in **beta and pre-release** phase so some installation methods may not work as expected on all platforms. Please check the `CONTRIBUTIONS.md` for the full roadmap.

**TemporalScope** is an open-source Python package designed to bridge the gap between scientific research and practical industry applications for analyzing the temporal dynamics of feature importance in AI & ML time series models. Developed in alignment with Linux Foundation standards and licensed under Apache 2.0, it builds on tools such as Boruta-SHAP and SHAP, using modern window partitioning algorithms to tackle challenges like non-stationarity and concept drift. The tool is flexible and extensible, allowing for bespoke enhancements and algorithms, and supports frameworks like Pandas, Polars, and Modin. Additionally, the optional _Clara LLM_ modules (etymology from the word _Clarity_) are intended to serve as a model-validation tool to support explainability efforts (XAI). **Note**: TemporalScope is currently in a **beta and pre-release** phase, so some installation methods may not work as expected on all platforms. Please check `CONTRIBUTING.md` for the full roadmap.

<!-- SPHINX-END -->
---

### **Table of Contents**

- [**Installation**](#installation)
5 changes: 0 additions & 5 deletions docs/conf.py
@@ -24,10 +24,6 @@

import datetime
import importlib.metadata
import os
import sys

sys.path.insert(0, os.path.abspath("../.."))

project = "TemporalScope"
author = "Philip Ndikum, Serge Ndikum, Kane Norman"
@@ -63,7 +59,6 @@
autosummary_generate = True

html_theme = "pydata_sphinx_theme"
html_static_path = ["_static"]
html_title = project
html_theme_options = {
"icon_links": [
59 changes: 24 additions & 35 deletions pyproject.toml
@@ -9,7 +9,7 @@ description = "TemporalScope: Model-Agnostic Temporal Feature Importance Analysi
authors = [
{ name = "Philip Ndikum", email = "philip-ndikum@users.noreply.github.com" },
{ name = "Serge Ndikum" },
{ name = "Kane Norman" },
{ name = "Kane Norman", email = "kanenorman@fas.harvard.edu" },
]
license = "Apache-2.0"
readme = "README.md"
@@ -54,28 +54,8 @@ keywords = [
"Time-Series",
]

[project.urls]
"Source code" = "https://github.com/philip-ndikum/TemporalScope"
Documentation = "https://temporalscope.readthedocs.io/en/latest/"

[tool.hatch.envs.default]
dependencies = [
"pre-commit",
"ruff",
"jupyterlab",
"notebook",
"commitizen==3.29.0",
"mypy", # Include dependencies for QA scripts
"bandit", # Include dependencies for QA scripts
"black", # Include dependencies for QA scripts
"pytest", # Include pytest for running tests
"pytest-cov", # Include pytest-cov for coverage if needed
"docformatter", # Add docformatter for docstring formatting
"commitizen",
]

[tool.hatch.envs.docs]
dependencies = [
[project.optional-dependencies]
docs = [
"pydata-sphinx-theme",
"myst-parser",
"sphinx >=4.0",
@@ -85,14 +65,23 @@ dependencies = [
'sphinx-autoapi',
]

[tool.hatch.envs.docs.scripts]
build = "sphinx-build -WTb html . _build"
readthedocs = "sphinx-build -WTb html . $READTHEDOCS_OUTPUT/html"
serve = "python -m http.server --directory _build"
[project.urls]
"Source code" = "https://github.com/philip-ndikum/TemporalScope"
Documentation = "https://temporalscope.readthedocs.io/en/latest/"

[tool.hatch.envs.default]
dependencies = ["pre-commit", "ruff", "jupyterlab", "notebook", "commitizen"]

[tool.hatch.envs.docs]
features = ["docs"]

[tool.hatch.envs.test]
extra-dependencies = ["pytest", "pytest-cov", "pytest-custom_exit_code", "pytest-mock"]

[tool.hatch.envs.docs.scripts]
build = "sphinx-build -WTb html . _build"
serve = "python -m http.server --directory _build"

[tool.hatch.envs.test.scripts]
unit = 'pytest --cov-report xml:coverage.xml --cov="temporalscope" -m "not integration" {args:test}'
integration = 'pytest --cov-report xml:coverage.xml --cov="temporalscope" -m "integration" {args:test}'
@@ -107,13 +96,13 @@ log_date_format = "%Y-%m-%d %H:%M:%S"
minversion = "6.0"
filterwarnings = "ignore"

[tool.black]
line-length = 120 # Set Black's line length to 120 for consistency
[tool.ruff.format]
docstring-code-format = true

[tool.ruff]
extend-exclude = ["*.pyc", "test/*", "tutorial_notebooks/*"]
extend-exclude = ["*.pyc", "tutorial_notebooks/*"]
target-version = "py310"
line-length = 120 # Consistent line length across all tools
line-length = 120

[tool.ruff.lint]
select = [
@@ -141,10 +130,10 @@ select = [
]

ignore = [
"D400", # Ignore "First line should end with a period" for docstrings.
"D401", # Ignore "First line should be in imperative mood" for docstrings.
"D415", # Ignore "First line should end with a period, question mark, or exclamation point."
"E501", # Ignore "Line too long" in docstrings/comments for exceeding 120 characters.
"D400", # Ignore "First line should end with a period" for docstrings.
"D401", # Ignore "First line should be in imperative mood" for docstrings.
"D415", # Ignore "First line should end with a period, question mark, or exclamation point."
"E501", # Ignore "Line too long" in docstrings/comments for exceeding 120 characters.
"PERF203", # `try`-`except` within a loop incurs performance overhead
"PERF401", # Use a list comprehension to create a transformed list
"PLR1714", # repeated-equality-comparison
8 changes: 5 additions & 3 deletions src/temporalscope/metrics/masv.py
@@ -29,10 +29,12 @@
import pandas as pd
from shap import Explainer

from temporalscope.partition.base import BaseTemporalPartitioner
from temporalscope.partition.base_protocol import TemporalPartitionerProtocol


def calculate_masv(model: Callable, data: pd.DataFrame, partitioner: BaseTemporalPartitioner) -> dict[str, list[float]]:
def calculate_masv(
model: Callable, data: pd.DataFrame, partitioner: TemporalPartitionerProtocol
) -> dict[str, list[float]]:
r"""Calculate Mean Absolute SHAP Values (MASV).
Calculate MASV for temporal feature importance across partitions.
@@ -45,7 +47,7 @@ def calculate_masv(model: Callable, data: pd.DataFrame, partitioner: BaseTempora
:type data: pd.DataFrame
:param partitioner: The partitioner object to divide the data into phases.
:type partitioner: BaseTemporalPartitioner
:type partitioner: TemporalPartitionerProtocol
:return: A dictionary where keys are feature names and values are lists of
MASV scores across partitions.
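
To make the metric concrete, here is a toy, self-contained illustration of the MASV idea described in the docstring above (this is not TemporalScope code, and the SHAP values are invented): for each partition, average the absolute SHAP values per feature, yielding one score per feature per partition.

```python
import numpy as np

features = ["feature_1", "feature_2"]
# Invented SHAP values for two partitions, each of shape (n_samples, n_features).
shap_per_partition = [
    np.array([[0.25, -0.75], [-0.25, 0.25]]),  # partition 1
    np.array([[0.5, -0.25], [1.0, 0.25]]),     # partition 2
]

# MASV: mean absolute SHAP value of each feature, computed within each partition.
masv = {
    feature: [float(np.abs(values[:, i]).mean()) for values in shap_per_partition]
    for i, feature in enumerate(features)
}
print(masv)  # {'feature_1': [0.25, 0.75], 'feature_2': [0.5, 0.25]}
```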
26 changes: 10 additions & 16 deletions src/temporalscope/partition/base_protocol.py
@@ -37,7 +37,7 @@
and memory efficiency.
"""

from typing import Dict, Iterator, Protocol, Tuple, Union
from typing import Any, Dict, Iterator, Protocol, Tuple, Union

import modin.pandas as mpd
import pandas as pd
@@ -102,18 +102,8 @@ def fit(
.. code-block:: python
{
"partition_1": {
"full": (0, 10),
"train": (0, 8),
"test": (8, 10),
"validation": None
},
"partition_2": {
"full": (5, 15),
"train": (5, 13),
"test": (13, 15),
"validation": None
}
"partition_1": {"full": (0, 10), "train": (0, 8), "test": (8, 10), "validation": None},
"partition_2": {"full": (5, 15), "train": (5, 13), "test": (13, 15), "validation": None},
}
"""
pass
@@ -151,14 +141,14 @@ def transform(
"full": DataFrame(...),
"train": DataFrame(...),
"test": DataFrame(...),
"validation": None
"validation": None,
},
"partition_2": {
"full": DataFrame(...),
"train": DataFrame(...),
"test": DataFrame(...),
"validation": None
}
"validation": None,
},
}
"""
pass
@@ -170,3 +160,7 @@ def check_data(self) -> None:
checking for window overlaps, or validating the feature count.
"""
pass

def get_partition_data(self) -> Any:
"""Return the partitioned data."""
pass
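
For orientation, here is a minimal sketch (not code from this commit) of a class that structurally satisfies TemporalPartitionerProtocol as documented above. The single whole-frame partition and the 80/20 train/test split are arbitrary illustration choices, and the real protocol may expect iterators rather than plain dicts.

```python
import pandas as pd


class SingleWindowPartitioner:
    """Toy partitioner: one partition over the whole frame with an 80/20 train/test split."""

    def __init__(self, df: pd.DataFrame) -> None:
        self.df = df
        self._partitions: dict[str, dict[str, pd.DataFrame | None]] = {}

    def fit(self) -> dict[str, dict[str, tuple[int, int] | None]]:
        """Return index bounds per partition, mirroring the documented structure."""
        n = len(self.df)
        cut = int(n * 0.8)
        return {"partition_1": {"full": (0, n), "train": (0, cut), "test": (cut, n), "validation": None}}

    def transform(self) -> dict[str, dict[str, pd.DataFrame | None]]:
        """Materialise each split as a DataFrame slice (None splits stay None)."""
        partitions: dict[str, dict[str, pd.DataFrame | None]] = {}
        for name, splits in self.fit().items():
            partitions[name] = {
                split: None if bounds is None else self.df.iloc[bounds[0] : bounds[1]]
                for split, bounds in splits.items()
            }
        self._partitions = partitions
        return partitions

    def check_data(self) -> None:
        """Basic validation hook, as required by the protocol."""
        if self.df.empty:
            raise ValueError("DataFrame must not be empty.")

    def get_partition_data(self) -> dict[str, dict[str, pd.DataFrame | None]]:
        """Return the partitioned data produced by transform()."""
        return self._partitions


df = pd.DataFrame({"time": range(10), "target": range(10)})
partitioner = SingleWindowPartitioner(df)
partitioner.check_data()
print(partitioner.fit())  # {'partition_1': {'full': (0, 10), 'train': (0, 8), 'test': (8, 10), 'validation': None}}
print(partitioner.transform()["partition_1"]["train"].shape)  # (8, 2)
```

Because Protocol classes rely on structural typing, the sketch needs no import from TemporalScope to satisfy the interface; any object exposing these four methods with compatible signatures would be accepted wherever TemporalPartitionerProtocol is required.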
