Update V6

jrolf · Jan 13, 2024 · a19a02f
1 parent 00f56f6
Showing 6 changed files with 296 additions and 36 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,123 @@
+
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+
+# PyInstaller
+# Usually these files are written by a python script from a template
+# before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.nox/
+.coverage
+.coverage.*
+.cache
+nosetests.xml
+coverage.xml
+*.cover
+*.py,cover
+.hypothesis/
+.pytest_cache/
+cover/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+db.sqlite3-journal
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# pyenv
+.python-version
+
+# celery beat schedule file
+celerybeat-schedule
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+.dmypy.json
+dmypy.json
+
+# Pyre type checker
+.pyre/
+
+# pytype static type analyzer
+.pytype/
+
+# Cython debug symbols
+cython_debug/
+
+# VS Code settings
+.vscode/
+
diff --git a/README.md b/README.md
@@ -3,7 +3,7 @@
 
 ## Introduction
 
-Collaborative Generalized Effects Modeling (CGEM) represents a pivotal advancement in statistical modeling and data analysis, tailored for complex, real-world scenarios. Bridging traditional statistical methods with modern machine learning, CGEM stands out in its ability to interpret intricate data relationships found in domains like business analytics and scientific research.
+Collaborative Generalized Effects Modeling (CGEM) is a state-of-the-art statistical modeling framework tailored for complex, real-world scenarios. Bridging traditional statistical methods with modern machine learning, CGEM stands out in its ability to interpret intricate data relationships found in domains like business analytics and scientific research.
 
 ## Defining Characteristics of CGEM
 
@@ -37,65 +37,202 @@ CGEM operates on an iterative algorithm, adjusting the effects within the model
 
 ### Example Implementation
 
+#### Installation
+
+To install or upgrade the CGEM library, run `pip install --upgrade cgem`.
+
+To verify the installation, run `pip show cgem`.
+
+#### Example Usage of CGEM
+
+This example fits a CGEM model to simulated data whose target is the product of a linear effect, a categorical multiplier, and random noise.
+
+#### Generating Artificial Data
+
+First, we define a function that generates artificial data from a known causal system:
+
 ```python
-# Example code demonstrating the implementation of a CGEM model.
-# This would include defining the model, setting up the effects, and running the iterative optimization process.
+from cgem import *
+import numpy as np
+from random import choice
+import pandas as pd
+
+def gen_artificial_data_v1(size=10000):
+    # Generating random values for variables
+    reg_var_a = np.random.normal(10, 3, size)
+    reg_var_b = np.random.normal(12, 4, size)
+    reg_var_c = np.random.normal(15, 5, size)
+
+    # Calculating the effect
+    effect_x = 20.0 + (1.0 * reg_var_a) + (1.5 * reg_var_b) + (2.0 * reg_var_c)
+
+    # Defining categories and effects
+    cats = list("ABCDEFGHIJ")
+    effs = np.around(np.linspace(0.5, 1.4, len(cats)), 2)
+    cat2effect = {cat: round(eff, 4) for cat, eff in zip(cats, effs)}
+
+    # Generating categorical variable and its effect
+    cat_var_d = np.array([choice(cats) for _ in range(size)])
+    cat_effect_d = np.array([cat2effect[c] for c in cat_var_d])
+
+    # Adding noise effect
+    noise_effect = np.random.uniform(0.90, 1.10, size)
+
+    # Calculating the target variable
+    target_var_z = ((effect_x) * cat_effect_d) * noise_effect
+
+    # Constructing the DataFrame
+    return pd.DataFrame({
+        'TGT_Z': target_var_z,
+        'REG_A': reg_var_a,
+        'REG_B': reg_var_b,
+        'REG_C': reg_var_c,
+        'CAT_D': cat_var_d
+    })
+
+# Generate two datasets for fitting and prediction
+DF1 = gen_artificial_data_v1(size=10000)
+DF2 = gen_artificial_data_v1(size=10000) 
 ```
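As a quick sanity check on the structure of the simulated data, the sketch below rebuilds the same generating process with the standard library only (no `cgem`, NumPy, or pandas). It is an illustration rather than part of the library: the names `rng`, `rows`, and `ratios` are made up here, and the sample size of 1000 is arbitrary. Dividing each target by its linear effect and by its category multiplier should leave only the noise factor, which lies between 0.90 and 1.10.

```python
import random

rng = random.Random(42)  # fixed seed so the check is repeatable

cats = list("ABCDEFGHIJ")
# Multipliers 0.5, 0.6, ..., 1.4, the same values np.linspace(0.5, 1.4, 10) yields
cat2effect = {cat: round(0.5 + 0.1 * i, 2) for i, cat in enumerate(cats)}

rows = []
for _ in range(1000):
    a, b, c = rng.gauss(10, 3), rng.gauss(12, 4), rng.gauss(15, 5)
    d = rng.choice(cats)
    linear = 20.0 + 1.0 * a + 1.5 * b + 2.0 * c   # same coefficients as effect_x
    noise = rng.uniform(0.90, 1.10)
    rows.append((linear * cat2effect[d] * noise, linear, d))

# Removing the linear effect and the category multiplier leaves only the noise.
ratios = [(tgt / lin) / cat2effect[d] for tgt, lin, d in rows]
print("noise factor range:", min(ratios), max(ratios))
```

This is the multiplicative structure that the formula `TGT_Z = CAT_D_EFF * LIN_REG_EFF` is meant to recover.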
 
-## Conclusion
+## Model Fitting
+Next, we fit a CGEM model to the generated data:
 
-CGEM represents a paradigm shift in the field of data analysis, offering a sophisticated, versatile approach that is adept at handling the complexities of modern datasets. Its combination of formulaic freedom, generalization of effects, and focus on causal relationships makes it an invaluable asset in various fields, from business intelligence to scientific research.
+### Define the formula for the model
+```python
+Formula = "TGT_Z = CAT_D_EFF * LIN_REG_EFF"
+```
 
----
+### Define the model parameters for each term
+```python
+tparams = {
+    "CAT_D_EFF": {
+        'model': "CatRegModel()", 
+        'xvars': ['CAT_D'],
+        'ival' : 10,
+    },
+    "LIN_REG_EFF": {
+        'model': "OLS()", 
+        'xvars': ['REG_A','REG_B','REG_C'],
+        'ival' : 10,
+    } 
+}   
+```
 
-This document provides a detailed and comprehensive introduction to CGEM, highlighting its unique features and the principles that make it a powerful tool in statistical modeling and data analysis. The content can be further enriched with specific examples and case studies to illustrate the practical applications of CGEM.Certainly! Based on the content in the Python file and the previous discussions, I'll craft a detailed markdown document that encapsulates the essence of Collaborative Generalized Effects Modeling (CGEM).
+### Initialize and fit the model
+```python
+model = CGEM() 
+model.load_df(DF1)  
+model.define_form(Formula) 
+model.define_terms(tparams)  
+model.fit(25)
+```
 
----
+### Make predictions and calculate R-Squared
+```python
+preds = model.predict(DF2) 
+actuals = DF2['TGT_Z'].values
+r2 = model.calc_r2(actuals, preds) 
+print('CrossVal R-Squared:', round(r2, 5))
+```
 
-# Collaborative Generalized Effects Modeling (CGEM): A Comprehensive Overview
+## Model Process Explanation (Step-By-Step) 
 
-## Introduction
+### Conceptual Overview
+CGEM's modeling process is characterized by its adaptability and integration of various statistical and machine learning techniques. The key steps in this process include formulating the model, integrating diverse effects, iterative optimization, and ensuring causal coherence.
 
-Collaborative Generalized Effects Modeling (CGEM) represents a pivotal advancement in statistical modeling and data analysis, tailored for complex, real-world scenarios. Bridging traditional statistical methods with modern machine learning, CGEM stands out in its ability to interpret intricate data relationships found in domains like business analytics and scientific research.
+### 1. Defining the Formula of the Model
+The first step is to define a model formula, which expresses the dependent variable in terms of one or more independent 'effects.' Unlike a purely additive linear model, a CGEM formula can combine effects in non-linear ways; the formula below, for example, multiplies a categorical effect by a linear-regression effect.
 
-## Defining Characteristics of CGEM
+```python
+from cgem import CGEM
 
-### Formulaic Flexibility
+# Define the formula
+# Here, 'TGT_Z' is the target variable, and 
+# 'CAT_D_EFF' and 'LIN_REG_EFF' are the effects
+Formula = "TGT_Z = CAT_D_EFF * LIN_REG_EFF"
+```
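How CGEM parses the formula string internally is not shown in this README. As a rough illustration of what such a string encodes, the sketch below splits it into a target name and a list of effect names, then evaluates the right-hand side for given effect values. `parse_formula` and `evaluate_formula` are hypothetical helpers written for this example and are not part of the `cgem` API.

```python
import re

def parse_formula(formula):
    """Split 'TARGET = EXPRESSION' into the target, the expression, and its term names."""
    target, expression = (part.strip() for part in formula.split("=", 1))
    terms = re.findall(r"[A-Za-z_]\w*", expression)
    return target, expression, terms

def evaluate_formula(expression, term_values):
    """Evaluate the right-hand side with builtins disabled.

    term_values maps each term name to a number.
    """
    return eval(expression, {"__builtins__": {}}, dict(term_values))

target, expression, terms = parse_formula("TGT_Z = CAT_D_EFF * LIN_REG_EFF")
prediction = evaluate_formula(expression, {"CAT_D_EFF": 0.5, "LIN_REG_EFF": 80.0})
print(target, terms, prediction)
```

Keeping the expression as ordinary arithmetic is what would let a formula mix sums, products, and other operators without a separate model class for each combination.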
 
-CGEM is characterized by an unprecedented level of formulaic freedom. This flexibility allows for the construction of models encompassing a diverse range of mathematical relationships, from linear to non-linear, multiplicative, and beyond. It's an essential feature that enables the modeling of complex dynamics in datasets.
+### 2. Integrating Diverse Effects
+In CGEM, effects can range from simple linear terms to outputs from sophisticated machine learning models. This flexibility allows the model to capture more complex patterns in the data.
 
-### Generalization of Effects
+```python
+# Define the model parameters for each term
+tparams = {
+    "CAT_D_EFF": {
+        'model': "CatRegModel()",  # Categorical Regression Model
+        'xvars': ['CAT_D'],        # Independent variable for this effect
+        'ival' : 10,               # Initial value
+    },
+    "LIN_REG_EFF": {
+        'model': "OLS()",          # Ordinary Least Squares Model
+        'xvars': ['REG_A', 'REG_B', 'REG_C'],  # Independent variables for this effect
+        'ival' : 10,               # Initial value
+    }
+}
+```
 
-In CGEM, the concept of an 'effect' is broadly interpreted. Effects can range from simple constants or linear terms to outputs from sophisticated machine learning models. This generalization allows CGEM to integrate and leverage diverse methodologies within a single model framework, offering a comprehensive view of the data.
+### 3. Iterative Optimization
+CGEM models are refined through an iterative process. This process involves adjusting the effects to achieve the best possible fit with the data, enhancing accuracy and reducing overfitting.
 
-### Iterative Refinement and Convergence
+```python
+# Initialize the CGEM model
+model = CGEM()
 
-CGEM employs an iterative process to refine and converge the terms in the model. This approach ensures balanced weighting of each effect, mitigating common issues like overfitting or variable dominance. The focus is on achieving a natural and efficient convergence of terms, enhancing model robustness.
+# Load the dataset
+model.load_df(DF1)
 
-### Causal Coherence
+# Define the model formula and terms
+model.define_form(Formula)
+model.define_terms(tparams)
 
-A cornerstone of CGEM is its emphasis on maintaining causally coherent relationships within the model. This focus ensures that the outputs are not just statistically significant but also meaningful and interpretable in real-world contexts, bridging the gap between correlation and causation.
+# Fit the model
+model.fit(25)
+```
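The README does not show what `fit` does internally. One generic way to fit a product of effects iteratively is an alternating (backfitting-style) scheme: hold one effect fixed, refit the other against the target divided by the fixed effect, and repeat. The sketch below applies that idea to a one-variable toy problem using only the standard library. It is an assumption about the general technique, not CGEM's actual algorithm; `fit_line` and `alternating_fit` are hypothetical names, and the defaults `n_iter=25` and `ival=10.0` merely echo the numbers used in the example above.

```python
import random

def fit_line(xs, zs):
    """Closed-form simple least squares for z ~ intercept + slope * x."""
    n = len(xs)
    mean_x, mean_z = sum(xs) / n, sum(zs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    slope = sxz / sxx
    return mean_z - slope * mean_x, slope

def alternating_fit(xs, cats, ys, n_iter=25, ival=10.0):
    """Alternately refit each effect of y = cat_eff[cat] * (a + b * x)."""
    lin = [ival] * len(ys)  # starting value for the linear effect
    cat_eff = {}
    for _ in range(n_iter):
        # Hold the linear effect fixed: each category multiplier becomes
        # the mean of y / lin over the rows in that category.
        sums, counts = {}, {}
        for c, y, lin_i in zip(cats, ys, lin):
            sums[c] = sums.get(c, 0.0) + y / lin_i
            counts[c] = counts.get(c, 0) + 1
        cat_eff = {c: sums[c] / counts[c] for c in sums}
        # Hold the multipliers fixed: refit the line to y / cat_eff.
        zs = [y / cat_eff[c] for c, y in zip(cats, ys)]
        a, b = fit_line(xs, zs)
        lin = [a + b * x for x in xs]
    return [cat_eff[c] * lin_i for c, lin_i in zip(cats, lin)]

# Noise-free toy data with a known multiplicative structure.
rng = random.Random(0)
true_eff = {"A": 0.5, "B": 1.0, "C": 1.5}
xs = [rng.uniform(1.0, 10.0) for _ in range(300)]
cats = [rng.choice("ABC") for _ in range(300)]
ys = [true_eff[c] * (2.0 + 3.0 * x) for c, x in zip(cats, xs)]

preds = alternating_fit(xs, cats, ys)
max_rel_err = max(abs(p - y) / y for p, y in zip(preds, ys))
print("max relative error:", max_rel_err)
```

Note that the individual effects are only identified up to a constant factor (doubling every multiplier and halving the line gives the same product), so the sketch checks the combined prediction rather than the separate effects.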
 
-### Integration with Machine Learning
+### 4. Evaluating Model Performance
+After fitting the model, it's important to evaluate its performance. This can be done by making predictions on a new dataset and comparing them to actual values.
 
-CGEM is uniquely designed to incorporate machine learning models as effects. This integration harnesses the predictive power of machine learning while maintaining the structural integrity and interpretability of traditional statistical models.
+```python
+# Predict using a new dataset
+preds = model.predict(DF2) 
 
-## Core Mechanics of CGEM
+# Actual values
+actuals = DF2['TGT_Z'].values
 
-CGEM operates on an iterative algorithm, adjusting the effects within the model to achieve the best fit to the data. The process involves:
+# Calculate R-Squared for model performance
+r2 = model.calc_r2(actuals, preds) 
+print('CrossVal R-Squared:', round(r2, 5))
+```
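For reference, the usual definition of R-squared (the coefficient of determination) is one minus the ratio of the residual sum of squares to the total sum of squares. The standalone function below computes it in plain Python; it is assumed, but not confirmed by this README, that `model.calc_r2` follows the same definition. `r_squared` is a name chosen for this example.

```python
def r_squared(actuals, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actuals) / len(actuals)
    ss_res = sum((a - p) ** 2 for a, p in zip(actuals, preds))
    ss_tot = sum((a - mean_actual) ** 2 for a in actuals)
    return 1.0 - ss_res / ss_tot

perfect = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])    # predictions match exactly
mean_only = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])  # always predicts the mean
print(perfect, mean_only)
```

A value of 1 means the predictions match the actuals exactly, while a value of 0 means they do no better than always predicting the mean of the actuals.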
 
-- **Defining a Model**: Specifying the relationship between dependent and independent variables using a flexible and expressive formula syntax.
-- **Incorporating Effects**: Including various effects, ranging from statistical terms to outputs from machine learning models.
-- **Iterative Optimization**: Continually refining the model through an iterative process, ensuring each effect is appropriately calibrated.
+## Conclusion: Embracing the Future of Data Analysis with CGEM
+
+The Collaborative Generalized Effects Modeling (CGEM) framework represents a significant advancement in the field of data analysis and statistical modeling. Its innovative approach and robust capabilities address the complexities and nuances of real-world data, making it a vital tool for data scientists, analysts, and researchers.
+
+**Unparalleled Flexibility**: CGEM's formulaic flexibility allows for the construction of models that accurately capture complex, non-linear, and interactive relationships in data. This flexibility is crucial in an era where data is not just abundant but also diverse in structure and origin.
+
+**Integration of Diverse Effects**: By allowing the inclusion of a wide range of effects - from simple statistical terms to outputs from advanced machine learning models - CGEM provides a comprehensive view of the data. This integration is key to uncovering deeper insights and patterns that would otherwise remain hidden in traditional modeling approaches.
+
+**Iterative Optimization for Robust Models**: The iterative nature of CGEM ensures that models are not only fine-tuned for accuracy but also resilient against common pitfalls such as overfitting. This process of continuous refinement helps in building models that are both reliable and adaptable to new data.
+
+**Focus on Causal Coherence**: CGEM's emphasis on causally coherent relationships elevates its utility from mere predictive modeling to a tool that can provide actionable insights. This aspect is particularly valuable in decision-making processes where understanding the why behind the data is as important as the what.
+
+**Practical Applicability and Scalability**: The CGEM framework is designed with practicality in mind. It is scalable to different types of datasets and adaptable to various domains, ranging from business intelligence and marketing analytics to healthcare research and environmental studies.
+
+**Empowering Data-Driven Decisions**: With CGEM, organizations and researchers can make data-driven decisions with greater confidence. The insights derived from CGEM models are not just numbers and predictions; they are interpretable, meaningful, and grounded in the reality of the data.
+
+**A Step Towards Advanced Data Science**: CGEM is more than a statistical tool; it's a step towards advanced data science practices. It encourages a deeper understanding of data, promotes the integration of diverse analytical techniques, and fosters a culture of innovation in the analysis and interpretation of data.
+
+In conclusion, CGEM blends traditional statistical methods with modern machine learning in a single, interpretable modeling framework. This combination gives data professionals a practical way to produce accurate, insightful, and actionable analyses from complex data.
+
+---
+
 
-### Example Implementation
 
-```python
-# Example code demonstrating the implementation of a CGEM model.
-# This would include defining the model, setting up the effects, and running the iterative optimization process.
-```
 
-## Conclusion
 
-CGEM represents a paradigm shift in the field of data analysis, offering a sophisticated, versatile approach that is adept at handling the complexities of modern datasets. Its combination of formulaic freedom, generalization of effects, and focus on causal relationships makes it an invaluable asset in various fields, from business intelligence to scientific research.
 
diff --git a/cgem.egg-info/PKG-INFO b/cgem.egg-info/PKG-INFO
@@ -1,6 +1,6 @@
 Metadata-Version: 1.1
 Name: cgem
-Version: 0.0.5
+Version: 0.0.6
 Summary: CGEM: Collaborative Generalized Effects Modeling
 Home-page: https://github.com/jrolf/cgem
 Author: James A. Rolfsen

diff --git a/dist/cgem-0.0.5-py3-none-any.whl b/dist/cgem-0.0.5-py3-none-any.whl
diff --git a/dist/cgem-0.0.5.tar.gz b/dist/cgem-0.0.5.tar.gz
diff --git a/setup.py b/setup.py
@@ -5,7 +5,7 @@
 setup(
     # Basic package information:
     name    ='cgem',  
-    version ='0.0.5',
+    version ='0.0.6',
     packages=find_packages(),  # Automatically find packages in the directory
 
     # Dependencies: