Skip to content

Commit

Permalink
Update V6
Browse files Browse the repository at this point in the history
  • Loading branch information
jrolf committed Jan 13, 2024
1 parent 00f56f6 commit a19a02f
Show file tree
Hide file tree
Showing 6 changed files with 296 additions and 36 deletions.
123 changes: 123 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,123 @@

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# VS Code settings
.vscode/

205 changes: 171 additions & 34 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@

## Introduction

Collaborative Generalized Effects Modeling (CGEM) represents a pivotal advancement in statistical modeling and data analysis, tailored for complex, real-world scenarios. Bridging traditional statistical methods with modern machine learning, CGEM stands out in its ability to interpret intricate data relationships found in domains like business analytics and scientific research.
Collaborative Generalized Effects Modeling (CGEM) is a state-of-the-art statistical modeling framework tailored for complex, real-world scenarios. Bridging traditional statistical methods with modern machine learning, CGEM stands out in its ability to interpret intricate data relationships found in domains like business analytics and scientific research.

## Defining Characteristics of CGEM

Expand Down Expand Up @@ -37,65 +37,202 @@ CGEM operates on an iterative algorithm, adjusting the effects within the model

### Example Implementation

Installation
To install the CGEM library, use the following command:

css
Copy code
pip install --upgrade cgem
To verify the installation:

sql
Copy code
pip show cgem
Example Usage of CGEM
This example demonstrates fitting a CGEM model to simulated data, showcasing CGEM's capabilities in handling complex data structures.

Generating Artificial Data
First, we define a function to generate artificial data, simulating a causal system:

```python
# Example code demonstrating the implementation of a CGEM model.
# This would include defining the model, setting up the effects, and running the iterative optimization process.
from cgem import *
import numpy as np
from random import choice
import pandas as pd

def gen_artificial_data_v1(size=10000):
# Generating random values for variables
reg_var_a = np.random.normal(10, 3, size)
reg_var_b = np.random.normal(12, 4, size)
reg_var_c = np.random.normal(15, 5, size)

# Calculating the effect
effect_x = 20.0 + (1.0 * reg_var_a) + (1.5 * reg_var_b) + (2.0 * reg_var_c)

# Defining categories and effects
cats = list("ABCDEFGHIJ")
effs = np.around(np.linspace(0.5, 1.4, len(cats)), 2)
cat2effect = {cat: round(eff, 4) for cat, eff in zip(cats, effs)}

# Generating categorical variable and its effect
cat_var_d = np.array([choice(cats) for _ in range(size)])
cat_effect_d = np.array([cat2effect[c] for c in cat_var_d])

# Adding noise effect
noise_effect = np.random.uniform(0.90, 1.10, size)

# Calculating the target variable
target_var_z = ((effect_x) * cat_effect_d) * noise_effect

# Constructing the DataFrame
return pd.DataFrame({
'TGT_Z': target_var_z,
'REG_A': reg_var_a,
'REG_B': reg_var_b,
'REG_C': reg_var_c,
'CAT_D': cat_var_d
})

# Generate two datasets for fitting and prediction
DF1 = gen_artificial_data_v1(size=10000)
DF2 = gen_artificial_data_v1(size=10000)
```

## Conclusion
## Model Fitting
Next, we fit a CGEM model to the generated data:

CGEM represents a paradigm shift in the field of data analysis, offering a sophisticated, versatile approach that is adept at handling the complexities of modern datasets. Its combination of formulaic freedom, generalization of effects, and focus on causal relationships makes it an invaluable asset in various fields, from business intelligence to scientific research.
### Define the formula for the model
```python
Formula = "TGT_Z = CAT_D_EFF * LIN_REG_EFF"
```

---
### Define terms model parameters
```python
tparams = {
"CAT_D_EFF": {
'model': "CatRegModel()",
'xvars': ['CAT_D'],
'ival' : 10,
},
"LIN_REG_EFF": {
'model': "OLS()",
'xvars': ['REG_A','REG_B','REG_C'],
'ival' : 10,
}
}
```

This document provides a detailed and comprehensive introduction to CGEM, highlighting its unique features and the principles that make it a powerful tool in statistical modeling and data analysis. The content can be further enriched with specific examples and case studies to illustrate the practical applications of CGEM.Certainly! Based on the content in the Python file and the previous discussions, I'll craft a detailed markdown document that encapsulates the essence of Collaborative Generalized Effects Modeling (CGEM).
### Initialize and fit the model
```python
model = CGEM()
model.load_df(DF1)
model.define_form(Formula)
model.define_terms(tparams)
model.fit(25)
```

---
### Make predictions and calculate R-Squared
```python
preds = model.predict(DF2)
actuals = DF2['TGT_Z'].values
r2 = model.calc_r2(actuals, preds)
print('CrosVal R-Squared:', round(r2, 5))
```

# Collaborative Generalized Effects Modeling (CGEM): A Comprehensive Overview
## Model Process Explanation (Step-By-Step)

## Introduction
### Conceptual Overview
CGEM's modeling process is characterized by its adaptability and integration of various statistical and machine learning techniques. The key steps in this process include formulating the model, integrating diverse effects, iterative optimization, and ensuring causal coherence.

Collaborative Generalized Effects Modeling (CGEM) represents a pivotal advancement in statistical modeling and data analysis, tailored for complex, real-world scenarios. Bridging traditional statistical methods with modern machine learning, CGEM stands out in its ability to interpret intricate data relationships found in domains like business analytics and scientific research.
### 1. Defining the Formula of the Model
The first step is to define a model formula. This formula represents the relationship between the dependent variable and an array of independent variables or 'effects.' Unlike traditional models, CGEM allows for complex, non-linear, and interactive relationships.

## Defining Characteristics of CGEM
```python
from cgem import CGEM

### Formulaic Flexibility
# Define the formula
# Here, 'TGT_Z' is the target variable, and
# 'CAT_D_EFF' and 'LIN_REG_EFF' are the effects
Formula = "TGT_Z = CAT_D_EFF * LIN_REG_EFF"
```

CGEM is characterized by an unprecedented level of formulaic freedom. This flexibility allows for the construction of models encompassing a diverse range of mathematical relationships, from linear to non-linear, multiplicative, and beyond. It's an essential feature that enables the modeling of complex dynamics in datasets.
### 2. Integrating Diverse Effects
In CGEM, effects can range from simple linear terms to outputs from sophisticated machine learning models. This flexibility allows the model to capture more complex patterns in the data.

### Generalization of Effects
```python
# Define terms model parameters
tparams = {
"CAT_D_EFF": {
'model': "CatRegModel()", # Categorical Regression Model
'xvars': ['CAT_D'], # Independent variable for this effect
'ival' : 10, # Initial value
},
"LIN_REG_EFF": {
'model': "OLS()", # Ordinary Least Squares Model
'xvars': ['REG_A', 'REG_B', 'REG_C'], # Independent variables for this effect
'ival' : 10, # Initial value
}
}
```

In CGEM, the concept of an 'effect' is broadly interpreted. Effects can range from simple constants or linear terms to outputs from sophisticated machine learning models. This generalization allows CGEM to integrate and leverage diverse methodologies within a single model framework, offering a comprehensive view of the data.
### 3. Iterative Optimization
CGEM models are refined through an iterative process. This process involves adjusting the effects to achieve the best possible fit with the data, enhancing accuracy and reducing overfitting.

### Iterative Refinement and Convergence
```python
# Initialize the CGEM model
model = CGEM()

CGEM employs an iterative process to refine and converge the terms in the model. This approach ensures balanced weighting of each effect, mitigating common issues like overfitting or variable dominance. The focus is on achieving a natural and efficient convergence of terms, enhancing model robustness.
# Load the dataset
model.load_df(DF1)

### Causal Coherence
# Define the model formula and terms
model.define_form(Formula)
model.define_terms(tparams)

A cornerstone of CGEM is its emphasis on maintaining causally coherent relationships within the model. This focus ensures that the outputs are not just statistically significant but also meaningful and interpretable in real-world contexts, bridging the gap between correlation and causation.
# Fit the model
model.fit(25);
```

### Integration with Machine Learning
### 4. Evaluating Model Performance
After fitting the model, it's important to evaluate its performance. This can be done by making predictions on a new dataset and comparing them to actual values.

CGEM is uniquely designed to incorporate machine learning models as effects. This integration harnesses the predictive power of machine learning while maintaining the structural integrity and interpretability of traditional statistical models.
```python
# Predict using a new dataset
preds = model.predict(DF2)

## Core Mechanics of CGEM
# Actual values
actuals = DF2['TGT_Z'].values

CGEM operates on an iterative algorithm, adjusting the effects within the model to achieve the best fit to the data. The process involves:
# Calculate R-Squared for model performance
r2 = model.calc_r2(actuals, preds)
print('CrosVal R-Squared:', round(r2, 5))
```

- **Defining a Model**: Specifying the relationship between dependent and independent variables using a flexible and expressive formula syntax.
- **Incorporating Effects**: Including various effects, ranging from statistical terms to outputs from machine learning models.
- **Iterative Optimization**: Continually refining the model through an iterative process, ensuring each effect is appropriately calibrated.
## Conclusion: Embracing the Future of Data Analysis with CGEM

The Collaborative Generalized Effects Modeling (CGEM) framework represents a significant advancement in the field of data analysis and statistical modeling. Its innovative approach and robust capabilities address the complexities and nuances of real-world data, making it a vital tool for data scientists, analysts, and researchers.

**Unparalleled Flexibility**: CGEM's formulaic flexibility allows for the construction of models that accurately capture complex, non-linear, and interactive relationships in data. This flexibility is crucial in an era where data is not just abundant but also diverse in structure and origin.

**Integration of Diverse Effects**: By allowing the inclusion of a wide range of effects - from simple statistical terms to outputs from advanced machine learning models - CGEM provides a comprehensive view of the data. This integration is key to uncovering deeper insights and patterns that would otherwise remain hidden in traditional modeling approaches.

**Iterative Optimization for Robust Models**: The iterative nature of CGEM ensures that models are not only fine-tuned for accuracy but also resilient against common pitfalls such as overfitting. This process of continuous refinement helps in building models that are both reliable and adaptable to new data.

**Focus on Causal Coherence**: CGEM's emphasis on causally coherent relationships elevates its utility from mere predictive modeling to a tool that can provide actionable insights. This aspect is particularly valuable in decision-making processes where understanding the why behind the data is as important as the what.

**Practical Applicability and Scalability**: The CGEM framework is designed with practicality in mind. It is scalable to different types of datasets and adaptable to various domains, ranging from business intelligence and marketing analytics to healthcare research and environmental studies.

**Empowering Data-Driven Decisions**: With CGEM, organizations and researchers can make data-driven decisions with greater confidence. The insights derived from CGEM models are not just numbers and predictions; they are interpretable, meaningful, and grounded in the reality of the data.

**A Step Towards Advanced Data Science**: CGEM is more than a statistical tool; it's a step towards advanced data science practices. It encourages a deeper understanding of data, promotes the integration of diverse analytical techniques, and fosters a culture of innovation in the analysis and interpretation of data.

In conclusion, CGEM is not just an evolution in statistical modeling; it is a revolution in how we understand and interact with data. Its comprehensive approach, blending traditional statistical methods with modern machine learning, makes it an indispensable tool in the toolkit of any modern data professional. As we continue to navigate the ever-growing sea of data, CGEM stands as a beacon, guiding us towards more accurate, insightful, and actionable data analysis.

---

[End of README.md]

### Example Implementation

```python
# Example code demonstrating the implementation of a CGEM model.
# This would include defining the model, setting up the effects, and running the iterative optimization process.
```

## Conclusion

CGEM represents a paradigm shift in the field of data analysis, offering a sophisticated, versatile approach that is adept at handling the complexities of modern datasets. Its combination of formulaic freedom, generalization of effects, and focus on causal relationships makes it an invaluable asset in various fields, from business intelligence to scientific research.

2 changes: 1 addition & 1 deletion cgem.egg-info/PKG-INFO
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
Metadata-Version: 1.1
Name: cgem
Version: 0.0.5
Version: 0.0.6
Summary: CGEM: Collaborative Generalized Effects Modeling
Home-page: https://github.com/jrolf/cgem
Author: James A. Rolfsen
Expand Down
Binary file removed dist/cgem-0.0.5-py3-none-any.whl
Binary file not shown.
Binary file removed dist/cgem-0.0.5.tar.gz
Binary file not shown.
2 changes: 1 addition & 1 deletion setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@
setup(
# Basic package information:
name ='cgem',
version ='0.0.5',
version ='0.0.6',
packages=find_packages(), # Automatically find packages in the directory

# Dependencies:
Expand Down

0 comments on commit a19a02f

Please sign in to comment.