

@TataSatyaPratheek

Process Mining Enhancements with Memory Efficiency and DGL Integration

Overview

Greetings ERP.AI team,

I'm thrilled to submit this pull request that transforms the original GNN-based process mining framework into a more comprehensive, memory-efficient toolkit with Deep Graph Library (DGL) integration. I've had an absolute blast working on this project and exploring the fascinating intersection of process mining and graph neural networks!

Key Enhancements

  • DGL Integration: Implemented graph operations with the Deep Graph Library (DGL) to improve performance
  • Memory Optimization: Added streaming processing, vectorized operations, and adaptive batching strategies to handle large process logs with minimal memory footprint
  • Advanced Attention Mechanisms: Developed multiple attention types (basic, positional, diverse, combined) for richer process understanding
  • Comprehensive Toolkit: Expanded the framework into a full-fledged toolkit with command-line interface and Python API
  • Enhanced Visualization: Added interactive visualizations including process flow diagrams, Sankey diagrams, and attention heatmaps
  • Accelerated Training: Implemented mixed precision training, memory-efficient batching, and gradient checkpointing
  • Ablation Studies: Created systematic testing of model components with parallel experimentation capabilities
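To illustrate the idea behind the streaming bullet above, here is a minimal sketch. It is not the code in this PR. It assumes an event log with hypothetical `case_id`, `activity`, and `timestamp` columns, already sorted by case and then by time. Under that assumption, only one case's trace is held in memory at a time while directly-follows counts are accumulated. The helper names `stream_traces` and `directly_follows_counts` are illustrative only.

```python
import csv
import io
from collections import Counter
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, List


def stream_traces(rows: Iterable[dict]) -> Iterator[List[str]]:
    """Yield one trace (list of activities) per case.

    Hypothetical helper: assumes rows are sorted by case_id, then timestamp,
    so groupby sees each case as one contiguous run and only that case's
    events are materialised at a time.
    """
    for _case_id, events in groupby(rows, key=itemgetter("case_id")):
        yield [event["activity"] for event in events]


def directly_follows_counts(traces: Iterable[List[str]]) -> Counter:
    """Count (a, b) pairs where activity b directly follows a within a case."""
    counts: Counter = Counter()
    for trace in traces:
        # zip pairs each event with its successor in the same trace
        counts.update(zip(trace, trace[1:]))
    return counts


# Tiny in-memory log for demonstration; a real log would be read lazily
# with csv.DictReader(open(path, newline="")).
LOG = """case_id,activity,timestamp
c1,receive,2024-01-01T09:00
c1,inspect,2024-01-01T09:30
c1,ship,2024-01-01T10:00
c2,receive,2024-01-02T09:00
c2,ship,2024-01-02T09:15
"""

reader = csv.DictReader(io.StringIO(LOG))
dfg = directly_follows_counts(stream_traces(reader))
print(dict(dfg))
```

Because `csv.DictReader` and `groupby` are both lazy, memory use is bounded by the longest single trace rather than the size of the whole log. The trade-off is that the input must be pre-sorted by case.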

Background

As a mathematician from Hyderabad with a Master's in Mathematics from India and a second Master's in Data Analytics and AI from France, I'm passionate about applying advanced mathematical modeling to improve manufacturing and business processes in India. While I'm not an engineer by training, I believe my mathematical background offers a unique perspective on process optimization problems.

Implementation Notes

I want to be transparent that I used Claude 3.7 Sonnet to help with code generation, as I was eager to contribute but had limited time. The implementation is currently incomplete and needs further testing and refinement by an experienced team. I've focused primarily on the architecture and API design, with placeholder implementations for some of the more complex components.

Future Work

If merged, I'd love to continue contributing to this project by:

  1. Implementing proper test cases for all components
  2. Improving documentation with detailed examples
  3. Optimizing the memory efficiency further for extremely large process logs
  4. Developing industry-specific modules for manufacturing processes

I'm eager to learn from your team and improve my contributions based on your feedback. I believe this work represents a good starting point for discussion about the direction of the project.

Thank you for considering my contribution. The childlike excitement I feel about improving processes and manufacturing in India through this work is difficult to contain!

Warm regards,
Satya Pratheek Tata

… progress tracking during training and evaluation
…; remove unused histogram plotting in process mining
…sure correct tensor handling and avoid broadcasting issues
…mized functions for node and edge processing
…methods, and improve visualization memory efficiency
… mapping in data loader, and update README for DGL integration and ablation study features
…hancing label extraction and improving fallback mechanism for random splits.
…nd improve compatibility with NumPy input data
…pes and ensure continuous labels for classification