Standardization of Transformer-Based Attention Network Execution for Pattern-Level Optimizations in Graphs.
This project addresses the fragmentation in how inference services for transformer-based models are implemented by proposing a standard graph representation. The standard targets efficient use of hardware accelerators (GPUs and FPGAs), and the project includes a GGML-based application for models such as LLAMA 2, LLAMA 3, ViT, and Whisper.
Graph Representation Analysis
Analyze how GGML-based applications represent computation graphs, and identify recurring patterns that can be optimized when executing transformer-based networks on high-performance hardware. This includes studying how graphs are constructed and managed for models such as LLAMA 2, LLAMA 3, ViT, and Whisper.
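One simple way to look for recurring patterns is to count how often short runs of operators appear in a topologically ordered operator list. The sketch below assumes such a list has already been exported from a graph; the operator names and the sequence itself are illustrative placeholders, not data taken from a real model, and `count_op_patterns` is a hypothetical helper.

```python
from collections import Counter


def count_op_patterns(ops, n):
    """Count every contiguous run of n operator names in an ordered op list."""
    return Counter(tuple(ops[i:i + n]) for i in range(len(ops) - n + 1))


# Illustrative (not measured) operator sequence for two simplified decoder layers.
layer = ["RMS_NORM", "MUL_MAT", "ROPE", "MUL_MAT", "SOFT_MAX", "MUL_MAT", "ADD"]
ops = layer * 2

patterns = count_op_patterns(ops, 2)

# Pairs that repeat once per layer are candidates for pattern-level optimization.
for pair, count in patterns.most_common(3):
    print(pair, count)
```

Pairs that occur once per layer show up with a count equal to the number of layers, while pairs that only appear at layer boundaries show up less often.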
Standard Design for Graph Execution
Design a standard for graph representation and execution that defines the operations, hierarchies, and structures needed to map graphs onto parallel, high-performance computing (HPC) architectures. The standard is intended to integrate with embedded systems equipped with GPUs and FPGAs.
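To make "operations, hierarchies, and structures" more concrete, the following is a minimal sketch of what a graph container could look like. It is an assumption for illustration only, not the project's actual standard: the `Node` and `Graph` classes, their fields, and the operator names are all hypothetical. Nodes are grouped into levels so that everything in one level could, in principle, run in parallel.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    op: str
    inputs: list = field(default_factory=list)  # names of producer nodes


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)

    def add(self, name, op, inputs=()):
        # Inputs must already exist, so insertion order is a topological order.
        for src in inputs:
            if src not in self.nodes:
                raise KeyError(f"unknown input {src!r} for node {name!r}")
        self.nodes[name] = Node(name, op, list(inputs))
        return name

    def levels(self):
        """Group nodes by depth; nodes in one level do not depend on each other."""
        depth = {}
        for name, node in self.nodes.items():
            depth[name] = 1 + max((depth[s] for s in node.inputs), default=-1)
        out = [[] for _ in range(max(depth.values(), default=-1) + 1)]
        for name, d in depth.items():
            out[d].append(name)
        return out


# Hypothetical, simplified single-head attention block.
g = Graph()
g.add("x", "INPUT")
g.add("q", "MUL_MAT", ["x"])
g.add("k", "MUL_MAT", ["x"])
g.add("v", "MUL_MAT", ["x"])
g.add("scores", "MUL_MAT", ["q", "k"])
g.add("probs", "SOFT_MAX", ["scores"])
g.add("out", "MUL_MAT", ["probs", "v"])

for i, level in enumerate(g.levels()):
    print(i, level)
```

In this toy graph the three projections `q`, `k`, and `v` land in the same level, which is the kind of structural information a scheduler for a GPU or FPGA could use.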
GGML-Based Application Implementation
Implement a GGML-based application that processes the execution graphs of the LLAMA 2 and ViT models according to the designed standard, with the goal of efficient execution on specialized hardware.
Execution Graph Optimization
Optimize the LLAMA 2 execution graph using layer fusion. The goal is to maximize parallelism and make fuller use of the compute capability of the accelerator hardware.
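The idea behind fusion can be sketched on a toy graph: when one operator feeds only a single consumer, the pair can be replaced by one fused node, which removes an intermediate tensor and a kernel launch. The code below is a minimal illustration under stated assumptions, not the project's optimization pass: `fuse_pairs`, the dictionary layout, and the operator names (including the fused `SCALE_SOFT_MAX`) are hypothetical.

```python
def fuse_pairs(nodes, first_op, second_op, fused_op):
    """Fuse a `first_op` producer into its only `second_op` consumer.

    nodes: dict mapping name -> (op, [input names]), in topological order.
    Returns a new dict; the input is left unchanged.
    """
    consumers = {}
    for name, (_, inputs) in nodes.items():
        for src in inputs:
            consumers.setdefault(src, []).append(name)

    absorbed_by = {}  # consumer name -> producer name folded into it
    for name, (op, inputs) in nodes.items():
        if op != second_op:
            continue
        for src in inputs:
            sole_use = consumers[src] == [name]
            # Skip producers that are themselves fused nodes, to keep wiring valid.
            if nodes[src][0] == first_op and sole_use and src not in absorbed_by:
                absorbed_by[name] = src
                break

    removed = set(absorbed_by.values())
    fused = {}
    for name, (op, inputs) in nodes.items():
        if name in removed:
            continue
        if name in absorbed_by:
            producer = absorbed_by[name]
            rewired = []
            for src in inputs:
                # The fused node reads what the absorbed producer used to read.
                rewired.extend(nodes[producer][1] if src == producer else [src])
            fused[name] = (fused_op, rewired)
        else:
            fused[name] = (op, list(inputs))
    return fused


# Hypothetical slice of an attention block.
nodes = {
    "scores": ("INPUT", []),
    "scaled": ("SCALE", ["scores"]),
    "probs": ("SOFT_MAX", ["scaled"]),
    "v": ("INPUT", []),
    "out": ("MUL_MAT", ["probs", "v"]),
}
optimized = fuse_pairs(nodes, "SCALE", "SOFT_MAX", "SCALE_SOFT_MAX")
print(optimized)
```

The single-consumer check matters: if the intermediate result were read by another node as well, removing it would change the graph's meaning, so the sketch leaves such pairs unfused.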
Performance Evaluation
Evaluate the impact of the developed standard and application through rigorous performance testing. Analyze key metrics such as scalability and execution time reduction to validate efficiency.
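The metrics named above can be derived from wall-clock timings with a few standard formulas. The helpers below are a minimal sketch with hypothetical function names, and the timings in the example are placeholders chosen for illustration, not measured results.

```python
def speedup(baseline_s, optimized_s):
    """How many times faster the optimized run is than the baseline."""
    return baseline_s / optimized_s


def time_reduction_pct(baseline_s, optimized_s):
    """Execution time reduction as a percentage of the baseline."""
    return 100.0 * (baseline_s - optimized_s) / baseline_s


def parallel_efficiency(t1_s, tn_s, n_workers):
    """Scalability: speedup on n workers divided by n (1.0 is ideal scaling)."""
    return t1_s / (n_workers * tn_s)


# Placeholder timings in seconds, for illustration only.
baseline_s, optimized_s = 4.0, 3.0
print(f"speedup: {speedup(baseline_s, optimized_s):.2f}x")
print(f"time reduction: {time_reduction_pct(baseline_s, optimized_s):.1f}%")
print(f"efficiency on 4 workers: {parallel_efficiency(8.0, 2.5, 4):.2f}")
```

Reporting both speedup and parallel efficiency helps separate gains that come from the graph optimization itself from gains that come from adding more hardware.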
LLM_Graph_Standardization/
│
├── src/                     # Source code for graph processing and optimization
│   ├── ggml_integration/    # GGML-based application modules
│   ├── graph_standard/      # Standard definitions for graph representation
│   └── hardware_support/    # FPGA and GPU integration code
│
├── tests/                   # Performance and scalability tests
│
├── docs/                    # Documentation and design specifications
│
└── README.md                # Project overview and objectives (this file)