# LLM_Graph_Standardization

## 📚 Project Overview

Standardization of Transformer-Based Attention Network Execution for Pattern-Level Optimizations in Graphs.

This project addresses the fragmentation in how inference services for transformer-based models are implemented by proposing a standard for graph representation. The standard targets efficient utilization of hardware accelerators (GPUs and FPGAs), and the project includes a GGML-based application tailored for models such as LLAMA 2, LLAMA 3, ViT, and Whisper.

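For orientation, here is a minimal sketch of how a GGML-based application builds and runs an execution graph. It assumes a recent ggml revision; exact function names and graph-allocation details vary between versions of the library.

```c
#include "ggml.h"

int main(void) {
    // Fixed-size arena for tensor metadata, tensor data, and the graph itself
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16 * 1024 * 1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // Declaring ops records graph nodes; nothing is computed yet
    struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 4);
    struct ggml_tensor * b = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 4);
    struct ggml_tensor * c = ggml_mul_mat(ctx, a, b);

    // Materialize the execution graph and evaluate it on the CPU
    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, c);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads=*/ 4);

    ggml_free(ctx);
    return 0;
}
```

Each of the models above reduces to such a graph of tensor operations, which is what makes a shared representation standard feasible.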

## 🎯 Objectives

  1. Graph Representation Analysis
    Analyze the graph representation in GGML-based applications to identify common patterns that optimize the execution of transformer-based networks on high-performance hardware. This includes studying how graphs are constructed and managed in models such as LLAMA 2, LLAMA 3, ViT, and Whisper (a traversal sketch follows this list).

  2. Standard Design for Graph Execution
    Design a standard for graph representation and execution, establishing operations, hierarchies, and structures compatible with high-performance computing (HPC) architectures. The standard is designed to integrate cleanly with embedded systems equipped with GPUs and FPGAs.

  3. GGML-Based Application Implementation
    Implement a GGML-based application to process the execution graphs of LLAMA 2 and ViT models following the designed standard, ensuring efficient execution on specialized hardware.

  4. Execution Graph Optimization
    Optimize the LLAMA 2 execution graph using the layer fusion technique, merging adjacent operations into single kernels. The goal is to maximize parallelism and leverage the full computational capabilities of accelerated hardware (a fusion-candidate sketch follows this list).

  5. Performance Evaluation
    Evaluate the impact of the developed standard and application through performance testing, analyzing metrics such as scalability and execution-time reduction to validate efficiency (a timing sketch follows this list).
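
To illustrate the kind of analysis objective 1 calls for, the sketch below walks a ggml graph's node list and builds an operator histogram; recurring operator mixes across LLAMA 2, ViT, and Whisper graphs are a first signal of shared patterns. It assumes direct access to the `n_nodes`/`nodes` fields of `struct ggml_cgraph`, whose layout varies between ggml revisions.

```c
#include <stdio.h>
#include "ggml.h"

// Count how often each operator appears in an execution graph.
// Recurring counts across LLAMA 2, ViT, and Whisper graphs hint at
// shared patterns that a standard representation can name explicitly.
void op_histogram(const struct ggml_cgraph * gf) {
    int counts[GGML_OP_COUNT] = {0};
    for (int i = 0; i < gf->n_nodes; ++i) {
        counts[gf->nodes[i]->op]++;
    }
    for (int op = 0; op < GGML_OP_COUNT; ++op) {
        if (counts[op] > 0) {
            printf("%-16s %d\n", ggml_op_name((enum ggml_op) op), counts[op]);
        }
    }
}
```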

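For objective 4, a hypothetical sketch of how a fusion pass could locate candidates: a MUL_MAT whose single consumer is an ADD (a weight multiply followed by its bias) can be collapsed into one fused kernel, reducing graph nodes and intermediate memory traffic. The predicate below is an illustration of the idea, not existing ggml API.

```c
#include <stddef.h>
#include "ggml.h"

// Hypothetical fusion-candidate scan: a MUL_MAT feeding exactly one ADD
// (weight multiply + bias) is a classic target for layer fusion.
static int is_fusion_candidate(const struct ggml_cgraph * gf, int i) {
    const struct ggml_tensor * node = gf->nodes[i];
    if (node->op != GGML_OP_MUL_MAT) {
        return 0;
    }
    // Require exactly one consumer, and require it to be an ADD
    int n_consumers = 0;
    const struct ggml_tensor * consumer = NULL;
    for (int j = 0; j < gf->n_nodes; ++j) {
        for (int s = 0; s < GGML_MAX_SRC; ++s) {
            if (gf->nodes[j]->src[s] == node) {
                n_consumers++;
                consumer = gf->nodes[j];
            }
        }
    }
    return n_consumers == 1 && consumer->op == GGML_OP_ADD;
}
```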

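Finally, for objective 5, execution time can be measured directly around graph computation using `ggml_time_us()`, ggml's wall-clock helper. A minimal sketch, with a warm-up run so one-time initialization costs are not counted:

```c
#include <stdio.h>
#include "ggml.h"

// Time n_runs of a prebuilt graph; report mean latency in milliseconds.
void benchmark_graph(struct ggml_context * ctx, struct ggml_cgraph * gf,
                     int n_threads, int n_runs) {
    ggml_graph_compute_with_ctx(ctx, gf, n_threads); // warm-up

    const int64_t t0 = ggml_time_us();
    for (int i = 0; i < n_runs; ++i) {
        ggml_graph_compute_with_ctx(ctx, gf, n_threads);
    }
    const int64_t t1 = ggml_time_us();

    printf("mean latency: %.3f ms over %d runs\n",
           (t1 - t0) / 1000.0 / n_runs, n_runs);
}
```
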
πŸ—οΈ Project Structure

```text
LLM_Graph_Standardization/
│
├── src/                    # Source code for graph processing and optimization
│   ├── ggml_integration/   # GGML-based application modules
│   ├── graph_standard/     # Standard definitions for graph representation
│   └── hardware_support/   # FPGA and GPU integration code
│
├── tests/                  # Performance and scalability tests
│
├── docs/                   # Documentation and design specifications
│
└── README.md               # Project overview and objectives (this file)
```
