forked from ChiaXinLiang/MLLM-book
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request ChiaXinLiang#11 from ChiaXinLiang/marcus_brach
update chapter 6
Showing 18 changed files with 1,095 additions and 24 deletions.
24 changes: 24 additions & 0 deletions
...o Instruction Pruning and Prompt Engineering/reference_paper/Beyond LLaVA-HD.md
@@ -0,0 +1,24 @@
## Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models

### Date of Publication
June 2024 (Preprint)

### Title
"Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models"

### Core Ideas

1. **High-Resolution Imaging in LMMs**: The paper emphasizes the critical role of high-resolution images in Large Multimodal Models (LMMs) for better visual perception and reasoning.

2. **Addressing Current Limitations**: Current approaches combine a global image branch with local image crops resized to the global resolution. This incurs high computational cost, and the large number of local image tokens can overshadow the global context.

3. **Innovative Approach - SliME**: Introduces SliME (Sophisticated Tasks, Local image compression, and Mixture of global Experts), a framework that uses a mixture of experts to project the global view and extract its context without feature compression.

4. **Optimized Framework**:
   - Employs a mixture of adapters, each suited to different tasks, to extract contextual information from the global view.
   - Introduces learnable query embeddings for local image patches to reduce the number of image tokens.
   - Proposes an alternating training scheme to balance learning between the global and local branches.

5. **Efficiency and Performance**: Achieves leading performance across various benchmarks with only 2 million training samples, demonstrating the efficiency of the approach on high-resolution multimodal tasks.

6. **Contribution to the Field**: Offers a practical answer to the cost and context-dilution problems of high-resolution image processing in LMMs, potentially advancing the visual understanding and reasoning capabilities of multimodal systems.
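The two token-budget ideas in item 4 (learnable queries that compress local patches, and a mixture of adapters applied to the uncompressed global view) can be illustrated with a small NumPy sketch. It is not SliME's implementation: single-head cross-attention, a soft softmax router, linear adapters, and all function names and sizes are assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def compress_local_tokens(patch_tokens, queries, w_k, w_v):
    """Compress N local patch tokens into M tokens with learnable queries.

    patch_tokens: (N, d) features of one local crop
    queries:      (M, d) learnable query embeddings, with M much smaller than N
    Returns (M, d): one output token per query (single-head cross-attention).
    """
    keys = patch_tokens @ w_k
    values = patch_tokens @ w_v
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])        # (M, N)
    return softmax(scores, axis=-1) @ values                      # (M, d)

def mixture_of_adapters(global_tokens, adapters, router):
    """Project global tokens through a soft mixture of linear adapters.

    global_tokens: (G, d); adapters: (E, d, d_out); router: (d, E).
    All G global tokens are kept, so the global view is not compressed.
    """
    gate = softmax(global_tokens.mean(axis=0) @ router)           # (E,), one gate per image
    projected = np.einsum("gd,edo->ego", global_tokens, adapters) # (E, G, d_out)
    return np.einsum("e,ego->go", gate, projected)                # (G, d_out)

rng = np.random.default_rng(0)
d, d_out, n_experts = 64, 128, 4
local = rng.normal(size=(576, d))          # e.g. a 24x24 patch grid from one local crop
queries = rng.normal(size=(64, d))         # 64 learnable queries (random here)
w_k = rng.normal(size=(d, d)) / np.sqrt(d)
w_v = rng.normal(size=(d, d)) / np.sqrt(d)
compressed = compress_local_tokens(local, queries, w_k, w_v)

global_tokens = rng.normal(size=(576, d))  # the resized global view
adapters = rng.normal(size=(n_experts, d, d_out)) / np.sqrt(d)
router = rng.normal(size=(d, n_experts))
mixed = mixture_of_adapters(global_tokens, adapters, router)

print(compressed.shape, mixed.shape)  # (64, 64) (576, 128)
```

Compressing 576 local tokens to 64 is what keeps the token count manageable as more crops are added, while the global tokens pass through at full length, in line with the goal of extracting global context without compression.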
21 changes: 21 additions & 0 deletions
...ference_paper/Why do LLaVA Vision-Language Models Reply to Images in English.md
@@ -0,0 +1,21 @@
## Multilingual Bias in Vision-Language Models

### Date of Publication
July 2024 (Preprint)

### Title
"Why do LLaVA Vision-Language Models Reply to Images in English?"

### Core Ideas

1. **Observation of Multilingual Bias**: LLaVA-style vision-language models (VLMs) strongly tend to respond in English when an image is included in a query, even when the query itself is written in another language. This reveals a significant gap in current VLMs.

2. **Research Approach**:
   - An extensive ablation of the design space
   - A mechanistic analysis of the models' internal representations of image and text inputs

3. **Origin of Bias**: Both the ablation and the mechanistic analysis indicate that the bias originates in the language-modeling component of the LLaVA model, which shows a predisposition towards English when processing multimodal inputs.

4. **Implications**: Current VLMs struggle to preserve the query language when responding to image-text inputs, which can limit their usefulness in non-English contexts.

5. **Contribution**: By identifying and analyzing this multilingual bias, the study lays the groundwork for more capable and inclusive VLMs that better serve non-English contexts and are more broadly applicable worldwide.
21 changes: 21 additions & 0 deletions
...ter_2_Instruction Pruning Techniques/section_1_Dynamic Token Pruning/LazyLLM.md
@@ -0,0 +1,21 @@
## LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference

### Date of Publication
July 19, 2024 (Preprint)

### Title
"LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference"

### Core Ideas

1. **Abstract Summary**: The paper addresses the inference efficiency of transformer-based large language models, focusing on the prefilling stage for long prompts. Prefilling must compute the KV cache for every prompt token, so it can become a bottleneck before the first token is generated.

2. **Gap in Current Research**: It remains an open question whether all prompt tokens are actually needed to generate the first token. This matters because the prefilling stage can substantially slow down generation.

3. **Innovation - LazyLLM Method**: LazyLLM dynamically selects different subsets of context tokens at different generation steps. Unlike static pruning methods, it can reincorporate previously pruned tokens when they become relevant again.

4. **Method**: During both the prefilling and decoding stages, LazyLLM computes the KV only for tokens deemed important for predicting the next token. This step-by-step selection is what distinguishes it from static pruning methods.

5. **Contribution**:
   - LazyLLM is a generic method that can be integrated with existing language models to significantly accelerate generation without fine-tuning.
   - On multi-document question answering, LazyLLM accelerates the prefilling stage of Llama 2 7B by 2.34× while maintaining accuracy.
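The per-step selection described in items 3 and 4 can be sketched in a few lines of NumPy. This is a simplified illustration, not LazyLLM's code: it assumes token importance is the head-averaged attention from the last position, uses a fixed keep ratio, and assumes attention scores over all positions are available at every step; how LazyLLM actually restores the state of revived tokens is not modeled. `select_tokens` and the attention values are hypothetical.

```python
import numpy as np

def select_tokens(attn_last_query: np.ndarray, keep_ratio: float) -> list[int]:
    """Pick the positions whose KV is computed at this step.

    attn_last_query: (heads, T) attention probabilities from the last
        position (the one predicting the next token) to all T positions.
    keep_ratio: fraction of positions to keep.
    Returns sorted kept positions; the last position is always kept.
    """
    importance = attn_last_query.mean(axis=0)        # average over heads
    T = importance.shape[0]
    k = max(1, int(np.ceil(keep_ratio * T)))
    kept = set(np.argsort(importance)[-k:].tolist())  # top-k most attended positions
    kept.add(T - 1)                                   # never drop the query position
    return sorted(kept)

# Step 1 (prefilling): 6 prompt positions, 2 attention heads.
attn_step1 = np.array([
    [0.40, 0.05, 0.30, 0.05, 0.05, 0.15],
    [0.30, 0.05, 0.40, 0.05, 0.05, 0.15],
])
kept_step1 = select_tokens(attn_step1, keep_ratio=0.5)
print(kept_step1)  # [0, 2, 5]

# Step 2 (first decoding step): one generated token appended, 7 positions.
attn_step2 = np.array([
    [0.10, 0.00, 0.10, 0.50, 0.00, 0.10, 0.20],
    [0.00, 0.10, 0.10, 0.30, 0.10, 0.20, 0.20],
])
kept_step2 = select_tokens(attn_step2, keep_ratio=0.5)
print(kept_step2)  # [2, 3, 5, 6]

# Prompt tokens skipped at step 1 but selected again at step 2.
revived = sorted(set(kept_step2) - set(kept_step1) - {6})
print(revived)  # [3]
```

Token 3 is skipped during prefilling but selected again at the first decoding step; a static pruning mask fixed once for the whole generation could never bring it back.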
@@ -0,0 +1,76 @@
\documentclass{book}
\usepackage{hyperref}

\begin{document}

\frontmatter
\title{Instruction Pruning and Prompt Engineering for Multimodal Large Language Models}
\author{Your Name}
\maketitle
\tableofcontents

\mainmatter

\chapter{Introduction to Instruction Pruning and Prompt Engineering in MLLMs}

\section{Overview of MLLMs and LLMs}
\section{The Importance of Efficiency and Adaptability}
\section{Challenges in Model Performance and Bias Mitigation}

\chapter{Instruction Pruning Techniques}
\section{Dynamic Token Pruning}
\subsection{Overview of Dynamic Token Pruning}
Dynamic token pruning, exemplified by the LazyLLM approach of Fu et al.~\cite{fu2024}, improves the inference efficiency of large language models (LLMs), particularly in long-context scenarios. It targets the bottleneck in the prefilling stage: rather than computing the Key-Value (KV) cache for every prompt token, it dynamically selects the subset of tokens relevant to the next prediction, computes KV entries only for that subset, and can reincorporate previously pruned tokens at later steps. Although LazyLLM was developed for text-only LLMs, the idea may extend to multimodal settings, for example to reduce the number of image tokens processed by Large Multimodal Models (LMMs) or to help mitigate language bias in multilingual vision-language models. Its reported benefits include a 2.34$\times$ speedup of the prefilling stage for Llama~2 7B on multi-document question answering, accuracy comparable to the unpruned model, and applicability to existing LLMs without fine-tuning, which makes it a versatile option for both unimodal and multimodal applications.

As research in this area progresses, dynamic token pruning is likely to play an important role in improving the efficiency and adaptability of both unimodal and multimodal large language models.

\subsection{LazyLLM Approach}
\textit{Example:} Implementing LazyLLM for multi-document QA tasks \cite{fu2024}

\section{Coarse-to-Fine Pruning Strategies}
\subsection{CoT-Influx for Mathematical Reasoning}
\textit{Example:} Applying CoT-Influx to enhance few-shot learning in math problems \cite{huang2023}

\chapter{Advanced Prompt Engineering Methods}
\section{Bias Mitigation in Prompts}
\subsection{Reducing Gender Bias in Machine Translation}
\textit{Example:} Structuring prompts to minimize gender bias in LLM translations \cite{sant2024}

\section{Memorization and Security Considerations}
\subsection{Uncovering Data Leakage through Instruction-Based Prompts}
\textit{Example:} Using Alpaca against Vicuna to expose pre-training data \cite{kassem2024}

\subsection{Protecting Against Malicious Prompts}
\textit{Example:} Implementing safeguards against MaPP attacks in code generation \cite{heibel2024}

\chapter{Efficiency and Adaptability in LLMs}
\section{Mixture of Experts (MoE) Approaches}
\subsection{Training-Free MoE for Sequence-Level Expert Selection}
\textit{Example:} Implementing GRIFFIN for efficient model deployment \cite{dong2024}

\section{Interactive Learning and Environment Adaptation}
\subsection{Autonomous LLM Agents for New Environments}
\textit{Example:} Designing an AutoManual framework for task adaptation \cite{chen2024}

\chapter{Practical Applications and Case Studies}
\section{Multi-Document Question Answering}
\section{Mathematical Reasoning and Problem Solving}
\section{Cross-Lingual Tasks and Translation}
\section{Code Generation and Security}

\chapter{Challenges and Future Directions}
\section{Balancing Efficiency and Performance}
\section{Addressing Bias and Fairness}
\section{Enhancing Security and Privacy}
\section{Improving Adaptability to New Domains}

\chapter{Conclusion}
\section{Summary of Key Findings}
\section{Implications for AI Research and Development}
\section{Future Research Opportunities}

\backmatter
\bibliographystyle{plain}
\bibliography{references}

\end{document}
@@ -0,0 +1,33 @@
role: AI researcher

task-objective: suggestions for learning about instruction pruning and prompt engineering for MLLM-based LLMs.

instructions: write an outline for a book

constraint:
1. find research papers from the last 2-3 years
2. include all state-of-the-art (SOTA) papers
3. provide an example for each idea

context:
The exploration of instruction pruning and prompt engineering in the context of Multimodal Large Language Models (MLLMs) and Large Language Models (LLMs) reveals a complex interplay between efficiency, bias mitigation, and model performance. This synthesis of recent research highlights the potential and challenges of these techniques in enhancing LLM capabilities. The following sections examine specific aspects of instruction pruning and prompt engineering, drawing on the papers provided.

Instruction Pruning
Instruction pruning involves selectively removing parts of the input to improve model efficiency without compromising performance. This technique is particularly relevant when computational resources are limited or when models must process long inputs.

Dynamic Token Pruning: LazyLLM introduces a dynamic token pruning method that computes key-value pairs only for tokens that are crucial for the next token prediction. This approach significantly accelerates generation, especially in tasks requiring long-context processing such as multi-document question answering, without the need for fine-tuning (Fu et al., 2024).
Coarse-to-Fine Pruning: CoT-Influx employs a coarse-to-fine pruning strategy to enhance math reasoning in LLMs. By identifying and pruning unimportant tokens, it improves the efficiency of few-shot learning and achieves notable performance gains across various mathematical datasets (Huang et al., 2023).
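The coarse-to-fine idea behind CoT-Influx (first choose which few-shot examples to include, then prune tokens inside them so more examples fit in the context window) can be sketched as follows. CoT-Influx learns its pruner; in this sketch a toy filler-word score stands in for it, and `token_score`, `FILLER`, the example scores, and the token budget are all hypothetical.

```python
FILLER = {"so", "then", "we", "the", "well", "now"}  # hypothetical low-information words

def token_score(token: str) -> float:
    """Toy stand-in for a learned token scorer: filler words get 0, other tokens 1."""
    return 0.0 if token.lower() in FILLER else 1.0

def fine_prune(tokens: list[str], threshold: float) -> list[str]:
    """Fine step: drop tokens scoring below the threshold, keeping the rest in order."""
    return [t for t in tokens if token_score(t) >= threshold]

def coarse_to_fine(examples: list[list[str]], example_scores: list[float],
                   budget: int, threshold: float) -> list[list[str]]:
    """Coarse step: visit examples from most to least useful.
    Fine step: prune each example, then keep it only if it still fits the budget."""
    order = sorted(range(len(examples)), key=lambda i: example_scores[i], reverse=True)
    selected, used = [], 0
    for i in order:
        pruned = fine_prune(examples[i], threshold)
        if used + len(pruned) <= budget:
            selected.append(pruned)
            used += len(pruned)
    return selected

examples = [
    "so we add 3 and 4 then the sum is 7".split(),
    "well 12 minus 5 is 7 so answer 7".split(),
    "now the product of 2 and 6 is 12".split(),
]
scores = [0.9, 0.7, 0.8]  # hypothetical usefulness of each example

no_fine = coarse_to_fine(examples, scores, budget=16, threshold=0.0)    # keeps every token
with_fine = coarse_to_fine(examples, scores, budget=16, threshold=0.5)  # drops filler tokens
print(len(no_fine), len(with_fine))  # 1 2
print(with_fine[1])  # ['product', 'of', '2', 'and', '6', 'is', '12']
```

With a 16-token budget only one unpruned example fits, while pruning low-value tokens lets a second example fit, which is the kind of gain in usable few-shot context that the coarse-to-fine strategy aims for.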
Prompt Engineering
Prompt engineering focuses on designing input prompts that guide LLMs towards desired outputs and address issues such as bias and memorization.

Bias Mitigation: A study of gender bias in machine translation shows how prompt engineering can reduce bias in LLMs. By structuring prompts effectively, the researchers substantially reduced gender bias, narrowing the performance gap between LLMs and traditional NMT systems (Sant et al., 2024).
Memorization and Security: The "Alpaca against Vicuna" paper uses instruction-based prompts to uncover memorization in LLMs. It shows that instruction-tuned models can expose pre-training data, which calls for careful prompt design to prevent data leakage (Kassem et al., 2024). In addition, the MaPP attack shows how malicious prompts can introduce vulnerabilities into code generated by LLMs, underscoring the importance of securing prompts against manipulation (Heibel & Lowd, 2024).
Efficiency and Adaptability
The balance between efficiency and adaptability is crucial when deploying LLMs across diverse tasks and environments.

Mixture of Experts (MoE): GRIFFIN uses a training-free MoE approach that selects feedforward experts at the sequence level, maintaining model performance while reducing computational cost. It shows how information already present in the prompt can be used to improve efficiency without sacrificing adaptability (Dong et al., 2024).
Interactive Learning: AutoManual shows how LLM agents can autonomously adapt to new environments through interactive learning and prompt engineering. The framework improves task success rates by letting LLMs build and refine their understanding of environmental rules (Chen et al., 2024).
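GRIFFIN's sequence-level expert selection can be sketched as follows: score each feedforward neuron once from the prompt's activations, then run generation with only the top-k neurons. This is a simplified illustration under stated assumptions: a ReLU feedforward block stands in for the model's actual FFN, the scoring rule (per-token normalized activations aggregated with an L2 norm over the sequence) is an assumption of this sketch, and all names and sizes are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def select_experts(prompt_hidden, w_in, k):
    """Choose k FFN neurons ('experts') once per sequence from the prompt.

    prompt_hidden: (T, d) FFN inputs for the T prompt tokens.
    w_in: (d, n) first FFN projection; each of the n columns is one neuron.
    Returns the indices of the k highest-scoring neurons, sorted.
    """
    acts = relu(prompt_hidden @ w_in)                                    # (T, n)
    acts = acts / (np.linalg.norm(acts, axis=1, keepdims=True) + 1e-8)  # normalize per token
    scores = np.linalg.norm(acts, axis=0)                                # (n,) aggregate over tokens
    return np.sort(np.argsort(scores)[-k:])

def ffn(x, w_in, w_out, neurons=None):
    """FFN forward pass, optionally restricted to a subset of neurons."""
    if neurons is not None:
        w_in, w_out = w_in[:, neurons], w_out[neurons, :]
    return relu(x @ w_in) @ w_out

rng = np.random.default_rng(0)
d, n, k = 16, 64, 16
w_in = rng.normal(size=(d, n))
w_out = rng.normal(size=(n, d))
prompt_hidden = rng.normal(size=(32, d))          # stand-in for 32 prompt tokens

experts = select_experts(prompt_hidden, w_in, k)  # computed once, after the prompt
x_new = rng.normal(size=(1, d))                   # one token during generation
y_small = ffn(x_new, w_in, w_out, experts)        # uses k of the n neurons
print(experts.shape, y_small.shape)  # (16,) (1, 16)
```

No router is trained: the selection comes entirely from the prompt's own activations, and with k = n the pruned FFN reduces exactly to the full one.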
While instruction pruning and prompt engineering offer promising avenues for improving LLM performance, they also present challenges. The potential for bias, memorization, and security vulnerabilities necessitates careful consideration in prompt design and model deployment. Moreover, the adaptability of LLMs to new tasks and environments remains a critical area for further research and development. These insights underscore the need for ongoing innovation in LLM methodologies to harness their full potential while mitigating associated risks.

format:
the main structure of the book, in LaTeX
@@ -0,0 +1,9 @@
Return the following format:
1. date of publication
2. title
3. core ideas
   a. abstract
   b. gap in current research
   c. innovation
   d. method
   e. contribution
@@ -0,0 +1 @@
abstract, gap in current research, innovation in this paper, method, contribution