Different types of tables are widely used to store and present information. To automatically process numerous tables and gain valuable insights, researchers have proposed a series of deep-learning models for various table-based tasks, e.g., table question answering (TQA), table-to-text (T2T), text-to-SQL (NL2SQL) and table fact verification (TFV). Recently, the emerging Large Language Models (LLMs) and more powerful Multimodal Large Language Models (MLLMs) have opened up new possibilities for processing tabular data, i.e., we can use one general model to process diverse tables and fulfill different tabular tasks based on the user's natural language instructions. We refer to these LLMs specialized for tabular tasks as Tabular LLMs. In this repository, we collect a paper list about recent Tabular (M)LLMs and divide them into the following categories based on their key ideas.
Table of Contents:
- Survey of Tabular LLMs and table understanding
- Prompting LLMs for different tabular tasks, e.g., in-context learning, prompt engineering and integrating external tools.
- Training LLMs for better table understanding ability, e.g., training existing LLMs by instruction fine-tuning or post-pretraining.
- Developing agents for processing tabular data, e.g., developing a copilot for processing Excel tables.
- Empirical study or benchmarks for evaluating LLMs' table understanding ability, e.g., exploring the influence of various table types or table formats.
- Multimodal table understanding, e.g., training MLLMs to understand diverse table images and textual user requests.
Task Names and Abbreviations:
Task Names | Abbreviations | Task Descriptions |
---|---|---|
Table Question Answering | TQA | Answering questions based on the table(s), e.g., answer look-up or computation questions about table(s). |
Table-to-Text | Table2Text or T2T | Generate a text based on the table(s), e.g., generate an analysis report given a financial statement. |
Text-to-Table | Text2Table | Generate structured tables based on input text, e.g., generate a statistical table based on the game summary. |
Table Fact Verification | TFV | Judging if a statement is true or false (or not enough evidence) based on the table(s) |
Text-to-SQL | NL2SQL | Generate a SQL statement to answer the user question based on the database schema |
Tabular Mathematical Reasoning | TMR | Solving mathematical reasoning problems based on the table(s), e.g., solve math word problems related to a table |
Table-and-Text Question Answering | TAT-QA | Answering questions based on both table(s) and their related texts, e.g., answer questions given Wikipedia tables and their surrounding texts. |
Table Interpretation | TI | Interpreting basic table content and structure information, e.g., column type annotation, entity linking, relation extraction, cell type classification, etc. |
Table Augmentation | TA | Augmenting existing tables with new data, e.g., schema augmentation, row population, etc. |
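To make a few of the task definitions above concrete, here is a minimal sketch in Python. It is only an illustration: the toy table, the question, the statement, and the helper names (`to_markdown`, `run_sql`) are all hypothetical and do not come from any paper in this list. The "model outputs" are hand-written stand-ins, not predictions from a real LLM.

```python
import sqlite3

# Hypothetical toy table, invented for illustration only.
HEADER = ("team", "points")
ROWS = [("Lions", 31), ("Tigers", 24), ("Bears", 17)]


def to_markdown(header, rows):
    """Serialize a table as Markdown text, one common way to show a table to an LLM."""
    lines = [" | ".join(header), " | ".join("---" for _ in header)]
    lines += [" | ".join(str(cell) for cell in row) for row in rows]
    return "\n".join(lines)


def run_sql(sql):
    """Execute a SQL statement against an in-memory copy of the toy table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scores (team TEXT, points INTEGER)")
    conn.executemany("INSERT INTO scores VALUES (?, ?)", ROWS)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


# TQA: question "Which team scored the most points?" -> answer looked up in the table.
tqa_answer = max(ROWS, key=lambda row: row[1])[0]

# NL2SQL: the same question written as a SQL statement a model might generate.
nl2sql_query = "SELECT team FROM scores ORDER BY points DESC LIMIT 1"
nl2sql_answer = run_sql(nl2sql_query)[0][0]

# TFV: statement "The Bears scored more points than the Tigers." -> true/false label.
points = dict(ROWS)
tfv_label = "true" if points["Bears"] > points["Tigers"] else "false"

print(to_markdown(HEADER, ROWS))
print(tqa_answer, nl2sql_answer, tfv_label)
```

In this sketch the TQA and NL2SQL paths reach the same answer by different routes (direct look-up versus executing generated SQL), while TFV reduces to a comparison over two cells. Real systems replace the hand-written answers with model outputs.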
Title | Conference | Date | Pages |
---|---|---|---|
Large Language Model for Table Processing: A Survey | arxiv | 2024-02-04 | 9 |
A Survey of Table Reasoning with Large Language Models | arxiv | 2024-02-13 | 9 |
Large Language Models(LLMs) on Tabular Data: Prediction, Generation, and Understanding -- A Survey | arxiv | 2024-03-01 | 41 |
Transformers for Tabular Data Representation: A Survey of Models and Applications | TACL 2023 | | 23 |
Table Pre-training: A Survey on Model Architectures, Pre-training Objectives, and Downstream Tasks | IJCAI 2022 | 2022-01-24 | 15 |
Title | Conference | Date | Task | LLM Backbone | Code |
---|---|---|---|---|---|
Unleashing the Potential of Large Language Models for Predictive Tabular Tasks in Data Science | arxiv | 2024-03-29 | Predictive Tabular Tasks | Llama2 7B | HuggingFace |
HGT: Leveraging Heterogeneous Graph-enhanced Large Language Models for Few-shot Complex Table Understanding | arxiv | 2024-03-28 | TI,TQA | Vicuna-1.5 7B | |
TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios | arxiv | 2024-03-28 | Table Manipulation | CodeLlama 7B, 13B | Github |
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding | CoLM 2024 | 2024-02-26 | TQA,TFV,T2T,NL2SQL | CodeLlama 7B-34B | Github |
TableLlama: Towards Open Large Generalist Models for Tables | NAACL 2024 | 2023-11-15 | TQA,TFV,T2T,TA,TI | Llama2 7B | Github |
HELLaMA: LLaMA-based Table to Text Generation by Highlighting the Important Evidence | arxiv | 2023-11-15 | T2T | Llama2 7B-13B | |
Table-GPT: Table-tuned GPT for Diverse Table Tasks | arxiv | 2023-10-13 | All kinds of table tasks | GPT-3.5, ChatGPT | |
Title | Conference | Date | Task | Code |
---|---|---|---|---|
HYTREL: Hypergraph-enhanced Tabular Data Representation Learning | NIPS 2023 | 2023-07-14 | TA, TI | Github |
FLAME: A small language model for spreadsheet formulas | AAAI 2024 | 2023-01-31 | Generating Excel Formulas | Github |
Title | Conference | Date | Task | Code |
---|---|---|---|---|
SheetAgent: A Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models | arxiv | 2024-03-06 | Manipulating Excels with LLM | Github |
EHRAgent: Code Empowers Large Language Models for Few-shot Complex Tabular Reasoning on Electronic Health Records | arxiv | 2024-01-13 | TQA | Github |
InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks | arxiv | 2024-01-10 | Data Analysis | Github |
DB-GPT: Empowering Database Interactions with Private Large Language Models | arxiv | 2023-12-29 | Data Analysis | Github |
ReAcTable: Enhancing ReAct for Table Question Answering | arxiv | 2023-10-01 | TQA | |
SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models | NIPS 2023 | 2023-05-30 | Manipulating Excels with LLM | Github |
TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT | arxiv | 2023-07-17 | Manipulating CSV tables with LLM | |
Title | Conference | Date | Task | Code |
---|---|---|---|---|
TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy | arxiv | 2024-06-03 | TQA,TI | |
TableVQA-Bench: A Visual Question Answering Benchmark on Multiple Table Domains | arxiv | 2024-04-30 | TQA, TFV | Github |
Tables as Texts or Images: Evaluating the Table Reasoning Ability of LLMs and MLLMs | ACL 2024 | 2024-02-19 | TQA,TFV,T2T | |
Multimodal Table Understanding | ACL 2024 | 2024-02-15 | TQA, TFV, T2T, TI, TAT-QA, TMR | Github |