Official implementation of the paper "Rethinking Tabular Data Understanding with Large Language Models" (https://arxiv.org/abs/2312.16702).
Start by cloning the repository to your local machine:
git clone https://github.com/Leolty/tablellm.git
cd tablellm
Create and activate a new environment, and install the required packages:
conda create -n tablellm python=3.10
conda activate tablellm
pip install -r requirements.txt
Unzip the dataset provided in the repository:
unzip assets/data.zip
After unzipping, you should have the following files:
data
├── wtq.json
└── tabfact.json
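Each unzipped split is a plain JSON file, so it can be inspected with the standard `json` module. A minimal sketch (the structure of individual records is not assumed here, only that each file parses as JSON):

```python
import json
from pathlib import Path

def load_split(path):
    """Parse one of the unzipped JSON splits and return the parsed object."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

# Guarded so the snippet is safe to run even before the data is unzipped.
for name in ("data/wtq.json", "data/tabfact.json"):
    if Path(name).exists():
        split = load_split(name)
        print(f"{name}: {len(split)} entries")
```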
To replicate the study's findings, use the scripts in the scripts folder:
- scripts/all_dp.sh: Runs direct prompting on all WTQ datasets.
- scripts/all_pyagent.sh: Runs the Python shell agent on all WTQ datasets.
- scripts/vicuna_example.sh: An example of switching the base model to Vicuna on the subsampled WTQ dataset. Ensure vLLM is installed beforehand.
- scripts/perturbed_example.sh: An example of running experiments on the perturbed WTQ dataset.
Detailed explanations of parameters can be found in run_cot.py and run_agent.py.
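The shell scripts can also be launched from Python, which may be convenient for programmatic sweeps. This wrapper is a sketch and not part of the repository:

```python
import subprocess
from pathlib import Path

def run_script(script):
    """Invoke one of the provided shell scripts and fail loudly on error."""
    if not Path(script).exists():
        raise FileNotFoundError(script)
    # check=True raises CalledProcessError if the script exits non-zero.
    subprocess.run(["bash", script], check=True)

# Example (requires a repository checkout):
# run_script("scripts/all_dp.sh")
```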
For hands-on experience with the table agent, refer to the following notebook:
If you find this research useful in your work, please consider citing:
@misc{liu2023rethinking,
title={Rethinking Tabular Data Understanding with Large Language Models},
author={Tianyang Liu and Fei Wang and Muhao Chen},
year={2023},
eprint={2312.16702},
archivePrefix={arXiv},
primaryClass={cs.CL}
}