LLM4Mat-Bench

LLM4Mat-Bench is the largest benchmark to date for evaluating the performance of large language models (LLMs) for materials property prediction.

[Figure: LLM4Mat-Bench statistics. *https://www.snumat.com/apis]

How to use

Installation

git clone https://github.com/vertaix/LLM4Mat-Bench.git
cd LLM4Mat-Bench
conda create -n <environment_name> --file requirement.txt
conda activate <environment_name>

Get the data

  • Download the LLM4Mat-Bench data from this link. Each dataset includes a fixed train/validation/test split for reproducibility and fair model comparison.
  • Save the data into a data folder with LLM4Mat-Bench as the parent directory (see the loading sketch below).
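
Once saved, a split can be loaded with pandas. This is a minimal sketch that assumes the download unpacks to data/<dataset_name>/train.csv and test.csv; the actual file layout and column names may differ, so check the download first.

import pandas as pd

dataset_name = 'mp'        # any dataset name in LLM4Mat-Bench
property_name = 'band_gap' # any property name in the chosen dataset

# File names below are an assumption about the download layout.
train = pd.read_csv(f'data/{dataset_name}/train.csv')
test = pd.read_csv(f'data/{dataset_name}/test.csv')

print(train.columns.tolist())          # list the available properties
print(train[property_name].describe()) # sanity-check the target values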

Get the checkpoints

  • Download the LLM-Prop and MatBERT checkpoints from this link.
  • Save the checkpoints folder into the LLM4Mat-Bench directory.

Evaluating the trained LLM-Prop and MatBERT

Modify the following script in scripts/evaluate.sh as needed:

#!/usr/bin/env bash

DATA_PATH='data/' # where LLM4Mat_Bench data is saved
RESULTS_PATH='results/' # where to save the results
CHECKPOINTS_PATH='checkpoints/' # where model weights were saved
MODEL_NAME='llmprop' # or 'matbert'
DATASET_NAME='mp' # any dataset name in LLM4Mat_Bench
INPUT_TYPE='formula' # other values: 'cif_structure' and 'description'
PROPERTY_NAME='band_gap' # any property name in $DATASET_NAME. Please check the property names associated with each dataset first

python code/llmprop_and_matbert/evaluate.py \
--data_path $DATA_PATH \
--results_path $RESULTS_PATH \
--checkpoints_path $CHECKPOINTS_PATH \
--model_name $MODEL_NAME \
--dataset_name $DATASET_NAME \
--input_type $INPUT_TYPE \
--property_name $PROPERTY_NAME

Then run

bash scripts/evaluate.sh

Training LLM-Prop and MatBERT from scratch

Modify the following script in scripts/train.sh as needed:

#!/usr/bin/env bash

DATA_PATH='data/' # where LLM4Mat_Bench data is saved
RESULTS_PATH='results/' # where to save the results
CHECKPOINTS_PATH='checkpoints/' # where to save model weights 
MODEL_NAME='llmprop' # or 'matbert'
DATASET_NAME='mp' # any dataset name in LLM4Mat_Bench
INPUT_TYPE='formula' # other values: 'cif_structure' and 'description'
PROPERTY_NAME='band_gap' # any property name in $DATASET_NAME. Please check the property names associated with each dataset first
MAX_LEN=256 # for testing purposes only; the default value is 888, while 2000 has been shown to give the best performance
EPOCHS=5 # for testing purposes only; the default value is 200

python code/llmprop_and_matbert/train.py \
--data_path $DATA_PATH \
--results_path $RESULTS_PATH \
--checkpoints_path $CHECKPOINTS_PATH \
--model_name $MODEL_NAME \
--dataset_name $DATASET_NAME \
--input_type $INPUT_TYPE \
--property_name $PROPERTY_NAME \
--max_len $MAX_LEN \
--epochs $EPOCHS

Then run

bash scripts/train.sh
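
The script trains one property at a time. To sweep several properties back to back, one option is a small Python driver that reruns train.py per property; the property names below are placeholders, so check each dataset's actual property list first.

import subprocess

# Placeholder property names -- replace with real ones for the dataset.
for prop in ['band_gap', 'formation_energy_per_atom']:
    subprocess.run(
        ['python', 'code/llmprop_and_matbert/train.py',
         '--data_path', 'data/',
         '--results_path', 'results/',
         '--checkpoints_path', 'checkpoints/',
         '--model_name', 'llmprop',
         '--dataset_name', 'mp',
         '--input_type', 'formula',
         '--property_name', prop,
         '--max_len', '256',
         '--epochs', '5'],
        check=True,  # stop the sweep if one run fails
    )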

Generating the property values with LLaMA2-7b-chat model

Modify the following script in scripts/llama_inference.sh as needed:

#!/usr/bin/env bash

DATA_PATH='data/' # where LLM4Mat_Bench data is saved
RESULTS_PATH='results/' # where to save the results
DATASET_NAME='mp' # any dataset name in LLM4Mat_Bench
INPUT_TYPE='formula' # other values: 'cif_structure' and 'description'
PROPERTY_NAME='band_gap' # any property name in $DATASET_NAME. Please check the property names associated with each dataset first
PROMPT_TYPE='zero_shot' # 'few_shot' can also be used, which lets LLaMA see five examples before it generates the answer
MAX_LEN=800 # max_len and batch_size can be modified according to the available resources
BATCH_SIZE=8

python code/llama/llama_inference.py \
--data_path $DATA_PATH \
--results_path $RESULTS_PATH \
--dataset_name $DATASET_NAME \
--input_type $INPUT_TYPE \
--property_name $PROPERTY_NAME \
--prompt_type $PROMPT_TYPE \
--max_len $MAX_LEN \
--batch_size $BATCH_SIZE

Then run

bash scripts/llama_inference.sh
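
For intuition, a zero-shot prompt asks for the property value directly, while a few-shot prompt prepends five solved examples before the query. The strings below are illustrative only (hypothetical formulas and placeholder values); the actual templates live in code/llama/llama_inference.py.

# Illustrative prompt shapes only; see code/llama/llama_inference.py
# for the templates actually used by the benchmark.
zero_shot = ('What is the band_gap of the material with formula NaCl? '
             'Answer with a number.')

# A few-shot prompt prepends solved examples (values are placeholders).
examples = [('MgO', 4.45), ('Si', 0.61)]
few_shot = ''.join(f'Formula: {f}\nband_gap: {v}\n\n' for f, v in examples)
few_shot += 'Formula: NaCl\nband_gap:'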

Evaluating the LLaMA results

After running bash scripts/llama_inference.sh, modify the following script in scripts/llama_evaluate.sh as needed:

#!/usr/bin/env bash

DATA_PATH='data/' # where LLM4Mat_Bench data is saved
RESULTS_PATH='results/' # where to save the results
DATASET_NAME='mp' # any dataset name in LLM4Mat_Bench
INPUT_TYPE='formula' # other values: 'cif_structure' and 'description'
PROPERTY_NAME='band_gap' # any property name in $DATASET_NAME. Please check the property names associated with each dataset first
PROMPT_TYPE='zero_shot' # 'few_shot' can also be used, which lets LLaMA see five examples before it generates the answer
MAX_LEN=800 # max_len and batch_size can be modified according to the available resources
BATCH_SIZE=8
MIN_SAMPLES=2 # minimum number of valid outputs from LLaMA (the default is 10)

python code/llama/evaluate.py \
--data_path $DATA_PATH \
--results_path $RESULTS_PATH \
--dataset_name $DATASET_NAME \
--input_type $INPUT_TYPE \
--property_name $PROPERTY_NAME \
--prompt_type $PROMPT_TYPE \
--max_len $MAX_LEN \
--batch_size $BATCH_SIZE \
--min_samples $MIN_SAMPLES

Then run

bash scripts/llama_evaluate.sh
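
The MIN_SAMPLES guard exists because LLaMA generations are free text and not every generation parses into a usable number; when fewer than min_samples valid values survive, no score can be computed, which is what the Inval. cells in the tables below denote. A sketch of that filtering idea follows; the real parsing in code/llama/evaluate.py is likely more elaborate.

import re

def parse_number(generation):
    # Pull the first numeric token out of a free-text generation.
    match = re.search(r'-?\d+(?:\.\d+)?', generation)
    return float(match.group()) if match else None

def filter_valid(generations, min_samples=10):
    values = [v for v in map(parse_number, generations) if v is not None]
    # Too few valid outputs: the property is reported as Inval.
    return values if len(values) >= min_samples else None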

Data LICENSE

The data LICENSE belongs to the original creators of each dataset/database.

Leaderboard

Each cell reports the average score across that dataset's tasks, weighted by the number of samples per task: the MAD:MAE ratio for regression tasks (higher is better) and the AUC score for classification tasks. Inval. indicates that the model did not produce enough valid outputs to be scored.

| Input | Model | MP Regr. (8 tasks) | MP Class. (2 tasks) | JARVIS-DFT Regr. (20 tasks) | GNoME Regr. (6 tasks) | hMOF Regr. (7 tasks) | Cantor HEA Regr. (4 tasks) | JARVIS-QETB Regr. (4 tasks) | OQMD Regr. (2 tasks) | QMOF Regr. (4 tasks) | SNUMAT Regr. (4 tasks) | SNUMAT Class. (3 tasks) | OMDB Regr. (1 task) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CIF | CGCNN (baseline) | 5.319 | 0.846 | 7.048 | 19.478 | 2.257 | 17.780 | 61.729 | 14.496 | 3.076 | 1.973 | 0.722 | 2.751 |
| Comp. | Llama 2-7b-chat:0S | 0.389 | 0.491 | Inval. | 0.164 | 0.174 | 0.034 | 0.188 | 0.105 | 0.303 | 0.940 | Inval. | 0.885 |
| | Llama 2-7b-chat:5S | 0.627 | 0.507 | 0.704 | 0.499 | 0.655 | 0.867 | 1.047 | 1.160 | 0.932 | 1.157 | 0.466 | 1.009 |
| | MatBERT-109M | 5.317 | 0.722 | 4.103 | 12.834 | 1.430 | 6.769 | 11.952 | 5.772 | 2.049 | 1.828 | 0.712 | 1.554 |
| | LLM-Prop-35M | 4.394 | 0.691 | 2.912 | 15.599 | 1.479 | 8.400 | 59.443 | 6.020 | 1.958 | 1.509 | 0.719 | 1.507 |
| CIF | Llama 2-7b-chat:0S | 0.392 | 0.501 | 0.216 | 6.746 | 0.214 | 0.022 | 0.278 | 0.028 | 0.119 | 0.682 | 0.489 | 0.159 |
| | Llama 2-7b-chat:5S | Inval. | 0.502 | Inval. | Inval. | Inval. | Inval. | 1.152 | 1.391 | Inval. | Inval. | 0.474 | 0.930 |
| | MatBERT-109M | 7.452 | 0.750 | 6.211 | 14.227 | 1.514 | 9.958 | 47.687 | 10.521 | 3.024 | 2.131 | 0.717 | 1.777 |
| | LLM-Prop-35M | 8.554 | 0.738 | 6.756 | 16.032 | 1.623 | 15.728 | 97.919 | 11.041 | 3.076 | 1.829 | 0.660 | 1.777 |
| Descr. | Llama 2-7b-chat:0S | 0.437 | 0.500 | 0.247 | 0.336 | 0.193 | 0.069 | 0.264 | 0.106 | 0.152 | 0.883 | Inval. | 0.155 |
| | Llama 2-7b-chat:5S | 0.635 | 0.502 | 0.703 | 0.470 | 0.653 | 0.820 | 0.980 | 1.230 | 0.946 | 1.040 | 0.568 | 1.001 |
| | MatBERT-109M | 7.651 | 0.735 | 6.083 | 15.558 | 1.558 | 9.976 | 46.586 | 11.027 | 3.055 | 2.152 | 0.730 | 1.847 |
| | LLM-Prop-35M | 9.116 | 0.742 | 7.204 | 16.224 | 1.706 | 15.926 | 93.001 | 9.995 | 3.016 | 1.950 | 0.735 | 1.656 |

Results for the MP dataset. Performance on regression tasks is evaluated with the MAD:MAE ratio (higher is better), while the classification tasks (Is Stable and Is Gap Direct) are evaluated with the AUC score. FEPA: Formation Energy Per Atom, EPA: Energy Per Atom.

| Input | Model | FEPA (145.2K) | Bandgap (145.3K) | EPA (145.2K) | Ehull (145.2K) | Efermi (145.2K) | Density (145.2K) | Density Atomic (145.2K) | Volume (145.2K) | Is Stable (145.2K) | Is Gap Direct (145.2K) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CIF | CGCNN (baseline) | 8.151 | 3.255 | 7.224 | 3.874 | 3.689 | 8.773 | 5.888 | 1.703 | 0.882 | 0.810 |
| Comp. | Llama 2-7b-chat:0S | 0.008 | 0.623 | 0.009 | 0.001 | 0.003 | 0.967 | 0.754 | 0.747 | 0.500 | 0.482 |
| | Llama 2-7b-chat:5S | 0.33 | 1.217 | 0.239 | 0.132 | 0.706 | 0.899 | 0.724 | 0.771 | 0.502 | 0.512 |
| | MatBERT-109M | 8.151 | 2.971 | 9.32 | 2.583 | 3.527 | 7.626 | 5.26 | 3.099 | 0.764 | 0.681 |
| | LLM-Prop-35M | 7.482 | 2.345 | 7.437 | 2.006 | 3.159 | 6.682 | 3.523 | 2.521 | 0.746 | 0.636 |
| CIF | Llama 2-7b-chat:0S | 0.032 | 0.135 | 0.022 | 0.001 | 0.015 | 0.97 | 0.549 | 1.41 | 0.503 | 0.499 |
| | Llama 2-7b-chat:5S | Inval. | 1.111 | 0.289 | Inval. | 0.685 | 0.98 | 0.99 | 0.926 | 0.498 | 0.506 |
| | MatBERT-109M | 11.017 | 3.423 | 13.244 | 3.808 | 4.435 | 10.426 | 6.686 | 6.58 | 0.790 | 0.710 |
| | LLM-Prop-35M | 14.322 | 3.758 | 17.354 | 2.182 | 4.515 | 13.834 | 4.913 | 7.556 | 0.776 | 0.700 |
| Descr. | Llama 2-7b-chat:0S | 0.019 | 0.633 | 0.023 | 0.001 | 0.008 | 1.31 | 0.693 | 0.807 | 0.500 | 0.500 |
| | Llama 2-7b-chat:5S | 0.394 | 1.061 | 0.297 | 0.247 | 0.684 | 0.916 | 0.782 | 0.704 | 0.500 | 0.504 |
| | MatBERT-109M | 11.935 | 3.524 | 13.851 | 4.085 | 4.323 | 9.9 | 6.899 | 6.693 | 0.794 | 0.713 |
| | LLM-Prop-35M | 15.913 | 3.931 | 18.412 | 2.74 | 4.598 | 14.388 | 4.063 | 8.888 | 0.794 | 0.690 |

Results for JARVIS-DFT. The performance on regression tasks is evaluated in terms of MAD:MAE ratio (the higher the better). FEPA: Formation Energy Per Atom, Tot. En.: Total Energy, Exf. En.: Exfoliation Energy.

| Input | Model | FEPA (75.9K) | Bandgap (OPT) (75.9K) | Tot. En. (75.9K) | Ehull (75.9K) | Bandgap (MBJ) (19.8K) | Kv (23.8K) | Gv (23.8K) | SLME (9.7K) | Spillage (11.3K) | εx (OPT) (18.2K) | ε (DFPT) (4.7K) | Max. Piezo. (dij) (3.3K) | Max. Piezo. (eij) (4.7K) | Max. EFG (11.8K) | Exf. En. (0.8K) | Avg. me (17.6K) | n-Seebeck (23.2K) | n-PF (23.2K) | p-Seebeck (23.2K) | p-PF (23.2K) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CIF | CGCNN (baseline) | 13.615 | 4.797 | 22.906 | 1.573 | 4.497 | 3.715 | 2.337 | 1.862 | 1.271 | 2.425 | 1.12 | 0.418 | 1.291 | 1.787 | 0.842 | 1.796 | 2.23 | 1.573 | 3.963 | 1.59 |
| Comp. | Llama 2-7b-chat:0S | 0.021 | 0.011 | 0.02 | 0.005 | 0.92 | 0.428 | 0.374 | 0.148 | Inval. | 0.18 | 0.012 | 0.121 | 0.001 | 0.141 | 0.384 | 0.028 | 0.874 | 0.801 | 0.971 | 0.874 |
| | Llama 2-7b-chat:5S | 0.886 | 0.011 | 0.02 | 1.292 | 0.979 | 0.88 | 0.992 | 0.456 | 0.85 | 1.148 | 1.416 | 1.289 | 1.305 | 0.765 | 0.512 | 0.535 | 1.008 | 1.04 | 0.93 | 0.568 |
| | MatBERT-109M | 6.808 | 4.083 | 9.21 | 2.786 | 3.755 | 2.906 | 1.928 | 1.801 | 1.243 | 2.017 | 1.533 | 1.464 | 1.426 | 1.658 | 1.124 | 2.093 | 1.908 | 1.318 | 2.752 | 1.356 |
| | LLM-Prop-35M | 4.765 | 2.621 | 5.936 | 2.073 | 2.922 | 2.162 | 1.654 | 1.575 | 1.14 | 1.734 | 1.454 | 1.447 | 1.573 | 1.38 | 1.042 | 1.658 | 1.725 | 1.145 | 2.233 | 1.285 |
| CIF | Llama 2-7b-chat:0S | 0.023 | 0.011 | 0.02 | 0.002 | 0.193 | 0.278 | 0.358 | 0.186 | 0.702 | 0.781 | 0.033 | 0.104 | 0.001 | 0.246 | 0.411 | 0.041 | 0.429 | 0.766 | 0.83 | 0.826 |
| | Llama 2-7b-chat:5S | 0.859 | Inval. | Inval. | 1.173 | 1.054 | 0.874 | 0.91 | 0.486 | 0.916 | 1.253 | Inval. | Inval. | Inval. | 0.796 | 0.51 | Inval. | 1.039 | 1.396 | Inval. | Inval. |
| | MatBERT-109M | 10.211 | 5.483 | 15.673 | 4.862 | 5.344 | 4.283 | 2.6 | 2.208 | 1.444 | 2.408 | 1.509 | 1.758 | 2.405 | 2.143 | 1.374 | 2.45 | 2.268 | 1.446 | 3.337 | 1.476 |
| | LLM-Prop-35M | 12.996 | 3.331 | 22.058 | 2.648 | 4.93 | 4.121 | 2.409 | 2.175 | 1.37 | 2.135 | 1.578 | 2.103 | 2.405 | 1.936 | 1.044 | 1.796 | 1.955 | 1.332 | 2.503 | 1.399 |
| Descr. | Llama 2-7b-chat:0S | 0.007 | 0.011 | 0.02 | 0.004 | 0.94 | 0.498 | 0.382 | 0.07 | 0.135 | 0.647 | 0.08 | 0.266 | 0.001 | 0.138 | 0.285 | 0.019 | 0.769 | 0.793 | 0.825 | 0.829 |
| | Llama 2-7b-chat:5S | 0.845 | 0.011 | 0.02 | 1.273 | 1.033 | 0.87 | 0.969 | 0.461 | 0.857 | 1.201 | 1.649 | 1.174 | 1.152 | 0.806 | 0.661 | 0.523 | 1.098 | 1.024 | 0.948 | 0.563 |
| | MatBERT-109M | 10.211 | 5.33 | 15.141 | 4.691 | 5.01 | 4.252 | 2.623 | 2.178 | 1.452 | 2.384 | 1.534 | 1.807 | 2.556 | 2.081 | 1.36 | 2.597 | 2.241 | 1.432 | 3.26 | 1.565 |
| | LLM-Prop-35M | 12.614 | 3.427 | 23.509 | 4.532 | 4.983 | 4.128 | 2.419 | 2.061 | 1.307 | 2.334 | 1.64 | 2.116 | 2.315 | 1.978 | 1.168 | 1.858 | 2.154 | 1.364 | 2.61 | 1.407 |

Results for SNUMAT. The performance on regression tasks is evaluated in terms of MAD:MAE ratio (the higher the better) while that of classification tasks (Is Direct, Is Direct HSE, and SOC) is evaluated in terms of AUC score.

| Input | Model | Bandgap GGA (10.3K) | Bandgap HSE (10.3K) | Bandgap GGA Optical (10.3K) | Bandgap HSE Optical (10.3K) | Is Direct (10.3K) | Is Direct HSE (10.3K) | SOC (10.3K) |
|---|---|---|---|---|---|---|---|---|
| CIF | CGCNN (baseline) | 2.075 | 2.257 | 1.727 | 1.835 | 0.691 | 0.675 | 0.800 |
| Comp. | Llama 2-7b-chat:0S | 0.797 | 0.948 | 1.156 | 0.859 | 0.503 | 0.484 | Inval. |
| | Llama 2-7b-chat:5S | 1.267 | 1.327 | 0.862 | 1.174 | 0.475 | 0.468 | 0.455 |
| | MatBERT-109M | 1.899 | 1.975 | 1.646 | 1.793 | 0.671 | 0.645 | 0.820 |
| | LLM-Prop-35M | 1.533 | 1.621 | 1.392 | 1.491 | 0.647 | 0.624 | 0.829 |
| CIF | Llama 2-7b-chat:0S | 0.346 | 0.454 | 1.09 | 0.838 | 0.479 | 0.488 | 0.500 |
| | Llama 2-7b-chat:5S | Inval. | Inval. | Inval. | Inval. | 0.494 | 0.500 | 0.427 |
| | MatBERT-109M | 2.28 | 2.472 | 1.885 | 1.889 | 0.677 | 0.650 | 0.823 |
| | LLM-Prop-35M | 1.23 | 2.401 | 1.786 | 1.9 | 0.661 | 0.664 | 0.656 |
| Descr. | Llama 2-7b-chat:0S | 0.802 | 0.941 | 1.013 | 0.779 | 0.499 | 0.509 | Inval. |
| | Llama 2-7b-chat:5S | 0.774 | 1.315 | 0.901 | 1.172 | 0.594 | 0.623 | 0.486 |
| | MatBERT-109M | 2.298 | 2.433 | 1.901 | 1.978 | 0.683 | 0.645 | 0.862 |
| | LLM-Prop-35M | 2.251 | 2.142 | 1.84 | 1.569 | 0.681 | 0.657 | 0.866 |

Results for GNoME. The performance on regression tasks is evaluated in terms of MAD:MAE ratio (the higher the better). FEPA: Formation Energy Per Atom, DEPA: Decomposition Energy Per Atom, Tot. En.: Total Energy.

| Input | Model | FEPA (376.2K) | Bandgap (282.7K) | DEPA (376.2K) | Tot. En. (282.7K) | Volume (282.7K) | Density (282.7K) |
|---|---|---|---|---|---|---|---|
| CIF | CGCNN (baseline) | 34.57 | 8.549 | 2.787 | 7.443 | 7.967 | 56.077 |
| Comp. | Llama 2-7b-chat:0S | 0.002 | 0.177 | 0.0 | 0.088 | 0.455 | 0.368 |
| | Llama 2-7b-chat:5S | 0.194 | 0.086 | 0.255 | 0.765 | 1.006 | 0.865 |
| | MatBERT-109M | 30.248 | 4.692 | 2.787 | 8.57 | 13.157 | 15.145 |
| | LLM-Prop-35M | 25.472 | 3.735 | 1.858 | 21.624 | 16.556 | 25.615 |
| CIF | Llama 2-7b-chat:0S | 0.003 | 0.045 | 0.0 | 0.706 | 43.331 | 0.794 |
| | Llama 2-7b-chat:5S | Inval. | 0.087 | Inval. | Inval. | 1.029 | 0.878 |
| | MatBERT-109M | 24.199 | 9.16 | 3.716 | 15.309 | 16.691 | 16.467 |
| | LLM-Prop-35M | 28.469 | 3.926 | 3.344 | 17.837 | 17.082 | 25.615 |
| Descr. | Llama 2-7b-chat:0S | 0.002 | 0.114 | 0.0 | 0.661 | 0.654 | 0.805 |
| | Llama 2-7b-chat:5S | 0.192 | 0.086 | 0.106 | 0.75 | 1.006 | 0.891 |
| | MatBERT-109M | 30.248 | 5.829 | 3.716 | 18.205 | 17.824 | 16.599 |
| | LLM-Prop-35M | 28.469 | 5.27 | 3.716 | 17.02 | 17.02 | 25.936 |

Results for hMOF. The performance on regression tasks is evaluated in terms of MAD:MAE ratio (the higher the better).

| Input | Model | Max CO2 (132.7K) | Min CO2 (132.7K) | LCD (132.7K) | PLD (132.7K) | Void Fraction (132.7K) | Surface Area (m²/g) (132.7K) | Surface Area (m²/cm³) (132.7K) |
|---|---|---|---|---|---|---|---|---|
| CIF | CGCNN (baseline) | 1.719 | 1.617 | 1.989 | 1.757 | 2.912 | 3.765 | 2.039 |
| Comp. | Llama 2-7b-chat:0S | 0.011 | 0.002 | 0.009 | 0.008 | 0.5 | 0.454 | 0.233 |
| | Llama 2-7b-chat:5S | 0.679 | 0.058 | 0.949 | 1.026 | 0.945 | 0.567 | 0.366 |
| | MatBERT-109M | 1.335 | 1.41 | 1.435 | 1.378 | 1.57 | 1.517 | 1.367 |
| | LLM-Prop-35M | 1.41 | 1.392 | 1.432 | 1.468 | 1.672 | 1.657 | 1.321 |
| CIF | Llama 2-7b-chat:0S | 0.017 | 0.003 | 0.016 | 0.011 | 0.549 | 0.54 | 0.359 |
| | Llama 2-7b-chat:5S | Inval. | Inval. | 0.951 | 1.067 | Inval. | Inval. | Inval. |
| | MatBERT-109M | 1.421 | 1.428 | 1.544 | 1.482 | 1.641 | 1.622 | 1.461 |
| | LLM-Prop-35M | 1.564 | 1.41 | 1.753 | 1.435 | 1.9 | 1.926 | 1.374 |
| Descr. | Llama 2-7b-chat:0S | 0.129 | 0.014 | 0.026 | 0.006 | 0.382 | 0.497 | 0.299 |
| | Llama 2-7b-chat:5S | 0.684 | 0.058 | 0.955 | 1.006 | 0.931 | 0.571 | 0.37 |
| | MatBERT-109M | 1.438 | 1.466 | 1.602 | 1.511 | 1.719 | 1.697 | 1.475 |
| | LLM-Prop-35M | 1.659 | 1.486 | 1.623 | 1.789 | 1.736 | 2.144 | 1.508 |

Results for Cantor HEA. The performance on regression tasks is evaluated in terms of MAD:MAE ratio (the higher the better). FEPA: Formation Energy Per Atom, EPA: Energy Per Atom, VPA: Volume Per Atom.

| Input | Model | FEPA (84.0K) | EPA (84.0K) | Ehull (84.0K) | VPA (84.0K) |
|---|---|---|---|---|---|
| CIF | CGCNN (baseline) | 9.036 | 49.521 | 9.697 | 2.869 |
| Comp. | Llama 2-7b-chat:0S | 0.005 | 0.098 | 0.003 | 0.031 |
| | Llama 2-7b-chat:5S | 0.896 | 0.658 | 0.928 | 0.986 |
| | MatBERT-109M | 3.286 | 16.17 | 5.134 | 2.489 |
| | LLM-Prop-35M | 3.286 | 22.638 | 5.134 | 2.543 |
| CIF | Llama 2-7b-chat:0S | 0.001 | 0.084 | 0.0 | 0.004 |
| | Llama 2-7b-chat:5S | Inval. | Inval. | Inval. | Inval. |
| | MatBERT-109M | 7.229 | 17.607 | 9.187 | 5.809 |
| | LLM-Prop-35M | 8.341 | 36.015 | 11.636 | 6.919 |
| Descr. | Llama 2-7b-chat:0S | 0.001 | 0.101 | 0.164 | 0.011 |
| | Llama 2-7b-chat:5S | 0.797 | 0.615 | 0.938 | 0.93 |
| | MatBERT-109M | 7.229 | 17.607 | 9.187 | 5.881 |
| | LLM-Prop-35M | 8.341 | 36.015 | 11.636 | 7.713 |

Results for QMOF. The performance on regression tasks is evaluated in terms of MAD:MAE ratio (the higher the better). Tot. En.: Total Energy.

| Input | Model | Bandgap (7.6K) | Tot. En. (7.6K) | LCD (7.6K) | PLD (7.6K) |
|---|---|---|---|---|---|
| CIF | CGCNN (baseline) | 2.431 | 1.489 | 4.068 | 4.317 |
| Comp. | Llama 2-7b-chat:0S | 0.901 | 0.26 | 0.045 | 0.009 |
| | Llama 2-7b-chat:5S | 0.648 | 0.754 | 1.241 | 1.086 |
| | MatBERT-109M | 1.823 | 1.695 | 2.329 | 2.349 |
| | LLM-Prop-35M | 1.759 | 1.621 | 2.293 | 2.157 |
| CIF | Llama 2-7b-chat:0S | 0.201 | 0.244 | 0.02 | 0.011 |
| | Llama 2-7b-chat:5S | Inval. | Inval. | Inval. | Inval. |
| | MatBERT-109M | 1.994 | 4.378 | 2.908 | 2.818 |
| | LLM-Prop-35M | 2.166 | 4.323 | 2.947 | 2.87 |
| Descr. | Llama 2-7b-chat:0S | 0.358 | 0.217 | 0.025 | 0.006 |
| | Llama 2-7b-chat:5S | 0.777 | 0.713 | 1.125 | 1.17 |
| | MatBERT-109M | 2.166 | 4.133 | 2.981 | 2.941 |
| | LLM-Prop-35M | 2.091 | 4.312 | 2.831 | 2.829 |

Results for JARVIS-QETB. The performance on regression tasks is evaluated in terms of MAD:MAE ratio (the higher the better). FEPA: Formation Energy Per Atom, EPA: Energy Per Atom, Tot. En.: Total Energy, Ind. Bandgap: Indirect Bandgap.

| Input | Model | FEPA (623.9K) | EPA (623.9K) | Tot. En. (623.9K) | Ind. Bandgap (623.9K) |
|---|---|---|---|---|---|
| CIF | CGCNN (baseline) | 1.964 | 228.201 | 11.218 | 5.534 |
| Comp. | Llama 2-7b-chat:0S | 0.003 | 0.369 | 0.172 | 0.21 |
| | Llama 2-7b-chat:5S | 0.812 | 1.037 | 1.032 | 1.306 |
| | MatBERT-109M | 1.431 | 37.979 | 8.19 | 0.21 |
| | LLM-Prop-35M | 2.846 | 211.757 | 21.309 | 1.861 |
| CIF | Llama 2-7b-chat:0S | 0.003 | 0.412 | 0.656 | 0.04 |
| | Llama 2-7b-chat:5S | 0.8 | 1.024 | 1.076 | 1.71 |
| | MatBERT-109M | 24.72 | 135.156 | 26.094 | 4.779 |
| | LLM-Prop-35M | 23.346 | 318.291 | 48.192 | 1.845 |
| Descr. | Llama 2-7b-chat:0S | 0.003 | 0.408 | 0.484 | 0.16 |
| | Llama 2-7b-chat:5S | 0.85 | 1.015 | 1.035 | 1.021 |
| | MatBERT-109M | 26.265 | 122.884 | 29.409 | 7.788 |
| | LLM-Prop-35M | 22.513 | 312.218 | 35.43 | 1.845 |

Results for OQMD. The performance on regression tasks is evaluated in terms of MAD:MAE ratio (the higher the better). FEPA: Formation Energy Per Atom.

| Input | Model | FEPA (963.5K) | Bandgap (963.5K) |
|---|---|---|---|
| CIF | CGCNN (baseline) | 22.291 | 6.701 |
| Comp. | Llama 2-7b-chat:0S | 0.019 | 0.192 |
| | Llama 2-7b-chat:5S | 1.013 | 1.306 |
| | MatBERT-109M | 7.662 | 3.883 |
| | LLM-Prop-35M | 9.195 | 2.845 |
| CIF | Llama 2-7b-chat:0S | 0.009 | 0.047 |
| | Llama 2-7b-chat:5S | 1.051 | 1.731 |
| | MatBERT-109M | 13.879 | 7.163 |
| | LLM-Prop-35M | 18.861 | 3.22 |
| Descr. | Llama 2-7b-chat:0S | 0.025 | 0.187 |
| | Llama 2-7b-chat:5S | 0.991 | 1.468 |
| | MatBERT-109M | 15.012 | 7.041 |
| | LLM-Prop-35M | 16.346 | 3.644 |

Results for OMDB. The performance on regression tasks is evaluated in terms of MAD:MAE ratio (the higher the better).

| Input | Model | Bandgap (12.1K) |
|---|---|---|
| CIF | CGCNN (baseline) | 2.751 |
| Comp. | Llama 2-7b-chat:0S | 0.886 |
| | Llama 2-7b-chat:5S | 1.009 |
| | MatBERT-109M | 1.554 |
| | LLM-Prop-35M | 1.507 |
| CIF | Llama 2-7b-chat:0S | 0.159 |
| | Llama 2-7b-chat:5S | 0.930 |
| | MatBERT-109M | 1.777 |
| | LLM-Prop-35M | 1.777 |
| Descr. | Llama 2-7b-chat:0S | 0.155 |
| | Llama 2-7b-chat:5S | 1.002 |
| | MatBERT-109M | 1.847 |
| | LLM-Prop-35M | 1.656 |
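
For reference, the MAD:MAE ratio used throughout is the mean absolute deviation of the target values divided by the model's mean absolute error, so a score near 1.0 is roughly as good as always predicting the mean, and higher is better. A minimal sketch, assuming MAD is computed over the test targets:

import numpy as np

def mad_mae_ratio(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    mad = np.mean(np.abs(y_true - y_true.mean()))  # spread of the targets
    mae = np.mean(np.abs(y_true - y_pred))         # model error
    return mad / mae

# Always predicting the mean scores ~1.0; better models score higher.
print(mad_mae_ratio([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # -> 1.0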
