Thanks for your interesting and comprehensive survey.
If possible, please consider adding our evaluation work on LLMs in chemistry, "What indeed can GPT models do in chemistry? A comprehensive benchmark on eight tasks" (https://arxiv.org/abs/2305.18365), to the list.
Our work establishes a comprehensive benchmark of 8 practical chemistry tasks and evaluates LLMs (GPT-4, GPT-3.5, and Davinci-003) on each task in zero-shot and few-shot in-context learning settings. We aim to address the lack of a comprehensive assessment of LLMs in the field of chemistry.
Thanks! 😊