
Cost-Efficient Domain-Adaptive Pretraining of Language Models for Optoelectronics Applications

This repository contains code and resources for training optoelectronics-aware language models. The models are evaluated on a range of tasks, including text classification, question answering, and text embedding.

Introduction

OptoelectronicsLM is a project aimed at developing language models that are specifically aware of optoelectronics concepts. These models are trained on specialized datasets and evaluated on classification, question-answering, and embedding tasks.

Training and evaluation scripts used in this work are provided in the directory corresponding to each task. Note that you will need to change the relevant file paths and repository locations to suit your own use.
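
As a hedged illustration (the variable names and paths below are hypothetical, not taken from the actual scripts), the edits required typically amount to pointing a few constants at your own data and checkpoint locations:

# Hypothetical configuration constants near the top of a training script;
# replace each value with a location that exists on your own system.
DATA_PATH = "/path/to/your/optoelectronics_corpus.jsonl"  # training/evaluation data
OUTPUT_DIR = "/path/to/your/checkpoints"                  # where model checkpoints are saved
HF_REPO_ID = "your-username/your-model-repo"              # Hugging Face repo to push models to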

See the associated paper, models and datasets on Hugging Face for more details.
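
As a minimal sketch, assuming the released checkpoints are standard transformers models (the model identifier below is a placeholder, not the project's actual Hugging Face repository name), a checkpoint can be loaded and used as follows:

# Minimal sketch: loading a pretrained checkpoint from the Hugging Face Hub.
# "your-org/optoelectronics-lm" is a placeholder; substitute the actual model
# repository linked from this project's Hugging Face page.
from transformers import AutoModel, AutoTokenizer

model_id = "your-org/optoelectronics-lm"  # hypothetical identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

# Encode a domain-specific sentence and inspect the hidden states.
inputs = tokenizer("The perovskite layer improves photoluminescence.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)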

Contributing

We welcome contributions to improve OptoelectronicsLM. Please fork the repository and submit a pull request with your changes. Ensure that your code adheres to the project's coding standards and includes appropriate tests.

Citation

Please use the following citation if you use any of this codebase in your work.

@article{doi:10.1021/acs.jcim.4c02029,
  author  = {Huang, Dingyun and Cole, Jacqueline M.},
  title   = {Cost-Efficient Domain-Adaptive Pretraining of Language Models for Optoelectronics Applications},
  journal = {Journal of Chemical Information and Modeling},
  doi     = {10.1021/acs.jcim.4c02029},
  note    = {PMID: 39933074},
  url     = {https://doi.org/10.1021/acs.jcim.4c02029},
  eprint  = {https://doi.org/10.1021/acs.jcim.4c02029}
}
