OptoelectronicsLM is a project for developing language models that are specifically aware of optoelectronics concepts. This repository contains the code and resources used to train those models on specialized datasets and to evaluate them on text-classification, question-answering, and embedding tasks.
Training and evaluation scripts for each task are provided in the corresponding directory. Note that you will need to change the relevant file paths and repository locations to suit your own setup, as sketched below.
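For example, the edits expected are typically of the following kind (the variable names here are illustrative placeholders, not the actual names used in the scripts):

```python
# Illustrative only: the actual variable names differ per script.
# Before running a training or evaluation script, point these at
# your own locations.
DATA_PATH = "/path/to/your/corpus.jsonl"      # local copy of the dataset
CHECKPOINT_DIR = "/path/to/save/checkpoints"  # where trained weights are written
HF_REPO = "your-username/your-model"          # Hugging Face repo to load or push to
```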
See the associated paper, along with the models and datasets on Hugging Face, for more details.
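As a minimal sketch, the released models should be loadable with the `transformers` library as below; the repository ID is a hypothetical placeholder, so substitute the actual model name from the Hugging Face page. The snippet also shows one way to obtain a sentence embedding (mean pooling over token states), as one illustration of the embedding tasks mentioned above, not necessarily the exact pooling used in the paper.

```python
import torch
from transformers import AutoTokenizer, AutoModel

model_id = "your-org/optoelectronics-lm"  # hypothetical ID; replace as needed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
model.eval()

text = "Perovskite solar cells exhibit tunable band gaps."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state        # (1, seq_len, hidden)

mask = inputs["attention_mask"].unsqueeze(-1).float() # (1, seq_len, 1)
embedding = (hidden * mask).sum(1) / mask.sum(1)      # mean over real tokens
print(embedding.shape)                                # torch.Size([1, hidden])
```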
We welcome contributions to improve OptoelectronicsLM. Please fork the repository and submit a pull request with your changes. Ensure that your code adheres to the project's coding standards and includes appropriate tests.
Please use the following citation if you use any of this codebase in your work.
@article{doi:10.1021/acs.jcim.4c02029,
  author  = {Huang, Dingyun and Cole, Jacqueline M.},
  title   = {Cost-Efficient Domain-Adaptive Pretraining of Language Models for Optoelectronics Applications},
  journal = {Journal of Chemical Information and Modeling},
  doi     = {10.1021/acs.jcim.4c02029},
  note    = {PMID: 39933074},
  url     = {https://doi.org/10.1021/acs.jcim.4c02029},
  eprint  = {https://doi.org/10.1021/acs.jcim.4c02029}
}