
Limiting CPU Usage and Optimizing Offline Encoding with Sentence Transformer #2948

Open
hh23485 opened this issue Sep 21, 2024 · 0 comments
hh23485 commented Sep 21, 2024

Hi Everyone,

First of all, thank you for the amazing work on this framework! I’m currently using the Sentence Transformer with the BGE Small EN model for sentence encoding, but I’ve encountered an issue on my server.

My server has 8 CPUs, and the transformer seems to always utilize all of them. However, there are multiple tasks running simultaneously on the server, so I would like to limit the CPU usage to just 2 cores to avoid impacting other tasks.

I’ve attempted the following settings, but they don’t seem to have the desired effect:

import os
import torch

worker_number = 2  # was accidentally commented out above; fixed here
torch.set_num_threads(worker_number)
torch.set_num_interop_threads(worker_number)
os.environ["MKLDNN"] = "1"
os.environ["DNNL"] = "1"
os.environ["OMP_NUM_THREADS"] = f"{worker_number}"
os.environ["MKL_NUM_THREADS"] = f"{worker_number}"
os.environ["OPENBLAS_NUM_THREADS"] = f"{worker_number}"

Could anyone provide guidance on how to effectively limit the model to just 2 CPU cores? I would also appreciate any advice on optimizing offline inference performance with Sentence Transformers on CPU: what is the fastest way to encode with only 2 cores?
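For what it's worth, here is a minimal sketch of what I suspect the correct ordering might be: the `OMP_NUM_THREADS`/`MKL_NUM_THREADS` variables are read when the native thread pools initialize, so they likely need to be set before `torch` is ever imported, and the process can additionally be pinned to specific cores. This is an untested assumption on my part; it presumes a Linux host where `os.sched_setaffinity` exists, and the core IDs `{0, 1}` are placeholders:

```python
import os

worker_number = 2  # target core count (assumption for this sketch)

# 1) Cap the BLAS/OpenMP thread pools *before* torch is imported;
#    these variables are read once, at library load time.
os.environ["OMP_NUM_THREADS"] = str(worker_number)
os.environ["MKL_NUM_THREADS"] = str(worker_number)
os.environ["OPENBLAS_NUM_THREADS"] = str(worker_number)

# 2) On Linux, pin the process to two cores so the scheduler
#    cannot spill work onto the remaining six CPUs.
if hasattr(os, "sched_setaffinity"):
    os.sched_setaffinity(0, {0, 1})  # placeholder core IDs

# 3) Only now import torch and set its thread counts.
#    (Commented out here so the snippet runs without torch installed.)
# import torch
# torch.set_num_threads(worker_number)
# torch.set_num_interop_threads(worker_number)
```

If that ordering is right, the settings in my attempt above would have come too late to shrink the thread pools. Can anyone confirm whether this is the intended way to do it?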

Thanks in advance for your help!
