Conversation

@dilshad-aee

Change Description

Adds documentation for using spaCy transformer models with GPU acceleration.

Addresses #1790.

Checklist

  • I have reviewed the contribution guidelines
  • I have signed the CLA (if required)
  • My code includes unit tests
  • All unit tests and lint checks pass locally
  • My PR contains documentation updates / additions if required

dilshad added 4 commits January 3, 2026 03:31
Addresses microsoft#1790 - Added comprehensive documentation for using GPU
acceleration with spaCy transformer models and other NLP engines.

- New GPU usage guide with examples for spaCy and Hugging Face transformers
- Covers automatic GPU detection, prerequisites, and troubleshooting
- Added cross-references from existing NLP engine documentation
- Updated CHANGELOG and mkdocs navigation
@dilshad-aee
Author

@microsoft-github-policy-service agree

]

for text in texts:
    results = analyzer.analyze(text=text, language="en")
Collaborator

Will this benefit from batch processing given that we are sending one text at a time?
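To make the batching suggestion concrete: a minimal, dependency-free batching helper is sketched below. The `batched` helper and the variable names are illustrative, not Presidio API; Presidio itself ships a `BatchAnalyzerEngine` for processing collections of texts, referenced in the comments.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(texts: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield lists of up to batch_size texts from an iterable."""
    it = iter(texts)
    while batch := list(islice(it, batch_size)):
        yield batch


# Usage sketch (assumes presidio-analyzer is installed):
# from presidio_analyzer import AnalyzerEngine, BatchAnalyzerEngine
# batch_analyzer = BatchAnalyzerEngine(analyzer_engine=AnalyzerEngine())
# for chunk in batched(texts, 32):
#     results = batch_analyzer.analyze_iterator(texts=chunk, language="en")
```

Grouping texts like this also gives a single knob (`batch_size`) to turn down when hitting GPU memory limits.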

If you see `RuntimeError: CUDA out of memory`:

- Process fewer texts at once
- Try a smaller model (`en_core_web_sm` instead of `en_core_web_trf`)
Collaborator

Maybe a better suggestion is to use a smaller transformer model or shorter texts, rather than a small spaCy model, which wouldn't bring much value
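The "shorter texts" suggestion can be sketched with a plain splitter. This helper is illustrative only, not Presidio or spaCy API; a real pipeline would rather split on sentence boundaries so entities aren't cut in half.

```python
def chunk_text(text: str, max_chars: int = 1000) -> list:
    """Split text into chunks of at most max_chars, breaking at whitespace when possible."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        # Prefer to break at the last space inside the window
        if end < len(text):
            space = text.rfind(" ", start, end)
            if space > start:
                end = space
        chunks.append(text[start:end].strip())
        start = end
    return chunks
```

Each chunk can then be analyzed independently, keeping peak GPU memory proportional to `max_chars` rather than to the full document length.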


### CPU fallback

Presidio will automatically use CPU if:
Collaborator

@RonShakutai this is something to keep an eye on: if users have GPU issues and want to fall back to CPU, they can't, because the DeviceDetector is automated.

Collaborator

@omri374 Good point.
Currently the DeviceDetector does handle most GPU initialization failures automatically (it catches exceptions and falls back to CPU with a warning).

However, you're right that there's no way to force CPU; this should be in a separate PR.
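For that follow-up PR, one common pattern is an environment-variable override on top of the auto-detection. The variable name `PRESIDIO_DEVICE` below is hypothetical, used only for illustration; it is not an existing Presidio setting.

```python
import os


def resolve_device(detected_device: str, env=None) -> str:
    """Return a user-forced device if one is set, else the auto-detected one.

    PRESIDIO_DEVICE is a hypothetical variable name used for illustration;
    it is not an actual Presidio configuration option.
    """
    env = os.environ if env is None else env
    forced = env.get("PRESIDIO_DEVICE", "").strip().lower()
    return forced if forced in {"cpu", "cuda", "mps"} else detected_device
```

An override like this would let users with broken CUDA setups run `PRESIDIO_DEVICE=cpu` without touching code, while unrecognized values fall through to the detector's choice.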

omri374
omri374 previously approved these changes Jan 5, 2026
Collaborator

@omri374 left a comment

This is great! I left some comments to think about, but overall this is good stuff.

@RonShakutai self-requested a review January 5, 2026 18:19
Collaborator

@RonShakutai left a comment

Great PR. A few comments to discuss before proceeding.

@@ -0,0 +1,301 @@
# GPU Acceleration
Contributor

Reference this markdown from the GPU acceleration section in install.md.

@@ -0,0 +1,301 @@
# GPU Acceleration

Presidio supports GPU acceleration for transformer-based NLP models, which can significantly improve performance when processing large volumes of text.
Contributor

Not only transformer-based currently? And this could potentially be extended to local LLMs, for example.

!!! tip "Tip"
Use `pip install "spacy[cuda12x]"` (or the variant matching your CUDA version) to install all necessary GPU dependencies. Ensure the CUDA version matches your system installation.

## Automatic GPU Detection
Contributor

I would drop the entire section into a one-liner in one of the previous sections, as it's repeating some of the statements (types, not installation, order).


## Usage

### spaCy Transformer Models
Contributor

Also a kind of redundant section and code sample.

!!! tip "Tip"
The `en_core_web_trf` model uses a transformer architecture (RoBERTa) and benefits significantly from GPU acceleration. For best results, ensure CUDA and cupy are installed.

### Hugging Face Transformers
Contributor

Repeating / should be in the transformers docs, not GPU, IMO, so I would drop this as well.


See [Hugging Face models](https://huggingface.co/models?pipeline_tag=token-classification) for more options.

### Checking if GPU is being used
Contributor

Valid section. In GPU we have a trace, but in MPS we don't, right @RonShakutai?
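A runtime check is easy to surface for both cases. With the real libraries installed, `spacy.prefer_gpu()` returns a boolean and `torch.backends.mps.is_available()` covers Apple Silicon; the dependency-free helper below is only a heuristic, under the assumption that the NVIDIA driver ships the `nvidia-smi` CLI.

```python
import shutil


def nvidia_gpu_present() -> bool:
    """Heuristic: the NVIDIA driver installs the nvidia-smi CLI on PATH."""
    return shutil.which("nvidia-smi") is not None


# With the actual libraries installed, prefer their own checks:
# import spacy; spacy.prefer_gpu()                  # True if cupy + CUDA are usable
# import torch; torch.backends.mps.is_available()   # True on Apple Silicon with MPS
```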

# Process results...
```

## GPU-Enabled NLP Engines
Contributor

great summary

!!! warning "Warning"
Standard spaCy models (e.g., `en_core_web_lg`) may perform worse on GPU due to overhead. Use GPU primarily for transformer-based models.

## When to Use GPU
Contributor

Again, repeating some of the stuff above. It also feels not quite polished, as the numbers below (document- and text-size-wise) are resource-, configuration-, and data-type-dependent.

pip install cupy-cuda12x # or cupy-cuda11x
```

### Out of memory errors
Contributor

repeating

Collaborator

@RonShakutai left a comment

This PR adds more clarity around the hardware support in Presidio.
Let's finalize it! Thanks!

- Home: analyzer/customizing_nlp_models.md
- Spacy/Stanza: analyzer/nlp_engines/spacy_stanza.md
- Transformers: analyzer/nlp_engines/transformers.md
- GPU Acceleration: analyzer/nlp_engines/gpu_usage.md
Collaborator

I’d suggest moving this page one level up and linking it from installation.md.

### GPU acceleration (optional)

@RonShakutai removed the request for review from dorlugasigal January 6, 2026 13:52
@dilshad-aee
Copy link
Author

I have addressed all the comments and streamlined the document accordingly.
Please take a look.

Collaborator

@RonShakutai left a comment

LGTM. Left one last comment! Great work!

=== "Apple Silicon"

No additional dependencies required. MPS is detected automatically.

Collaborator

Please remove the bash snippet.
