Add memory resources to all nvtext APIs #20119
Merged: rapids-bot merged 11 commits into rapidsai:branch-25.12 from vyasr:feat/memory_resource_part9 on Sep 26, 2025
Changes: +553 −188
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- nvtext/tokenize: 7 functions (tokenize_scalar, tokenize_column, count_tokens_scalar, count_tokens_column, character_tokenize, detokenize, tokenize_with_vocabulary)

All functions now accept an optional DeviceMemoryResource parameter that controls which memory resource is used for their GPU output. Updated the .pxd, .pyx, and .pyi files with consistent signatures, following the established pattern. Callers of the tokenization functions can now choose the memory resource for each result.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Update 3 functions to accept a DeviceMemoryResource parameter:
  - generate_ngrams, generate_character_ngrams, hash_character_ngrams
* Thread the memory resource through Column.from_libcudf calls
* Update the corresponding .pxd and .pyi files for API consistency
* Maintain backwards compatibility by making the mr parameter optional

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
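The commit above describes one pattern: each function gains an optional `mr` argument, falls back to a default resource when the argument is omitted (which is what keeps existing callers working), and forwards the resource to the step that builds the result column. Below is a minimal pure-Python sketch of that pattern under stated assumptions: `DeviceMemoryResource`, `Column`, `get_default_resource` and `toy_generate_ngrams` are hypothetical stand-ins defined in the snippet, not the real rmm or pylibcudf objects, and no GPU memory is involved.

```python
# Hypothetical sketch of the optional-mr pattern. DeviceMemoryResource, Column,
# get_default_resource and toy_generate_ngrams are stand-ins defined here; they
# are NOT the real rmm / pylibcudf APIs and allocate no GPU memory.
from __future__ import annotations

from dataclasses import dataclass, field


class DeviceMemoryResource:
    """Stand-in for a device memory resource; only carries a name."""

    def __init__(self, name: str) -> None:
        self.name = name


_DEFAULT_RESOURCE = DeviceMemoryResource("default")


def get_default_resource() -> DeviceMemoryResource:
    """Stand-in for looking up the process-wide default resource."""
    return _DEFAULT_RESOURCE


@dataclass
class Column:
    """Stand-in result column that remembers which resource owns its memory."""

    data: list[str]
    mr: DeviceMemoryResource = field(repr=False)


def toy_generate_ngrams(
    tokens: list[str],
    ngrams: int,
    separator: str,
    mr: DeviceMemoryResource | None = None,
) -> Column:
    # Omitting mr keeps the old call signature working (backwards compatible).
    if mr is None:
        mr = get_default_resource()
    # Join each window of `ngrams` consecutive tokens with the separator.
    result = [
        separator.join(tokens[i : i + ngrams])
        for i in range(len(tokens) - ngrams + 1)
    ]
    # Thread the resource through to result construction, analogous to
    # passing mr to Column.from_libcudf in the real bindings.
    return Column(result, mr)
```

Calling `toy_generate_ngrams(["a", "b", "c"], 2, "_")` yields a column holding `["a_b", "b_c"]` tagged with the default stand-in resource, while passing `mr=DeviceMemoryResource("pool")` tags the same result with that resource instead. Defaulting to `None` rather than to a resource object means the default is looked up at call time, not fixed when the function is defined.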
* Update 2 functions to accept a DeviceMemoryResource parameter:
  - edit_distance, edit_distance_matrix
* Thread the memory resource through Column.from_libcudf calls
* Update the corresponding .pxd and .pyi files for API consistency
* Add the missing Column import to enable compilation
* Maintain backwards compatibility by making the mr parameter optional

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Update 2 functions to accept a DeviceMemoryResource parameter:
  - replace_tokens, filter_tokens
* Thread the memory resource through Column.from_libcudf calls
* Update the corresponding .pxd and .pyi files for API consistency
* Maintain backwards compatibility by making the mr parameter optional

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
Updated normalize.pyx and normalize.pxd to support DeviceMemoryResource for the text normalization functions:
- normalize_spaces: normalize whitespace in strings
- normalize_characters: normalize characters for tokenization
- Added DeviceMemoryResource parameters to the function signatures in the .pxd file
- Updated the functions to accept and process DeviceMemoryResource parameters
- Pass the memory resource to Column.from_libcudf() for proper memory management
Updated byte_pair_encode.pyx and byte_pair_encode.pxd to support DeviceMemoryResource for byte-pair encoding:
- byte_pair_encoding: encode strings using the byte-pair encoding algorithm
- Added a DeviceMemoryResource parameter to the function signature in the .pxd file
- Updated the function to accept and process the DeviceMemoryResource parameter
- Pass the memory resource to Column.from_libcudf() for proper memory management
Updated ngrams_tokenize.pyx and ngrams_tokenize.pxd to support DeviceMemoryResource for n-gram tokenization:
- ngrams_tokenize: generate n-grams from tokenized strings
- Added a DeviceMemoryResource parameter to the function signature in the .pxd file
- Updated the function to accept and process the DeviceMemoryResource parameter
- Pass the memory resource to Column.from_libcudf() for proper memory management
…cate, wordpiece_tokenize)
This commit finalizes DeviceMemoryResource support across all 12 nvtext modules in pylibcudf.

**Modules Updated (12/12):**
- byte_pair_encode, deduplicate, edit_distance, generate_ngrams
- jaccard, minhash, ngrams_tokenize, normalize, replace
- stemmer, tokenize, wordpiece_tokenize

**Changes Made:**
- Added device_memory_resource* parameters to all C++ function declarations in the libcudf .pxd files
- Updated all .pyx files to include DeviceMemoryResource parameters and mr.get_mr() calls
- Added DeviceMemoryResource imports and parameters to all .pyi type stub files
- Fixed function signature alignment between .pxd and .pyx files
- Ensured proper memory resource handling for functions that support it in C++

**Key Fixes:**
- Resolved a minhash compilation issue caused by misaligned .pxd/.pyx signatures
- Properly handled functions that don't support DeviceMemoryResource (e.g., stemmer.is_letter)
- Added comprehensive type annotations for better IDE support

**Build Status:** ✅ All modules compile successfully
**Test Status:** ✅ Full pylibcudf build passes

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
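The commit above reports aligning signatures across the .pxd, .pyx and .pyi files and leaving `mr` off functions that do not support it. One way such a convention could be checked at the Python level is with `inspect.signature`. The sketch below is hypothetical: the toy functions reuse module function names from the commit but have made-up parameters and bodies, and `functions_missing_mr` is an illustrative helper, not part of pylibcudf's test suite.

```python
# Hypothetical sketch: check that functions expose an optional ``mr`` keyword
# defaulting to None. The toy functions and parameters below are made up for
# illustration; they are NOT the real pylibcudf.nvtext signatures.
import inspect
from types import SimpleNamespace


def normalize_spaces(strings, mr=None):
    return strings


def byte_pair_encoding(strings, merge_pairs, mr=None):
    return strings


def is_letter(strings, indices):
    # Deliberately left without mr, mirroring how the commit message
    # describes stemmer.is_letter.
    return [False for _ in strings]


toy_nvtext = SimpleNamespace(
    normalize_spaces=normalize_spaces,
    byte_pair_encoding=byte_pair_encoding,
    is_letter=is_letter,
)

# Parameter kinds that can be passed as mr=... by keyword.
KEYWORD_KINDS = (
    inspect.Parameter.POSITIONAL_OR_KEYWORD,
    inspect.Parameter.KEYWORD_ONLY,
)


def functions_missing_mr(module, names):
    """Return the names whose signature lacks an mr keyword defaulting to None."""
    missing = []
    for name in names:
        param = inspect.signature(getattr(module, name)).parameters.get("mr")
        if (
            param is None
            or param.kind not in KEYWORD_KINDS
            or param.default is not None
        ):
            missing.append(name)
    return missing
```

Here `functions_missing_mr(toy_nvtext, ["normalize_spaces", "byte_pair_encoding", "is_letter"])` returns `["is_letter"]`. A real check would need an allowlist for intentional exceptions, and applying it to compiled Cython functions assumes they are introspectable, which is not verified here.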
mroeschke approved these changes on Sep 26, 2025
/merge
TomAugspurger pushed a commit to TomAugspurger/pygdf that referenced this pull request on Sep 26, 2025
Contributes to rapidsai#15170

Authors:
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- Matthew Roeschke (https://github.com/mroeschke)

URL: rapidsai#20119
Labels:
- feature request: New feature or request
- non-breaking: Non-breaking change
- pylibcudf: Issues specific to the pylibcudf package
- Python: Affects Python cuDF API.
Description
Contributes to #15170
Checklist