
[FEA]: Refactor AutoEncoder to utilize cuDF as much as possible #1166

Open

Description

Is this a new feature, an improvement, or a change to existing functionality?

Improvement

How would you describe the priority of this feature request

Medium

Please provide a clear description of the problem this feature solves

Currently, the AutoEncoder class uses pandas to perform much of the pre-processing required before the DataFrames are converted to the PyTorch tensors needed for training. This is slower than doing the work on the GPU, and it forces a conversion path of cuDF -> pandas -> GPU tensor, which copies the data from the GPU to host memory and then back to the GPU.
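A minimal sketch of the two conversion paths, for illustration only. The function names are hypothetical (they are not existing AutoEncoder or Morpheus APIs), the sketch assumes an all-numeric DataFrame, and it falls back to pandas when cuDF is not installed so it can also run on a CPU-only machine.

```python
import numpy as np
import pandas as pd

try:
    import cudf  # GPU DataFrames; requires a CUDA-capable GPU
    HAVE_CUDF = True
except ImportError:
    HAVE_CUDF = False


def to_array_via_pandas(df):
    """Current pattern (hypothetical helper): move the frame to host memory first."""
    if HAVE_CUDF and isinstance(df, cudf.DataFrame):
        df = df.to_pandas()  # device -> host copy
    # Host (NumPy) array; building a CUDA tensor from it copies host -> device again.
    return df.to_numpy(dtype=np.float32)


def to_array_direct(df):
    """Proposed pattern (hypothetical helper): keep cuDF data on the device."""
    if HAVE_CUDF and isinstance(df, cudf.DataFrame):
        return df.to_cupy(dtype=np.float32)  # CuPy array, stays in GPU memory
    # pandas input (CPU-only environments) has no device to stay on.
    return df.to_numpy(dtype=np.float32)
```

With cuDF input, `to_array_direct` returns a CuPy array, which `torch.as_tensor(arr, device="cuda")` can consume through `__cuda_array_interface__` without a round trip through host memory.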

Describe your ideal solution

Where possible, replace all uses of pandas with cuDF. This would require:

  • Remove all uses of apply() by replacing them with vectorized cuDF operations where possible. We should strive to never use apply().
  • Replace all remaining uses of pandas with cuDF
  • Ensure all tests continue to pass without changes
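As a sketch of what "never use apply()" could look like, the hypothetical pre-processing function below standardizes numeric columns and integer-encodes categorical columns using only whole-column expressions. The function name and the choice of transforms are assumptions for illustration, not the actual AutoEncoder pre-processing; it imports pandas as a fallback when cuDF is unavailable, relying on the fact that the calls used here exist in both libraries.

```python
try:
    import cudf as xdf  # run on the GPU when cuDF is available
except ImportError:
    import pandas as xdf  # CPU fallback; the calls below exist in both libraries


def preprocess(df, num_cols, cat_cols):
    """Hypothetical pre-processing step written without apply().

    A row-wise pattern such as
        df[c].apply(lambda v: (v - mean) / std)
    is replaced by a whole-column expression that cuDF can run on the GPU.
    """
    cols = {}
    for c in num_cols:
        col = df[c].astype("float32")
        std = col.std()
        # Z-score; use 1.0 when std is 0 or NaN (constant or single-row column).
        cols[c] = (col - col.mean()) / (std if std > 0 else 1.0)
    for c in cat_cols:
        # Integer codes instead of apply(lambda v: mapping[v]).
        cols[c] = df[c].astype("category").cat.codes
    return xdf.DataFrame(cols)
```

Because each transform operates on an entire column, cuDF can execute it as a column-wise operation instead of invoking a Python function per row.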

Describe any alternatives you have considered

No response

Additional context

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
  • I have searched the open feature requests and have found no duplicates for this feature request

Metadata

Assignees

Labels

  • dfp [Workflow]: Related to the Digital Fingerprinting (DFP) workflow
  • feature request: New feature or request

Type

No type

Projects

  • Status

    Todo

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
