
[Feature] ParallelProcessing Support for dataframes #67

Closed
@caffeine-addictt

Description


Feature Request


Is Your Feature Request Related to an Issue?

Data processing in Python is usually done with pandas or similar libraries. However, datasets created with these libraries, such as DataFrames, are not fully compatible with ParallelProcessing, which is currently typed to accept a Sequence.

Describe the Solution You'd Like

We will not explicitly support any single library, because doing so would increase the maintenance burden of keeping up to date with each library's best practices and breaking changes.

It makes more sense to let users customize how values and the dataset length are retrieved through optional arguments:

```python
from thread import ParallelProcessing

ParallelProcessing(
    function=lambda x: x,
    dataset=[1, 2],
    _get_value=lambda dataset, index: dataset[index],  # how to fetch the item at a given index
    _length=2,  # how many items to process
)
```
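To illustrate why a custom value accessor matters even when a dataset does define `__getitem__`, here is a minimal single-threaded sketch. It assumes the two arguments default to `dataset[index]` and `len(dataset)` when omitted; `process` and `ColumnTable` are hypothetical names standing in for ParallelProcessing's worker loop and for a column-oriented container, and are not part of the library.

```python
import operator
from typing import Any, Callable, Dict, List, Optional


def process(
    function: Callable[[Any], Any],
    dataset: Any,
    _get_value: Optional[Callable[[Any, int], Any]] = None,
    _length: Optional[int] = None,
) -> List[Any]:
    """Hypothetical sequential stand-in for ParallelProcessing's worker loop."""
    # Default accessors mirror plain Sequence behaviour: dataset[index] and len(dataset).
    get_value = _get_value if _get_value is not None else operator.getitem
    length = _length if _length is not None else len(dataset)
    return [function(get_value(dataset, i)) for i in range(length)]


class ColumnTable:
    """Hypothetical column-oriented table: table[name] returns a column, like a DataFrame."""

    def __init__(self, columns: Dict[str, List[Any]]) -> None:
        self._columns = columns

    def __getitem__(self, name: str) -> List[Any]:
        return self._columns[name]

    def __len__(self) -> int:
        # Number of rows, taken from the first column (0 if there are no columns).
        return len(next(iter(self._columns.values()), []))

    def row(self, index: int) -> Dict[str, Any]:
        return {name: values[index] for name, values in self._columns.items()}


table = ColumnTable({"a": [1, 2], "b": [10, 20]})

# The default accessor would call table[0], a column lookup that raises KeyError,
# so only _get_value is overridden; len(table) already counts rows.
sums = process(lambda row: row["a"] + row["b"], table, _get_value=lambda t, i: t.row(i))
# sums == [11, 22]

# Plain sequences keep working without either argument.
doubled = process(lambda x: x * 2, [1, 2])
# doubled == [2, 4]
```

A pandas DataFrame behaves like `ColumnTable` here: `df[i]` looks up a column label while `len(df)` returns the row count, so under this design a caller would likely pass only something like `_get_value=lambda df, i: df.iloc[i]`.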

Progress

  • Drop explicit Sequence typing
  • Update the ParallelProcessing signature to include the two new arguments
  • Overload __init__ so both arguments are optional when dataset supports dataset[index] (__getitem__) and len(dataset) (__len__)
  • Overload __init__ to require both arguments when dataset supports neither dataset[index] (__getitem__) nor len(dataset) (__len__)
  • Overload __init__ to require _get_value when dataset does not support dataset[index] (__getitem__)
  • Overload __init__ to require _length when dataset does not support len(dataset) (__len__)
  • Implement the logic
  • Update the ParallelProcessing decorator
  • Update the documentation
  • [4/4] Add tests for ParallelProcessing
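The four `__init__` overload items above could be expressed with `typing.overload` and structural `Protocol` types. This is a minimal sketch, not the library's implementation: `ParallelProcessingSketch`, `SupportsGetItem`, `SupportsLen`, `SupportsGetItemAndLen`, and `run()` are hypothetical names, the two arguments are assumed to be keyword-only, and `run()` processes items sequentially rather than across threads.

```python
import operator
from typing import Any, Callable, Generic, List, Optional, Protocol, TypeVar, overload

T = TypeVar("T")
R = TypeVar("R")
T_co = TypeVar("T_co", covariant=True)


class SupportsGetItem(Protocol[T_co]):
    def __getitem__(self, index: int) -> T_co: ...


class SupportsLen(Protocol):
    def __len__(self) -> int: ...


class SupportsGetItemAndLen(SupportsGetItem[T_co], SupportsLen, Protocol[T_co]):
    """Datasets usable as-is: dataset[index] and len(dataset) both work."""


class ParallelProcessingSketch(Generic[T, R]):
    # Supports both dataset[index] and len(dataset): both arguments optional.
    @overload
    def __init__(self, function: Callable[[T], R], dataset: SupportsGetItemAndLen[T], *,
                 _get_value: Optional[Callable[[Any, int], T]] = None,
                 _length: Optional[int] = None) -> None: ...

    # No __getitem__: _get_value is required.
    @overload
    def __init__(self, function: Callable[[T], R], dataset: SupportsLen, *,
                 _get_value: Callable[[Any, int], T],
                 _length: Optional[int] = None) -> None: ...

    # No __len__: _length is required.
    @overload
    def __init__(self, function: Callable[[T], R], dataset: SupportsGetItem[T], *,
                 _get_value: Optional[Callable[[Any, int], T]] = None,
                 _length: int) -> None: ...

    # Neither: both arguments are required.
    @overload
    def __init__(self, function: Callable[[T], R], dataset: Any, *,
                 _get_value: Callable[[Any, int], T],
                 _length: int) -> None: ...

    def __init__(self, function, dataset, *, _get_value=None, _length=None):
        # Runtime checks mirror the overloads for callers without a type checker.
        if _get_value is None:
            if not hasattr(dataset, "__getitem__"):
                raise TypeError("_get_value is required when dataset has no __getitem__")
            _get_value = operator.getitem
        if _length is None:
            if not hasattr(dataset, "__len__"):
                raise TypeError("_length is required when dataset has no __len__")
            _length = len(dataset)
        self.function = function
        self.dataset = dataset
        self._get_value = _get_value
        self._length = _length

    def run(self) -> List[R]:
        # Sequential stand-in for the threaded workers.
        return [self.function(self._get_value(self.dataset, i)) for i in range(self._length)]


squares = ParallelProcessingSketch(lambda x: x * x, [1, 2, 3]).run()
# squares == [1, 4, 9]
```

With this layout a static type checker would flag a call such as `ParallelProcessingSketch(f, object())` because no overload accepts a dataset lacking both methods without both arguments, while the runtime `TypeError` covers untyped callers.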
