Description
Feature Request
Your issue may already be reported!
Please check out our active issues before creating one.
Is Your Feature Request Related to an Issue?
Data processing in Python is usually done with pandas or other libraries. Datasets created with these libraries are not fully compatible with ParallelProcessing.
Describe the Solution You'd Like
We will not support any one library explicitly, as it would increase the maintenance burden of keeping up to date with each of their best practices and breaking changes.
It makes more sense to provide a way to customize how the data and its length are retrieved via optional arguments.
```python
from thread import ParallelProcessing

ParallelProcessing(
    function=lambda x: x,
    dataset=[1, 2],
    _get_value=lambda dataset, index: dataset[index],
    _length=2,
)
```
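For example, a pandas DataFrame does not support plain `dataset[index]` row access, but under this proposal it could be handled by supplying the accessors explicitly. This is a hypothetical usage sketch based on the arguments proposed above, not existing behaviour:

```python
import pandas as pd
from thread import ParallelProcessing

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Hypothetical usage of the proposed optional arguments:
# rows are fetched through .iloc and the length comes from len(df).
ParallelProcessing(
    function=lambda row: row["a"] + row["b"],
    dataset=df,
    _get_value=lambda dataset, index: dataset.iloc[index],
    _length=len(df),
)
```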
Progress
- Drop explicit Sequence typing
- Updating ParallelProcessing function signature to include the 2 arguments
- Overloading __init__ to support optional arguments when dataset supports dataset[index] (__getitem__) and len(dataset) (__len__)
- Overloading __init__ to require arguments when dataset does not support dataset[index] (__getitem__) and len(dataset) (__len__)
- Overloading __init__ to require _get_value when dataset does not support dataset[index] (__getitem__)
- Overloading __init__ to require _length when dataset does not support len(dataset) (__len__)
- Implementing the logic (a sketch of the overloads and fallback logic follows this list)
- Updating ParallelProcessing decorator
- Updating Documentation
- [4/4] Add tests for ParallelProcessing
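A minimal sketch of how the __init__ overloads and the fallback logic could fit together, assuming a hypothetical ParallelProcessingSketch class and simplified signatures rather than the library's actual implementation:

```python
from typing import Any, Callable, Optional, Protocol, overload, runtime_checkable


@runtime_checkable
class SupportsGetItemLen(Protocol):
    """Datasets that already work without the new arguments."""

    def __getitem__(self, index: int) -> Any: ...
    def __len__(self) -> int: ...


class ParallelProcessingSketch:
    """Hypothetical stand-in for ParallelProcessing, showing only __init__."""

    # Optional arguments when the dataset supports dataset[index] and len(dataset).
    @overload
    def __init__(
        self,
        function: Callable[[Any], Any],
        dataset: SupportsGetItemLen,
        _get_value: Optional[Callable[[Any, int], Any]] = None,
        _length: Optional[int] = None,
    ) -> None: ...

    # Required arguments when the dataset supports neither.
    @overload
    def __init__(
        self,
        function: Callable[[Any], Any],
        dataset: Any,
        _get_value: Callable[[Any, int], Any],
        _length: int,
    ) -> None: ...

    def __init__(self, function, dataset, _get_value=None, _length=None):
        # Fall back to dataset[index] only when the dataset actually supports it.
        if _get_value is None:
            if not hasattr(dataset, "__getitem__"):
                raise TypeError("dataset has no __getitem__; pass _get_value explicitly")
            _get_value = lambda ds, i: ds[i]

        # Fall back to len(dataset) only when the dataset actually supports it.
        if _length is None:
            if not hasattr(dataset, "__len__"):
                raise TypeError("dataset has no __len__; pass _length explicitly")
            _length = len(dataset)

        self.function = function
        self.dataset = dataset
        self._get_value = _get_value
        self._length = _length
```

The task list above also calls for separate overloads requiring only _get_value (no __getitem__) or only _length (no __len__); those are omitted from this sketch for brevity.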