Change to parallel reader #635
Almost LGTM. @zhxfl please help to verify the correctness of the whole process.
""" struct for one block :
contain label, label desc, feature, feature_desc
class SampleInfo(object):
"""SampleInfo holds the necessary information to load an example from disk.
an example
-> a sample
Thanks, done.
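The old docstring listed label, label desc, feature, and feature_desc as the contents of one block. A minimal sketch of what a `SampleInfo` record and a loader for it might look like follows; every field name, the little-endian float32/int32 encodings, and the `load_sample` helper are hypothetical and not taken from the PR.

```python
import struct
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SampleInfo:
    """Hypothetical record locating one sample's feature and label bytes on disk."""
    feature_bin_path: str
    feature_start: int      # byte offset of the sample's features
    feature_frame_num: int  # number of feature frames
    feature_dim: int        # float32 values per frame
    label_bin_path: str
    label_start: int        # byte offset of the sample's labels
    label_frame_num: int    # number of int32 labels


def load_sample(info: SampleInfo) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical loader: read little-endian float32 features and int32 labels."""
    value_num = info.feature_frame_num * info.feature_dim
    with open(info.feature_bin_path, "rb") as f:
        f.seek(info.feature_start)
        flat = struct.unpack("<%df" % value_num, f.read(4 * value_num))
    # Regroup the flat values into one list per frame.
    frames = [list(flat[i * info.feature_dim:(i + 1) * info.feature_dim])
              for i in range(info.feature_frame_num)]
    with open(info.label_bin_path, "rb") as f:
        f.seek(info.label_start)
        labels = list(struct.unpack("<%di" % info.label_frame_num,
                                    f.read(4 * info.label_frame_num)))
    return frames, labels
```

Keeping only offsets and sizes in the record means many samples can share one binary file, and nothing is read until `load_sample` is called.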
class DataReader(object):
"""DataReader provides basic audio sample preprocessing pipeline including
I/O and augmentation transforming.
I/O and augmentation transforming
-> data loading and data augmentation
Done.
Args:
feature_file_list (str): File containing feature data related files.
label_file_list (str): File containing label data related files.
-> feature_file_list (str): File that lists the paths of all the feature data and their descriptions.
-> label_file_list (str): File that lists the paths of all the label data and their descriptions.
Done.
frame_dim (int): The final feature dimension of one frame after all
augmentation applied.
drop_frame_len (int): Lower threshold bound to filter samples having
-> drop_frame_len (int): The sequence length threshold above which the samples will be dropped
Done.
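Under the reviewer's reading of `drop_frame_len`, filtering reduces to a length check. A minimal sketch, assuming each sample's length is measured in frames and that a sample exactly at the threshold is kept (the inclusive boundary is an assumption); `keep_short_samples` is a hypothetical helper, not part of the PR.

```python
from typing import List, Sequence


def keep_short_samples(frame_lengths: Sequence[int], drop_frame_len: int) -> List[int]:
    """Return indices of samples that survive the drop_frame_len filter.

    Samples longer than drop_frame_len frames are dropped; a sample whose
    length equals the threshold is kept (assumed boundary).
    """
    return [i for i, frame_num in enumerate(frame_lengths)
            if frame_num <= drop_frame_len]
```

For example, `keep_short_samples([100, 2000, 256, 257], 256)` returns `[0, 2]`.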
batch_samples = []
lod = [0]

if len(batch_samples) >= minimum_batch_size:
Can this if statement be merged with the one in the for loop above? They resemble each other closely.
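One way to answer the reviewer's question is to fold both checks into a single condition that also knows when it has reached the last sample. The sketch below assumes that `batch_size` counts samples, that `lod` holds cumulative frame offsets (the offset-based level-1 LoD layout used by PaddlePaddle Fluid, e.g. `[0, 3, 5]` for sequences of 3 and 2 frames), and that a trailing batch is emitted only when it has at least `minimum_batch_size` samples; `make_batches` is hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def make_batches(samples: Sequence[Sequence[float]], batch_size: int,
                 minimum_batch_size: int) -> Iterator[Tuple[list, List[int]]]:
    """Yield (batch_samples, lod) pairs using a single emit condition."""
    batch_samples: list = []
    lod = [0]
    last = len(samples) - 1
    for i, sample in enumerate(samples):
        batch_samples.append(sample)
        # Each new offset is the previous offset plus this sample's frame count.
        lod.append(lod[-1] + len(sample))
        full = len(batch_samples) >= batch_size
        # The tail rule only applies once the last sample has been added.
        tail = i == last and len(batch_samples) >= minimum_batch_size
        if full or tail:
            yield batch_samples, lod
            batch_samples, lod = [], [0]
```

Keeping the tail rule inside the same `if` leaves the emit logic in one place; the trade-off is that the sample list must support `len()` before iteration starts.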
parser.add_argument(
    '--minimum_batch_size',
    type=int,
    default=32,
What is the proper default value for this argument?
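The thread leaves the right default open. Whatever value is chosen, a check at parse time can reject a `minimum_batch_size` that makes no sense relative to the batch size. A minimal sketch using the standard `argparse` module; the `--batch_size` flag, its default, and the validation rule are assumptions, while `--minimum_batch_size` and its default of 32 come from the diff.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Reader batching options (sketch).")
    # Hypothetical companion flag; not part of the PR's diff.
    parser.add_argument(
        '--batch_size',
        type=int,
        default=32,
        help='Maximum number of samples per batch (assumed).')
    parser.add_argument(
        '--minimum_batch_size',
        type=int,
        default=32,
        help='Trailing batches with fewer samples are dropped (assumed).')
    args = parser.parse_args(argv)
    # Reject a minimum outside [1, batch_size]; parser.error exits with status 2.
    if not 1 <= args.minimum_batch_size <= args.batch_size:
        parser.error('--minimum_batch_size must be between 1 and --batch_size')
    return args
```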
class SampleInfoBucket(object):
"""SampleInfoBucket contains paths of several description files. Feature
description file contains necessary information to access samples' feature
Add a simple explanation of "necessary information"; you can use "( )" or "such as ..., etc.".
Thanks, done.
"""SampleInfoBucket contains paths of several description files. Feature
description file contains necessary information to access samples' feature
data and label description file contains necessary information to
access samples' label data. SampleInfoBucket is the minimum unit to do
and label description file contains necessary information to access samples' label data
-> , and the same for the label description file
Thanks, done.
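The docstring describes a bucket as a set of paths to feature and label description files. A minimal sketch of expanding one bucket into per-sample entries follows; it assumes a hypothetical description format with one `byte_offset frame_num` pair per line, matched line by line between the feature and label files. All names here are illustrative, not the PR's.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (feature_bin_path, feature_offset, feature_frames,
#  label_bin_path, label_offset, label_frames)
SampleEntry = Tuple[str, int, int, str, int, int]


def _read_desc(path: str) -> List[Tuple[int, int]]:
    """Parse a hypothetical description file: one 'byte_offset frame_num' per line."""
    entries = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                offset, frame_num = line.split()
                entries.append((int(offset), int(frame_num)))
    return entries


@dataclass
class SampleInfoBucket:
    feature_bin_paths: List[str]
    feature_desc_paths: List[str]
    label_bin_paths: List[str]
    label_desc_paths: List[str]

    def __post_init__(self) -> None:
        lengths = {len(self.feature_bin_paths), len(self.feature_desc_paths),
                   len(self.label_bin_paths), len(self.label_desc_paths)}
        if len(lengths) != 1:
            raise ValueError("feature and label path lists must have equal length")

    def generate_sample_info_list(self) -> List[SampleEntry]:
        """Pair feature and label description lines into one entry per sample."""
        infos: List[SampleEntry] = []
        for feat_bin, feat_desc, label_bin, label_desc in zip(
                self.feature_bin_paths, self.feature_desc_paths,
                self.label_bin_paths, self.label_desc_paths):
            feats = _read_desc(feat_desc)
            labels = _read_desc(label_desc)
            if len(feats) != len(labels):
                raise ValueError("sample count mismatch: %s vs %s"
                                 % (feat_desc, label_desc))
            for (f_off, f_num), (l_off, l_num) in zip(feats, labels):
                infos.append((feat_bin, f_off, f_num, label_bin, l_off, l_num))
        return infos
```

Because the bucket stores only paths, it is cheap to shuffle or hand to a worker; the description files are read only when `generate_sample_info_list` runs.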
self._batch_buffer_size = batch_buffer_size
self._process_num = process_num

def generate_bucket_list(self, is_shuffle):
A private function should be named "_generate_bucket_list" to distinguish it from the public methods.
If generate_bucket_list is exposed, we can shuffle the blocks for each epoch. Otherwise, the shuffling can only be done once.
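The reply's point, that exposing `generate_bucket_list` lets the caller reshuffle on every epoch, can be sketched as follows. The `BucketReader` class and its `epoch` method are hypothetical, and a `ThreadPoolExecutor` stands in for the worker processes implied by `process_num` only to keep the example self-contained; it is a swapped-in technique, not the PR's implementation.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterator, List, Optional, Sequence


class BucketReader:
    """Hypothetical reader that reshuffles buckets each epoch and loads them in parallel."""

    def __init__(self, buckets: Sequence[Any], process_num: int = 2,
                 seed: Optional[int] = None) -> None:
        self._buckets = list(buckets)
        self._process_num = process_num
        self._rng = random.Random(seed)

    def generate_bucket_list(self, is_shuffle: bool) -> List[Any]:
        # Return a fresh copy so each call can produce a new order.
        bucket_list = list(self._buckets)
        if is_shuffle:
            self._rng.shuffle(bucket_list)
        return bucket_list

    def epoch(self, load_fn: Callable[[Any], Any],
              is_shuffle: bool = True) -> Iterator[Any]:
        # Calling generate_bucket_list once per epoch allows a new shuffle each time.
        bucket_list = self.generate_bucket_list(is_shuffle)
        with ThreadPoolExecutor(max_workers=self._process_num) as pool:
            # map() runs load_fn concurrently but yields results in bucket_list order.
            for result in pool.map(load_fn, bucket_list):
                yield result
```

`Executor.map` submits every bucket up front, so this sketch has no counterpart to `batch_buffer_size`; bounding the number of in-flight results would need an explicit bounded queue.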
We can merge this PR first.
Resolves #630