Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand Sequence list #692

Merged
merged 5 commits into from
Jul 11, 2019
Merged

Conversation

us
Copy link
Contributor

@us us commented Jun 19, 2019

Sequence gets expanded, for better readability.

Related issue #689

Sequence gets expanded, for better readability.
Related issue tensorflow#689

Please enter the commit message for your changes. Lines starting
@googlebot googlebot added the cla: yes Author has signed CLA label Jun 19, 2019
@Conchylicultor
Copy link
Member

Thank you for fixing this. This fix seems a little hacky though.
I would prefer to use an more explicit check:

if isinstance(feature, FeaturesDict) or (
    isinstance(feature, Sequence) and isinstance(feature.feature, FeaturesDict)
):

@us
Copy link
Contributor Author

us commented Jun 19, 2019

I think for every complicated structure we need to new line. I thought of using {} to break lines. And all keys feature shown with {}.

@us
Copy link
Contributor Author

us commented Jun 19, 2019

At first I did like you, but I chose to write more general code but I can go back if you think code style should more understandable and readable.

@Conchylicultor
Copy link
Member

Conchylicultor commented Jun 20, 2019

The problem with if hasattr(v, 'keys'): is that if another FeatureConnector implement a .key attribute, the code will crash or behave in some unexpected way. I agree that your solution is shorter, but seems more hacky.

Alternatively, would it be possible to modify the FeatureDict.__repr__ to be displayed in multiple lines, such as print(builder.info.feature) is also displayed properly.

print(builder.info.features)
FeaturesDict({
    'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
    'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
})

instead of currently:

FeaturesDict({'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10), 'image': Image(shape=(28, 28, 1), dtype=tf.uint8)})

Then you could update the usage in DatasetInfo.__repr__ and _pprint_features_dict in the doc wouldn't be required anymore.

us added 2 commits June 28, 2019 17:51
`tensorflow_datasets.scripts.document_datasets.pprint_features_dict`
converted to public function and add types parameter.
@us
Copy link
Contributor Author

us commented Jun 29, 2019

@Conchylicultor I find an alternative solution please check it.

Copy link
Member

@Conchylicultor Conchylicultor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks much better.

tensorflow_datasets/core/features/features_dict.py Outdated Show resolved Hide resolved
tensorflow_datasets/scripts/document_datasets.py Outdated Show resolved Hide resolved
tensorflow_datasets/scripts/document_datasets.py Outdated Show resolved Hide resolved
tensorflow_datasets/scripts/document_datasets.py Outdated Show resolved Hide resolved
@@ -138,7 +139,7 @@ def __iter__(self):

def __repr__(self):
"""Display the feature dictionary."""
return '{}({})'.format(type(self).__name__, self._feature_dict)
return pprint_features_dict(self._feature_dict, types='FeaturesDict')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of calling pprint_features_dict and have the kwargs hack which seems hacky, I feel it would be nicer to have FeaturesDict simply calling repr() recursively on its child.

Something like:

def __repr__(self):
  lines = ['{}({'.format(type(self).__name__)]
  for key, feature in sorted(list(features_dict.items())):
    all_sub_lines = '\'{}\': {},'.format(key, feature)
    lines.extend('    ' + l for l in all_sub_lines.split('\n'))  # Add indentation to all childs
  lines.append('})')
  return lines.join('\n')

This is more generic as this would works with any feature which is using multiple line. And it avoid having to test insinstance(, FeatureDict) or hasattr.
What do you think ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm agree with you but now we don't use the pprint_features_dict func.

Move `pprint_features_dict` func to `features_dict.py`
@Conchylicultor
Copy link
Member

Great, thank you. Looks good. I'm merging it now.
Yes, pprint_features_dict isn't used anymore, so I'll remove it internally before merging.

Also, in features_dict.py, be careful not to import the full API import tensorflow_datasets as tfds, as it creates circular dependency. Instead individual modules should be imported. I'm fixing this internally before merging it too.

@tfds-copybara tfds-copybara merged commit f9f8b37 into tensorflow:master Jul 11, 2019
tfds-copybara pushed a commit that referenced this pull request Jul 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Author has signed CLA
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants