[WIP - do not merge!] Move sparkdl utilities for conversion between numpy arrays and image schema to ImageSchema#90
Conversation
a959b18 to
b533e69
Compare
MrBago
left a comment
There was a problem hiding this comment.
This looks good to me, just a few minor comments.
| dtype=ocvType.nptype, | ||
| buffer=image.data, | ||
| strides=(width * nChannels, nChannels, 1)) | ||
| strides=(width * nChannels * itemSz, nChannels * itemSz, itemSz)) |
There was a problem hiding this comment.
Will numpy figure out the right strides if we don't pass it explicitly?
There was a problem hiding this comment.
Hmm yeah I would think so. The original code from ms folks was like this and I did not want to do more changes than necessary.
|
|
||
| if array.ndim != 3: | ||
| raise ValueError("Invalid array shape") | ||
| raise ValueError("Invalid array shape %s" % str(array.shape)) |
There was a problem hiding this comment.
Do we want to reshape 2d arrays to be shape + (1,)?
There was a problem hiding this comment.
I agree with their approach. I think it's better to make the caller pass the arguments in expected format rather than trying to auto-convert unless that is completely unambiguous.
So in this case, we say images are always 3 dimensional arrays and it's up to the user to make sure they conform to that. Otherwise they might be passing something else than they think they are passing and we would mask their bug until later.
python/sparkdl/image/image.py
Outdated
| "Unexpected/unsupported array data type '%s', currently only supported formats are %s" % | ||
| (str( | ||
| array.dtype), str( | ||
| self._numpyToOcvMap.keys()))) |
There was a problem hiding this comment.
Can we get this on fewer lines or sue some variables, it looks odd.
There was a problem hiding this comment.
Yeah it does, I think it's the autopep8 being weird here. I'll reformat that.
| testDefaultReadWrite(featurizer) | ||
| } | ||
|
|
||
|
|
b533e69 to
52c6041
Compare
python/sparkdl/image/image.py
Outdated
| dataType=x.dataType(), | ||
| nptype=self._ocvToNumpyMap[x.dataType()]) | ||
| for x in ctx._jvm.org.apache.spark.ml.image.ImageSchema.javaOcvTypes()] | ||
| return [x for x in self._ocvTypes] |
There was a problem hiding this comment.
What does this do? Isn't self._ocvTypes already a list?
There was a problem hiding this comment.
The purpose was to return copy of the list so that the private member can not be modified.
There was a problem hiding this comment.
oic, I usually see myList[:] or list(myList) to make a (shallow) copy.
There was a problem hiding this comment.
Thanks, that's nicer :)
The members of the list are tuples so shallow copy suffices here.
52c6041 to
f616462
Compare
Codecov Report
@@ Coverage Diff @@
## master #90 +/- ##
==========================================
+ Coverage 82.49% 83.92% +1.43%
==========================================
Files 33 33
Lines 1879 1866 -13
Branches 35 39 +4
==========================================
+ Hits 1550 1566 +16
+ Misses 329 300 -29
Continue to review full report at Codecov.
|
f616462 to
c8c90e0
Compare
sueann
left a comment
There was a problem hiding this comment.
Are we planning to merge this? It'll be difficult to maintain and resolve the differences in the copied files between DL Pipelines and Spark. It'd be much easier cognitively to merge these changes into Spark then remove the corresponding files in sparkdl. If we need to keep these files in sparkdl until Spark 2.4 is out, it'd be safer to first get the changes merged into Spark then copy the exact changes here; if we merge this first, it could easily get out of sync with whatever revisions get made in Spark.
|
@sueann Yes I agree, I would merge spark version first and merge this one only after spark 2.4 is released. I made the PR here mostly because that's what we need the changes for, so it can be reviewed in context, also to run tests. I'll mark it WIP. |
|
ah ok got it. thanks! |
[WIP] Preparation for moving stuff to Spark.
Moved utilities for image schema <=> numpy array conversion to (copy pasted from spark 2.3) Image schema code.