Undefined Behavior: using numpy.array() to convert instances of PIL.TiffImagePlugin.TiffImageFile to ndarrays after opening TIFF with Image.open() #4281

Closed
JordanPavlic opened this issue Dec 20, 2019 · 22 comments
@JordanPavlic

JordanPavlic commented Dec 20, 2019

I've spent several days trying to resolve this issue in my project and cannot find any documented explanation of why it could be occurring, either on StackExchange or on the Pillow issue tracker, so I'm posting it here for some help.

What did you do?

Attempted to import a .tif image using Image.open() and then convert the Image to an array.
image

What did you expect to happen?

Successful conversion of the pillow image to a numpy ndarray.
As documented per #3301

What actually happened?

Both numpy.array() and numpy.asarray() returned an instance of PIL.TiffImagePlugin.TiffImageFile instead of a numpy ndarray when called once. Calling either function more than once on the same PIL.TiffImagePlugin.TiffImageFile instance causes it to return a numpy ndarray, even if the value returned by the first call is never assigned. It's as if the call to numpy.array() or numpy.asarray() is mutating the PIL.TiffImagePlugin.TiffImageFile instance.

I ran the following tests to demonstrate the odd behavior. Each time, the console printed the message "tempfile.tif: Cannot read TIFF header." This message does not come from my code base and was not raised as an exception. The image I'm using is a scanned grayscale .tiff image sent through Microsoft Outlook.

Case 1:

Code:
Code#1
Result:
Result#1

Case 2:

Code:
Code#2

Result:
Result#2

Case 3 :

Code:
Code#3

Result:
Result#3

Case 4:

Code:
Code#4

Result:
Result#4

Case 5:

Code:
Code#5

Result:
Result#5
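
A minimal sketch of how the symptom could be detected programmatically, assuming that what came back was a zero-dimensional object array wrapping the image (NumPy's usual fallback for an object it cannot interpret as array data). That reading is an assumption, not something the screenshots confirm. `looks_like_pixel_data` and `Opaque` are hypothetical names used only for illustration.

```python
import numpy as np


def looks_like_pixel_data(arr):
    """Return True if arr is a non-object ndarray with at least two dimensions."""
    return (
        isinstance(arr, np.ndarray)
        and arr.dtype != object
        and arr.ndim >= 2
    )


class Opaque:
    """Hypothetical stand-in for an image whose pixel data could not be exported."""


# NumPy cannot interpret Opaque as array data, so it wraps it in a 0-d object array.
wrapped = np.array(Opaque())
# A genuine 4x4 grayscale pixel buffer for comparison.
pixels = np.zeros((4, 4), dtype=np.uint8)

print(looks_like_pixel_data(wrapped))
print(looks_like_pixel_data(pixels))
```

A check like this could guard the conversion step in a batch script, so that a failed conversion is reported immediately instead of propagating as an unexpected object.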

Unfortunately, I'm unable to post the image I'm analyzing as it contains confidential information from my employer. However, since the behavior of the library itself is the really strange thing I'm hoping that should be enough to start looking into why this strange behavior is occurring.

However, here is some file information for the file that I can provide:
image

What versions of Pillow and Python are you using?
image
image
image

@radarhere
Member

You mention #3301. In that issue, users are instructed to use asarray. You are just using array.

@JordanPavlic
Author

JordanPavlic commented Dec 20, 2019

I also evaluated with asarray instead of array and confirmed the issue is still present. See Case 4 in "What actually happened?" above.

@radarhere
Member

If it helps, the error isn't technically coming from our code either - it is coming from libtiff, at either https://gitlab.com/libtiff/libtiff/blob/master/libtiff/tif_open.c#L291 or https://gitlab.com/libtiff/libtiff/blob/master/libtiff/tif_open.c#L415

@radarhere
Member

Regarding your comment about the returned values, I actually can't reproduce this with Pillow 5.3. The following script just gives me five True values.

from PIL import Image
import numpy as np

im = Image.open("Tests/images/hopper.tif")
nd = np.array(im)
a = str(nd)
b = str(np.array(im))
print(a == b)

im = Image.open("Tests/images/hopper.tif")
np.array(im)
nd = np.array(im)
a = str(nd)
b = str(np.array(im))
print(a == b)

im = Image.open("Tests/images/hopper.tif")
nd1 = np.array(im)
nd2 = np.array(im)
a = str(nd2)
b = str(np.array(nd1))
print(a == b)

im = Image.open("Tests/images/hopper.tif")
nd = np.asarray(im)
a = str(nd)
b = str(np.asarray(im))
print(a == b)

im = Image.open("Tests/images/hopper.tif")
np.asarray(im)
nd = np.asarray(im)
a = str(nd)
b = str(np.asarray(im))
print(a == b)

@JordanPavlic
Author

JordanPavlic commented Dec 20, 2019

That's definitely helpful!

Another strange addition to this issue:
Code:
image
Result:
image

image

I added the above code and started to run my main script again, which iterates through all files in the drive location. At the numpy.array(img) call, the console now appears to wait for keyboard input before continuing. At the bottom of the result you can see that I'm on numpy.array(img), but execution doesn't continue until I press a key. I wonder if something in libtiff is holding up execution?

Regarding your last post, can you post your test image? I would like to run your test code on my system and then post back the results.

Edit: I've also confirmed the same behavior on my end using numpy.asarray() as well.

@radarhere
Member

@JordanPavlic
Author

JordanPavlic commented Dec 20, 2019

Just ran the following and got your result:

I ran using your hopper.tif here just to clarify by the way.

Code:
image

Result:
image

An interesting thing I noticed is that I did not get the internal libtiff error this time around. I'll see if I can replicate the process for creating the image I'm analyzing without any confidential materials present.

@JordanPavlic
Author

JordanPavlic commented Dec 20, 2019

Also, rerunning your tests with the image I'm analyzing currently gives me the following result:
image

Edit: as with the above, I realized this was waiting on keyboard input to continue as well. Here are the results when I hit enter for each step:
image

@JordanPavlic
Author

Here is an example of an image that's causing the issue to appear:
error_causing_image.zip

@radarhere
Member

Thanks, but I'm still not seeing a problem.

Here's another thought - the description of needing to hit Enter, the sqlalchemy lines being printed out, and the .filePath that suggests that you're using Twisted (but maybe not) - all of this sounds like interference from other libraries. It would be ideal if you could put together a script that could be run by itself, from start to finish, that shows the problem.

@JordanPavlic
Author

I just did a quick check of both pip list and conda list and I don't believe I have twisted installed. The .filePath comes from one of my internal classes I use to hold file metadata and the filepath itself is a pathlib object.

I pulled out your test script and modified it to run on its own with the image in the same directory. Hopefully this is what you were after; if not, just let me know and I'll see what I can do.

It still seems to wait for key input and fails the tests on my end. I've attached the code I'm running as well as the image. I'll go ahead and reinstall both packages for numpy and pillow just to see if that resolves it.

Code:
image

Result:
image

Here is a copy of the files executed:
Test Code.zip

@JordanPavlic
Author

I reinstalled both numpy and pillow, with no change.

@radarhere
Member

Thanks for your efforts, but I still can't reproduce. I've also tried on our CI jobs, and found no success there either.

@JordanPavlic
Author

Thanks radarhere, I really appreciate all the help on this. I'm going to try a clean conda environment with only the required dependencies next and see if that works. I'll let you know how it goes.

@radarhere radarhere changed the title Undefined Behavior: using numpy.array() to convert instances of Pil.TiffImagePlugin.TiffImageFile to ndarrays after opening TIFF with Image.Open() Undefined Behavior: using numpy.array() to convert instances of PIL.TiffImagePlugin.TiffImageFile to ndarrays after opening TIFF with Image.Open() Dec 24, 2019
@radarhere radarhere changed the title Undefined Behavior: using numpy.array() to convert instances of PIL.TiffImagePlugin.TiffImageFile to ndarrays after opening TIFF with Image.Open() Undefined Behavior: using numpy.array() to convert instances of PIL.TiffImagePlugin.TiffImageFile to ndarrays after opening TIFF with Image.open() Dec 25, 2019
@kmilos
Contributor

kmilos commented Dec 26, 2019

Could this be related to #4237? Does it work when downgrading libtiff to 4.0.10?

Also, this might not be specific to ndarray conversion; the exception may simply be hidden, as in the mentioned issue and #3863. Do you maybe get a more descriptive failure if you call im.load() beforehand?
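
As a rough illustration of that suggestion, the sketch below decodes the image explicitly with `im.load()` before handing it to NumPy, so that a decoder failure (typically an `OSError` in Pillow) would surface at the load step rather than during conversion. It builds a small uncompressed TIFF in memory purely so that it runs on its own; that synthetic image is an assumption for the demo and does not reproduce the problem file.

```python
import io

import numpy as np
from PIL import Image

# Build a tiny uncompressed TIFF in memory so the sketch needs no file on disk.
buf = io.BytesIO()
Image.new("L", (4, 3), color=128).save(buf, format="TIFF")
buf.seek(0)

im = Image.open(buf)  # lazy: reads the header only, not the pixel data
try:
    im.load()  # decode now, so any decoder failure is raised here
except OSError as exc:
    print(f"decoding failed: {exc}")
    raise

arr = np.asarray(im)
print(type(arr).__name__, arr.shape, arr.dtype)
```

With a file on disk, `buf` would be replaced by the path to the image.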

@JordanPavlic
Author

JordanPavlic commented Jan 2, 2020

Alright, I just did a conda install --force on libtiff to downgrade libtiff, and only libtiff, to 4.0.10, and reran the test in the attached test code above.

My previous conda version of libtiff:
image

My current conda version of libtiff:
image

After installing libtiff 4.0.10, I reran the attached test code and test image again and got the following output:
image

To confirm I can turn the issue on and off with a version change of libtiff, I then did a conda install --force for libtiff 4.1.0 and again got the following after running the test code on the test image:
image

So downgrading to libtiff 4.0.10 resolves the issue and upgrading to libtiff 4.1.0 causes the issue to appear again.
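
For anyone repeating this comparison, a short sketch for confirming which libtiff Pillow itself reports, assuming a Pillow release recent enough to provide `PIL.features.version`. The version reported here is the one Pillow is actually using, which may not match the package manager's listing if Pillow bundles its own copy; the call may also return `None` when libtiff support or its version is unavailable.

```python
import PIL
from PIL import features

print("Pillow:", PIL.__version__)

# True if this Pillow build was compiled with libtiff support.
has_libtiff = features.check("libtiff")
print("libtiff support:", has_libtiff)

# Version string of the libtiff in use, or None if it cannot be determined.
libtiff_version = features.version("libtiff")
print("libtiff version:", libtiff_version)
```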

Regarding: "Do you maybe have a more descriptive failure on im.load() beforehand?":
Unfortunately no, the printed message "tempfile.tif: Cannot read TIFF header" is the only description I get when the failure occurs. There's no stack trace that allows me to trace the issue further.

@JordanPavlic
Author

JordanPavlic commented Jan 2, 2020

In case it matters, I should also add that I'm using Windows.

@JordanPavlic
Author

JordanPavlic commented Jan 2, 2020

Also, I just created a clean test environment in conda with only the required dependencies to run test.py in Test Code.zip attached above. Here I test switching out only the libtiff versions to isolate the issue with consistent versions of all other environment packages.

Here is the complete conda environment package version list:
image

This environment is using libtiff 4.1.0

Running test.py on this environment outputs the following:
image

After performing the above test, I then installed libtiff 4.0.10 and the results are below.

Here is the new complete conda environment package version list (only change is libtiff version):
image

This environment is using libtiff 4.0.10

Running test.py on this new environment outputs the following:
image

Hopefully this helps with reproducing the issue.

@JordanPavlic
Author

JordanPavlic commented Jan 2, 2020

After some experimentation, it appears that libtiff's behavior changed between 4.0.10 and 4.1.0 with regard to how it handles JPEG-compressed TIFF images.

I ran an experiment comparing four images: hopper.tif (uncompressed TIFF), my test image (JPEG-compressed TIFF), my test image decompressed with GIMP (uncompressed TIFF), and my test image imported into GIMP and exported again as a JPEG-compressed TIFF. This uses the cleanTestEnvironment shown above. The results show that libtiff 4.1.0 and libtiff 4.0.10 handle JPEG-compressed TIFF images differently, and this appears to be directly related to the issue that's occurring here.

libtiff 4.0.10:
image

libtiff 4.1.0:
image

This appears very closely related to #4237

Attached is the code and images for this analysis:
Test Code.zip
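
A minimal sketch for checking how a given TIFF is compressed, which may help when sorting test images into compressed and uncompressed groups as above. It assumes Pillow exposes the compression scheme under the `"compression"` key of `im.info` for TIFF files; `tiff_compression` is a hypothetical helper name, and the in-memory image exists only so the snippet runs by itself.

```python
import io

from PIL import Image


def tiff_compression(source):
    """Return the compression name Pillow reports for a TIFF, or None if absent."""
    with Image.open(source) as im:
        return im.info.get("compression")


# Self-contained demo: an uncompressed TIFF written to memory.
buf = io.BytesIO()
Image.new("L", (4, 4)).save(buf, format="TIFF")
buf.seek(0)

compression = tiff_compression(buf)
print(compression)
```

Passing a file path instead of `buf` would report the scheme for an image on disk. Only the header is read, so this should not trigger the decoding step where the failure seems to occur.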

@radarhere
Member

#4237 should be fixed in Pillow 8.2.0. If you upgrade, is this resolved?

@radarhere
Member

@pigfat ?

@radarhere
Member

Closing. This isn't reproducible, and a potential fix has been merged. This can be re-opened if there are any further comments.

Also, if anyone experiences a similar problem with numpy, #5379 should help by showing the hidden error.
