Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LoadImageD performance #3322

Closed
myron opened this issue Nov 11, 2021 · 12 comments · Fixed by #3338
Closed

LoadImageD performance #3322

myron opened this issue Nov 11, 2021 · 12 comments · Fixed by #3338
Assignees
Labels
enhancement New feature or request

Comments

@myron
Copy link
Collaborator

myron commented Nov 11, 2021

@drbeh @Nic-Ma
Can I ask, why is LoadImageD with WSIReader is slower than WSIReader directly (both use tiffile backend)
#3251


start = time.time()

wsi_reader =WSIReader( backend="TiffFile", level=1)
transform =  LoadImageD(keys=["image"], reader=WSIReader, backend="TiffFile", dtype=np.uint8, level=1, image_only=True)

wsi_time = []
transform_time=[]

for i in range(100):
    start = time.time()
    img_obj = wsi_reader.read(image_path)
    image = wsi_reader.get_data(img_obj)[0]
    wsi_time.append(time.time()-start)

    start = time.time()
    image = transform({"image" : image_path})['image']
    transform_time.append(time.time()-start)


print('wsi time', sum(wsi_time[1:])/(len(wsi_time)-1), 'transform_time time', sum(transform_time[1:])/(len(transform_time)-1))

wsi time 0.02141376216002185 transform_time time 0.026847357701773596

which is 20% slower, and my overall code runs 10% slower

@myron myron added the enhancement New feature or request label Nov 11, 2021
@drbeh
Copy link
Member

drbeh commented Nov 11, 2021

Hi @myron,
That's interesting. The only thing that I see is thatLoadImage performs additional data type change:

img_array = img_array.astype(self.dtype)

which should cause that much of overhead since in this scenario data is already in unint8 format.

@Nic-Ma are you aware of any overhead that LoadImage may cause?

@Nic-Ma
Copy link
Contributor

Nic-Ma commented Nov 12, 2021

Hi @myron @drbeh ,

Thanks for the interesting feedback.
Could you please help test again with image = wsi_reader.get_data(img_obj)[0].astype(np.uint8)?

Thanks in advance.

@myron
Copy link
Collaborator Author

myron commented Nov 12, 2021

wsi_reader =WSIReader( backend="TiffFile", level=1)
transform =  LoadImageD(keys=["image"], reader=WSIReader, backend="TiffFile", dtype=np.uint8, level=1, image_only=True)

wsi_time = []
wsi_time2 = []
wsi_time3 = []
transform_time=[]

for i in range(100):
    start = time.time()
    img_obj = wsi_reader.read(image_path)
    image = wsi_reader.get_data(img_obj)[0]
    wsi_time.append(time.time()-start)


    start = time.time()
    img_obj = wsi_reader.read(image_path)
    image = wsi_reader.get_data(img_obj)[0].astype(np.uint8)
    wsi_time2.append(time.time()-start)

    start = time.time()
    img_obj = wsi_reader.read(image_path)
    image = wsi_reader.get_data(img_obj)[0]
    if image.dtype!=np.uint8:
        image = image.astype(np.uint8)
    wsi_time3.append(time.time()-start)


    start = time.time()
    image = transform({"image" : image_path})['image']
    transform_time.append(time.time()-start)


print('wsi time', sum(wsi_time[1:])/(len(wsi_time)-1), 'wsi time2', sum(wsi_time2[1:])/(len(wsi_time2)-1), 'wsi time3', sum(wsi_time3[1:])/(len(wsi_time2)-1), 'transform_time time', sum(transform_time[1:])/(len(transform_time)-1))


result
wsi time 0.022537204954359267 wsi time2 0.028219649285981148 wsi time3 0.022044388934819387 transform_time time 0.027675144600145744

summary:
adding .astype(np.uint8) indeed makes it slower even if data is already np.uint8
we can solve it by checking for the type first

    if image.dtype!=np.uint8:
        image = image.astype(np.uint8)

this runs fast. I suggest we make a PR for it, we can compare if dtype matches the provided type, and if they don't match do the conversion

@Nic-Ma
Copy link
Contributor

Nic-Ma commented Nov 12, 2021

Hi @myron ,

Thanks for your interesting experiments.
Let me try to investigate deeper soon.

Thanks.

@Nic-Ma
Copy link
Contributor

Nic-Ma commented Nov 12, 2021

Mark: I will try to check all the transforms for this converting issue next week.

Thanks.

@cgohlke
Copy link

cgohlke commented Nov 12, 2021

numpy.ndarray.astype makes a copy by default. Try the copy=False parameter.

@myron
Copy link
Collaborator Author

myron commented Nov 12, 2021

numpy.ndarray.astype makes a copy by default. Try the copy=False parameter.

how interesting. I wasn't aware of that

@Nic-Ma @wyli I think we should use astype(copy=False) everywhere in the Monai codebase, I see no downsides only an advantage. Especially in all the loading/augmentation transforms. We'll get performance improvements, and and less memory peak pressure

@Nic-Ma
Copy link
Contributor

Nic-Ma commented Nov 12, 2021

Hi @myron ,

Thanks for your suggestion, I will try to make it clear next week, also for Tensor data.

Thanks.

@Nic-Ma
Copy link
Contributor

Nic-Ma commented Nov 15, 2021

Hi @myron ,

Could you please help try again with np.asarray(data, dtype=mp.uint8)?
I think this function can automatically avoid copying if data is already in uint8.
If works fine, I will try to check all the astype in codebase and optimize in PR #3338 for the cases that may have the same dtype for input and output.

Thanks in advance.

@myron
Copy link
Collaborator Author

myron commented Nov 15, 2021

I tried it and np.asarray(data, dtype=np.uint8) is Fast (as fast as not attempting the data conversion)
but why do you think it's better than astype(copy=False) ?

@Nic-Ma
Copy link
Contributor

Nic-Ma commented Nov 15, 2021

Hi @myron ,

Sorry maybe I didn't make it clear.

  1. I checked our codebase and found that many places use np.asarray(XXXX).astype(XXX), I will change this code to np.asarray(XXX, dtype=XXX).
  2. And also change the places that may have same dtype for input / output to XXX.astype(XXX, copy=False) as you suggested.
  3. For the places that obviously change the dtype, I will not update.

Thanks.

@myron
Copy link
Collaborator Author

myron commented Nov 16, 2021

ok, nice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
No open projects
Status: Done
Development

Successfully merging a pull request may close this issue.

4 participants