Slowness in Item.to_dict(), seemingly from links #546

@TomAugspurger

I'm trying to diagnose a slowdown I'm observing in Item.to_dict(). Right now I'm unsure whether the slowdown is due to a change in pystac or in our STAC endpoint. Either way, if I call clear_links() before to_dict(), things are much faster (wall time goes from about 1s to about 200µs):

In [1]: import pystac

In [2]: item = pystac.Item.from_file("https://planetarycomputer-staging.microsoft.com/api/stac/v1/collections/sentinel-2-l2a/items/S2A_MSIL2A_20160208T190542_R013_T10TET")
   ...: %time _ = item.to_dict()
CPU times: user 25 ms, sys: 339 µs, total: 25.4 ms
Wall time: 987 ms

In [3]: item = pystac.Item.from_file("https://planetarycomputer-staging.microsoft.com/api/stac/v1/collections/sentinel-2-l2a/items/S2A_MSIL2A_20160208T190542_R013_T10TET")
   ...: item.clear_links()
   ...: %time _ = item.to_dict()
CPU times: user 167 µs, sys: 18 µs, total: 185 µs
Wall time: 192 µs
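In the first run the CPU time (about 25 ms) is far below the wall time (about 987 ms), which hints that most of the time goes to waiting on something such as I/O rather than computing. A minimal sketch for checking this outside IPython, using only the standard library; `time_call` and `looks_io_bound` are hypothetical helper names, and the 5x threshold is an arbitrary assumption:

```python
import time


def time_call(func, *args, **kwargs):
    """Run func and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # CPU time used by this process
    result = func(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


def looks_io_bound(wall, cpu, ratio=5.0):
    """Heuristic (assumption): wall time much larger than CPU time
    suggests the call spent most of its time waiting."""
    return wall > ratio * cpu


# Numbers taken from the two runs above.
print(looks_io_bound(0.987, 0.0254))   # run with links present
print(looks_io_bound(192e-6, 185e-6))  # run after clear_links()
```

With a loaded item, `time_call(item.to_dict)` would return the same pair of measurements that `%time` prints.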

So a couple questions:

  1. Why is having those links present slowing things down? Glancing at the output of to_dict, I don't see why it would need to make an HTTP call or anything like that, right? Maybe through some chain calling link.get_href()?
  2. Have there been any recent changes that would have introduced this slowdown? We've also been making changes to that STAC endpoint that could have slowed things down. Brief testing shows that RC1 through RC3 all have similar performance against this same endpoint (about 1s for to_dict()).
  3. to_dict() has a parameter include_self_link. Could we add a parameter to exclude all links? For my application, I only need the assets and extension info, so making requests to fetch the link
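Until something like the parameter in question 3 exists, the clear_links() workaround above could be wrapped so that it does not permanently strip the item's links. This is only a sketch: `to_dict_without_links` is a hypothetical helper, not part of pystac, and it assumes that `item.links` is a plain list attribute that can be reassigned and that restoring it does not itself trigger any link resolution. It has not been checked against the endpoint above.

```python
from typing import Any, Dict


def to_dict_without_links(item: Any) -> Dict[str, Any]:
    """Serialize an item with its links temporarily removed.

    `item` is assumed to behave like a pystac.Item: it has a `links`
    list, a `clear_links()` method and a `to_dict()` method.
    """
    saved_links = list(item.links)  # copy, in case clear_links() empties the list in place
    item.clear_links()
    try:
        return item.to_dict()
    finally:
        # Put the original links back (assumes `links` is assignable).
        item.links = saved_links
```

Usage would be `d = to_dict_without_links(item)`; the item keeps its links afterwards, while the returned dictionary is built without them.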
