Unexpected behaviour of generate_subcatalogs #240

@fnattino

Hello, first of all thanks so much for this fantastic library - it's really great! I observed some unexpected behaviour in the Catalog generate_subcatalogs method and thought it might be worth reporting.

I am setting up a local catalog with some items organised in a sub-catalog structure. I would like to add items at different stages (e.g. whenever new assets become available). I would have expected that calling the generate_subcatalogs method every time I add new items to the root catalog would do the job. However, every time the method is called, the full sub-catalog structure is replicated. Here's a minimal working example:

import pystac 
from datetime import datetime

# initialise my catalog
catalog = pystac.Catalog(id='my-catalog', description='My Awesome Catalog')

properties = dict(first=1, second=2, third=3)

# add first set of items 
for id in ['A', 'B', 'C']:
    item = pystac.Item(id, None, None, datetime.utcnow(), properties)
    catalog.add_item(item)

# generate sub-catalog structure
catalog.generate_subcatalogs(template='${first}/${second}/${third}')
catalog.describe()
# * <Catalog id=my-catalog>
#     * <Catalog id=1>
#         * <Catalog id=2>
#             * <Catalog id=3>
#               * <Item id=A>
#               * <Item id=B>
#               * <Item id=C>

# add more items
for id in ['D', 'E', 'F']:
    item = pystac.Item(id, None, None, datetime.utcnow(), properties)
    catalog.add_item(item)
catalog.generate_subcatalogs(template='${first}/${second}/${third}')
catalog.describe()
# * <Catalog id=my-catalog>
#     * <Catalog id=1>
#         * <Catalog id=2>
#             * <Catalog id=3>
#                 * <Catalog id=1>
#                     * <Catalog id=2>
#                         * <Catalog id=3>
#                           * <Item id=A>
#                           * <Item id=B>
#                           * <Item id=C>
#     * <Catalog id=1>
#         * <Catalog id=2>
#             * <Catalog id=3>
#               * <Item id=D>
#               * <Item id=E>
#               * <Item id=F>

while I would like to obtain the following:

# * <Catalog id=my-catalog>
#     * <Catalog id=1>
#         * <Catalog id=2>
#             * <Catalog id=3>
#               * <Item id=A>
#               * <Item id=B>
#               * <Item id=C>
#               * <Item id=D>
#               * <Item id=E>
#               * <Item id=F>

If you agree this is unexpected, a possible fix would be to change how the sub-catalog dictionary in generate_subcatalogs is initialised. Instead of starting out empty:

subcat_id_to_cat = {}

it could be seeded with the current catalog and all of its ancestors:

subcat_id_to_cat = {}
curr_parent = self
while curr_parent is not None:
    subcat_id_to_cat[curr_parent.id] = curr_parent
    curr_parent = curr_parent.get_parent()

In this way, sub-catalog levels that are already present are not replicated.
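
To make the expected behaviour concrete, here is a minimal sketch in plain Python. It does not use PySTAC: `Node`, `place_items` and `describe` are hypothetical stand-ins written only for this illustration, and the levels are passed as a list of property names rather than as a template string. It also uses a slightly different technique from the dictionary seeding above: it looks up existing children by id at each level and only creates the ones that are missing, and it does not recurse into existing children.

```python
class Node:
    """Hypothetical stand-in for a catalog (not a PySTAC class)."""

    def __init__(self, id):
        self.id = id
        self.children = {}  # child id -> Node
        self.items = []     # (item id, properties) pairs held directly by this node


def place_items(root, keys):
    """Move the items held directly by `root` into nested sub-nodes.

    `keys` lists the property names that define the levels, e.g.
    ['first', 'second', 'third'] for '${first}/${second}/${third}'.
    """
    pending, root.items = root.items, []
    for item_id, properties in pending:
        parent = root
        for key in keys:
            child_id = str(properties[key])
            child = parent.children.get(child_id)
            if child is None:
                # Create a level only when it is missing; otherwise reuse it.
                child = Node(child_id)
                parent.children[child_id] = child
            parent = child
        parent.items.append((item_id, properties))


def describe(node, indent=0):
    """Return the tree as a list of indented lines."""
    lines = [' ' * indent + f'* <Node id={node.id}>']
    for child in node.children.values():
        lines.extend(describe(child, indent + 4))
    for item_id, _ in node.items:
        lines.append(' ' * (indent + 2) + f'* <Item id={item_id}>')
    return lines


properties = dict(first=1, second=2, third=3)
levels = ['first', 'second', 'third']
root = Node('my-catalog')

# Add items in two stages and regroup after each stage.
for batch in (['A', 'B', 'C'], ['D', 'E', 'F']):
    for item_id in batch:
        root.items.append((item_id, properties))
    place_items(root, levels)

print('\n'.join(describe(root)))
```

Because the second call reuses the existing 1/2/3 chain, all six items end up under the same innermost node, which is the layout I would like generate_subcatalogs to produce.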
Thanks in advance!
