Skip to content

Conversation

@0x2b3bfa0
Copy link
Contributor

@0x2b3bfa0 0x2b3bfa0 commented Dec 16, 2023

As @skshetry proposed on #10178 (comment), deduplicating key_paths would be enough, because dpath.search with distinct keys will never output the same object twice.

Important

This patch seems to fix #10177 and some other creative variants I wrote, but I'm still afraid of unforeseen edge cases; not sure if this is a good idea.

@0x2b3bfa0 0x2b3bfa0 added the A: params Related to dvc params label Dec 16, 2023
@0x2b3bfa0 0x2b3bfa0 requested a review from skshetry December 16, 2023 17:30
@0x2b3bfa0 0x2b3bfa0 self-assigned this Dec 16, 2023
@codecov
Copy link

codecov bot commented Dec 16, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (abcd70b) 90.61% compared to head (ea173bb) 90.61%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #10180   +/-   ##
=======================================
  Coverage   90.61%   90.61%           
=======================================
  Files         499      499           
  Lines       37881    37881           
  Branches     5505     5505           
=======================================
  Hits        34327    34327           
  Misses       2912     2912           
  Partials      642      642           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@skshetry
Copy link
Collaborator

skshetry commented Dec 17, 2023

As I said in #10178 (comment), this won't fix the issue because different keypaths could still reference to same list.

params:
- references
- references.references
- references.references.references
- references.references.references.references
- references.references.references.references.references
- references.references.references.references.references.references

I think there's no way around deepcopying. I'll suggest restoring that, and also making key_paths distinct (to reduce too much deep-copying).

@0x2b3bfa0
Copy link
Contributor Author

0x2b3bfa0 commented Dec 17, 2023

As I said in #10178 (comment), this won't fix the issue because different keypaths could still reference to same list.

Do you have a minimal example for that? I have been unable to reproduce it, and can't think of any scenario where merge could use the same reference with all the keys being distinct. 🙈

@skshetry
Copy link
Collaborator

The childrens of the values that keypath/xpath points to could be same.

# dvc.yaml
stages:
  example:
    cmd: test
    params:
    - custom.yaml:
      - references
      - references.references
      - references.references.references
      - references.references.references.references
      - references.references.references.references.references
      - references.references.references.references.references.references
# custom.yaml
values: &anchor [0, 1, 2, 3]

references:
  references:
    references:
      references:
        references:
          references:
            references:
              first: *anchor
              second: *anchor
              third: *anchor
              fourth: *anchor
              fifth: *anchor
              sixth: *anchor
              seventh: *anchor
              eighth: *anchor
              ninth: *anchor
              tenth: *anchor
              eleventh: *anchor

@0x2b3bfa0 0x2b3bfa0 changed the title param: simplify fix for exponential memory allocation param: optimize fix for exponential memory allocation Dec 18, 2023
@0x2b3bfa0 0x2b3bfa0 requested a review from skshetry December 18, 2023 02:48
@0x2b3bfa0 0x2b3bfa0 enabled auto-merge (squash) December 18, 2023 02:52
@skshetry skshetry merged commit 72ab5a1 into main Dec 18, 2023
@skshetry skshetry deleted the simplify-fix branch December 18, 2023 02:52
@0x2b3bfa0
Copy link
Contributor Author

0x2b3bfa0 commented Dec 18, 2023

Thanks! I didn't know that dpath.search had that behavior. 🤦🏼‍♂️

@skshetry
Copy link
Collaborator

skshetry commented Dec 18, 2023

We could try pruning sub-paths here, but with aliases and anchors, I cannot say with confidence if that would be enough. So, I think it's better to deepcopy and detach from shared references.

@0x2b3bfa0 0x2b3bfa0 added enhancement Enhances DVC p3-nice-to-have It should be done this or next sprint labels Dec 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A: params Related to dvc params enhancement Enhances DVC p3-nice-to-have It should be done this or next sprint

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Exponential memory allocation caused by YAML parameter files with reused anchors

2 participants