Skip to content

optimization in av_extend_guts #18681

Open
@hvds

Description

@hvds

As a side note in #18667 I mentioned: I'm also confused why we check if (key > *maxp - 10) - in this case, we have key <= *maxp so we could get away without reallocing at all. This seems like it will only ever cause unnecessary reallocs as well as needing to copy the array contents twice each time, but it has (impressively) been in there since a687059, when this file was still called array.c.

@richardleach commented:

Yes, I had also wondered why that was the case. IIRC, I queried it in the PR (#18072) for the changes above, but noone perked up with suggestions as to why that is as it is. So I left it as-is.

Oh, one possibility: if the shift & unshift/push happens lots in a loop, extending it might reduce the number of times that av_extend() has to be called.

I think the more general logic of av_extend_guts is worth examining, and either changing or commenting.

If the array is shifted - typically because we've called shift() on it, or splice()d something out of the early parts of the array - av_extend_guts will always move the contents of the array to undo any shift before considering any other possibility. It will then additionally realloc the array to make room for more elements if there is now room for fewer than 10 more elements to be added.

This seems unlikely to be optimal. I note that the behaviour was added as part of perl v3.0, long before the concept of things like pad arrays was added. It seems like an area worth more careful study.

If I read the code correctly, whenever we find we need to extend the array from current_max to new_max, we actually extend it to new_max + current_max / 5; that happens in the primary pathway for an unshifted but non-empty array, and my initial assumption is that that should be enough "extra" allocing.

There are rather a lot of places from which av_extend() is called (which will always end up going to av_extend_guts() if the array is not magical); most of those callers precede it with one of a wide variety of checks before calling it.

Can we unify those checks, perhaps with some new macros?

Should one of av_extend or av_extend_guts check whether there is already room for the requested elements, and return in that case with no changes? (The answer to this probably depends on where we end up with the checks in the previous question.)

Should we avoid extending the array (with the resulting additional copy) when we don't need to? (Ie change the existing key > *maxp - 10 check to key > *maxp.)

I haven't checked CPAN for other direct calls to av_extend or av_extend_guts, should we look at those too to determine optimal strategies?

@iabyn I'm particularly interested whether this is something you've already looked at in your prior optimization work.

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions