-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
*: Rearrange vrf_bitmap_X api to reduce memory footprint #13333
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, 3500 seems like ... a lot...
Had a couple of questions.
@@ -384,54 +384,74 @@ static void vrf_hash_bitmap_free(void *data) | |||
XFREE(MTYPE_VRF_BITMAP, bit); | |||
} | |||
|
|||
vrf_bitmap_t vrf_bitmap_init(void) | |||
void vrf_bitmap_init(vrf_bitmap_t *pbmap) | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just "return NULL" ? that'd reduce the lines changed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again I was going for consistency of api
return; | ||
|
||
if (!*pbmap) | ||
*pbmap = vrf_hash = | ||
hash_create_size(2, vrf_hash_bitmap_key, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a little concerned about the magic number '2', and about reducing the initial size so much. it'll take, what is it, four 'expand' cycles to get a hash back to the existing initial size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how often do you expect people to have > 16 vrf's configured on a box -vs- only having 1 or 2? I would bet most of our use case is just having a couple, in which case 32 was way over kill
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm - well if that's true then ... why use a hash at all - maybe we should just use a simpler scheme with a smaller footprint?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not aware of anything else that would be simpler here. o(1) lookups are pretty important imo.
I need to have a bit set for an arbitrary vrf id, which can range the entirety of the uint32_t space. Original (naive) implementation created an actual bit based upon the ifindex value. We ended up with bitmaps being absolutely huge( something like 64 mb per process per table ). The original implementation over to a hash reduced it to several hundred k's of bytes. I just want to reduce it further
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, ok, just wanted to ask
lib/vrf.c
Outdated
return; | ||
|
||
vrf_hash = *pbmap; | ||
|
||
bit = hash_get(vrf_hash, &lookup, vrf_hash_bitmap_alloc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems like a dicey place in the existing code: allocate a new hash that you know will not have any entries. could this be cleaned up too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand your comment? I'm not allocating a hash on a unset event if the hash does not exist here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was suggesting ... not doing the alloc in this path - no point in allocating a new, empty hash on an 'unset' operation at all, at any time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah I understand now sorry
bit = hash_get(vrf_hash, &lookup, vrf_hash_bitmap_alloc); | ||
bit->set = false; | ||
} | ||
|
||
int vrf_bitmap_check(vrf_bitmap_t bmap, vrf_id_t vrf_id) | ||
int vrf_bitmap_check(vrf_bitmap_t *pbmap, vrf_id_t vrf_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why change the signature of this api - we won't allocate in this path?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to make it extremely consistent imo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, ok, but ... it makes the impact of the change meaningfully greater. when I see the pointer-to-pointer arg I think that the called api may ... make a change, but that's not the case here. I guess that's setting off some dissonance for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should have a const
, but I am in favor of in fact making it a pointer. That way it behaves like a container data structure — all of those always take pointers, even if some read access could work with a by-value parameter.
Having it as a pointer also gives us more freedom to do fancy things with the underlying data structure.
But, as a final caveat, I'd like to move vrf_bitmap_check
from vrf.c
to vrf.h
and make it a static inline
, exactly to eliminate the cost of it taking a pointer argument. Particularly since vrf_bitmap_check
is used quite a bit in hot paths.
P.S.: there's always the [1]
array trick if we want to avoid adding &
to call sites.
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopotests Ubuntu 18.04 amd64 part 9: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO9U18AMD64-10975/test Topology Tests failed for Topotests Ubuntu 18.04 amd64 part 9 Successful on other platforms/tests
|
Relevant Slack conversation snippet:
|
18bf448
to
450fa34
Compare
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopotests Ubuntu 18.04 amd64 part 6: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO6U18AMD64-11264/test Topology Tests failed for Topotests Ubuntu 18.04 amd64 part 6 Topotests Ubuntu 18.04 i386 part 6: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO6U18I386-11264/test Topology Tests failed for Topotests Ubuntu 18.04 i386 part 6 Topotests Ubuntu 18.04 arm8 part 6: Failed (click for details)Topotests Ubuntu 18.04 arm8 part 6: Unknown Log URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-11264/artifact/TOPO6U18ARM8/TopotestDetails/ Topotests Ubuntu 18.04 arm8 part 6: No useful log foundTopotests debian 10 amd64 part 6: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO6DEB10AMD64-11264/test Topology Tests failed for Topotests debian 10 amd64 part 6 Successful on other platforms/tests
|
450fa34
to
d7ea235
Compare
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopotests Ubuntu 18.04 amd64 part 6: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO6U18AMD64-11463/test Topology Tests failed for Topotests Ubuntu 18.04 amd64 part 6 Successful on other platforms/tests
|
c365bca
to
ed7b2a7
Compare
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopotests Ubuntu 18.04 amd64 part 9: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO9U18AMD64-12301/test Topology Tests failed for Topotests Ubuntu 18.04 amd64 part 9 Topotests Ubuntu 18.04 i386 part 3: Failed (click for details)Topotests Ubuntu 18.04 i386 part 3: Unknown Log URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12301/artifact/TOPO3U18I386/TopotestDetails/Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO3U18I386-12301/test Topology Tests failed for Topotests Ubuntu 18.04 i386 part 3 Successful on other platforms/tests
|
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopotests Ubuntu 18.04 arm8 part 9: Failed (click for details)Topotests Ubuntu 18.04 arm8 part 9: Unknown Log URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12304/artifact/TOPO9U18ARM8/TopotestDetails/ Topotests Ubuntu 18.04 arm8 part 9: No useful log foundSuccessful on other platforms/tests
|
ed7b2a7
to
fbcb2e6
Compare
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopotests Ubuntu 18.04 arm8 part 9: Failed (click for details)Topotests Ubuntu 18.04 arm8 part 9: Unknown Log URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12324/artifact/TOPO9U18ARM8/TopotestDetails/ Topotests Ubuntu 18.04 arm8 part 9: No useful log foundSuccessful on other platforms/tests
|
ci:rerun |
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopotests Ubuntu 18.04 i386 part 3: Failed (click for details)Topotests Ubuntu 18.04 i386 part 3: Unknown Log URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12338/artifact/TOPO3U18I386/TopotestDetails/Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO3U18I386-12338/test Topology Tests failed for Topotests Ubuntu 18.04 i386 part 3 Topotests debian 10 amd64 part 3: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO3DEB10AMD64-12338/test Topology Tests failed for Topotests debian 10 amd64 part 3 Topotests Ubuntu 18.04 arm8 part 7: Failed (click for details)Topotests Ubuntu 18.04 arm8 part 7: Unknown Log URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12338/artifact/TOPO7U18ARM8/TopotestDetails/ Topotests Ubuntu 18.04 arm8 part 7: No useful log foundSuccessful on other platforms/tests
|
ci:rerun |
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopotests Ubuntu 18.04 i386 part 3: Failed (click for details)Topotests Ubuntu 18.04 i386 part 3: Unknown Log URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12340/artifact/TOPO3U18I386/TopotestDetails/Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO3U18I386-12340/test Topology Tests failed for Topotests Ubuntu 18.04 i386 part 3 Topotests debian 10 amd64 part 9: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO9DEB10AMD64-12340/test Topology Tests failed for Topotests debian 10 amd64 part 9 Topotests Ubuntu 18.04 arm8 part 9: Failed (click for details)Topotests Ubuntu 18.04 arm8 part 9: Unknown Log URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12340/artifact/TOPO9U18ARM8/TopotestDetails/ Topotests Ubuntu 18.04 arm8 part 9: No useful log foundSuccessful on other platforms/tests
|
fbcb2e6
to
c3f9ce3
Compare
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopotests Ubuntu 18.04 i386 part 3: Failed (click for details)Topotests Ubuntu 18.04 i386 part 3: Unknown Log URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12349/artifact/TOPO3U18I386/TopotestDetails/Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO3U18I386-12349/test Topology Tests failed for Topotests Ubuntu 18.04 i386 part 3 Topotests Ubuntu 18.04 arm8 part 9: Failed (click for details)Topotests Ubuntu 18.04 arm8 part 9: Unknown Log URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12349/artifact/TOPO9U18ARM8/TopotestDetails/ Topotests Ubuntu 18.04 arm8 part 9: No useful log foundTopotests Ubuntu 18.04 amd64 part 9: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO9U18AMD64-12349/test Topology Tests failed for Topotests Ubuntu 18.04 amd64 part 9 Successful on other platforms/tests
|
ci:rerun |
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopotests debian 10 amd64 part 9: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO9DEB10AMD64-12353/test Topology Tests failed for Topotests debian 10 amd64 part 9 Topotests Ubuntu 18.04 arm8 part 9: Failed (click for details)Topotests Ubuntu 18.04 arm8 part 9: Unknown Log URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12353/artifact/TOPO9U18ARM8/TopotestDetails/ Topotests Ubuntu 18.04 arm8 part 9: No useful log foundSuccessful on other platforms/tests
|
ci:rerun |
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopotests debian 10 amd64 part 9: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO9DEB10AMD64-12359/test Topology Tests failed for Topotests debian 10 amd64 part 9 Successful on other platforms/tests
|
c3f9ce3
to
9568ff7
Compare
When running all daemons with config for most of them, FRR has sharpd@janelle:~/frr$ vtysh -c "show debug hashtable" | grep "VRF BIT HASH" | wc -l 3570 3570 hashes for bitmaps associated with the vrf. This is a very large number of hashes. Let's do two things: a) Reduce the created size of the actually created hashes to 2 instead of 32. b) Delay generation of the hash *until* a set operation happens. As that no hash directly implies a unset value if/when checked. This reduces the number of hashes to 61 in my setup for normal operation. Signed-off-by: Donald Sharp <sharpd@nvidia.com>
9568ff7
to
161972c
Compare
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: FailedCentOS 7 amd64 build: Failed (click for details)CentOS 7 amd64 build: Unknown Log URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12558/artifact/CI005BUILD/frr.xref.xz/frr.xref.xz CentOS 7 amd64 build: Unknown Log URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12558/artifact/CI005BUILD/config.log/config.log.gz CentOS 7 amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-PULLREQ2-12558/artifact/CI005BUILD/config.status/config.status CentOS 7 amd64 build: Unknown Log URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12558/artifact/CI005BUILD/ErrorLog/ CentOS 7 amd64 build: No useful log foundRedhat 9 amd64 build: Failed (click for details)Redhat 9 amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-PULLREQ2-12558/artifact/RH9BUILD/config.status/config.status Redhat 9 amd64 build: Unknown Log URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12558/artifact/RH9BUILD/ErrorLog/ Redhat 9 amd64 build: Unknown Log URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12558/artifact/RH9BUILD/config.log/config.log.gz Redhat 9 amd64 build: Unknown Log URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12558/artifact/RH9BUILD/frr.xref.xz/frr.xref.xz Redhat 9 amd64 build: No useful log foundRedhat 8 amd64 build: Failed (click for details)Redhat 8 amd64 build: config.status output from configure script can be found at https://ci1.netdef.org/browse/FRR-PULLREQ2-12558/artifact/REDHAT8/config.status/config.status Redhat 8 amd64 build: Unknown Log URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12558/artifact/REDHAT8/ErrorLog/ Redhat 8 amd64 build: Unknown Log URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12558/artifact/REDHAT8/config.log/config.log.gz Redhat 8 amd64 build: Unknown Log URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12558/artifact/REDHAT8/frr.xref.xz/frr.xref.xz Redhat 8 amd64 build: No useful log foundSuccessful on other platforms/tests
|
ci:rerun libyang2 issue |
Continuous Integration Result: FAILEDContinuous Integration Result: FAILEDSee below for issues. This is a comment from an automated CI system. Get source / Pull Request: SuccessfulBuilding Stage: SuccessfulBasic Tests: FailedTopotests Ubuntu 18.04 i386 part 9: Failed (click for details)Topology Test Results are at https://ci1.netdef.org/browse/FRR-PULLREQ2-TOPO9U18I386-12557/test Topology Tests failed for Topotests Ubuntu 18.04 i386 part 9 Successful on other platforms/tests
|
Continuous Integration Result: SUCCESSFULCongratulations, this patch passed basic tests Tested-by: NetDEF / OpenSourceRouting.org CI System CI System Testrun URL: https://ci1.netdef.org/browse/FRR-PULLREQ2-12559/ This is a comment from an automated CI system. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My questions were answered, approving
When running all daemons with config for most of them, FRR has sharpd@janelle:~/frr$ vtysh -c "show debug hashtable" | grep "VRF BIT HASH" | wc -l 3570
3570 hashes for bitmaps associated with the vrf. This is a very large number of hashes. Let's do two things:
a) Reduce the created size of the actually created hashes to 2 instead of 32.
b) Delay generation of the hash until a set operation happens. As that no hash directly implies a unset value if/when checked.
This reduces the number of hashes to 61 in my setup for normal operation.