-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
zebra: zebra crash for zapi stream #18359
base: master
Are you sure you want to change the base?
Conversation
7c7ee72
to
b392bef
Compare
lib/zclient.c
Outdated
@@ -1086,6 +1086,93 @@ static void zapi_nexthop_group_sort(struct zapi_nexthop *nh_grp, | |||
&zapi_nexthop_cmp); | |||
} | |||
|
|||
/* size needed by a stream for a zapi nexthop */ | |||
size_t zapi_nh_encode_stream_size(const struct zapi_nexthop *api_nh) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, no, it's not maintainable/safe to try to capture the sizes of our structs in code in this way.
can you take a look at the arithmetic approach that was recently added for the zapi_route struct, and see whether that or some adaptation of it could work here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That model is not directly applicable here. 1) In other model, they have bigger ref max size which is zebra route size(there they are encoding route, not nexthop), does msg_size = sizeof(struct zapi_route); then msg_size - nh size of unused struct zapi_nexthop, in my case, I dont have any reference to start except the max ZEBRA_MAX_PACKET_SIZ, size of zapi_nexthop == 320 bytes, for 520 it is 520x320 >150000, which is more than ZEBRA_MAX_PACKET_SIZ, so I can't do ZEBRA_MAX_PACKET_SIZ - no_of_unused_nhx320. (memory requirement in negative size) 2) Also for other model, it was just lucky that sizeof(struct zapi_route) ~= 320000, so that assuming sizeof(struct zapi_route) - no_of_unused_nh*320, will always work, but in my view this assumption itself can be wrong, considering there is variable part(pointer,array) in zapi_nexthop, i.e szieof(zapi_nexthop) may not reflect the actual size we need to encode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm- I'll try to say what I meant in a different way.
the receiving side of this zapi message in lib/zclient uses struct zapi_route
- and sizeof(struct zapi_route)
is a good proxy for the maximum message size you need - isn't it? there are a couple of fields that aren't used in this specific path, but those are not large, mostly. the 'opaque' blob is the exception, and you can math that out? you'll be left with the fixed fields, and the embedded nexthops. you can also math out the right number of regular and backup nexthops. since those are used on the decode side, they're a good proxy for the space needed on the encode side?
b392bef
to
d1a9970
Compare
to go over the phone conversation I had: a) adding a expandable ability to stream is desired. There should be a new |
65bf686
to
16b9835
Compare
Done |
lib/stream.h
Outdated
@@ -115,6 +116,8 @@ struct stream_fifo { | |||
#define STREAM_SIZE(S) ((S)->size) | |||
/* number of bytes which can still be written */ | |||
#define STREAM_WRITEABLE(S) ((S)->size - (S)->endp) | |||
/* Extra size needed for a stream in bytes, given new write size */ | |||
#define EXPAND_SIZE(S, WSZ) ((WSZ)-STREAM_WRITEABLE(S)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason to expose this computation?
what will happen if someone checks a stream and there is enough space - if WRITEABLE
is > the increment?
and ... if you really think you have to add a macro, please follow the existing pattern and prefix it with STREAM_
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, no need to expose, let me change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved in stream.c its mostly for internal usage in lib itself.
lib/stream.h
Outdated
extern void stream_free(struct stream *); | ||
/* Copy 'src' into 'dest', returns 'dest' */ | ||
extern struct stream *stream_copy(struct stream *dest, | ||
const struct stream *src); | ||
extern struct stream *stream_dup(const struct stream *s); | ||
|
||
extern size_t stream_resize_inplace(struct stream **sptr, size_t newsize); | ||
extern size_t stream_resize_inplace(struct stream *sptr, size_t newsize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why change the signature - doesn't that increase the impact of the changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
** was need anymore, * is enough after we realloc only for *sptr->data, but true, this has bigger impact, let me work to change the signature to original one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted back to original signature
lib/stream.c
Outdated
if (!s->allow_expansion) | ||
return false; | ||
|
||
if ((needed_size == 0) || (needed_size > MAX_STREAM_EXPANSION_SZ)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, I think if the code needs the buffer to grow, it should grow. what's the point of failing to allocate 4K + 1, or 32K, etc?
the point, for me, of adding a bit of arithmetic to the code is to avoid repeated reallocs of a few octets, not to prevent code from allocating a relatively small initial buffer but discovering later that it needs a large one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can have different size for MAX_STREAM_EXPANSION_SZ, say it to zapi_route size or some other agreed value, but without having limit/check, this is library code, so anyone can send any request, that may be some very big size , say X due to some bug. Now say the system had just enough memory Y > X. so it will satisfy, so no issue so far, as no assert. But as Y-X too little, then subsequent all memory allocations can fail, where it may be very difficult find out, it was actually started by some buggy request before?. here at least caller can assert as memory request will fail, so it will pin point when/where that bad big memory request came from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your concern is misplaced: we allow stream_new()
to allocate a large buffer, don't we? this new behavior shouldn't impose any limitation that the existing api imposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed code.
16b9835
to
79c28ca
Compare
please fix the python style errors? |
8fb2c5c
to
0a25d5a
Compare
Done |
lib/stream.c
Outdated
static void stream_expand(struct stream *s, size_t expand_size) | ||
{ | ||
/* | ||
* Growth strategy: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment block looks mis-aligned - is that an artifact of the webui, or is it actually wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran git clang-format on this file, added whatever got. Please check latest.
lib/stream.c
Outdated
* For small expansions (<= min expansion bytes): grow by min size | ||
* otherwise grow by needed size | ||
*/ | ||
size_t actual_expand_size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's put the declaration at the start of the function, and maybe just init it to the default size? that will simplify the if
clause
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
lib/stream.c
Outdated
} | ||
|
||
/* Calculate new total size */ | ||
size_t new_size = s->size + actual_expand_size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, variable at the start of the function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
0a25d5a
to
1c217c0
Compare
lib/stream.c
Outdated
size_t actual_expand_size = expand_size; | ||
|
||
/* Growth strategy: | ||
* For small expansions (<= min expansion bytes): grow by min size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gosh, I don't know what to say about this: please indent things one level, with a tab. doesn't it look wrong in your editor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry it didnt get added during push, now pushed it.
@@ -1158,7 +1158,6 @@ zclient_send_rnh(struct zclient *zclient, int command, const struct prefix *p, | |||
vrf_id_t vrf_id); | |||
int zapi_nexthop_encode(struct stream *s, const struct zapi_nexthop *api_nh, | |||
uint32_t api_flags, uint32_t api_message); | |||
extern int zapi_redistribute_stream_size(struct zapi_route *api); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh man - I'm sorry to say I just registered what you've done here.
The purpose of this function was to compute a smaller stream buffer size, to avoid using a large buffer in a situation where that led to an excessive memory footprint. That's sort of the reverse of your case, where you want a really big buffer to satisfy your requirement in a different code path. I think you should leave this unchanged, and only change your own path at this time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1)Main purpose is to allocate memory optimum way 2) zapi_redistribute_stream_size will still return 150K memory for say 512 nexthops, whereas for that we need only ~20k for next hop encoding, same thing for which we agreed to not do arithmetic calculation for nexthop encoding, and used new api for expand size 3) so I was suggested to fix this part too using new api for expand, ie revert back original other commit change 4) new expand api is not for to satisfy big buffer size, rather increase buffer amount as per need 5) if you think it's risky I can revert back , but I am just making sure about the behavior and expectation.
@@ -610,8 +609,7 @@ int zsend_redistribute_route(int cmd, struct zserv *client, const struct route_n | |||
SET_FLAG(api.message, ZAPI_MESSAGE_MTU); | |||
api.mtu = re->mtu; | |||
|
|||
stream_size = zapi_redistribute_stream_size(&api); | |||
struct stream *s = stream_new(stream_size); | |||
struct stream *s = stream_new_expandable(ZEBRA_MAX_PACKET_SIZ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is what I referred to in another comment: we don't want to go back to the behavior that led to a problem in this code path.
Issue: Currently, during encode time, if required memory is more than available space in stream buffer, stream buffer can't be expanded. This fix introduces new apis to support stream buffer expansion. Testing: Tested with zebra nexthop encoding with 512 nexthops, which triggers this new code changes, it works fine. Without fix, for same trigger it asserts. Signed-off-by: Soumya Roy <souroy@nvidia.com>
Signed-off-by: Soumya Roy <souroy@nvidia.com>
Issue: If static route is created with a BGP route as nexthop, which recursively resolves over 512 ECMP v6 nexthops, zapi nexthop encode fails, as there is not enough memory allocated for stream. This causes assert/core dump in zebra. Right now we allocate fixed memory of ZEBRA_MAX_PACKET_SIZ size. Fix: 1)Dynamically calculate required memory size for the stream 2)try to optimize memory usage Testing: No crash happens anymore with the fix zebra: zebra crash for zapi stream Issue: If static route is created with a BGP route as nexthop, which recursively resolves over 512 ECMP v6 nexthops, zapi nexthop encode fails, as there is not enough memory allocated for stream. This causes assert/core dump in zebra. Right now we allocate fixed memory of ZEBRA_MAX_PACKET_SIZ size. Fix: 1)Dynamically calculate required memory size for the stream 2)try to optimize memory usage Testing: No crash happens anymore with the fix r1# r1# sharp install routes 2100:cafe:: nexthop 2001:db8::1 1000 r1# r2# conf r2(config)# ipv6 route 2503:feca::100/128 2100:cafe::1 r2(config)# exit r2# Signed-off-by: Soumya Roy <souroy@nvidia.com>
This commit undo 8c9b007 stream lib has been modified to expand the stream if needed Now for zapi route encode, we use expandable stream Signed-off-by: Soumya Roy <souroy@nvidia.com>
1c217c0
to
cb2b2fe
Compare
Issue:
If static route is created with a BGP route as nexthop, which recursively resolves over 512 ECMP v6 nexthops, zapi nexthop encode fails, as there is not enough memory allocated for stream. This causes assert/core dump in zebra. Right now we allocate fixed memory of ZEBRA_MAX_PACKET_SIZ size.
Fix:
1)Dynamically calculate required memory size for the stream 2)try to optimize memory usage
Testing:
No crash happens anymore with the fix
zebra: zebra crash for zapi stream
Issue:
If static route is created with a BGP route as nexthop, which recursively resolves over 512 ECMP v6 nexthops, zapi nexthop encode fails, as there is not enough memory allocated for stream. This causes assert/core dump in zebra. Right now we allocate fixed memory of ZEBRA_MAX_PACKET_SIZ size.
Fix:
1)Dynamically calculate required memory size for the stream 2)try to optimize memory usage
Testing:
No crash happens anymore with the fix
r1#
r1# sharp install routes 2100:cafe:: nexthop 2001:db8::1 1000 r1#
r2# conf
r2(config)# ipv6 route 2503:feca::100/128 2100:cafe::1 r2(config)# exit
r2#