Skip to content

Conversation

@duke8253
Copy link
Contributor

@duke8253 duke8253 commented May 26, 2021

After running #7808 in our production for a while, we noticed there are memory leakage. Turns out ATS isn't following OpenSSL's documentation on how we store sessions externally, namely our new_session_cb (https://www.openssl.org/docs/manmaster/man3/SSL_CTX_sess_set_get_cb.html) always returns 0. So using the same method with the origin session cache caused the leak.

Also, simplified the way we store ex_data, which is just a curve_id, not sure why we created struct ssl_session_cache_exdata to just store this integer value.

Mentioning #7479 to link all related PRs.

@duke8253 duke8253 added this to the 10.0.0 milestone May 26, 2021
@duke8253 duke8253 requested a review from maskit May 26, 2021 20:12
@duke8253 duke8253 self-assigned this May 26, 2021
@duke8253 duke8253 requested a review from bryancall as a code owner May 26, 2021 20:12
Copy link
Member

@maskit maskit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes around i2d_SSL_SESSION and exdata overwhelmed me. Can you tell me why we need to simplify those parts on this PR that is to fix memory leak?

ssl_session_cache_exdata is used for client sessions as well. Removing it from the both client session and origin session on a PR would make more sense than removing it only from origin session with a fix for memory leak.

I guess I could approve this if I was in a good mood but there was enough number of PRs for Origin Session Cache to make me picky.

{
}

~SSLOriginSession() { SSL_SESSION_free(session); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling SSL_SESSION_free here is the fix, and other changes are not, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, this is freeing the session when it's being replaced or removed from the cache, as we're not doing the i2d_SSL_SESSION conversion and returning the correct return value through the new session callback.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for the leak is because when we do the i2d_SSL_SESSION for the server sessions, we always return 0 so the reference count for the actual OpenSSL's session object is decreased after returning from the callback, and we do some manual cleanups in our NetVC if we convert the session object back. But we seem to not do that for the origin sessions, and just rely on OpenSSL's own ref counter to clean up things.

@bryancall bryancall added the Bug label Jun 25, 2021
@bryancall
Copy link
Contributor

@duke8253 is going to take a look at the comment above.

@duke8253
Copy link
Contributor Author

As mentioned above, we didn't follow OpenSSL's documentation on how we handle the new session callback. I tried making the callback for origin sessions do the same thing as server sessions, but looks like we deal with origin session differently too so the memory leak still happens even if I the call back returns 0 all the time, seems like we rely on openssl's own freeing mechanism on the origin side but not the server side. That's why I removed the whole i2d_SSL_SESSION conversion part. As for the exdata, as we only use it to store a single integer value, I see no reason to pass it as a pointer to a structure.

@maskit
Copy link
Member

maskit commented Jun 25, 2021

My understanding of i2d_SSL_SESSION is just a serialization mechanism. So I think it should be possible to use i2d_SSL_SESSION without memory leak unless we are missing something. Is this correct?

Removing the serialization part is fine if it's unnecessary, but is it only possible for origin sessions?

As for the exdata, as we only use it to store a single integer value, I see no reason to pass it as a pointer to a structure.

Is this only true for origin sessions and we still need to pass it as a pointer to a structure for server sessions?

@duke8253
Copy link
Contributor Author

My understanding of i2d_SSL_SESSION is just a serialization mechanism. So I think it should be possible to use i2d_SSL_SESSION without memory leak unless we are missing something. Is this correct?
Removing the serialization part is fine if it's unnecessary, but is it only possible for origin sessions?

Correct, but when I tried to do the same for server sessions, weird things starts to happen: the sessions can only be reused once, and if I remember correctly new leaks started to happen after doing that, I'll have to check my notes on that. Which is why I'm saying we are doing special treatments in NetVC I'm not aware of. So for now removing the serialization can only be applied to origin sessions.

Is this only true for origin sessions and we still need to pass it as a pointer to a structure for server sessions?

This is true for both origin sessions and server sessions, I didn't touch it in the server sessions as I'm not sure what is causing the problem I mentioned above. I'll make a PR once I figure out what's causing the problem.

Copy link
Member

@maskit maskit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to see a bug fix and cleanup separately, but it's probably not worth leaving memory leak for long time.

@duke8253 duke8253 merged commit 5057c29 into apache:master Jun 29, 2021
@zwoop zwoop modified the milestones: 10.0.0, 9.2.0 Sep 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants