Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

To compare string ignoring case, use the option StringComparison.OrdinalIgnoreCase #1321

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

randruc
Copy link
Contributor

@randruc randruc commented Feb 13, 2025

This is much better for performance, as it avoids string allocations and conversion of the whole string prior to comparing. It is also idiomatic.

…nalIgnoreCase

instead of converting the case of the whole string.
@paulomorgado
Copy link
Contributor

@randruc, although you made the code "more correct", there's another issue with ToUpper() and ToLower(), they use the current thread culture.

The correct equivalent would be ToUpperInvariant() and ToLowerInvariant(), but that would stiill allocate new strings.

Every time any case insensitive comparison is done, the overload with StringComparison should be used. And given the nature of the texts being compared here, it should always be ordinal.

You'll find lots in this code base.

@randruc
Copy link
Contributor Author

randruc commented Feb 14, 2025

I indeed found a lot. More generally in the code base, string manipulation is not done the best way. There are lots of potential improvements in changing the way it deals with strings. In particular, concatenations happen everywhere.

@ha-ves
Copy link
Contributor

ha-ves commented Feb 14, 2025

Extracting the strings to a static HashMap(?) thenToLower() it once when comparing might be better, readability wise and "logically performance" wise.

@randruc
Copy link
Contributor Author

randruc commented Feb 14, 2025

I don't see how ToLower() could be faster than ordinal parsing. ToLower() allocates a new string, and iterates over each character.

@paulomorgado
Copy link
Contributor

I think what @ha-ves is suggesting is having a pre-built dictionary (a frozen dictionary for the supported targets) with a StringComparer.OrdinalIgnoreCase key comparer preloaded with the desired lower-case tokens.

@randruc
Copy link
Contributor Author

randruc commented Feb 14, 2025

I see. Again, in the overall, the mechanics of strings extraction and parsing could indeed be significantly improved.
I'm quite curious to know about the current state of the project aims.

value.Equals("rejected", StringComparison.OrdinalIgnoreCase) ||
value.Equals("replaced", StringComparison.OrdinalIgnoreCase) ||
value.Equals("remote-bye", StringComparison.OrdinalIgnoreCase) ||
value.Equals("timeout", StringComparison.OrdinalIgnoreCase);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this is being refactored why not make it even readable:

return value.ToLowerInvariant() is "cancelled" 
                                   or "error"
                                   or "local-bye"
                                   or "rejected"
                                   or "replaced"
                                   or "remote-bye"
                                   or "timeout";

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That will potentially allocate a string.

The purposed code is a bit more verbose, but potentially more performant.

Unless all input has be previously sanitized and guaranteed to be lower case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't agree more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm struggling to see why the emphasis on such fine grained optimisations (in itself an anti-pattern). This part of the code is dealing with the SIP signalling protocol which has far lower traffic volumes than the media & RTP traffic. And even if you're intent on milking performance from the SIP stack you're starting in the wrong place, for example see the extensive use of regex'es in SIPHeader.cs which will far outweigh any string allocations (not that there have been any performance issues reported in that class the last 20 years).

For 90% of the classes in this library the emphasis should be on readability and maintainability ahead of extracting the last 10% of performance. The RTP stack is an exception.

Unlike this lirbary there is a prototype C# video encoding attempt that could benefit a lot from perfomance improvements.

@randruc
Copy link
Contributor Author

randruc commented Feb 14, 2025

In my use case, I'm trying to run dozens of agents in parallel on a machine with few memory. The performance we are talking about is memory footprint. The very simple use cases with a basic PCM codec is already quite greedy. The part of the library that deals with SIP consumes lots of memory, only for sending signaling messages, as it is full of ToLower(), Trim(), concatenations, and the like, that allocate a new string on each call.
I totally agree with you about SIPHeader.cs, and I'm thinking about improvements that could be done.
And why I don't start with that... Well, I have to start with something to make me familiar with the project.

@sipsorcery
Copy link
Member

Fair enough and if performance is a primary requirement there are C/C++ projects (pjsip, kamailio etc) that will always out perform a managed application environment.

My point here is that while I, and I'm sure other library users, are grateful for all improvements we're here in c#/dotnet for the ease of use and as above readability & maintainability are signigicant concerns.

tldr; PR's that revolve around solving a string allocation here and there at the cost of readability/maintainability are unlikely to be met with much enthusiasm.

@paulomorgado
Copy link
Contributor

With modern C#/.NET you can get good performance and be on par with C/C++.

@randruc
Copy link
Contributor Author

randruc commented Feb 15, 2025

I'm absolutely impressed by what you did with creating this project. I discovered it quite recently, tried it, and the fact is that not only it fits my need, but it worked on the first place!
I think this project has huge potential, as I don't see such complete equivalent library in C#/.net, and open source. Actually I even don't understand why it is not more popular.
I am myself a professional network engineer. I took the time to read the whole code base, including codecs, and I came to the conclusion that strings manipulations and allocations are the big flaw of this library. SIP relies heavily on text, and some implementations in the library are not done the best way. When dealing with a network protocol stack, performance should be a concern (with security). If written correctly, readability will not be impacted in a wrong way. It is simply a matter of establishing a paradigm for a given program.
C# has evolved a lot in the last years and can now compete with historically fast langagues. Many modern video games are written in C# nowadays.
I'm sure there is a way to fit clarity and performance is this codebase.
Actually I would be interested to talk more with you and the implied contributors about the future plans of this project (if any). @paulomorgado seems to have interesting ideas also.
I'm sorry to read that string allocation issues will not be welcomed with much enthusiasm, as this was the thing I was planning to spend some time on.
As we are speaking about SIPHeader.cs, we could think and chat about the possibilities we have to improve it if you want. I already have ideas.

@paulomorgado
Copy link
Contributor

I'm working on high performance logging at the moment (it will take a couple of weeks) as a quick win to get out of the way the cost of logging even when not logging.

Then I intend to tackle the easy performance issues you mention with strings.

Then, eventually, get rid of strings (arrays of 16bit chars) and move to arrays of bytes. That will half the size of most data manipulation.

@randruc
Copy link
Contributor Author

randruc commented Feb 15, 2025

We came to the same kind of conclusions regarding strings.
Building the SIP header must definitely be done in a pre-allocated buffererized way.
And for parsing, to avoid copying strings everywhere, I'm thinking about using an equivalent of slices of Go language (extremely powerful btw), by using new type ReadonlySpan. It's a shame that C# doesn't provide a native implementation of slices like Go. Even coding it in C is easier than in C# 😬

@paulomorgado
Copy link
Contributor

C# is a programing language for the .NET runtime. They work in tandem.

I don't know Go. What does those go slices provide that Span/ReadOnlySpan doesn't?

@randruc
Copy link
Contributor Author

randruc commented Feb 15, 2025

Go includes slices in the syntax of the language itself, not as an API like ReadOnlySpan in C#. Slices manipulation is actually the idiomatic way to deal with strings in Go. This is why strings in Go are extremely fast.
Recently C# introduced ranges, that are included in the syntax, but I think they perform a copy each time a range is used...

@randruc
Copy link
Contributor Author

randruc commented Feb 15, 2025

But to answer your question, we should be able to do the equivalent of slices with ReadOnlySpan of course. It's just a bit annoying to write, as it's an API.

@sipsorcery
Copy link
Member

With modern C#/.NET you can get good performance and be on par with C/C++.

Sure, for some things, for others .NET, or any other memory managed environment, can't compete. There are no general purpose real-time video codecs that can deal with 1080p @ 30fps (or even close) built with C#, Java etc (native library wrappers excluded).

@sipsorcery
Copy link
Member

Actually I would be interested to talk more with you and the implied contributors about the future plans of this project (if any). @paulomorgado seems to have interesting ideas also. I'm sorry to read that string allocation issues will not be welcomed with much enthusiasm, as this was the thing I was planning to spend some time on. As we are speaking about SIPHeader.cs, we could think and chat about the possibilities we have to improve it if you want. I already have ideas.

From my point of view I don't have any BIG plans for the library. I like messing around with SIP & WebRTC and this library, it scratches an itch. I was alwyas thinking Microsoft would eventually add a real-time comms library to .NET but it's never happened. Maybe the AI Realtime API's will finally push them to do it, probably not.

It's also not a dictatorship. There have been periods where other maintainers have been more active than me. Maybe someone else will take it in a different direction one day. Again probably not. It's a massive undertaking to get to grips with the underlying standards & concepts for any of SIP, VoiP, RTP or WebRTC let alone all of them together. It's more than a full-time job and it's not a particularly lucrative area (VoIP and WebRTC services are typically free).

As far as contributions go they are always welcome (and note that above I said "not much enthusiasm" rather than "not welcome"). The challenge as a maintainer is that people can come often along with big ideas to make wholesale changes. If those changes don't work out, or leave things in a worse state, it's the maintainer(s) that invaribly need to sort it out.

What;s appreciated a lot more than applying the latest style fads or optimisations is adding a new capability or enhancement. There's a big list to choose from. Once you get familiar with the library and/or other contributors it's easier to be comfortable accepting changes that have an impact in a broader area.

@paulomorgado
Copy link
Contributor

I don't know almost nothing about SIP, VoIP, RTP, WebRTC or even networking, but I know my way around C# and .NET.

I'm working on a massive PR for High-performance logging, so that logging really becomes pay for play. At the moment, even after #1306, there's a lot of resources being consumed, and that will fix it.

Skimming at the code, I noticed a lot of unnecessary string creation (concatenation, upper/lower). Some need to be removed, others need to make more use of StringBuilder, and these need to be pooled.

But, all this text (as in char - 16bit) is not even needed as it will end up being transmitted as utf-8 or ASCII. Working directly in byte will cut the memory usage in half.

If it's about byte instead of char, then we can get rid of StringBuilder and move to Stream (or RecyclableMemoryStream).

And, finally, Pipelines.

I'm not a fan of big bang changes. So, iterating over this will be safer to keep everything going.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants