Improve available heap #3740
Actually, I suggest also modifying the debug messages so that their text is stored in PROGMEM. I've written a macro for this purpose:
There is a macro, USE_OPTIMIZE_PRINTF, in osapi.h which, if defined, makes os_printf() store all of its format strings in flash.
Yes, but IMHO it is not used by the debug messages inside the libraries. Look, for example, at the DEBUG_WIFI_GENERIC macro in ESP8266WiFiGeneric.h. Every debug macro inside the libraries uses DEBUG_ESP_PORT.printf(), so it goes through the Serial or Serial1 object, not os_printf()...
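A minimal sketch of such a library debug macro, assuming the ESP8266 core's PSTR() and a printf_P()-style method on the debug port. DEBUG_WIFI_SKETCH, HostDebugPort and debugPort are hypothetical names, and the PSTR() fallback exists only so the example compiles on a host PC:

```cpp
#include <algorithm>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>

// On the ESP8266 core, PSTR() (from pgmspace.h) places a string literal in
// flash. This fallback is a host-only stand-in (assumption).
#ifndef PSTR
#define PSTR(s) (s)
#endif

// Hypothetical stand-in for DEBUG_ESP_PORT (Serial/Serial1 on the device).
// It mimics the shape of a printf_P() method whose format may live in flash;
// on the host the pointer is ordinary memory, so vsnprintf() can read it.
struct HostDebugPort {
  std::string out;  // captured output, handy for checking the sketch

  std::size_t printf_P(const char* fmt, ...) {
    char buf[128];
    va_list args;
    va_start(args, fmt);
    const int n = std::vsnprintf(buf, sizeof buf, fmt, args);
    va_end(args);
    if (n <= 0) return 0;
    const std::size_t len = std::min(static_cast<std::size_t>(n), sizeof buf - 1);
    out.append(buf, len);
    return len;
  }
};

HostDebugPort debugPort;  // hypothetical; would be DEBUG_ESP_PORT on the device

// Hypothetical library debug macro: the format literal is wrapped in PSTR()
// so that on the ESP8266 it stays in flash instead of occupying RAM.
// ##__VA_ARGS__ (a GCC extension) drops the comma when no arguments follow.
#define DEBUG_WIFI_SKETCH(fmt, ...) \
  debugPort.printf_P(PSTR(fmt), ##__VA_ARGS__)
```

Relative to a plain DEBUG_ESP_PORT.printf() call, the only changes are the PSTR() wrapper and the _P method that knows how to read a flash-resident format.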
You probably already know this, but the following command lines are very useful for finding stuff in RODATA that's eating up RAM:
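A minimal C++ sketch of the same idea, assuming the symbol table has been dumped to text with the toolchain's objdump (for example `xtensa-lx106-elf-objdump -t sketch.elf > syms.txt`, where sketch.elf is a placeholder). Symbol and largestSymbols() are hypothetical names; the function keeps the symbols of one section and sorts them by size, largest first:

```cpp
#include <algorithm>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

struct Symbol {
  std::string name;
  std::string section;
  unsigned long size;  // bytes
};

// Parses the symbol table printed by `objdump -t`, whose lines look like
//   3ffe8010 l     O .rodata<TAB>00000100 dns_names
// The flag columns may contain spaces, but GNU objdump puts a tab between the
// section name and the hex size, so each line is split on that tab.
// Keeps symbols whose section starts with sectionPrefix (e.g. ".rodata",
// ".data", ".bss") and returns them sorted by size, largest first.
std::vector<Symbol> largestSymbols(const std::string& objdumpText,
                                   const std::string& sectionPrefix) {
  std::vector<Symbol> result;
  std::istringstream lines(objdumpText);
  std::string line;
  while (std::getline(lines, line)) {
    const std::string::size_type tab = line.find('\t');
    if (tab == std::string::npos) continue;  // headers and blank lines

    // The section is the last token before the tab.
    std::istringstream head(line.substr(0, tab));
    std::string token, section;
    while (head >> token) section = token;
    if (section.compare(0, sectionPrefix.size(), sectionPrefix) != 0) continue;

    // After the tab: the hex size, then the name (possibly preceded by
    // markers such as .hidden, so keep the last token).
    std::istringstream tail(line.substr(tab + 1));
    std::string sizeHex, name;
    if (!(tail >> sizeHex)) continue;
    while (tail >> token) name = token;
    if (name.empty()) continue;
    result.push_back({name, section, std::strtoul(sizeHex.c_str(), nullptr, 16)});
  }
  std::stable_sort(result.begin(), result.end(),
                   [](const Symbol& a, const Symbol& b) { return a.size > b.size; });
  return result;
}
```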
Some wasteful things that stick out for me:
@earlephilhower, about dns_table: what would be reasonable? 4 hosts, 64 chars each?
@d-a-v, well, that's a good question. I do know the RFC says 256 octets, but practically, on a tiny ESP8266, you're likely only initiating connections to 1 or 2 servers with 32-64 byte hostnames at any time. With no changes to code, I like your 64-byte x 4 suggestion. If small LWIP patches can be entertained, then I'd either submit something that only stores hashes of the hostname (thus requiring only 64 bytes for a hostname of any length), or something that does a dynamic allocation like @devyte is considering. AFAIK there are already memory allocations in LWIP, so this second option may not break any contracts (but I've only spent a little time going through the code, so I could be wrong here). And if we're going heroic on it, valid DNS hostnames only allow something like 40 distinct characters, so you could even use a packed 6-bit encoding to save 25% of the space without any compromise on length.
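The 6-bit idea can be sketched as follows, assuming hostnames are stored case-insensitively using only letters, digits, '-' and '.' (38 symbols). kAlphabet, codeOf(), packHostname() and unpackHostname() are hypothetical names, not lwIP API. Each character takes 6 bits instead of 8, which is where the 25% saving comes from:

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical alphabet for case-insensitive DNS hostnames: 26 letters,
// 10 digits, '-' and '.', i.e. 38 symbols, which fit in 6 bits (up to 64).
const char kAlphabet[] = "abcdefghijklmnopqrstuvwxyz0123456789-.";
const int kAlphabetSize = static_cast<int>(sizeof(kAlphabet) - 1);  // 38

// Returns the 6-bit code for c, or -1 if c cannot appear in a hostname.
int codeOf(char c) {
  const char lower = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  for (int i = 0; i < kAlphabetSize; ++i)
    if (kAlphabet[i] == lower) return i;
  return -1;
}

// Packs each character into 6 bits, MSB first, so 4 characters use 3 bytes.
// Returns false if the hostname contains a character outside the alphabet.
bool packHostname(const std::string& host, std::vector<std::uint8_t>& out) {
  out.assign((host.size() * 6 + 7) / 8, 0);
  std::size_t bit = 0;
  for (char c : host) {
    const int code = codeOf(c);
    if (code < 0) return false;
    for (int b = 5; b >= 0; --b, ++bit)
      if (code & (1 << b))
        out[bit / 8] |= static_cast<std::uint8_t>(0x80u >> (bit % 8));
  }
  return true;
}

// Reverses packHostname(); the caller stores the character count separately.
std::string unpackHostname(const std::vector<std::uint8_t>& packed, std::size_t length) {
  std::string host;
  std::size_t bit = 0;
  for (std::size_t i = 0; i < length; ++i) {
    int code = 0;
    for (int b = 0; b < 6; ++b, ++bit)
      code = (code << 1) | ((packed[bit / 8] >> (7 - bit % 8)) & 1);
    host += kAlphabet[code];
  }
  return host;
}
```

For "www.yahoo.com" (13 characters) this needs 78 bits, i.e. 10 bytes instead of 13, plus whatever the caller uses to remember the length.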
@earlephilhower, thank you for the readelf/objdump suggestions. I've actually done that, but only once before and a looong time ago. It didn't even cross my mind to apply it here.
I seem to remember that at the nodemcu firmware repo they pulled some tricks in a second-pass linker step, or post-link, or something along those lines. In essence, they post-processed the built binary.
@d-a-v, what do you think of compiling lwip2 with C++? I migrated a big C project (>1M lines of code) to C++ a few years back, and it wasn't too painful.
@earlephilhower, updating DNS names from 256 to 64 chars can be done at no cost (and, I guess, no harm); this is what lwipopts is intended for. So @devyte, we can try and compile lwip with C++, but I think we should submit all patches to the lwip maintainers rather than trying to maintain them here. I also have a proposal for fun: #3362 is an abstraction of lwip's netif (the link layer) for any TCP stack on top of it. If you know of a smaller/lighter IP/TCP implementation (better RAM+flash footprint), we can try to use it; the WiFi* classes would, however, have to use this new stack.
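Since the name buffers dominate the DNS table, the saving can be worked out directly. A minimal sketch, assuming each lwIP DNS table entry embeds a fixed char name[DNS_MAX_NAME_LENGTH] array (other per-entry fields are ignored); dnsNameBytes() is a hypothetical helper, while DNS_TABLE_SIZE and DNS_MAX_NAME_LENGTH are the lwIP options one would set in lwipopts.h:

```cpp
#include <cstddef>

// lwipopts.h-style overrides (sketch). lwIP's defaults are 4 and 256; the
// values here are the ones discussed in this thread.
#define DNS_TABLE_SIZE      4
#define DNS_MAX_NAME_LENGTH 64

// Assumption: each DNS table entry embeds char name[DNS_MAX_NAME_LENGTH],
// so the name storage alone is entries * name length bytes.
constexpr std::size_t dnsNameBytes(std::size_t entries, std::size_t nameLen) {
  return entries * nameLen;
}

constexpr std::size_t kRfcSized = dnsNameBytes(DNS_TABLE_SIZE, 256);
constexpr std::size_t kReduced = dnsNameBytes(DNS_TABLE_SIZE, DNS_MAX_NAME_LENGTH);

static_assert(kRfcSized == 1024, "4 entries x 256 bytes");
static_assert(kReduced == 256, "4 entries x 64 bytes");
static_assert(kRfcSized - kReduced == 768, "bytes saved with 64-byte names");
```

With a 128-byte name length the same formula gives 512 bytes of names, still half of the RFC-sized table.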
@d-a-v, I saw you working on this when I was doing the HTTPS server, but it didn't click that you were using an unmodified LWIP build. That sounds great! I'd like to take a deeper look at the LWIP repos (should I ever get some time) and see if I can submit a PR that does at least the dynamic allocation of hostnames. That alone should be an almost trivial change and would reduce memory dramatically in most cases (DNS for www.yahoo.com would take 16 bytes for the string vs. 256 with a fully RFC-compliant configuration, or even 64 with the reduced name length). On LWIP init, allocate 1-byte "\0" strings for the hostname pointers. Then, on any assignment with the hostname on the LHS, just do "free(hostname[x]), hostname[x] = strdup(searchname)"... @devyte, one additional tweak to the command lines I use can give you a sorted list by symbol size:
@earlephilhower, it seems a good idea. Asking them whether they would adopt such a patch would be the first harmless thing to do?
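A minimal sketch of the dynamic hostname allocation proposed above, written as standalone C++ rather than as an actual lwIP patch. kDnsTableSize, dnsNames, dnsNamesInit() and dnsNameSet() are hypothetical names. One deliberate deviation from the free()-then-strdup() order described: the copy is made first, so a failed allocation leaves the old name in place:

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical table size; lwIP's own option is DNS_TABLE_SIZE.
constexpr int kDnsTableSize = 4;

// Instead of a fixed char[256] per entry, each slot holds a heap pointer.
char* dnsNames[kDnsTableSize];

// On init, give every slot a 1-byte "" string so readers never see NULL.
bool dnsNamesInit() {
  for (int i = 0; i < kDnsTableSize; ++i) {
    dnsNames[i] = strdup("");
    if (dnsNames[i] == nullptr) return false;
  }
  return true;
}

// On assignment: duplicate the new name, then free the old one.
// If strdup() fails, the slot keeps its previous (still valid) name.
bool dnsNameSet(int slot, const char* name) {
  char* copy = strdup(name);
  if (copy == nullptr) return false;
  std::free(dnsNames[slot]);
  dnsNames[slot] = copy;
  return true;
}
```

Each slot then costs one pointer plus strlen(name) + 1 bytes of heap (plus allocator overhead), instead of a fixed DNS_MAX_NAME_LENGTH array.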
@devyte, in your migration process, did you just make your C project C++-compliant without further changing data structures?
@d-a-v, the (simplified) outline of what I did was as follows:
I didn't change any data structures, except where reserved keywords forced it.
I think that'd kind of defeat @d-a-v's hope of just pulling LWIP and compiling straight from the sources, no? Are you just trying to get deeper linting? Maybe adding "-Wall -Wpedantic" to the GCC command line in boards.txt would get you where you want to be without code changes and file renames? I'm not sure what running g++ instead of gcc would give you other than stricter warnings...
I detected that the biggest heap consumption happened when moving from 2.3.0 to 2.4.0-rc1. With the same project, free RAM went from 34020 B to just 25680 B (2.4.0-rc1) and 28344 B (2.4.0-rc2). And with ArduinoJson and SecureConnection, my heap goes bye-bye.
@earlephilhower, the C++ compiler is stricter, and there are differences in how code is optimized. In our project, speed went up by 5-7%, and out of 200+ bugs, 110 were found straight away because the C++ compiler complained about something weird in the code.
@mrd2, have you tried readelf to see where the heap went?
@penfold42 I shall check.
I would like to mention #3978 here.
896 is definitely an improvement over 536, but it goes from about a 1/2 duty cycle of play/stutter to a 2/3 duty cycle, which unfortunately still isn't usable. Memory usage seems to have gone up by ~0.5K (not scientific, as the measurement fluctuates during runtime).
So the conclusion is that v2@1460 works but takes a little more RAM than v1@1460. I had to increase the DNS cache name length back to 128 (256->48->128), since 48 was too small for some IoT clouds. We are back to the original subject of this issue, which is to improve available heap.
Additional dynamic heap savings could be had by rewriting the sprintf_P() function to actually use the PROGMEM format string directly from ROM, instead of doing a malloc(), strcpy_P(), then free(). I just ran into that in my own app, where the index.html string is ~1.5K and the heap was too fragmented to find enough space. I ended up writing my own mysprintf_P(), although another workaround would have been to sprintf_P() line by line. This would also allow for %S == PSTR string formats, since you're probably stuck rewriting the whole vsprintf() function anyway.
I was thinking that I have a class made to behave like a ... If we do rewrite vsnprintf(), please don't forget %i, which I miss a lot.
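A minimal sketch of that idea, assuming the ESP8266 core's pgm_read_byte() (a host fallback is defined so it runs anywhere). sketch_snprintf_P() is a hypothetical name, not newlib's or the core's implementation, and it supports only %d, %i, %s, %S and %% with no width or precision. The point is that the format string is walked one byte at a time straight from flash, so no malloc()/strcpy_P()/free() copy of it is needed:

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <cstring>

// On the ESP8266 core, pgm_read_byte() reads one byte from flash. This host
// fallback (assumption) simply dereferences the pointer.
#ifndef pgm_read_byte
#define pgm_read_byte(addr) (*reinterpret_cast<const unsigned char*>(addr))
#endif

// Writes at most cap - 1 characters plus '\0' into out and, like snprintf(),
// returns the length the full result would have had.
// %S reads its argument through pgm_read_byte() too (a PSTR()-style string).
std::size_t sketch_snprintf_P(char* out, std::size_t cap, const char* fmt, ...) {
  va_list args;
  va_start(args, fmt);
  std::size_t len = 0;
  auto put = [&](char c) {
    if (len + 1 < cap) out[len] = c;  // keep room for the terminator
    ++len;
  };
  for (;; ++fmt) {
    const char c = static_cast<char>(pgm_read_byte(fmt));
    if (c == '\0') break;
    if (c != '%') { put(c); continue; }
    const char spec = static_cast<char>(pgm_read_byte(++fmt));
    if (spec == '\0') break;
    if (spec == 'd' || spec == 'i') {
      char num[12];  // enough for a 32-bit int, sign and '\0'
      const int n = std::snprintf(num, sizeof num, "%d", va_arg(args, int));
      for (int k = 0; k < n; ++k) put(num[k]);
    } else if (spec == 's') {
      for (const char* s = va_arg(args, const char*); *s; ++s) put(*s);
    } else if (spec == 'S') {
      const char* s = va_arg(args, const char*);
      for (char ch; (ch = static_cast<char>(pgm_read_byte(s))) != '\0'; ++s) put(ch);
    } else if (spec == '%') {
      put('%');
    } else {
      put('%');  // unsupported specifiers are copied through unchanged
      put(spec);
    }
  }
  if (cap > 0) out[len < cap ? len : cap - 1] = '\0';
  va_end(args);
  return len;
}
```

On the device, passing a PSTR() format and a PSTR() argument for %S keeps both strings in flash; on the host both are ordinary pointers.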
String currently seems to always allocate up to 16 bytes more than requested, even when using reserve():
@devyte, about String.resize(): isn't it just trying to allocate up to the internal umm block size, adding space for the \0 in doing so? IIRC, though, UMM works in 8-byte chunks, not 16, so a better line would be:
There's no savings in allocating less than an 8-byte chunk, since the remaining bytes are lost anyway. @d-a-v, there are so many changes in 2.4 that I can't be sure, but I think they're going with newlib's vs_sprintf code and not os_vsprintf. I'd rather not (poorly) roll my own sprintf() if we can take a nice piece of proven code and apply a minor tweak.
@earlephilhower, the null terminator is implicit in how the calculation works. I suppose that's ok, given that if:
About the 16 bytes at a time: I don't see anything about a block size in umm_malloc_cfg.h. A quick glance at the README does show that a block has 8 bytes. So I guess the test here would be to reduce 16 to 8 in changeBuffer(), like you said.
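The two rounding rules can be compared directly. A minimal sketch, assuming changeBuffer() currently rounds with (maxStrLen + 16) & ~0xF and that umm_malloc hands out 8-byte blocks; roundTo16(), roundTo8(), slack16() and slack8() are hypothetical names. Adding k and rounding down to a multiple of k always leaves at least maxStrLen + 1 bytes, which is why the terminator needs no explicit + 1:

```cpp
#include <cstddef>

// Assumed current rule: round up to a multiple of 16.
constexpr std::size_t roundTo16(std::size_t maxStrLen) {
  return (maxStrLen + 16) & ~static_cast<std::size_t>(0xF);
}

// Proposed rule, matching umm_malloc's 8-byte blocks.
constexpr std::size_t roundTo8(std::size_t maxStrLen) {
  return (maxStrLen + 8) & ~static_cast<std::size_t>(0x7);
}

// Bytes allocated beyond what is needed (maxStrLen characters + '\0').
constexpr std::size_t slack16(std::size_t n) { return roundTo16(n) - (n + 1); }
constexpr std::size_t slack8(std::size_t n) { return roundTo8(n) - (n + 1); }

static_assert(roundTo16(15) == 16 && roundTo8(15) == 16, "exact fit, no slack");
static_assert(slack16(16) == 15 && slack8(16) == 7, "worst case for n = 16");
```

For example, a 16-character string gets 32 bytes under the 16-byte rule but 24 under the 8-byte rule, while 17 are actually needed.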
I've just sent 4 small pull requests to @igrr's newlib repo which take care of the memory issues I noted above. Most apps should see around 500 bytes of additional static heap, and if the printf() patch is merged they'll need no additional dynamic memory for the *printf_P() functions (actually, there's no need for *printf_P() anyway). @devyte's note about String is not covered, since that's an Arduino core thing.
Perhaps this is naive, but it looks possible to save 192 bytes by allowing dead stripping.
I think this is the biggest issue in the 8266 library today. I found this thread when a relatively small program of mine on the 8266 started acting strangely; yep, memory problems. The problem? The Arduino library (2.4.1) is taking up too many resources! A compile of an 'empty' program is taking up so ... going from https://tech.scargill.net/esp8266-ram/ Am I an idiot, or is this library now so big that it doesn't leave scope for anyone to do anything with it that isn't trivial?
...and doesn't that .text somehow end up in RAM too?
@phoddie, all ideas are open for discussion, especially if somebody already has preliminary results.
@steminabox, please don't confuse heap RAM with IRAM.
Yep, I've been reading more about it, and IRAM is obviously the problem. Once .text hits 31204 bytes, the program is stuffed... Why is the library doing that? I.e., how much of that IRAM really needs to be there? E.g., why on earth (using your example above) has floating point been put in by default? Instead of the end user having to go through a lot of work (and still not fitting into the remaining 7% of IRAM), it would be better to fix the library by taking out all the optional stuff by default and letting users shift it back in if needed... Otherwise this library is getting close to useless for anything needing more than a page or two of code... It would be much better to use 10K more heap (when needed) and 10K less IRAM! But that's just me; the bigger philosophical question is why the 8266 Arduino library is growing in this direction, i.e., becoming bloatware.
Although off-topic in this issue, one more idea to get more IRAM is to use only 16KB for the cache instead of the 32KB used now. That would free up 16KB for the application. Regarding IRAM usage by the core itself: since user applications rarely need IRAM (interrupt handlers are the only case), IRAM usage was never really part of the optimization effort (until very recently). Instead of trying to free up IRAM, I think it might be worth making a few changes to allow interrupt handlers to go into flash.
I got rid of my two small user interrupt routines (while trying to find where the space was going), so they aren't the problem. So something else must be using it...
OK, I found the thread over here, #4551, which discusses the major IRAM problem in more depth. It seems it's even worse if you take the changes since 2.4.1 (i.e., only 440 bytes available according to one post).
@steminabox, about floating point support, you misunderstood me: it was not put into IRAM, it was just added to the code base. I meant it as an example of why the core keeps increasing in size: features are being added.
Yep, I'll continue about IRAM in the other thread, though having infinite heap and no ability to load your program isn't that useful :-) So my last comment, actually on topic :-): with the heap, the best way to keep spare space is to never use memory unless you have to, defer it until the last moment, and keep it for as short a time as possible. There is normally a trade-off between doing this and CPU time (and memory fragmentation), but the 8266 seems to have lots of CPU for the amount of memory it has. I don't know if fragmentation would be a problem; then again, most 8266s could reboot every now and again.
A lot has been done for this, but there is still more to do... |
See also [PlatformIO docs](http://docs.platformio.org/en/latest/faq.html#program-memory-usage) and esp8266/Arduino#3740 (comment)
Hardware
Hardware: all
Core Version: git / 2.4-rc2
Problem Description
Summary
This issue is meant to track a RAM optimization effort. The goal is to increase available RAM as much as possible.
Details
The following are initial ideas to investigate.
- Revise all uses of String. When concatenating, the internal managed array will grow, sometimes far beyond what is needed. It is sometimes better to reserve, if there is an estimate of how much space will be needed.
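A host-side illustration of the reserve advice, using std::string as a stand-in for Arduino's String (both provide reserve()). countGrowths() is a hypothetical helper that counts how many times the buffer had to grow, i.e. reallocate, during a concatenation loop:

```cpp
#include <cstddef>
#include <string>

// Appends `pieces` copies of a short record and counts capacity changes,
// each of which corresponds to a reallocation of the underlying buffer.
std::size_t countGrowths(std::size_t pieces, bool reserveFirst) {
  const std::string piece = "sensor=123;";  // 11 bytes per append
  std::string s;
  if (reserveFirst) s.reserve(pieces * piece.size());  // estimate up front
  std::size_t growths = 0;
  std::size_t cap = s.capacity();
  for (std::size_t i = 0; i < pieces; ++i) {
    s += piece;
    if (s.capacity() != cap) {  // the buffer was reallocated
      ++growths;
      cap = s.capacity();
    }
  }
  return growths;
}
```

With a correct estimate, the reserved run never reallocates. On the ESP8266, String growth goes through realloc() in changeBuffer(), so each avoided growth is one fewer chance to fragment the small heap.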