We use Babel's read_po() function to load our webapp's translations, and we've realized that doing it this way means that source strings (a.k.a. message IDs) are stored multiple times in memory, while they should be stored only once since they're common between PO files. With large numbers of messages and catalogs this can result in significant RAM consumption.
We fixed this inefficiency by creating the following share_source_strings function:
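    def share_source_strings(catalog, shared_strings):
        """Share message IDs between catalogs to save memory.
        """
        if not shared_strings:
            # First catalog: nothing to share yet, so just record its IDs.
            shared_strings.update((m.id, m.id) for m in catalog)
            return
        for m in list(catalog):
            if not m.id:
                # Skip the header message, whose ID is empty.
                continue
            if m.id in shared_strings:
                # Point the message at the shared ID object, then re-insert it
                # so the catalog's internal key uses the shared object too.
                m.id = shared_strings[m.id]
                # Pass the context so messages with a msgctxt are found as well.
                catalog.delete(m.id, context=m.context)
                catalog[m.id] = m
            else:
                shared_strings[m.id] = m.id

and calling it after each read_po():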
    from babel.messages.pofile import read_po

    source_strings = {}
    # po_files is assumed to be an iterable of open PO file objects,
    # since read_po() expects a file-like object.
    for f in po_files:
        catalog = read_po(f)
        share_source_strings(catalog, source_strings)
        ...
    del source_strings

Maybe a similar mechanism could be integrated into Babel so that memory usage is optimized by default? If not, a note could be added to the documentation explaining how to reduce the memory footprint of catalogs.
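
For such a documentation note, the underlying idea can be shown without Babel at all. The following is only a minimal sketch using the standard library: the helper name `share_ids`, the `pool` dict and the sample strings are made up for illustration, and decoding bytes twice stands in for parsing the same message ID from two separate PO files.

```python
def share_ids(ids, pool):
    """Replace each ID with the first equal object already stored in pool.

    `share_ids` and `pool` are hypothetical names used for illustration only.
    """
    # dict.setdefault returns the stored value if the key is already present,
    # otherwise it stores the new object and returns it.
    return [pool.setdefault(i, i) for i in ids]


# Simulate two PO files that contain the same message IDs. Each decode()
# call builds its own str objects, much like parsing separate files does.
raw_ids = [b"Hello, world!", b"Save changes"]
catalog_a = [r.decode("utf-8") for r in raw_ids]
catalog_b = [r.decode("utf-8") for r in raw_ids]

pool = {}
shared_a = share_ids(catalog_a, pool)
shared_b = share_ids(catalog_b, pool)

# Equal IDs now refer to the very same object, so each is stored only once.
print(all(a is b for a, b in zip(shared_a, shared_b)))  # prints True
```

A plain dict is used here rather than `sys.intern()` because message IDs are not always plain strings: pluralizable messages use a tuple of singular and plural forms, which a dict can hold as a key but `sys.intern()` cannot accept.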