
Sharing message IDs between catalogs #571

@Changaco


We use Babel's read_po() function to load our webapp's translations, and we've realized that doing it this way means that source strings (a.k.a. message IDs) are stored once per catalog in memory, even though they are identical across PO files and a single shared copy would suffice. With large numbers of messages and catalogs this can result in significant RAM consumption.
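To see the duplication in isolation, here is a minimal pure-Python sketch that does not use Babel. It assumes that loading two PO files amounts to decoding the same message IDs twice (the `load_ids` helper and the sample strings are hypothetical stand-ins for a PO parser); in CPython each decode creates fresh string objects, so equal IDs end up stored twice.

```python
import sys


def load_ids(raw: bytes) -> list[str]:
    """Hypothetical stand-in for parsing one PO file: every call
    decodes the input again, creating fresh str objects."""
    return raw.decode("utf-8").splitlines()


raw = b"Welcome back!\nYour settings have been saved.\nLog out"
fr_ids = load_ids(raw)  # message IDs as loaded for one catalog
de_ids = load_ids(raw)  # the same IDs as loaded for another catalog

# Equal values, but a separate object for each copy of each message ID.
equal = all(a == b for a, b in zip(fr_ids, de_ids))
identical = any(a is b for a, b in zip(fr_ids, de_ids))

# Bytes taken up by the second, redundant copy of the IDs.
duplicate_bytes = sum(sys.getsizeof(s) for s in de_ids)
print(equal, identical, duplicate_bytes)
```

Every additional catalog adds another full set of these copies, which is what the function below avoids.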

We fixed this inefficiency by creating the following share_source_strings function:

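```python
def share_source_strings(catalog, shared_strings):
    """Share message IDs between catalogs to save memory.

    `shared_strings` maps each message ID seen so far to the single
    object that every catalog should reference.
    """
    if not shared_strings:
        # First catalog: its own IDs become the shared copies.
        shared_strings.update((m.id, m.id) for m in catalog)
        return
    for m in list(catalog):
        if not m.id:
            continue  # skip the header entry, whose ID is ''
        if m.id in shared_strings:
            m.id = shared_strings[m.id]
            # Delete and re-add the message so that the catalog's internal
            # key also references the shared string. The context must be
            # passed, otherwise delete() targets the context-less entry
            # with the same ID (if any) instead of this message.
            # Note: re-adding moves the message to the end of the catalog.
            catalog.delete(m.id, context=m.context)
            catalog[m.id] = m
        else:
            shared_strings[m.id] = m.id
```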
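The delete-then-reinsert step is what makes the sharing effective. Assuming, as in current Babel versions, that `Catalog` keeps its messages in an internal dict keyed by message ID, updating `m.id` alone leaves the dict key pointing at the duplicate string. The plain-dict sketch below (the `Message` class and `index` dict are hypothetical stand-ins, not Babel's API) shows that CPython keeps the original key object when an equal key is assigned again, so only deleting and re-inserting swaps in the shared copy.

```python
class Message:
    """Hypothetical stand-in for babel.messages.catalog.Message."""

    def __init__(self, id: str) -> None:
        self.id = id


raw = b"Your settings have been saved."
shared_id = raw.decode("utf-8")     # the copy kept in shared_strings
msg = Message(raw.decode("utf-8"))  # this catalog's own, duplicate copy
index = {msg.id: msg}               # stand-in for the catalog's internal dict

msg.id = shared_id   # the message now references the shared string...
index[msg.id] = msg  # ...but assigning an equal key keeps the old key object
key_after_assign = next(iter(index))

del index[msg.id]    # delete, then re-insert, as share_source_strings does
index[msg.id] = msg
key_after_reinsert = next(iter(index))

print(key_after_assign is shared_id, key_after_reinsert is shared_id)  # False True
```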

and calling it after each read_po():

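```python
from babel.messages.pofile import read_po

source_strings = {}
for f in po_files:  # read_po() takes a file-like object, not a path
    catalog = read_po(f)
    share_source_strings(catalog, source_strings)
    ...
# Only needed while loading: the catalogs now hold the shared strings.
del source_strings
```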
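To check how much the sharing saves for a given set of catalogs, one option is the standard library's `tracemalloc`. The sketch below does not use Babel: it assumes each catalog can be approximated by a plain list of its message IDs, and `load_catalogs`, `retained_bytes` and the generated IDs are hypothetical. It mirrors the loop above, with a `shared` dict that only lives for the duration of loading.

```python
import tracemalloc


def load_catalogs(raw: bytes, n: int, share: bool) -> list[list[str]]:
    """Load n simulated catalogs (lists of message IDs) from the same data."""
    shared: dict[str, str] = {}
    catalogs = []
    for _ in range(n):
        ids = raw.decode("utf-8").splitlines()  # fresh str objects each time
        if share:
            # Keep the first equal copy seen, as share_source_strings does.
            ids = [shared.setdefault(s, s) for s in ids]
        catalogs.append(ids)
    return catalogs  # `shared` is dropped here, like `del source_strings`


def retained_bytes(raw: bytes, n: int, share: bool) -> int:
    """Bytes still allocated while the loaded catalogs are alive."""
    tracemalloc.start()
    try:
        catalogs = load_catalogs(raw, n, share)
        current, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return current


raw = "\n".join(f"Source string number {i}" for i in range(1000)).encode("utf-8")
without_sharing = retained_bytes(raw, n=10, share=False)
with_sharing = retained_bytes(raw, n=10, share=True)
print(f"without sharing: {without_sharing} bytes, with sharing: {with_sharing} bytes")
```

The exact numbers depend on the Python version and on the length of the IDs, but the unshared figure grows with the number of catalogs while the shared one mostly does not.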

Maybe a similar mechanism could be integrated into Babel so that memory usage would be optimized by default? If not, then a note could be added in the documentation about how to optimize the memory footprint of catalogs.
