
Faster access to per-interpreter globals. #692

Open
@markshannon


We recently saw a large performance regression on the telco benchmark when the decimal module was moved to multi-phase init.
Accessing module state is now much slower than before.
Anecdotally, accessing a global now takes 7 dependent loads instead of 1. (@mdboom do you have a link for this?)

If we observe that replacing C global variables requires per-interpreter variables, not per-module ones, we can design an API that needs far fewer indirections.
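To illustrate the indirection argument, here is a deliberately simplified, self-contained model. The struct names (`ModuleState`, `Module`, `TypeObj`, `Object`, `ThreadState`) are hypothetical stand-ins, not CPython's real object layout, and the real module-state lookup differs in detail; the sketch only contrasts the shape of each access path.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified layouts; NOT CPython's real structs. */
typedef struct { long cache; } ModuleState;      /* per-module state */
typedef struct { ModuleState *state; } Module;   /* module object */
typedef struct { Module *module; } TypeObj;      /* heap type that knows its module */
typedef struct { TypeObj *type; } Object;        /* instance of that type */

/* Per-interpreter table, reached through the current thread state. */
typedef struct { long *globals_table; } ThreadState;

/* Single-phase style: a plain C global. */
long c_global_cache;

long
load_c_global(void)
{
    return c_global_cache;                      /* 1 load */
}

/* Multi-phase style: reach the module state starting from an object. */
long
load_via_module(const Object *self)
{
    return self->type->module->state->cache;    /* 4 dependent loads */
}

/* Proposed style: index into the per-interpreter table. */
long
load_via_table(const ThreadState *tstate, size_t index)
{
    return tstate->globals_table[index];        /* 2 dependent loads, once tstate is known */
}
```

In this model the table path costs the same two loads no matter how the caller was reached, while the module path grows with every object it has to walk through.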

This API is largely stolen from HPy, with a few tweaks for better performance: https://docs.hpyproject.org/en/stable/api-reference/hpy-global.html

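```c
typedef struct { uintptr_t index; } PyGlobal;

/* Index 0 means "not yet initialized"; PyGlobal_Init() assigns a non-zero index. */
#define PY_GLOBAL_INIT { 0 }

/* Declare a global */
#define PyGLOBAL_DECLARE(NAME) PyGlobal NAME = PY_GLOBAL_INIT

/* Initialize a global. This must be called at least once per process.
 * It is idempotent, so it can be called whenever a module is loaded.
 * Returns 0 on success, -1 on failure. */
int PyGlobal_Init(PyGlobal *name);

/* Load returns a new reference; Store does not steal a reference to value. */
PyObject *PyGlobal_Load(PyGlobal name);
void PyGlobal_Store(PyGlobal name, PyObject *value);
```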
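A minimal sketch of how the declaration macro and an idempotent PyGlobal_Init() could fit together, assuming a single process-wide counter hands out indices. The counter `next_global_index`, the globals `cached_context` and `default_context`, and `example_module_exec()` are hypothetical, and the per-interpreter table growth that the real PyGlobal_Init() would also perform is omitted here.

```c
#include <stdint.h>

typedef struct { uintptr_t index; } PyGlobal;

#define PY_GLOBAL_INIT { 0 }   /* index 0 == "not initialized yet" */
#define PyGLOBAL_DECLARE(NAME) PyGlobal NAME = PY_GLOBAL_INIT

/* Hypothetical process-wide counter of handed-out indices.
 * Not thread-safe: a real implementation would need a lock or atomics. */
static uintptr_t next_global_index = 1;

/* Idempotent: only the first call for a given global assigns an index. */
int
PyGlobal_Init(PyGlobal *name)
{
    if (name->index == 0) {
        name->index = next_global_index++;
    }
    return 0;
}

/* Hypothetical module code: two globals replacing former C statics. */
PyGLOBAL_DECLARE(cached_context);
PyGLOBAL_DECLARE(default_context);

/* Would run from the module's exec slot, i.e. every time the module is loaded. */
int
example_module_exec(void)
{
    if (PyGlobal_Init(&cached_context) < 0) {
        return -1;
    }
    if (PyGlobal_Init(&default_context) < 0) {
        return -1;
    }
    return 0;
}
```

Because the index lives in the C global itself and is assigned once per process, every interpreter that loads the module uses the same index, which is what lets Load and Store skip the module object entirely.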

Implementation

Each interpreter state holds a reference to an array of `PyObject *` pointers; the code below reaches it through the current thread state's `globals_table` field.
`PyGlobal_Init()` assigns the global a non-zero index (if it does not already have one) and ensures that every interpreter has a table large enough to hold that index.
Load and store can then be implemented as follows:

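```c
/* Returns a new reference. Assumes a value has already been stored. */
PyObject *
PyGlobal_Load(PyGlobal name)
{
    return Py_NewRef(_PyThreadState_GET()->globals_table[name.index]);
}

void
PyGlobal_Store(PyGlobal name, PyObject *value)
{
    PyObject **table = _PyThreadState_GET()->globals_table;
    PyObject *tmp = table[name.index];
    /* Install the new value before releasing the old one, so that storing
     * the same object again, or a decref that runs arbitrary code, is safe. */
    table[name.index] = Py_NewRef(value);
    Py_XDECREF(tmp);
}
```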
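The table management that PyGlobal_Init() is described as doing can be sketched as follows. This is a self-contained model, not CPython code: `Obj`, `Interp`, `add_interp()`, `ensure_capacity()`, the fixed `MAX_INTERPS` registry, and the `current` pointer (standing in for `_PyThreadState_GET()`) are all hypothetical, and nothing here is thread-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for PyObject and the runtime structures. */
typedef struct { long refcnt; long value; } Obj;
typedef struct { uintptr_t index; } PyGlobal;

typedef struct {
    Obj **globals_table;   /* one slot per PyGlobal index; slot 0 unused */
    size_t table_size;
} Interp;

#define MAX_INTERPS 8
static Interp *interps[MAX_INTERPS];  /* every live interpreter */
static size_t n_interps;
static uintptr_t next_index = 1;      /* index 0 means "uninitialized" */
static Interp *current;               /* stands in for _PyThreadState_GET() */

static Obj *new_ref(Obj *o) { o->refcnt++; return o; }
static void xdecref(Obj *o) { if (o != NULL) o->refcnt--; }

/* Grow one interpreter's table so that `index` is a valid slot. */
static int
ensure_capacity(Interp *interp, uintptr_t index)
{
    if (index < interp->table_size) {
        return 0;
    }
    size_t new_size = (size_t)index + 1;
    Obj **t = realloc(interp->globals_table, new_size * sizeof(Obj *));
    if (t == NULL) {
        return -1;
    }
    for (size_t i = interp->table_size; i < new_size; i++) {
        t[i] = NULL;                  /* new slots start empty */
    }
    interp->globals_table = t;
    interp->table_size = new_size;
    return 0;
}

/* Register an interpreter, sizing its table for every index handed out so far. */
int
add_interp(Interp *interp)
{
    interp->globals_table = NULL;
    interp->table_size = 0;
    if (n_interps == MAX_INTERPS) {
        return -1;
    }
    if (next_index > 1 && ensure_capacity(interp, next_index - 1) < 0) {
        return -1;
    }
    interps[n_interps++] = interp;
    return 0;
}

/* Idempotent: assigns an index once, then makes sure every table can hold it. */
int
PyGlobal_Init(PyGlobal *name)
{
    if (name->index == 0) {
        name->index = next_index++;
    }
    for (size_t i = 0; i < n_interps; i++) {
        if (ensure_capacity(interps[i], name->index) < 0) {
            return -1;
        }
    }
    return 0;
}

/* Returns a new reference, or NULL if nothing was stored in this interpreter. */
Obj *
PyGlobal_Load(PyGlobal name)
{
    assert(name.index != 0 && name.index < current->table_size);
    Obj *o = current->globals_table[name.index];
    return o != NULL ? new_ref(o) : NULL;
}

void
PyGlobal_Store(PyGlobal name, Obj *value)
{
    Obj **table = current->globals_table;
    Obj *tmp = table[name.index];
    table[name.index] = new_ref(value);
    xdecref(tmp);
}
```

One design point the model makes visible: growing a table with realloc() may move it, so any pointer cached elsewhere (for example a per-thread copy of globals_table) would have to be refreshed after PyGlobal_Init() grows the table. The sketch sidesteps this by always reading the table through the interpreter.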
