Skip to content

Conversation

@Vizonex
Copy link
Contributor

@Vizonex Vizonex commented Jun 8, 2025

What do these changes do?

This Pull Request Features a brand new C-API CPython Capsule that will allow for the implementation of enabling multidict to work with Cython a bit more fluently but also allow CPython Developers to intract with multidict as well.
I have plans to make it so that multidict can have a mini distutils-like setup for finding the header files it needs for compiling with other CPython & Cython extensions similar to what numpy does except we don't need to link any libraries.

This was mainly an idea and after being given the ok that someone would look into it that I was excited enough to spend 3 hours writing it and making sure it was as readable as possible. I have a bit of experience in the field of writing Cython and CPython extensions and doing this felt like a treat to me.

I was up a little bit later than usual so if I forgot something feel free to let me know.
I'm not planning to push everything right away and I would like to takes these features
slowly one step at a time to ensure I didn't miss anything.

A Later Pull request plans to include cython bindings and a mini distutils tool like I mentioned earlier.

Are there changes in behavior for the user?

On the development side this doesn't change much except adds in an experimental Capsule.

Related issue number

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • Documentation reflects the changes

@Vizonex Vizonex requested a review from asvetlov as a code owner June 8, 2025 04:26
@webknjaz webknjaz requested a review from bdraco June 8, 2025 09:12
@bdraco
Copy link
Member

bdraco commented Jun 8, 2025

Thanks so much for this PR, it's great to see you contributing!

We used to have quite a bit of Python overhead in the header parser due to jumping back into the Python API to add headers to multidict, but we resolved that by building a list in Cython and passing the result to multidict at the end:

Your change would avoid that extra copy, which is nice, but unfortunately that's the only spot in aiohttp I can think of where it would help. That makes it hard to justify the long-term maintenance cost of exposing and stabilizing a new ABI in multidict.

The timing is also a bit tricky. #1128 is in the works, aiming to change lookups from O(n) to O(1). If we were to consider introducing a stable ABI, it would have to wait until that lands and proves itself performance-wise.

That said, we really appreciate you digging into performance and submitting this. If you're interested in more optimization opportunities, we’d love to chat. Feel free to drop by #aio-libs on Matrix. There are still a few hotspots on codspeed worth exploring.

Even if this PR doesn’t move forward, please don’t let that discourage you. We’d love to see more contributions from you in the future!

@Vizonex
Copy link
Contributor Author

Vizonex commented Jun 8, 2025

Thanks so much for this PR, it's great to see you contributing!

We used to have quite a bit of Python overhead in the header parser due to jumping back into the Python API to add headers to multidict, but we resolved that by building a list in Cython and passing the result to multidict at the end:

Your change would avoid that extra copy, which is nice, but unfortunately that's the only spot in aiohttp I can think of where it would help. That makes it hard to justify the long-term maintenance cost of exposing and stabilizing a new ABI in multidict.

That was my plan. I mainly was looking into getting rid of some heavier api calls and one of the biggest changes would be making multidict have access in cython which seemed to be the best solution. One thing to note is that my goal was to make the C-Capsule and inline functions as minimal as possible so that focusing on the internal functions wouldn't be a problem.

The timing is also a bit tricky. #1128 is in the works, aiming to change lookups from O(n) to O(1). If we were to consider introducing a stable ABI, it would have to wait until that lands and proves itself performance-wise.

I think I can be patient enough to wait for this to pass.

That said, we really appreciate you digging into performance and submitting this. If you're interested in more optimization opportunities, we’d love to chat. Feel free to drop by #aio-libs on Matrix. There are still a few hotspots on codspeed worth exploring.

I made it if you wanted to say hello.

Even if this PR doesn’t move forward, please don’t let that discourage you. We’d love to see more contributions from you in the future!

Copy link
Member

@asvetlov asvetlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR.
It opens an exciting field of improvements; I love to have something like this eventually.

PyObject* (*_MultiDict_Reduce)(MultiDictObject* self);

// multidict_repr
PyObject* (*_MultiDict_Repr)(MultiDictObject* self);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we expose tp_repr slot?
I doubt if it is a hot path in any code; PyObject_Repr() should call the slot in a performant enough way.

Also, I'm not sure about __reduce__, but I think that .extend() is worth being listed here.

Copy link
Contributor Author

@Vizonex Vizonex Jun 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I think about it PyObject_Repr() could be a better approach to this.

} PyMultiDict_CAPI;


static PyMultiDict_CAPI* MultiDict_CAPI;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a bad feeling about mixing two-phase init, mod_state, and this static variable.

} PyMultiDict_CAPI;


static PyMultiDict_CAPI* MultiDict_CAPI;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a bad feeling about mixing two-phase init, mod_state, and this static variable.

/********************* MultiDict *********************/


#define MultiDict_Len(self) \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it the beginning of a public C API?
If yes, it shouldn't use macros with direct access to the multidict internals.
Say, after merging hashtable-based solution this api will be broken if a third-party compiles exactly this version but imports HT-based multidict library version.

Did I miss something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently I started waiting for that after being told about the hash-table solution so I put this on hold, however when it's done I plan to start redoing stuff shouldn't take me that long as I found writing this all to be a lot quicker than I expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asvetlov When the new hash-table is finished I have a few questions that could help me with revising this more

  • Should I look into only writing C-API Capsules for all functions that require functions that are only written in _multidict.c or should I try hooking all the major components that do not have Vector-Call Styled approaches?
  • Should I start looking at this more from a CPython API Approach for the Dictionaries?
  • Would Looking at python's datetime.h be a good reference for me to use or do you know of any better approaches for it?
  • Should the Capsule just carry the mod_state as opposed to the Capsule having to carry all of these types itself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asvetlov I also have an idea since I have a lot of time on my hands, I'm gonna close this pull request, help you finish the hash-table end and then after that I can restart this again.

@Vizonex Vizonex closed this Jun 9, 2025
@Vizonex
Copy link
Contributor Author

Vizonex commented Jun 10, 2025

Gonna open a new PR when #1128 is complete.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants