Multithreading and unsupported platforms #8169
Comments
I think the broader question here is whether pip should support platforms that don't provide usable threading support. That's basically what @McSinyx said, but summarised down to the bare essential point.
From the Python documentation, the [...] We currently claim to support Python 3.5+ (I'm going to ignore Python 2, as we'll be dropping support for that in 2021, and it's not the real issue anyway). So on that basis, we need to cover platforms without threading [1], at least until we drop Python 3.5 and 3.6 support.
Personally, I'd suggest that what we do is have a compatibility module that implements whatever concurrency primitives we want, and has fallbacks for non-threaded platforms. We can then unit-test those wrappers to ensure that we behave the same with or without threading, and then we use the wrappers wherever we need them in the rest of the code. Once we drop support for platforms without threading, we can decide whether to keep the wrappers or use the core features directly.
[1] #8161 was actually reported on Python 3.8.2, on Android Termux. If we take the Python docs seriously, that platform is broken by not providing a working threading implementation. I don't know how we want to deal with that. Python on mobile is an important enough area that I can see core Python being sympathetic to the idea of not being too strict here. Luckily, the point is irrelevant for now if we are going to support platforms without threading anyway. |
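As a rough illustration of the compatibility-module idea, here is a minimal sketch of one wrapped primitive. It assumes the only primitive needed is a lock and relies on the `dummy_threading` fallback that the standard library shipped for Python builds without thread support (the `threading` module is always available from Python 3.7 on, and `dummy_threading` was removed in 3.9). The names `Lock` and `guarded_append` are hypothetical and not part of pip.

```python
# Hypothetical compat module: expose one Lock name regardless of whether
# the interpreter was built with thread support.
try:
    import threading as _threading
except ImportError:
    # Only reachable on Python < 3.7 builds without threads.
    import dummy_threading as _threading

Lock = _threading.Lock


def guarded_append(lock, items, value):
    """Append value to items while holding lock (illustrative helper)."""
    with lock:
        items.append(value)
```

Callers would import `Lock` from the compat module instead of from `threading`, so the same code path can be unit-tested with either backend.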
We'll likely be dropping Python 3.5 at the same time as Python 2.7 btw -- since Python 3.5 goes EoL in August / September 2020. |
One potential consideration after 2021 is asyncio (if pip ever wants to use it). A lot of the async stuff uses threading as a backend when whatever it wants to do doesn't have OS-level event loop support. |
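To make the point about asyncio and threads concrete, here is a small sketch (Python 3.7+ only, so not applicable to the versions discussed above). It uses `loop.run_in_executor(None, ...)`, whose default executor is a thread pool, and compares thread names. The function names `blocking_io`, `main`, and `run_demo` are made up for illustration.

```python
import asyncio
import threading


def blocking_io():
    # Stand-in for a blocking call that has no event-loop-native equivalent.
    return threading.current_thread().name


async def main():
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default executor, a thread pool.
    worker_name = await loop.run_in_executor(None, blocking_io)
    return worker_name, threading.current_thread().name


def run_demo():
    """Return (worker thread name, event loop thread name)."""
    return asyncio.run(main())
```

The two names returned by `run_demo()` differ, so a platform without working threads would likely hit similar problems under asyncio whenever blocking work is offloaded this way.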
I do not think it is important to support non-threading Pythons. We don't have to support it just because it is a possible option. |
That being said, I think longer term we are ideally using some form of async code instead of threading or multiprocessing directly (I would love to use trio as it is much better than asyncio imo, but it has some C stuff so it would be a much harder change). |
Hi @McSinyx, I was not aware of this thread and just opened #8187. It should support Python 2.7 and Python 3.5 because it relies on ThreadPool and Pool, both of which are available on 2.7/3.5. Notice how we solved the multi-instance progress bar download issue; I would love some feedback on our solution, and any suggestions are welcome. |
This is opened as the continuation of GH-7962, GH-8161, GH-8162, GH-3981 and GH-4654 and is one of the approaches to solve GH-825.
Why multithreading?
One common inconvenience with using `pip` is the delay for networking, since most package indices are not really fast[citation needed] and during package management `pip` needs to fetch many things (the package list, the packages themselves, etc.). Parallelization is one obvious way to tackle this, and I hope it will be the cheaper one, hence this issue is opened to ensure that the implementation will not be labor-intensive.
Until next year when Python 2 support is dropped, there are two options: multithreading and multiprocessing. While the latter is safer, (1) not every platform has multiple CPU cores and (2) the modified code would need to undergo a huge refactoring to give each process the data it needs. So we are left with multithreading. The Python 3 `asyncio` is not an immediate solution however (plus it would also require making many existing routines awaitable).
What is the problem with multithreading?
Putting thread-safety aside (not because it's not a problem, but rather because I think everyone knows how problematic it is), the most obvious solution provided by Python, `multiprocessing.dummy.Pool`, requires `sem_open` (bpo-3770), whose absence seems to raise `ImportError` during initialization of the pool's attributes. Since `sem_open` is to be provided by the operating system, this raises the question of whether `multiprocessing.dummy` is supported on the platforms that `pip` cares to support, and whether (the more generic?) `threading` suffers from the same issue if we implement the `Pool` ourselves. How about `concurrent.futures` (GH-3981)? Would it be worth doing, from the developers' perspective as well as that of our users, if things go wrong on their platform?
If we decide to do it anyway, how?
From GH-8162, IMHO it is safe to assume that (this is a really dangerous thing to say 😞) we can fall back to `map` if `multiprocessing.dummy.Pool` can't have `sem_open`. If this works, personally I suggest declaring a higher-order function to reuse in other places, namely for parallel downloading of packages (GH-825). Still under the assumption that this is correct, we can easily mock the failing behavior for testing. However, with my modest experience in threading and the overwhelming responsibility of not breaking thousands[citation needed, could be millions] of people's workflows, please do not take my words for granted and kindly share your thoughts on this particular matter.
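The fallback described above could be sketched roughly as follows. This is only an illustration under the stated assumption that a missing `sem_open` shows up as an `ImportError` when importing `multiprocessing.synchronize`; the names `has_sem_open` and `map_with_fallback` and the default of 4 workers are hypothetical.

```python
def has_sem_open():
    """Return True if the platform appears to provide sem_open."""
    try:
        # Assumption: this import fails with ImportError where the OS
        # lacks a working sem_open implementation.
        import multiprocessing.synchronize  # noqa: F401
    except ImportError:
        return False
    return True


def map_with_fallback(func, iterable, workers=4):
    """Map func over iterable in threads, or sequentially as a fallback.

    Results are returned as a list, in input order, in both cases.
    """
    if not has_sem_open():
        return list(map(func, iterable))
    # Imported lazily so the sequential path never touches the pool.
    from multiprocessing.dummy import Pool  # thread-backed pool
    pool = Pool(workers)
    try:
        return pool.map(func, iterable)
    finally:
        pool.close()
        pool.join()
```

For testing on Python 3, one possible way to mock the failing behavior is `unittest.mock.patch.dict(sys.modules, {"multiprocessing.synchronize": None})`, which makes the import inside `has_sem_open` raise `ImportError`, so both branches can be checked for identical results.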