Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add collection of worked examples to tutorial #472

Open
njsmith opened this issue Mar 16, 2018 · 43 comments
Open

Add collection of worked examples to tutorial #472

njsmith opened this issue Mar 16, 2018 · 43 comments
Labels

Comments

@njsmith
Copy link
Member

njsmith commented Mar 16, 2018

We should refactor the tutorial into an initial part that's similar to what we have now, or the middle part of my talk, and then a collection of examples that also serve as excuses to go into more depth on particular topics.

I expect the list will grow over time, but here are some ideas. (Actually the main reason I'm filing this is to have a place collect these so I don't lose them.)

  • The current tracing demo should move here. It's a good intro to trio introspection and to co-op concurrency, but having it in the main tutorial like it is now is a big blob of text for folks to wade through if they already know this stuff. (We can/should link to it from the async/await intro though.)

  • Happy eyeballs (for people who saw the talk but want a text version; as a demo of passing nursery objects around; ...)

  • Multiplexing rpc (Add "one obvious way" for implementing the common multiplexed request/response pattern #467)

  • Catch-all exception handler

  • Custom nursery, like a race function or ignore-errors nursery (maybe both)

  • Some standard stuff like echo server, proxy, fast web spider, ... Whatever doesn't end up in the main tutorial. (We could have echo server and TCP proxy as two examples, then show how to run them both within a single process as an example of implementing multi-protocol servers... and also to show off how proxy_one_way can be re-used for both! Maybe the proxy should demonstrate mirroring localhost:12345 to httpbin:80, so people can try it out with their web browsers?)

  • trio-asyncio example?

  • nursery.start

Possibly some of these could be combined or form sequences, eg echo server -> catch all handler -> nursery.start

@Fuyukai
Copy link
Member

Fuyukai commented Mar 16, 2018

An asyncio.gather-like (collect the results of all tasks and return the results) would be a good example (as you've said before on Gitter iirc)

@njsmith
Copy link
Member Author

njsmith commented Mar 17, 2018

Oh yeah, good idea! (And we should use the terms asyncio.gather and Promise.all in the headline, because people seem to be looking for those.)

@njsmith njsmith added the docs label Mar 18, 2018
@njsmith
Copy link
Member Author

njsmith commented Mar 18, 2018

Oh, see also #421, which is a partial duplicate and has some more discussion of the nursery-based examples.

@njsmith
Copy link
Member Author

njsmith commented Mar 18, 2018

Oh duh, here's another one: an example of implementing a custom protocol, by combining a sansio protocol with the stream interface. (Probably some simple line-oriented or netstring-oriented thing. This is one of the motivations for why I started working on sansio_toolbelt. I should make it actually usable...)

@smurfix
Copy link
Contributor

smurfix commented Apr 10, 2018

A walkthrough for converting a sync protocol to Trio might also make sense.

WRT trio-asyncio: maybe simply refer to it. I do need to add some example that shows how to convert from asyncio to trio-asyncio to trio, and how that improves the code. ;-)

@nicoddemus
Copy link
Member

I do need to add some example that shows how to convert from asyncio to trio-asyncio to trio, and how that improves the code. ;-)

Would love to see that. 😁

@njsmith
Copy link
Member Author

njsmith commented Apr 14, 2018

Something like @jab's HTTP CONNECT proxy from #489 (comment) might be interesting too.

(Possibly rewritten to use h11 ;-).)

@njsmith
Copy link
Member Author

njsmith commented Apr 14, 2018

@oremanj's as_completed here might be interesting as the basis for a gather equivalent: https://gitter.im/python-trio/general?at=5ad186345d7286b43a29af53

Maybe examples of testing and debugging would be good too. (Testing might just refer to the pytest-trio docs. Which we still need to write...)

@oremanj
Copy link
Member

oremanj commented Apr 26, 2018

There was some more discussion of this on Gitter today, which resulted in rough drafts of reasonable implementations for gather and as_completed both: https://gitter.im/python-trio/general?at=5ae22ef11130fe3d361e4e25

@N-Coder pointed out that there are a number of useful "asyncio cookbook" type articles floating around, and probably trio would benefit from something that serves a similar role. I think that's the same idea in this thread, but the examples are potentially helpful:

@njsmith
Copy link
Member Author

njsmith commented May 10, 2018

#527 is a common question; we should have something for it.

@Fuyukai Fuyukai changed the title add collection of worked examples to tutorial Add collection of worked examples to tutorial May 14, 2018
@njsmith
Copy link
Member Author

njsmith commented May 20, 2018

As mentioned in #537, a UDP example would be good. This would also be a good way to demonstrate using trio.socket directly.

The example in that comment thread is kind of boring. Doing a dns query or ntp query would be more interesting. [edit: see notes-to-self/ntp-example.py for an ntp query example.]

njsmith added a commit to njsmith/trio that referenced this issue May 21, 2018
blocking-read-hack.py: This demonstrates a really weird approach to
solving python-triogh-174. See:
  python-trio#174 (comment)

ntp-example.py: A fully-worked example of using UDP from Trio,
inspired by
  python-trio#472 (comment)
This should move into the tutorial eventually.
@njsmith
Copy link
Member Author

njsmith commented Jul 7, 2018

It would be good to have an example that discusses the subtleties of aclose: in general, cancellation means "stop what you're doing ASAP and clean up". But if what you're doing is cleaning up... what does that mean? Basically just "clean up ASAP" → graceful vs forceful close.

Maybe this would fit in with an example that wraps a stream in a higher-level protocol object, so we have to write our own aclose? Esp. if the protocol has some kind of cleanup step.

@njsmith
Copy link
Member Author

njsmith commented Jul 9, 2018

Interaction between __del__ and trio is another thing we should discuss somewhere. (It's very tricky. The obvious problem is that __del__ isn't an async method, so it can't await anything. But it's actually much worse than that. __del__ methods are called at arbitrary moments, so they have all the same complexity as signal handlers. Basically the only operation that's guaranteed to be usable from __del__ is TrioToken.run_sync_soon. At least this is better than asyncio, where AFAICT literally no methods are guaranteed to be usable from __del__, but it's still extremely tricky.)

@njsmith
Copy link
Member Author

njsmith commented Oct 5, 2018

Channel examples – we might move the ones that are currently in reference-core.rst here.

It would also be good to have an example of tee, if only to have something to point to when explaining that it's not what ReceiveChannel.clone does. The simplest version is something like:

async def send_all(value, send_channels):
    async with trio.open_nursery() as nursery:
        for send_channel in send_channels:
            nursery.start_soon(send_channel.send, value)

But then there are complications to consider around cancellation, and error-handling, and back-pressure...

@njsmith
Copy link
Member Author

njsmith commented Dec 4, 2018

Using a buffered memory channel to implement a fixed-size database connection pool

@njsmith
Copy link
Member Author

njsmith commented Dec 4, 2018

Re previous message: @ziirish wrote a first draft: https://gist.github.com/ziirish/ab022e440a31a35e8847a1f4c1a3af1d

@njsmith
Copy link
Member Author

njsmith commented Dec 19, 2018

Zero-downtime upgrade (via socket activation, socket passing, unix-domain socket + atomic rename?)

@njsmith
Copy link
Member Author

njsmith commented Jan 27, 2019

example of how to "hide" a nursery inside a context manager, using @asynccontextmanager (cf #882 (comment))

[Edit: And also, what to do in case you need to support async with some_object: ...]

@N-Coder
Copy link

N-Coder commented Feb 26, 2019

Maybe some information about how to do Tasks in trio, like here: #892 (comment)

[Note: I think this means asyncio.Task-equivalents -njs]

@njsmith
Copy link
Member Author

njsmith commented Apr 23, 2019

As requested by @thedrow (e.g. #931), it would be great to have a simple worked example of wrapping a callback/fd-based C library and adapting it to Trio style, demonstrating wait_readable/wait_writable. We'll want to take special care to talk about cancellation too, because that's important and easy for newcomers to forget about.

I'm not sure what the best way to do this would be. Callback-based C libraries tend to be complicated and have idiosyncratic APIs. Which to some extent is useful for an example, because we want to show people how to handle their own complicated and idiosyncratic API, but it can also be problematic, because we don't want to force people to go spend a bunch of time learning about details of some random library they don't care about.

We could write our own toy library just for the example, in C or Rust or whatever.

We could pick an existing library that we think would be good pedagogically. Which one? Ideally: fairly straightforward interface, accomplishes a familiar task, already has a thin Python wrapper or it's trivial to make one through cffi. Some possibilities:

  • libpq (wrapper: psycopg2, async support)
  • curl (wrapper: pycurl, this uses multi support, curl docs)
  • c-ares? pretty simple but integrating its cancellation support with trio would be a mess (you can cancel all operations using a socket, but can't cancel one operation using a socket)
  • hiredis? @thedrow mentioned it as a library they were interested in. There's a wrapper called hiredis-py, but from the README it sounds like it's a pure parsing/sans-io library, and doesn't do I/O at all, so wrapping it in trio would look exactly like wrapping it with any other I/O system? The underlying hiredis library has sync, async, and sans-io APIs, so I guess hiredis-py just wraps the sans-io API. I suppose we could demonstrate using cffi to wrap the async API?
  • Does anyone else have a favorite?

@thedrow
Copy link
Contributor

thedrow commented Apr 23, 2019

One of the things Celery will need to do is to write an sd_notify implementation with async support.
It should be fairly easy to write in Rust/C and there's only one socket to integrate with.

@njsmith
Copy link
Member Author

njsmith commented Apr 23, 2019

@thedrow we already have an issue for sd_notify and friends – let's discuss that over on #252. This thread is about examples to teach people about trio, and AFAIK there isn't anything pedagogically interesting about sd_notify – it's pretty trivial to implement in pure python, or if you have an existing C/Rust implementation you like then it's trivial to bind in python and the binding won't care what io library you're using.

@oremanj
Copy link
Member

oremanj commented May 1, 2019

Some examples of commonly-desired "custom supervisors" would be useful, e.g. the dual-nurseries trick in #569.

@wgwz
Copy link
Contributor

wgwz commented Aug 9, 2019

As mentioned here in gitter earlier today, when I was working the Semaphore primitive I felt unsure of how I was using it. I'd appreciate some examples and documentation on the common use cases of the synchronization primitives shipped with trio and will try to help with this. There is existing documentation here that should be considered too.

@njsmith
Copy link
Member Author

njsmith commented Nov 2, 2019

Here's a sketch for a web spider, including a tricky solution to figuring out when a circular channel flow is finished: https://gist.github.com/njsmith/432663a79266ece1ec9461df0062098d

@snedeljkovic
Copy link

Hey @njsmith I just tested your spider, and it seems there is an issue regarding the closing of the send channel clones. Here is a mwe with a proposed fix:

import trio
import random
from collections import deque

WORKER_COUNT = 10

tasks = deque(i for i in range(103))
results = []

async def worker(worker_id, tasks, results, receive_chan):

  async def process(task):
    await trio.sleep(random.uniform(0, 0.1))
    return task

  async for send_chan, task in receive_chan:
    async with send_chan:
      result = await process(task)
      results.append((result, worker_id))
      if tasks:
        await send_chan.send((send_chan.clone(), tasks.popleft()))
        continue
      print('Worker {} reached an empty queue.'.format(worker_id))
      break

async def batch_job(tasks, results):
  send_chan, receive_chan = trio.open_memory_channel(float("inf"))
  for _ in range(WORKER_COUNT):
    await send_chan.send((send_chan.clone(), tasks.popleft()))
  async with trio.open_nursery() as nursery:
    for worker_id in range(WORKER_COUNT):
      nursery.start_soon(worker, worker_id, tasks, results, receive_chan)

trio.run(batch_job, tasks, results)

If I remove the break in the async for send_chan, task in receive_chan loop the program hangs. Could you explain why exactly this fix works? Is there a more correct way to fix the issue?

@smurfix
Copy link
Contributor

smurfix commented Nov 4, 2019

You're not closing the original send_chan.
Also you fill your queue first and start the workers afterwards, which in a real program (i.e. with a non-infinite queue) isn't a terribly good idea.

@smurfix
Copy link
Contributor

smurfix commented Nov 4, 2019

Anyway, why is your code so complicated?

Simplified:

tasks = deque(range(103))
results = []

async def worker(worker_id, results, receive_chan):

  async def process(task):
    await trio.sleep(random.uniform(0, 0.1))
    return task

  async for task in receive_chan:
    result = await process(task)  
    results.append((result, worker_id))  
  print('Worker {} reached an empty queue.'.format(worker_id))

async def batch_job(tasks, results):
  send_chan, receive_chan = trio.open_memory_channel(WORKER_COUNT)
  async with trio.open_nursery() as nursery:
    async with send_chan:
      for worker_id in range(WORKER_COUNT):
        nursery.start_soon(worker, worker_id, results, receive_chan)
      for t in tasks:
        await send_chan.send(t)
      await send_chan.aclose()

trio.run(batch_job, tasks, results)

Note that you don't need (and don't want) an infinite queue; limiting it to the number of workers is more than sufficient.

@snedeljkovic
Copy link

snedeljkovic commented Nov 4, 2019

@smurfix Thanks for the explanation. Could you please elaborate on why it is a bad idea to first fill the queue and then start the workers? I'm building a concurrent scraper that should roughly speaking be given a batch of urls and then scrape them concurrently. Because of retries some links may get added back to the queue.
Yeah the code is more complicated than it should be, but I figured it's simple enough to get the point.

@smurfix
Copy link
Contributor

smurfix commented Nov 4, 2019

OK, yeah, if you're scraping then your original code makes more sense. ;-)

The point is that you should never use an infinite queue. Infinite queues tend to fill memory. Also, the point of a queue is to supply back-pressure to the writers (i.e. your scraper) to slow down because the workers can't keep up. This, incidentally, significantly improves your chances of not getting blocked by the scraped sites.

OK, now you have 103 sites in your initial list, 10 workers, and a 20-or-however-many job queue. If you fill the queue first, the job doing that will stall, and since there's no worker running yet you get a deadlock.

@snedeljkovic
Copy link

snedeljkovic commented Nov 4, 2019

Thank you for pointing out the issue with the infinite queue.
I don't understand how filling a regular deque can stall. The workers either add the response to the result or add the url back into the queue (of course there are various retry limits, timeouts etc...). You mentioned a "20-or-however-many job queue". I'm adding all my urls to the queue first. All other additions are done by the workers themselves in case of retries.
Regarding rate limiting I do have a mechanism based on timing, and I do not rely on any sort of queue for that if that's what you meant.

Edit: My use case involves a low number of thousands of links at most, so I can afford to keep the entire queue in memory ie. I don't need a separate queue for workers to feed on and another to fill the first one.

@njsmith
Copy link
Member Author

njsmith commented Nov 4, 2019

@smurfix you do need something like an infinite queue for a traditional recursive web scraper, since each consumer task may produce an arbitrary number of new work items. So any finite queue limit could produce a deadlock, when all the consumer tasks are blocked waiting for another consumer task to pull from the queue...

It sounds like I misunderstood @snedeljkovic's problem, and I was assuming a single starting URL, while they actually have the full list of urls up front. So in my gist, I took a kind of shortcut that works for the single URL case, of sending in the original send channel without cloning it, but I didn't point out the tricky bit there, so it wasn't obvious how to correctly generalize to a case with multiple starting urls. That's useful to know for future docs – we need to cover the multiple URLs case and explain the logic behind starting it up correctly.

@snedeljkovic It sounds like you figured out what you need to know? If you still have questions, please let us know – but let's move to a new issue or a thread on trio.discourse.group, so we can focus on your questions properly and keep this thread for tutorial update ideas.

@smurfix
Copy link
Contributor

smurfix commented Nov 4, 2019

@njsmith Yeah, I realized that but got confused. (It's been a long day.)

In any case, boiling this down to the essentials. you need

  • an infinite deque
  • a way to wake up just one worker after the dequeue has run temporarily dry
  • tell the sender when all workers are done.

Doing this with a lot of cloned senders is perhaps not the most efficient way, thus I have encapsulated the idea in a (somewhat minimal) class, and packaged this version of @snedeljkovic's code in a gist:

https://gist.github.com/smurfix/c8efac838e6b39bedc744a6ff8ca4405

Might be a good starting point for a tutorial chapter.

@njsmith njsmith mentioned this issue Nov 4, 2019
@snedeljkovic
Copy link

@smurfix Wow, thank you very much for the effort! I'm new to this kind of programming so this is of great help to me.
@njsmith Sorry, I won't pollute this thread any more ;)

@smurfix
Copy link
Contributor

smurfix commented Nov 5, 2019

It's good to keep a basic tenet in mind: don't code to the tools you have – that leads to bad design. Code to the tools you need, and build them with the tools you have. Recurse if necessary. ;-)

@jtrakk
Copy link
Contributor

jtrakk commented Nov 10, 2019

"How do I"...

  • make line streams
  • make a process-pool executor
  • interact with a repl, pexpect-style
  • make an ssh connection

@decentral1se
Copy link
Member

Quick note about getting started with writing a protocol and wanting to have a way of having pluggable transports (from the Gitter chat just now). For some reason, I couldn't bend my head around this for a while and this might help others:

I'm reading from https://trio.readthedocs.io/en/stable/reference-io.html where I see "it lets you write generic protocol implementations that can work over arbitrary transports". This gave me the idea that I can focus on writing my protocol (encoding, sending, receiving, decoding logic, etc.) and have my Stream class become an instance of MemoryStream, TCPStream, UTPStream, etc. later on

ahh, I see. So the idea is that you implement your protocol as a function or class that takes a Stream object, and uses its methods to send/receive bytes and your code does whatever higher-level thing it wants to with those bytes

so maybe you'd do like my_proto_over_memory_stream = MyProtocol(memory_stream) or my_proto_over_tcp = MyProtocol(tcp_stream)
composition, not inheritance

decentral1se added a commit to decentral1se/trio that referenced this issue Nov 20, 2019
decentral1se added a commit to decentral1se/trio that referenced this issue Nov 20, 2019
decentral1se added a commit to decentral1se/trio that referenced this issue Nov 20, 2019
@njsmith
Copy link
Member Author

njsmith commented Dec 5, 2019

Here's an example that could be turned into a good cookbook tutorial: why TCP flow control can cause deadlocks, and a way to avoid deadlocks when pipelining: https://gist.github.com/njsmith/05c61f6e06ca6a23bef732fbf5e832e6

@decentral1se
Copy link
Member

#1141 really helped me a lot and I think it would be great to include an example of thinking about providing a open_foo function when trying to design an API for whatever objects you expose to the end-user.

@jtrakk
Copy link
Contributor

jtrakk commented Feb 8, 2020

From chat

Q: Anyone has experience has experience with adapting callback-based APIs into Trio? I'm looking at pycurl (http://pycurl.io/docs/latest/curlmultiobject.html). I'm wondering if it's possible.

A: the general pattern is to first set up a a callback that will call trio.hazmat.reschedule, and then call trio.hazmat.wait_task_reschedule to put the task to sleep until the callback comes in or the operation is cancelled

@merlinz01
Copy link

I'd like to see a table mapping as many Go idioms to Trio idioms as possible, for those like me who have worked with Go.
For starters (probably not entirely correct as I'm fairly new to Trio):


ch := make(chan X)                      | send, recv = trio.open_memory_channel(0)

ch := make(chan X, 37)                  | send, recv = trio.open_memory_channel(37)

close(ch)                               | send.close(); recv.close()

x = <-ch                                | x = await recv.receive()

ch <- X                                 | await send.send(X)

done := make(chan any)                  | with trio.open_nursery() as n:
go func(done)                           |     n.start_soon(func)
<-done                                  |

select { case <-ch: XXX }               | try: recv.receive_nowait()
                                        | except trio.WouldBlock: pass
                                        | else: XXX

for x := range ch { XXX }               | async with recv:
                                        |     async for x in recv: XXX

sync.Mutex                              | trio.Lock

context.Context.Get(XXX)                | XXX = contextvars.ContextVar()

@TeamSpen210
Copy link
Contributor

@merlinz01 It might make sense indeed to have a table with a few different languages for comparison. I'm not too familiar with Go, but on the Trio side:

  • It'd be better to say "send.close() or recv.close()". It'd be very odd to close both sides of a channel simultaneously, since those would normally be given to two different tasks, each just responsible for its own end. Looks like closing behaves the same as Go, except that the two sides are independent, and double-closing is just a no-op.
  • Opening a nursery and starting a single function is equivalent to just await func(), it'd only be useful if doing more than one. Unless you specifically want a new task to be created. Also obviously in the Go example the function could choose to close the channel early or not at all, while in the Trio case that's impossible.
  • I don't think the select example is equivalent - the docs say it'll block until at least one channel succeeds, but the Trio example is purely synchronous and will only do anything if the channel already has something in its buffer. The way you'd write a select statement is with code like the example in the supervisors section. Create a nursery with a bunch of blocking tasks, then have them cancel the nursery once they succeed. trio-util offers a wait_any() function for when you already have the predicates.

@merlinz01
Copy link

Thanks for the corrections! It's only a couple days since I learned about Trio, so I don't have all the Trio idioms mastered.

For the go func() example, this is more what I was thinking of:

num_workers = 5
ch := make(chan MyFunctionResult, num_workers)
// Start the workers
for x := range num_workers {
    go send_the_result_over_chan(ch)
}
// Wait for the results
for x := range num_workers {
    res := <-ch
    if res.err != nil {
        [handle error]
    }
}
close(ch)

vs.

num_workers = 5
try:
    # Start the workers
    with trio.open_nursery() as nursery:
        for x in range(num_workers):
            nursery.start_soon(return_the_result)
    # implicitly wait for the results
except* Exception:
    [handle error]

For the select example, I think this is more accurate:

select {
    case <-ch:
        XXX
    default:
        YYY
}

vs.

try:
    recv.receive_nowait()
except Trio.WouldBlock:
    YYY
else:
    XXX

Go is super for networking and concurrency, but as it is a compiled language with monolithic binaries it is not quite as flexible. For my project I needed Python's extensibility and dynamic code compilation, so I'm using Trio's async functionality.

I suppose there would be other comparisons besides this that would also be helpful to programmers coming from other languages.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests