Commit 7f8103a

adding notes
0 parents  commit 7f8103a

1 file changed: +110 -0 lines changed

notes.org

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
* Two questions
1. Designing a general-purpose interface between datacenter
   apps and programmable hardware
2. Using hardware features to better schedule low-latency datacenter
   applications

* Intro
*** Datacenter servers have increasing amounts of programmable hardware and hardware acceleration
- e.g., I/O virtualization, IOMMUs, programmable NICs and flash
  devices, FPGAs
- as a result, both the hardware and the hardware/software
  interface keep changing
- what does this mean for apps?
*** Recent work offers only one-off solutions
- Arrakis/IX for virtual I/O devices
- FlexNIC, FlexTCP for programmable NICs
- .. (find more related work)
*** But there is no general solution for app programmers
- if an app programmer wants to make their app future-proof, what
  do they do?
*** Conclusion: We need new (lib)OS abstractions for datacenter applications
- so that the hardware can change underneath apps and the line
  between hardware and software can move

* Background: Why won't POSIX work?
- POSIX wanted to abstract differences between devices (everything
  is a file)
- not only has the hardware changed since we designed POSIX, the
  applications are very different too
***** Conclusion: we have new goals
*** Datacenter apps need to move a lot of data, not perform computation (even ML apps are limited by moving data between phases)
- move memory to network (memcached, HTTP servers)
- move SSD to network (file servers)
***** Some apps don't even look at all of the data
***** Many things can be passed off to hardware
- move to hardware hashing for checksumming
- ?
***** Conclusion
- need to make it as cheap as possible to move data
- starting to look like a middlebox?
*** Hardware is faster at moving data than the processor
- I/O devices can now move data faster than the processor, so we need
  to have a zero-copy interface
- POSIX is fundamentally not zero-copy: it's built around copying into
  app memory and then out to the device
- we need an interface that is able to hand pre-allocated buffers
  to the app and have the app hand off a buffer to another device,
  potentially without looking at some or any of the buffer

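A minimal sketch of the buffer hand-off described above, written in Python only for illustration. Everything here is hypothetical: `BufferPool`, `device_receive`, `app_forward` and `device_send` are made-up names, and a `memoryview` over a pre-allocated `bytearray` stands in for a DMA buffer. The point it tries to show is that the app can read just a header and pass the rest along by reference.

```python
from typing import List, Tuple


class BufferPool:
    """Hypothetical pool of pre-allocated, fixed-size buffers."""

    def __init__(self, count: int, size: int) -> None:
        self._free: List[bytearray] = [bytearray(size) for _ in range(count)]

    def acquire(self) -> bytearray:
        return self._free.pop()


def device_receive(pool: BufferPool, payload: bytes) -> memoryview:
    """Stand-in for a NIC or SSD filling a pre-allocated buffer (real hardware would DMA)."""
    buf = pool.acquire()
    assert len(payload) <= len(buf), "payload must fit in the pre-allocated buffer"
    buf[: len(payload)] = payload
    # Slicing a memoryview creates a new reference to the same memory, not a copy.
    return memoryview(buf)[: len(payload)]


def app_forward(view: memoryview, header_len: int) -> Tuple[bytes, memoryview]:
    """The app inspects only the header, then hands the same buffer on untouched."""
    header = bytes(view[:header_len])  # the only bytes the app copies or reads
    return header, view


def device_send(view: memoryview) -> int:
    """Stand-in for the outgoing device; reports how many bytes it was handed."""
    return view.nbytes


pool = BufferPool(count=4, size=4096)
incoming = device_receive(pool, b"GET " + b"x" * 1000)
header, outgoing = app_forward(incoming, header_len=4)
sent = device_send(outgoing)
```

In this sketch the 1000-byte body is never read by the app; only the 4-byte header is touched. A POSIX-style `read()` followed by `write()` would instead copy the whole payload into app memory and back out.
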
* Zero-copy event queues
- replace the socket and file descriptor abstractions
- open() and accept() return an event queue (id)
- have a concept of granularity (not just a stream of data)
- move data with pointers, not by streaming into a buffer
- use COW every time the pointer is transferred to another address
  space to avoid complex pointer hand-offs
- use user-level page tables and a directed TLB shoot-down to
  reduce the cost of setting COW
***** Interface
- qid = open(file)
- qid = listen()
- qid = accept(qid)
- insert(qid, scatter-gather array)
- *sga = head(qid)
- *sga = dequeue(qid)
- filter(qid, *filter_func)
- merge(qid, qid)
- sort(qid, *sort_func)
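
The interface above could be modelled in software roughly as follows. This is a sketch under several assumptions that the notes do not settle: `EventQueues` and `new_queue` are hypothetical names (`new_queue` stands in for `open()`, `listen()` and `accept()`, which all return a qid), `filter` is applied once to the current contents, `merge` moves the second queue into the first, and `sort` takes a key function rather than a comparator. A scatter-gather array is modelled as a list of `memoryview` references, so queue operations move pointers, not data.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Sga = List[memoryview]  # scatter-gather array: a list of buffer references


class EventQueues:
    """Hypothetical software model of the queue calls listed in the notes."""

    def __init__(self) -> None:
        self._queues: Dict[int, Deque[Sga]] = {}
        self._next_qid = 0

    def new_queue(self) -> int:
        """Stand-in for open()/listen()/accept(): allocate a queue, return its id."""
        qid = self._next_qid
        self._next_qid += 1
        self._queues[qid] = deque()
        return qid

    def insert(self, qid: int, sga: Sga) -> None:
        self._queues[qid].append(sga)

    def head(self, qid: int) -> Sga:
        """Peek at the first element without removing it."""
        return self._queues[qid][0]

    def dequeue(self, qid: int) -> Sga:
        return self._queues[qid].popleft()

    def filter(self, qid: int, keep: Callable[[Sga], bool]) -> None:
        """Assumed semantics: drop queued elements for which keep() is false."""
        self._queues[qid] = deque(s for s in self._queues[qid] if keep(s))

    def merge(self, dst: int, src: int) -> None:
        """Assumed semantics: move everything queued on src to the tail of dst."""
        self._queues[dst].extend(self._queues[src])
        self._queues[src].clear()

    def sort(self, qid: int, key: Callable[[Sga], int]) -> None:
        """Assumed semantics: stable sort of the queued elements by key."""
        self._queues[qid] = deque(sorted(self._queues[qid], key=key))


qs = EventQueues()
a, b = qs.new_queue(), qs.new_queue()
qs.insert(a, [memoryview(b"GET /big")])
qs.insert(a, [memoryview(b"PUT /x")])
qs.insert(b, [memoryview(b"GET /a")])
qs.merge(a, b)                                              # a now holds all three
qs.filter(a, lambda sga: bytes(sga[0][:3]) == b"GET")       # keep only GETs
qs.sort(a, key=lambda sga: sga[0].nbytes)                   # shortest first
first = qs.dequeue(a)
```

Only the filter predicate reads any payload bytes here (the first three); merging, sorting and dequeuing operate on references alone, which is the property that would let hardware implement them.
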
* Benefits
*** No copying latency (at least 2K cycles for a 4K page)
*** Less cache pollution
- only data that the app needs has to be brought into cache
*** Can be implemented in hardware or software or both
- even advanced filtering, merging and sorting can be implemented
  in hardware easily

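A back-of-the-envelope calculation for the copy cost quoted above. The 2K cycles per 4K page figure is from the notes; the 3 GHz clock and the rate of one million pages per second are assumptions chosen only to make the numbers concrete.

```python
PAGE_BYTES = 4096
COPY_CYCLES_PER_PAGE = 2000   # "at least 2K cycles" from the notes
CLOCK_HZ = 3.0e9              # assumption: a 3 GHz core
PAGES_PER_SECOND = 1_000_000  # assumption: illustrative load


def copy_seconds(pages: int) -> float:
    """CPU time spent on one copy of each page, at the assumed clock rate."""
    return pages * COPY_CYCLES_PER_PAGE / CLOCK_HZ


bytes_per_cycle = PAGE_BYTES / COPY_CYCLES_PER_PAGE      # implied copy throughput
per_page_us = copy_seconds(1) * 1e6                      # latency added per page
core_fraction = copy_seconds(PAGES_PER_SECOND)           # core-seconds per second
throughput_gbit = PAGES_PER_SECOND * PAGE_BYTES * 8 / 1e9
```

Under these assumptions a single copy adds roughly two thirds of a microsecond per page, and moving about 33 Gbit/s with one copy per page would occupy about two thirds of a core. POSIX-style I/O typically copies twice (in and out), which would double both figures.
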
* Datacenters have increasingly demanding workloads (low latency, low tail latency)
- driving much of the programmable hardware and hardware acceleration
- how can we use this hardware for these workloads?
*** Current solution: Datacenters do not make good use of cores for these apps
- context switches are expensive and increase tail latency, so they pin apps to cores
- interrupts are expensive and increase tail latency, so they poll
- both are terrible for CPU utilization
*** Key observation: datacenter apps are event-based programs, not long-running serial programs
- interrupt scheduling is ineffective for datacenter workloads
  when they have natural yield points
- polling helps but takes too much time to switch back, so it only
  works for low-latency workloads if they are pinned

* Cooperative Event Scheduling
*** Idea
- yield between every event to check for higher-priority tasks
- process those with low latency, then go back to lower-priority processes
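
A toy model of the idea above, assuming a single core and a software priority queue (the notes leave open whether scheduling would live in hardware). `CooperativeScheduler`, `post` and `batch_chunk` are hypothetical names. Each handler runs to completion, which is the natural yield point, and the scheduler then picks the highest-priority pending event; lower numbers mean higher priority.

```python
import heapq
from itertools import count
from typing import Callable, List, Optional


class CooperativeScheduler:
    """Runs one event handler at a time and re-checks priorities between events."""

    def __init__(self) -> None:
        self._ready: list = []   # heap of (priority, seq, name, handler)
        self._seq = count()      # tie-breaker: FIFO order within a priority
        self.trace: List[str] = []

    def post(self, priority: int, name: str,
             handler: Optional[Callable[["CooperativeScheduler"], None]] = None) -> None:
        heapq.heappush(self._ready, (priority, next(self._seq), name, handler))

    def run(self) -> None:
        while self._ready:
            # Yield point: between events, always take the most urgent one.
            _, _, name, handler = heapq.heappop(self._ready)
            self.trace.append(name)
            if handler is not None:
                handler(self)


def batch_chunk(i: int, total: int) -> Callable[[CooperativeScheduler], None]:
    """Low-priority work split into events; each chunk posts the next one."""
    def handler(sched: CooperativeScheduler) -> None:
        if i == 0:
            # Simulate a latency-sensitive request arriving while chunk 0 runs.
            sched.post(0, "request")
        if i + 1 < total:
            sched.post(10, f"batch-{i + 1}", batch_chunk(i + 1, total))
    return handler


sched = CooperativeScheduler()
sched.post(10, "batch-0", batch_chunk(0, 3))
sched.run()
# sched.trace is ["batch-0", "request", "batch-1", "batch-2"]
```

The request waits for at most one low-priority event before it runs, without interrupts or a pinned polling core. The delay it sees is bounded by the longest handler, which is why the design requirements ask for fast decisions and cheap switches.
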
*** Design requirements
- scheduling decisions must be fast
- context switches must be cheap
*** Possible implementations
- move scheduling into hardware based on queues (IOCPU instead of
  IOMMU?)
- tagged TLBs and partitioned caches kept warm for low-latency
  apps
- yielding between events means that old cached data might not be
  useful for the next event anyway (experiment: flush the cache between
  every libevent/memcached handler and check performance)

* Summary
- we can't keep changing the hardware without some abstractions to
  buffer apps from those changes
- we can't effectively schedule low-latency apps without co-design
  between the app, the OS and the hardware
