Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent dad92b0, commit 77b7b14
Showing 1 changed file with 29 additions and 165 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,204 +1,68 @@ | ||
# Ducky | ||
|
||
![Ducky](https://user-images.githubusercontent.com/1985555/91746299-ac944a80-ebbc-11ea-808f-bfae2586955b.png) | ||
![Ducky](https://user-images.githubusercontent.com/1985555/91746299-ac944a80-ebbc-11ea-808f-bfae2586955b.png) | ||
|
||
A quack quacky UDP cache server 🦆 developed for the Networking course of ["Sicurezza dei Sistemi e Delle Reti Informatiche"](http://sicurezzaonline.di.unimi.it/) bachelor's degree program. | ||
A quack quacky UDP cache server 🦆, developed for the Networking course of ["Sicurezza dei Sistemi e Delle Reti Informatiche"](http://sicurezzaonline.di.unimi.it/) bachelor's degree program. | ||
|
||
![C/C++ CI](https://github.com/gabrieledarrigo/ducky/workflows/C/C++%20CI/badge.svg?branch=master) | ||
|
||
## Rationale | ||
|
||
Ducky is a network memory cache server. | ||
Ducky is a network cache server written in C. | ||
It stores unstructured data sent from a client in memory, ready to be served when a client asks for it using a unique key. | ||
Ducky's principal purpose is to reduce networking and computation load from a server (for example, an API, a Web application, or a database) so that a client can store data that is accessed with a high frequency, increasing the overall performance of the system. | ||
Ducky is loosely inspired by projects like [Memcached](https://github.com/memcached/memcached) and [Redis](https://github.com/redis/redis). | ||
While its approach to the "key-value" server implementation is naive, it worked well as a demonstrative and learning project on how to build a simple networking server. | ||
|
||
## Protocol | ||
|
||
Clients of Ducky communicate with the server through `UDP`. | ||
Ducky listens on port `20017` for incoming messages; since Ducky doesn't support TCP connections, clients simply open a UDP socket | ||
and send their commands to that port within a datagram. | ||
Ducky focuses on speed and bandwidth savings, avoiding the latency overhead of a classic TCP connection. | ||
|
||
Data sent in Ducky protocol (both requests and responses) is in `ASCII`. | ||
Each message corresponds to a command that a client sends to the server or a response from the server; | ||
a message is made up of the name of the command, optional command parameters, and the structured data that clients want to store or retrieve from Ducky. | ||
A message is always terminated by a `\n` character that marks where the data block ends. | ||
Each response from the server has a status code that specifies the result of the command: | ||
`2xx` for a successful operation, `5xx` for an errored one. | ||
Ducky supports only two commands, `GET` and `SET`. | ||
One client/server session has the following lifecycle: | ||
|
||
1. The server listens on port `20017`. | ||
2. A client opens a UDP socket and sends a datagram to the server. | ||
3. When the server receives the datagram, it tries to parse it into a known command. | ||
4. If the command is not recognized, the server responds with the appropriate status code/error message. | ||
5. If the command is recognized as a `GET`, the server tries to return the requested data to the client. | ||
- If the operation is successful, the server returns the data along with the success status code. | ||
- Otherwise, it responds with the appropriate status code/error message. | ||
6. If the command is recognized as a `SET`, the server tries to store the data in memory. | ||
- If the `SET` operation is successful, the server returns a response with a success status code. | ||
- Otherwise, it responds with the appropriate status code/error message. | ||
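
The parsing step of this lifecycle can be sketched in C as follows. This is only an illustration under stated assumptions, not Ducky's actual source: the names `command`, `command_type`, and `parse_command` are hypothetical, and the returned error values simply reuse the 5xx codes that the README lists in its status code section.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_KEY_LENGTH 100 /* key length limit stated in the README */

/* Hypothetical types: these names are not taken from Ducky's source. */
typedef enum { CMD_UNKNOWN, CMD_GET, CMD_SET } command_type;

typedef struct {
    command_type type;
    char key[MAX_KEY_LENGTH + 1];
    const char *data; /* points into the message buffer; NULL for GET */
} command;

/*
 * Parses one '\n'-terminated ASCII message in place.
 * Returns 0 on success, otherwise one of the README's 5xx codes:
 * 502 (command not recognized), 504 (key too long),
 * 505 (missing key), 506 (SET without data).
 */
int parse_command(char *message, command *out) {
    assert(message != NULL && out != NULL);

    out->type = CMD_UNKNOWN;
    out->key[0] = '\0';
    out->data = NULL;

    /* Cut the message at the terminating newline, if there is one. */
    message[strcspn(message, "\n")] = '\0';

    /* The command name must be followed by a space or the end of the message. */
    if (strncmp(message, "GET", 3) == 0 && (message[3] == ' ' || message[3] == '\0')) {
        out->type = CMD_GET;
    } else if (strncmp(message, "SET", 3) == 0 && (message[3] == ' ' || message[3] == '\0')) {
        out->type = CMD_SET;
    } else {
        return 502;
    }

    /* The key is the next run of non-space characters. */
    char *key = message + 3;
    key += strspn(key, " ");
    size_t key_length = strcspn(key, " ");
    if (key_length == 0) {
        return 505;
    }
    if (key_length > MAX_KEY_LENGTH) {
        return 504;
    }
    memcpy(out->key, key, key_length);
    out->key[key_length] = '\0';

    /* For SET, everything after the key is the data to store. */
    if (out->type == CMD_SET) {
        const char *data = key + key_length;
        data += strspn(data, " ");
        if (*data == '\0') {
            return 506;
        }
        out->data = data;
    }
    return 0;
}
```

A server loop would call `parse_command` on each received datagram and then either look up `key` (for `GET`) or store `data` under `key` (for `SET`). The 1MB data limit (code 503) is left out of this sketch.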
|
||
![Diagram](https://user-images.githubusercontent.com/1985555/91744013-14e12d00-ebb9-11ea-8300-7eb5b7331949.jpg) | ||
|
||
## Network I/O | ||
|
||
Designers of networking software can choose various strategies on how to handle client connections and input/output from/towards them. | ||
Notably, these strategies are: | ||
|
||
- Fork multiple _child processes_, one per client, to achieve concurrency, and serve multiple requests. | ||
- Spawn multiple _threads_, one per client, to achieve concurrency, and serve multiple requests. | ||
- Use asynchronous, non-blocking I/O, using [kqueue(2)](https://man.openbsd.org/kqueue), or [epoll(7)](https://man7.org/linux/man-pages/man7/epoll.7.html). | ||
- Use synchronous I/O multiplexing using [select(2)](https://man7.org/linux/man-pages/man2/select.2.html) or [poll(2)](https://man7.org/linux/man-pages/man2/poll.2.html). | ||
|
||
Even well known, battle-tested and production-ready servers use different approaches. | ||
For example, [Apache HTTP Server](https://httpd.apache.org/) can handle concurrency both with forking child processes (via [prefork](https://httpd.apache.org/docs/2.4/mod/prefork.html) module) or by spawning multiple system's threads (via [worker](https://httpd.apache.org/docs/2.4/mod/worker.html) module), one per each incoming connection. | ||
Other servers or backends use the third approach, basing their concurrency model on asynchronous, non-blocking I/O, using an [event loop](https://en.wikipedia.org/wiki/Event_loop) to handle requests from clients. | ||
Usually, an event loop is implemented using a library or a framework that abstracts away the underlying kernel system call (kqueue(2), epoll(7), event completions) and offers high-level APIs to handle events on file descriptors. | ||
For example, Memcached uses [libevent](https://libevent.org/), an event notification library, to implement its event loop, | ||
while Node.js, the famous JavaScript runtime built on Chrome's V8 JavaScript engine, uses [libuv](https://github.com/libuv/libuv). | ||
|
||
While these solutions are far more efficient, well tested, and probably more elegant, I decided not to use any framework; Ducky uses the traditional select(2) system call to be notified when a file descriptor (a client connection) is ready for reading. | ||
select(2) performs poorly compared to the other strategies just illustrated: it scans its file descriptor sets linearly, so the more file descriptors select(2) has to handle, the slower the system gets. | ||
Depending on the hardware, Ducky (like any application that uses select(2)) can reach a few hundred open file descriptors before merely waiting for file descriptor activity becomes a bottleneck. | ||
Nevertheless, select(2) has great portability (it's implemented almost everywhere) and served well in designing a simple memory cache server like Ducky. | ||
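
As a rough illustration of the select(2) approach, the sketch below binds a UDP socket and waits until it becomes readable. It is a minimal sketch, not Ducky's code: `open_udp_socket` and `wait_readable` are hypothetical helper names, and binding to the loopback address with a millisecond timeout are assumptions made to keep the example small.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Creates a UDP socket bound to the loopback address on the given port
 * (port 0 lets the kernel pick a free one). Returns -1 on failure.
 */
int open_udp_socket(unsigned short port) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) {
        return -1;
    }

    struct sockaddr_in address;
    memset(&address, 0, sizeof(address));
    address.sin_family = AF_INET;
    address.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    address.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&address, sizeof(address)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Blocks in select(2) until fd is readable or the timeout expires.
 * Returns 1 if a datagram is waiting, 0 on timeout, -1 on error.
 */
int wait_readable(int fd, long timeout_ms) {
    /* select(2) can only watch descriptors below FD_SETSIZE. */
    assert(fd >= 0 && fd < FD_SETSIZE);

    fd_set read_fds;
    FD_ZERO(&read_fds);
    FD_SET(fd, &read_fds);

    struct timeval timeout;
    timeout.tv_sec = timeout_ms / 1000;
    timeout.tv_usec = (timeout_ms % 1000) * 1000;

    /* The first argument is the highest watched descriptor plus one. */
    int ready = select(fd + 1, &read_fds, NULL, NULL, &timeout);
    if (ready < 0) {
        return -1;
    }
    return (ready > 0 && FD_ISSET(fd, &read_fds)) ? 1 : 0;
}
```

A server built this way would call `wait_readable` in a loop and, whenever it returns 1, read the pending datagram with `recvfrom` and answer with `sendto`. The `FD_SETSIZE` check reflects the limit on how many descriptors select(2) can watch.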
|
||
## Keys and memory structure | ||
|
||
Data stored in Ducky is identified with the help of a key. | ||
A key is a string that uniquely identifies the data for clients that want to store and retrieve it. | ||
The maximum length limit of a key is **100** characters. | ||
The maximum size for a single item to be stored is **1MB**. | ||
|
||
Internally Ducky uses a [hash table](http://staff.ustc.edu.cn/~csli/graduate/algorithms/book6/chap12.htm) data structure. | ||
It offers O(1) average-case complexity for both storing and retrieving data by key. | ||
There are several ways to implement a hash table data structure, especially regarding how to handle collisions between elements. | ||
Ducky uses an _open-addressed_, _double-hashed_ hash table: | ||
instead of using a linked list for each bucket of the hash table, an _open-addressed_ implementation stores the element in the hash table itself. | ||
That is, each table bucket contains either an element of the dynamic set or `NULL`; | ||
when searching for an item with a given key, the hash table is systematically examined, until the desired item is found in one of its buckets or it is clear that it is not in the table. | ||
There are no lists or elements stored outside the table, as there are in chaining. | ||
The index of the bucket in which the element is inserted is computed by a double hash function of the following form: | ||
While Ducky's approach to the "key-value" server implementation is naive, it worked well as a demonstrative and learning project on how to build a simple networking server. | ||
Ducky's internal implementation is illustrated here: [protocol](https://github.com/gabrieledarrigo/ducky/blob/master/PROTOCOL.md). | ||
|
||
``` | ||
h(k, i) = (h1(k) + ih2(k)) mod m | ||
``` | ||
|
||
where _h1_ and _h2_ are hash functions, each of which: | ||
## Dependencies | ||
|
||
- Takes the string _k_ as an input. | ||
- Converts it into a large integer number. | ||
- Reduces the size of the integer to a fixed range, by taking its remainder mod m, where m is the number of buckets of the hash table. | ||
- Returns the reduced integer. | ||
Ducky is a portable project and has only one dependency: | ||
|
||
When a collision happens, the collided item is placed in some other bucket in the hash table, depending on the result of the double hash function. | ||
Note that the index of the new position depends on _i_, the number of collisions encountered so far. | ||
While an open-addressed hash table can fill up, Ducky's implementation resizes itself when the load of the data structure rises above 70%. | ||
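
The double hash function h(k, i) described above might look like the following in C. This is a sketch under assumptions rather than Ducky's implementation: the function names, the polynomial string hash, and the prime constants 151 and 163 are all illustrative choices. Mapping h2 into the range 1..m-1 is an extra precaution added here so that the probe step is never zero.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Polynomial string hash reduced modulo m: turns the key into a large
 * number and keeps it within 0..m-1. The `prime` parameter is an
 * assumed constant, not a value taken from Ducky.
 */
static size_t hash_string(const char *key, size_t prime, size_t m) {
    size_t hash = 0;
    for (const char *c = key; *c != '\0'; c++) {
        hash = (hash * prime + (unsigned char)*c) % m;
    }
    return hash;
}

/*
 * h(k, i) = (h1(k) + i * h2(k)) mod m
 * `attempt` is i, the number of collisions seen so far for this key;
 * `m` is the number of buckets in the table.
 */
size_t double_hash(const char *key, size_t attempt, size_t m) {
    assert(key != NULL && m > 1);

    size_t h1 = hash_string(key, 151, m);
    /* Keep h2 in 1..m-1 so that every new attempt moves to another bucket. */
    size_t h2 = 1 + hash_string(key, 163, m - 1);

    return (h1 + attempt * h2) % m;
}
```

When the number of buckets `m` is prime, the step `h2` shares no factor with `m`, so the indices for attempts 0 to m-1 visit every bucket exactly once before repeating.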
- [greatest](https://github.com/silentbicycle/greatest), used to write the unit tests for the project | ||
|
||
## Commands | ||
|
||
As mentioned above, Ducky supports only two commands: | ||
`SET`, to store unstructured data identified by a key, and `GET`, to retrieve the data corresponding | ||
to a specific key. | ||
The syntax is the following: | ||
## Build and test | ||
|
||
#### SET | ||
Ducky is built with an out-of-source approach using [CMake](https://cmake.org/cmake/help/v3.18/manual/cmake.1.html). | ||
To build Ducky first clone the repository and then, inside the Ducky folder, generate the _build tree_: | ||
|
||
``` | ||
SET key data | ||
``` | ||
|
||
#### GET | ||
|
||
``` | ||
GET key | ||
$ cmake -B ./build | ||
``` | ||
|
||
## Response and status codes | ||
|
||
Each Ducky response is made up of a status code and an optional payload; its parsing is up to the client. | ||
Status codes are divided into two families: | ||
|
||
- The 2xx status codes indicate that the request has succeeded. | ||
- The 5xx status codes indicate that the server encountered an error. | ||
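
Since parsing a response is left to the client, a client-side helper could classify a response by its status family along these lines. The sketch assumes only that a response starts with a decimal status code followed by an optional payload; `response_family` and `classify_response` are hypothetical names, not part of Ducky.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical classification of a response by its status family. */
typedef enum { RESPONSE_SUCCESS, RESPONSE_ERROR, RESPONSE_INVALID } response_family;

/*
 * Reads the leading status code of a response.
 * On return, *code holds the numeric code (0 if none was found) and
 * *payload points at whatever follows the code and its separating spaces.
 */
response_family classify_response(const char *response, int *code, const char **payload) {
    assert(response != NULL && code != NULL && payload != NULL);

    *code = 0;
    *payload = response;

    char *end = NULL;
    long value = strtol(response, &end, 10);
    if (end == response) {
        return RESPONSE_INVALID; /* the response does not start with a number */
    }

    while (*end == ' ') {
        end++;
    }
    *code = (int)value;
    *payload = end;

    if (value >= 200 && value <= 299) {
        return RESPONSE_SUCCESS; /* 2xx family */
    }
    if (value >= 500 && value <= 599) {
        return RESPONSE_ERROR; /* 5xx family */
    }
    return RESPONSE_INVALID;
}
```

The payload pointer still includes the trailing `\n` of the message; stripping it is left to the caller.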
|
||
##### 200 (OK) | ||
Is sent after a successful GET operation. | ||
Now you can build the project: | ||
|
||
``` | ||
200 OK data | ||
$ cmake --build ./build | ||
``` | ||
|
||
##### 201 (Created) | ||
Is sent after a successful SET operation. | ||
To run the unit tests just use the following command: | ||
|
||
``` | ||
201 STATUS_CREATED | ||
$ cmake --build ./build --target test | ||
``` | ||
|
||
##### 500 (Generic error) | ||
Is sent after the server encounters a generic, not known, error. | ||
## Run | ||
|
||
``` | ||
500 ERR_GENERIC_ERROR | ||
``` | ||
To start Ducky, simply enter the `build/src` folder and run the executable: | ||
|
||
##### 501 (Cannot recv) | ||
The server cannot receive the data from the client. | ||
|
||
``` | ||
501 ERR_CANNOT_RECV | ||
``` | ||
$ cd build/src | ||
$ ./ducky | ||
##### 502 (Command not recognized) | ||
The server is not able to parse or recognize the command received from the client. | ||
|
||
``` | ||
502 ERR_COMMAND_NOT_RECOGNIZED | ||
14:00:57 INFO Ducky up and running, listening on port 20017 | ||
``` | ||
|
||
##### 503 (Max data size) | ||
The data attached to the SET command is greater than the maximum allowed data size (1MB). | ||
Now you can start to store and retrieve data from Ducky; | ||
just open another terminal session and use Netcat to send commands via UDP: | ||
|
||
``` | ||
503 ERR_MAX_DATA_SIZE | ||
``` | ||
|
||
##### 504 (Key length) | ||
The specified key of the GET or SET command is greater than the maximum allowed length (100 characters). | ||
|
||
``` | ||
504 ERR_KEY_LENGTH | ||
``` | ||
|
||
##### 505 (No key) | ||
The GET or SET command is recognized but the key is missing. | ||
|
||
``` | ||
505 ERR_NO_KEY | ||
``` | ||
|
||
##### 506 (No data) | ||
The SET command has no data attached. | ||
|
||
``` | ||
506 ERR_NO_DATA | ||
$ nc -u localhost 20017 | ||
SET key data | ||
201 CREATED | ||
GET key | ||
200 data | ||
``` | ||
|
||
## References | ||
|
||
Developing Ducky was both fun and formative: it was the occasion to learn how TCP and UDP networking work at the system level, and to gain a better understanding of subjects like concurrency models, blocking and non-blocking I/O, and data structures. | ||
It would have been impossible to implement Ducky without some great books, articles, and materials that I used to dig deeper into the subject: | ||
|
||
- I must start with the great [C10K problem article](http://www.kegel.com/c10k.html) by Dan Kegel, full of concepts and references. It's a must-read | ||
- The great [Poll vs Select article](https://daniel.haxx.se/docs/poll-vs-select.html) by Daniel Stenberg | ||
- The obvious [Cormen reference to the hash table](http://staff.ustc.edu.cn/~csli/graduate/algorithms/book6/chap12.htm) data structure | ||
- The super cool [C Hash Table implementation](https://github.com/jamesroutley/write-a-hash-table) by James 明良 Routley, which I used as an example to implement the internal memory cache of Ducky | ||
- A useful book on network programming [Hands-On Network Programming with C](https://www.packtpub.com/networking-and-servers/hands-network-programming-c) | ||
- The essential [Beej's Guide to Network Programming](http://beej.us/guide/bgnet/), really worth a read! | ||
|
||
## Contributors | ||
|
||
A huge thanks goes to [Roberto Carrà](https://www.linkedin.com/in/robertocarra) for his precious work on the logo. | ||
A huge thanks go to [Roberto Carrà](https://www.linkedin.com/in/robertocarra) for his precious work on the logo. |