After more than 500 commits since KFS-5, we're glad to present the new release of KFS!
Project rename
First of all, the project finally has a name, and a great logo! #197
Introducing: SunriseOS
Filesystem
The main focus of this release has been creating a filesystem service for SunriseOS, from scratch.
Since we're a microkernel, the kernel is completely agnostic of files, directories, and other such fancy abstractions, and also of disks, blocks, and so on: it is the drivers' job to provide a means to read and write blocks from/to the disk. The way we do this is by having several userspace services in charge of doing just that, and letting them make requests to each other via Remote Procedure Calls (RPC).
AHCI Driver
The AHCI driver is responsible for discovering disks, and exposing endpoints to read and write blocks to them.
Supports (P)ATA and SATA devices. All operations are done via DMA.
Holds a minimal PCI implementation to discover AHCI devices, and query their Base Address Register (BAR).
PR: #206
exposed API: ahci.id
documentation
FS service
The filesystem service. Think of it as the equivalent of the VFS on Linux: an abstract layer on top of more concrete filesystems. Any user that wants to read or write a file connects to it and makes a request; depending on the partition the file would be found on, it routes the request to the appropriate filesystem driver (e.g. ext2, fat32, nfs, ...), which in turn makes requests to the ahci service to get the appropriate blocks, and interprets the raw data as structured files and directories. Right now only FAT32 is implemented.
The filesystem service supports multiple partitions on a disk if a GPT is present.
The filesystem service keeps a Least Recently Used (LRU) cache of blocks in RAM to improve performance.
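To illustrate the idea (a minimal sketch using std collections for brevity, not the actual fs-service code), an LRU block cache boils down to something like this:

use std::collections::{HashMap, VecDeque};

// Minimal sketch of a bounded block cache that evicts the least recently
// used entry when full.
struct BlockCache {
    capacity: usize,
    blocks: HashMap<u64, Vec<u8>>, // block index -> cached block data
    order: VecDeque<u64>,          // front = most recently used
}

impl BlockCache {
    fn get(&mut self, index: u64) -> Option<&Vec<u8>> {
        if self.blocks.contains_key(&index) {
            // Mark the block as most recently used.
            self.order.retain(|&i| i != index);
            self.order.push_front(index);
        }
        self.blocks.get(&index)
    }

    fn insert(&mut self, index: u64, data: Vec<u8>) {
        // Evict the least recently used block if we're full and this block is new.
        if !self.blocks.contains_key(&index) && self.blocks.len() >= self.capacity {
            if let Some(evicted) = self.order.pop_back() {
                self.blocks.remove(&evicted);
            }
        }
        self.order.retain(|&i| i != index);
        self.order.push_front(index);
        self.blocks.insert(index, data);
    }
}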
PR: #390
exposed API: fs.id
documentation
Libfat
A no_std FAT12/FAT16/FAT32 compatible crate. This is the first of the supported filesystem formats.
Repo: sunriseos/libfat
disk-initializer
A utility that generates a valid disk image from a file name, a size, and a template directory.
The resulting disk image contains a standard GPT partition layout and a FAT filesystem.
Based on the disk size, it chooses a FAT variant appropriate to fit in it.
SwIPC-Gen
In KFS-5, we introduced our IPC mechanism and had a first rough implementation of it. On the client side, we would auto-generate clients based on SwIPC id files, while the server side relied on an object macro. It worked well, but had multiple maintenance issues:
- It was easy to have a mismatch between the client and server
- The server object macro had a lot of rough edges due to being an old-style macro
- The server object macro required putting lots of code inside macro code, which would break IDE features like auto-completion
Furthermore, there were a few documentation issues. As such, it was decided to completely revamp the IPC Server subsystem.
In the previous section you've seen some IPC definition files, like example.id. Those are SwIPC interface definition files. When building libuser, they will be parsed, and will automatically generate some Rust code:
Server trait
The .id will generate a Rust trait (interface) for every exposed interface. A server only has to implement this generated trait on its own structure, and fill in the endpoints with some code. The trait will automatically implement a dispatch() method, which routes an RPC cmdid to the appropriate trait function, unpacking the arguments to pass to the function and packing the return value back into the IPC buffer. This function is then passed to the various IPC wrappers that are spawned on the event loop, to be automatically called when RPC requests are received.
This brings a huge improvement to our workflow: the lack of a special macro means all standard IDE features work, and the result is a lot less magic and intimidating to learn. The documentation is also a lot cleaner.
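To give an idea of the shape, here is a simplified sketch (the trait, structure, and method names are made up, not actual generated code):

// IExample stands in for a trait generated from an .id file, and Error for
// the libuser error type; dispatch() is provided by the generated trait, so
// the server only writes the endpoint bodies.
struct ExampleService {
    counter: u32,
}

impl IExample for ExampleService {
    fn increment(&mut self, by: u32) -> Result<u32, Error> {
        self.counter += by;
        Ok(self.counter)
    }
}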
Client types
Swipc-gen also generates client structs like xxxProxy for every interface in the .id. The structure wraps an IPC session handle, and exposes functions to call its known endpoints, with the appropriate parameters and return values.
Such a struct can be constructed with the new() function, which will get the sm service, ask it to create a session to the xxx service for us, and wrap this handle in our proxy type.
See the generated structs and traits for Vi for a real life example.
Note that the documentation in the .id is used to document the generated types.
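On the client side, calling that same hypothetical service then looks roughly like this (again a sketch with made-up names; see the Vi proxy linked above for the real thing):

// ExampleProxy and increment() mirror the made-up server sketch above.
let proxy = ExampleProxy::new()?;  // gets sm to open a session to the service
let value = proxy.increment(1)?;   // packs the arguments, performs the RPC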
PR: #248
documentation
SwIPC-gen CLI usage
In order to help debug SwIPC, we also have a new cargo make swipc-gen rule that writes a .rs file containing the code that is automatically generated for a given .id file.
PR: #365
Futures
The main event loop of every service is pretty much the same: wait for an RPC request, do some stuff, wait for an IRQ to notify us that the stuff we did is ready, do some more stuff, and eventually return from the RPC. But while we wait, we don't want to be blocking, and we want to be able to serve multiple RPCs simultaneously. This means that asynchronicity is at the heart of every service.
In Rust, asynchronicity is done through Futures (think JavaScript promises): you can declare a function as async, and then .await on it. This way every RPC can be an async function, which gives control back to the event loop every time it .awaits on IRQs, and is polled again some time later when the IRQ has been received.
So, here we are, implementing a future executor!
Our service event loop is basically a light userspace scheduler (fibers/coroutines), which waits on a list of ports, sessions, and IRQ handles (like select on Linux), schedules the appropriate future when one of its handles is signaled, lets the future advance until it .awaits on another event, and loops again. Once a future is eventually finished, it is removed from the list.
All of this lives in libuser, and is available for every service. A service only has to implement the xxxAsync flavour of the trait generated from the .id for its interface instead of the synchronous xxx trait, and its RPC is now a future, congratulations.
This means that writing asynchronous services/drivers is now painfully easy, as you have the full power of rust and a bit of magic to help you behind the scenes.
Please note that our event loop lives on a single thread only; we never spawn any threads for it to work. In the future (pun intended) we might want to divide the work across a pool of worker threads.
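As a sketch of what this buys us (the helper and type names below are made up for illustration, not the real libuser or generated API), an endpoint that has to wait on hardware just becomes an async fn:

// Hypothetical names: AhciPort, Block and wait_for_irq_async are illustrative.
async fn read_block(port: &mut AhciPort, lba: u64) -> Result<Block, Error> {
    port.start_dma_read(lba);              // kick off the hardware
    wait_for_irq_async(port.irq).await;    // yields back to the event loop
    Ok(port.take_completed_block())        // resumed once the IRQ has fired
}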
PR: #384
documentation
Time
The time sysmodule. This sysmodule is in charge of handling the RTC time and timezone.
The timezone code is based on the tz database code, with some custom modifications (leap seconds aren't handled, and Tzif1 support was removed).
In the future, the time service will be used to provide the filesystem service with time information.
PR: #318
exposed API: time.id
documentation
Dynamic relocations
Userspace binaries are now relocatable. They now have their own LLVM target, i386-unknown-none-user, and use their own linker script. To pave the way for ASLR, they are loaded at an arbitrary address by the kernel (for now at 0x40000), and one of the first things they do before calling main is to find their .dynamic section, parse it, and apply the relocations it describes, effectively relocating themselves.
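As a rough sketch of the idea (not the actual SunriseOS code, which first has to locate the relocation table via the DT_REL/DT_RELSZ entries of .dynamic), applying the relative relocations boils down to:

const R_386_RELATIVE: u8 = 8;

#[repr(C)]
struct ElfRel {
    offset: u32, // address to patch, relative to the load base
    info: u32,   // low byte holds the relocation type
}

// Minimal sketch: add the load base to every R_386_RELATIVE target.
unsafe fn apply_relative_relocs(load_base: u32, rels: &[ElfRel]) {
    for rel in rels {
        if (rel.info & 0xff) as u8 == R_386_RELATIVE {
            let ptr = load_base.wrapping_add(rel.offset) as *mut u32;
            ptr.write(ptr.read().wrapping_add(load_base));
        }
    }
}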
PR: #247
userspace linker script: userspace.ld
Threads and Thread Local Storage
In libuser we created the threads module, which exposes an API to create and start threads and pass them arguments, akin to pthreads on Linux. When we eventually port libstd to Sunrise, it will be used as the backend for std threads, instead of pthreads.
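For illustration, spawning a thread looks something like this (a sketch; the actual libuser API names and error handling differ):

// Hypothetical names; the real threads module does more work (stack
// allocation, ThreadContext setup) behind the scenes.
fn worker(arg: usize) {
    // ... per-thread work using `arg` ...
}

let t = Thread::create(worker, 42)?;  // sets up a stack and a ThreadContext
t.start()?;                           // asks the kernel to start the thread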
But I'm afraid now is the time to talk a bit about Thread Local Storage.
On SunriseOS, each userspace thread has three special thread-local memory regions: the kernel-allocated TLS region, the userspace-allocated Thread Context it points to, and the memory where ELF Thread Local Storage lives.
On top of that, we implement kernel cpu-local statics, like CURRENT_THREAD, as if they were ELF thread locals.
The documentation is quite good at explaining everything, but let's do a quick tour.
TLS region
This is just a 0x200-byte region that the kernel allocates for every thread. Its layout is documented here, but it mostly contains the IPC buffer, where the kernel reads and copies packed arguments and return values for every RPC. It also contains a user-controlled pointer to a context, which is ignored by the kernel.
Because it is created by the kernel, and because it is a really important structure that the thread will likely read and write all the time, we keep a register pointing to it at all times. We use the fs segment selector register and a bit of segmentation magic to do that.
This can seem pretty weird, but this is to stay close to how things work on aarch64, where we'll be using tpidrro_el0.
TLS context
We make the user-controlled context pointer in the TLS region point to a struct ThreadContext, where we do a bit of bookkeeping about the current thread: its stack address, argument, entry point, its own thread handle for some syscalls... This struct is also used to describe a thread, start it, etc. Think of it as a pthread_t.
TLS ELF
In userspace programs, we want to have some thread-local statics, like declaring a thread_local int foo = 0; in C11, where every thread has its own view of the global. No, not that POSIX key-value store in pthreads; nobody wants that.
The equivalent code in Rust is:
#![feature(thread_local)]
#[thread_local]
static MY_THREAD_LOCAL: core::cell::Cell<u8> = core::cell::Cell::new(42);
What this does, both in C and in Rust, is tell the compiler to generate a special TLS program header, which we are responsible for loading at the creation of every thread, and to make gs point to it. The generated code will then access your global via an offset relative to gs. So we do just that, and expose a syscall (with memes) for the user to set where its gs segment points to.
This has the advantage (at the cost of being an absolute pain in the a** to implement) of being an extremely efficient way of doing TLS, since it's only an offset from a register, and LLVM is in charge of optimizing everything.
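For illustration, using the static declared above is then just a normal access; all the per-thread magic happens in codegen:

// Each thread sees its own copy of MY_THREAD_LOCAL; on i386 the compiler
// turns these accesses into gs-relative loads and stores.
let old = MY_THREAD_LOCAL.get();  // reads this thread's copy (initially 42)
MY_THREAD_LOCAL.set(old + 1);     // other threads still see their own value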
CPU-local globals
Finally, yet again paving the way for SMP, in the kernel we want some globals to be per-cpu (think current in Linux, which points to the thread currently executing on this core). The way we do this is by recycling the concept of ELF TLS in kernel space. Instead of loading the TLS program header for every thread, we load it for every cpu core, and every core points to its own copy. Besides that, it's pretty much the same.
But since we use gs both in userspace and in the kernel to point to different locations, we must swap gs every time we enter and leave the kernel. This is all documented in the cpu_locals module (with more memes).
This led to rewriting our interrupt handlers; they are now generated by the generate_trap_gate_handler! macro. This is a lot cleaner.
PR: #339
Thread argument ABI
Finally, the kernel has changed the ABI for the way it passes arguments to threads when starting them. When creating the first thread of a process, it passes it a handle to that thread itself. This will be required for mutexes to function in the future. When creating any other thread, the argument is the one provided by the user in svcCreateThread.
The argument is passed in a register by following the fastcall convention. This spares us the need for an asm wrapper on the userspace side, as it is natively understood by the compiler.
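As a sketch (the real entry point lives in libuser and does more setup), declaring the entry with the matching calling convention is enough for the compiler to pick the argument up from the right register:

// With fastcall on x86, the first argument arrives in ecx, so no
// hand-written asm shim is needed.
extern "fastcall" fn thread_entry(arg: usize) {
    // `arg` is either the first thread's own handle, or the value passed
    // to svcCreateThread, as described above.
    let _ = arg;
}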
Kernel Drivers
Even though we're a microkernel, some drivers still need to live inside the kernel (MMU, timer, interrupts, rs232 for kernel logging). We updated some of them.
HPET
We now have a driver for the High Precision Event Timer, which replaces the Programmable Interval Timer. This gives us a much finer timer interrupt frequency resolution (the HPET's tick period is specified in femtoseconds, with a main counter running at 10MHz or more).
For this to work we finally parse some ACPI, especially the HPET table, to get the relevant device configurations.
Welcome to the 21st century.
We made the timer API in the kernel a lot more generic, so we can now register multiple timer sources, each with its own frequency; if no such source is found, it will fall back to using the PIT. Long term, we're hoping to change our timeout API so that it doesn't wake up the kernel as often as it currently does.
PR: #256
IOAPIC and LAPIC
PIC is so 1980s, IOAPIC is so 2000s.
The kernel now uses the IO-APIC (I/O Advanced Programmable Interrupt Controller) and the LAPIC (Local APIC) instead of the PIC.
Moving to the APIC provides us with two immediate benefits and one long-term one:
- It allows us to use MSI for PCI interrupts, which should cleanly resolve the problem that level-triggered interrupts don't work well in the context of a microkernel.
- It provides us with a nice speed boost compared to the PIC, what with the IO-APIC being a lot cleaner to use.
- It will eventually allow SMP.
PR: #304
IOMMU
We still don't have a driver for the IOMMU to virtualize DMA, 😢 but this is a work in progress.
Kernel features
As usual, we're implementing missing features in the kernel based on immediate necessity. Here's a quick overview of what we added.
IPC
The way IPC works, we are limited to a bit less than 0x100 bytes of raw data to pass around between two threads. While that may seem like a lot, there are some kinds of IPC where this will clearly not be enough, such as filesystem accesses. In order to solve this problem, we need a way to efficiently pass some kind of "pointer" between processes.
Meet IPC descriptors. Those little fellas allow passing large zones of memory between processes efficiently. There are two kinds of them:
Buffer Descriptors
Also known as descriptors A, B and W, those allow passing memory by remapping it from one process to the other. This is extremely useful for data that is likely to span multiple pages. To avoid enforcing alignment constraints, the first and last pages (if they are misaligned) will get a fresh page allocated for them, and the data will get memcpy'd. For the rest of the pages, those are simply remapped. This means that process-allocated pages are now refcounted, so they can be remapped freely.
Each descriptor type has different permissions. From the point of view of the server, A descriptors are read-only, B descriptors are "write-only", and W descriptors are "read-write". Nintendo being Nintendo, B descriptors are also read-write and W descriptors are completely unused 🤷.
Pointer Descriptors
For data that is likely to be less than a page big, but still needs to be bigger than the small 0x100 bytes, we have pointer descriptors, also known as descriptors X and C. Those work in unison: for each X descriptor sent on one side, a C descriptor exists on the other side. The data from the X descriptor will be memcpy'd to the C descriptor of the other side.
PR: #229
Readable event
We implemented a new kind of handle: the ReadableEvent and WritableEvent pair. These can be used to signal an event to another process via IPC, or to another of our threads. They are created by the svcCreateEvent syscall. The writer part is meant to be passed to the signaler, and the reader part to the event listeners. When a listener waits on the ReadableEvent handle, it is put to sleep until the signaler calls svcSignalEvent on the WritableEvent handle. The event then behaves as a level-triggered signal, waking up all current and future waiters on the ReadableEvent handle, until one of the listeners calls svcClearEvent to reset it.
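The intended flow looks roughly like this (a sketch; the wrapper method names are illustrative, only the syscalls named above are real):

// Setup: create the pair, hand the WritableEvent to the signaler.
let (writable, readable) = create_event()?;  // svcCreateEvent

// Signaler side (e.g. after finishing some work):
writable.signal()?;                          // svcSignalEvent wakes the waiters

// Listener side:
readable.wait()?;                            // sleeps until the event is signaled
readable.clear()?;                           // svcClearEvent resets the level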
commit: 27c7901
Mappings bookkeeping
TODO
Misc
Kernel errors
Reduced kernel errors to be a 1:1 match with userspace errors. #230
Logging
We improved logging with a lot of colors and filtering. Userspace programs can log any message they want with svcOutputDebugString, passing a debug level and the module and function it originates from (this is all done by the debug!/info!/warn!/error! macros). We declare the default log level in grub.cfg, along with exceptions to that log level (e.g. the default is "info" but we also want to see every "debug" log of the module we're debugging). The kernel fetches the log-level description from the parameters GRUB passes to it, and drops anything below the configured level.
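Usage is what you'd expect from Rust logging macros (a sketch; module and function names are captured by the macros themselves):

// Illustrative messages and variables, not real log lines from the OS.
info!("fs: mounted FAT32 partition {}", partition_index);
debug!("ahci: issuing DMA read for LBA {:#x}", lba);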
Screenshot, with the default log-level being "info":
PR: #306
Kernel panic
Rewrote the kernel panic function; it now takes the origin of the panic as a parameter (failed assert/kernel fault/double fault/userspace fault for debug), so it can filter what information to display. We now display a Blue Screen Of Death so the user knows we have panicked.
Also memes.
left: serial output. right: VGA output.
This is always followed by a stack dump.
Infra
Mac OS X and Windows build
We now support being built on Mac OS X and Windows out of the box. Building SunriseOS is now as simple as git clone and cargo make iso.
We ensure in the CI that we never break builds on Mac OS and Windows. Tests are still only run on Linux for now.
PR: #278
Meta docs
Even though we take pride in the code of SunriseOS being extremely well documented, mostly thanks to a radical policy regarding missing docs (still looking at you, Linux ರ_ರ), we lacked some high-level documentation introducing newcomers to the SunriseOS design choices and goals, and general abstract documentation about the infra and such.
So we created the /docs/ folder to host some of those papers. It is mostly .md files, but we made it so it is also treated as a crate: it is parsed by cargo with the rest of the crates when you do cargo make doc, and you can read it in HTML format along with the rest of our docs on sunriseos.github.io.
PR: #377
GDB scripts
Created the /scripts/gdb/ folder to host some .py scripts to be loaded into gdb to help poor Sunrise developers during debugging sessions.
Bug fixes
Lots and lots of bug fixes: #42 #79 #217 #223 #227 #66 #219 #313 #314 to name a few.