
Compress our elf image #98

Closed
glommer opened this issue Nov 21, 2013 · 6 comments

@glommer
Contributor

glommer commented Nov 21, 2013

One of the ways to boot faster is to read a smaller image. We have open issues that aim at reducing the size of the image itself, but compression helps even after all those optimizations are applied.

BZIP2 is BSD-licensed, so we could just include the source in our tree. It reduces the image size from 9.5 MB to 4.6 MB on my system. Maybe we can fit into 1 MB someday...

We can compress the code in our build process and decompress it as the bootloader's last step. I am suggesting bz2, but other formats could be used as well.
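As a rough illustration of the compression step, here is a sketch using Python's standard-library bz2 module to do the memory-to-memory compression a build step would perform. The data below is a synthetic stand-in for the ELF image, not OSv's actual build code:

```python
import bz2

# Simulate compressing a kernel image at build time: bzip2 works
# purely memory-to-memory here, no file I/O required.
image = bytes(range(256)) * 4096          # stand-in for the ELF image (1 MiB)
compressed = bz2.compress(image, compresslevel=9)

ratio = len(compressed) / len(image)
print(f"original: {len(image)} bytes, compressed: {len(compressed)} bytes "
      f"(ratio {ratio:.2f})")

# Round-trip check: decompression restores the image exactly.
assert bz2.decompress(compressed) == image
```

The real ratio depends on the image contents; code and symbol tables typically compress well, which is consistent with the roughly 2:1 reduction reported above.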

@avikivity
Member

On 11/21/2013 01:22 PM, Glauber Costa wrote:

One of the ways to boot faster is to read a smaller image. We have open issues that aim at reducing the size of the image itself, but compression helps even after all those optimizations are applied.

BZIP2 is BSD-licensed, so we could just include the source in our tree. It reduces the image size from 9.5 MB to 4.6 MB on my system. Maybe we can fit into 1 MB someday...

We can compress the code in our build process and decompress it as the bootloader's last step. I am suggesting bz2, but other formats could be used as well.

We'll need a static library for that, and one with a minimal dependency
on libc, since the decompressor won't have any support code around it.

@glommer
Contributor Author

glommer commented Nov 21, 2013

Absolutely. That is why I considered using bzip2: since it is BSD-licensed, we can pull the source in as a submodule and build the code the way we want as part of our build process. We make it as tiny as we can and load it together with our binary.

@nyh
Contributor

nyh commented Nov 24, 2013

On Thu, Nov 21, 2013 at 1:55 PM, Glauber Costa notifications@github.com wrote:

Absolutely. That is why I considered using bzip2: since it is BSD-licensed, we can pull the source in as a submodule and build the code the way we want as part of our build process. We make it as tiny as we can and load it together with our binary.

You want to decompress our binary, so the decompressor cannot be part of our binary, nor can it use any of the normal standard-library facilities we offer (even malloc(), read(), etc.). So I think you'll need to make quite a few modifications to the bzip2 decompressor to use it. But I didn't look at it to see how difficult this would be.
As another option to consider, I'm guessing that FreeBSD and others also have some kernel compression support, so we can look at what they are doing.

I definitely agree that the general direction - of compressing the kernel -
is a good one.

penberg pushed a commit that referenced this issue Jan 13, 2014
See

  scripts/trace.py prof-wait -h

The command uses the sched_wait and sched_wait_ret tracepoints to calculate the amount of time a thread was waiting. Samples are collected and presented in the form of a call-graph tree.

By default, callees are closer to the root. To invert the order, pass -r|--caller-oriented.

If there is too much output, it can be narrowed down using the --max-levels and --min-duration options.

The presented time range can be narrowed down using the --since and --until options, which accept timestamps.
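The wait-time accounting described above can be sketched as pairing sched_wait/sched_wait_ret events per thread and attributing each interval to the blocking call path. The event tuples below are a hypothetical simplified format, not the real OSv trace layout:

```python
from collections import defaultdict

# Hypothetical simplified trace: (timestamp_ns, thread_id, event, backtrace)
events = [
    (100, 1, "sched_wait",     ("condvar_wait", "cv_timedwait")),
    (350, 1, "sched_wait_ret", ()),
    (400, 1, "sched_wait",     ("condvar_wait", "txg_wait_open")),
    (900, 1, "sched_wait_ret", ()),
]

open_waits = {}            # thread_id -> (start_ts, backtrace of the wait)
totals = defaultdict(int)  # backtrace -> accumulated wait time in ns

for ts, tid, name, bt in events:
    if name == "sched_wait":
        open_waits[tid] = (ts, bt)     # wait starts; remember where we blocked
    elif name == "sched_wait_ret" and tid in open_waits:
        start, bt = open_waits.pop(tid)
        totals[bt] += ts - start       # charge the duration to the call path

for bt, total in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{total} ns  {' <- '.join(bt)}")
```

Building the tree is then a matter of merging these per-path totals by common backtrace prefix, root-first by default or leaf-first with -r|--caller-oriented.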

Example:

  scripts/trace.py prof-wait --max-levels 3 trace-file

=== Thread 0xffffc0003eaeb010 ===

12.43 s (100.00%, #7696) All
 |-- 12.43 s (99.99%, #7658) sched::thread::do_wait_until
 |    |-- 10.47 s (84.22%, #6417) condvar::wait(lockfree::mutex*, unsigned long)
 |    |    condvar_wait
 |    |    |-- 6.47 s (52.08%, #6250) cv_timedwait
 |    |    |    txg_delay
 |    |    |    dsl_pool_tempreserve_space
 |    |    |    dsl_dir_tempreserve_space
 |    |    |    dmu_tx_try_assign
 |    |    |    dmu_tx_assign
 |    |    |
 |    |    |-- 2.37 s (19.06%, #24) arc_read_nolock
 |    |    |    arc_read
 |    |    |    dsl_read
 |    |    |    traverse_visitbp
 |    |    |
 |    |    |-- 911.75 ms (7.33%, #3) txg_wait_open
 |    |    |    dmu_tx_wait
 |    |    |    zfs_write
 |    |    |    vfs_file::write(uio*, int)
 |    |    |    sys_write
 |    |    |    pwritev
 |    |    |    writev
 |    |    |    __stdio_write
 |    |    |    __fwritex
 |    |    |    fwrite
 |    |    |    0x100000005a5f
 |    |    |    osv::run(std::string, int, char**, int*)

By default every thread has a separate tree, because duration is best interpreted in the context of a particular thread. There is, however, an option to merge samples from all threads into one tree: -m|--merge-threads. It may be useful if you want to inspect all paths going into or out of a particular function. The direction can be changed with the -r|--caller-oriented option. The function name is passed via the --function parameter.

Example: check where zfs_write() blocks:

  scripts/trace.py prof-wait -rm --function=zfs_write trace-file

7.46 s (100.00%, #7314) All
 zfs_write
 |-- 6.48 s (86.85%, #6371) dmu_tx_assign
 |    |-- 6.47 s (86.75%, #6273) dmu_tx_try_assign
 |    |    dsl_dir_tempreserve_space
 |    |    |-- 6.47 s (86.75%, #6248) dsl_pool_tempreserve_space
 |    |    |    txg_delay
 |    |    |    cv_timedwait
 |    |    |    condvar_wait
 |    |    |    condvar::wait(lockfree::mutex*, unsigned long)
 |    |    |    sched::thread::do_wait_until
 |    |    |
 |    |    |-- 87.87 us (0.00%, #24) mutex_lock
 |    |    |    sched::thread::do_wait_until
 |    |    |
 |    |    \-- 6.40 us (0.00%, #1) dsl_dir_tempreserve_impl
 |    |         mutex_lock
 |    |         sched::thread::do_wait_until
 |    |
 |    \-- 7.32 ms (0.10%, #98) mutex_lock
 |         sched::thread::do_wait_until
 |
 |-- 911.75 ms (12.22%, #3) dmu_tx_wait
 |    txg_wait_open
 |    condvar_wait
 |    condvar::wait(lockfree::mutex*, unsigned long)
 |    sched::thread::do_wait_until

Signed-off-by: Tomasz Grabiec <tgrabiec@cloudius-systems.com>
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
@efpiva
Contributor

efpiva commented Jan 22, 2014

Just for the record, I'm working on this issue...

I already have a bzloader that has the core of the bunzip library without any syscalls, uncompressing loader-stripped.bz2, which is available as a char[].

The size was reduced by half.

I'll start putting everything together tonight; I may have something to post next week.
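Generating the char[] that holds the compressed image can be sketched like this. This is a hypothetical Python helper; the actual build may use objcopy, xxd, or similar instead:

```python
import bz2

def to_c_array(data: bytes, name: str) -> str:
    """Emit a C char[] definition holding the raw bytes of `data`."""
    body = ",".join(str(b) for b in data)
    return (f"const unsigned char {name}[] = {{{body}}};\n"
            f"const unsigned long {name}_len = {len(data)};\n")

# Stand-in for loader-stripped; the real input would be the stripped ELF.
compressed = bz2.compress(b"\x7fELF" + b"\x00" * 64)
print(to_c_array(compressed, "loader_stripped_bz2")[:80], "...")
```

The bzloader can then link against the generated array and hand it straight to the in-memory decompressor, with no filesystem involved.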

@nyh
Contributor

nyh commented Jan 26, 2014

On Wed, Jan 22, 2014 at 7:33 PM, efpiva notifications@github.com wrote:

Just for the record, I'm working on this issue...

I already have a bzloader that has the core of the bunzip library without any syscalls, uncompressing loader-stripped.bz2, which is available as a char[].

The size was reduced by half.

I'll start putting everything together tonight; I may have something to post next week.

Looking forward to seeing this, I'm a fan of the block sorting algorithm...

What is the "bunzip library"? Is it this one: http://www.bzip.org/1.0.5/bzip2-manual-1.0.5.html? According to what I read about that library, its "low-level" API doesn't do any I/O, just memory-to-memory decompression, so supposedly it can work without any system calls, right out of the box.

Nadav Har'El
nyh@cloudius-systems.com
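The memory-to-memory usage nyh describes can be illustrated with Python's incremental bz2 decompressor; libbzip2's low-level C API (a bz_stream driven by BZ2_bzDecompressInit/BZ2_bzDecompress) works analogously on caller-supplied buffers:

```python
import bz2

payload = b"kernel image bytes " * 100
blob = bz2.compress(payload)

# Incremental, buffer-to-buffer decompression: feed input chunks in,
# collect output chunks; no file descriptors or read()/write() involved.
dec = bz2.BZ2Decompressor()
out = bytearray()
for i in range(0, len(blob), 64):
    out += dec.decompress(blob[i:i + 64])

assert bytes(out) == payload
print("round-trip ok:", len(blob), "->", len(out), "bytes")
```

In a freestanding bootloader the remaining work is supplying the small amount of support code the C library expects (an allocator or static buffers), since no libc is available at that stage.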

@nyh
Contributor

nyh commented May 7, 2014

Eduardo Piva already added a kernel-compression feature a few months ago - see commit 597db05

So @glommer or someone with permissions - please close this bug.

@glommer glommer closed this as completed May 7, 2014