CachyOS Performance Guide

This guide explains how CachyOS achieves its performance improvements, what optimizations are used, and how to get the best performance from your system.


Table of Contents

  1. Understanding CachyOS Performance Optimizations
  2. CPU Instruction Set Optimizations
  3. BORE Scheduler
  4. Other Scheduler Options
  5. Link Time Optimization (LTO)
  6. Profile-Guided Optimization (PGO)
  7. BOLT Optimization
  8. Custom Kernel (linux-cachyos)
  9. Performance Tips
  10. Measuring Performance

Understanding CachyOS Performance Optimizations

What Makes CachyOS Fast?

CachyOS achieves better performance through multiple optimization techniques:

  1. CPU Instruction Set Optimizations - Packages compiled for modern CPU features
  2. Advanced Schedulers - Better CPU task scheduling (BORE, EEVDF, etc.)
  3. Link Time Optimization (LTO) - Compiler optimizations across entire programs
  4. Profile-Guided Optimization (PGO) - Packages optimized based on real usage
  5. BOLT Optimization - Binary-level optimizations for specific packages
  6. Custom Kernel - Optimized kernel with performance patches

Real-World Performance Benefits

What you'll notice:

  • Faster application startup - Programs launch quicker
  • Lower input lag - Mouse and keyboard feel more responsive
  • Smoother gaming - Better frame times and lower latency
  • Faster compilation - Developers build code faster
  • Better multitasking - System stays responsive under load
  • Improved battery life - More efficient CPU usage (on laptops)

CPU Instruction Set Optimizations

What Are CPU Instruction Sets?

CPU instruction sets are collections of commands that a CPU can execute. Newer CPUs support more advanced instruction sets that can perform operations faster.

What is an instruction set?

  • Instruction: A command that tells the CPU what to do
  • Set: A collection of available instructions
  • Example instructions: Add two numbers, multiply, load data from memory
  • Different CPUs: Support different instruction sets

Why do instruction sets matter?

  • Older CPUs: Support basic instructions (can do the job, but slower)
  • Newer CPUs: Support advanced instructions (can do the same job faster)
  • Optimized software: Uses advanced instructions when available
  • Result: Same program runs faster on newer CPUs with advanced instructions

Real-world analogy:

  • Older CPUs: Basic tools (hammer, screwdriver)
  • Can build things, but takes longer
  • More manual work required
  • Slower but gets the job done
  • Newer CPUs: Power tools (drill, impact driver)
  • Can build the same things, but much faster
  • Less manual work required
  • Faster and more efficient

How CachyOS uses this:

  • Compiles software: Uses advanced instructions if your CPU supports them
  • Result: Programs run faster on your specific CPU
  • Example: If your CPU supports AVX2, CachyOS uses AVX2 instructions (faster)
  • If CPU doesn't support it: Uses basic instructions (still works, just slower)
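
As a hedged, illustrative sketch (the file name and tiny C function below are made up for the example), the commands show the same source compiled for generic x86-64 and for x86-64-v3; with GCC 11 or newer, the v3 build is allowed to use AVX2 and related instructions:

# Create a small loop that benefits from vectorization (example code)
cat > sum.c <<'EOF'
int sum(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++) s += a[i];
    return s;
}
EOF

# Build for generic x86-64 and for x86-64-v3 (the -v3 target name needs GCC 11+)
gcc -O3 -march=x86-64    -c sum.c -o sum-generic.o
gcc -O3 -march=x86-64-v3 -c sum.c -o sum-v3.o

# The v3 object typically uses 256-bit AVX registers (ymm); the generic one sticks to SSE (xmm)
objdump -d sum-v3.o | grep -c ymm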

CachyOS Optimization Levels

CachyOS compiles packages for different CPU generations:

x86-64-v3 (Recommended Minimum)

What it is:

  • Optimized for CPUs from roughly 2013 onwards (Intel Haswell and later)
  • Uses AVX, AVX2, BMI1/BMI2, FMA, and other modern instructions

Supported CPUs:

  • Intel: Haswell (4th gen Core) or newer
  • Examples: Core i5-4xxx, Core i7-4xxx, Xeon E3 v3+
  • AMD: Excavator or newer
  • Examples: Excavator-based APUs, all Ryzen CPUs, EPYC

Performance gain:

  • 5-15% faster than generic x86-64
  • Better for most modern systems

How to check:

# Check if your CPU supports x86-64-v3
lscpu | grep "Flags" | grep -i "avx2"

What this command does:

  • lscpu: Lists CPU information
  • Shows CPU model, cores, architecture, and features
  • | grep "Flags": Finds the line showing CPU feature flags
  • Flags: CPU features/instructions your processor supports
  • | grep -i "avx2": Searches for "avx2" (Advanced Vector Extensions 2)
  • -i: Case-insensitive search
  • AVX2: A CPU instruction set required for x86-64-v3

Example output if supported:

Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp

What to look for:

  • If you see avx2 in the output: Your CPU supports x86-64-v3
  • If you don't see avx2: Your CPU doesn't support v3 (use lower optimization level)

Example output if NOT supported:

Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm

What this means:

  • No avx2 in the list
  • Your CPU is older (pre-Haswell, roughly pre-2013)
  • Use standard x86-64 packages (not v3 optimized)

Alternative check method:

# Check CPU model directly
lscpu | grep "Model name"

Example output:

Model name:            Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz

What this tells you:

  • i7-9700K: 9th generation Intel Core (2018)
  • This CPU supports x86-64-v3 (Haswell or newer)
  • You can use v3 optimized packages
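
Another quick check, assuming a reasonably recent glibc (2.33 or newer): the dynamic loader can report directly which x86-64 microarchitecture levels your CPU supports. The loader path may be /lib64/ld-linux-x86-64.so.2 on some setups.

# Ask the glibc dynamic loader which x86-64 levels this CPU supports
/lib/ld-linux-x86-64.so.2 --help | grep supported

Lines such as "x86-64-v3 (supported, searched)" mean that level is available on this machine.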

x86-64-v4 (Better Performance)

What it is:

  • Optimized for CPUs with AVX-512 support
  • Uses AVX-512 and other advanced instructions

Supported CPUs:

  • Intel: CPUs with AVX-512 (mostly server and high-end desktop parts)
  • Examples: Xeon Scalable (Skylake-SP and newer), Core X-series, Ice Lake / Tiger Lake / Rocket Lake Core CPUs
  • AMD: Zen 4 or newer
  • Examples: Ryzen 7000 series, EPYC 9004 series

Performance gain:

  • Up to 10-20% faster than x86-64-v3 in workloads that use AVX-512
  • Only worthwhile on CPUs with full AVX-512 support

How to check:

# Check if your CPU supports x86-64-v4
lscpu | grep "Flags" | grep -i "avx512"

What this command does:

  • lscpu: Lists CPU information
  • | grep "Flags": Finds CPU feature flags
  • | grep -i "avx512": Searches for "avx512" (Advanced Vector Extensions 512)
  • AVX-512: A CPU instruction set required for x86-64-v4
  • More advanced than AVX2 (used in v3)

Example output if supported:

Flags: ... avx2 avx512f avx512dq avx512cd avx512bw avx512vl ...

What to look for:

  • If you see the avx512 flags: Your CPU likely supports x86-64-v4
  • Full v4 support requires avx512f, avx512bw, avx512cd, avx512dq, and avx512vl

Example output if NOT supported:

Flags: ... avx2 ... (no avx512)

What this means:

  • Your CPU supports v3 (has AVX2) but not v4 (no AVX-512)
  • Use x86-64-v3 optimized packages
  • Still get good performance improvements

Important note:

  • Many recent CPUs lack AVX-512: AMD Ryzen up through the 5000 series (Zen 3), and Intel 12th-gen-and-later consumer CPUs ship with it disabled
  • This doesn't mean they're slow - they simply top out at x86-64-v3
  • Check your specific CPU model for the best optimization level (a quick check script is shown below)
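
To check all five AVX-512 subsets that x86-64-v4 requires in one go, a small shell loop like the one below works (the flag names are the standard ones exposed in /proc/cpuinfo):

# Check every AVX-512 flag required for x86-64-v4
for flag in avx512f avx512bw avx512cd avx512dq avx512vl; do
    if grep -qw "$flag" /proc/cpuinfo; then
        echo "$flag: present"
    else
        echo "$flag: MISSING (no full x86-64-v4 support)"
    fi
done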

Zen4 (Best for AMD Ryzen 7000+)

What it is:

  • Specifically optimized for AMD Zen 4 architecture
  • Latest optimizations for newest AMD CPUs

Supported CPUs:

  • AMD: Ryzen 7000 series (Zen 4) or newer
  • Examples: Ryzen 5 7600X, Ryzen 7 7700X, Ryzen 9 7900X
  • EPYC 9004 series

Performance gain:

  • Best performance on supported CPUs
  • Optimized for latest AMD architecture features

How to check:

# Check your CPU model
lscpu | grep "Model name"

# Look for "Ryzen 7xxx" or "7xxx" in the name
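
If GCC is installed, you can also ask it which microarchitecture "native" resolves to on this machine; on a Ryzen 7000 CPU this typically reports znver4 (the Zen 4 target). This is a generic compiler check, not a CachyOS-specific tool.

# Ask GCC which -march value "native" resolves to on this CPU
gcc -march=native -Q --help=target | grep -- '-march='
# Expected on Ryzen 7000 series: -march=  znver4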

Which Optimization Level Should You Use?

General rule:

  • Use the highest level your CPU supports
  • Higher levels = better performance (if CPU supports it)
  • Using a level your CPU doesn't support will cause programs to crash with "illegal instruction" errors

Recommendations:

  • CPU with AVX2 but no AVX-512 (most desktop CPUs from 2013-2022): Use x86-64-v3
  • CPU with full AVX-512 support: Use x86-64-v4
  • AMD Ryzen 7000 or newer (Zen 4): Use Zen4 packages if available
  • Not sure?: Use x86-64-v3 (most compatible) - the repository check below shows what your install uses
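
You can also check which CachyOS repositories are enabled in /etc/pacman.conf; on a correctly set-up install the repository names reflect the optimization level (for example names containing v3, v4, or znver4 - the exact names depend on your installation).

# List the CachyOS repositories enabled in pacman.conf
grep '^\[cachyos' /etc/pacman.conf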

BORE Scheduler

What is a CPU Scheduler?

The CPU scheduler is a critical part of the operating system that decides:

  • Which programs run on which CPU cores: Distributes work across CPU cores
  • When programs get CPU time: Decides when each program gets to run
  • How CPU time is distributed: Shares CPU time fairly (or prioritizes certain tasks)

What is a CPU core?

  • CPU core: A processing unit inside your CPU
  • Modern CPUs: Have multiple cores (2, 4, 6, 8, 12, 16, etc.)
  • Each core: Can run one task at a time (or two hardware threads with SMT/hyperthreading)
  • Scheduler's job: Decide which program runs on which core
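
To see how many cores and threads your own CPU has, and how they are numbered, the standard tools below are enough:

# Number of logical CPUs (threads) available to the scheduler
nproc

# Per-core layout: which logical CPU belongs to which physical core
lscpu --extended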

Why it matters:

  • Affects system responsiveness: How quickly your system responds to you
  • Good scheduler: System feels snappy and responsive
  • Bad scheduler: System feels sluggish and laggy
  • Determines input lag: Delay between your action and system response
  • Example: Moving mouse → cursor moves (lower lag = better)
  • Gaming: Lower input lag = better gaming experience
  • Impacts gaming performance: How smoothly games run
  • Good scheduler: Games run smoothly, consistent frame times
  • Bad scheduler: Games stutter, inconsistent performance
  • Affects multitasking: Running multiple programs at once
  • Good scheduler: All programs run smoothly
  • Bad scheduler: Some programs lag when others are running

Real-world example:

  • Without good scheduler:
  • You're playing a game, then open a browser
  • Game stutters because browser gets too much CPU time
  • System feels unresponsive
  • With good scheduler:
  • You're playing a game, then open a browser
  • Game keeps running smoothly (gets priority)
  • Browser still works, but doesn't interfere with game
  • System stays responsive

What is BORE?

BORE stands for "Burst-Oriented Response Enhancer".

What does "Burst-Oriented" mean?

  • Burst: A sudden increase in activity (you click something, type, move mouse)
  • Oriented: Designed to handle bursts of activity
  • BORE's focus: Responds quickly when you interact with the system

What it does:

  • Prioritizes interactive tasks: Gives priority to things you're actively using
  • Interactive tasks: Mouse movement, keyboard input, games, applications you're using
  • Background tasks: File downloads, system updates, background processes
  • BORE's approach: Interactive tasks get CPU time first
  • Reduces latency: Makes things respond faster
  • Latency: Delay between action and response
  • Example: Click button → application responds (lower latency = faster response)
  • Improves responsiveness under load: System stays responsive even when busy
  • Under load: When CPU is busy (compiling, rendering, etc.)
  • BORE's benefit: System still feels responsive even when CPU is working hard

Key features:

Burst detection:

  • What it does: Identifies when you're actively using the system
  • How it works: Detects sudden increases in activity (mouse movement, keyboard input)
  • Result: System knows when you're interacting and prioritizes accordingly
  • Example: You move mouse → BORE detects burst → gives priority to your applications

Priority boost:

  • What it does: Gives interactive tasks more CPU time
  • How it works: Temporarily increases priority of tasks you're using
  • Result: Your active applications get more CPU time than background tasks
  • Example: Game you're playing gets more CPU time than background download

Low latency:

  • What it does: Reduces delay between input and response
  • How it works: Prioritizes tasks that need immediate response
  • Result: System responds faster to your actions
  • Example: Clicking a button → application responds almost instantly

How BORE Improves Performance

Gaming:

  • Lower input lag: Delay between your action and game response is reduced
  • What it means: Mouse movement, keyboard presses feel more immediate
  • Real-world impact: Games feel more responsive, easier to aim, better control
  • Example: Moving mouse in FPS game → crosshair moves almost instantly
  • More consistent frame times: Frame rendering times are more stable
  • What it means: Each frame takes similar time to render
  • Real-world impact: Smoother gameplay, less stuttering
  • Example: Game runs at 60 FPS consistently instead of jumping between 50-70 FPS
  • Better performance in CPU-intensive games: Games that need lots of CPU run better
  • What it means: Games that heavily use CPU get better performance
  • Real-world impact: Complex games run smoother, less lag
  • Example: Strategy games, simulation games, games with many NPCs run better

Desktop use:

  • Instant response to mouse/keyboard: Input devices respond immediately
  • What it means: Mouse cursor and keyboard input feel instant
  • Real-world impact: System feels snappy and responsive
  • Example: Moving mouse → cursor moves instantly, no delay
  • Smoother window animations: Window transitions are fluid
  • What it means: Opening, closing, resizing windows is smooth
  • Real-world impact: Desktop feels polished and professional
  • Example: Opening application → window animates smoothly, no stuttering
  • No stuttering when background tasks run: System stays smooth even when busy
  • What it means: Background tasks don't cause visual stuttering
  • Real-world impact: Can run updates, downloads, etc. without affecting desktop
  • Example: Downloading large file → desktop still smooth, no lag

Multitasking:

  • Active applications stay responsive: Programs you're using don't lag
  • What it means: Applications you're actively using get priority
  • Real-world impact: Can work with multiple programs without slowdown
  • Example: Browser, text editor, music player all run smoothly together
  • Background tasks don't interrupt your work: Background processes don't interfere
  • What it means: System updates, downloads, etc. don't slow down active work
  • Real-world impact: Can run background tasks without affecting productivity
  • Example: System updating packages → your work continues smoothly
  • Better balance between foreground and background: System balances priorities well
  • What it means: Active tasks get priority, but background tasks still progress
  • Real-world impact: Best of both worlds - responsive system and background progress
  • Example: Your work is responsive, but downloads still complete

BORE vs Standard Scheduler

Standard Linux scheduler (CFS - Completely Fair Scheduler, the default before kernel 6.6):

  • Fair distribution of CPU time: All tasks get equal CPU time
  • What it means: Every program gets the same amount of CPU time
  • Problem: Doesn't prioritize what you're actively using
  • Result: Background tasks can slow down active work
  • All tasks treated equally: No priority for interactive tasks
  • What it means: Your active application gets same priority as background download
  • Problem: System doesn't know what you're actively using
  • Result: Can cause lag when background tasks are running
  • Can cause latency spikes: Sometimes has delays
  • What it means: System can occasionally feel unresponsive
  • Problem: Fair scheduling can cause delays for interactive tasks
  • Result: Mouse/keyboard input can feel laggy sometimes

BORE scheduler:

  • Prioritizes interactive tasks: Gives priority to things you're using
  • What it means: Active applications get more CPU time
  • Benefit: System knows what you're using and prioritizes it
  • Result: Active work stays responsive
  • Reduces latency for user actions: Input responds faster
  • What it means: Mouse/keyboard input gets immediate response
  • Benefit: System feels more responsive
  • Result: Lower input lag, faster response times
  • Better for desktop and gaming use: Optimized for interactive use
  • What it means: Designed for how people actually use computers
  • Benefit: Better experience for desktop users and gamers
  • Result: Smoother, more responsive system

Real-world difference:

Scenario 1: Playing a game while downloading files

  • With CFS (standard scheduler):
  • Game stutters when download starts
  • Input lag increases
  • Frame rate drops
  • Why: Download gets equal CPU time, interferes with game
  • With BORE:
  • Game continues running smoothly
  • Input lag stays low
  • Frame rate remains stable
  • Why: Game gets priority, download runs in background

Scenario 2: Compiling code while browsing the web

  • With CFS (standard scheduler):
  • Browser becomes laggy during compilation
  • Scrolling stutters
  • Page loading slows down
  • Why: Compilation gets equal CPU time, slows down browser
  • With BORE:
  • Browser stays responsive
  • Scrolling is smooth
  • Page loading remains fast
  • Why: Browser gets priority when you're using it

Scenario 3: System update while working

  • With CFS (standard scheduler):
  • Active applications lag during update
  • Mouse/keyboard feel unresponsive
  • System feels sluggish
  • Why: Update process competes equally with active work
  • With BORE:
  • Active applications stay responsive
  • Mouse/keyboard feel instant
  • System feels snappy
  • Why: Active work gets priority, update runs in background

Specific improvements:

  • Mouse movement: Feels instant with BORE (no delay, immediate response)
  • Game input: Lower latency, smoother gameplay (better control, less lag)
  • Application switching: Faster, more responsive (Alt+Tab feels instant)
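
To confirm you are actually running a BORE-enabled kernel, check the kernel name and, on kernels carrying the BORE patch, the sysctl it exposes. The sysctl name kernel.sched_bore is an assumption based on the upstream BORE patch and may differ between kernel versions.

# Kernel name should contain "cachyos"
uname -r

# On BORE-patched kernels this sysctl is typically present (a nonzero value usually means BORE is enabled)
sysctl kernel.sched_bore 2>/dev/null || echo "sched_bore sysctl not found (kernel may not carry the BORE patch)"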

Other Scheduler Options

CachyOS offers multiple scheduler options for different use cases:

EEVDF Scheduler

What it is:

  • Earliest Eligible Virtual Deadline First
  • Modern scheduler from Linux kernel 6.6+
  • Fair and efficient

Best for:

  • General desktop use
  • Servers
  • Balanced workloads

Characteristics:

  • Fair CPU time distribution
  • Good for multitasking
  • Modern design

sched-ext Scheduler

What it is:

  • Extensible scheduler framework (sched_ext)
  • Allows custom schedulers implemented as BPF programs loaded from user space
  • Experimental but powerful

Best for:

  • Advanced users
  • Custom scheduler development
  • Research and experimentation

Characteristics:

  • Highly customizable
  • Can implement custom scheduling policies
  • Requires kernel support
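
As a rough sketch of what using sched-ext looks like in practice: user-space schedulers are shipped as separate programs that you run on top of a sched-ext-capable kernel. The names below (the scx-scheds package and the scx_rusty scheduler) come from the upstream sched_ext project and are assumptions about how CachyOS packages them - check the repositories before relying on them.

# Install the collection of sched_ext example schedulers (package name assumed)
sudo pacman -S scx-scheds

# Run one of them in the foreground; when it exits (Ctrl+C), the kernel falls back to the default scheduler
sudo scx_rusty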

ECHO Scheduler

What it is:

  • Another scheduler option
  • Different scheduling algorithm
  • Alternative to BORE

Best for:

  • Users who want to try different schedulers
  • Specific workloads that benefit from ECHO

RT (Real-Time) Scheduler

What it is:

  • Real-Time scheduler
  • For time-critical applications
  • Guarantees response times

Best for:

  • Audio production
  • Real-time applications
  • Time-critical tasks

Characteristics:

  • Predictable timing
  • Low latency guarantees
  • May affect other applications

Which Scheduler Should You Use?

Recommendations:

  • Most users: BORE (default) - best for desktop and gaming
  • Servers: EEVDF - fair and efficient
  • Audio production: RT - time-critical
  • Experimentation: sched-ext - customizable
  • Not sure: Stick with BORE (default)

How to change scheduler:

  • Select during installation
  • Or change kernel package later:
    # Install different kernel variant
    sudo pacman -S linux-cachyos-eevdf  # For EEVDF

Link Time Optimization (LTO)

What is LTO?

Link Time Optimization (LTO) is a compiler optimization technique that optimizes code across the entire program, not just individual files.

What is a compiler?

  • Compiler: Software that converts source code (human-readable) into machine code (computer-readable)
  • Optimization: Making code run faster and use resources more efficiently
  • Traditional compilation: Each file is compiled and optimized separately
  • LTO compilation: Entire program is analyzed and optimized together

How it works:

  1. Compiler analyzes entire program
  • Looks at all source files together (not separately)
  • Understands how different files interact
  • Sees relationships between code in different files
  • Benefit: Can make better optimization decisions
  2. Optimizes across file boundaries
  • Can optimize code that spans multiple files
  • Removes redundant code between files
  • Better function inlining across files
  • Benefit: More efficient code overall
  3. Removes unused code
  • Identifies functions and code that's never called
  • Removes dead code (code that can never execute)
  • Benefit: Smaller programs, faster execution
  4. Inlines functions more aggressively
  • Inlining: Replaces function calls with the actual function code
  • Why it helps: Eliminates function call overhead
  • LTO advantage: Can inline functions even when they're in different files
  • Benefit: Faster execution (no function call overhead)
  5. Better register allocation
  • Registers: Fast storage inside the CPU (much faster than RAM)
  • Allocation: Deciding which variables go in registers
  • LTO advantage: Can make better decisions across entire program
  • Benefit: More variables in fast registers = faster execution

Think of it like:

  • Without LTO: Optimizing each room of a house separately
  • Each room optimized independently
  • Doesn't consider how rooms connect
  • May miss optimization opportunities
  • Example: Two rooms both have heating - could share one system
  • With LTO: Optimizing the entire house as a whole
  • Considers entire house layout
  • Optimizes connections between rooms
  • Better overall optimization
  • Example: One heating system for entire house (more efficient)
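
A minimal sketch of what LTO looks like to a compiler, using two hypothetical C files (main.c and util.c): with -flto, the link step sees both files' intermediate representation and can inline and prune across them.

# Without LTO: each file is optimized in isolation
gcc -O2 -c main.c -o main.o
gcc -O2 -c util.c -o util.o
gcc -O2 main.o util.o -o app-nolto

# With LTO: optimization decisions are made again at link time,
# across both files (cross-file inlining, dead code removal)
gcc -O2 -flto -c main.c -o main.o
gcc -O2 -flto -c util.c -o util.o
gcc -O2 -flto main.o util.o -o app-lto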

Benefits of LTO

Performance improvements:

  • 5-15% faster execution: Programs run noticeably faster
  • Real-world impact: Applications feel more responsive
  • Example: Web browser loads pages 10% faster
  • Smaller binary sizes: Compiled programs are smaller
  • Why: Unused code is removed
  • Benefit: Less disk space, faster loading from disk
  • Better code optimization: More efficient code generation
  • Why: Compiler sees entire program
  • Benefit: Better optimization decisions
  • More efficient memory usage: Better memory access patterns
  • Why: Optimized code layout
  • Benefit: Better CPU cache usage

Real-world examples:

  • Application startup: 10-20% faster launch times
  • Game loading: Levels and assets load quicker
  • Compilation: Development tools compile code faster
  • System responsiveness: Overall system feels snappier

Trade-offs:

  • Longer compilation time: Takes more time to compile packages
  • Why: Compiler does more analysis (looks at entire program)
  • Impact: Only affects package builders, not end users
  • For you: You don't notice this (packages are pre-compiled)
  • More memory during compilation: Needs more RAM when compiling
  • Why: Analyzes entire program at once (needs more memory)
  • Impact: Only affects package builders, not end users
  • For you: No impact (you're not compiling packages)
  • Slightly larger package repository: LTO packages may be slightly larger
  • Why: Contains optimization metadata
  • Impact: Minimal (usually offset by smaller final binaries)
  • For you: Negligible impact on disk space

Is LTO worth it?

  • For end users: Absolutely! You get faster programs with no downsides
  • For package maintainers: Trade-off between build time and performance
  • CachyOS choice: Uses LTO for better performance (worth the trade-off)

What Packages Use LTO?

In CachyOS:

  • Core system packages
  • Frequently used applications
  • Performance-critical software

Examples:

  • Kernel
  • Desktop environments
  • System libraries
  • Development tools

Profile-Guided Optimization (PGO)

What is PGO?

Profile-Guided Optimization (PGO) optimizes packages based on how they're actually used in real-world scenarios.

What is profiling?

  • Profiling: Recording how a program runs in real use
  • Data collected: Which functions are called, how often, which code paths are taken
  • Purpose: Understand real-world usage patterns (not theoretical)
  • Result: Optimize based on actual usage, not compiler guesses

How it works:

  1. Package is compiled with profiling enabled
  • First compilation includes special profiling code
  • Profiling code records execution information as program runs
  • Creates "instrumented" binary (binary with tracking code built in)
  • Think of it: Like adding sensors to a car to see how it's driven
  2. Package is used in typical scenarios
  • Package is run in real-world situations
  • People use it normally (browse web, edit documents, play games, etc.)
  • Profiling code records what happens during use
  • Think of it: Driving the car normally while sensors record data
  3. Profiling data is collected
  • System gathers information about:
  • Which functions are called most: Most-used functions identified
  • Which code paths are taken: Common execution paths found
  • How often operations occur: Frequency of different operations
  • Execution time: How long different parts take
  • Think of it: Analyzing the sensor data to see driving patterns
  4. Package is recompiled using profiling data
  • Compiler reads the profiling data
  • Understands how the program is actually used
  • Optimizes based on real usage patterns (not guesses)
  • Creates final optimized binary
  • Think of it: Redesigning the car based on how it's actually driven
  5. Result: Optimized for actual usage patterns
  • Code is optimized for how it's really used
  • Frequently used code is optimized more (gets more attention)
  • Rarely used code is optimized less (saves compilation time)
  • Better overall performance for real-world use
  • Think of it: Car is now optimized for actual driving conditions

Think of it like:

  • Without PGO: Optimizing based on guesses
  • Compiler guesses what's important
  • May optimize wrong things (things rarely used)
  • May miss optimization opportunities (things frequently used)
  • Example: Optimizing a rarely-used feature instead of common one
  • With PGO: Optimizing based on real usage data
  • Compiler knows what's actually used
  • Optimizes the right things (frequently used code)
  • Better optimization decisions
  • Example: Optimizing the feature people use most
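
A minimal sketch of the PGO workflow with GCC, using a hypothetical program app.c and a made-up workload flag: -fprofile-generate and -fprofile-use are the standard GCC options (Clang has equivalents).

# Step 1: build with profiling instrumentation
gcc -O2 -fprofile-generate -o app app.c

# Step 2: run the program on a typical workload; this writes .gcda profile files
./app --typical-workload

# Step 3: rebuild using the collected profile
gcc -O2 -fprofile-use -o app app.c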

Benefits of PGO

Performance improvements:

  • 10-30% faster for optimized packages: Significant speedup
  • Real-world impact: Noticeably faster applications
  • Example: Web browser renders pages 20% faster
  • Better branch prediction: CPU predicts code paths better
  • Branch prediction: CPU guessing which code path will be taken (if/else, loops)
  • Better prediction: Less CPU stalls, faster execution
  • Why: Code layout optimized based on actual paths taken
  • More efficient code paths: Frequently used paths optimized more
  • Code paths: Different ways code can execute (different branches)
  • Optimization: Common paths get more optimization
  • Benefit: Most-used code runs fastest
  • Optimized for common operations: Common tasks run faster
  • Common operations: Things people do most often
  • Optimization: These get special attention
  • Benefit: Everyday tasks are faster

Real-world examples:

  • Web browsers: Faster page loading, smoother scrolling, quicker JavaScript execution
  • Compilers: Faster code compilation (compilers compile themselves with PGO)
  • Desktop environments: Smoother animations, faster window operations
  • System libraries: Better performance for all applications using them
  • Games: Faster loading, smoother gameplay

What gets optimized:

  • Frequently used functions: Functions called often are optimized more
  • Example: In a web browser, page rendering function gets optimized
  • Hot code paths: Code paths taken frequently are optimized
  • Example: Common user interactions get optimized
  • Common operations: Everyday operations run faster
  • Example: Opening files, saving documents, scrolling
  • Real-world usage patterns: Optimized for how software is actually used
  • Example: Optimized for typical user behavior, not edge cases

What Packages Use PGO?

In CachyOS:

  • Critical system components
  • Frequently used applications
  • Performance-sensitive software

Examples:

  • Web browsers
  • Compilers
  • System libraries
  • Desktop environments

BOLT Optimization

What is BOLT?

BOLT stands for Binary Optimization and Layout Tool.

What it does:

  • Optimizes compiled binaries at the binary level
  • Rearranges code for better CPU cache usage
  • Improves branch prediction
  • Optimizes hot code paths

How it works:

  1. Analyzes binary execution
  2. Identifies hot (frequently used) code
  3. Rearranges code layout
  4. Optimizes for CPU cache
  5. Improves performance
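
A rough sketch of the BOLT workflow, assuming LLVM's BOLT tools are installed, the binary was linked with relocations kept (-Wl,--emit-relocs), and the CPU supports branch sampling for perf; exact flags vary between BOLT versions, and the program name is a placeholder.

# 1. Collect an execution profile while the program does typical work
perf record -e cycles:u -j any,u -- ./myapp --typical-workload

# 2. Convert the perf profile into BOLT's format
perf2bolt -p perf.data -o myapp.fdata ./myapp

# 3. Produce an optimized binary with a better code layout
llvm-bolt ./myapp -o myapp.bolt -data=myapp.fdata \
    -reorder-blocks=ext-tsp -reorder-functions=hfsort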

Benefits of BOLT

Performance improvements:

  • 5-20% faster for optimized binaries
  • Better CPU cache utilization
  • Improved branch prediction
  • More efficient code layout

What gets optimized:

  • Frequently executed code
  • Hot functions
  • Critical code paths

What Packages Use BOLT?

In CachyOS:

  • Selected high-impact packages
  • Performance-critical applications
  • Frequently used software

Examples:

  • Web browsers
  • Development tools
  • System utilities

Custom Kernel (linux-cachyos)

What is linux-cachyos?

linux-cachyos is CachyOS's custom-compiled Linux kernel with performance optimizations.

What's different:

  • BORE scheduler (or other scheduler options)
  • Optimized compilation flags
  • LTO compilation
  • Performance patches
  • Modern CPU optimizations

Kernel Variants

Available kernels:

  • linux-cachyos - Default with BORE scheduler
  • linux-cachyos-eevdf - EEVDF scheduler
  • linux-cachyos-sched-ext - sched-ext scheduler
  • linux-cachyos-rt - Real-time kernel
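
Installing a variant alongside your current kernel is a normal pacman operation; the matching -headers package (name assumed to follow the usual Arch pattern) is only needed if you build out-of-tree modules such as DKMS drivers.

# Example: install the EEVDF variant and its headers alongside the current kernel
sudo pacman -S linux-cachyos-eevdf linux-cachyos-eevdf-headers

# After installing, reboot and select the new kernel in your boot menu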

How to check your kernel:

# Check kernel version
uname -r

# Should show something like:
# 6.x.x-cachyos

Benefits of Custom Kernel

Performance:

  • Better scheduler (BORE)
  • Optimized compilation
  • Performance patches
  • Modern optimizations

Features:

  • Multiple scheduler options
  • Better hardware support
  • Performance improvements

Performance Tips

1. Use the Right CPU Optimization Level

Make sure you're using the highest optimization level your CPU supports:

# Check your CPU
lscpu

# List installed CachyOS packages (the repository they come from reflects the optimization level)
pacman -Q | grep cachyos

2. Choose the Right Scheduler

For most users:

  • Use BORE scheduler (default)
  • Best for desktop and gaming

For specific use cases:

  • Servers: EEVDF
  • Audio: RT
  • Experimentation: sched-ext

3. Keep System Updated

Regular updates include performance improvements:

# Update system regularly
sudo pacman -Syu

4. Use SSD for System Drive

SSD provides:

  • Faster boot times
  • Faster application launches
  • Better overall responsiveness

If using HDD:

  • Consider upgrading to SSD
  • Or use SSD for system, HDD for data

5. Optimize Desktop Environment

Lightweight DEs are faster:

  • XFCE, LXQt: Lightweight
  • KDE, GNOME: More features, more resources

Choose based on:

  • Your hardware capabilities
  • Your preferences
  • Performance requirements

6. Disable Unnecessary Services

Reduce background processes:

# Check running services
systemctl list-units --type=service --state=running

# Disable services you don't need
sudo systemctl disable service-name

7. Use Appropriate Graphics Drivers

Install proper drivers:

# Show chwd (CachyOS hardware detection tool) options
sudo chwd -h

# Or install manually
# NVIDIA (the dkms variant builds modules for custom kernels like linux-cachyos):
sudo pacman -S nvidia-dkms

# AMD: Open-source drivers usually work out of the box

8. Monitor System Resources

Keep an eye on resource usage:

# Check CPU and memory
htop

# Or
top

Identify resource hogs:

  • Close unnecessary applications
  • Optimize startup programs
  • Manage browser tabs

Measuring Performance

Benchmarking Tools

CPU performance:

# Install benchmarking tools
sudo pacman -S sysbench

# Run CPU benchmark
sysbench cpu --threads=4 run

Disk performance:

# Test disk speed
sudo pacman -S hdparm

# Test read speed (replace /dev/sda with your disk, e.g. /dev/nvme0n1)
sudo hdparm -tT /dev/sda

Memory performance:

# Test memory
sysbench memory --threads=4 run

Real-World Performance Tests

Application startup time:

  • Time how long applications take to launch
  • Compare before/after optimizations
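
One easy way to put numbers on this is hyperfine (available in the Arch repositories), which runs a command several times and reports the mean and spread; the application name below is just a placeholder.

# Install hyperfine and time a (hypothetical) application's startup
sudo pacman -S hyperfine
hyperfine --warmup 1 'someapp --version'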

Gaming performance:

  • Use in-game FPS counters
  • Monitor frame times
  • Check input lag

System responsiveness:

  • Test mouse/keyboard latency
  • Check window animation smoothness
  • Monitor system under load

Comparing Performance

Before/after comparisons:

  • Benchmark before optimizations
  • Apply optimizations
  • Benchmark again
  • Compare results

Note: Performance improvements vary by:

  • Hardware
  • Workload
  • Applications used
  • System configuration

Summary

This guide covered:

  1. CPU instruction set optimizations - x86-64-v3, v4, Zen4
  2. BORE scheduler - Burst-Oriented Response Enhancer
  3. Other schedulers - EEVDF, sched-ext, RT, ECHO
  4. LTO - Link Time Optimization
  5. PGO - Profile-Guided Optimization
  6. BOLT - Binary Optimization and Layout Tool
  7. Custom kernel - linux-cachyos
  8. Performance tips - Getting the best performance
  9. Measuring performance - Benchmarking and testing

Key Takeaways:

  • CachyOS uses multiple optimization techniques
  • BORE scheduler improves responsiveness
  • CPU instruction set optimizations provide significant gains
  • LTO, PGO, and BOLT optimize packages further
  • Custom kernel adds performance improvements
  • Choose optimizations based on your hardware

This guide is based on the CachyOS Wiki and expanded with detailed explanations for beginners. For the most up-to-date performance information, always refer to the official CachyOS documentation.
