CachyOS Performance Guide
This guide explains how CachyOS achieves its performance improvements, what optimizations are used, and how to get the best performance from your system.
- Understanding CachyOS Performance Optimizations
- CPU Instruction Set Optimizations
- BORE Scheduler
- Other Scheduler Options
- Link Time Optimization (LTO)
- Profile-Guided Optimization (PGO)
- BOLT Optimization
- Custom Kernel (linux-cachyos)
- Performance Tips
- Measuring Performance
CachyOS achieves better performance through multiple optimization techniques:
- CPU Instruction Set Optimizations - Packages compiled for modern CPU features
- Advanced Schedulers - Better CPU task scheduling (BORE, EEVDF, etc.)
- Link Time Optimization (LTO) - Compiler optimizations across entire programs
- Profile-Guided Optimization (PGO) - Packages optimized based on real usage
- BOLT Optimization - Binary-level optimizations for specific packages
- Custom Kernel - Optimized kernel with performance patches
What you'll notice:
- Faster application startup - Programs launch quicker
- Lower input lag - Mouse and keyboard feel more responsive
- Smoother gaming - Better frame times and lower latency
- Faster compilation - Developers build code faster
- Better multitasking - System stays responsive under load
- Improved battery life - More efficient CPU usage (on laptops)
CPU instruction sets are collections of commands that a CPU can execute. Newer CPUs support more advanced instruction sets that can perform operations faster.
What is an instruction set?
- Instruction: A command that tells the CPU what to do
- Set: A collection of available instructions
- Example instructions: Add two numbers, multiply, load data from memory
- Different CPUs: Support different instruction sets
Why do instruction sets matter?
- Older CPUs: Support basic instructions (can do the job, but slower)
- Newer CPUs: Support advanced instructions (can do the same job faster)
- Optimized software: Uses advanced instructions when available
- Result: Same program runs faster on newer CPUs with advanced instructions
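To see this in practice, here is a small optional experiment (assuming gcc is installed; /tmp/sum.c is just a scratch file used for illustration). The same loop compiled for the baseline x86-64 target contains no AVX registers, while the x86-64-v3 build gets vectorized:
# Write a tiny C function that sums an array
cat > /tmp/sum.c <<'EOF'
float sum(const float *a, int n) {
    float s = 0;
    for (int i = 0; i < n; i++) s += a[i];
    return s;
}
EOF
# Baseline x86-64 build: the generated assembly contains no AVX (ymm) registers
gcc -O3 -ffast-math -march=x86-64 -S -o - /tmp/sum.c | grep -c ymm
# x86-64-v3 build: the same loop is vectorized using 256-bit AVX2 (ymm) registers
gcc -O3 -ffast-math -march=x86-64-v3 -S -o - /tmp/sum.c | grep -c ymm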
Real-world analogy:
- Older CPUs: Basic tools (hammer, screwdriver)
- Can build things, but takes longer
- More manual work required
- Slower but gets the job done
- Newer CPUs: Power tools (drill, impact driver)
- Can build the same things, but much faster
- Less manual work required
- Faster and more efficient
How CachyOS uses this:
- Compiles software: Uses advanced instructions if your CPU supports them
- Result: Programs run faster on your specific CPU
- Example: If your CPU supports AVX2, CachyOS uses AVX2 instructions (faster)
- If CPU doesn't support it: Uses basic instructions (still works, just slower)
CachyOS compiles packages for different CPU generations:
x86-64-v3
What it is:
- Optimized for CPUs from roughly 2013 (Intel) or 2015 (AMD) onwards
- Uses AVX, AVX2, and other modern instructions
Supported CPUs:
- Intel: Haswell (4th gen Core) or newer
- Examples: Core i5-4xxx, Core i7-4xxx, Xeon E3 v3+
- AMD: Excavator or newer
- Examples: Ryzen series, EPYC (Excavator-based APUs also qualify; older FX chips lack AVX2)
Performance gain:
- 5-15% faster than generic x86-64
- Better for most modern systems
How to check:
# Check if your CPU supports x86-64-v3
lscpu | grep "Flags" | grep -i "avx2"What this command does:
- lscpu: Lists CPU information - shows CPU model, cores, architecture, and features
- | grep "Flags": Finds the line showing CPU feature flags - these flags are the CPU features/instructions your processor supports
- | grep -i "avx2": Searches for "avx2" (Advanced Vector Extensions 2); -i makes the search case-insensitive - AVX2 is a CPU instruction set required for x86-64-v3
Example output if supported:
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
What to look for:
- If you see avx2 in the output: Your CPU supports x86-64-v3
- If you don't see avx2: Your CPU doesn't support v3 (use a lower optimization level)
Example output if NOT supported:
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm
What this means:
- No avx2 in the list - your CPU is older and does not support AVX2
- Use standard x86-64 packages (not v3 optimized)
Alternative check method:
# Check CPU model directly
lscpu | grep "Model name"Example output:
Model name: Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz
What this tells you:
- i7-9700K: 9th generation Intel Core (2018)
- This CPU supports x86-64-v3 (Haswell or newer)
- You can use v3 optimized packages
x86-64-v4
What it is:
- Optimized for CPUs that support AVX-512
- Uses AVX-512 and other advanced instructions
Supported CPUs:
- Intel: Only CPUs with AVX-512 - Skylake-X/Skylake-SP (2017) or newer HEDT/server parts, plus Ice Lake, Tiger Lake, and Rocket Lake client chips
- Examples: Core i9-7900X, Xeon Scalable, Core i7-1185G7
- AMD: Zen 4 (Ryzen 7000 series) or newer
- Examples: Ryzen 5 7600X, Ryzen 9 7950X, EPYC 9004 series
Performance gain:
- Up to 10-20% faster than x86-64-v3 in workloads that can use AVX-512
- Best for CPUs with AVX-512 support
How to check:
# Check if your CPU supports x86-64-v4
lscpu | grep "Flags" | grep -i "avx512"What this command does:
-
lscpu: Lists CPU information -
| grep "Flags": Finds CPU feature flags -
| grep -i "avx512": Searches for "avx512" (Advanced Vector Extensions 512) - AVX-512: A CPU instruction set required for x86-64-v4
- More advanced than AVX2 (used in v3)
Example output if supported:
Flags: ... avx2 avx512f avx512dq avx512cd avx512bw avx512vl ...
What to look for:
- If you see avx512 (or avx512f, avx512dq, etc.): Your CPU supports x86-64-v4
- Multiple AVX-512 variants indicate full v4 support
Example output if NOT supported:
Flags: ... avx2 ... (no avx512)
What this means:
- Your CPU supports v3 (has AVX2) but not v4 (no AVX-512)
- Use x86-64-v3 optimized packages
- Still get good performance improvements
Important note:
- Some newer CPUs (for example, AMD Ryzen 5000 series and earlier Ryzen generations) do not have AVX-512
- This doesn't mean they're slow - they're just optimized differently
- Check your specific CPU model for best optimization level
Zen4
What it is:
- Specifically optimized for AMD Zen 4 architecture
- Latest optimizations for newest AMD CPUs
Supported CPUs:
- AMD: Ryzen 7000 series (Zen 4) or newer
- Examples: Ryzen 5 7600X, Ryzen 7 7700X, Ryzen 9 7900X
- EPYC 9004 series
Performance gain:
- Best performance on supported CPUs
- Optimized for latest AMD architecture features
How to check:
# Check your CPU model
lscpu | grep "Model name"
# Look for "Ryzen 7xxx" or "7xxx" in the name
General rule:
- Use the highest level your CPU supports
- Higher levels = better performance (if CPU supports it)
- Using a level your CPU doesn't support will cause errors
Recommendations:
- CPU with AVX2 but no AVX-512 (most desktops and laptops): Use x86-64-v3
- CPU with AVX-512 (Intel Skylake-X/Ice Lake-era or AMD Zen 4): Use x86-64-v4
- 2022+ AMD Ryzen: Use Zen4 if available
- Not sure?: Use x86-64-v3 (most compatible)
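A quick way to see which level your installed system reports, without reading CPU flags by hand, is to ask the glibc dynamic linker directly (a minimal check; it requires a reasonably recent glibc, which CachyOS ships):
# Lists the x86-64 feature levels glibc detects, e.g. "x86-64-v3 (supported, searched)"
/lib/ld-linux-x86-64.so.2 --help | grep -E 'x86-64-v[234]'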
The CPU scheduler is a critical part of the operating system that decides:
- Which programs run on which CPU cores: Distributes work across CPU cores
- When programs get CPU time: Decides when each program gets to run
- How CPU time is distributed: Shares CPU time fairly (or prioritizes certain tasks)
What is a CPU core?
- CPU core: A processing unit inside your CPU
- Modern CPUs: Have multiple cores (2, 4, 6, 8, 12, 16, etc.)
- Each core: Can run one program at a time (or multiple with hyperthreading)
- Scheduler's job: Decide which program runs on which core
Why it matters:
- Affects system responsiveness: How quickly your system responds to you
- Good scheduler: System feels snappy and responsive
- Bad scheduler: System feels sluggish and laggy
- Determines input lag: Delay between your action and system response
- Example: Moving mouse → cursor moves (lower lag = better)
- Gaming: Lower input lag = better gaming experience
- Impacts gaming performance: How smoothly games run
- Good scheduler: Games run smoothly, consistent frame times
- Bad scheduler: Games stutter, inconsistent performance
- Affects multitasking: Running multiple programs at once
- Good scheduler: All programs run smoothly
- Bad scheduler: Some programs lag when others are running
Real-world example:
- Without good scheduler:
- You're playing a game, then open a browser
- Game stutters because browser gets too much CPU time
- System feels unresponsive
- With good scheduler:
- You're playing a game, then open a browser
- Game keeps running smoothly (gets priority)
- Browser still works, but doesn't interfere with game
- System stays responsive
BORE stands for "Burst-Oriented Response Enhancer".
What does "Burst-Oriented" mean?
- Burst: A sudden increase in activity (you click something, type, move mouse)
- Oriented: Designed to handle bursts of activity
- BORE's focus: Responds quickly when you interact with the system
What it does:
- Prioritizes interactive tasks: Gives priority to things you're actively using
- Interactive tasks: Mouse movement, keyboard input, games, applications you're using
- Background tasks: File downloads, system updates, background processes
- BORE's approach: Interactive tasks get CPU time first
- Reduces latency: Makes things respond faster
- Latency: Delay between action and response
- Example: Click button → application responds (lower latency = faster response)
- Improves responsiveness under load: System stays responsive even when busy
- Under load: When CPU is busy (compiling, rendering, etc.)
- BORE's benefit: System still feels responsive even when CPU is working hard
Key features:
Burst detection:
- What it does: Identifies when you're actively using the system
- How it works: Detects sudden increases in activity (mouse movement, keyboard input)
- Result: System knows when you're interacting and prioritizes accordingly
- Example: You move mouse → BORE detects burst → gives priority to your applications
Priority boost:
- What it does: Gives interactive tasks more CPU time
- How it works: Temporarily increases priority of tasks you're using
- Result: Your active applications get more CPU time than background tasks
- Example: Game you're playing gets more CPU time than background download
Low latency:
- What it does: Reduces delay between input and response
- How it works: Prioritizes tasks that need immediate response
- Result: System responds faster to your actions
- Example: Clicking a button → application responds almost instantly
Gaming:
- Lower input lag: Delay between your action and game response is reduced
- What it means: Mouse movement, keyboard presses feel more immediate
- Real-world impact: Games feel more responsive, easier to aim, better control
- Example: Moving mouse in FPS game → crosshair moves almost instantly
- More consistent frame times: Frame rendering times are more stable
- What it means: Each frame takes similar time to render
- Real-world impact: Smoother gameplay, less stuttering
- Example: Game runs at 60 FPS consistently instead of jumping between 50-70 FPS
- Better performance in CPU-intensive games: Games that need lots of CPU run better
- What it means: Games that heavily use CPU get better performance
- Real-world impact: Complex games run smoother, less lag
- Example: Strategy games, simulation games, games with many NPCs run better
Desktop use:
- Instant response to mouse/keyboard: Input devices respond immediately
- What it means: Mouse cursor and keyboard input feel instant
- Real-world impact: System feels snappy and responsive
- Example: Moving mouse → cursor moves instantly, no delay
- Smoother window animations: Window transitions are fluid
- What it means: Opening, closing, resizing windows is smooth
- Real-world impact: Desktop feels polished and professional
- Example: Opening application → window animates smoothly, no stuttering
- No stuttering when background tasks run: System stays smooth even when busy
- What it means: Background tasks don't cause visual stuttering
- Real-world impact: Can run updates, downloads, etc. without affecting desktop
- Example: Downloading large file → desktop still smooth, no lag
Multitasking:
- Active applications stay responsive: Programs you're using don't lag
- What it means: Applications you're actively using get priority
- Real-world impact: Can work with multiple programs without slowdown
- Example: Browser, text editor, music player all run smoothly together
- Background tasks don't interrupt your work: Background processes don't interfere
- What it means: System updates, downloads, etc. don't slow down active work
- Real-world impact: Can run background tasks without affecting productivity
- Example: System updating packages → your work continues smoothly
- Better balance between foreground and background: System balances priorities well
- What it means: Active tasks get priority, but background tasks still progress
- Real-world impact: Best of both worlds - responsive system and background progress
- Example: Your work is responsive, but downloads still complete
Standard Linux scheduler (CFS - Completely Fair Scheduler, the default before kernel 6.6):
- Fair distribution of CPU time: All tasks get equal CPU time
- What it means: Every program gets the same amount of CPU time
- Problem: Doesn't prioritize what you're actively using
- Result: Background tasks can slow down active work
- All tasks treated equally: No priority for interactive tasks
- What it means: Your active application gets same priority as background download
- Problem: System doesn't know what you're actively using
- Result: Can cause lag when background tasks are running
- Can cause latency spikes: Sometimes has delays
- What it means: System can occasionally feel unresponsive
- Problem: Fair scheduling can cause delays for interactive tasks
- Result: Mouse/keyboard input can feel laggy sometimes
BORE scheduler:
- Prioritizes interactive tasks: Gives priority to things you're using
- What it means: Active applications get more CPU time
- Benefit: System knows what you're using and prioritizes it
- Result: Active work stays responsive
- Reduces latency for user actions: Input responds faster
- What it means: Mouse/keyboard input gets immediate response
- Benefit: System feels more responsive
- Result: Lower input lag, faster response times
- Better for desktop and gaming use: Optimized for interactive use
- What it means: Designed for how people actually use computers
- Benefit: Better experience for desktop users and gamers
- Result: Smoother, more responsive system
Real-world difference:
Scenario 1: Playing a game while downloading files
- With CFS (standard scheduler):
- Game stutters when download starts
- Input lag increases
- Frame rate drops
- Why: Download gets equal CPU time, interferes with game
- With BORE:
- Game continues running smoothly
- Input lag stays low
- Frame rate remains stable
- Why: Game gets priority, download runs in background
Scenario 2: Compiling code while browsing the web
- With CFS (standard scheduler):
- Browser becomes laggy during compilation
- Scrolling stutters
- Page loading slows down
- Why: Compilation gets equal CPU time, slows down browser
- With BORE:
- Browser stays responsive
- Scrolling is smooth
- Page loading remains fast
- Why: Browser gets priority when you're using it
Scenario 3: System update while working
- With CFS (standard scheduler):
- Active applications lag during update
- Mouse/keyboard feel unresponsive
- System feels sluggish
- Why: Update process competes equally with active work
- With BORE:
- Active applications stay responsive
- Mouse/keyboard feel instant
- System feels snappy
- Why: Active work gets priority, update runs in background
Specific improvements:
- Mouse movement: Feels instant with BORE (no delay, immediate response)
- Game input: Lower latency, smoother gameplay (better control, less lag)
- Application switching: Faster, more responsive (Alt+Tab feels instant)
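If you want to confirm that BORE is what your system is actually running, a rough check looks like this (the sysctl name comes from the BORE patchset and may differ between kernel versions, so treat it as a sketch):
# A CachyOS kernel name usually contains "cachyos"
uname -r
# BORE-patched kernels typically expose this sysctl; a value of 1 means BORE is enabled
sysctl kernel.sched_bore 2>/dev/null || echo "kernel.sched_bore not found - BORE may not be built into this kernel"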
CachyOS offers multiple scheduler options for different use cases:
What it is:
- Earliest Eligible Virtual Deadline First
- Modern scheduler from Linux kernel 6.6+
- Fair and efficient
Best for:
- General desktop use
- Servers
- Balanced workloads
Characteristics:
- Fair CPU time distribution
- Good for multitasking
- Modern design
What it is:
- Extensible Scheduler Framework
- Allows custom scheduler implementations
- Experimental but powerful
Best for:
- Advanced users
- Custom scheduler development
- Research and experimentation
Characteristics:
- Highly customizable
- Can implement custom scheduling policies
- Requires kernel support
What it is:
- Another scheduler option
- Different scheduling algorithm
- Alternative to BORE
Best for:
- Users who want to try different schedulers
- Specific workloads that benefit from ECHO
What it is:
- Real-Time scheduler
- For time-critical applications
- Guarantees response times
Best for:
- Audio production
- Real-time applications
- Time-critical tasks
Characteristics:
- Predictable timing
- Low latency guarantees
- May affect other applications
Recommendations:
- Most users: BORE (default) - best for desktop and gaming
- Servers: EEVDF - fair and efficient
- Audio production: RT - time-critical
- Experimentation: sched-ext - customizable
- Not sure: Stick with BORE (default)
How to change scheduler:
- Select during installation
- Or change kernel package later:
# Install a different kernel variant (example: EEVDF)
sudo pacman -S linux-cachyos-eevdf
Link Time Optimization (LTO) is a compiler optimization technique that optimizes code across the entire program, not just individual files.
What is a compiler?
- Compiler: Software that converts source code (human-readable) into machine code (computer-readable)
- Optimization: Making code run faster and use resources more efficiently
- Traditional compilation: Each file is compiled and optimized separately
- LTO compilation: Entire program is analyzed and optimized together
How it works:
- Compiler analyzes entire program
- Looks at all source files together (not separately)
- Understands how different files interact
- Sees relationships between code in different files
- Benefit: Can make better optimization decisions
- Optimizes across file boundaries
- Can optimize code that spans multiple files
- Removes redundant code between files
- Better function inlining across files
- Benefit: More efficient code overall
- Removes unused code
- Identifies functions and code that's never called
- Removes dead code (code that can never execute)
- Benefit: Smaller programs, faster execution
- Inlines functions more aggressively
- Inlining: Replaces function calls with the actual function code
- Why it helps: Eliminates function call overhead
- LTO advantage: Can inline functions even when they're in different files
- Benefit: Faster execution (no function call overhead)
- Better register allocation
- Registers: Fast storage inside the CPU (much faster than RAM)
- Allocation: Deciding which variables go in registers
- LTO advantage: Can make better decisions across entire program
- Benefit: More variables in fast registers = faster execution
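For readers who compile their own software, this is roughly what enabling LTO looks like with GCC (main.c and helpers.c are placeholder file names used only for illustration):
# Compile each file with LTO information embedded
gcc -O2 -flto -c main.c -o main.o
gcc -O2 -flto -c helpers.c -o helpers.o
# The link step is where the whole-program optimization actually happens
gcc -O2 -flto main.o helpers.o -o program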
Think of it like:
- Without LTO: Optimizing each room of a house separately
- Each room optimized independently
- Doesn't consider how rooms connect
- May miss optimization opportunities
- Example: Two rooms both have heating - could share one system
- With LTO: Optimizing the entire house as a whole
- Considers entire house layout
- Optimizes connections between rooms
- Better overall optimization
- Example: One heating system for entire house (more efficient)
Performance improvements:
- 5-15% faster execution: Programs run noticeably faster
- Real-world impact: Applications feel more responsive
- Example: Web browser loads pages 10% faster
- Smaller binary sizes: Compiled programs are smaller
- Why: Unused code is removed
- Benefit: Less disk space, faster loading from disk
- Better code optimization: More efficient code generation
- Why: Compiler sees entire program
- Benefit: Better optimization decisions
- More efficient memory usage: Better memory access patterns
- Why: Optimized code layout
- Benefit: Better CPU cache usage
Real-world examples:
- Application startup: 10-20% faster launch times
- Game loading: Levels and assets load quicker
- Compilation: Development tools compile code faster
- System responsiveness: Overall system feels snappier
Trade-offs:
- Longer compilation time: Takes more time to compile packages
- Why: Compiler does more analysis (looks at entire program)
- Impact: Only affects package builders, not end users
- For you: You don't notice this (packages are pre-compiled)
- More memory during compilation: Needs more RAM when compiling
- Why: Analyzes entire program at once (needs more memory)
- Impact: Only affects package builders, not end users
- For you: No impact (you're not compiling packages)
- Slightly larger package repository: LTO packages may be slightly larger
- Why: Contains optimization metadata
- Impact: Minimal (usually offset by smaller final binaries)
- For you: Negligible impact on disk space
Is LTO worth it?
- For end users: Absolutely! You get faster programs with no downsides
- For package maintainers: Trade-off between build time and performance
- CachyOS choice: Uses LTO for better performance (worth the trade-off)
In CachyOS:
- Core system packages
- Frequently used applications
- Performance-critical software
Examples:
- Kernel
- Desktop environments
- System libraries
- Development tools
Profile-Guided Optimization (PGO) optimizes packages based on how they're actually used in real-world scenarios.
What is profiling?
- Profiling: Recording how a program runs in real use
- Data collected: Which functions are called, how often, which code paths are taken
- Purpose: Understand real-world usage patterns (not theoretical)
- Result: Optimize based on actual usage, not compiler guesses
How it works:
- Package is compiled with profiling enabled
- First compilation includes special profiling code
- Profiling code records execution information as program runs
- Creates "instrumented" binary (binary with tracking code built in)
- Think of it: Like adding sensors to a car to see how it's driven
- Package is used in typical scenarios
- Package is run in real-world situations
- People use it normally (browse web, edit documents, play games, etc.)
- Profiling code records what happens during use
- Think of it: Driving the car normally while sensors record data
- Profiling data is collected
- System gathers information about:
- Which functions are called most: Most-used functions identified
- Which code paths are taken: Common execution paths found
- How often operations occur: Frequency of different operations
- Execution time: How long different parts take
- Think of it: Analyzing the sensor data to see driving patterns
- Package is recompiled using profiling data
- Compiler reads the profiling data
- Understands how the program is actually used
- Optimizes based on real usage patterns (not guesses)
- Creates final optimized binary
- Think of it: Redesigning the car based on how it's actually driven
- Result: Optimized for actual usage patterns
- Code is optimized for how it's really used
- Frequently used code is optimized more (gets more attention)
- Rarely used code is optimized less (saves compilation time)
- Better overall performance for real-world use
- Think of it: Car is now optimized for actual driving conditions
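As a rough sketch of what this workflow looks like with GCC (app.c and the sample inputs are placeholders, not part of any CachyOS package):
# Step 1: build an instrumented binary that records how it is used
gcc -O2 -fprofile-generate app.c -o app
# Step 2: run it through typical workloads; this writes .gcda profile files
./app typical-input-1
./app typical-input-2
# Step 3: rebuild using the collected profile so the hot paths get the most optimization
gcc -O2 -fprofile-use app.c -o app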
Think of it like:
- Without PGO: Optimizing based on guesses
- Compiler guesses what's important
- May optimize wrong things (things rarely used)
- May miss optimization opportunities (things frequently used)
- Example: Optimizing a rarely-used feature instead of common one
- With PGO: Optimizing based on real usage data
- Compiler knows what's actually used
- Optimizes the right things (frequently used code)
- Better optimization decisions
- Example: Optimizing the feature people use most
Performance improvements:
- 10-30% faster for optimized packages: Significant speedup
- Real-world impact: Noticeably faster applications
- Example: Web browser renders pages 20% faster
- Better branch prediction: CPU predicts code paths better
- Branch prediction: CPU guessing which code path will be taken (if/else, loops)
- Better prediction: Less CPU stalls, faster execution
- Why: Code layout optimized based on actual paths taken
- More efficient code paths: Frequently used paths optimized more
- Code paths: Different ways code can execute (different branches)
- Optimization: Common paths get more optimization
- Benefit: Most-used code runs fastest
- Optimized for common operations: Common tasks run faster
- Common operations: Things people do most often
- Optimization: These get special attention
- Benefit: Everyday tasks are faster
Real-world examples:
- Web browsers: Faster page loading, smoother scrolling, quicker JavaScript execution
- Compilers: Faster code compilation (compilers compile themselves with PGO)
- Desktop environments: Smoother animations, faster window operations
- System libraries: Better performance for all applications using them
- Games: Faster loading, smoother gameplay
What gets optimized:
- Frequently used functions: Functions called often are optimized more
- Example: In a web browser, page rendering function gets optimized
- Hot code paths: Code paths taken frequently are optimized
- Example: Common user interactions get optimized
- Common operations: Everyday operations run faster
- Example: Opening files, saving documents, scrolling
- Real-world usage patterns: Optimized for how software is actually used
- Example: Optimized for typical user behavior, not edge cases
In CachyOS:
- Critical system components
- Frequently used applications
- Performance-sensitive software
Examples:
- Web browsers
- Compilers
- System libraries
- Desktop environments
BOLT stands for Binary Optimization and Layout Tool.
What it does:
- Optimizes compiled binaries at the binary level
- Rearranges code for better CPU cache usage
- Improves branch prediction
- Optimizes hot code paths
How it works:
- Analyzes binary execution
- Identifies hot (frequently used) code
- Rearranges code layout
- Optimizes for CPU cache
- Improves performance
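A rough sketch of how BOLT is typically applied to a binary (./app and the workload are placeholders, and exact flags vary between llvm-bolt versions):
# Step 1: record a runtime profile with branch sampling (needs perf and CPU LBR support)
perf record -e cycles:u -j any,u -o perf.data -- ./app typical-workload
# Step 2: convert the perf profile into BOLT's data format
perf2bolt -p perf.data -o app.fdata ./app
# Step 3: rewrite the binary with an optimized code layout
llvm-bolt ./app -o app.bolt -data=app.fdata -reorder-blocks=ext-tsp -reorder-functions=hfsort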
Performance improvements:
- 5-20% faster for optimized binaries
- Better CPU cache utilization
- Improved branch prediction
- More efficient code layout
What gets optimized:
- Frequently executed code
- Hot functions
- Critical code paths
In CachyOS:
- Selected high-impact packages
- Performance-critical applications
- Frequently used software
Examples:
- Web browsers
- Development tools
- System utilities
linux-cachyos is CachyOS's custom-compiled Linux kernel with performance optimizations.
What's different:
- BORE scheduler (or other scheduler options)
- Optimized compilation flags
- LTO compilation
- Performance patches
- Modern CPU optimizations
Available kernels:
- linux-cachyos - Default with BORE scheduler
- linux-cachyos-eevdf - EEVDF scheduler
- linux-cachyos-sched-ext - sched-ext scheduler
- linux-cachyos-rt - Real-time kernel
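To see which kernel variants are currently available in the repositories (exact package names may change over time):
# Search the repositories for CachyOS kernel variants
pacman -Ss '^linux-cachyos'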
How to check your kernel:
# Check kernel version
uname -r
# Should show something like:
# 6.x.x-cachyos
Performance:
- Better scheduler (BORE)
- Optimized compilation
- Performance patches
- Modern optimizations
Features:
- Multiple scheduler options
- Better hardware support
- Performance improvements
Make sure you're using the highest optimization level your CPU supports:
# Check your CPU
lscpu
# Verify optimization level in package names
pacman -Q | grep cachyos
For most users:
- Use BORE scheduler (default)
- Best for desktop and gaming
For specific use cases:
- Servers: EEVDF
- Audio: RT
- Experimentation: sched-ext
Regular updates include performance improvements:
# Update system regularly
sudo pacman -Syu
SSD provides:
- Faster boot times
- Faster application launches
- Better overall responsiveness
If using HDD:
- Consider upgrading to SSD
- Or use SSD for system, HDD for data
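Not sure what kind of drive your system is installed on? lsblk can tell you:
# ROTA = 1 means a rotational disk (HDD); ROTA = 0 means an SSD or NVMe drive
lsblk -d -o NAME,ROTA,MODEL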
Lightweight DEs are faster:
- XFCE, LXQt: Lightweight
- KDE, GNOME: More features, more resources
Choose based on:
- Your hardware capabilities
- Your preferences
- Performance requirements
Reduce background processes:
# Check running services
systemctl list-units --type=service --state=running
# Disable services you don't need
sudo systemctl disable service-name
Install proper drivers:
# Use chwd for hardware detection
sudo chwd -h
# Or install manually
# NVIDIA:
sudo pacman -S nvidia
# AMD: Usually works out of the box
Keep an eye on resource usage:
# Check CPU and memory
htop
# Or
top
Identify resource hogs:
- Close unnecessary applications
- Optimize startup programs
- Manage browser tabs
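To quickly see which processes are using the most resources:
# Header line plus the ten processes using the most CPU
ps aux --sort=-%cpu | head -n 11
# Header line plus the ten processes using the most memory
ps aux --sort=-%mem | head -n 11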
CPU performance:
# Install benchmarking tools
sudo pacman -S sysbench
# Run CPU benchmark
sysbench cpu --threads=4 run
Disk performance:
# Test disk speed
sudo pacman -S hdparm
# Test read speed
sudo hdparm -tT /dev/sda
Memory performance:
# Test memory
sysbench memory --threads=4 run
Application startup time:
- Time how long applications take to launch
- Compare before/after optimizations
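A simple way to measure startup time (replace yourapp with the program you want to test; hyperfine is an optional tool from the repositories that averages multiple runs):
# One-off measurement with the shell's time builtin
time yourapp --version
# Repeatable, averaged measurements
sudo pacman -S hyperfine
hyperfine 'yourapp --version'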
Gaming performance:
- Use in-game FPS counters
- Monitor frame times
- Check input lag
System responsiveness:
- Test mouse/keyboard latency
- Check window animation smoothness
- Monitor system under load
Before/after comparisons:
- Benchmark before optimizations
- Apply optimizations
- Benchmark again
- Compare results
Note: Performance improvements vary by:
- Hardware
- Workload
- Applications used
- System configuration
- CachyOS Getting Started Guide - System overview
- CachyOS Installation Guide - Installation instructions
- CachyOS Tools Guide - System tools
- CachyOS Wiki: https://wiki.cachyos.org/
- BORE Scheduler: Information about BORE scheduler
- Arch Linux Performance: https://wiki.archlinux.org/title/Improving_performance
This guide covered:
- CPU instruction set optimizations - x86-64-v3, v4, Zen4
- BORE scheduler - Burst-Oriented Response Enhancer
- Other schedulers - EEVDF, sched-ext, RT, ECHO
- LTO - Link Time Optimization
- PGO - Profile-Guided Optimization
- BOLT - Binary Optimization and Layout Tool
- Custom kernel - linux-cachyos
- Performance tips - Getting the best performance
- Measuring performance - Benchmarking and testing
Key Takeaways:
- CachyOS uses multiple optimization techniques
- BORE scheduler improves responsiveness
- CPU instruction set optimizations provide significant gains
- LTO, PGO, and BOLT optimize packages further
- Custom kernel adds performance improvements
- Choose optimizations based on your hardware
This guide is based on the CachyOS Wiki and expanded with detailed explanations for beginners. For the most up-to-date performance information, always refer to the official CachyOS documentation.