---
layout: post
title: "Note: Improving restore speed for backup systems that use inline chunk-based deduplication."
date: 2021-04-28 05:00:00 -0700
categories: PaperReading
tags: Deduplication
---

## Reference

Mark Lillibridge, Kave Eshghi, Deepavali Bhagwat. Improving restore speed for backup systems that use inline chunk-based deduplication. In Proc. of FAST 2013.

## Problem Statement

- Restores are slow because of chunk fragmentation:
  - Chunks are stored in the order in which unique chunks first appear, so a later backup's duplicate chunks point back into containers written during earlier backups.
  - The newer the snapshot being restored, the more its chunks are scattered across old containers, and the more random container reads the restore incurs, lowering performance (see the toy model after this list).
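
The following toy model (the chunk names and the 4-chunk container size are illustrative, not from the paper) shows where the scatter comes from: unique chunks are packed into containers in arrival order, so each successive backup's restore touches more containers.

```python
def simulate_fragmentation(backups, chunks_per_container=4):
    """Toy model of inline dedup: a chunk is stored only at its first
    appearance, packed into fixed-size containers in arrival order."""
    location = {}                    # chunk fingerprint -> container id
    stored = 0
    for backup in backups:
        for chunk in backup:
            if chunk not in location:            # store unique chunks once
                location[chunk] = stored // chunks_per_container
                stored += 1
    for i, backup in enumerate(backups):
        containers = {location[c] for c in backup}
        print(f"backup {i}: restore touches {len(containers)} containers")

# Later backups reuse old chunks plus a few new ones, so their data ends
# up scattered across containers written at different times.
simulate_fragmentation([
    list("ABCDEFGH"),   # backup 0: chunks packed contiguously -> 2 containers
    list("ABXDEYGH"),   # backup 1: two new chunks              -> 3 containers
    list("AZXSWYGH"),   # backup 2: three more new chunks       -> 4 containers
])
```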

## Literature Summary

This paper introduces a measurement of restore performance, container reads per MB of data restored, since container reads are the main cost during a restore. It then addresses the restore problem in two ways. First, it replaces the LRU container cache with a forward assembly area (FAA): the restore proceeds over one slice of the recipe at a time, and each container needed by a slice is read only once, with all of the slice's chunks from that container copied into the assembly area in a single pass. Second, it adds a container rewrite method (capping) that deduplicates each incoming segment against only the top T best old containers; this misses some duplicate chunks, but it bounds the container reads per MB restored to a value determined by T.
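A minimal sketch of the FAA restore loop, assuming a recipe given as (container id, chunk id) pairs and a `read_container(cid)` callback returning `{chunk_id: bytes}`; these interfaces are illustrative stand-ins, not the paper's actual data structures:

```python
from collections import defaultdict

def restore_with_faa(recipe, read_container, write_out, faa_slots):
    """Restore a backup one slice at a time using a forward assembly area."""
    container_reads = 0
    for start in range(0, len(recipe), faa_slots):
        slice_ = recipe[start:start + faa_slots]
        faa = [None] * len(slice_)            # the in-memory assembly area

        # Group this slice's slots by the container that holds them,
        # so each container is read at most once per slice.
        slots_by_container = defaultdict(list)
        for slot, (cid, chunk_id) in enumerate(slice_):
            slots_by_container[cid].append((slot, chunk_id))

        for cid, slots in slots_by_container.items():
            chunks = read_container(cid)      # one sequential container read
            container_reads += 1
            for slot, chunk_id in slots:      # fill every slot this container
                faa[slot] = chunks[chunk_id]  # serves in a single pass
            # nothing is cached across containers

        write_out(b"".join(faa))
    return container_reads
```

Since nothing is cached across containers, all available memory can go to the assembly area itself, which fits the result below that FAA pulls ahead of LRU exactly when memory is scarce.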

## Results

- The less memory is available, the larger FAA's advantage over an LRU container cache; with unlimited memory, the two perform identically.
- FAA is less sensitive to container size than LRU.
- The rewrite approach bounds container reads per MB restored to (T+5)/20 when the container size is 4 MB and the segment size is 20 MB (see the sanity check after this list).
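
A back-of-envelope reading of that bound, assuming each segment's non-duplicate chunks fill fresh containers: capping lets a 20 MB segment reference at most T old containers, and its remaining chunks occupy at most 20 MB / 4 MB = 5 new containers, so

$$
\frac{\text{container reads}}{\text{MB restored}} \le \frac{T + 20/4}{20\,\text{MB}} = \frac{T+5}{20}.
$$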