Ext4: Uninitialized Block Groups
In pass1 of e2fsck, every inode table in the filesystem is scanned and checked,
regardless of whether it is in use.  This is the most time consuming part
of the filesystem check.  The uninitialized block group feature can greatly
reduce e2fsck time by eliminating checking of uninitialized inodes.

With this feature, there is a high water mark of used inodes for each block
group.  Block and inode bitmaps can be uninitialized on disk via a flag in the
group descriptor to avoid reading or scanning them at e2fsck time.  A checksum
of each group descriptor is used to ensure that corruption in the group
descriptor's bit flags does not cause incorrect operation.

The feature is enabled through a mkfs option:

	mke2fs /dev/ -O uninit_groups

A patch adding support for uninitialized block groups to e2fsprogs tools has
been posted to the linux-ext4 mailing list.

The patches have been stress tested with fsstress and fsx.  In performance
tests of e2fsck time, we have seen that e2fsck time on ext3 grows
linearly with the total number of inodes in the filesystem.  In ext4 with the
uninitialized block groups feature, the e2fsck time does not grow with the
total inode count; it depends solely on the number of used inodes.
Since typical ext4 filesystems only use 1-10% of their inodes, this feature can
greatly reduce e2fsck time for users, with a performance improvement of 2-20
times, depending on how full the filesystem is.

The attached graph shows the major improvements in e2fsck times in filesystems
with a large total inode count, but few inodes in use.

In each group descriptor if we have

EXT4_BG_INODE_UNINIT set in bg_flags:
        Inode table is not initialized/used in this group. So we can skip
        the consistency check during fsck.
EXT4_BG_BLOCK_UNINIT set in bg_flags:
        No block in the group is used. So we can skip the block bitmap
        verification for this group.

We also add two new fields to the group descriptor as part of the
uninitialized group patch.

        __le16  bg_itable_unused;       /* Unused inodes count */
        __le16  bg_checksum;            /* crc16(sb_uuid+group+desc) */

bg_itable_unused:

If EXT4_BG_INODE_UNINIT is not set in bg_flags,
bg_itable_unused gives the offset within the inode table
up to which inodes are in use. fsck can use this
to skip the inodes that are marked unused.
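
As a rough illustration of how a checker could use these two fields, the
sketch below computes how many inode-table slots still need scanning in one
group. It assumes, following the field comment ("Unused inodes count"), that
bg_itable_unused counts the unused inodes at the end of the group's inode
table. The struct, the sketch_ names, and the flag values are illustrative
assumptions and are not taken from this patch's headers.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values are assumptions for this sketch; the header that defines
 * the real EXT4_BG_* flags is not part of this excerpt. */
#define SKETCH_BG_INODE_UNINIT 0x0001u
#define SKETCH_BG_BLOCK_UNINIT 0x0002u

/* Hypothetical CPU-endian view of the two fields discussed above. */
struct sketch_group_desc {
	uint16_t bg_flags;
	uint16_t bg_itable_unused;	/* assumed: unused inodes at table end */
};

/* Number of inode-table slots a checker would still have to scan. */
uint32_t sketch_inodes_to_scan(const struct sketch_group_desc *gd,
			       uint32_t inodes_per_group)
{
	if (gd->bg_flags & SKETCH_BG_INODE_UNINIT)
		return 0;	/* the whole inode table is unused */

	assert(gd->bg_itable_unused <= inodes_per_group);
	return inodes_per_group - gd->bg_itable_unused;
}

/* Nonzero if the block bitmap of this group need not be verified. */
int sketch_can_skip_block_bitmap(const struct sketch_group_desc *gd)
{
	return (gd->bg_flags & SKETCH_BG_BLOCK_UNINIT) != 0;
}
```

Under these assumptions, a group with 8192 inodes and bg_itable_unused set
to 8000 would leave only 192 slots to scan.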

bg_checksum:
Now that we depend on bg_flags and bg_itable_unused to determine
the block and inode usage, we need to make sure the group descriptor
is not corrupt. We add a checksum to the group descriptor to
detect corruption. If the descriptor is found to be corrupt, we
mark all the blocks and inodes in the group as used.
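
The checksum layout named in the field comment, crc16(sb_uuid+group+desc),
can be sketched in user space as follows. This is a minimal sketch, not the
kernel implementation: the bitwise CRC-16 (reflected polynomial 0xA001), the
0xFFFF seed, the little-endian encoding of the group number, and all sketch_
names are assumptions made for illustration. The caller is assumed to pass
only the descriptor bytes that precede the checksum field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16 using the reflected polynomial 0xA001. */
uint16_t sketch_crc16(uint16_t crc, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc & 1) ? (uint16_t)((crc >> 1) ^ 0xA001)
					: (uint16_t)(crc >> 1);
	}
	return crc;
}

/* Checksum over the filesystem UUID, the group number, and the
 * descriptor bytes, in that order. Seed and byte order are assumptions. */
uint16_t sketch_group_desc_csum(const uint8_t uuid[16], uint32_t group,
				const void *desc, size_t desc_len)
{
	uint8_t le_group[4] = {
		(uint8_t)(group & 0xff),
		(uint8_t)((group >> 8) & 0xff),
		(uint8_t)((group >> 16) & 0xff),
		(uint8_t)((group >> 24) & 0xff),
	};
	uint16_t crc;

	assert(uuid != NULL && desc != NULL);
	crc = sketch_crc16(0xFFFF, uuid, 16);
	crc = sketch_crc16(crc, le_group, sizeof(le_group));
	return sketch_crc16(crc, desc, desc_len);
}

/* Nonzero if the stored checksum matches the recomputed one. */
int sketch_group_desc_csum_ok(const uint8_t uuid[16], uint32_t group,
			      const void *desc, size_t desc_len,
			      uint16_t stored)
{
	return sketch_group_desc_csum(uuid, group, desc, desc_len) == stored;
}
```

Because the group number is mixed into the checksum, a descriptor copied to
the wrong slot would fail verification even if its bytes are otherwise
intact. A caller that sees a mismatch would then treat the group as fully
used, as described above.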

Signed-off-by: Avantika Mathur <mathur@us.ibm.com>
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Andreas Dilger authored and tytso committed Oct 17, 2007
1 parent 4074fe3 commit 717d50e
Showing 7 changed files with 335 additions and 35 deletions.
1 change: 1 addition & 0 deletions fs/Kconfig
@@ -140,6 +140,7 @@ config EXT4DEV_FS
 	tristate "Ext4dev/ext4 extended fs support development (EXPERIMENTAL)"
 	depends on EXPERIMENTAL
 	select JBD2
+	select CRC16
 	help
 	  Ext4dev is a predecessor filesystem of the next generation
 	  extended fs ext4, based on ext3 filesystem code. It will be
112 changes: 109 additions & 3 deletions fs/ext4/balloc.c
@@ -20,6 +20,7 @@
 #include <linux/quotaops.h>
 #include <linux/buffer_head.h>

+#include "group.h"
 /*
  * balloc.c contains the blocks allocation and deallocation routines
  */
@@ -42,6 +43,94 @@ void ext4_get_group_no_and_offset(struct super_block *sb, ext4_fsblk_t blocknr,

 }

+/* Initializes an uninitialized block bitmap if given, and returns the
+ * number of blocks free in the group. */
+unsigned ext4_init_block_bitmap(struct super_block *sb, struct buffer_head *bh,
+				int block_group, struct ext4_group_desc *gdp)
+{
+	unsigned long start;
+	int bit, bit_max;
+	unsigned free_blocks, group_blocks;
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+
+	if (bh) {
+		J_ASSERT_BH(bh, buffer_locked(bh));
+
+		/* If checksum is bad mark all blocks used to prevent allocation
+		 * essentially implementing a per-group read-only flag. */
+		if (!ext4_group_desc_csum_verify(sbi, block_group, gdp)) {
+			ext4_error(sb, __FUNCTION__,
+				   "Checksum bad for group %u\n", block_group);
+			gdp->bg_free_blocks_count = 0;
+			gdp->bg_free_inodes_count = 0;
+			gdp->bg_itable_unused = 0;
+			memset(bh->b_data, 0xff, sb->s_blocksize);
+			return 0;
+		}
+		memset(bh->b_data, 0, sb->s_blocksize);
+	}
+
+	/* Check for superblock and gdt backups in this group */
+	bit_max = ext4_bg_has_super(sb, block_group);
+
+	if (!EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_META_BG) ||
+	    block_group < le32_to_cpu(sbi->s_es->s_first_meta_bg) *
+			  sbi->s_desc_per_block) {
+		if (bit_max) {
+			bit_max += ext4_bg_num_gdb(sb, block_group);
+			bit_max +=
+				le16_to_cpu(sbi->s_es->s_reserved_gdt_blocks);
+		}
+	} else { /* For META_BG_BLOCK_GROUPS */
+		int group_rel = (block_group -
+				 le32_to_cpu(sbi->s_es->s_first_meta_bg)) %
+				EXT4_DESC_PER_BLOCK(sb);
+		if (group_rel == 0 || group_rel == 1 ||
+		    (group_rel == EXT4_DESC_PER_BLOCK(sb) - 1))
+			bit_max += 1;
+	}
+
+	if (block_group == sbi->s_groups_count - 1) {
+		/*
+		 * Even though mke2fs always initialize first and last group
+		 * if some other tool enabled the EXT4_BG_BLOCK_UNINIT we need
+		 * to make sure we calculate the right free blocks
+		 */
+		group_blocks = ext4_blocks_count(sbi->s_es) -
+			le32_to_cpu(sbi->s_es->s_first_data_block) -
+			(EXT4_BLOCKS_PER_GROUP(sb) * (sbi->s_groups_count -1));
+	} else {
+		group_blocks = EXT4_BLOCKS_PER_GROUP(sb);
+	}
+
+	free_blocks = group_blocks - bit_max;
+
+	if (bh) {
+		for (bit = 0; bit < bit_max; bit++)
+			ext4_set_bit(bit, bh->b_data);
+
+		start = block_group * EXT4_BLOCKS_PER_GROUP(sb) +
+			le32_to_cpu(sbi->s_es->s_first_data_block);
+
+		/* Set bits for block and inode bitmaps, and inode table */
+		ext4_set_bit(ext4_block_bitmap(sb, gdp) - start, bh->b_data);
+		ext4_set_bit(ext4_inode_bitmap(sb, gdp) - start, bh->b_data);
+		for (bit = le32_to_cpu(gdp->bg_inode_table) - start,
+		     bit_max = bit + sbi->s_itb_per_group; bit < bit_max; bit++)
+			ext4_set_bit(bit, bh->b_data);
+
+		/*
+		 * Also if the number of blocks within the group is
+		 * less than the blocksize * 8 ( which is the size
+		 * of bitmap ), set rest of the block bitmap to 1
+		 */
+		mark_bitmap_end(group_blocks, sb->s_blocksize * 8, bh->b_data);
+	}
+
+	return free_blocks - sbi->s_itb_per_group - 2;
+}
+
+
 /*
  * The free blocks are managed by bitmaps. A file system contains several
  * blocks groups. Each group contains 1 bitmap block for blocks, 1 bitmap
@@ -119,19 +208,32 @@ block_in_use(ext4_fsblk_t block, struct super_block *sb, unsigned char *map)
  *
  * Return buffer_head on success or NULL in case of failure.
  */
-static struct buffer_head *
+struct buffer_head *
 read_block_bitmap(struct super_block *sb, unsigned int block_group)
 {
 	int i;
 	struct ext4_group_desc * desc;
 	struct buffer_head * bh = NULL;
 	ext4_fsblk_t bitmap_blk;

-	desc = ext4_get_group_desc (sb, block_group, NULL);
+	desc = ext4_get_group_desc(sb, block_group, NULL);
 	if (!desc)
 		return NULL;
 	bitmap_blk = ext4_block_bitmap(sb, desc);
-	bh = sb_bread(sb, bitmap_blk);
+	if (desc->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)) {
+		bh = sb_getblk(sb, bitmap_blk);
+		if (!buffer_uptodate(bh)) {
+			lock_buffer(bh);
+			if (!buffer_uptodate(bh)) {
+				ext4_init_block_bitmap(sb, bh, block_group,
+						       desc);
+				set_buffer_uptodate(bh);
+			}
+			unlock_buffer(bh);
+		}
+	} else {
+		bh = sb_bread(sb, bitmap_blk);
+	}
 	if (!bh)
 		ext4_error (sb, __FUNCTION__,
 			    "Cannot read block bitmap - "
@@ -627,6 +729,7 @@ void ext4_free_blocks_sb(handle_t *handle, struct super_block *sb,
 		desc->bg_free_blocks_count =
 			cpu_to_le16(le16_to_cpu(desc->bg_free_blocks_count) +
 				group_freed);
+		desc->bg_checksum = ext4_group_desc_csum(sbi, block_group, desc);
 		spin_unlock(sb_bgl_lock(sbi, block_group));
 		percpu_counter_add(&sbi->s_freeblocks_counter, count);

@@ -1685,8 +1788,11 @@ ext4_fsblk_t ext4_new_blocks(handle_t *handle, struct inode *inode,
 			ret_block, goal_hits, goal_attempts);

 	spin_lock(sb_bgl_lock(sbi, group_no));
+	if (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))
+		gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
 	gdp->bg_free_blocks_count =
 			cpu_to_le16(le16_to_cpu(gdp->bg_free_blocks_count)-num);
+	gdp->bg_checksum = ext4_group_desc_csum(sbi, group_no, gdp);
 	spin_unlock(sb_bgl_lock(sbi, group_no));
 	percpu_counter_sub(&sbi->s_freeblocks_counter, num);

27 changes: 27 additions & 0 deletions fs/ext4/group.h
@@ -0,0 +1,27 @@
+/*
+ *  linux/fs/ext4/group.h
+ *
+ * Copyright (C) 2007 Cluster File Systems, Inc
+ *
+ * Author: Andreas Dilger <adilger@clusterfs.com>
+ */
+
+#ifndef _LINUX_EXT4_GROUP_H
+#define _LINUX_EXT4_GROUP_H
+
+extern __le16 ext4_group_desc_csum(struct ext4_sb_info *sbi, __u32 group,
+				   struct ext4_group_desc *gdp);
+extern int ext4_group_desc_csum_verify(struct ext4_sb_info *sbi, __u32 group,
+				       struct ext4_group_desc *gdp);
+struct buffer_head *read_block_bitmap(struct super_block *sb,
+				      unsigned int block_group);
+extern unsigned ext4_init_block_bitmap(struct super_block *sb,
+				       struct buffer_head *bh, int group,
+				       struct ext4_group_desc *desc);
+#define ext4_free_blocks_after_init(sb, group, desc)			\
+		ext4_init_block_bitmap(sb, NULL, group, desc)
+extern unsigned ext4_init_inode_bitmap(struct super_block *sb,
+				       struct buffer_head *bh, int group,
+				       struct ext4_group_desc *desc);
+extern void mark_bitmap_end(int start_bit, int end_bit, char *bitmap);
+#endif /* _LINUX_EXT4_GROUP_H */
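
The free-block arithmetic that ext4_init_block_bitmap() performs for a group
outside the META_BG range can be restated as a small stand-alone function.
This is only a sketch of that calculation with hypothetical sketch_ names
and plain integer parameters in place of the superblock lookups; it does not
touch any bitmap and omits the META_BG branch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Free blocks in a freshly initialized group (non-META_BG case):
 * subtract the superblock and group descriptor backups, if this group
 * carries them, then the inode table and the two bitmap blocks.
 */
uint32_t sketch_free_blocks_after_init(uint32_t group_blocks, int has_super,
				       uint32_t gdt_blocks,
				       uint32_t reserved_gdt_blocks,
				       uint32_t itb_per_group)
{
	uint32_t used = 0;

	if (has_super)
		used = 1 + gdt_blocks + reserved_gdt_blocks;

	/* inode table blocks + block bitmap + inode bitmap */
	used += itb_per_group + 2;

	assert(used <= group_blocks);
	return group_blocks - used;
}
```

For instance, a 32768-block group holding a superblock backup, 1 descriptor
block, 256 reserved descriptor blocks, and a 512-block inode table would
report 32768 - (1 + 1 + 256) - 512 - 2 = 31996 free blocks.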