lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 29 Mar 2021 00:55:03 +0530
From:   Ritesh Harjani <ritesh.list@...il.com>
To:     Harshad Shirwadkar <harshadshirwadkar@...il.com>
Cc:     linux-ext4@...r.kernel.org
Subject: Re: Block allocator improvements

On 21/03/24 04:19PM, Harshad Shirwadkar wrote:
> This patch series improves cr 0 and cr 1 passes of the allocator
> signficantly. Currently, at cr 0 and 1, we perform linear lookups to
> find the matching groups. That's very inefficient for large file
> systems where there are millions of block groups. At cr 0, we only
> care about the groups that have the largest free order >= the
> request's order and at cr 1 we only care about groups where average
> fragment size > the request size. so, this patchset introduces new
> data structures that allow us to perform cr 0 lookup in constant time
> and cr 1 lookup in log (number of groups) time instead of linear.
>
> For cr 0, we add a list for each order and all the groups are enqueued
> to the appropriate list based on the largest free order in its buddy
> bitmap. This allows us to lookup a match at cr 0 in constant time.
>
> For cr 1, we add a new rb tree of groups sorted by largest fragment
> size. This allows us to lookup a match for cr 1 in log (num groups)
> time.
>
> These optimizations can be enabled by passing "mb_optimize_scan" mount
> option.
>
> These changes may result in allocations to be spread across the block
> device. While that would not matter some block devices (such as flash)
> it may be a cause of concern for other block devices that benefit from
> storing related content togetther such as disk. However, it can be
> argued that in high fragmentation scenrio, especially for large disks,
> it's still worth optimizing the scanning since in such cases, we get
> cpu bound on group scanning instead of getting IO bound. Perhaps, in
> future, we could dynamically turn this new optimization on based on
> fragmentation levels for such devices.
>
> Verified that there are no regressions in smoke tests (-g quick -c 4k).
>
> Also, to demonstrate the effectiveness for the patch series, following
> experiment was performed:
>
> Created a highly fragmented disk of size 65TB. The disk had no
> contiguous 2M regions. Following command was run consecutively for 3
> times:
>
> time dd if=/dev/urandom of=file bs=2M count=10
>
> Here are the results with and without cr 0/1 optimizations:
>
> |---------+------------------------------+---------------------------|
> |         | Without CR 0/1 Optimizations | With CR 0/1 Optimizations |
> |---------+------------------------------+---------------------------|
> | 1st run | 5m1.871s                     | 2m47.642s                 |
> | 2nd run | 2m28.390s                    | 0m0.611s                  |
> | 3rd run | 2m26.530s                    | 0m1.255s                  |
> |---------+------------------------------+---------------------------|
>
> The patch [3/6] "ext4: add mballoc stats proc file" is a modified
> version of the patch originally written by Artem Blagodarenko
> (artem.blagodarenko@...il.com). With that patch, I ran following
> command with and without optimizations.
>
> dd if=/dev/zero of=/mnt/file bs=2M count=2 conv=fsync
>
> Without optimizations:
>
> useless_c0_loops: 3
> useless_c1_loops: 39
> useless_c2_loops: 0
> useless_c3_loops: 0
>
> With optimizations:
>
> useless_c0_loops: 0
> useless_c1_loops: 0
> useless_c2_loops: 0
> useless_c3_loops: 0
>
> This shows that CR0 and CR1 optimizations get rid of useless CR0 and
> CR1 loops altogether thereby significantly reducing the number of
> groups that get considered.
>
> Changes from V4:
> ----------------
> - Only minor fixes, no significant changes
>
> Harshad Shirwadkar (6):
>   ext4: drop s_mb_bal_lock and convert protected fields to atomic
>   ext4: add ability to return parsed options from parse_options
>   ext4: add mballoc stats proc file
>   ext4: add MB_NUM_ORDERS macro
>   ext4: improve cr 0 / cr 1 group scanning
>   ext4: add proc files to monitor new structures
>
>  fs/ext4/ext4.h    |  30 ++-
>  fs/ext4/mballoc.c | 572 +++++++++++++++++++++++++++++++++++++++++++---
>  fs/ext4/mballoc.h |  22 +-
>  fs/ext4/super.c   |  79 +++++--
>  fs/ext4/sysfs.c   |   6 +
>  5 files changed, 652 insertions(+), 57 deletions(-)
>

Completed my review of this patch series.
Apart from the issue I mentioned in patch-5 of this v5 series. The rest
of the patches looks fine to me.

Please feel free to add:
Reviewed-by: Ritesh Harjani <ritesh.list@...il.com>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ