linux-ext4 - [Bug 217965] ext4(?) regression since 6.5.0 on sata hdd

Date: Fri, 17 Nov 2023 15:39:32 +0000
From: bugzilla-daemon@...nel.org
To: linux-ext4@...r.kernel.org
Subject: [Bug 217965] ext4(?) regression since 6.5.0 on sata hdd

https://bugzilla.kernel.org/show_bug.cgi?id=217965

--- Comment #36 from Ojaswin Mujoo (ojaswin.mujoo@....com) ---
Hey Eyal,

So the trace data has given me an idea of what's going on. Basically, in ext4 we
maintain a set of lists of FS block groups (BGs), where each list holds the BGs
of a given free-block order (a BG with 64 free blocks goes in the order 6 list,
one with 640 free blocks goes in the order 9 list, etc.). In our case, we are
trying to allocate a stripe's worth of blocks at a time, i.e. 640 blocks or
roughly 2.5 MB (assuming 4 KB blocks), and ext4 looks at the order 9 list to
find a BG that can satisfy the request.
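
To make the bucketing concrete, here is a minimal sketch of the mapping as
described above. It is a simplified model (floor of log2 of the free block
count), not the kernel's actual helper; the function name list_order and the
4 KiB block size are assumptions made for illustration.

```python
def list_order(free_blocks: int) -> int:
    """Simplified bucket index: floor(log2(free_blocks)).

    Hypothetical helper that only mirrors the examples in this comment.
    """
    if free_blocks < 1:
        raise ValueError("need at least one free block")
    return free_blocks.bit_length() - 1


BLOCK_SIZE = 4096     # bytes; assumed, not taken from the report
STRIPE_BLOCKS = 640   # stripe size in blocks, from this report

print(list_order(64))    # 6
print(list_order(640))   # 9
print(list_order(520))   # 9: same list, yet too small for a 640-block request
print(STRIPE_BLOCKS * BLOCK_SIZE / 2**20)  # 2.5 (MiB)
```

Note that every BG with 512 to 1023 free blocks lands in the same order 9
bucket, so being on that list does not by itself mean a 640-block request fits.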

Unfortunately there seem to be a lot of BGs in the order 9 list (> 1000), but
most of them don't have enough free blocks to satisfy the request, so we keep
looping and calling ext4_mb_good_group() on each of them to see if any of them
is good enough. Once we do find a good enough BG, due to striping we actually
try to look for blocks which are specially aligned to the stripe size, and when
we don't find any we just start looping over the list again from the beginning
(!!).
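
The cost of that restart can be shown with a toy model. This is only a sketch
under stated assumptions, not the mballoc code: count_good_group_calls, the
tuple layout and the "rejected" bookkeeping are all hypothetical, and the group
counts are made up to resemble the situation described above.

```python
def count_good_group_calls(groups, need, restart_on_miss):
    """Count good-group style checks until an aligned allocation succeeds.

    groups: list of (free_blocks, has_aligned_run) tuples in list order.
    Returns (number_of_checks, index_of_chosen_group or None).
    """
    calls = 0
    rejected = set()  # toy bookkeeping: groups whose aligned scan failed
    i = 0
    while i < len(groups):
        free, has_aligned = groups[i]
        calls += 1  # stands in for one ext4_mb_good_group() call
        if free >= need and i not in rejected:
            if has_aligned:
                return calls, i
            rejected.add(i)
            if restart_on_miss:
                i = 0  # start over from the head of the list
                continue
        i += 1
    return calls, None


# 1000 groups in the order 9 bucket that are too small for 640 blocks,
# 3 large enough but without a stripe-aligned run, 1 that works.
groups = [(520, False)] * 1000 + [(700, False)] * 3 + [(700, True)]

print(count_good_group_calls(groups, 640, restart_on_miss=False))  # (1004, 1003)
print(count_good_group_calls(groups, 640, restart_on_miss=True))   # (4010, 1003)
```

In this model every failed aligned scan costs another full pass over the
groups that were already known to be too small, which is the pattern the trace
seems to show.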

Although I have a good idea now, I'm not able to point my finger at the exact
change in 6.5 that might have caused this. We did change the allocator to some
extent and it might be related to this, but we need to dig a bit deeper to
confirm.

Would it be possible to share the perf record again? This time I'm adding a few
more probes and removing -g so we can fit more within the 5MB limit. I've also
included the commands for Linux 6.4 so we can compare what's changed:

Linux 6.5+:

Probe adding commands:

sudo perf probe -a "ext4_mb_find_good_group_avg_frag_lists order"
sudo perf probe -a "ext4_mb_find_good_group_avg_frag_lists:18 cr iter->bb_group"
sudo perf probe -a "ext4_mb_good_group:20 free fragments ac->ac_g_ex.fe_len ac->ac_2order"
sudo perf probe -a "ext4_mb_scan_aligned:26 i max"

Record command:

perf record -e probe:ext4_mb_find_good_group_avg_frag_lists_L18 \
  -e probe:ext4_mb_good_group_L20 \
  -e probe:ext4_mb_find_good_group_avg_frag_lists \
  -e probe:ext4_mb_scan_aligned_L26 \
  -e ext4:ext4_mballoc_alloc -p <pid> sleep 20

Linux 6.4.x:

Probe adding commands:

sudo perf probe -a "ext4_mb_choose_next_group_cr1:25 i iter->bb_group"
sudo perf probe -a "ext4_mb_good_group:20 free fragments ac->ac_g_ex.fe_len ac->ac_2order"
sudo perf probe -a "ext4_mb_scan_aligned:26 i max"

Record command:

sudo perf record -e probe:ext4_mb_choose_next_group_cr1_L25 \
  -e probe:ext4_mb_good_group_L20 \
  -e probe:ext4_mb_scan_aligned_L26 \
  -e ext4:ext4_mballoc_alloc -p <pid> sleep 20
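
Once both recordings exist, one quick way to compare them is to count how often
each probe fired per kernel. The sketch below assumes the recording has been
dumped to text with perf script and that each line contains the event name
followed by a colon. The sample lines are made up to show the general shape
only, and count_events is a hypothetical helper.

```python
import re
from collections import Counter

# Matches event names such as probe:ext4_mb_good_group_L20 or
# ext4:ext4_mballoc_alloc when they are followed by a colon.
EVENT_RE = re.compile(r"\b((?:probe|ext4):\w+):")


def count_events(lines):
    """Return a Counter of event name -> number of lines mentioning it."""
    counts = Counter()
    for line in lines:
        m = EVENT_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


# Made-up lines, for illustration only (not real trace data).
sample = [
    "proc 1234 [002] 100.000001: probe:ext4_mb_good_group_L20: free=520",
    "proc 1234 [002] 100.000002: probe:ext4_mb_good_group_L20: free=700",
    "proc 1234 [002] 100.000003: probe:ext4_mb_scan_aligned_L26: i=0 max=32768",
    "proc 1234 [002] 100.000004: ext4:ext4_mballoc_alloc: dev 9,127",
]

for name, n in count_events(sample).most_common():
    print(f"{n:8d}  {name}")
```

A large ratio of ext4_mb_good_group hits to ext4_mballoc_alloc events on 6.5
compared with 6.4 would support the looping theory, though the exact numbers
will depend on the workload.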

Thanks again for all your help on this!

Regards,
ojaswin
