Date:   Thu, 26 Mar 2020 15:07:23 -0600
From:   Andreas Dilger <adilger@...ger.ca>
To:     harshad shirwadkar <harshadshirwadkar@...il.com>
Cc:     linux-ext4 <linux-ext4@...r.kernel.org>
Subject: Re: [PATCH 2/2] ext4: shrink directories on dentry delete

On Mar 26, 2020, at 1:49 PM, harshad shirwadkar <harshadshirwadkar@...il.com> wrote:
> 
> On Wed, Mar 25, 2020 at 3:06 AM Andreas Dilger <adilger@...ger.ca> wrote:
>> 
>> On Mar 25, 2020, at 3:37 AM, Harshad Shirwadkar <harshadshirwadkar@...il.com> wrote:
>>> But note that most of the shrinking happens during the last 1-2% of
>>> deletions in the average case. Therefore, the next step here is to merge
>>> dx nodes when possible. That can be achieved by storing the fullness
>>> index in htree nodes. But that's an on-disk format change. We can instead
>>> build on tooling added by this patch to perform a reverse lookup on a dx
>>> node and then read adjacent nodes to check their fullness.
>> 
>> Thank you for updating these patches again.  I haven't had a chance to look
>> at them yet, but I hope to review the patches in the near future.
>> 
>> As for storing the fullness on disk changing the on-disk format...  That is
>> true, but the original htree implementation anticipated this and reserved
>> space in the htree index to store the fullness, so it would not break the
>> ability of older kernels to access directories with the fullness information.
>> 
> Yeah, you are right, good to know that we have bits reserved already
> and that wouldn't break older kernels if we use these in future.
>> I think if you used just a few bits (maybe just 2) to store:
>> 0 = unset (every directory today)
>> 1 = under 20% full
>> 2 = under 40% full
>> 3 = under 60% full
>> 
>> or similar.  It doesn't matter if they are more full, since they won't be
>> candidates for merging.  Lazily updating the htree index fullness as
>> entries are removed will simplify the shrinking process, and will avoid
>> the need to repeatedly scan the leaf blocks to see if they are empty
>> enough for merging.  It wouldn't be any worse *not* to store these values
>> on disk after the first time a "0 = unset" entry was found and not merged,
>> or setting the fullness on the merged block if it is merged, and running
>> "e2fsck -D" can easily update the fullness values.
>> 
>> The benefit of using 20%, 40%, and 60% as the fullness markers is that it
>> is possible to either merge adjacent 60% and 40% blocks or alternately a
>> 60% and two adjacent 20% blocks.  Also, since these values are very coarse
>> they would not need to be updated frequently.  If the values are slightly
>> outdated, then it is again not worse than the "always scan" model (one scan
>> and the fullness would be updated), but more efficient than repeat scanning.
>> 
>> Using only two bits for fullness also leaves two bits free for future use.
> 
> Thanks Andreas, that makes sense. This kind of merging will require a
> lot of the tooling provided in this patch, for example swapping a freed
> block with the last block so that no holes are left. So, my hope is that
> we get this patch in first and thereby get a step closer to a coalescing
> solution.

I definitely *do not* want to block the landing of these initial patches
until a "full featured" directory shrinking is complete.  These patches
provide some basic functionality, and will at least shrink a large
directory if it becomes totally empty, so I'm in favour of that.

Cheers, Andreas
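
For illustration only, here is a rough sketch of how the 2-bit fullness
codes quoted above could be computed and used to pick merge candidates
without rescanning leaf blocks.  It is not taken from the patches: the
names (dx_fullness, dx_fullness_code, dx_can_merge) are hypothetical, and
it assumes that blocks 60% full or more simply keep the "0 = unset" code.
It also ignores per-block overhead such as the checksum tail, so a real
implementation would need a more careful fit check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical 2-bit codes, following the table quoted in the thread. */
enum dx_fullness {
	DX_FULL_UNSET = 0,	/* unknown; assumed here to also cover >= 60% */
	DX_FULL_LT20  = 1,	/* under 20% full */
	DX_FULL_LT40  = 2,	/* under 40% full */
	DX_FULL_LT60  = 3,	/* under 60% full */
};

/* Map the bytes in use within a leaf block to a coarse fullness code. */
static enum dx_fullness dx_fullness_code(size_t used, size_t block_size)
{
	assert(block_size > 0 && used <= block_size);
	if (used * 5 < block_size)		/* used/block_size < 1/5 */
		return DX_FULL_LT20;
	if (used * 5 < block_size * 2)		/* < 2/5 */
		return DX_FULL_LT40;
	if (used * 5 < block_size * 3)		/* < 3/5 */
		return DX_FULL_LT60;
	return DX_FULL_UNSET;
}

/* Upper bound, in percent, implied by a code; 100 if nothing is known. */
static unsigned int dx_fullness_bound(enum dx_fullness f)
{
	switch (f) {
	case DX_FULL_LT20: return 20;
	case DX_FULL_LT40: return 40;
	case DX_FULL_LT60: return 60;
	default:           return 100;
	}
}

/*
 * Adjacent blocks are merge candidates if the sum of their upper bounds
 * is at most 100%.  An unset code means the block must be scanned first.
 */
static bool dx_can_merge(const enum dx_fullness *codes, size_t n)
{
	unsigned int total = 0;

	for (size_t i = 0; i < n; i++) {
		if (codes[i] == DX_FULL_UNSET)
			return false;
		total += dx_fullness_bound(codes[i]);
	}
	return n >= 2 && total <= 100;
}
```

With these thresholds, an adjacent "under 60%" and "under 40%" pair
qualifies, as does an "under 60%" block with two adjacent "under 20%"
blocks, while two "under 60%" blocks do not.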





