Date: Thu, 1 Feb 2024 11:31:04 +0800
From: Baokun Li <libaokun1@...wei.com>
To: Jan Kara <jack@...e.cz>
CC: <linux-ext4@...r.kernel.org>, <tytso@....edu>, <adilger.kernel@...ger.ca>,
	<ritesh.list@...il.com>, <linux-kernel@...r.kernel.org>,
	<yi.zhang@...wei.com>, <yangerkun@...wei.com>, <yukuai3@...wei.com>,
	<stable@...nel.org>, Ojaswin Mujoo <ojaswin@...ux.ibm.com>, Baokun Li
	<libaokun1@...wei.com>
Subject: Re: [PATCH] ext4: correct best extent lstart adjustment logic

On 2024/1/31 20:46, Jan Kara wrote:
> [Added Ojaswin to CC as an author of the discussed patch]
>
> On Mon 22-01-24 20:33:32, Baokun Li wrote:
>> When yangerkun review commit 93cdf49f6eca ("ext4: Fix best extent lstart
>> adjustment logic in ext4_mb_new_inode_pa()"), it was found that the best
>> extent did not completely cover the original request after adjusting the
>> best extent lstart in ext4_mb_new_inode_pa() as follows:
>>
>>    original request: 2/10(8)
>>    normalized request: 0/64(64)
>>    best extent: 0/9(9)
>>
>> When we check if best ex can be kept at start of goal, ac_o_ex.fe_logical
>> is 2 less than the adjusted best extent logical end 9, so we think the
>> adjustment is done. But obviously 0/9(9) doesn't cover 2/10(8), so we
>> should determine here if the original request logical end is less than or
>> equal to the adjusted best extent logical end.

Hello Jan,

Thanks for the detailed explanation! 😉

> I'm sorry for a bit delayed reply. Why do you think it is a problem if the
> resulting extent doesn't cover the full original range?

We adjust lstart only when ac_o_ex.fe_len < ac_b_ex.fe_len and
ac_b_ex.fe_len < ac->ac_orig_goal_len. In that case the allocated
extent is longer than the original request, so we would normally
expect this allocation to satisfy the whole request without needing
an additional allocation.

      /* we can't allocate as much as normalizer wants.
       * so, found space must get proper lstart
       * to cover original request */

The comment in the code also states that the found space must "cover
original request", but the code below it does not guarantee this. That
puzzled yangerkun, who came up with the counterexample above, which is
why we considered it a problem.
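
To make the counterexample concrete, here is a minimal user-space sketch
of the placement as I read it from the code comment. It is not the
kernel code: struct extent, extent_end(), covers() and
place_best_current() are hypothetical names, one block per cluster is
assumed (so EXT4_C2B() is a no-op), and case 1 is reconstructed from the
comment rather than copied from mballoc.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an extent covering [start, start + len). */
struct extent {
	long start;
	long len;
};

static long extent_end(struct extent e)
{
	return e.start + e.len;
}

/* True if 'outer' contains every block of 'inner'. */
static bool covers(struct extent outer, struct extent inner)
{
	return outer.start <= inner.start &&
	       extent_end(inner) <= extent_end(outer);
}

/*
 * Place a best extent of best_len blocks inside the goal range using
 * the three cases from the comment, with the pre-patch test in case 2.
 */
static struct extent place_best_current(struct extent orig,
					 struct extent goal, long best_len)
{
	struct extent ex = {
		.start = extent_end(goal) - best_len,
		.len = best_len,
	};

	/* Precondition taken from the email: orig < best < goal in length. */
	assert(orig.len < best_len && best_len < goal.len);

	/* Case 1 (assumed): keep at end of goal if original start is covered. */
	if (orig.start >= ex.start)
		return ex;

	/* Case 2: keep at start of goal if original start is covered. */
	ex.start = goal.start;
	if (orig.start < extent_end(ex))
		return ex;

	/* Case 3: keep at start of the original request. */
	ex.start = orig.start;
	return ex;
}
```

With orig = 2/10(8), goal = 0/64(64) and best_len = 9, this stops in
case 2 and returns 0/9(9). That extent contains block 2 but not block 9,
so covers() reports that the original request is not fully covered.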

> We must always
> cover the first block of the original extent so that the allocation makes
> forward progress. But otherwise we choose to align to the start / end of
> the goal range to reduce fragmentation even if we don't cover the whole
> requested range - the rest of the range will be covered by the next
> allocation.

Totally agree. For the example above, if we end up allocating a total
of 64 blocks, the final extent distribution might look like this:

Before:  [0/9(9)], [9/64(55)]
Patched: [0/2(2)], [2/11(9)], [11/64(53)]

So the real question is whether we prefer fewer allocations now or
fewer fragments later.

> Also there is a problem with trying to cover the whole original
> range described in [1]. Essentially the goal range does not need to cover
> the whole original range and if we try to align the allocated range to
> cover the whole original range, it may result in exceeding the goal range
> and thus overlapping preallocations and triggering asserts in the prealloc
> code.
>
> So if we decided we want to handle the case you describe in a better way,
> we'd need something making sure we don't exceed the goal range.
>
> 								Honza
>
> [1] https://lore.kernel.org/all/Y+UzQJRIJEiAr4Z4@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com/

goal_start          B    original_start   A              goal_end
   |-----------------|----------*----------|-----------------|
      best_ex_len                              best_ex_len

The current logic guarantees that the goal range will not be exceeded.
If original_start + best_ex_len > goal_end, then case 1 applies and the
extent end is aligned with goal_end; if goal_end < original_end, another
block allocation will be triggered, which is fine. In all other cases we
can guarantee that the adjusted best extent covers the original request.

The problem is in case 2: after aligning the extent start
(ex.fe_logical) with goal_start, we stop adjusting as soon as the extent
contains original_start. It may not contain original_end, which triggers
an additional block allocation, whereas falling through to case 3 would
cover the entire original request.

In general, this patch will not cause the goal range to be exceeded.

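
As a rough sanity check of that claim, the sketch below models the
patched case 2 test in user space and walks small inputs exhaustively.
It is only a model under stated assumptions: the names are hypothetical,
one block per cluster is assumed, case 1 is reconstructed from the code
comment, the original start is assumed to lie inside the goal range, and
only the placement is modelled, not the BUG_ON() checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an extent covering [start, start + len). */
struct extent {
	long start;
	long len;
};

static long extent_end(struct extent e)
{
	return e.start + e.len;
}

/* Same three cases, but case 2 now requires covering the original end. */
static struct extent place_best_patched(struct extent orig,
					 struct extent goal, long best_len)
{
	struct extent ex = {
		.start = extent_end(goal) - best_len,
		.len = best_len,
	};

	/* Assumed preconditions for this model. */
	assert(orig.len < best_len && best_len < goal.len);
	assert(goal.start <= orig.start && orig.start < extent_end(goal));

	if (orig.start >= ex.start)		/* case 1: end of goal */
		return ex;

	ex.start = goal.start;			/* case 2: start of goal */
	if (extent_end(orig) <= extent_end(ex))
		return ex;

	ex.start = orig.start;			/* case 3: start of request */
	return ex;
}

/*
 * Try every original start, original length and best length for a goal
 * of goal_len blocks starting at 0. Returns false on the first input
 * where the placed extent leaves the goal range, misses the original
 * start, or fails to cover a request that ends inside the goal.
 */
static bool check_small_ranges(long goal_len)
{
	struct extent goal = { .start = 0, .len = goal_len };

	for (long best_len = 2; best_len < goal_len; best_len++)
		for (long o_len = 1; o_len < best_len; o_len++)
			for (long o_start = 0; o_start < goal_len; o_start++) {
				struct extent orig = { o_start, o_len };
				struct extent ex =
					place_best_patched(orig, goal, best_len);

				if (ex.start < goal.start ||
				    extent_end(ex) > extent_end(goal))
					return false;
				if (ex.start > orig.start ||
				    orig.start >= extent_end(ex))
					return false;
				if (extent_end(orig) <= extent_end(goal) &&
				    extent_end(orig) > extent_end(ex))
					return false;
			}
	return true;
}
```

In this model the example request 2/10(8) with goal 0/64(64) and
best_len = 9 falls through to case 3 and is placed at 2/11(9). The only
inputs that are not fully covered are those where the original request
itself ends beyond goal_end, which is the case 1 situation described
above.
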
>> Moreover, the best extent len is not modified during the adjustment
>> process, and it is already checked by the previous assertion, so replace
>> the check for fe_len with a check for the best extent logical end.
>>
>> Cc: stable@...nel.org
>> Fixes: 93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
>> Signed-off-by: yangerkun <yangerkun@...wei.com>
>> Signed-off-by: Baokun Li <libaokun1@...wei.com>
>> ---
>>   fs/ext4/mballoc.c | 7 ++++---
>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index f44f668e407f..fa5977fe8d72 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -5146,6 +5146,7 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
>>   			.fe_len = ac->ac_orig_goal_len,
>>   		};
>>   		loff_t orig_goal_end = extent_logical_end(sbi, &ex);
>> +		loff_t o_ex_end = extent_logical_end(sbi, &ac->ac_o_ex);
>>   
>>   		/* we can't allocate as much as normalizer wants.
>>   		 * so, found space must get proper lstart
>> @@ -5161,7 +5162,7 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
>>   		 * 1. Check if best ex can be kept at end of goal (before
>>   		 *    cr_best_avail trimmed it) and still cover original start
>>   		 * 2. Else, check if best ex can be kept at start of goal and
>> -		 *    still cover original start
>> +		 *    still cover original end
>>   		 * 3. Else, keep the best ex at start of original request.
>>   		 */
>>   		ex.fe_len = ac->ac_b_ex.fe_len;
>> @@ -5171,7 +5172,7 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
>>   			goto adjust_bex;
>>   
>>   		ex.fe_logical = ac->ac_g_ex.fe_logical;
>> -		if (ac->ac_o_ex.fe_logical < extent_logical_end(sbi, &ex))
>> +		if (o_ex_end <= extent_logical_end(sbi, &ex))
>>   			goto adjust_bex;
>>   
>>   		ex.fe_logical = ac->ac_o_ex.fe_logical;
>> @@ -5179,7 +5180,7 @@ ext4_mb_new_inode_pa(struct ext4_allocation_context *ac)
>>   		ac->ac_b_ex.fe_logical = ex.fe_logical;
>>   
>>   		BUG_ON(ac->ac_o_ex.fe_logical < ac->ac_b_ex.fe_logical);
>> -		BUG_ON(ac->ac_o_ex.fe_len > ac->ac_b_ex.fe_len);
>> +		BUG_ON(o_ex_end > extent_logical_end(sbi, &ex));
>>   		BUG_ON(extent_logical_end(sbi, &ex) > orig_goal_end);
>>   	}
>>   
>> -- 
>> 2.31.1
>>

Cheers!
-- 
With Best Regards,
Baokun Li