linux-ext4 - Re: [PATCH] ext4: defer updating i

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Mon, 27 Mar 2023 19:17:48 +0800
From:   Zhang Yi <yi.zhang@...wei.com>
To:     Chung-Chiang Cheng <shepjeng@...il.com>, Jan Kara <jack@...e.cz>
CC:     Chung-Chiang Cheng <cccheng@...ology.com>,
        <linux-ext4@...r.kernel.org>, <tytso@....edu>,
        <adilger.kernel@...ger.ca>, <kernel@...heng.net>,
        Robbie Ko <robbieko@...ology.com>
Subject: Re: [PATCH] ext4: defer updating i_disksize until endio

On 2023/3/27 18:28, Chung-Chiang Cheng wrote:
> On Mon, Mar 27, 2023 at 5:29 PM Jan Kara <jack@...e.cz> wrote:
>>
>> As Zhang Yi already noted in his review, this is expected at least with
>> data=writeback mount option. With data=ordered this should not happen
>> though as the commit of the transaction with i_disksize update will wait
>> for page writeback to complete (this is exactly the reason why data=ordered
>> exists after all). Are you able to observe this problem with data=ordered
>> mount option?
>>
>>                                                                 Honza
> 
> It's a pity that this issue also occurs with data=ordered due to delayed
> allocation being enabled by default. If delayed allocation were disabled,
> it would not be as easy to reproduce.
> 
> This is because if data is written to the end of a file and the block is
> allocated, the new i_disksize will be immediately committed to the journal
> at ext4_da_write_end(), but the writeback procedure is not yet triggered.
> By default, ext4 commits the journal every 5 seconds, but a dirty page may
> not be written back until 30 seconds later. This is not a short time window,
> and any improper shutdown during this time may lead to the issue :(
> 

It seems that the case you've mentioned is intra-block append write (no?),
current data=ordered mount option doesn't work in this case because
ext4_map_blocks() doesn't attach inode to the t_inode_list of the running
transaction. If delayed allocation were disabled, the lose data window is still
there, because ext4_write_end()->ext4_update_inode_size() is also updating
i_disksize before writing data back. This is at least guarantee no store data.
We had discussed this in [1].

[1]. https://lore.kernel.org/linux-ext4/1554370192-113254-1-git-send-email-yi.zhang@huawei.com/

Thanks,
Yi.