linux-ext4 - Re: Kernel Benchmarking

Date:   Sat, 12 Sep 2020 15:37:04 +0100
From:   Matthew Wilcox <willy@...radead.org>
To:     Michael Larabel <Michael@...haellarabel.com>
Cc:     Amir Goldstein <amir73il@...il.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Ted Ts'o <tytso@...gle.com>,
        Andreas Dilger <adilger.kernel@...ger.ca>,
        Ext4 Developers List <linux-ext4@...r.kernel.org>,
        Jan Kara <jack@...e.cz>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: Kernel Benchmarking

On Sat, Sep 12, 2020 at 05:32:11AM -0500, Michael Larabel wrote:
> On 9/12/20 2:28 AM, Amir Goldstein wrote:
> > On Sat, Sep 12, 2020 at 1:40 AM Michael Larabel
> > <Michael@...haellarabel.com> wrote:
> > > On 9/11/20 5:07 PM, Linus Torvalds wrote:
> > > > On Fri, Sep 11, 2020 at 9:19 AM Linus Torvalds
> > > > <torvalds@...ux-foundation.org> wrote:
> > > > > Ok, it's probably simply that fairness is really bad for performance
> > > > > here in general, and that special case is just that - a special case,
> > > > > not the main issue.
> > > > Ahh. It turns out that I should have looked more at the fault path
> > > > after all. It was higher up in the profile, but I ignored it because I
> > > > found that lock-unlock-lock pattern lower down.
> > > > 
> > > > The main contention point is actually filemap_fault(). Your apache
> > > > test accesses the 'test.html' file that is mmap'ed into memory, and
> > > > all the threads hammer on that one single file concurrently and that
> > > > seems to be the main page lock contention.
> > > > 
> > > > Which is really sad - the page lock there isn't really all that
> > > > interesting, and the normal "read()" path doesn't even take it. But
> > > > faulting the page in does so because the page will have a long-term
> > > > existence in the page tables, and so there's a worry about racing with
> > > > truncate.
> > > > 
> > > > Interesting, but also very annoying.
> > > > 
> > > > Anyway, I don't have a solution for it, but thought I'd let you know
> > > > that I'm still looking at this.
> > > > 
> > > >                   Linus
> > > I've been running your EXT4 patch on more systems and with some
> > > additional workloads today. While not the original problem, the patch
> > > does seem to help a fair amount for the MariaDB database server. This
> > > wasn't one of the workloads regressing on 5.9, but at least on the
> > > systems tried so far the patch makes a meaningful improvement to its
> > > performance. I haven't run into any apparent issues with the patch, so
> > > I'm continuing to try it out on more systems and other database/server
> > > workloads.
> > > 
> > Michael,
> > 
> > Can you please add a reference to the original problem report and
> > to the offending commit? This conversation appeared on the list without
> > this information.
> > 
> > Are filesystems other than ext4 also affected by this performance
> > regression?
> > 
> > Thanks,
> > Amir.
> 
> On Linux 5.9 Git, Apache HTTPD, Redis, Nginx, and Hackbench appear to be the
> main workloads that are running measurably slower than on Linux 5.8 and
> prior on multiple systems.
> 
> The issue was bisected to 2a9127fcf2296674d58024f83981f40b128fffea. The
> Kernel Test Robot had also previously flagged the commit in question,
> with mixed Hackbench results. Looking at the perf data, Linus had a hunch
> that the commit interacted badly with the EXT4 locking behavior, and he
> sent out that patch in response. That EXT4 patch didn't end up addressing
> the performance issue with the original workloads in question (though in
> testing other workloads it does seem to benefit MariaDB, with slightly
> better performance depending on the system).

Based on this limited amount of information, I suspect there would
also be a problem with XFS, and that would be even _more_ sad because
XFS already excludes the truncate-vs-mmap race by taking MMAPLOCK_SHARED in
__xfs_filemap_fault vs MMAPLOCK_EXCL ... somewhere in the truncate path,
I'm sure.  It's definitely there for the holepunch.
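To make that exclusion concrete, here is a userspace model in Python. It is not kernel code and not XFS's implementation: a shared/exclusive lock stands in for the MMAPLOCK, a toy fault path takes it shared, and a toy truncate path takes it exclusive. The class `SharedExclusiveLock`, the `file_pages` dict, and the `fault()`/`truncate()` helpers are hypothetical names chosen for illustration.

```python
import threading


class SharedExclusiveLock:
    """Many shared holders or one exclusive holder (a toy MMAPLOCK stand-in).

    No writer preference: a steady stream of shared holders can starve
    an exclusive waiter. Acceptable for illustration only.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0      # current number of shared holders
        self._writer = False   # True while an exclusive holder exists

    def acquire_shared(self, timeout=None):
        with self._cond:
            ok = self._cond.wait_for(lambda: not self._writer, timeout)
            if ok:
                self._readers += 1
            return ok

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # wake a waiting exclusive holder

    def acquire_exclusive(self, timeout=None):
        with self._cond:
            ok = self._cond.wait_for(
                lambda: not self._writer and self._readers == 0, timeout)
            if ok:
                self._writer = True
            return ok

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


# Hypothetical page cache for one file: page index -> page contents.
mmap_lock = SharedExclusiveLock()
file_pages = {0: b"<html>hello</html>"}


def fault(index):
    """Fault path: a shared hold is enough to keep truncate out."""
    mmap_lock.acquire_shared()
    try:
        return file_pages.get(index)  # the real code would map the page here
    finally:
        mmap_lock.release_shared()


def truncate(new_npages):
    """Truncate path: the exclusive hold waits for in-flight faults to drain,
    then drops every page at or beyond new_npages."""
    mmap_lock.acquire_exclusive()
    try:
        for idx in [i for i in file_pages if i >= new_npages]:
            del file_pages[idx]
    finally:
        mmap_lock.release_exclusive()
```

Because faults only take the lock shared, any number of them can run at once on the same file, and truncate simply waits until they drain. Under that scheme a per-page lock in the fault path adds serialization without adding any truncate protection.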

So maybe XFS should have its own implementation of filemap_fault,
or we should have a filemap_fault_locked() for filesystems which have
their own locking that excludes truncate.
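A minimal userspace sketch of the proposed split, again in Python and under stated assumptions: `Page`, `generic_fault()` and `fault_locked()` are hypothetical stand-ins, not the kernel's `filemap_fault()` and not an existing `filemap_fault_locked()` (which this mail only suggests). The generic path takes a per-page lock on every fault, so threads faulting the same hot page (like `test.html` in the Apache test) queue up behind one another. The locked variant skips that lock and relies on the caller already holding a filesystem lock that excludes truncate.

```python
import threading


class Page:
    """Toy page-cache page: contents plus a per-page lock (a stand-in
    for the kernel's page lock, not its real implementation)."""

    def __init__(self, data):
        self.data = data
        self.lock = threading.Lock()
        self.truncated = False  # would be set by a truncate path (not shown)


def generic_fault(page):
    """Generic path: every fault takes the page lock so it cannot race
    with truncate; concurrent faulters on one page serialize here."""
    with page.lock:
        return None if page.truncated else page.data


def fault_locked(page, fs_excludes_truncate):
    """Hypothetical 'locked' variant: only valid when the filesystem
    already holds its own lock that excludes truncate, so the page lock
    is skipped entirely."""
    if not fs_excludes_truncate:
        raise RuntimeError("caller must hold a lock that excludes truncate")
    return None if page.truncated else page.data
```

With one faulter holding the page lock, a second `generic_fault()` on the same page blocks, while `fault_locked()` returns immediately. The safety of the second variant rests entirely on the caller's filesystem lock; the `fs_excludes_truncate` check only documents that contract and does not enforce it.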