lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 16 Sep 2014 13:05:25 +0200
From:	Jan Kara <jack@...e.cz>
To:	Yuanhan Liu <yuanhan.liu@...ux.intel.com>
Cc:	Jan Kara <jack@...e.cz>, linux-ext4@...r.kernel.org,
	Andi Kleen <ak@...ux.intel.com>
Subject: Re: [PATCH] jbd2: remove __jbd2_journal_clean_checkpoint_list to
 solve scalability issue

On Mon 15-09-14 11:38:52, Yuanhan Liu wrote:
> On Fri, Sep 12, 2014 at 11:40:54AM +0200, Jan Kara wrote:
> > On Thu 11-09-14 23:45:39, Yuanhan Liu wrote:
> > > On Thu, Sep 11, 2014 at 03:51:47PM +0200, Jan Kara wrote:
> > > > On Thu 11-09-14 18:09:53, Yuanhan Liu wrote:
> > > > > Firstly, if I read the code correctly(I have a feeling I missed something),
> > > > > __jbd2_journal_clean_checkpoint_list is not necessary. jbd2_log_do_checkpoint
> > > > > will remove them for the checkpointed transaction. In another word,
> > > > > __jbd2_journal_clean_checkpoint_list can't drop an transaction if it's not
> > > > > checkpointed.
> > > >   Yes, that's correct. __jbd2_journal_clean_checkpoint_list() is there only
> > > > to free up buffers that were already checkpointed.
> > > 
> > > Yes, and that might be something I missed: if a transaction is
> > > checkpointed, those buffers attached to the transaction should also be
> > > released, right? If that's ture, what __jbd2_journal_clean_checkpoint_list()
> > > is for?
> >   They aren't released unless memory reclaim wants to free the
> > corresponding pagecache page. So they need to be detached from the
> > transaction somehow once they are written so that the transaction can be
> > eventually freed which frees the space in the journal. Currently we do this
> > detaching either in __jbd2_journal_clean_checkpoint_list() or in
> > jbd2_log_do_checkpoint(). However the latter function gets called only when
> > we really run out of space in the journal and thus everything in the
> > filesystem waits for some space to be reclaimed. So relying on that function
> > isn't good for performance either...
> > 
> > One possibility is to remove buffer from transaction on IO completion.
> 
> Yes, that's the same thing I thought of. b_end_io hook is the first
> thing I thought of, and I tried it. Badly, it was never being invoked.
> 
> I then realised that it should be written by page cache(writeback or
> page reclaim as you mentioned), and that's the stuff I missed before.
> So, my patch was wrong and sorry for that.
  No problem :).

> > However that's likely going to bounce j_list_lock between CPUs badly. So
> 
> I checked __jbd2_journal_remove_checkpoint(jh) again, which is the
> entrance to free a journal buffer. It has two main parts:
> 
> 	__buffer_unlink(jh)
> 
> 	drop_transaction() if all jh are written
> 
> The two are all protected by j_list_lock. IMO, we can split it, say
> let __buffer_unlink(jh) be protected by a per-transaction spin lock,
> and let drop_transaction() be protected by by j_list_lock as it was.
> 
> As drop_transaction() doesn't happen frequently as __buffer_unlink(jh),
> it should alleviate the lock contention a bit.
  Yes, drop_transaction() isn't really an issue. But when doing
__buffer_unlink() during IO completion, making the checkpoint list lock
per-transaction one won't help the contention for the common case where
transactions are reasonably large.

I was thinking about it for a while. I was thinking about various clever
solutions but in the end what I think would be the best is just
to change __jbd2_journal_clean_checkpoint_list() to bail out once it spots
a transaction it cannot free. That will deal with the pointless scanning of
checkpointing lists you are hitting with your workload, it will also do all
the cleanup that is necessary for transactions to be eventually cleaned up
from the journal without having to wait in log_do_checkpoint().

Attached are two patches - lightly tested only. Can you try whether they
fix the contention for you? Thanks!

								Honza
-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR

View attachment "0001-jbd2-Avoid-pointless-scanning-of-checkpoint-lists.patch" of type "text/x-patch" (3627 bytes)

View attachment "0002-jbd2-Simplify-calling-convention-around-__jbd2_journ.patch" of type "text/x-patch" (4380 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ