linux-ext4 - Re: [RFC PATCH v3 02/61] e2fsck: copy context when using multi-thread fsck

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Fri, 18 Dec 2020 09:13:25 +0800
From:   Wang Shilong <wangshilong1991@...il.com>
To:     "Darrick J. Wong" <darrick.wong@...cle.com>
Cc:     Saranya Muruganandam <saranyamohan@...gle.com>,
        Ext4 Developers List <linux-ext4@...r.kernel.org>,
        "Theodore Ts'o" <tytso@....edu>, adilger.kernel@...ger.ca,
        Li Xi <lixi@....com>, Wang Shilong <wshilong@....com>
Subject: Re: [RFC PATCH v3 02/61] e2fsck: copy context when using multi-thread fsck

On Fri, Dec 18, 2020 at 8:01 AM Darrick J. Wong <darrick.wong@...cle.com> wrote:
>
> On Wed, Nov 18, 2020 at 07:38:48AM -0800, Saranya Muruganandam wrote:
> > From: Li Xi <lixi@....com>
> >
> > This patch only copy the context to a new one when -m is enabled.
> > It doesn't actually start any thread. When pass1 test finishes,
> > the new context is copied back to the original context.
> >
> > Since the signal handler only changes the original context, so
> > add global_ctx in "struct e2fsck_struct" and use that to check
> > whether there is any signal of canceling.
> >
> > This patch handles the long jump properly so that all the existing
> > tests can be passed even the context has been copied. Otherwise,
> > test f_expisize_ea_del would fail when aborting.
> >
> > Signed-off-by: Li Xi <lixi@....com>
> > Signed-off-by: Wang Shilong <wshilong@....com>
> > Signed-off-by: Saranya Muruganandam <saranyamohan@...gle.com>
> > ---
> >  e2fsck/pass1.c | 114 +++++++++++++++++++++++++++++++++++++++++++++----
> >  e2fsck/unix.c  |   1 +
> >  2 files changed, 107 insertions(+), 8 deletions(-)
> >
> > diff --git a/e2fsck/pass1.c b/e2fsck/pass1.c
> > index 8eecd958..64d237d3 100644
> > --- a/e2fsck/pass1.c
> > +++ b/e2fsck/pass1.c
> > @@ -1144,7 +1144,22 @@ static int quota_inum_is_reserved(ext2_filsys fs, ext2_ino_t ino)
> >       return 0;
> >  }
> >
> > -void e2fsck_pass1(e2fsck_t ctx)
> > +static int e2fsck_should_abort(e2fsck_t ctx)
> > +{
> > +     e2fsck_t global_ctx;
> > +
> > +     if (ctx->flags & E2F_FLAG_SIGNAL_MASK)
> > +             return 1;
> > +
> > +     if (ctx->global_ctx) {
> > +             global_ctx = ctx->global_ctx;
> > +             if (global_ctx->flags & E2F_FLAG_SIGNAL_MASK)
> > +                     return 1;
> > +     }
> > +     return 0;
> > +}
> > +
> > +void e2fsck_pass1_thread(e2fsck_t ctx)
> >  {
> >       int     i;
> >       __u64   max_sizes;
> > @@ -1360,7 +1375,7 @@ void e2fsck_pass1(e2fsck_t ctx)
> >               if (ino > ino_threshold)
> >                       pass1_readahead(ctx, &ra_group, &ino_threshold);
> >               ehandler_operation(old_op);
> > -             if (ctx->flags & E2F_FLAG_SIGNAL_MASK)
> > +             if (e2fsck_should_abort(ctx))
> >                       goto endit;
> >               if (pctx.errcode == EXT2_ET_BAD_BLOCK_IN_INODE_TABLE) {
> >                       /*
> > @@ -1955,7 +1970,7 @@ void e2fsck_pass1(e2fsck_t ctx)
> >               if (process_inode_count >= ctx->process_inode_size) {
> >                       process_inodes(ctx, block_buf);
> >
> > -                     if (ctx->flags & E2F_FLAG_SIGNAL_MASK)
> > +                     if (e2fsck_should_abort(ctx))
> >                               goto endit;
> >               }
> >       }
> > @@ -2068,6 +2083,89 @@ endit:
> >       else
> >               ctx->invalid_bitmaps++;
> >  }
> > +
> > +static errcode_t e2fsck_pass1_thread_prepare(e2fsck_t global_ctx, e2fsck_t *thread_ctx)
> > +{
> > +     errcode_t       retval;
> > +     e2fsck_t        thread_context;
> > +
> > +     retval = ext2fs_get_mem(sizeof(struct e2fsck_struct), &thread_context);
>
> Hm, so I guess the strategy here is that parallel e2fsck makes
> per-thread copies of the ext2_filsys and e2fsck_t global contexts?
> And then after the threaded parts complete, each thread merges its
> per-thread contexts back into the global one, right?

Yes.

>
> This means that we have to be careful to track which fields in those
> cloned contexts have been updated by the thread so that we can copy them
> back and not lose any data.
>
> I'm wondering if for future maintainability it would be better to track
> the per-thread data in a separate structure to make it very explicit
> which data (sub)structures are effectively per-thread and hence don't
> require locking?

Maybe use a per-thread structure is better maintained, but i am not sure
we could remove locking completely.

Locking is mostly used for fix, because fixing is serialized now
and for some global structure which could be used seldomly
but could simplify codes.

>
> (I ask that mostly because I'm having a hard time figuring out which
> fields are supposed to be shared and which ones aren't...)
>
> --D
>
> > +     if (retval) {
> > +             com_err(global_ctx->program_name, retval, "while allocating memory");