Date:   Mon, 6 Mar 2017 18:59:48 +0200
From:   Avi Kivity <avi@...lladb.com>
To:     Jens Axboe <axboe@...nel.dk>, Jan Kara <jack@...e.cz>
Cc:     Goldwyn Rodrigues <rgoldwyn@...e.de>, jack@...e.com,
        hch@...radead.org, linux-fsdevel@...r.kernel.org,
        linux-block@...r.kernel.org, linux-btrfs@...r.kernel.org,
        linux-ext4@...r.kernel.org, linux-xfs@...r.kernel.org
Subject: Re: [PATCH 0/8 v2] Non-blocking AIO



On 03/06/2017 06:08 PM, Jens Axboe wrote:
> On 03/06/2017 08:59 AM, Avi Kivity wrote:
>> On 03/06/2017 05:38 PM, Jens Axboe wrote:
>>> On 03/06/2017 08:29 AM, Avi Kivity wrote:
>>>> On 03/06/2017 05:19 PM, Jens Axboe wrote:
>>>>> On 03/06/2017 01:25 AM, Jan Kara wrote:
>>>>>> On Sun 05-03-17 16:56:21, Avi Kivity wrote:
>>>>>>>> The goal of the patch series is to return -EAGAIN/-EWOULDBLOCK if
>>>>>>>> any of these conditions are met. This way userspace can push most
>>>>>>>> of the write()s to the kernel to the best of its ability to complete
>>>>>>>> and if it returns -EAGAIN, can defer it to another thread.
>>>>>>>>
>>>>>>> Is it not possible to push the iocb to a workqueue?  This will allow
>>>>>>> existing userspace to work with the new functionality, unchanged. Any
>>>>>>> userspace implementation would have to do the same thing, so it's not like
>>>>>>> we're saving anything by pushing it there.
>>>>>> That is not easy because until IO is fully submitted, you need some parts
>>>>>> of the context of the process which submits the IO (e.g. memory mappings,
>>>>>> but possibly also other credentials). So you would need to somehow transfer
>>>>>> this information to the workqueue.
>>>>> Outside of technical challenges, the API also needs to return EAGAIN or
>>>>> start blocking at some point. We can't expose a direct connection to
>>>>> queue work like that, and let any user potentially create millions of
>>>>> pending work items (and IOs).
>>>> You wouldn't expect more concurrent events than the maxevents parameter
>>>> that was supplied to io_setup syscall; it should have reserved any
>>>> resources needed.
>>> Doesn't matter what limit you apply, my point still stands - at some
>>> point you have to return EAGAIN, or block. Returning EAGAIN without
>>> the caller having flagged support for that change of behavior would
>>> be problematic.
>> Doesn't it already return EAGAIN (or some other error) if you exceed
>> maxevents?
> It's a setup thing. We check these limits when someone creates an IO
> context, and carve out the specified entries from our global pool. Then
> we free those "resources" when the io context is freed.
>
> Right now I can set up an IO context with 1000 entries on it, yet that
> number has NO bearing on when io_submit() would potentially block or
> return EAGAIN.
>
> We can have a huge gap on the intent signaled by io context setup, and
> the reality imposed by what actually happens on the IO submission side.

Isn't that a bug?  Shouldn't that 1001st incomplete io_submit() return 
EAGAIN?

Just tested it, and maxevents is not respected for this:

io_setup(1, [0x7fc64537f000])           = 0
io_submit(0x7fc64537f000, 10, [{pread, fildes=3, buf=0x1eb4000, 
nbytes=4096, offset=0}, {pread, fildes=3, buf=0x1eb4000, nbytes=4096, 
offset=0}, {pread, fildes=3, buf=0x1eb4000, nbytes=4096, offset=0}, 
{pread, fildes=3, buf=0x1eb4000, nbytes=4096, offset=0}, {pread, 
fildes=3, buf=0x1eb4000, nbytes=4096, offset=0}, {pread, fildes=3, 
buf=0x1eb4000, nbytes=4096, offset=0}, {pread, fildes=3, buf=0x1eb4000, 
nbytes=4096, offset=0}, {pread, fildes=3, buf=0x1eb4000, nbytes=4096, 
offset=0}, {pread, fildes=3, buf=0x1eb4000, nbytes=4096, offset=0}, 
{pread, fildes=3, buf=0x1eb4000, nbytes=4096, offset=0}]) = 10

which is unexpected, to me.
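
For anyone who wants to repeat the test, here is a rough sketch of a
reproducer. It is not the program that produced the strace above; the
helper names (prep_pread, submit_preads), the MAX_BATCH cap and the use of
raw syscall() wrappers are my own assumptions for illustration. It creates
a context with the given maxevents, submits nr identical 4096-byte preads
at offset 0 in a single io_submit() call, and returns whatever io_submit()
returned (or a negative errno).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#define MAX_BATCH 64	/* arbitrary cap for this sketch */

/* Fill one iocb describing a pread of len bytes at offset off. */
static void prep_pread(struct iocb *cb, int fd, void *buf,
		       size_t len, long long off)
{
	memset(cb, 0, sizeof(*cb));
	cb->aio_fildes = fd;
	cb->aio_lio_opcode = IOCB_CMD_PREAD;
	cb->aio_buf = (uint64_t)(uintptr_t)buf;
	cb->aio_nbytes = len;
	cb->aio_offset = off;
}

/*
 * io_setup(maxevents), then one io_submit() of nr preads on path.
 * Returns the number of iocbs the kernel accepted, or -errno.
 */
static long submit_preads(unsigned maxevents, int nr, const char *path)
{
	static char buf[4096];	/* every request reads into the same buffer */
	struct iocb cbs[MAX_BATCH];
	struct iocb *ptrs[MAX_BATCH];
	aio_context_t ctx = 0;
	long ret;
	int fd, i;

	if (nr < 1 || nr > MAX_BATCH)
		return -EINVAL;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	if (syscall(SYS_io_setup, maxevents, &ctx) < 0) {
		ret = -errno;
		close(fd);
		return ret;
	}

	for (i = 0; i < nr; i++) {
		prep_pread(&cbs[i], fd, buf, sizeof(buf), 0);
		ptrs[i] = &cbs[i];
	}

	ret = syscall(SYS_io_submit, ctx, (long)nr, ptrs);
	if (ret < 0)
		ret = -errno;

	/* io_destroy() cancels or waits for anything still in flight. */
	syscall(SYS_io_destroy, ctx);
	close(fd);
	return ret;
}
```

Calling submit_preads(1, 10, path) on any readable file corresponds to the
io_setup(1, ...) / io_submit(..., 10, ...) pair in the trace. The value you
get back may depend on the kernel you run it on, so compare it against
maxevents yourself rather than assuming a particular result.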


