Date:	Thu, 02 Jul 2009 07:21:28 -0400
From:	Ric Wheeler <rwheeler@...hat.com>
To:	Jamie Lokier <jamie@...reable.org>
CC:	Michael Rubin <mrubin@...gle.com>,
	Chris Worley <worleys@...il.com>,
	Shaozhi Ye <yeshao@...gle.com>, linux-fsdevel@...r.kernel.org,
	linux-ext4@...r.kernel.org
Subject: Re: Plans to evaluate the reliability and integrity of ext4 against
 power failures.

On 07/01/2009 10:12 PM, Jamie Lokier wrote:
> Ric Wheeler wrote:
>> One way to test this with reasonable, commodity hardware would be
>> something like the following:
>>
>> (1) Get an automated power kill setup to control your server
>
> etc.  Good plan.
>
> Another way to test the entire software stack, but not the physical
> disks, is to run the entire test using VMs, and simulate hard disk
> write caching and simulated power failure in the VM.  KVM would be a
> great candidate for that, as it runs VMs as ordinary processes and the
> disk I/O emulation is quite easy to modify.

Certainly, that could be useful to test some level of the stack. Historically, 
the biggest issues that I have run across have centered on the volatile write 
cache on the storage targets.  Not only can it lose data that has been acked 
all the way back to the host, it can also potentially reorder that data in 
challenging ways that make file system recovery difficult....
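As a rough illustration of that failure mode (purely a sketch, nothing we have 
implemented, and every name below is invented for the example), the cache 
behaviour could be modelled in a VM-side test harness along these lines:

    import random

    class SimulatedWriteCache:
        """Toy model of a volatile drive write cache (illustrative only).

        Writes are acked immediately but only become durable on flush().
        On a simulated power cut, an arbitrary subset of the acked-but-unflushed
        writes survives, in an arbitrary order -- the case that makes file
        system recovery hard.
        """

        def __init__(self):
            self.durable = {}   # block number -> data actually on the platter
            self.pending = []   # (block, data) acked to the host, not yet durable

        def write(self, block, data):
            self.pending.append((block, data))
            return "ack"        # host sees success long before the data is safe

        def flush(self):
            # Models a cache flush / barrier actually reaching the drive.
            for block, data in self.pending:
                self.durable[block] = data
            self.pending.clear()

        def power_cut(self):
            # Power is lost: some unflushed writes make it, some do not,
            # and the ones that do may land out of order.
            survivors = random.sample(self.pending,
                                      random.randint(0, len(self.pending)))
            random.shuffle(survivors)
            for block, data in survivors:
                self.durable[block] = data
            self.pending.clear()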

>
> As most issues probably are software issues (kernel, filesystems, apps
> not calling fsync, or assuming barrierless O_DIRECT/O_DSYNC are
> sufficient, network fileserver protocols, etc.), it's surely worth a look.
>
> It could also be much faster than the physical version, which means more
> complete testing of the software stack with the available resources.
>
> With the ability to "fork" a running VM's state by snapshotting it and
> continuing, it would even be possible to simulate power failure cache
> loss scenarios at many points in the middle of a stress test, with the
> stress test continuing to run - no full reboot needed at every point.
> That way, maybe deliberate trace points could be placed in the
> software stack at places where power failure cache loss seems likely
> to cause a problem.
>
> -- Jamie

I do agree that this testing would also be very useful, especially since you 
can do it in almost any environment.
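For what it's worth, the host side of such a harness could be as small as the 
sketch below. This is only illustrative: the image path, timings, and the 
kernel/initrd options for starting the in-guest stress workload are all 
assumptions, not an existing tool.

    import random
    import signal
    import subprocess
    import time

    DISK_IMAGE = "test-ext4.img"   # raw image holding an ext4 fs (assumption)

    # Launch the guest with a volatile write cache, so SIGKILL approximates
    # pulling the plug on a drive that has acked but not committed writes.
    guest = subprocess.Popen([
        "qemu-system-x86_64", "-m", "1024",
        "-drive", "file=%s,format=raw,cache=writeback" % DISK_IMAGE,
        # ... kernel/initrd/append options that start the stress workload ...
    ])

    # Let the in-guest stress test run for a random interval, then "cut power".
    time.sleep(random.uniform(5, 60))
    guest.send_signal(signal.SIGKILL)
    guest.wait()

    # See what the filesystem looks like after the simulated crash
    # (read-only check, no repairs).
    subprocess.call(["e2fsck", "-fn", DISK_IMAGE])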

Regards,

Ric

