linux-ext4 - Re: Query FSCK Errors on ext4

lists.openwall.net: Open Source and information security mailing list archives
Date:	Tue, 19 Nov 2013 13:27:49 -0700
From:	Andreas Dilger <adilger@...ger.ca>
To:	Stephen Elliott <techweb@...world.com>
Cc:	Zheng Liu <gnehzuil.liu@...il.com>,
	David Jeffery <djeffery@...hat.com>,
	"<linux-ext4@...r.kernel.org>" <linux-ext4@...r.kernel.org>,
	Bernd Schubert <bernd.schubert@...m.fraunhofer.de>,
	Eric Whitney <enwlinux@...il.com>
Subject: Re: Query FSCK Errors on ext4

It definitely shouldn't be possible for any application to corrupt the filesystem, so regardless of what is being run this is a kernel bug. 

Cheers, Andreas

On 2013-11-19, at 10:35, "Stephen Elliott" <techweb@...world.com> wrote:

> Hi Andreas,
> 
> I have read the replies given; I am just questioning some of the analysis
> and have follow-up questions.
> 
> You will notice that I previously mentioned in this mail thread that I had
> this issue with e2fsck 1.42.3 too, before I was running e2fsck 1.42.8, so I
> am not entirely convinced that the aforementioned patch is applicable.
> 
> My main question is why this issue seems to occur when the MS Access DB is
> open (over Samba) on client workstations while the server is reloaded. I
> would possibly expect DB corruption from this, but not FS corruption.
> 
> Many Thanks
> Stephen Elliott
> 
> -----Original Message-----
> From: Andreas Dilger [mailto:adilger@...ger.ca] 
> Sent: 19 November 2013 16:47
> To: Stephen Elliott
> Cc: Zheng Liu; David Jeffery; <linux-ext4@...r.kernel.org>; Bernd Schubert;
> Eric Whitney
> Subject: Re: Query FSCK Errors on ext4
> 
> As previously written in earlier comments, the bug is likely in the ext4
> code of your appliance, and could possibly be fixed by the patch that was
> pointed out at that time.
> 
> If you ask for help, you actually need to read the replies that are given. 
> 
> Cheers, Andreas
> 
> On 2013-11-19, at 5:44, "Stephen Elliott" <techweb@...world.com> wrote:
> 
>> Hi Guys,
>> 
>> Did you have any further feedback on this? It is purely curiosity for me:
>> 
>> I have theorised that the problem comes from the MS Access DB being 
>> open (over Samba) on client workstations when the server is reloaded.
>> 
>> Since ensuring these are closed prior to reloading, I have not seen 
>> further FSCK errors on reload. Is there an explanation for this? I can 
>> see why this may corrupt the DB, but not the filesystem.
>> 
>> Many Thanks
>> Stephen Elliott
>> 
>> -----Original Message-----
>> From: Stephen Elliott [mailto:techweb@...world.com]
>> Sent: 28 October 2013 21:18
>> To: 'Andreas Dilger'
>> Cc: 'Zheng Liu'; 'David Jeffery'; 'linux-ext4@...r.kernel.org List'; 
>> 'Bernd Schubert'; 'Eric Whitney'
>> Subject: RE: Query FSCK Errors on ext4
>> 
>> Ultimately I am not too worried about this problem (now I know the 
>> cause) but I am intrigued to know what actually caused the issue in 
>> the first place. As you can see there is some history around the problem.
>> 
>> Also was that defect / bug actually confirmed?
>> 
>> -----Original Message-----
>> From: Andreas Dilger [mailto:adilger@...ger.ca]
>> Sent: 28 October 2013 20:54
>> To: Stephen Elliott
>> Cc: Zheng Liu; David Jeffery; linux-ext4@...r.kernel.org List; Bernd 
>> Schubert; Eric Whitney
>> Subject: Re: Query FSCK Errors on ext4
>> 
>> On Oct 28, 2013, at 3:00 AM, Stephen Elliott <techweb@...world.com> wrote:
>>> Thanks for the reply guys...
>>> 
>>> The device in question is a ReadyNAS Pro 6, which happens to be running
>>> Linux :) I actually saw some issues with e2fsck 1.42.3 earlier this year:
>> 
>> So it looks like your next course of action is to contact ReadyNAS to 
>> see if they have the patch that Zheng mentioned below in their kernel.
>> 
>> Cheers, Andreas
>> 
>>> ***** File system check forced at Fri Apr 26 20:08:38 WEST 2013 *****
>>> fsck 1.41.14 (22-Dec-2010)
>>> e2fsck 1.42.3 (14-May-2012)
>>> Pass 1: Checking inodes, blocks, and sizes
>>> Inode 4195619, i_blocks is 3135728, should be 3135904.  Fix? yes
>>> 
>>> Running additional passes to resolve blocks claimed by more than one inode...
>>> Pass 1B: Rescanning for multiply-claimed blocks
>>> Multiply-claimed block(s) in inode 4195619: 167904376 167904377 167904378 167904379 167904380 167904381 167904382 167904383 167904384 167904385 167904386 167949296 167949297 167949298 167949299 167949300 167949301 167949302 167949303 167949304 167949305 167949306
>>> Pass 1C: Scanning directories for inodes with multiply-claimed blocks
>>> Pass 1D: Reconciling multiply-claimed blocks
>>> (There are 1 inodes containing multiply-claimed blocks.)
>>> 
>>> File /PREMIER/Premier Automation Purchase OrdersApp V18.5.mdb (inode #4195619, mod time Fri Apr 26 20:07:42 2013)
>>>   has 22 multiply-claimed block(s), shared with 0 file(s):
>>> Multiply-claimed blocks already reassigned or cloned.
>>> 
>>> Pass 2: Checking directory structure
>>> Pass 3: Checking directory connectivity
>>> Pass 4: Checking reference counts
>>> Pass 5: Checking group summary information
>>> 
>>> /dev/c/c: ***** FILE SYSTEM WAS MODIFIED *****
>>> /dev/c/c: 615898/30212096 files (13.6% non-contiguous), 62353456/483393536 blocks
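The April numbers can be cross-checked with a little arithmetic. A minimal sketch, assuming a 4 KiB filesystem block size (not stated here, but consistent with Andreas's later "5 disk blocks (40 sectors)") and that `i_blocks` is counted in 512-byte sectors, the usual case for an ext4 inode without the huge_file flag:

```python
SECTOR_SIZE = 512   # assumed: i_blocks counted in 512-byte units (no huge_file flag)
BLOCK_SIZE = 4096   # assumed: 4 KiB ext4 block size
SECTORS_PER_BLOCK = BLOCK_SIZE // SECTOR_SIZE  # 8


def i_blocks_delta_in_fs_blocks(found: int, expected: int) -> int:
    """Convert an e2fsck 'i_blocks is X, should be Y' mismatch into filesystem blocks."""
    delta_sectors = expected - found
    if delta_sectors % SECTORS_PER_BLOCK:
        raise ValueError("delta is not a whole number of filesystem blocks")
    return delta_sectors // SECTORS_PER_BLOCK


# April 2013: "Inode 4195619, i_blocks is 3135728, should be 3135904."
april_delta = i_blocks_delta_in_fs_blocks(3135728, 3135904)

# The Pass 1B list for the same inode: two runs of 11 consecutive blocks.
multiply_claimed = [*range(167904376, 167904387), *range(167949296, 167949307)]

print(april_delta, len(multiply_claimed))  # 22 22
```

Both come out to 22, the same count e2fsck gives for the file's multiply-claimed blocks; the arithmetic shows the numbers line up, not why the blocks were claimed twice.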
>>> 
>>> After deleting the file (the MS Access DB) and re-creating it from backup,
>>> the file system was remounted read-only and the following errors were
>>> logged:
>>> 
>>> May 8 14:58:15 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5124, block 167904376:freeing already freed block (bit 1144)
>>> May 8 14:58:15 despair kernel: Aborting journal on device dm-0-8.
>>> May 8 14:58:15 despair kernel: EXT4-fs (dm-0): Remounting filesystem read-only
>>> May 8 14:58:15 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5124, block 167904377:freeing already freed block (bit 1145)
>>> May 8 14:58:15 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5124, block 167904378:freeing already freed block (bit 1146)
>>> May 8 14:58:15 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5124, block 167904379:freeing already freed block (bit 1147)
>>> May 8 14:58:15 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5124, block 167904380:freeing already freed block (bit 1148)
>>> May 8 14:58:15 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5124, block 167904381:freeing already freed block (bit 1149)
>>> May 8 14:58:15 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5124, block 167904382:freeing already freed block (bit 1150)
>>> May 8 14:58:16 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5124, block 167904383:freeing already freed block (bit 1151)
>>> May 8 14:58:16 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5124, block 167904384:freeing already freed block (bit 1152)
>>> May 8 14:58:16 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5124, block 167904385:freeing already freed block (bit 1153)
>>> May 8 14:58:16 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5124, block 167904386:freeing already freed block (bit 1154)
>>> May 8 14:58:16 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5125, block 167949296:freeing already freed block (bit 13296)
>>> May 8 14:58:16 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5125, block 167949297:freeing already freed block (bit 13297)
>>> May 8 14:58:16 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5125, block 167949298:freeing already freed block (bit 13298)
>>> May 8 14:58:16 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5125, block 167949299:freeing already freed block (bit 13299)
>>> May 8 14:58:17 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5125, block 167949300:freeing already freed block (bit 13300)
>>> May 8 14:58:17 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5125, block 167949301:freeing already freed block (bit 13301)
>>> May 8 14:58:17 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5125, block 167949302:freeing already freed block (bit 13302)
>>> May 8 14:58:17 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5125, block 167949303:freeing already freed block (bit 13303)
>>> May 8 14:58:17 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5125, block 167949304:freeing already freed block (bit 13304)
>>> May 8 14:58:17 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5125, block 167949305:freeing already freed block (bit 13305)
>>> May 8 14:58:17 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5125, block 167949306:freeing already freed block (bit 13306)
>>> 
>>> 
>>> These are the same blocks that e2fsck flagged as multiply-claimed.
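The observation that the kernel's double-free errors hit exactly the blocks from the Pass 1B list can be checked mechanically. A small Python sketch; the regular expressions are written against the output as quoted in this thread and are not an official parser for e2fsck or ext4 kernel messages. The log pattern only keys on the `block N:freeing already freed block` part of each line, so it does not depend on the rest of the line's formatting:

```python
import re

# Excerpts copied from the thread; paste the full outputs to check every block.
FSCK_PASS_1B = (
    "Multiply-claimed block(s) in inode 4195619: "
    "167904376 167904377 167904378 167904379 167904380 167904381 167904382 "
    "167904383 167904384 167904385 167904386 167949296 167949297 167949298 "
    "167949299 167949300 167949301 167949302 167949303 167949304 167949305 167949306"
)

KERNEL_LOG = """\
May 8 14:58:15 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5124, block 167904376:freeing already freed block (bit 1144)
May 8 14:58:15 despair kernel: Aborting journal on device dm-0-8.
May 8 14:58:16 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5125, block 167949296:freeing already freed block (bit 13296)
May 8 14:58:17 despair kernel: EXT4-fs error (device dm-0): mb_free_blocks:1411: group 5125, block 167949306:freeing already freed block (bit 13306)
"""


def multiply_claimed(fsck_text: str) -> set[int]:
    """Block numbers listed after 'Multiply-claimed block(s) in inode N:'."""
    match = re.search(r"Multiply-claimed block\(s\) in inode \d+:((?:\s+\d+)+)", fsck_text)
    return {int(b) for b in match.group(1).split()} if match else set()


def double_freed(log_text: str) -> set[int]:
    """Block numbers from mb_free_blocks 'freeing already freed block' errors."""
    pattern = r"block (\d+):\s*freeing already freed block"
    return {int(b) for b in re.findall(pattern, log_text)}


claimed = multiply_claimed(FSCK_PASS_1B)
freed_twice = double_freed(KERNEL_LOG)
print(len(claimed), sorted(freed_twice - claimed))  # 22 []
```

Fed the complete kernel log from this message, `double_freed` returns the 22 blocks 167904376 to 167904386 and 167949296 to 167949306, which is exactly the Pass 1B set.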
>>> 
>>> And then running an FSCK, we got the following:
>>> 
>>> ***** File system check forced at Wed May 8 15:16:50 WEST 2013 *****
>>> fsck 1.41.14 (22-Dec-2010)
>>> e2fsck 1.42.3 (14-May-2012)
>>> /dev/c/c: recovering journal
>>> Pass 1: Checking inodes, blocks, and sizes
>>> Pass 2: Checking directory structure
>>> Pass 3: Checking directory connectivity
>>> Pass 4: Checking reference counts
>>> Pass 5: Checking group summary information
>>> Free blocks count wrong for group #5124 (28170, counted=28159).
>>> Fix? yes
>>> 
>>> Free blocks count wrong for group #5125 (25861, counted=25850).
>>> Fix? yes
>>> 
>>> Free blocks count wrong (420683133, counted=420644972).
>>> Fix? yes
>>> 
>>> Free inodes count wrong (29595347, counted=29595271).
>>> Fix? yes
>>> 
>>> 
>>> /dev/c/c: ***** FILE SYSTEM WAS MODIFIED *****
>>> /dev/c/c: 616825/30212096 files (13.6% non-contiguous), 62748564/483393536 blocks
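The per-group numbers in this run also line up with the block list. In ext4, a block's group is `(block - s_first_data_block) // blocks_per_group` and its bit in that group's bitmap is the remainder. The sketch below assumes 4 KiB blocks with the default 32768 blocks per group and `s_first_data_block = 0` (none of which is stated in the thread); under those assumptions it reproduces the group and bit numbers printed by `mb_free_blocks` and the size of the free-count corrections for groups #5124 and #5125:

```python
from collections import Counter

BLOCKS_PER_GROUP = 32768  # assumed: default for a 4 KiB-block ext4 filesystem
FIRST_DATA_BLOCK = 0      # assumed: s_first_data_block is 0 when the block size > 1 KiB


def group_and_bit(block: int) -> tuple[int, int]:
    """Map an absolute block number to (block group, bit within that group's bitmap)."""
    return divmod(block - FIRST_DATA_BLOCK, BLOCKS_PER_GROUP)


# The 22 multiply-claimed / double-freed blocks from the April and May reports.
blocks = [*range(167904376, 167904387), *range(167949296, 167949307)]

# Group/bit for the first block of each run, as printed in the kernel log.
print(group_and_bit(167904376), group_and_bit(167949296))  # (5124, 1144) (5125, 13296)

# How many of the 22 blocks fall in each group ...
per_group = Counter(group_and_bit(b)[0] for b in blocks)
# ... versus the size of e2fsck's "Free blocks count wrong for group" corrections.
corrections = {5124: 28170 - 28159, 5125: 25861 - 25850}
print(dict(per_group) == corrections)  # True
```

This only shows that the same 22 blocks account for both sets of messages (11 in each group); it does not by itself explain how the blocks came to be claimed twice.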
>>> 
>>> Then, later in the year, I reloaded the server while the database was open
>>> on several client machines:
>>> 
>>> ***** File system check forced at Tue Jul 23 21:02:13 WEST 2013 *****
>>> fsck 1.42.8 (20-Jun-2013)
>>> e2fsck 1.42.8 (20-Jun-2013)
>>> Pass 1: Checking inodes, blocks, and sizes
>>> Inode 4195619, end of extent exceeds allowed value
>>>              (logical block 64907, physical block 11435403, len 16)
>>> Clear? yes
>>> 
>>> Inode 4195619, i_blocks is 1337216, should be 1337176.  Fix? yes
>>> 
>>> Pass 2: Checking directory structure
>>> Pass 3: Checking directory connectivity
>>> Pass 4: Checking reference counts
>>> Pass 5: Checking group summary information
>>> Block bitmap differences:  -(11435403--11435407)
>>> Fix? yes
>>> 
>>> Free blocks count wrong for group #348 (2130, counted=2135).
>>> Fix? yes
>>> 
>>> Free blocks count wrong (417470107, counted=417470112).
>>> Fix? yes
>>> 
>>> 
>>> /dev/c/c: ***** FILE SYSTEM WAS MODIFIED *****
>>> /dev/c/c: 625785/30212096 files (13.6% non-contiguous), 65923424/483393536 blocks
>>> 
>>> Again, this relates to the same file, which is only an MS Access DB that was
>>> open from several client machines over SMB when the server was rebooted.
>>> Going forward, I make sure all instances are closed when reloading, but even
>>> so I am surprised that a clean reload causes corruption at the filesystem
>>> level.
>>> 
>>> Since ensuring the DB is closed before reloading, I have seen no further
>>> issues like this.
>>> 
>>> Many Thanks
>>> Stephen Elliott
>>> 
>>> -----Original Message-----
>>> From: Zheng Liu [mailto:gnehzuil.liu@...il.com]
>>> Sent: 28 October 2013 06:39
>>> To: Andreas Dilger
>>> Cc: Stephen Elliott; David Jeffery; linux-ext4@...r.kernel.org List; 
>>> Bernd Schubert; Eric Whitney
>>> Subject: Re: Query FSCK Errors on ext4
>>> 
>>> [Cc Eric Whitney to confirm this problem]
>>> 
>>> Hi Andreas,
>>> 
>>> If I remember correctly, this patch might fix this problem [1].
>>> 
>>> 1. http://www.spinics.net/lists/linux-ext4/msg39485.html
>>> 
>>> Regards,
>>>                                              - Zheng
>>> 
>>> On Mon, Oct 28, 2013 at 12:13:26AM -0600, Andreas Dilger wrote:
>>>> The error reported here is a relatively new one.  It only appeared 
>>>> in e2fsck 1.42.8, and wasn't in the code that I'm using locally
>>>> (1.42.7) so I wasn't sure what it actually meant without looking at it.
>>>> 
>>>> It looks like some kind of overflow of the extent tree, which causes
>>>> e2fsck to chop off the last 5 disk blocks (40 sectors), though I'm
>>>> not sure exactly why.  From your comments, this can be reproduced 
>>>> with your database usage?  Does it use fallocate() or any other 
>>>> strange IO operations that might be causing this?
>>>> 
>>>> Have you tried updating your kernel?  If there is repeated 
>>>> corruption appearing in the filesystem, then it is either a bug in 
>>>> the kernel or in e2fsck.  Not really sure which one to blame at this point.
>>>> 
>>>> Cheers, Andreas
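Andreas's "last 5 disk blocks (40 sectors)" can be reproduced from the July output. A sketch assuming 4 KiB blocks, i.e. 8 × 512-byte sectors per block (which is what his 5 blocks = 40 sectors implies), plus an unstated assumption of the default 32768 blocks per group with `s_first_data_block = 0`. The `i_blocks` correction, the block bitmap range, and the group #348 and filesystem-wide free-count corrections all come to 5 blocks:

```python
SECTORS_PER_BLOCK = 4096 // 512  # 4 KiB blocks; i_blocks counted in 512-byte sectors
BLOCKS_PER_GROUP = 32768         # assumed default for 4 KiB blocks (first data block 0)

# "Inode 4195619, i_blocks is 1337216, should be 1337176."
sectors_dropped = 1337216 - 1337176
blocks_dropped = sectors_dropped // SECTORS_PER_BLOCK

# "Block bitmap differences:  -(11435403--11435407)" is an inclusive range.
first, last = 11435403, 11435407
bitmap_blocks = last - first + 1

# "Free blocks count wrong for group #348 (2130, counted=2135)" and the total count.
group_delta = 2135 - 2130
total_delta = 417470112 - 417470107

print(sectors_dropped, blocks_dropped, bitmap_blocks, group_delta, total_delta)  # 40 5 5 5 5
print(first // BLOCKS_PER_GROUP, last // BLOCKS_PER_GROUP)  # 348 348
```

Under the 32768-blocks-per-group assumption, all five blocks land in group #348, the group whose free count e2fsck corrected.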
>>>> 
>>>> On Oct 18, 2013, at 9:45 AM, Stephen Elliott <techweb@...world.com> wrote:
>>>> 
>>>>> Any feedback on this, guys? I would really appreciate somebody taking a
>>>>> look over this.
>>>>> 
>>>>> From: Stephen Elliott [mailto:techweb@...world.com]
>>>>> Sent: 22 September 2013 20:13
>>>>> To: linux-ext4@...r.kernel.org; linux-fsdevel@...r.kernel.org;
>>>>> Andreas Dilger (adilger@...ger.ca); 'Bernd Schubert'
>>>>> Subject: Query FSCK Errors on ext4
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> I have theorised that the problem comes from the MS Access DB being open
>>>>> (over Samba) on client workstations when the server is reloaded.
>>>>> 
>>>>> Since ensuring these are closed prior to reloading, I have not seen
>>>>> further FSCK errors on reload. Is there an explanation for this? I can see
>>>>> why this may corrupt the DB, but not the filesystem.
>>>>> 
>>>>> Just as a primer, I used a ReadyNAS NV+ running ext3 for many years and
>>>>> never had this issue. However, since using ext4 on a ReadyNAS Pro, I now
>>>>> see this issue.
>>>>> 
>>>>> Many Thanks
>>>>> Stephen Elliott
>>>>> 
>>>>> From: Stephen Elliott [mailto:techweb@...world.com]
>>>>> Sent: 23 July 2013 22:02
>>>>> To: linux-ext4@...r.kernel.org; linux-fsdevel@...r.kernel.org;
>>>>> Andreas Dilger (adilger@...ger.ca); 'Bernd Schubert'
>>>>> Subject: RE: FSCK Errors on ext4
>>>>> 
>>>>> If it helps, guys, the same file as before is causing the issue with
>>>>> inode 4195619, a very large MS Access DB.
>>>>> 
>>>>> From: Stephen Elliott [mailto:techweb@...world.com]
>>>>> Sent: 23 July 2013 21:52
>>>>> To: linux-ext4@...r.kernel.org; linux-fsdevel@...r.kernel.org;
>>>>> Andreas Dilger (adilger@...ger.ca); 'Bernd Schubert'
>>>>> Subject: FSCK Errors on ext4
>>>>> 
>>>>> Hi Andreas / Bernd / all,
>>>>> 
>>>>> You may recall advising me on another batch of FSCK errors a few months
>>>>> back.
>>>>> 
>>>>> The same device, with an ext4 file system, has produced the following
>>>>> errors after a clean reload. It seems to be fine now, but I wanted your
>>>>> input on this. No bad blocks are reported on the devices, etc.
>>>>> 
>>>>> ***** File system check forced at Tue Jul 23 21:02:13 WEST 2013 *****
>>>>> fsck 1.42.8 (20-Jun-2013)
>>>>> e2fsck 1.42.8 (20-Jun-2013)
>>>>> Pass 1: Checking inodes, blocks, and sizes
>>>>> Inode 4195619, end of extent exceeds allowed value
>>>>>              (logical block 64907, physical block 11435403, len 16)
>>>>> Clear? yes
>>>>> 
>>>>> Inode 4195619, i_blocks is 1337216, should be 1337176.  Fix? yes
>>>>> 
>>>>> Pass 2: Checking directory structure
>>>>> Pass 3: Checking directory connectivity
>>>>> Pass 4: Checking reference counts
>>>>> Pass 5: Checking group summary information
>>>>> Block bitmap differences:  -(11435403--11435407)
>>>>> Fix? yes
>>>>> 
>>>>> Free blocks count wrong for group #348 (2130, counted=2135).
>>>>> Fix? yes
>>>>> 
>>>>> Free blocks count wrong (417470107, counted=417470112).
>>>>> Fix? yes
>>>>> 
>>>>> 
>>>>> /dev/c/c: ***** FILE SYSTEM WAS MODIFIED *****
>>>>> /dev/c/c: 625785/30212096 files (13.6% non-contiguous), 65923424/483393536 blocks
>>>>> 
>>>>> Many Thanks
>>>>> Stephen Elliott
>>>> 
>>>> 
>>>> Cheers, Andreas
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" 
>>>> in the body of a message to majordomo@...r.kernel.org More majordomo 
>>>> info at  http://vger.kernel.org/majordomo-info.html
>> 
>> 
>> Cheers, Andreas
> 
> 