Champion of Cyrodiil: May 2018

This morning I noticed that my USB backup drive would not mount properly. The first thing I did was check the kernel log with dmesg on my Debian system...

```
[1962217.824829] usb 1-5.1: New USB device found, idVendor=0bc2, idProduct=3300
[1962217.824832] usb 1-5.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[1962217.824834] usb 1-5.1: Product: Desktop
[1962217.824835] usb 1-5.1: Manufacturer: Seagate
[1962217.824837] usb 1-5.1: SerialNumber: 2GHNRSVE
[1962217.825589] usb-storage 1-5.1:1.0: USB Mass Storage device detected
[1962217.825927] scsi host4: usb-storage 1-5.1:1.0
[1962218.832372] scsi 4:0:0:0: Direct-Access Seagate Desktop 0130 PQ: 0 ANSI: 4
[1962218.833253] sd 4:0:0:0: Attached scsi generic sg1 type 0
[1962218.833370] sd 4:0:0:0: [sdb] 3907029168 512-byte logical blocks: (2.00 TB/1.82 TiB)
[1962218.833691] sd 4:0:0:0: [sdb] Write Protect is off
[1962218.833694] sd 4:0:0:0: [sdb] Mode Sense: 2f 08 00 00
[1962218.833991] sd 4:0:0:0: [sdb] No Caching mode page found
[1962218.833996] sd 4:0:0:0: [sdb] Assuming drive cache: write through
[1962218.842262] sdb: sdb1
[1962218.843668] sd 4:0:0:0: [sdb] Attached SCSI disk
[1962219.029068] sd 4:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_SENSE
[1962219.029072] sd 4:0:0:0: [sdb] tag#0 Sense Key : Hardware Error [current] [descriptor]
[1962219.029074] sd 4:0:0:0: [sdb] tag#0 Add. Sense: No additional sense information
[1962219.029077] sd 4:0:0:0: [sdb] tag#0 CDB: ATA command pass through(16) 85 06 20 00 00 00 00 00 00 00 00 00 00 00 e5 00
[1962219.111836] sd 4:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_SENSE
[1962219.111839] sd 4:0:0:0: [sdb] tag#0 Sense Key : Hardware Error [current] [descriptor]
[1962219.111841] sd 4:0:0:0: [sdb] tag#0 Add. Sense: No additional sense information
[1962219.111844] sd 4:0:0:0: [sdb] tag#0 CDB: ATA command pass through(12)/Blank a1 06 20 da 00 00 4f c2 00 b0 00 00
[1962251.479639] JBD2: Invalid checksum recovering block 144 in log
[1962251.574260] JBD2: recovery failed
[1962251.574265] EXT4-fs (dm-4): error loading journal
[1962258.650930] JBD2: Invalid checksum recovering block 144 in log
[1962258.749283] JBD2: recovery failed
[1962258.749288] EXT4-fs (dm-4): error loading journal
```

This generally means the filesystem was not cleanly unmounted, and the checksum recorded for block 144 of the journal does not match the checksum calculated from the data on disk. Since the journal cannot be replayed, ext4 refuses to mount the device-mapper device (dm-4) rather than risk further corruption.
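To pull just the relevant lines out of a noisy kernel log, a small filter can help. This is a minimal sketch: the function name `journal_errors` is hypothetical, and the patterns are simply the ones that appear in the log above, not an exhaustive list of ext4 or SCSI errors.

```bash
#!/usr/bin/env bash
# journal_errors (hypothetical helper): read kernel log text on stdin and keep only
# the ext4 journal lines (JBD2 / EXT4-fs) and the SCSI error lines seen in the log above.
journal_errors() {
  grep -E 'JBD2:|EXT4-fs \(|Sense Key|FAILED Result'
}

# Usage (reading dmesg may require root on some systems):
#   dmesg | journal_errors
```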

So how do we fix this? If the filesystem were not encrypted, you could search for this error, learn that you need to run fsck, and hopefully be done with it.

Let's start with a quick reminder of the `lsblk` utility:

```
root@CLCFQ92:~# lsblk --fs /dev/sdb
NAME                                          FSTYPE      LABEL  UUID
sdb
└─sdb1                                        crypto_LUKS        ce6cebbc-5026-4f47-9a22-da4aecfd26ad
  └─luks-ce6cebbc-5026-4f47-9a22-da4aecfd26ad ext4        Backup e7bab3dc-87ce-4a7d-b758-34e2839b51f0
```

The thing to notice is that the filesystem type for sdb1 is NOT ext4; it is crypto_LUKS. Nested under it is a device-mapper device named luks-ce6cebbc-5026-4f47-9a22-da4aecfd26ad, and that is the one holding the ext4 filesystem (label "Backup"). This decrypted luks- device is the one you want to run fsck on, NOT /dev/sdb1 (or /dev/sdb).
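When the desktop has already unlocked the drive, as the luks- entry in the lsblk output shows, the decrypted device can be located from lsblk's TYPE column, which reads "crypt" for dm-crypt mappings. The sketch below assumes raw lsblk output (`-n` for no header, `-r` for raw, `-o NAME,TYPE,FSTYPE`); `crypt_mappings` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# crypt_mappings (hypothetical helper): read `lsblk -nr -o NAME,TYPE,FSTYPE` output on
# stdin and print the /dev/mapper path of each dm-crypt mapping with its filesystem type.
crypt_mappings() {
  # TYPE is the second field and is never empty, so splitting on whitespace is safe;
  # FSTYPE may be empty, in which case $3 is blank.
  awk '$2 == "crypt" { print "/dev/mapper/" $1, $3 }'
}

# Usage:
#   lsblk -nr -o NAME,TYPE,FSTYPE /dev/sdb | crypt_mappings
```

The printed path is the device to hand to fsck, as long as it is not mounted.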

So let's first open the encrypted filesystem, mapping it to the name "corrupted":

```
root@CLCFQ92:~# cryptsetup luksOpen /dev/sdb1 corrupted
Enter passphrase for /dev/sdb1:
```

Now, let's repair the filesystem through the name ("corrupted") we mapped to the LUKS partition:

```
root@CLCFQ92:~# fsck /dev/mapper/corrupted
fsck from util-linux 2.29.2
e2fsck 1.43.4 (31-Jan-2017)
Backup: recovering journal
JBD2: Invalid checksum recovering block 144 in log
Journal checksum error found in Backup
Backup was not cleanly unmounted, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (436669919, counted=433657941).
Fix? yes
Free inodes count wrong (121861163, counted=121856397).
Fix? yes

Backup: ***** FILE SYSTEM WAS MODIFIED *****
Backup: 245363/122101760 files (0.4% non-contiguous), 54719555/488377496 blocks
```
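fsck also reports its result through its exit status, which is a bitwise OR of the flags documented in fsck(8): 1 = errors corrected, 2 = system should be rebooted, 4 = errors left uncorrected, 8 = operational error, 16 = usage or syntax error, 32 = cancelled by the user, 128 = shared-library error. A run like the one above, which fixed the free block and inode counts, should normally exit with 1. The decoder below is a sketch; `describe_fsck_status` is a hypothetical name.

```bash
#!/usr/bin/env bash
# describe_fsck_status (hypothetical helper): turn fsck's exit status into readable text.
# The status is a bitwise OR of the flags listed in fsck(8).
describe_fsck_status() {
  local status="$1" i out=""
  local -a bits=(1 2 4 8 16 32 128)
  local -a names=("errors corrected" "reboot needed" "errors left uncorrected"
                  "operational error" "usage or syntax error" "cancelled by user"
                  "shared-library error")
  if [ "$status" -eq 0 ]; then
    echo "no errors"
    return 0
  fi
  # Report every flag that is set, in ascending order, separated by commas.
  for i in "${!bits[@]}"; do
    if (( status & bits[i] )); then
      out+="${out:+, }${names[i]}"
    fi
  done
  echo "$out"
}

# Usage:
#   fsck /dev/mapper/corrupted; describe_fsck_status $?
```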

Next, close the mapping to the LUKS partition:

```
root@CLCFQ92:~# cryptsetup luksClose /dev/mapper/corrupted
```

Then unlock and mount the filesystem the way you normally would, using GNOME or whatever you prefer. Check the dmesg log for confirmation!
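The open, fsck and close steps can be collected into a single function. This is a sketch rather than a tested tool: `repair_luks` and the `DRY_RUN` switch are hypothetical names, it assumes it runs as root with the filesystem unmounted, and it defaults to the same mapping name "corrupted" used above.

```bash
#!/usr/bin/env bash
# Sketch of the manual steps: unlock, fsck the decrypted mapping, lock again.
# Assumes root privileges and that the filesystem is not mounted anywhere.

# run (hypothetical helper): print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# repair_luks (hypothetical helper): $1 = LUKS partition, $2 = temporary mapping name.
repair_luks() {
  local part="$1" name="${2:-corrupted}" status=0
  run cryptsetup luksOpen "$part" "$name" || return   # prompts for the passphrase
  # Check the decrypted device, never the raw crypto_LUKS partition.
  run fsck "/dev/mapper/$name" || status=$?
  # Close the mapping even if fsck reported problems, then pass fsck's status on.
  run cryptsetup luksClose "$name"
  return "$status"
}

# Usage:
#   DRY_RUN=1 repair_luks /dev/sdb1   # only prints the three commands
#   repair_luks /dev/sdb1             # actually runs them
```

Closing the mapping before returning means a failed check never leaves the decrypted device open behind you.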

At this point it is worth checking the drive with the `smartctl` utility, as the disk may be old or dying:

```
root@CLCFQ92:~# smartctl -a /dev/sdb
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 118 100 006 Pre-fail Always - 200743680
3 Spin_Up_Time 0x0003 094 093 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 72
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 073 063 030 Pre-fail Always - 24115411
9 Power_On_Hours 0x0032 077 077 000 Old_age Always - 20273
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 39
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 053 043 045 Old_age Always In_the_past 47 (Min/Max 42/47 #179)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 19
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 767
194 Temperature_Celsius 0x0022 047 057 000 Old_age Always - 47 (0 19 0 0 0)
195 Hardware_ECC_Recovered 0x001a 022 021 000 Old_age Always - 200743680
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 12697 (213 75 0)
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 3453343066
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 2807657092
```

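Based on the SMART output, this disk is showing its age: it has over 20,000 power-on hours, and Airflow_Temperature_Cel is flagged In_the_past, meaning the drive has run hotter than its threshold at some point. (On Seagate drives, the large raw values for Raw_Read_Error_Rate and Seek_Error_Rate are vendor-encoded counters rather than literal error counts.) It should be backed up and disposed of.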
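To skim a long attribute table for the entries that usually matter, a filter like the following can help. It is a sketch under stated assumptions: `smart_warnings` is a hypothetical name, it assumes smartctl's ten-column attribute layout shown above, and treating IDs 5, 187, 197 and 198 (reallocated, reported-uncorrectable, pending and offline-uncorrectable sectors) as the ones to watch is a common rule of thumb, not something smartctl defines.

```bash
#!/usr/bin/env bash
# smart_warnings (hypothetical helper): read `smartctl -A` output on stdin and print
# attributes that have a WHEN_FAILED entry, or a non-zero raw value for the
# sector-health IDs 5, 187, 197 and 198.
smart_warnings() {
  awk '
    # Attribute rows start with a numeric ID; columns: $9 = WHEN_FAILED, $10 = RAW_VALUE.
    $1 ~ /^[0-9]+$/ && NF >= 10 {
      if ($9 != "-")
        print $2 ": WHEN_FAILED=" $9
      else if (($1 == 5 || $1 == 187 || $1 == 197 || $1 == 198) && $10 != "0")
        print $2 ": raw=" $10
    }'
}

# Usage:
#   smartctl -A /dev/sdb | smart_warnings
```

On the table above, the only line it should print is the Airflow_Temperature_Cel entry.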


Monday, May 21, 2018

Repairing Corrupted LUKS Encrypted Filesystem