Repairing sector errors on a disk

Simulating a bad block

[root@lab102 ~]# hdparm --yes-i-know-what-i-am-doing --make-bad-sector 5555  /dev/sdb

/dev/sdb:
Corrupting sector 5555 (WRITE_UNC_EXT as pseudo): succeeded

Note that this command deliberately injects a bad block. Remember to repair it afterwards; if you forget, the bad block stays on the disk.

Checking for bad blocks

(In my environment, one block = 2 sectors.)

[root@lab102 ~]# badblocks -v -s /dev/sdb 3333
Checking blocks 0 to 3333
Checking for bad blocks (read-only test): 277629% done, 0:06 elapsed. (0/0/0 errors)
277732% done, 0:08 elapsed. (1/0/0 errors)
277835% done, 0:10 elapsed. (2/0/0 errors)
277938% done, 0:12 elapsed. (3/0/0 errors)
done
Pass completed, 4 bad blocks found. (4/0/0 errors)

This check found four errors: exactly 4 blocks, that is, 8 damaged sectors.

The errors also show up in the kernel log:

[20831.254978] blk_update_request: critical medium error, dev sdb, sector 5552
[20831.257006] Buffer I/O error on dev sdb, logical block 694, async page read
[20833.271865] sd 0:0:18:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[20833.271882] sd 0:0:18:0: [sdb] tag#0 Sense Key : Medium Error [current]
[20833.271890] sd 0:0:18:0: [sdb] tag#0 Add. Sense: Unrecovered read error
[20833.271898] sd 0:0:18:0: [sdb] tag#0 CDB: Read(16) 88 00 00 00 00 00 00 00 15 b0 00 00 00 08 00 00
[20833.271905] blk_update_request: critical medium error, dev sdb, sector 5552
[20833.274074] Buffer I/O error on dev sdb, logical block 694, async page read
[20849.631237] sdb: sdb1

The log points at dev sdb, sector 5552. The bad block numbers that badblocks saved to the file zp are:
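The sector number can also be cross-checked against the CDB line in the log. The following is a minimal sketch that assumes the standard Read(16) layout (byte 0 is opcode 0x88, bytes 2 to 9 hold the LBA, bytes 10 to 13 hold the transfer length in sectors); `decode_read16` is a hypothetical helper name, not an existing tool.

```shell
# Decode the LBA and transfer length from a SCSI Read(16) CDB as printed by the kernel.
# Assumed layout: byte 0 opcode (88), byte 1 flags, bytes 2-9 LBA, bytes 10-13 length.
decode_read16() {
    # Split the CDB string into positional parameters, one per byte.
    set -- $1
    if [ "$#" -ne 16 ] || [ "$1" != "88" ]; then
        echo "not a Read(16) CDB" >&2
        return 1
    fi
    # $3..${10} are CDB bytes 2-9, ${11}..${14} are CDB bytes 10-13.
    lba_hex="$3$4$5$6$7$8$9${10}"
    len_hex="${11}${12}${13}${14}"
    printf 'lba=%d sectors=%d\n' "0x$lba_hex" "0x$len_hex"
}

decode_read16 "88 00 00 00 00 00 00 00 15 b0 00 00 00 08 00 00"
# prints: lba=5552 sectors=8
```

The failed request therefore started at sector 5552 and covered 8 sectors, which agrees with the 4 bad blocks of 2 sectors each reported by badblocks.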

[root@lab102 ~]# cat zp
2776
2777
2778
2779

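Blocks 2776 to 2779 map to sectors 5552 through 5559, so all of these are presumably damaged. We start by repairing the sector the kernel reported.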
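The block-to-sector mapping used here is simple arithmetic. As a rough sketch, assuming 2 sectors per block as in this environment (the function names are made up for illustration):

```shell
# Sectors per badblocks block; 2 matches the environment described in this post.
SECTORS_PER_BLOCK=2

# Print the first and last sector covered by one block number.
block_to_sectors() {
    first=$(( $1 * SECTORS_PER_BLOCK ))
    last=$(( first + SECTORS_PER_BLOCK - 1 ))
    echo "block $1 -> sectors $first-$last"
}

# Convert every block number read from standard input, one per line.
blocks_to_sectors() {
    while read -r block; do
        [ -n "$block" ] && block_to_sectors "$block"
    done
    return 0
}

printf '2776\n2777\n2778\n2779\n' | blocks_to_sectors
# block 2776 -> sectors 5552-5553
# block 2777 -> sectors 5554-5555
# block 2778 -> sectors 5556-5557
# block 2779 -> sectors 5558-5559
```

With a saved bad block list, the same function can be fed directly: `blocks_to_sectors < zp`.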

Repairing the bad block

[root@lab102 ~]# hdparm --yes-i-know-what-i-am-doing --repair-sector 5552 /dev/sdb

/dev/sdb:
re-writing sector 5552: succeeded

The repair reported success. Checking again then flagged sector 5553 as damaged:

[root@lab102 ~]# hdparm --yes-i-know-what-i-am-doing --read-sector 5553 /dev/sdb

/dev/sdb:
reading sector 5553: FAILED: Input/output error

The read confirms that this sector really does fail. Repair it and read it again:

[root@lab102 ~]# hdparm --yes-i-know-what-i-am-doing --repair-sector 5553 /dev/sdb

/dev/sdb:
re-writing sector 5553: succeeded
[root@lab102 ~]# hdparm --yes-i-know-what-i-am-doing --read-sector 5553 /dev/sdb

/dev/sdb:
reading sector 5553: succeeded
0000 0000 0000 0000 0000 0000 0000 0000

This confirms the repair worked: the sector can be read again.

[root@lab102 ~]# badblocks -v -s /dev/sdb 3333 -o zp
Checking blocks 0 to 3333
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found. (0/0/0 errors)

Once the repair is complete, badblocks no longer reports any errors.
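The read-then-repair steps above could be wrapped in a loop over a sector range. This is only a sketch: `repair_range` is a hypothetical helper, and it assumes hdparm exits with a non-zero status when a sector read or rewrite fails, which should be verified on your hdparm version before relying on it.

```shell
# Walk a sector range: read each sector and rewrite only the ones that fail to read.
# WARNING: --repair-sector overwrites the sector with zeros; the old contents are lost.
# Assumption: hdparm returns a non-zero exit status on a failed read or rewrite.
repair_range() {
    dev=$1
    sector=$2
    last=$3
    while [ "$sector" -le "$last" ]; do
        if hdparm --yes-i-know-what-i-am-doing --read-sector "$sector" "$dev" >/dev/null 2>&1; then
            echo "sector $sector: readable, left untouched"
        elif hdparm --yes-i-know-what-i-am-doing --repair-sector "$sector" "$dev" >/dev/null 2>&1; then
            echo "sector $sector: rewritten"
        else
            echo "sector $sector: repair failed" >&2
            return 1
        fi
        sector=$(( sector + 1 ))
    done
}

# Example (as root, only on the affected disk):
# repair_range /dev/sdb 5552 5559
```

Reading first means healthy sectors are never overwritten, so only data that was already unreadable is zeroed.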

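The repair here writes zeros to the sector. This avoids the case where a read error makes the data completely unreadable: software may crash outright because the file exists but cannot be read, and the file cannot even be deleted. Once the sector is zeroed, the file can be deleted. In other words, you lose the affected file instead of being left with something completely unusable. Weigh this trade-off before running the repair.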

Other notes

Checking a specific range

[root@lab102 ~]# badblocks -s -v /dev/sdb  2000 1000
Checking blocks 1000 to 2000
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found. (0/0/0 errors)

Finding the total number of sectors

[root@lab102 ~]# fdisk -l /dev/sdb
WARNING: fdisk GPT support is currently new, and therefore in an experimental phase. Use at your own discretion.

Disk /dev/sdb: 4000.8 GB, 4000787030016 bytes, 7814037168 sectors

Finding the number of blocks

[root@lab102 ~]# badblocks -s -v /dev/sdb
Checking blocks 0 to 3907018583

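From these two figures you can work out the ratio between blocks and sectors. When an error is located by sector, you can then compute the corresponding block position and check that part of the disk with badblocks. We did not use the badblocks repair method here: the repair shown above works at the sector level, which is somewhat better. badblocks serves here only as a checking tool.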
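As a worked example of that calculation, the sketch below derives the ratio from the fdisk and badblocks figures shown above and turns a sector range into a badblocks invocation. The function names are hypothetical, and the command is only printed, not executed.

```shell
# Figures taken from the fdisk and badblocks output above.
TOTAL_SECTORS=7814037168
LAST_BLOCK=3907018583

# badblocks numbers blocks from 0, so the block count is LAST_BLOCK + 1.
sectors_per_block() {
    echo $(( TOTAL_SECTORS / (LAST_BLOCK + 1) ))
}

# Map a sector number to the badblocks block that contains it.
sector_to_block() {
    echo $(( $1 / $(sectors_per_block) ))
}

# Print a badblocks command covering the blocks that hold sectors $2..$3 on device $1.
# badblocks expects the last block before the first block.
badblocks_cmd_for_sectors() {
    echo "badblocks -s -v $1 $(sector_to_block "$3") $(sector_to_block "$2")"
}

sectors_per_block
# prints: 2
badblocks_cmd_for_sectors /dev/sdb 5552 5559
# prints: badblocks -s -v /dev/sdb 2779 2776
```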