NVMe disk going offline caused by NVMe power management

Failure log

Nov 26 07:20:12 node03 kernel: nvme nvme0: I/O 24 QID 0 timeout, reset controller
Nov 26 07:20:22 node03 kernel: nvme nvme0: I/O 146 QID 4 timeout, aborting
Nov 26 07:20:25 node03 kernel: nvme nvme0: I/O 719 QID 1 timeout, reset controller
Nov 26 07:21:37 node03 kernel: nvme nvme0: Device not ready; aborting reset
Nov 26 07:21:37 node03 kernel: nvme nvme0: Abort status: 0x7
Nov 26 07:21:37 node03 kernel: nvme nvme0: Abort status: 0x7
Nov 26 07:21:57 node03 kernel: nvme nvme0: Device not ready; aborting reset
Nov 26 07:21:57 node03 kernel: nvme nvme0: Removing after probe failure status: -19
Nov 26 07:21:57 node03 kernel: nvme0n1: detected capacity change from 4000787030016 to 0
Nov 26 07:21:59 node03 kernel: nvme nvme0: failed to set APST feature (-19)

The log shows that the disk went offline. The last message reports that the APST feature could not be set.
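
To look for the same signature on other nodes, the kernel log can be filtered for these messages. The following is a minimal sketch, assuming the kernel messages are available as text (for example from journalctl -k or /var/log/messages). The function name nvme_offline_signature is hypothetical, and the patterns are taken only from the log lines above.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel log text on stdin and prints only the
# lines that match the failure signature seen in the log above.
nvme_offline_signature() {
    grep -E 'nvme nvme[0-9]+: (I/O [0-9]+ QID [0-9]+ timeout|Device not ready; aborting reset|Removing after probe failure|failed to set APST feature)'
}

# Example usage (the log source is an assumption; adjust to your system):
#   journalctl -k | nvme_offline_signature
#   nvme_offline_signature < /var/log/messages
```

The filter prints nothing and returns a non-zero status when no matching line is found, so it can also be used in a condition.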

Background material

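Some NVMe devices can run into problems related to power saving (APST). This is a known issue for the Kingston A2000 [8] (as of firmware S5Z42105) and was previously reported for Samsung NVMe drives (Linux v4.10) [9] [10]. It has also been reported for some WesternDigital/Sandisk devices [11].
Related bug reports:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184
https://askubuntu.com/questions/905710/ext4-fs-error-after-ubuntu-17-04-upgrade/906105#906105
Kingston has offered firmware update 9 since March 2021. Because Kingston only supports Windows, downloads usable from Linux can be found via heise.de or github. As long as the kernel workaround is in place, the firmware update is not expected to make much difference, because the deepest power-saving state is never reached anyway.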

The fix below follows this page:
https://wiki.archlinux.org/title/Solid_state_drive/NVMe#Troubleshooting

Key points

The problem has a few trigger conditions:

  • A Samsung NVMe SSD (other vendors may be affected too; this one was confirmed here)
  • A 4.x kernel
[root@node27 ~]# uname -a
Linux node27 4.14.113-1.el7.x86_64 #1

The drive is a Samsung 990 PRO NVMe SSD.

Both conditions are met here, and the logged messages match the reported ones.
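
The kernel condition can be checked in a script rather than by eye. This is only a sketch: the helper name is_4x_kernel is hypothetical, and it tests nothing more than whether the release string starts with "4.".

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given kernel release string
# (as printed by `uname -r`) belongs to the 4.x series.
is_4x_kernel() {
    case "$1" in
        4.*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Example usage on a live system:
#   is_4x_kernel "$(uname -r)" && echo "4.x kernel, first condition met"
#   nvme list    # nvme-cli; the Model column shows the drive model
```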

Fix

vim drivers/nvme/host/core.c
static int nvme_configure_apst(struct nvme_ctrl *ctrl)
{
	/*
	 * APST (Autonomous Power State Transition) lets us program a
	 * table of power state transitions that the controller will
	 * perform automatically. We configure it with a simple
	 * heuristic: we are willing to spend at most 2% of the time
	 * transitioning between power states. Therefore, when running
	 * in any given state, we will enter the next lower-power
	 * non-operational state after waiting 50 * (enlat + exlat)
	 * microseconds, as long as that state's exit latency is under
	 * the requested maximum latency.
	 *
	 * We will not autonomously enter any non-operational state for
	 * which the total latency exceeds ps_max_latency_us. Users
	 * can set ps_max_latency_us to zero to turn off APST.
	 */

As the comment states, APST can be turned off by setting ps_max_latency_us to 0.

Where this power-management setting is controlled

[root@node65 ~]# cat  /sys/module/nvme_core/parameters/default_ps_max_latency_us
100000

This is the default value; it needs to be changed to 0.

[root@node65 ~]# nvme get-feature -f 0x0c -H /dev/nvme0n1|head -n 20
get-feature:0xc (Autonomous Power State Transition), Current value:0x000001
Autonomous Power State Transition Enable (APSTE): Enabled
Auto PST Entries .................
Entry[ 0]
.................
Idle Time Prior to Transition (ITPT): 200 ms
Idle Transition Power State (ITPS): 3
.................
Entry[ 1]
.................
Idle Time Prior to Transition (ITPT): 200 ms
Idle Transition Power State (ITPS): 3
.................
Entry[ 2]
.................
Idle Time Prior to Transition (ITPT): 200 ms
Idle Transition Power State (ITPS): 3
.................
Entry[ 3]
.................

The nvme get-feature output above shows the same thing from the device side: APST is currently enabled.
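
For scripted checks, the APSTE line can be pulled out of that output. A minimal sketch, assuming the human-readable output format shown above (it may differ between nvme-cli versions); the function name apste_status is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `nvme get-feature -f 0x0c -H` output on stdin
# and prints the APSTE state ("Enabled" or "Disabled" in the output above).
apste_status() {
    sed -n 's/.*(APSTE): *//p' | head -n 1
}

# Example usage (device path as used in this post):
#   nvme get-feature -f 0x0c -H /dev/nvme0n1 | apste_status
```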

Adjust the GRUB configuration

linuxefi /vmlinuz-4.14.113-1.el7.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet LANG=en_US.UTF-8 nvme_core.default_ps_max_latency_us=0

Append nvme_core.default_ps_max_latency_us=0 to the kernel command line as shown above, then reboot.
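
Instead of editing the boot entry by hand, the parameter can be added to the GRUB defaults file so that it survives kernel updates. This is a sketch under assumptions: the helper add_kernel_arg is hypothetical, it assumes a GRUB_CMDLINE_LINUX="..." line in a file such as /etc/default/grub, it relies on GNU sed for -i, and the grub.cfg output path in the comment depends on the distribution and on whether the system boots via EFI.

```shell
#!/bin/sh
# Hypothetical helper: append a kernel argument to GRUB_CMDLINE_LINUX in
# the given file unless that line already contains it.
add_kernel_arg() {
    file=$1
    arg=$2
    if grep '^GRUB_CMDLINE_LINUX=' "$file" | grep -qF -- "$arg"; then
        return 0    # already present, nothing to do
    fi
    # Insert the argument just before the closing quote of the line.
    sed -i "s|^GRUB_CMDLINE_LINUX=\"\(.*\)\"|GRUB_CMDLINE_LINUX=\"\1 $arg\"|" "$file"
}

# Example usage (run as root; paths are assumptions for a CentOS 7 EFI system):
#   add_kernel_arg /etc/default/grub nvme_core.default_ps_max_latency_us=0
#   grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg
#   reboot
```

Calling the helper a second time leaves the file unchanged, so it is safe to rerun.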

Verification

[root@node27 ~]# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-4.14.113-1.el7.x86_64 root=/dev/mapper/centos-root ro crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet LANG=en_US.UTF-8 nvme_core.default_ps_max_latency_us=0
[root@node27 ~]# cat  /sys/module/nvme_core/parameters/default_ps_max_latency_us
0
[root@node27 ~]# nvme get-feature -f 0x0c -H /dev/nvme0n1|head -n 20
get-feature:0xc (Autonomous Power State Transition), Current value:00000000
Autonomous Power State Transition Enable (APSTE): Disabled
Auto PST Entries .................
Entry[ 0]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................
Entry[ 1]
.................
Idle Time Prior to Transition (ITPT): 0 ms
Idle Transition Power State (ITPS): 0
.................

Any one of the checks above is enough to confirm the change.
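
The first two checks can be combined into one scripted test, which is handy when many nodes need to be verified. A minimal sketch: the function name apst_disabled is hypothetical, and the two optional path arguments exist only so the check can be tried against sample files.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only when the kernel command line carries
# the parameter and the nvme_core module parameter reads back as 0.
apst_disabled() {
    cmdline_file=${1:-/proc/cmdline}
    param_file=${2:-/sys/module/nvme_core/parameters/default_ps_max_latency_us}
    # -F: match the string literally, -w: reject values such as "=05000".
    grep -qwF 'nvme_core.default_ps_max_latency_us=0' "$cmdline_file" || return 1
    [ "$(cat "$param_file")" = "0" ]
}

# Example usage on a live system:
#   apst_disabled && echo "APST disabled" || echo "APST still enabled"
```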

Summary

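The drive firmware's power management does not work correctly with the kernel's APST support, and it is unclear which firmware version, if any, fixes it. The simplest workaround is to disable the feature on the kernel side with the parameter above. Many similar reports can be found online.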