NVMe over IB, kernel page allocation error followed by call trace is observed in messages log
Applies to
- E-Series Storage Array
- NVMe over IB protocol
- RHEL 7.x
Issue
A call trace similar to below may be observed in /var/log/messages:
Feb 11 22:45:09 ictm1604s01h2 kernel: kworker/u32:1: page allocation failure: order:9, mode:0x80d0
Feb 11 22:45:09 ictm1604s01h2 kernel: CPU: 12 PID: 1916 Comm: kworker/u32:1 Kdump: loaded Tainted: G ------------ T 3.10.0-1001.el7.x86_64 #1
Feb 11 22:45:09 ictm1604s01h2 kernel: Workqueue: ib_addr process_one_req [ib_core]
Feb 11 22:45:09 ictm1604s01h2 kernel: Call Trace:
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffbb968c1b>] dump_stack+0x19/0x1b
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffbb3be230>] warn_alloc_failed+0x110/0x180
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffbb96420d>] __alloc_pages_slowpath+0x6b6/0x724
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffbb3c28e4>] __alloc_pages_nodemask+0x404/0x420
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffbb23217f>] dma_generic_alloc_coherent+0x8f/0x140
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffbb26b981>] x86_swiotlb_alloc_coherent+0x21/0x50
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffc0751ee4>] mlx5_dma_zalloc_coherent_node+0xb4/0x110 [mlx5_core]
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffc075234d>] mlx5_buf_alloc_node+0x4d/0xc0 [mlx5_core]
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffc07523d4>] mlx5_buf_alloc+0x14/0x20 [mlx5_core]
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffc0aadadd>] create_kernel_qp.isra.63+0x43a/0x73e [mlx5_ib]
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffbb45b956>] ? _d_rehash+0x36/0x40
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffbb41def5>] ? kmem_cache_alloc+0x35/0x1f0
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffbb45fb71>] ? alloc_inode+0x51/0xa0
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffc0a9d6c7>] create_qp_common+0x7b7/0x13a0 [mlx5_ib]
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffbb4dddd3>] ? __create_file+0xf3/0x2c0
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffbb41e286>] ? kmem_cache_alloc_trace+0x1d6/0x200
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffc0a9e3fa>] mlx5_ib_create_qp+0x14a/0x830 [mlx5_ib]
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffbb41cf0f>] ? __slab_free+0x9f/0x320
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffc074b60b>] ? mlx5_debug_cq_add+0x4b/0x70 [mlx5_core]
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffc07504ae>] ? mlx5_core_create_cq+0x1ce/0x250 [mlx5_core]
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffbb3d79c5>] ? kvfree+0x35/0x40
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffc03abb3b>] ib_create_qp+0x8b/0x360 [ib_core]
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffc040d974>] rdma_create_qp+0x34/0xb0 [rdma_cm]
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffc03f4ba0>] nvme_rdma_create_qp.constprop.54+0x80/0xb0 [nvme_rdma]
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffc03f4870>] ? nvme_rdma_memreg_done+0x30/0x30 [nvme_rdma]
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffc03f775d>] nvme_rdma_cm_handler+0x1bd/0x799 [nvme_rdma]
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffc040e2cb>] ? cma_acquire_dev_by_src_ip+0x14b/0x230 [rdma_cm]
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffc0410b7a>] addr_handler+0xaa/0x1d0 [rdma_cm]
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffc03b6cbc>] process_one_req+0x3c/0x130 [ib_core]
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffbb2ba0af>] process_one_work+0x17f/0x440
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffbb2bad76>] worker_thread+0x126/0x3c0
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffbb2bac50>] ? manage_workers.isra.25+0x2a0/0x2a0
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffbb2c2011>] kthread+0xd1/0xe0
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffbb2c1f40>] ? insert_kthread_work+0x40/0x40
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffbb97bcb7>] ret_from_fork_nospec_begin+0x21/0x21
Feb 11 22:45:09 ictm1604s01h2 kernel: [<ffffffffbb2c1f40>] ? insert_kthread_work+0x40/0x40
Feb 11 22:45:09 ictm1604s01h2 kernel: Mem-Info:
Feb 11 22:45:09 ictm1604s01h2 kernel: active_anon:77845 inactive_anon:42916 isolated_anon:0#012 active_file:1775552 inactive_file:388096 isolated_file:0#012 unevictable:4454 dirty:393 writeback:0 unstable:0#012 slab_reclaimable:77384 slab_unreclaimable:72099#012 mapped:38104 shmem:19155 pagetables:3123 bounce:0#012 free:860445 free_pcp:64 free_cma:0
Feb 11 22:45:09 ictm1604s01h2 kernel: Node 1 DMA free:15868kB min:68kB low:84kB high:100kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15984kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Feb 11 22:45:09 ictm1604s01h2 kernel: lowmem_reserve[]: 0 1297 15360 15360
Feb 11 22:45:09 ictm1604s01h2 kernel: Node 1 DMA32 free:234676kB min:5700kB low:7124kB high:8548kB active_anon:33936kB inactive_anon:9424kB active_file:665196kB inactive_file:131784kB unevictable:792kB isolated(anon):0kB isolated(file):0kB present:1918164kB managed:1328308kB mlocked:792kB dirty:184kB writeback:0kB mapped:9176kB shmem:2668kB slab_reclaimable:27688kB slab_unreclaimable:25880kB kernel_stack:1024kB pagetables:1224kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Feb 11 22:45:09 ictm1604s01h2 kernel: lowmem_reserve[]: 0 0 14063 14063
Feb 11 22:45:09 ictm1604s01h2 kernel: Node 1 Normal free:3191236kB min:61812kB low:77264kB high:92716kB active_anon:277444kB inactive_anon:162240kB active_file:6437012kB inactive_file:1420600kB unevictable:17024kB isolated(anon):0kB isolated(file):0kB present:14680064kB managed:14400640kB mlocked:17036kB dirty:1388kB writeback:0kB mapped:143240kB shmem:73952kB slab_reclaimable:281848kB slab_unreclaimable:262484kB kernel_stack:10464kB pagetables:11268kB unstable:0kB bounce:0kB free_pcp:256kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Feb 11 22:45:09 ictm1604s01h2 kernel: lowmem_reserve[]: 0 0 0 0
Feb 11 22:45:09 ictm1604s01h2 kernel: Node 1 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15868kB
Feb 11 22:45:09 ictm1604s01h2 kernel: Node 1 DMA32: 595*4kB (UEM) 434*8kB (UEM) 160*16kB (UEM) 94*32kB (UEM) 63*64kB (UEM) 44*128kB (UEM) 15*256kB (UM) 10*512kB (UEM) 198*1024kB (M) 1*2048kB (M) 0*4096kB = 234844kB
Feb 11 22:45:09 ictm1604s01h2 kernel: Node 1 Normal: 1098*4kB (UEM) 1998*8kB (UEM) 417*16kB (UEM) 780*32kB (UEM) 1219*64kB (UM) 1015*128kB (UM) 828*256kB (UEM) 813*512kB (UEM) 2247*1024kB (UM) 2*2048kB (M) 0*4096kB = 3193192kB
Feb 11 22:45:09 ictm1604s01h2 kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Feb 11 22:45:09 ictm1604s01h2 kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Feb 11 22:45:09 ictm1604s01h2 kernel: 2184655 total pagecache pages
Feb 11 22:45:09 ictm1604s01h2 kernel: 0 pages in swap cache
Feb 11 22:45:09 ictm1604s01h2 kernel: Swap cache stats: add 0, delete 0, find 0/0
Feb 11 22:45:09 ictm1604s01h2 kernel: Free swap = 4194300kB
Feb 11 22:45:09 ictm1604s01h2 kernel: Total swap = 4194300kB
Feb 11 22:45:09 ictm1604s01h2 kernel: 4153553 pages RAM
Feb 11 22:45:09 ictm1604s01h2 kernel: 0 pages HighMem/MovableOnly
Feb 11 22:45:09 ictm1604s01h2 kernel: 217341 pages reserved