问题
在使用apisix中,我们有一个java plugin做加解密,命令行配置如下:
ext-plugin:
cmd: ['java', '-jar', '-XX:+HeapDumpOnOutOfMemoryError' ,'-XX:HeapDumpPath=/usr/local/apisix/logs/apisixrunner_oomdump.hprof','-XX:ErrorFile=/usr/local/apisix/logs/apisixrunner_hs_error_dump.log', '-Xmx1g', '-Xms512m', '-Dspring.profiles.active=develop', '/usr/apisix/apisix-runner-bin/apisix-java-plugin-runner.jar']
但是发现这个jvm偶尔会自动重启。遂开始调查。
调查步骤
1,先对比了下java日志,发现没有异常,就是自动重启(上下并无明显异常):
[]06-17 04:30:38.540.+0000[] [INFO ][main][o.a.a.p.r.PluginRunnerApplication.logStarted:61]Started PluginRunnerApplication in 3.652 seconds (JVM running for 5.473)
2,查看是否有oom日志,上面的heapdump中没有hprof生成。
3,问AI APISIX用lua写的,所以问问ai它是如何管理插件系统的:
于是过滤apisix容器日志,发现真的有respawn相关日志:

可以看到确实是被kill -9了,
4,于是查看为啥被kill了,使用的如下命令:
# 检查 OOM Killer 详细日志
dmesg -T | grep -i "out of memory\|killed process" -B20 -A20 # 显示上下文
查看到如下内容:
[Tue Jun 17 05:51:40 2025] [61337] 99 61337 286571 12845 81 0 958 openresty
[Tue Jun 17 05:51:40 2025] [60265] 99 60265 286378 12718 81 0 958 openresty
[Tue Jun 17 05:51:40 2025] [42939] 99 42939 284377 10724 75 0 958 openresty
[Tue Jun 17 05:51:40 2025] [64084] 0 64084 3816 259 13 0 958 bash
[Tue Jun 17 05:51:40 2025] [ 596] 0 596 1395294 133765 380 0 958 java
[Tue Jun 17 05:51:40 2025] [54493] 0 54493 1954 99 10 0 958 tail
[Tue Jun 17 05:51:40 2025] Memory cgroup out of memory: Kill process 59314 (epollEventLoopG) score 1213 or sacrifice child
[Tue Jun 17 05:51:40 2025] Killed process 596 (java), UID 0, total-vm:5581176kB, anon-rss:524188kB, file-rss:10872kB, shmem-rss:0kB
[Tue Jun 17 05:51:40 2025] openresty invoked oom-killer: gfp_mask=0x50, order=0, oom_score_adj=958
[Tue Jun 17 05:51:40 2025] openresty cpuset=docker-a71969200f6221395c742fe44dd198730358fe11d1a668980c0eb7410e42a79d.scope mems_allowed=0
[Tue Jun 17 05:51:40 2025] CPU: 6 PID: 53681 Comm: openresty Kdump: loaded Tainted: G ------------ T 3.10.0-1160.el7.x86_64 #1
[Tue Jun 17 05:51:40 2025] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
[Tue Jun 17 05:51:40 2025] Call Trace:
[Tue Jun 17 05:51:40 2025] [<ffffffff9f381340>] dump_stack+0x19/0x1b
[Tue Jun 17 05:51:40 2025] [<ffffffff9f37bc60>] dump_header+0x90/0x229
[Tue Jun 17 05:51:40 2025] [<ffffffff9edc1b26>] ? find_lock_task_mm+0x56/0xc0
[Tue Jun 17 05:51:40 2025] [<ffffffff9edc208d>] oom_kill_process+0x2cd/0x490
[Tue Jun 17 05:51:40 2025] [<ffffffff9ee40cdc>] mem_cgroup_oom_synchronize+0x55c/0x590
[Tue Jun 17 05:51:40 2025] [<ffffffff9ee40140>] ? mem_cgroup_charge_common+0xc0/0xc0
[Tue Jun 17 05:51:40 2025] [<ffffffff9edc2974>] pagefault_out_of_memory+0x14/0x90
[Tue Jun 17 05:51:40 2025] [<ffffffff9f37a16c>] mm_fault_error+0x6a/0x157
[Tue Jun 17 05:51:40 2025] [<ffffffff9f38e8d1>] __do_page_fault+0x491/0x500
[Tue Jun 17 05:51:40 2025] [<ffffffff9f38ea26>] trace_do_page_fault+0x56/0x150
[Tue Jun 17 05:51:40 2025] [<ffffffff9f38dfa2>] do_async_page_fault+0x22/0xf0
[Tue Jun 17 05:51:40 2025] [<ffffffff9f38a7a8>] async_page_fault+0x28/0x30
[Tue Jun 17 05:51:40 2025] Task in /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod6962c5d3_918f_425a_b560_2b977138f026.slice/docker-a71969200f6221395c742fe44dd198730358fe11d1a668980c0eb7410e42a79d.scope killed as a result of limit of /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod6962c5d3_918f_425a_b560_2b977138f026.slice
[Tue Jun 17 05:51:40 2025] memory: usage 2097152kB, limit 2097152kB, failcnt 564269
[Tue Jun 17 05:51:40 2025] memory+swap: usage 2097152kB, limit 9007199254740988kB, failcnt 0
这里面有看到kill的原因是因为 Memory cgroup out of memory, 然后选择牺牲掉子进程(也就是java)。
可以看到该任务是k8s启动,于是联想到k8s的request resource limit.
5, 解决办法:可以修改apisix的resource limit 或者调低 java的xms和xmx。
总结
1,善用ai,尤其是deepwiki这种,可以给不太熟悉的代码仓库一些大的逻辑参考。
2,oom killer grep command for dmesg