This is a general guide to detecting and debugging kernel-space memory leaks. Since drivers and firmware vary widely between real products, this post cannot cover product-specific issues.
1. Detect Memory Leak
Memory leaks can be detected by monitoring free memory periodically. The free command shows rough memory usage; for a more detailed analysis, use cat /proc/meminfo and cat /proc/slabinfo. /proc/meminfo reports total memory, free memory, total highmem, free highmem, total lowmem, free lowmem, etc. (the HighTotal, HighFree, LowTotal and LowFree fields appear only on kernels built with CONFIG_HIGHMEM, typically 32-bit ones). Usually highmem holds user-space memory, while lowmem holds kernel-space memory; slab objects, for instance, are always allocated from lowmem. If free highmem (HighFree) or free lowmem (LowFree) keeps decreasing, in most cases it indicates a user-space or kernel-space memory leak, respectively.
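To track a single field such as LowFree over time, rather than dumping the whole file, a minimal sketch is shown below. The function name meminfo_field is hypothetical; it assumes the standard "Name:   value kB" layout of /proc/meminfo and accepts an optional file argument so that saved snapshots can be parsed too.

```shell
#!/bin/sh
# meminfo_field NAME [FILE]   (hypothetical helper)
# Prints the numeric value (in kB) of one /proc/meminfo field, e.g. LowFree.
# FILE defaults to /proc/meminfo; a saved copy of it works as well.
# Prints nothing if the field is absent (LowFree/HighFree need CONFIG_HIGHMEM).
meminfo_field() {
    awk -v key="$1:" '$1 == key { print $2; exit }' "${2:-/proc/meminfo}"
}
```

For example, meminfo_field LowFree prints the current free lowmem in kB. On kernels without highmem, the same call works for MemFree or SUnreclaim.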
Once a leak is detected, we need to determine which slab cache(s) are leaking. This can be done by monitoring /proc/slabinfo (reading it normally requires root). If a cache's number of active objects (column 2) or total objects (column 3) keeps increasing, that cache is very likely leaking memory.
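Comparing two snapshots by hand is tedious, so here is a minimal sketch that diffs two saved copies of /proc/slabinfo and lists the caches whose active object count (column 2) grew. The helper name slab_growth is hypothetical; it assumes the usual slabinfo layout, with a "slabinfo - version" line and a "#" header line followed by one line per cache.

```shell
#!/bin/sh
# slab_growth OLD NEW   (hypothetical helper)
# OLD and NEW are saved copies of /proc/slabinfo. For every cache whose
# active_objs (column 2) increased, prints "name old new delta",
# largest delta first.
slab_growth() {
    awk '
        /^slabinfo/ || /^#/ { next }          # skip the version and header lines
        NR == FNR { old[$1] = $2 + 0; next }  # first file: remember active_objs
        ($1 in old) && $2 + 0 > old[$1] {     # second file: report growth only
            print $1, old[$1], $2 + 0, $2 - old[$1]
        }
    ' "$1" "$2" | sort -k4,4nr
}
```

Usage: cat /proc/slabinfo > slab.0, wait a while, cat /proc/slabinfo > slab.1, then slab_growth slab.0 slab.1. Caches that appear at the top across several consecutive comparisons are the prime suspects.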
The following script periodically logs /proc/meminfo, /proc/slabinfo, per-process memory usage and, if available, the kmemleak report:
#!/bin/sh
# Rotate the log once it grows beyond MAX_SIZE bytes (about 20 MB)
MAX_SIZE=20000000
MON_FILE=/path/to/monitor_output_$(uname -n)
kmemleak=/sys/kernel/debug/kmemleak
while true
do
    date >> "$MON_FILE"
    cat /proc/meminfo >> "$MON_FILE"
    echo "----------------------" >> "$MON_FILE"
    # /proc/slabinfo is normally readable only by root
    cat /proc/slabinfo >> "$MON_FILE"
    echo "----------------------" >> "$MON_FILE"
    ps -o pid,comm,stat,time,rss,vsz >> "$MON_FILE"
    echo "++++++++++++++++++++++" >> "$MON_FILE"
    fsize=$(wc -c < "$MON_FILE")
    if [ "$fsize" -gt "$MAX_SIZE" ]; then
        # rotate: tag the full log with the current uptime in seconds
        suffix=$(cut -d" " -f1 /proc/uptime)
        mv "$MON_FILE" "$MON_FILE.$suffix"
        # upload $MON_FILE.$suffix here (e.g. to a TFTP server or the cloud)
    fi
    # save the current kmemleak report, if kmemleak is available
    [ -r "$kmemleak" ] && cat "$kmemleak" > /storage/kmemleak_$(uname -n).out
    # sample every 5 minutes
    sleep 300
done
After monitoring continuously for several hours, simply grepping for keywords (e.g. HighFree, LowFree, kmalloc-192) reveals the trend of memory usage. The following data was extracted from a real memory leak monitoring log:
LowFree: 599392 kB
LowFree: 571072 kB
LowFree: 544484 kB
LowFree: 516832 kB
LowFree: 489232 kB
LowFree: 462280 kB
LowFree: 433680 kB
LowFree: 405244 kB
LowFree: 378572 kB
LowFree: 350136 kB
LowFree: 322648 kB
LowFree: 295824 kB
LowFree: 267532 kB
LowFree: 238272 kB
LowFree: 210856 kB
LowFree: 181148 kB
LowFree: 153652 kB
LowFree: 123148 kB
LowFree: 94548 kB
LowFree: 94548 kB
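Eyeballing a long column of numbers is error-prone, so the sketch below prints each sample of a meminfo field together with its change from the previous sample. The helper name free_trend is hypothetical; it assumes the log contains raw /proc/meminfo lines, as produced by the monitoring script above.

```shell
#!/bin/sh
# free_trend FIELD LOG   (hypothetical helper)
# Extracts every "FIELD: value kB" line from LOG and prints each value,
# followed by its change from the previous sample.
free_trend() {
    grep "^$1:" "$2" | awk '
        { printf "%s kB", $2 }
        NR > 1 { printf " (delta %d kB)", $2 - prev }
        { printf "\n"; prev = $2 }
    '
}
```

For example, free_trend LowFree monitor_output_myhost. In the first run of samples above, LowFree falls from 599392 kB to 123148 kB in 17 steps, roughly 28,000 kB per sample on average; a steady drop like that, rather than a one-off dip, is the signature of a leak.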
[update 02/14/2016] To plot the trends of free/unreclaimed memory and slab allocation, a tool, memleak_plot.py, was developed to visualize the memory leak.
2. Debug Memory Leak
After identifying the leaking slab, more information can be gathered by tracing it. If the kernel was built with CONFIG_SLUB_DEBUG, the simplest way is to run echo 1 > /sys/kernel/slab/<leaking_slab>/trace. The kernel then prints a trace, with a backtrace, to the console for each allocation and free in that slab:
[ 375.201468] TRACE kmalloc-4096 alloc 0xe6be6000 inuse=8 fp=0x (null)
[ 375.207872] Backtrace:
[ 375.210309] [<c0012378>] (dump_backtrace+0x0/0x114) from [<c03a0a5c>] (dump_stack+0x18/0x1c)
[ 375.218712] r6:ef300480 r5:e6be6000 r4:c0d40c00 r3:c07004c4
[ 375.224367] [<c03a0a44>] (dump_stack+0x0/0x1c) from [<c03a2738>] (alloc_debug_processing+0xc8/0x164)
[ 375.233489] [<c03a2670>] (alloc_debug_processing+0x0/0x164) from [<c03a2d0c>] (__slab_alloc.isra.50.constprop.56+0x538/0x5dc)
[ 375.244767] r7:80080008 r6:00080007 r5:e6be6000 r4:c0d40c00
[ 375.250421] [<c03a27d4>] (__slab_alloc.isra.50.constprop.56+0x0/0x5dc) from [<c00e872c>] (__kmalloc_track_caller+0xbc/0x190)
[ 375.261605] [<c00e8670>] (__kmalloc_track_caller+0x0/0x190) from [<c031f96c>] (__alloc_skb+0x58/0xf4)
[ 375.270790] [<c031f914>] (__alloc_skb+0x0/0xf4) from [<c03200d4>] (dev_alloc_skb+0x40/0x64)
[ 375.279131] [<c0320094>] (dev_alloc_skb+0x0/0x64) from [<bf6bb504>] (__adf_nbuf_alloc+0x24/0xa4 [adf])
[ 375.288409] r4:ea5c8600 r3:00000004
[ 375.292158] [<bf6bb4e0>] (__adf_nbuf_alloc+0x0/0xa4 [adf]) from [<bf8d5f40>] (htt_rx_ring_fill_n+0x34/0x108 [umac])
[ 375.302405] r7:00000000 r6:000005b1 r5:ea5c8720 r4:ea5c8600
[ 375.308372] [<bf8d5f0c>] (htt_rx_ring_fill_n+0x0/0x108 [umac]) from [<bf8d6888>] (htt_rx_msdu_buff_replenish+0x54/0x6c [umac])
[ 375.319400] r8:bf927c04 r7:eaaee9c0 r6:eb5b3c00 r5:ea5c8720 r4:ea5c8600
[ 375.326429] [<bf8d6834>] (htt_rx_msdu_buff_replenish+0x0/0x6c [umac]) from [<bf8c6b24>] (ol_rx_indication_handler+0x7bc/0x8cc [umac])
[ 375.338081] r5:ea5c8600 r4:00000000
[ 375.341955] [<bf8c6368>] (ol_rx_indication_handler+0x0/0x8cc [umac]) from [<bf8d770c>] (htt_t2h_msg_handler_fast+0xac/0x280 [umac])
[ 375.353764] [<bf8d7660>] (htt_t2h_msg_handler_fast+0x0/0x280 [umac]) from [<bf8c02dc>] (CE_per_engine_service_each+0x178/0x4b4 [umac])
[ 375.365823] [<bf8c0164>] (CE_per_engine_service_each+0x0/0x4b4 [umac]) from [<bf8c3634>] (ath_tasklet+0x68/0x128 [umac])
[ 375.376507] [<bf8c35cc>] (ath_tasklet+0x0/0x128 [umac]) from [<c0064478>] (tasklet_action+0xa0/0x11c)
[ 375.385567] r6:e8b88000 r5:c435ef44 r4:c435ef40
[ 375.390159] [<c00643d8>] (tasklet_action+0x0/0x11c) from [<c006495c>] (__do_softirq+0x140/0x34c)
[ 375.398937] [<c006481c>] (__do_softirq+0x0/0x34c) from [<c0064d38>] (do_softirq+0x4c/0x58)
[ 375.407185] [<c0064cec>] (do_softirq+0x0/0x58) from [<c0064dd0>] (local_bh_enable_ip+0x8c/0xcc)
[ 375.415870] r4:e8b88000 r3:0000004a
[ 375.419431] [<c0064d44>] (local_bh_enable_ip+0x0/0xcc) from [<c03aa68c>] (_raw_spin_unlock_bh+0x54/0x58)
[ 375.428865] r5:00000304 r4:e9662c00
[ 375.432427] [<c03aa638>] (_raw_spin_unlock_bh+0x0/0x58) from [<c038a47c>] (packet_poll+0xa4/0xe4)
[ 375.441299] [<c038a3d8>] (packet_poll+0x0/0xe4) from [<c0316e34>] (sock_poll+0x24/0x28)
[ 375.449265] r7:ea9ffe40 r6:00000000 r5:e8b89c4c r4:e8b89c04
[ 375.454920] [<c0316e10>] (sock_poll+0x0/0x28) from [<c00fc914>] (do_sys_poll+0x20c/0x3e8)
[ 375.463074] [<c00fc708>] (do_sys_poll+0x0/0x3e8) from [<c00fcbb0>] (sys_poll+0x64/0xd0)
[ 375.471071] [<c00fcb4c>] (sys_poll+0x0/0xd0) from [<c000e7c0>] (ret_fast_syscall+0x0/0x30)
[ 375.479318] r6:0007a120 r5:00000000 r4:6b3f60a0
The first line of each trace is either “TRACE kmalloc-4096 alloc” or “TRACE kmalloc-4096 free”, followed by the address of the object. So if the memory leak is fast, it is possible to capture every alloc/free event for the slab before the system runs out of memory. Then find the addresses that were allocated but never freed, analyze their call traces, and, hopefully, identify the problematic module or function.
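Matching allocs against frees by hand does not scale, so here is a minimal sketch that does the bookkeeping on a saved kernel log. The helper name unfreed_objects is hypothetical; it assumes each trace header has the form shown above ("TRACE <cache> alloc|free <address> ...") and that the log covers the whole tracing window.

```shell
#!/bin/sh
# unfreed_objects [LOG...]   (hypothetical helper)
# Reads a kernel log captured while /sys/kernel/slab/<cache>/trace was 1
# (stdin if no file is given). For every object that was traced as "alloc"
# but never as "free", prints "<address> <cache> line <N>", where N is the
# log line of its last alloc event; the backtrace follows that line.
unfreed_objects() {
    awk '
    {
        for (i = 1; i <= NF; i++) {
            if ($i != "TRACE") continue
            cache = $(i + 1); op = $(i + 2); addr = $(i + 3)
            key = cache " " addr
            if (op == "alloc") live[key] = NR       # object is now in use
            else if (op == "free") delete live[key] # object was released
            break
        }
    }
    END {
        for (key in live) {
            split(key, parts, " ")
            print parts[2], parts[1], "line", live[key]
        }
    }' "$@"
}
```

A possible workflow: echo 1 > /sys/kernel/slab/kmalloc-4096/trace, let the leak run for a while, echo 0 > /sys/kernel/slab/kmalloc-4096/trace, save the kernel log (e.g. with dmesg or from the serial console) to trace.log, then run unfreed_objects trace.log | sort. Objects allocated right before the capture ended may simply not have been freed yet, so look for backtraces that repeat across many entries.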
3. kmemleak
kmemleak is a kernel feature that detects potential memory leaks and reports the allocation call stack of each suspected leak. To build it into the kernel, enable CONFIG_DEBUG_KMEMLEAK under “Kernel hacking”. On some systems kmemleak may fail to work because the early log buffer is too small, so you may also need to set CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE to a larger value, e.g. 20000.
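Before flashing a debug image, it can help to confirm that the options actually made it into the configuration. The sketch below checks a kernel config file; the helper name kconfig_check is hypothetical, and it assumes the usual .config syntax ("CONFIG_FOO=value" or "# CONFIG_FOO is not set"). On a running target, the config can be read from /proc/config.gz only if the kernel was built with CONFIG_IKCONFIG_PROC. Note that on recent kernels the early-log option was replaced by CONFIG_DEBUG_KMEMLEAK_MEM_POOL_SIZE, so check whichever one exists in your tree.

```shell
#!/bin/sh
# kconfig_check CONFIG_FILE OPTION...   (hypothetical helper)
# Prints "OPTION=value" for each option, or "OPTION=not set" when the
# option is missing or commented out ("# CONFIG_FOO is not set").
kconfig_check() {
    cfg=$1
    shift
    for opt in "$@"; do
        val=$(grep "^$opt=" "$cfg" | cut -d= -f2-)
        echo "$opt=${val:-not set}"
    done
}
```

For example: kconfig_check .config CONFIG_DEBUG_FS CONFIG_DEBUG_KMEMLEAK CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE CONFIG_SLUB_DEBUG. The same call also confirms CONFIG_SLUB_DEBUG, which the slab tracing method above needs.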
kmemleak also relies on debugfs. To detect memory leaks automatically after boot, make sure debugfs is mounted at startup; for example, add the following line to /etc/init.d/S10boot:
grep -q debugfs /proc/filesystems && mount -t debugfs none /sys/kernel/debug
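To verify that the mount actually happened, and to find where debugfs lives if a different mount point was used, a small sketch can read /proc/mounts. The helper name debugfs_mountpoint is hypothetical; it assumes the standard /proc/mounts layout (device, mount point, filesystem type, ...).

```shell
#!/bin/sh
# debugfs_mountpoint [MOUNTS_FILE]   (hypothetical helper)
# Prints the first mount point of type debugfs listed in /proc/mounts
# (or in the given file). Returns 1 and prints nothing if none is found.
debugfs_mountpoint() {
    awk '$3 == "debugfs" { print $2; found = 1; exit }
         END { exit !found }' "${1:-/proc/mounts}"
}
```

For example, debugfs_mountpoint > /dev/null || mount -t debugfs none /sys/kernel/debug mounts debugfs only when it is not mounted yet; the kmemleak file is then the kmemleak entry under the printed directory.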
To check whether kmemleak is working, run cat /sys/kernel/debug/kmemleak. By default kmemleak scans memory every 10 minutes and prints the number of new unreferenced objects found. To trigger an immediate memory scan, run echo scan > /sys/kernel/debug/kmemleak.
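A kmemleak report can run to thousands of lines, so a quick summary helps to see whether one allocation size dominates. The sketch below counts suspected leaks per object size; the helper name kmemleak_summary is hypothetical, and it relies only on the "unreferenced object 0x... (size N):" header line that kmemleak prints for each suspect.

```shell
#!/bin/sh
# kmemleak_summary [REPORT]   (hypothetical helper)
# REPORT defaults to /sys/kernel/debug/kmemleak; a saved copy (such as the
# /storage/kmemleak_<host>.out file written by the monitoring script) also works.
# Prints "size N: count" per object size (in no particular order),
# then "total: count".
kmemleak_summary() {
    awk '/^unreferenced object/ {
            size = $NF                 # e.g. "4096):"
            gsub(/[^0-9]/, "", size)   # keep the digits only
            count[size]++; total++
         }
         END {
            for (s in count) print "size", s ":", count[s]
            print "total:", total + 0
         }' "${1:-/sys/kernel/debug/kmemleak}"
}
```

For example, run echo scan > /sys/kernel/debug/kmemleak, wait for the scan to finish, then kmemleak_summary. A size that keeps growing between scans, such as 4096 when kmalloc-4096 is the leaking slab, points to the backtraces worth reading first.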