# Memory Used by Processes

There are a lot of different ways in which process memory can grow. The most interesting cases stem from a few common causes: process leaks (as in, you're leaking processes), specific processes leaking their memory, and so on. It's possible there's more than one cause, so multiple metrics are worth investigating. Note that the process count itself is skipped here, as it has been covered before.
Is the global process count indicative of a leak? If so, you may need to investigate unlinked processes, or peek inside supervisors' children lists to see what may be weird-looking.
Finding unlinked (and unmonitored) processes is easy to do with a few basic commands:
-----------------------------------------------------
1> [P || P <- processes(),
         [{_,Ls},{_,Ms}] <- [process_info(P, [links,monitors])],
         []==Ls, []==Ms].
-----------------------------------------------------
This will return a list of processes with neither. For supervisors, just fetching
supervisor:count_children(SupervisorPidOrName) and seeing what looks normal can
be a good pointer, as shown below.
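As a minimal sketch, assuming a supervisor registered under the hypothetical name myapp_sup, the call returns the number of child specs, active children, supervisors, and workers; an active count that keeps climbing between checks is a hint worth following:
-----------------------------------------------------
1> supervisor:count_children(myapp_sup).
[{specs,4},{active,4},{supervisors,1},{workers,3}]
-----------------------------------------------------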
The per-process memory model is briefly described in Subsection 7.3.2, but generally speaking, you can find which individual processes use the most memory by looking at their
memory attribute. You can look things up either in absolute terms or over a sliding window (a sliding-window sketch follows the output below).
For memory leaks, unless you're dealing with a predictable fast increase, absolute values are usually the ones worth digging into first:
-----------------------------------------------------
1> recon:proc_count(memory, 3).
[{<0.175.0>,325276504,
[myapp_stats,
{current_function,{gen_server,loop,6}},
{initial_call,{proc_lib,init_p,5}}]},
{<0.169.0>,73521608,
[myapp_giant_sup,
{current_function,{gen_server,loop,6}},
{initial_call,{proc_lib,init_p,5}}]},
{<0.72.0>,4193496,
[gproc,
{current_function,{gen_server,loop,6}},
{initial_call,{proc_lib,init_p,5}}]}]
-----------------------------------------------------
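For the sliding window mentioned earlier, recon:proc_window/3 is a minimal sketch; it reports the processes whose attribute changed the most over the given interval (5 seconds here, an arbitrary choice), in the same format as proc_count:
-----------------------------------------------------
%% Top 3 processes whose memory attribute grew the most
%% over a 5000-millisecond window.
1> recon:proc_window(memory, 3, 5000).
-----------------------------------------------------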
Other attributes worth checking besides memory include any of the fields in Subsection 5.2.1, such as message_queue_len, but memory will usually encompass all other types.
It is entirely possible that a process uses lots of memory, but only for short periods of time.
For long-lived nodes with a large overhead for operations, this is usually not a problem, but
whenever memory starts being scarce, such spiky behaviour might be something you want
to get rid of.
Monitoring all garbage collections in real time from the shell would be costly. Instead,
setting up Erlang's system monitor[7] might be the best way to go at it.
Erlang's system monitor will allow you to track information such as long garbage collection periods and large process heaps, among other things. A monitor can temporarily
be set up as follows:
-----------------------------------------------------
1> erlang:system_monitor().
undefined
2> erlang:system_monitor(self(), [{long_gc, 500}]).
undefined
3> flush().
Shell got {monitor,<4683.31798.0>,long_gc,
           [{timeout,515},
            {old_heap_block_size,0},
            {heap_block_size,75113},
            {mbuf_size,0},
            {stack_size,19},
            {old_heap_size,0},
            {heap_size,33878}]}
5> erlang:system_monitor(undefined).
{<0.26706.4961>,[{long_gc,500}]}
6> erlang:system_monitor().
undefined
-----------------------------------------------------
The first command checks that nothing (or nobody else) is using a system monitor yet; you don't want to take this away from an existing application or coworker.
The second command asks for the shell process to be notified every time a garbage collection takes over 500 milliseconds. The result is flushed in the third command. Feel free to also check for
{large_heap, NumWords} if you want to monitor such sizes (a sketch follows below). Be careful to start with large
values at first if you're unsure: you don't want to flood your process' mailbox with a bunch
of heaps that are 1-word large or more, for example.
Command 5 unsets the system monitor (exiting or killing the monitor process also frees
it up), and command 6 validates that everything worked.
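For instance, here is a minimal sketch of a large_heap monitor; the threshold is an arbitrary example value, and large_heap thresholds are expressed in words, not bytes:
-----------------------------------------------------
%% Notify the shell whenever a process' heap grows beyond
%% 50M words (roughly 400MB on a 64-bit VM). The threshold
%% here is an arbitrary example; tune it for your node.
1> erlang:system_monitor(self(), [{large_heap, 50*1024*1024}]).
undefined
%% Messages arrive as {monitor, Pid, large_heap, Info}, with
%% Info fields similar to the long_gc example above.
-----------------------------------------------------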
You can then find out whether such monitoring messages tend to coincide with the memory increases that seem to result in leaks or overuses, and try to catch culprits before things get too bad. Quickly reacting and digging into the process, possibly with recon:info/1 as sketched below,
may help find out what's wrong with the application.
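As a quick sketch, recon:info/1 accepts a pid in several formats (the pid string below is the hypothetical leaker from the earlier proc_count output) and returns process information grouped into categories, while excluding unsafe fields such as the full mailbox contents:
-----------------------------------------------------
%% Accepts a pid(), an {A,B,C} triple, or a pid string.
1> recon:info("<0.175.0>").
%% Returns info grouped under meta, signals, location,
%% memory_used, and work.
-----------------------------------------------------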
[7] http://www.erlang.org/doc/man/erlang.html#system_monitor-2