Profiling in Erlang
The list of available tools:
- fprof, eprof, cover, cprof (shipped with Erlang)
- top-like utilities: entop (third party) and etop (part of OTP's observer)
- eep
- recon
- eflame and eflame2
- eper
- timer:tc/3, statistics/1
- latency_histogram_tracer.erl (requires folsom_metrics)
Note: tools that are rarely used (percept) or that mainly serve as a base for more powerful ones (ttb) are not covered here.
Introduction
An overview of all of these tools would take far too long, and I doubt that an ordinary Erlang programmer ever needs all of them anyway.
Far more important is to know four or five of them well and, based on the results obtained during profiling, to be able to find bottlenecks and decide whether they are worth fixing.
fprof, cprof
Like it or not, you need to know one or two standard tools.
The best place to start: http://erlang.org/doc/efficiency_guide/profiling.html
fprof’s output can be converted into a format suitable for kcachegrind (see below) using erlgrind.
1> fprof:start().
2> fprof:trace([start, {procs, all}]). % trace all the processes (significantly slows down the system)
3> fprof:trace([stop]).
4> fprof:profile().
5> fprof:analyse([totals, {dest, "fprof.analysis"}]).
6> fprof:stop().
You can find explanations for each command above in the standard docs and also here.
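As mentioned above, the resulting fprof.analysis file can then be converted into a callgrind file with erlgrind and opened in kcachegrind. A minimal sketch (the output file name is arbitrary, and the exact arguments may differ between erlgrind versions, so check the script's usage first):
$ ./erlgrind fprof.analysis callgrind.out.appname
$ kcachegrind callgrind.out.appname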
1> cprof:start().        % start counting calls in all modules
2> cprof:pause().        % pause counting (run the code you want to measure before this)
3> cprof:analyse().      % analyse all modules
4> cprof:analyse(cprof). % analyse calls of a specific module (here, cprof itself)
5> cprof:stop().
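If only one module is of interest, call counting can be limited to it right from the start. A minimal sketch (lists is used here purely as an example module):
1> cprof:start(lists).   % count only calls to functions of the lists module
2> lists:seq(1, 10).     % run the code you want to measure
3> cprof:pause().
4> cprof:analyse(lists).
5> cprof:stop().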
eep
The sequence of steps is quite simple and described in the project’s README.
We start the tracing:
1> eep:start_file_tracing("appname"), timer:sleep(10000), eep:stop_tracing().
Then we copy the resulting file appname.trace to our machine (gzipping it first, because of its size). On our machine, we convert it into a text format suitable for kcachegrind:
1> eep:convert_tracing("appname").
If you do not have kcachegrind yet, install it. On Ubuntu:
$ sudo apt-get install kcachegrind
After the conversion has finished, we may begin analyzing the results.
$ kcachegrind callgrind.out.appname

Three important points:
- group by “Source File” (or by “ELF Object” if your target is a particular PID);
- percentages (“Relative”) are better than absolute times;
- percentages relative to the parent (“Relative to Parent”) are even better; otherwise, you can’t understand the damn thing.
“No Grouping” and “Function Cycle” do not give a clear picture of what is happening. The first option puts gen_server:decode_msg and gen_server:loop at the top, which only tells you that most of the time is spent somewhere in gen_server:* (thank you, captain!). As for the second option, I could not make sense of its results at all.
eflame2
1> eflame2:write_trace(global_calls_plus_new_procs, "/tmp/ef.test.0", all, 10*1000).
2> eflame2:format_trace("/tmp/ef.test.0", "/tmp/ef.test.0.out").
According to the docs, we need to run the tracing in a separate process. But when I tried to do so, I got a “Segmentation fault”.
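Presumably the docs mean something along these lines; it is shown here only for reference, since this is exactly the variant that crashed for me:
1> spawn(fun() -> eflame2:write_trace(global_calls_plus_new_procs, "/tmp/ef.test.0", all, 10*1000) end).
Either way, once the trace has been formatted, the flame graph is generated with the flamegraph script: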
$ cat /tmp/ef.test.0.out | ./flamegraph.riak-color.pl > output.svg

timer:tc
What is there to talk about? Recently, I came across an unexpected behavior. Take a look at the following code:
{Time, Result} = timer:tc(fun() ->
    {Time1, Result1} = timer:tc(fun() -> do_smth1() end),
    {Time2, Result2} = timer:tc(fun() -> do_smth2() end),
    {{Time1, Result1}, {Time2, Result2}}
end).
Time should be equal to Time1 + Time2, right?
Well, not in reality: Time is often greater than Time1 + Time2, even if I repeat the measurement, say, 100 times. I think this is because timer:tc uses os:timestamp, so Time also includes some switching time, when the scheduler runs other processes.
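A minimal sketch that reproduces this in the shell (lists:seq/2 and lists:sort/1 are used here only as stand-ins for real work):
1> {Outer, {Inner1, Inner2}} = timer:tc(fun() ->
       {T1, _} = timer:tc(fun() -> lists:seq(1, 1000000) end),
       {T2, _} = timer:tc(fun() -> lists:sort(lists:seq(1, 1000000)) end),
       {T1, T2}
   end).
2> Outer - (Inner1 + Inner2). % usually a positive number: scheduling and timestamp overhead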
Useful links: