Skip to content

Latest commit

 

History

History
211 lines (160 loc) · 10.1 KB

LinuxTraceLogCapture.md

File metadata and controls

211 lines (160 loc) · 10.1 KB

Overview - Linux Trace and Log Capture

This provides a quick start on how to capture logs on Linux.

Logs:

  • LTTng system trace (requires customized image for boot scenario)
  • perf
  • cloud-init.log
    • Automatically logged by cloud-init to /var/log/cloud-init.log
  • dmesg.iso.log
    • Standard auto dmesg log doesn't output in absolute time needed for correlation with other logs
    • dmesg --time-format iso > dmesg.iso.log
  • waagent.log
    • Automatically logged by WaLinuxAgent to /var/log/waagent.log
    • LogLevel can be turned more verbose in custom image
    • /etc/waagent.conf
    • Logs.Verbose=y
  • AndroidLogcat
    • Default log format should be supported
    • Basic durations are supported/parsed on production builds / logs.
      • E.g. "SurfaceFlinger: Boot is finished (9550 ms)"
    • Enhanced durations are supported/parsed, if the logging includes init timing or *Timing logs such as SystemServerTiming. Perf durations are available in-development with userdebug builds.
      • E.g "SystemServerTimingAsync: InitThreadPoolExec:prepareAppData took to complete: 149ms"
    • Logcat can optionally log the year but defaults to not. If the year is not provided the year is assumed to be the current year on the analysis machine.
      • If this is incorrect, for example trace was captured in 2021, but analyzed in 2022 then the year will be interpreted incorrectly.
      • This applies only if you need correct absolute timestamps, as relative timestamps will still be good.
      • Manual workaround: In the logcat log search/replace to add year. E.g. "12-21" -> "12-21-2021"
    • Logs are logged in local time zone and default assumed to be loaded in same time zone as captured
    • To provide a hint on the timezone if in a different zone
      • Place a "utcoffset.txt" file in the same folder as the trace. Place the UTC+Offset in the file as a double in hours.
      • E.g For India Standard Time (IST) offset is UTC+5.5 so place "5.5" in the file. If logs are in UTC place 0 in the file

LTTng

LTTng (Kernel CPU scheduling, Processes, Threads, Block IO/Disk, Syscalls, File events, etc)

LTTng Docs LTTng is an open source tracing framework for Linux. Installation instructions for your Linux distro can be found in the docs.

Supports:

  • Threads and Processes
  • Context Switches / CPU Usage
  • Syscalls
  • File related events
  • Block IO / Disk Activity
  • Diagnostic Messages

Once you have everything set up you just need to decide what kind of information you are looking for and begin tracing.

In this example we are looking at process scheduler events. We might use this to determine process lifetime and identify dependencies. You can learn more about what kind of "events" you can enable here.

 root@xenial:~/tracing# lttng list --kernel # Gives a list of all Kernel events you can trace
 root@xenial:~/tracing# lttng list --kernel --syscall # Gives a list of all traceable Linux system calls

 root@xenial:~/tracing# lttng create my-kernel-session --output=/tmp/my-kernel-trace
 root@xenial:~/tracing# lttng enable-event --kernel sched_process*
 root@xenial:~/tracing# lttng start
 root@xenial:~/tracing# lttng stop
 root@xenial:~/tracing# lttng destroy

Recommended LTTng Tracing

Install the tracing software:

Example on Ubuntu:

$ sudo apt-get install lttng-tools lttng-modules-dkms liblttng-ust-dev

For more examples see LTTng Download docs

Create a session:

$ sudo lttng create my-kernel-session --output=lttng-kernel-trace

Add the desired events to be recorded:

$ sudo lttng enable-event --kernel block_rq_complete,block_rq_insert,block_rq_issue,printk_console,sched_wak*,sched_switch,sched_process_fork,sched_process_exit,sched_process_exec,lttng_statedump*
$ sudo lttng enable-event --kernel --syscall �-all

Add context fields to the channel:

$ sudo lttng add-context --kernel --channel=channel0 --type=tid
$ sudo lttng add-context --kernel --channel=channel0 --type=pid
$ sudo lttng add-context --kernel --channel=channel0 --type=procname

Start the recording:

$ sudo lttng start

Save the session:

$ sudo lttng regenerate statedump <- Better correlation / info in Microsoft-Performance-Tools-Linux
$ sudo lttng stop
$ sudo lttng destroy

Perf.data

You can collect and view Linux kernel-mode and user-mode tracepoints in the perf.data file format.

  • Select existing tracepoints that you want to collect, or write your own programs that generate tracepoint events.
  • Use the Linux perf tool or some other tool to collect tracepoint events into perf.data files.
  • Use the PerfData extension to view the tracepoints.

Selecting tracepoints

The Linux kernel and the kernel modules generate many useful tracepoints that you can collect. Look in /sys/kernel/tracing/events for the tracepoints that are available to you.

In addition, Linux 6.4 adds support for generating user_events tracepoints from user-mode programs.

Collecting tracepoints

The Linux perf tool supports collecting tracepoint events using perf record.

  • Install perf from the linux-perf or linux-tools package.
    • Note that some perf packages use a wrapper script to help you match the running kernel version with a version-specific build of the perf tool, e.g. perf_VERSION. For collecting tracepoints, the version doesn't need to match. If you have version mismatch problems, you can safely bypass the wrapper script and directly use the perf_VERSION tool.
  • Install the libtraceevent1 package to enable perf support for tracepoints.
  • Use perf record to collect traces, e.g. perf record -o MyFile.perf.data -k monotonic -e "event1_group:event1_name,event2_group:event2_name"
    • Use -k monotonic to include clock offset information in the data file.

You can also use other tools that generate perf.data-compatible files.

  • libtracepoint-control includes a library for configuring tracepoint collection sessions and collecting perf.data files.
  • tracepoint-collect is a simple tool that collects tracepoint events into perf.data files.

Perf.data.txt

Perf is used to collect CPU Sampling (cpu-clock) events as LTTng doesn't support capturing these yet. Note: Stacks may require symbol setup.

The perf CPU Sampling analysis plugin uses perf.data.txt files as input.

perf CPU Sampling(cpu-clock)

If you want to trace .NET Core then you need perfcollect which capture CPU sampling and more

Perf Install

$ sudo apt-get install linux-tools-common

User-Mode (UM) Symbols Install

KM symbols are automatically resolved. If you wish to resolve UM cpu sample functions and stacks, you may need to install debug packages for the binary you are profiling

For example, Debug Symbol Packages on Ubuntu

Record a trace

$ sudo /usr/bin/perf record -g -a -F 999 -e cpu-clock,sched:sched_stat_sleep,sched:sched_switch,sched:sched_process_exit -o perf_cpu.data

Stop the Trace

$ Ctrl-C

Convert trace to text format

This is to useful along-side the CTF trace to resolve UM IP/Symbols. Similar to what perfcollect uses

$ sudo perf inject -v -s -i perf_cpu.data -o perf.data.merged

# There is a breaking change where the capitalization of the -f parameter changed.
$ sudo perf script -i perf.data.merged -F comm,pid,tid,cpu,time,period,event,ip,sym,dso,trace > perf.data.txt

if [ $? -ne 0 ]
then
    $ sudo perf script -i perf.data.merged -f comm,pid,tid,cpu,time,period,event,ip,sym,dso,trace > perf.data.txt
fi

# If the dump file is zero length, try to collect without the period field, which was added recently.
if [ ! -s perf.data.txt ]
then
    $ sudo perf script -i perf.data.merged -f comm,pid,tid,cpu,time,event,ip,sym,dso,trace > perf.data.txt
fi

Capture trace timestamp start

Perf.data.txt only contains relative timestamps. If you want correct absolute timestamps in UI then you will need to know the trace start time.

$ sudo perf report --header-only -i perf_cpu.data | grep "captured on"

Place the "captured on" timestamp for example "Thu Oct 17 15:37:36 2019" in a timestamp.txt file next to the trace folder. The timestamp will be interpreted as UTC

Transferring the files to Windows UI (optional)

You then need to transfer the perf files to a Windows box where WPA runs. The most important file is perf.data.txt

$ sudo chmod 777 -R perf*
  • Copy files from Linux to Windows box with WinSCP/SCP OR
$ tar -czvf perf_cpu.tar.gz perf*
  • (Optional if you want absolute timestamps) Place timestamp.txt next to perf.data.txt
  • Open perf.data.txt with WPA

Presentations

If you want to see a demo or get more in-depth info on using these tools check out a talk given at the Linux Tracing Summit:

Linux & Windows Perf Analysis using WPA, (slides) (video)