Profiling a .NET Core Application on Linux | All Your Base Are Belong To Us


In the same vein as my previous post on analyzing core dumps of .NET Core applications on Linux, let’s take a look at what it takes to do some basic performance profiling. When starting out, here are a few things I wrote down that would be nice to do:

  • CPU profiling (sampling) to see where the CPU bottlenecks are
  • Grabbing stacks for interesting system events (file accesses, network, forks, etc.)
  • Tracing memory management activity such as GCs and object allocations
  • Identifying blocked time and the block and wake-up reasons

With this task list in mind, let’s get started!

Collecting Call Stacks of .NET Core Processes

Generally speaking, a .NET Core application runs as a regular Linux process. There’s nothing particularly fancy involved, which means we can use perf and ftrace and even BPF-based tools to monitor performance. There’s just one catch: resolving symbols for call stacks. Here’s what happens when we profile a CPU-intensive application, running with defaults, using perf:

# perf record -F 97 -ag
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.364 MB perf.data (789 samples) ]
# perf report

As you can see, debugging symbols are missing for pretty much everything under the dotnet process, so we only get addresses rather than method names. Fortunately, .NET Core ships with a knob that makes the runtime generate a perf map file in /tmp, which perf can then find and use for symbols. To turn on the knob, export COMPlus_PerfMapEnabled=1:

$ export COMPlus_PerfMapEnabled=1
$ dotnet run &
[1] 23503

$ ls /tmp/perf*
/tmp/perf-23503.map  /tmp/perf-23517.map  /tmp/perfinfo-23503.map  /tmp/perfinfo-23517.map

$ head -2 /tmp/perfinfo-23517.map
ImageLoad;/usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.0/System.Private.CoreLib.ni.dll;{14b5688c-fe9a-4a0d-a0d1-b3af5439e23b};
ImageLoad;/home/vagrant/Runny/bin/Debug/netcoreapp1.1/Runny.dll;{ebb3ede4-dc41-44f4-93d3-152cd0b54ac0};

$ head -2 /tmp/perf-23517.map
00007FABB90D4480 2e instance bool [System.Private.CoreLib] dynamicClass::IL_STUB_UnboxingStub()
00007FABB90D44D0 2e instance System.__Canon /* MT: 0x00007FABB8F60318 */ [System.Private.CoreLib] dynamicClass::IL_STUB_UnboxingStub()

Equipped with these files, we can repeat the perf recording, and then the report looks a bit better, with symbols starting to appear, such as ConsoleApplication.Primes::CountPrimes. Note that because the .NET process wrote the perf map file, you might need to tell perf to ignore the fact that it’s not owned by root by using the -f switch (perf report -f), or simply chown it.
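For example, either of the following does the trick (PID 23503 is from the run above):

# perf report -f
# chown root:root /tmp/perf-23503.map   # ...or fix the ownership once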

Although, who reads perf reports anyway — let’s generate a flame graph!

Getting a Flame Graph

Well, a flame graph is a flame graph, nothing special about .NET Core here once we have the right data in our perf files. Let’s go:

# git clone --depth=1 https://github.com/brendangregg/FlameGraph
...
# perf script | FlameGraph/stackcollapse-perf.pl | FlameGraph/flamegraph.pl > flame.svg

Here’s a part of the generated flame graph, looking pretty good:

[flame graph image: CPU flame graph of the dotnet process, with managed frames resolved]

If you look closely, you’ll notice that some symbols are still missing — notably, we don’t have any symbols for libcoreclr.so. And that’s just the way it is:

$ objdump -t $(find /usr/share/dotnet -name libcoreclr.so)
/usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.0/libcoreclr.so:     file format elf64-x86-64

SYMBOL TABLE:
no symbols

If you build .NET Core from source, you can build with debug information, but that’s not what we get by default from the Microsoft package repository.

Stacks For Other Events

Now that we have the necessary building blocks for getting symbols resolved, we can of course move on to other events (and use other tools, too). For example, let’s trace context switches to see where our threads are getting blocked:

# perf record -e sched:sched_switch -ag
...
# perf report -f

(The resulting stacks are fairly typical of a thread getting preempted to let another thread run, even though it hasn’t called any blocking API.)

Or, let’s try some of my favorite tools from BCC. For example, let’s trace file opens:

# opensnoop
PID    COMM               FD ERR PATH
1      systemd            17   0 /proc/955/cgroup
24675  dotnet              3   0 /etc/ld.so.cache
24675  dotnet              3   0 /lib/x86_64-linux-gnu/libdl.so.2
24675  dotnet              3   0 /lib/x86_64-linux-gnu/libpthread.so.0
24675  dotnet              3   0 /usr/lib/x86_64-linux-gnu/libstdc++.so.6
...
24689  dotnet             47   0 /home/vagrant/Runny/perfcollect
24689  dotnet             47   0 /home/vagrant/Runny/opens.txt
24689  dotnet             47   0 /home/vagrant/Runny/project.lock.json
24689  dotnet             47   0 /home/vagrant/Runny/.Program.cs.swp
24689  dotnet             47   0 /home/vagrant/Runny/Program.cs
24689  dotnet             -1  13 /home/vagrant/Runny/perf.data.old

We can conclude that everything more or less works. I dare say this is even a little easier than the JVM situation, where we need an external agent to generate debugging symbols. On the other hand, you have to run the .NET Core process with the COMPlus_PerfMapEnabled environment variable set at initialization time — there’s no way to generate this debugging information for a process that has already started.
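To see this for yourself, try exporting the variable only after the target process has started (a quick sketch; the PID is illustrative):

$ dotnet run &
[1] 24001
$ export COMPlus_PerfMapEnabled=1   # too late for the process above
$ ls /tmp/perf-24001.map
ls: cannot access '/tmp/perf-24001.map': No such file or directory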

But then I tried one more thing. Let’s try to aggregate file read stacks by using the stackcount tool from BCC to probe read in libpthread (which is where .NET Core’s syscalls are routed through on my box). The result is not very pretty:

$ stackcount pthread:read -p 29751
Tracing 1 functions for "pthread:read"... Hit Ctrl-C to end.
  read
  [unknown]
  [unknown]
  [unknown]
  [unknown]
  void [Runny] ConsoleApplication.Program::Main(string[])
  [unknown]
  [unknown]
  [unknown]
  [unknown]
  [unknown]
  coreclr_execute_assembly
  coreclr::execute_assembly(void*, unsigned int, int, char const**, char const*, unsigned int*)
  run(arguments_t const&)
  corehost_main
... snipped for brevity ...
    16

The [unknown] frames prior to Main are not very surprising — this is libcoreclr.so, and we already know it doesn’t ship with debuginfo. But the top-most frames are disappointing — this is a managed assembly, with managed frames, and there’s no reason why we shouldn’t be able to trace them.

To figure out where these frames are coming from, I’m going to need addresses. With the -v switch, stackcount prints addresses in addition to symbols:

# stackcount pthread:read -v -p 29751
Tracing 1 functions for "pthread:read"... Hit Ctrl-C to end.
^C
  7f77b10b1680     read
  7f773651f267     [unknown]
  7f773651e8d5     [unknown]
  7f773651e880     [unknown]
  7f773651846a     [unknown]
  7f77364bfb5d     void [Runny] ConsoleApplication.Program::Main(string[])
...

All right, so which module is 7f773651f267 in, for example? Let’s take a look at the loaded modules (I’m keeping only executable regions):

$ cat /proc/29751/maps | grep 'xp '
...
7f77364bf000-7f77364c6000 rwxp 00000000 00:00 0
7f7736502000-7f7736530000 r-xp 00003000 fd:00 787585                     /usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.0/System.Console.dll
7f7736534000-7f7736564000 r-xp 00003000 fd:00 787603                     /usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.0/System.IO.FileSystem.dll
7f7736577000-7f7736578000 r-xp 00002000 fd:00 787665                     /usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.0/System.Threading.Thread.dll
7f773657a000-7f7736587000 r-xp 00002000 fd:00 787606                     /usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.0/System.IO.dll
7f773658a000-7f773659a000 r-xp 00002000 fd:00 787668                     /usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.0/System.Threading.dll
7f773659d000-7f773659e000 r-xp 00002000 fd:00 787658                     /usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.0/System.Text.Encoding.dll
...
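Scanning the list by eye works, but if you need this lookup more than once, a one-liner can do the range check for you. A sketch, assuming GNU awk (for strtonum):

$ awk -v a=$((16#7f773651f267)) -F'[- ]' \
    'strtonum("0x"$1) <= a && a < strtonum("0x"$2)' /proc/29751/maps
7f7736502000-7f7736530000 r-xp 00003000 fd:00 787585                     /usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.0/System.Console.dll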

OK, so we seem to be making progress — the desired address is clearly in the range that belongs to System.Console.dll. But, being a managed assembly, we’re not going to find any debug information in it:

$ file /usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.0/System.Console.dll
/usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.0/System.Console.dll: PE32+ executable (DLL) (console) Mono/.Net assembly, for MS Windows

$ objdump -tT /usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.0/System.Console.dll
objdump: /usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.0/System.Console.dll: File format not recognized

Hmpf. So how are we supposed to get symbolic information for these addresses?

If you look online, you’ll find that there’s a tool on the .NET Core repos called perfcollect — essentially a Bash script for collecting performance information from .NET Core processes running on Linux. Let’s take a look.

The perfcollect Tool

The perfcollect tool is fairly self-contained and installs its own dependencies, most notably perf and lttng — .NET Core on Linux uses LTTng to generate various events, including garbage collections, object allocations, thread starts, assembly loads, and many others. perfcollect then follows your instructions, runs perf and lttng to collect CPU samples and trace events, packages everything up into a big zip file, and hands it to you.
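If you want to give it a spin, a typical session looks something like this (subcommand names as documented at the time; the trace name is arbitrary):

$ curl -OL https://aka.ms/perfcollect
$ chmod +x perfcollect
$ sudo ./perfcollect install           # one-time setup: installs perf and lttng
$ sudo ./perfcollect collect mytrace   # Ctrl+C to stop; produces mytrace.zip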

What are you supposed to do with that zip file? Open it on Windows, apparently, using PerfView. Now, I love PerfView, but a facepalm is the only reasonable reaction to hearing this. What’s more, perfcollect does a bunch of work that you don’t really need if you plan to analyze the results on the same machine. But there’s one thing it does which sounds very relevant:

WriteStatus "Generating native image symbol files"

# Get the list of loaded images and use the path to libcoreclr.so to find crossgen.
# crossgen is expected to sit next to libcoreclr.so.
local buildidList=`$perfcmd buildid-list | grep libcoreclr.so | cut -d ' ' -f 2`

That definitely sounds good! Turns out that .NET Core writes out an additional map file, named /tmp/perfinfo-$PID.map, which contains a list of image load events for your application’s assemblies. perfcollect then parses that list and invokes the crossgen tool to generate an additional perf map for each assembly, which can be fed into PerfView on the Windows side. Here’s what the perfinfo file looks like:

$ head -4 /tmp/perfinfo-29751.map
ImageLoad;/usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.0/System.Private.CoreLib.ni.dll;{14b5688c-fe9a-4a0d-a0d1-b3af5439e23b};
ImageLoad;/home/vagrant/Runny/bin/Debug/netcoreapp1.1/Runny.dll;{319d161b-f17e-44f6-a210-f297df920194};
ImageLoad;/usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.0/System.Runtime.dll;{819d412e-d773-4dbb-8d01-20d412b6cf09};
ImageLoad;/usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.0/mscorlib.dll;{080dac22-6a0e-41ae-85fb-fb79cc07911b};

Now, that’s what crossgen is supposed to do. And according to the comment above, crossgen is also supposed to be in the same folder as libcoreclr.so. But it isn’t:

$ find /usr/share/dotnet -name crossgen

That’s right, no results. Looking online, it seems that crossgen is produced as part of a .NET Core build and ships in the CoreCLR runtime NuGet package, but it’s not part of the pre-packaged binaries you get from the Microsoft package repositories. Still, with a little effort borrowed from the corefx repo, we can fetch our own crossgen:

$ export CoreClrVersion=1.1.0
$ export Rid=$(dotnet --info | sed -n -e 's/^.*RID:[[:space:]]*//p')
$ echo "{\"frameworks\":{\"netcoreapp1.1\":{\"dependencies\":{\"Microsoft.NETCore.Runtime.CoreCLR\":\"$CoreClrVersion\", \"Microsoft.NETCore.Platforms\": \"$CoreClrVersion\"}}},\"runtimes\":{\"$Rid\":{}}}" > project.json
$ dotnet restore ./project.json --packages .
... output omitted for brevity ...
$ ls ./runtime.$Rid.Microsoft.NETCore.Runtime.CoreCLR/$CoreClrVersion/tools
crossgen

All right! So we have crossgen, at which point we can try it out to generate debug information for System.Console.dll, or any other assembly we need, really. Here goes:

$ crossgen /Platform_Assemblies_Paths /usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.0 \
           /CreatePerfMap . /usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.0/System.Console.dll

Microsoft (R) CoreCLR Native Image Generator - Version 4.5.22220.0
Copyright (c) Microsoft Corporation.  All rights reserved.

Successfully generated perfmap for native assembly '/usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.0/System.Console.dll'.
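To cover every assembly the process has loaded, not just System.Console.dll, one could loop over the ImageLoad entries in the perfinfo file. A rough sketch (in practice you’d also add the application’s own directory to /Platform_Assemblies_Paths):

$ FX=/usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.0
$ for asm in $(cut -d';' -f2 /tmp/perfinfo-29751.map); do
>   crossgen /Platform_Assemblies_Paths $FX /CreatePerfMap /tmp "$asm" \
>     || echo "no map for $asm (probably not AOT-compiled)"
> done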

What does this perfmap file look like? The same as any other perfmap, except the addresses are not absolute — they are offsets from the load address of that module:

$ head -4 System.Console.ni.\{3b33b403-e8c1-44af-a7fb-369b2603f2a3\}.map
0000000000017590 58 void [System.Console] Interop::ThrowExceptionForIoErrno(valuetype Interop/ErrorInfo,string,bool,class [System.Runtime]System.Func`2<valuetype Interop/ErrorInfo,valuetype Interop/ErrorInfo>)
00000000000175F0 4d void [System.Console] Interop::CheckIo(valuetype Interop/Error,string,bool,class [System.Runtime]System.Func`2<valuetype Interop/ErrorInfo,valuetype Interop/ErrorInfo>)
0000000000017640 82 int64 [System.Console] Interop::CheckIo(int64,string,bool,class [System.Runtime]System.Func`2<valuetype Interop/ErrorInfo,valuetype Interop/ErrorInfo>)
00000000000176D0 17 int32 [System.Console] Interop::CheckIo(int32,string,bool,class [System.Runtime]System.Func`2<valuetype Interop/ErrorInfo,valuetype Interop/ErrorInfo>)

Well, let’s see if we can at least resolve our desired address by using this approach. If you go back above, we were chasing the address 7f773651f267, which falls inside System.Console.dll. First, let’s find the base address at which System.Console.dll is loaded:

$ cat /proc/29751/maps | grep System.Console.dll | head -1
7f77364ff000-7f7736500000 r--p 00000000 fd:00 787585                     /usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.0/System.Console.dll

The offset, then, is:

$ echo 'ibase=16;obase=10;7F773651F267-7F77364FF000' | bc
20267

So now we need to look for this offset in the System.Console map file. (A subtlety: because ibase=16 comes first, bc reads the subsequent obase=10 in hex, i.e. as base 16, so the result 20267 above is hexadecimal. That happens to be exactly what we want, since the map file offsets are hex too.) The closest match is here:

0000000000020150 286 instance valuetype System.ConsoleKeyInfo [System.Console] System.IO.StdInReader::ReadKey(bool&)
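A quick sanity check confirms the offset indeed falls inside that method (the second column of the map file, 286, is the method’s code size in hex):

$ echo $(( 16#20267 >= 16#20150 && 16#20267 < 16#20150 + 16#286 ))
1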

With that, we have one frame resolved! There are only a few more 🙂 This process begs to be automated. It would be great to automatically run crossgen, generate the map files with the relative addresses, convert them to absolute addresses, and merge them with the main /tmp/perf-PID.map file that other tools know and love. Read on!
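For a taste of what that automation involves, here is the manual lookup scripted end to end. This is a hypothetical helper (resolve.sh), assuming GNU awk and a map file sorted by offset, as crossgen emits:

#!/bin/bash
# resolve.sh PID ADDR MAPFILE -- map a raw address to a symbol from a
# crossgen-generated map file, repeating the manual steps above.
PID=$1; ADDR=$2; MAPFILE=$3
# 1. Find the module that contains ADDR...
MOD=$(awk -v a=$((16#$ADDR)) -F'[- ]' \
    'strtonum("0x"$1) <= a && a < strtonum("0x"$2) { print $NF; exit }' \
    /proc/$PID/maps)
# 2. ...and the module's load base: the start of its first mapping.
BASE=$(grep -F "$MOD" /proc/$PID/maps | head -1 | cut -d- -f1)
# 3. Offset = ADDR - base; print the closest preceding symbol in the map.
OFF=$((16#$ADDR - 16#$BASE))
awk -v o=$OFF 'strtonum("0x"$1) <= o { sym=$0 } END { print sym }' "$MAPFILE"

Running ./resolve.sh 29751 7f773651f267 System.Console.ni.*.map should print the StdInReader::ReadKey line we just found by hand.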

dotnet-mapgen.py

Well, I wrote a small script called dotnet-mapgen.py that automates the above steps and produces a single, unified map file that contains both JIT-compiled addresses and addresses that lie in crossgen’d (AOT-compiled) modules, such as System.Console.dll. The script has two modes:

$ ./dotnet-mapgen.py generate $(pgrep -n dotnet)
couldn't find crossgen, trying to fetch it automatically...
crossgen successfully downloaded and placed in libcoreclr's dir
crossgen map generation: 15 succeeded, 2 failed

In the “generate” mode, the script first locates crossgen (downloading it if necessary, using the NuGet restore approach shown above), and then runs crossgen on all the managed assemblies loaded into the target process. The 2 failures in the above output are for assemblies that weren’t AOT-compiled. Note that this generation step can be done once, and the map files retained for subsequent runs — unless you change the set of AOT-compiled assemblies loaded into your process.

$ ./dotnet-mapgen.py merge $(pgrep -n dotnet)
perfmap merging: 14 succeeded, 3 failed

In the “merge” mode, the script calculates absolute addresses for all the symbols generated in the previous step, and concatenates this information to the main /tmp/perf-PID.map file for the target process.
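The merge itself boils down to rebasing each per-module map by the module’s load address and appending the result to the JIT map. Roughly, as a sketch (GNU awk again; the real script also handles *.ni.dll module names and other edge cases):

$ PID=29751
$ for m in *.ni.*.map; do
>   mod=${m%%.ni.*}                # e.g. System.Console
>   base=$(grep -F "/$mod.dll" /proc/$PID/maps | head -1 | cut -d- -f1)
>   [ -z "$base" ] && continue
>   awk -v b=$((16#$base)) \
>     '{ printf "%016X%s\n", b + strtonum("0x"$1), substr($0, length($1)+1) }' \
>     "$m" >> /tmp/perf-$PID.map
> done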

There’s just one final problem. Turns out, perf refuses to use the map file for symbols that fall in memory regions belonging to a module (in our case above, System.Console.dll), and there’s no way to convince perf that it should try to resolve such addresses using the map file. Fortunately, I have a bit more control over the BCC tools, so I proposed a PR that retries symbol resolution using the map file if the symbol wasn’t resolved from the original module. With this patch, here’s stackcount’s output:

# stackcount ... pthread:read
Tracing 1 functions for "pthread:read"... Hit Ctrl-C to end.
^C
  read
  instance valuetype System.ConsoleKeyInfo [System.Console] System.IO.StdInReader::ReadKey(bool&)
  instance string [System.Console] System.IO.StdInReader::ReadLine(bool)
  instance string [System.Console] System.IO.StdInReader::ReadLine()
  string [System.Console] System.Console::ReadLine()
  void [Runny] ConsoleApplication.Program::Main(string[])
...
  16

Note how all symbols are now resolved to managed frames: the JIT-compiled Program::Main, and the AOT-compiled Console::ReadLine, StdInReader::ReadLine, and everything else.

Once this support lands in BCC, we can also do full-fidelity profiling with the profile tool, stack tracing with trace and stackcount, blocked time analysis using offcputime/offwaketime, and a variety of other tools. For most purposes, the perf-based workflow shown at the beginning of the post is the poorer alternative, provided you can run a recent-enough kernel with BPF support.

So Where Are We?

  • We can use a variety of Linux performance tools to monitor .NET Core processes on Linux, including perf and BCC tools
  • To resolve stacks and symbols in general, the COMPlus_PerfMapEnabled environment variable needs to be set to 1 prior to running the .NET Core process
  • Some binaries still ship out of the box with no debug information (notably libcoreclr.so)
  • Some managed assemblies aren’t included in the dynamic /tmp/perf-PID.map file because they were compiled ahead-of-time (using crossgen), and don’t contain debugging information
  • For these assemblies, crossgen can generate map files that are sort-of useful, but can’t be used directly with perf
  • The dotnet-mapgen script can automate the process of generating map files for AOT-compiled assemblies and merging them into the main map file for analysis
  • BCC tools will be updated to support this scenario and enable full-fidelity tracing

In a subsequent post, I also plan to explore the LTTng traces to see if we can trace garbage collections, object allocations, managed exceptions, and other events of interest.


利用MySQL的PROFILE功能,我们可以很方便的查看一个SQL具体的执行代价是怎样的,尤其是可以分析它的最大瓶颈在哪里. 目前PROFILE功能可提供除了内存以外的其他资源消耗统计,例如CPU、I/O、CONTEXT、SWAP等. PROFILE功能只能在SESSION级别使用,还做不到像SQL Server那样可以全局开启,收集一段时间后再关闭,这点有待改进.