Analyzing a .NET Core Core Dump on Linux | All Your Base Are Belong To Us

Tags: | Published: 2018-12-19 09:10 | Author:
Source: http://blogs.microsoft.co.il

Recently, I had to open a core dump of a .NET Core application on Linux. I thought this walkthrough might be useful if you find yourself in the same boat, because, to be quite honest, I didn’t find it trivial.

Configure Linux to Generate Core Dumps

Before you begin, you need to configure your Linux box to generate core dumps in the first place. A lot of distros will have something preconfigured, but the simplest approach is to just put a file name in the /proc/sys/kernel/core_pattern file:

# echo core > /proc/sys/kernel/core_pattern

Additionally, the system limits the maximum size of a generated core file; ulimit -c unlimited removes that limit for the current shell and the processes it starts. Now, whenever your .NET Core process (or any other process) crashes, you’ll get a core file generated in the process’s current working directory. By the way, .NET Core on Linux x86_64 reserves a pretty gigantic address space, so expect your core files to be pretty big. But compression helps: I had a 6.5GB core dump compress into a 59MB gzip file.
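Before waiting for a crash, it can help to check what the box is currently configured to do. Here is a minimal read-only sketch; the function names describe_core_pattern and show_core_settings are made up for illustration. It relies on the kernel convention that a core_pattern starting with a pipe character sends the dump to a helper program instead of writing a file, which is what some distros preconfigure.

```shell
# Classify a core_pattern value: a leading '|' means the kernel pipes the
# dump to a helper program; anything else is treated as a file name template.
describe_core_pattern() {
    case "$1" in
        '|'*) echo "piped" ;;
        '')   echo "empty" ;;
        *)    echo "file" ;;
    esac
}

# Print the current settings without changing anything.
show_core_settings() {
    local pattern
    pattern=$(cat /proc/sys/kernel/core_pattern 2>/dev/null)
    echo "core_pattern: ${pattern} ($(describe_core_pattern "$pattern"))"
    echo "core size limit: $(ulimit -c)"
}
```

Calling show_core_settings in the shell that will launch the application should report "file" and "unlimited" once the two steps above have been applied.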

Installing LLDB

To open the core dump, you’ll need LLDB built with the same architecture as your CoreCLR. Here’s how I found out what I needed:

$ find /usr/share/dotnet -name libsosplugin.so
/usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.1/libsosplugin.so

$ ldd $(find /usr/share/dotnet -name libsosplugin.so) | grep lldb
liblldb-3.5.so.1 => /usr/lib/x86_64-linux-gnu/liblldb-3.5.so.1 (0x00007f0a6b2d8000)

Seeing that LLDB 3.5 was required, I installed it with sudo apt install lldb-3.5, but YMMV on other distros, of course.
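If you would rather not eyeball the ldd output, the version can be pulled out with sed. This is only a sketch: the helper name lldb_version_from_ldd is hypothetical, and it assumes the dependency is named liblldb-<version>.so as in the output above.

```shell
# Read `ldd` output on stdin and print the LLDB version that the SOS plugin
# links against (for example "liblldb-3.5.so.1 => ..." prints "3.5").
# Prints nothing if no liblldb dependency is listed.
lldb_version_from_ldd() {
    sed -n 's/.*liblldb-\([0-9][0-9.]*\)\.so.*/\1/p' | head -n 1
}
```

Usage would look like ldd "$(find /usr/share/dotnet -name libsosplugin.so)" | lldb_version_from_ldd, assuming find returns a single path; with several runtimes installed you would have to pick one first.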

Opening The Core File And Loading SOS

Now you’re ready to open the core file in LLDB. If you’re doing this on a different box, you’ll need the same version of .NET Core installed — that’s where the dotnet binary, SOS itself, and the DAC (debugger data access component) are coming from. You could also copy the /usr/share/dotnet/shared/Microsoft.NETCore.App/nnnn directory over, of course.

$ lldb $(which dotnet) --core ./core

Once inside LLDB, you’ll need to load the SOS plugin. It’s the one we found earlier:

(lldb) plugin load /usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.1/libsosplugin.so

If everything went well, the plugin is now loaded. SOS also needs the DAC (libmscordaccore.so), so you’ll need to tell it where to look:

(lldb) setclrpath /usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.1

With that, SOS should be loaded and ready for use.
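Typing these two commands every time gets old quickly, so they can be passed on the lldb command line instead. This is a sketch under an assumption: that your LLDB build accepts the -o option (run a command after loading the file), which I have not verified for every version. The wrapper name open_core_with_sos is made up.

```shell
# Open a core file with the dotnet host and load SOS in one step.
# $1 = path to the core file
# $2 = runtime directory containing libsosplugin.so and libmscordaccore.so
open_core_with_sos() {
    local core=$1 runtime_dir=$2
    lldb "$(command -v dotnet)" --core "$core" \
        -o "plugin load $runtime_dir/libsosplugin.so" \
        -o "setclrpath $runtime_dir"
}
```

For the setup in this post, the second argument would be the Microsoft.NETCore.App version directory found earlier, and the first would be ./core.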

Running Analysis

You’d think you can just start running the SOS commands you know and love, but there’s one final hurdle. Here’s what happened when I opened a core file generated from a crash, and tried to get the exception information (note that you should prefix SOS commands with ‘sos’):

(lldb) sos PrintException
The current thread is unmanaged

… which is kind of odd, right? Considering that the process crashed as a result of a managed exception. Looking at the docs, it looks like SOS and LLDB have trouble communicating around the current thread’s identity. So first, let’s find the thread that encountered an exception:

(lldb) sos Threads
ThreadCount:      13
UnstartedThread:  0
BackgroundThread: 11
PendingThread:    0
DeadThread:       0
Hosted Runtime:   no
                                                                                                        Lock
       ID OSID ThreadOBJ           State GC Mode     GC Alloc Context                  Domain           Count Apt Exception
XXXX    1 57ff 0000000000C2B380  2020020 Preemptive  (nil):(nil)                       0000000000C195C0 0     Ukn
XXXX    2 5807 0000000000CAAF80    21220 Preemptive  0x7f5ad2fcbc40:0x7f5ad2fcdae0     0000000000C195C0 0     Ukn (Finalizer)
XXXX    4 580a 0000000000DC2730    21220 Preemptive  (nil):(nil)                       0000000000C195C0 0     Ukn
XXXX    6 580d 0000000000EC1D70    21220 Preemptive  0x7f5ad576b4d0:0x7f5ad576cf58     0000000000C195C0 0     Ukn
XXXX    7 5a13 00007F5ABC0292A0  1021220 Preemptive  0x7f5ad5888d30:0x7f5ad5888fd0     0000000000C195C0 0     Ukn (Threadpool Worker)
XXXX    8 5a15 00007F5AC006A3F0    21020 Preemptive  0x7f5ad594dd10:0x7f5ad594ece8     0000000000C195C0 0     Ukn System.IO.FileNotFoundException 00007f5ad593fa80 (nested exceptions)
XXXX    9 5a16 00007F5AC00916A0    21220 Preemptive  (nil):(nil)                       0000000000C195C0 0     Ukn
XXXX   10 5a17 00007F5AC80015D0  1021220 Preemptive  0x7f5ad593a9a0:0x7f5ad593b978     0000000000C195C0 0     Ukn (Threadpool Worker)
XXXX    5 5a18 00007F5AC0814DF0    21220 Preemptive  0x7f5ad50ed1b8:0x7f5ad50eefd0     0000000000C195C0 0     Ukn
XXXX    3 5a19 00007F5C54000A00  1020220 Preemptive  (nil):(nil)                       0000000000C195C0 0     Ukn (Threadpool Worker)
XXXX   11 5a1a 00007F5C50019270  1021220 Preemptive  0x7f5ad58a5710:0x7f5ad58a6fd0     0000000000C195C0 0     Ukn (Threadpool Worker)
XXXX   12 5a1b 00007F5AC0831B80  1021220 Preemptive  0x7f5ad58fcf68:0x7f5ad58fd000     0000000000C195C0 0     Ukn (Threadpool Worker)
XXXX   13 5a1c 0000000000E8F720  1021220 Preemptive  0x7f5ad593bc80:0x7f5ad593d978     0000000000C195C0 0     Ukn (Threadpool Worker)

Thread #8 looks suspicious, what with the System.IO.FileNotFoundException in the Exception column. Now, let’s see all the LLDB threads:

(lldb) thread list
Process 0 stopped
* thread #1: tid = 0, 0x00007f5c5d83b7ef libc.so.6`__GI_raise(sig=2) + 159 at raise.c:58, name = 'dotnet', stop reason = signal SIGABRT
  thread #2: tid = 1, 0x00007f5c5e482510 libpthread.so.0`__pthread_cond_wait + 256, stop reason = signal SIGABRT
  thread #3: tid = 2, 0x00007f5c5d907d29 libc.so.6`syscall + 25, stop reason = signal SIGABRT
  thread #4: tid = 3, 0x00007f5c5d907d29 libc.so.6`syscall + 25, stop reason = signal SIGABRT
... more threads snipped for brevity ...

Here, it looks like thread 1 is the one with the exception being raised (its top frame is in raise). So we have to map the OS thread ID from the first command (the OSID column, 5a15 for managed thread #8) to the LLDB thread id from the second command:

(lldb) setsostid 5a15 1
Mapped sos OS tid 0x5a15 to lldb thread index 1

And now, we’re ready to roll:

(lldb) sos PrintException
Exception object: 00007f5ad593fa80
Exception type:   System.IO.FileNotFoundException
Message:          Could not load the specified file.
InnerException:   <none>
StackTrace (generated):
    SP               IP               Function
    00007F5C45D227C0 00007F5BE37412E7 System.Private.CoreLib.ni.dll!System.Runtime.Loader.AssemblyLoadContext.ResolveUsingEvent(System.Reflection.AssemblyName)+0x20ab07
    00007F5C45D227F0 00007F5BE353664F System.Private.CoreLib.ni.dll!System.Runtime.Loader.AssemblyLoadContext.ResolveUsingResolvingEvent(IntPtr, System.Reflection.AssemblyName)+0x4f

StackTraceString: <none>
HResult: 80070002

Nested exception -------------------------------------------------------------
Exception object: 00007f5ad593dea0
Exception type:   System.InvalidOperationException
Message:          Authorization cannot be requested before logging in.
InnerException:   <none>
StackTrace (generated):
    SP               IP               Function
    00007F5C45D29890 00007F5BE63002FE kitt3ns.dll!WebApplication.Controllers.AuthorizationBackgroundWorker.VerifyAuthorized(System.String)+0xae
    00007F5C45D298D0 00007F5BE630022B kitt3ns.dll!WebApplication.Controllers.AuthorizationBackgroundWorker.RequestAuthorization()+0x2b
    00007F5C45D298E0 00007F5BE55BC31C kitt3ns.dll!WebApplication.Controllers.AuthorizationBackgroundWorker+<>c.<Authorize>b__0_0()+0x4c
    00007F5C45D29910 00007F5BE33BDF11 System.Private.CoreLib.ni.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)+0x111

StackTraceString: <none>
HResult: 80131509

(lldb) sos ClrStack
OS Thread Id: 0x5a15 (1)
        Child SP               IP Call Site
00007F5C45D272C8 00007f5c5d83b7ef [HelperMethodFrame: 00007f5c45d272c8]
00007F5C45D273E0 00007F5BE33BDF11 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
00007F5C45D29770 00007f5c5cbe9bad [HelperMethodFrame: 00007f5c45d29770]
00007F5C45D29890 00007F5BE63002FE WebApplication.Controllers.AuthorizationBackgroundWorker.VerifyAuthorized(System.String) [/home/vagrant/kitt3ns/Controllers/AccountController.cs @ 37]
00007F5C45D298D0 00007F5BE630022B WebApplication.Controllers.AuthorizationBackgroundWorker.RequestAuthorization() [/home/vagrant/kitt3ns/Controllers/AccountController.cs @ 30]
00007F5C45D298E0 00007F5BE55BC31C WebApplication.Controllers.AuthorizationBackgroundWorker+<>c.<Authorize>b__0_0() [/home/vagrant/kitt3ns/Controllers/AccountController.cs @ 24]
00007F5C45D29910 00007F5BE33BDE71 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
00007F5C45D29B50 00007f5c5cbfb207 [GCFrame: 00007f5c45d29b50] 
00007F5C45D29D30 00007f5c5cbfb207 [DebuggerU2MCatchHandlerFrame: 00007f5c45d29d30] 

This gives us the exception information and the thread’s current stack, if we want it. We could similarly inspect other threads by mapping the OS thread id to the LLDB thread id, but for a thread that didn’t have an exception, where do you get the clue that connects the OS thread id to the debugger thread id? Well, it seems that GDB uses the same numbering as LLDB, but in GDB you can actually see the LWP id (on Linux, the LWP id is the kernel’s id for the thread) using ‘info threads’:

$ gdb $(which dotnet) --core ./core
...

(gdb) info threads
  Id   Target Id         Frame
* 1    Thread 0x7f5c45d2a700 (LWP 23061) __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
  2    Thread 0x7f5c5eaab740 (LWP 22527) 0x00007f5c5e482510 in pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:219
  3    Thread 0x7f5c5b411700 (LWP 22529) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  4    Thread 0x7f5c5ac10700 (LWP 22530) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  5    Thread 0x7f5c5a40f700 (LWP 22531) 0x00007f5c5d9020bd in poll () at ../sysdeps/unix/syscall-template.S:84
  6    Thread 0x7f5c59c0e700 (LWP 22532) 0x00007f5c5e485d8d in __pause_nocancel () at ../sysdeps/unix/syscall-template.S:84
  7    Thread 0x7f5c5940d700 (LWP 22533) 0x00007f5c5e482510 in pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:219
  8    Thread 0x7f5c589b2700 (LWP 22534) 0x00007f5c5e482510 in pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:219
  9    Thread 0x7f5c498ae700 (LWP 22535) 0x00007f5c5e4828b9 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:258
  10   Thread 0x7f5c454ef700 (LWP 22538) 0x00007f5c5e4856ed in __close_nocancel () at ../sysdeps/unix/syscall-template.S:84
  11   Thread 0x7f5ad2324700 (LWP 22540) 0x00007f5c5e4856ed in __close_nocancel () at ../sysdeps/unix/syscall-template.S:84
  12   Thread 0x7f5ad1b23700 (LWP 22541) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  13   Thread 0x7f5ad2b25700 (LWP 23059) 0x00007f5c5e4828b9 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:258
... more output snipped for brevity ...
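Scanning that list by eye for one LWP is error-prone, so here is a small sketch that does the lookup. The function name gdb_thread_for_osid is hypothetical, and it assumes the ‘info threads’ layout shown above, with "(LWP <n>)" on the same line as the thread number.

```shell
# Read GDB `info threads` output on stdin and print the GDB thread number
# whose LWP matches the hex OS thread id given as $1 (e.g. 5a15).
# Prints nothing if no thread matches.
gdb_thread_for_osid() {
    local lwp
    lwp=$(printf '%d' "0x$1")
    # Drop the '*' current-thread marker so the thread number is always $1,
    # then look for the "(LWP" field followed by "<lwp>)".
    sed 's/^\* */  /' | awk -v lwp="$lwp" '
        { for (i = 1; i < NF; i++)
              if ($i == "(LWP" && $(i + 1) == (lwp ")")) { print $1; exit } }'
}
```

One way to feed it is to save the GDB output to a file first (say threads.txt, a name chosen here for illustration) and run gdb_thread_for_osid 5a15 < threads.txt.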

So, for example, suppose we wanted to know what managed thread #6 (OS thread id 0x580d from the ‘sos Threads’ output above) was doing when the dump file was generated. 0x580d is 22541 in decimal, which is the LWP of thread #12 in the output above. Going back to LLDB (note the hex notation for both thread ids):

(lldb) setsostid 580d c
Mapped sos OS tid 0x580d to lldb thread index 12
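The hex/decimal juggling is easy to get wrong by hand, and the shell’s printf can do both conversions. A minimal sketch; the helper names osid_to_lwp and setsostid_command are invented here, and the second one only prints the command text for pasting into LLDB.

```shell
# Convert a hex OS thread id (OSID column of `sos Threads`) to the decimal
# LWP number that GDB shows in `info threads`.
osid_to_lwp() {
    printf '%d\n' "0x$1"
}

# Print the setsostid command for a hex OSID and a decimal debugger thread
# number, converting the thread number to hex as setsostid expects.
setsostid_command() {
    printf 'setsostid %s %x\n' "$1" "$2"
}
```

For the example above, osid_to_lwp 580d prints 22541, and setsostid_command 580d 12 prints setsostid 580d c.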

(lldb) clrstack
OS Thread Id: 0x580d (12)
        Child SP               IP Call Site
00007F5AD1B227F8 00007f5c5d907d29 [InlinedCallFrame: 00007f5ad1b227f8] Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.Libuv+NativeMethods.uv_run(Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.UvLoopHandle, Int32)
00007F5AD1B227F8 00007f5be45cea3a [InlinedCallFrame: 00007f5ad1b227f8] Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.Libuv+NativeMethods.uv_run(Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.UvLoopHandle, Int32)
00007F5AD1B227E0 00007F5BE45CEA3A DomainBoundILStubClass.(Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.UvLoopHandle, Int32)
00007F5AD1B22890 00007F5BE45CE968 Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.Libuv.run(Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.UvLoopHandle, Int32)
00007F5AD1B228B0 00007F5BE45CBCFF Microsoft.AspNetCore.Server.Kestrel.Internal.KestrelThread.ThreadStart(System.Object)
00007F5AD1B22910 00007F5BE33BDE71 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
00007F5AD1B22B50 00007f5c5cbfb207 [GCFrame: 00007f5ad1b22b50]
00007F5AD1B22D30 00007f5c5cbfb207 [DebuggerU2MCatchHandlerFrame: 00007f5ad1b22d30]

Other SOS commands that don’t depend on thread context (e.g. listing assemblies, heap objects, finalization queues and so on) do not require any fiddling with thread ids, and you can just run them directly.

Summary

So, what we had to do in order to open a .NET Core core dump from a Linux system was:

  • Set up the Linux system to generate core dumps on crash
  • Copy or install the right version of .NET Core on the analysis machine
  • Install the version of LLDB matching your .NET Core’s SOS plugin
  • Load the SOS plugin in LLDB and tell it where to find the DAC
  • Set the debugger thread id for SOS thread-sensitive commands to work
  • Run sos PrintException or any other commands to analyze the crash
