Analyzing a .NET Core Core Dump on Linux | All Your Base Are Belong To Us
Recently, I had to open a core dump of a .NET Core application on Linux. I thought this walkthrough might be useful if you find yourself in the same boat, because, to be quite honest, I didn’t find it trivial.
Configure Linux to Generate Core Dumps
Before you begin, you need to configure your Linux box to generate core dumps in the first place. A lot of distros will have something preconfigured, but the simplest approach is to just put a file name in the /proc/sys/kernel/core_pattern file:
# echo core > /proc/sys/kernel/core_pattern
Additionally, there’s a system limit on the maximum size of a generated core file; ulimit -c unlimited removes that limit for the current shell and the processes started from it. Now, whenever your .NET Core process (or any other process) crashes, you’ll get a core file in the crashing process’s working directory. By the way, .NET Core on Linux x86_64 reserves a pretty gigantic address space, so expect your core files to be pretty big. But compression helps: I had a 6.5GB core dump compress into a 59MB gzip file.
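Since many distros ship their own core_pattern, it’s worth checking what is currently configured before you overwrite it. Here’s a minimal sketch; describe_core_pattern is a name I made up for illustration, and it relies only on the documented kernel rule that a pattern starting with a pipe character sends the dump to a helper program instead of a file.

```shell
#!/bin/sh
# Classify a core_pattern value: a leading '|' means the kernel pipes the
# dump to a helper program; anything else is treated as a file name template.
describe_core_pattern() {
    case "$1" in
        '|'*) echo "piped to helper: ${1#?}" ;;
        *)    echo "written to file: $1" ;;
    esac
}

# Report the current settings when run on a Linux box.
if [ -r /proc/sys/kernel/core_pattern ]; then
    describe_core_pattern "$(cat /proc/sys/kernel/core_pattern)"
fi
echo "core size limit for this shell: $(ulimit -c)"
```

If the pattern turns out to be piped to a helper, your dumps are probably being collected somewhere else, and writing a plain file name as shown above changes that behavior.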
Installing LLDB
To open the core dump, you’ll need the version of LLDB that your CoreCLR’s SOS plugin was built against. Here’s how I found out what I needed:
$ find /usr/share/dotnet -name libsosplugin.so
/usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.0/libsosplugin.so
$ ldd $(find /usr/share/dotnet -name libsosplugin.so) | grep lldb
liblldb-3.5.so.1 => /usr/lib/x86_64-linux-gnu/liblldb-3.5.so.1 (0x00007f0a6b2d8000)
Seeing that LLDB 3.5 was required, I installed it with sudo apt install lldb-3.5, but YMMV on other distros, of course.
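If you’d rather not read the version off the ldd output by eye, something like the following could pull it out. This is only a sketch: lldb_version_from_ldd is a hypothetical helper name, and it assumes the library is named liblldb-VERSION.so as in the output I got.

```shell
#!/bin/sh
# Read ldd output on stdin and print the LLDB version embedded in the
# liblldb file name, e.g. "liblldb-3.5.so.1 => ..." yields "3.5".
lldb_version_from_ldd() {
    sed -n 's/.*liblldb-\([0-9][0-9.]*\)\.so.*/\1/p'
}

# Example use (not run here):
#   ldd "$(find /usr/share/dotnet -name libsosplugin.so | head -n 1)" | lldb_version_from_ldd
```

On my machine that would print 3.5, which is the suffix I then used in the package name.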
Opening The Core File And Loading SOS
Now you’re ready to open the core file in LLDB. If you’re doing this on a different box, you’ll need the same version of .NET Core installed — that’s where the dotnet binary, SOS itself, and the DAC (debugger data access component) are coming from. You could also copy the /usr/share/dotnet/shared/Microsoft.NETCore.App/nnnn directory over, of course.
$ lldb $(which dotnet) --core ./core
Once inside LLDB, you’ll need to load the SOS plugin. It’s the one we found earlier:
(lldb) plugin load /usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.1/libsosplugin.so
If the plugin loaded without errors, there’s one more step: SOS also needs the DAC (libmscordaccore.so), so you’ll need to tell it where to look:
(lldb) setclrpath /usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.1
With that, SOS should be loaded and ready for use.
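If you open dumps often, typing these three steps gets old. Here’s a sketch that prints a single command line doing all of them. It assumes your LLDB build accepts the -o option for running a command after the target is loaded, which I haven’t checked against every version, and sos_lldb_command is a name I invented. It also assumes paths without spaces.

```shell
#!/bin/sh
# Print an lldb command line that opens a core file, loads the SOS plugin,
# and sets the DAC path. Arguments: runtime directory, core file path.
# Assumption: this LLDB supports -o (run a command after loading the target).
sos_lldb_command() {
    runtime_dir=$1
    core_file=$2
    printf 'lldb "$(which dotnet)" --core %s -o "plugin load %s/libsosplugin.so" -o "setclrpath %s"\n' \
        "$core_file" "$runtime_dir" "$runtime_dir"
}

# Example use (not run here):
#   sos_lldb_command /usr/share/dotnet/shared/Microsoft.NETCore.App/1.1.1 ./core
```

Printing the command instead of running it lets you inspect it first; once it looks right, paste it into your shell.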
Running Analysis
You’d think you can just start running the SOS commands you know and love, but there’s one final hurdle. Here’s what happened when I opened a core file generated from a crash, and tried to get the exception information (note that you should prefix SOS commands with ‘sos’):
(lldb) sos PrintException
The current thread is unmanaged
… which is kind of odd, right? Considering that the process crashed as a result of a managed exception. Looking at the docs, it looks like SOS and LLDB have trouble communicating around the current thread’s identity. So first, let’s find the thread that encountered an exception:
(lldb) sos Threads
ThreadCount:      13
UnstartedThread:  0
BackgroundThread: 11
PendingThread:    0
DeadThread:       0
Hosted Runtime:   no
                                                                                                Lock
     ID OSID ThreadOBJ        State   GC Mode    GC Alloc Context              Domain           Count Apt Exception
XXXX  1 57ff 0000000000C2B380 2020020 Preemptive (nil):(nil)                   0000000000C195C0 0     Ukn
XXXX  2 5807 0000000000CAAF80 21220   Preemptive 0x7f5ad2fcbc40:0x7f5ad2fcdae0 0000000000C195C0 0     Ukn (Finalizer)
XXXX  4 580a 0000000000DC2730 21220   Preemptive (nil):(nil)                   0000000000C195C0 0     Ukn
XXXX  6 580d 0000000000EC1D70 21220   Preemptive 0x7f5ad576b4d0:0x7f5ad576cf58 0000000000C195C0 0     Ukn
XXXX  7 5a13 00007F5ABC0292A0 1021220 Preemptive 0x7f5ad5888d30:0x7f5ad5888fd0 0000000000C195C0 0     Ukn (Threadpool Worker)
XXXX  8 5a15 00007F5AC006A3F0 21020   Preemptive 0x7f5ad594dd10:0x7f5ad594ece8 0000000000C195C0 0     Ukn System.IO.FileNotFoundException 00007f5ad593fa80 (nested exceptions)
XXXX  9 5a16 00007F5AC00916A0 21220   Preemptive (nil):(nil)                   0000000000C195C0 0     Ukn
XXXX 10 5a17 00007F5AC80015D0 1021220 Preemptive 0x7f5ad593a9a0:0x7f5ad593b978 0000000000C195C0 0     Ukn (Threadpool Worker)
XXXX  5 5a18 00007F5AC0814DF0 21220   Preemptive 0x7f5ad50ed1b8:0x7f5ad50eefd0 0000000000C195C0 0     Ukn
XXXX  3 5a19 00007F5C54000A00 1020220 Preemptive (nil):(nil)                   0000000000C195C0 0     Ukn (Threadpool Worker)
XXXX 11 5a1a 00007F5C50019270 1021220 Preemptive 0x7f5ad58a5710:0x7f5ad58a6fd0 0000000000C195C0 0     Ukn (Threadpool Worker)
XXXX 12 5a1b 00007F5AC0831B80 1021220 Preemptive 0x7f5ad58fcf68:0x7f5ad58fd000 0000000000C195C0 0     Ukn (Threadpool Worker)
XXXX 13 5a1c 0000000000E8F720 1021220 Preemptive 0x7f5ad593bc80:0x7f5ad593d978 0000000000C195C0 0     Ukn (Threadpool Worker)
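With a larger thread list, scanning the Exception column by eye gets tedious. If you save the sos Threads output to a file, a heuristic like this one can pick out the interesting rows. It’s a sketch based only on the column layout I saw above (an exception type followed by a 16-digit object address after the Apt column), and threads_with_exceptions is a made-up name.

```shell
#!/bin/sh
# Read saved "sos Threads" rows on stdin and print "OSID exception-type" for
# each row whose trailing columns hold a type name followed by a 16-digit
# hex object address. Annotations such as "(Finalizer)" are skipped.
threads_with_exceptions() {
    awk '{
        for (i = 11; i < NF; i++) {
            addr = $(i + 1)
            if (length(addr) == 16 && addr ~ /^[0-9a-fA-F]+$/ && $i !~ /^\(/) {
                print $3, $i
                break
            }
        }
    }'
}

# Example use (not run here; sos-threads.txt is a hypothetical file name):
#   threads_with_exceptions < sos-threads.txt
```

The first word of each result is the OS thread id in hex, which is what you need for the mapping step later on.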
Thread #8 looks suspicious, what with the System.IO.FileNotFoundException in the Exception column. Now, let’s see all the LLDB threads:
(lldb) thread list
Process 0 stopped
* thread #1: tid = 0, 0x00007f5c5d83b7ef libc.so.6`__GI_raise(sig=2) + 159 at raise.c:58, name = 'dotnet', stop reason = signal SIGABRT
  thread #2: tid = 1, 0x00007f5c5e482510 libpthread.so.0`__pthread_cond_wait + 256, stop reason = signal SIGABRT
  thread #3: tid = 2, 0x00007f5c5d907d29 libc.so.6`syscall + 25, stop reason = signal SIGABRT
  thread #4: tid = 3, 0x00007f5c5d907d29 libc.so.6`syscall + 25, stop reason = signal SIGABRT
... more threads snipped for brevity ...
Here, it looks like thread 1 is the one with the exception being raised. So we have to map the OS thread ID from the first command, to the LLDB thread id from the second command:
(lldb) setsostid 5a15 1
Mapped sos OS tid 0x5a15 to lldb thread index 1
And now, we’re ready to roll:
(lldb) sos PrintException
Exception object: 00007f5ad593fa80
Exception type:   System.IO.FileNotFoundException
Message:          Could not load the specified file.
InnerException:   <none>
StackTrace (generated):
    SP               IP               Function
    00007F5C45D227C0 00007F5BE37412E7 System.Private.CoreLib.ni.dll!System.Runtime.Loader.AssemblyLoadContext.ResolveUsingEvent(System.Reflection.AssemblyName)+0x20ab07
    00007F5C45D227F0 00007F5BE353664F System.Private.CoreLib.ni.dll!System.Runtime.Loader.AssemblyLoadContext.ResolveUsingResolvingEvent(IntPtr, System.Reflection.AssemblyName)+0x4f
StackTraceString: <none>
HResult: 80070002
Nested exception -------------------------------------------------------------
Exception object: 00007f5ad593dea0
Exception type:   System.InvalidOperationException
Message:          Authorization cannot be requested before logging in.
InnerException:   <none>
StackTrace (generated):
    SP               IP               Function
    00007F5C45D29890 00007F5BE63002FE kitt3ns.dll!WebApplication.Controllers.AuthorizationBackgroundWorker.VerifyAuthorized(System.String)+0xae
    00007F5C45D298D0 00007F5BE630022B kitt3ns.dll!WebApplication.Controllers.AuthorizationBackgroundWorker.RequestAuthorization()+0x2b
    00007F5C45D298E0 00007F5BE55BC31C kitt3ns.dll!WebApplication.Controllers.AuthorizationBackgroundWorker+<>c.<Authorize>b__0_0()+0x4c
    00007F5C45D29910 00007F5BE33BDF11 System.Private.CoreLib.ni.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)+0x111
StackTraceString: <none>
HResult: 80131509
(lldb) sos ClrStack
OS Thread Id: 0x5a15 (1)
Child SP         IP               Call Site
00007F5C45D272C8 00007f5c5d83b7ef [HelperMethodFrame: 00007f5c45d272c8]
00007F5C45D273E0 00007F5BE33BDF11 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
00007F5C45D29770 00007f5c5cbe9bad [HelperMethodFrame: 00007f5c45d29770]
00007F5C45D29890 00007F5BE63002FE WebApplication.Controllers.AuthorizationBackgroundWorker.VerifyAuthorized(System.String) [/home/vagrant/kitt3ns/Controllers/AccountController.cs @ 37]
00007F5C45D298D0 00007F5BE630022B WebApplication.Controllers.AuthorizationBackgroundWorker.RequestAuthorization() [/home/vagrant/kitt3ns/Controllers/AccountController.cs @ 30]
00007F5C45D298E0 00007F5BE55BC31C WebApplication.Controllers.AuthorizationBackgroundWorker+<>c.<Authorize>b__0_0() [/home/vagrant/kitt3ns/Controllers/AccountController.cs @ 24]
00007F5C45D29910 00007F5BE33BDE71 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
00007F5C45D29B50 00007f5c5cbfb207 [GCFrame: 00007f5c45d29b50]
00007F5C45D29D30 00007f5c5cbfb207 [DebuggerU2MCatchHandlerFrame: 00007f5c45d29d30]
This gives us the exception information and the thread’s current stack, if we want it. We could similarly inspect other threads by mapping the OS thread id to the LLDB thread id, but for a thread that didn’t have an exception, where do you get that clue that connects the OS thread id to the debugger thread ID? Well, it seems that GDB is using the same numbering as LLDB, but in GDB you can actually see the LWP id (on Linux, GDB LWP = kernel pid = thread) using ‘info threads’:
$ gdb $(which dotnet) --core ./core
...
(gdb) info threads
  Id   Target Id                          Frame
* 1    Thread 0x7f5c45d2a700 (LWP 23061) __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
  2    Thread 0x7f5c5eaab740 (LWP 22527) 0x00007f5c5e482510 in pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:219
  3    Thread 0x7f5c5b411700 (LWP 22529) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  4    Thread 0x7f5c5ac10700 (LWP 22530) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  5    Thread 0x7f5c5a40f700 (LWP 22531) 0x00007f5c5d9020bd in poll () at ../sysdeps/unix/syscall-template.S:84
  6    Thread 0x7f5c59c0e700 (LWP 22532) 0x00007f5c5e485d8d in __pause_nocancel () at ../sysdeps/unix/syscall-template.S:84
  7    Thread 0x7f5c5940d700 (LWP 22533) 0x00007f5c5e482510 in pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:219
  8    Thread 0x7f5c589b2700 (LWP 22534) 0x00007f5c5e482510 in pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:219
  9    Thread 0x7f5c498ae700 (LWP 22535) 0x00007f5c5e4828b9 in pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:258
  10   Thread 0x7f5c454ef700 (LWP 22538) 0x00007f5c5e4856ed in __close_nocancel () at ../sysdeps/unix/syscall-template.S:84
  11   Thread 0x7f5ad2324700 (LWP 22540) 0x00007f5c5e4856ed in __close_nocancel () at ../sysdeps/unix/syscall-template.S:84
  12   Thread 0x7f5ad1b23700 (LWP 22541) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  13   Thread 0x7f5ad2b25700 (LWP 23059) 0x00007f5c5e4828b9 in pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:258
... more output snipped for brevity ...
So, for example, suppose we wanted to know what managed thread #6 (OS thread id 0x580d from the ‘sos Threads’ output above) was doing when the dump file was generated. 0x580d = 22541, which is thread #12 in the output above. Going back to LLDB (note the hex notation for both thread ids):
(lldb) setsostid 580d c
Mapped sos OS tid 0x580d to lldb thread index 12
(lldb) clrstack
OS Thread Id: 0x580d (12)
Child SP         IP               Call Site
00007F5AD1B227F8 00007f5c5d907d29 [InlinedCallFrame: 00007f5ad1b227f8] Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.Libuv+NativeMethods.uv_run(Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.UvLoopHandle, Int32)
00007F5AD1B227F8 00007f5be45cea3a [InlinedCallFrame: 00007f5ad1b227f8] Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.Libuv+NativeMethods.uv_run(Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.UvLoopHandle, Int32)
00007F5AD1B227E0 00007F5BE45CEA3A DomainBoundILStubClass.(Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.UvLoopHandle, Int32)
00007F5AD1B22890 00007F5BE45CE968 Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.Libuv.run(Microsoft.AspNetCore.Server.Kestrel.Internal.Networking.UvLoopHandle, Int32)
00007F5AD1B228B0 00007F5BE45CBCFF Microsoft.AspNetCore.Server.Kestrel.Internal.KestrelThread.ThreadStart(System.Object)
00007F5AD1B22910 00007F5BE33BDE71 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
00007F5AD1B22B50 00007f5c5cbfb207 [GCFrame: 00007f5ad1b22b50]
00007F5AD1B22D30 00007f5c5cbfb207 [DebuggerU2MCatchHandlerFrame: 00007f5ad1b22d30]
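The hex-to-decimal-and-back dance is easy to get wrong by hand, so here’s a sketch that does it for you. It reads saved GDB info threads output (one thread per line, as GDB prints it) and emits the matching setsostid command. Both function names are hypothetical, and it assumes, as I observed above, that GDB and LLDB number the threads the same way.

```shell
#!/bin/sh
# Convert a hex OSID from "sos Threads" to the decimal LWP that GDB shows.
osid_to_lwp() {
    printf '%d\n' "0x$1"
}

# Read saved GDB "info threads" output on stdin and print the setsostid
# command for the given hex OSID. setsostid takes both ids in hex.
setsostid_for() {
    osid=$1
    lwp=$(osid_to_lwp "$osid")
    awk -v lwp="$lwp" -v osid="$osid" '{
        for (i = 1; i < NF; i++) {
            if ($i == "(LWP" && $(i + 1) == lwp ")") {
                # The thread number is field 1, or field 2 after the "*" marker.
                id = ($1 == "*") ? $2 : $1
                printf "setsostid %s %x\n", osid, id
            }
        }
    }'
}

# Example use (not run here; gdb-threads.txt is a hypothetical file name):
#   gdb -batch -ex 'info threads' "$(which dotnet)" ./core > gdb-threads.txt
#   setsostid_for 580d < gdb-threads.txt
```

For the dump in this post, asking for 580d gives setsostid 580d c, the same command I typed by hand.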
Other SOS commands that don’t depend on thread context (e.g. listing assemblies, heap objects, finalization queues and so on) do not require any fiddling with thread ids, and you can just run them directly.
Summary
So, what we had to do in order to open a .NET Core core dump from a Linux system was:
- Set up the Linux system to generate core dumps on crash
- Copy or install the right version of .NET Core on the analysis machine
- Install the version of LLDB matching your .NET Core’s SOS plugin
- Load the SOS plugin in LLDB and tell it where to find the DAC
- Set the debugger thread id for SOS thread-sensitive commands to work
- Run sos PrintException or any other commands to analyze the crash