

Program terminated with signal 6, Aborted Linux

This is an extension of the earlier post on gdb and core files, viewable in the link below.  In this case, whenever we restart or try to shut down the system, a brief message about some crash appears on the console and quickly disappears.  The system then restarts, but it is not capable of shutting down unless we cut the power.  The message is gone so quickly that we can barely make out any characters.  So we check the cores folder set up earlier to find out why this could be happening, correlating the output of last | more with the core files produced.

[root@mbpc cores]# locate gvfs-gdu-volume
/cores/core.gvfs-gdu-volume.10346.mbpc.1372911605
/cores/core.gvfs-gdu-volume.12094.mbpc.1372910362
/cores/core.gvfs-gdu-volume.14548.mbpc.1369504171
/cores/core.gvfs-gdu-volume.28404.mbpc.1372697005
/cores/core.gvfs-gdu-volume.29440.mbpc.1372704190
/cores/core.gvfs-gdu-volume.32474.mbpc.1372707034
/cores/core.gvfs-gdu-volume.4191.mbpc.1369503901
/cores/core.gvfs-gdu-volume.5841.mbpc.1372910706
/cores/core.gvfs-gdu-volume.7077.mbpc.1372707292
/usr/libexec/gvfs-gdu-volume-monitor
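
To line these files up with the output of last, it helps to turn the trailing number in each file name into a date. Here is a minimal Python sketch, assuming the names follow a core.<executable>.<pid>.<hostname>.<epoch seconds> layout (that is how the listing above reads, but the actual core_pattern on this system is an assumption) and that the hostname contains no dots. The function name parse_core_name is made up for illustration.

```python
from datetime import datetime, timezone


def parse_core_name(path):
    """Split a core file name into executable, pid, host and dump time (UTC)."""
    name = path.rsplit("/", 1)[-1]
    # Split from the right so dashes or dots in the executable name are harmless.
    prefix, pid, host, epoch = name.rsplit(".", 3)
    if not prefix.startswith("core."):
        raise ValueError("not a core file name: " + name)
    return {
        "exe": prefix[len("core."):],
        "pid": int(pid),
        "host": host,
        "when": datetime.fromtimestamp(int(epoch), tz=timezone.utc),
    }


if __name__ == "__main__":
    info = parse_core_name("/cores/core.gvfs-gdu-volume.10346.mbpc.1372911605")
    print(info["exe"], info["pid"], info["host"], info["when"].isoformat())
```

The times come out in UTC, so they need shifting to local time before comparing them against last.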

So we start the debugger in this manner specifying the executable and the core file.

[root@mbpc cores]# gdb /usr/libexec/gvfs-gdu-volume-monitor core.gvfs-gdu-volume.24332.mbpc.1373231919
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>…
Reading symbols from /usr/libexec/gvfs-gdu-volume-monitor…(no debugging symbols found)…done.

warning: core file may not match specified executable file.
[New Thread 24332]
Reading symbols from /usr/lib64/libgdu.so.0…(no debugging symbols found)…done.
Loaded symbols for /usr/lib64/libgdu.so.0
Reading symbols from /usr/lib64/libgvfscommon.so.0…(no debugging symbols found)…done.
Loaded symbols for /usr/lib64/libgvfscommon.so.0
Reading symbols from /lib64/libdbus-1.so.3…(no debugging symbols found)…done.
Loaded symbols for /lib64/libdbus-1.so.3
Reading symbols from /lib64/libpthread.so.0…(no debugging symbols found)…done.
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libgthread-2.0.so.0…(no debugging symbols found)…done.
Loaded symbols for /lib64/libgthread-2.0.so.0
Reading symbols from /lib64/librt.so.1…(no debugging symbols found)…done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /lib64/libgio-2.0.so.0…(no debugging symbols found)…done.
Loaded symbols for /lib64/libgio-2.0.so.0
Reading symbols from /lib64/libgobject-2.0.so.0…(no debugging symbols found)…done.
Loaded symbols for /lib64/libgobject-2.0.so.0
Reading symbols from /lib64/libgmodule-2.0.so.0…(no debugging symbols found)…done.
Loaded symbols for /lib64/libgmodule-2.0.so.0
Reading symbols from /lib64/libglib-2.0.so.0…(no debugging symbols found)…done.
Loaded symbols for /lib64/libglib-2.0.so.0
Reading symbols from /lib64/libutil.so.1…(no debugging symbols found)…done.
Loaded symbols for /lib64/libutil.so.1
Reading symbols from /lib64/libc.so.6…(no debugging symbols found)…done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /usr/lib64/libdbus-glib-1.so.2…(no debugging symbols found)…done.
Loaded symbols for /usr/lib64/libdbus-glib-1.so.2
Reading symbols from /usr/lib64/libgnome-keyring.so.0…(no debugging symbols found)…done.
Loaded symbols for /usr/lib64/libgnome-keyring.so.0
Reading symbols from /lib64/ld-linux-x86-64.so.2…(no debugging symbols found)…done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib64/libdl.so.2…(no debugging symbols found)…done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libresolv.so.2…(no debugging symbols found)…done.
Loaded symbols for /lib64/libresolv.so.2
Reading symbols from /lib64/libselinux.so.1…(no debugging symbols found)…done.
Loaded symbols for /lib64/libselinux.so.1
Reading symbols from /usr/lib64/gio/modules/libgvfsdbus.so…(no debugging symbols found)…done.
Loaded symbols for /usr/lib64/gio/modules/libgvfsdbus.so
Reading symbols from /lib64/libudev.so.0…(no debugging symbols found)…done.
Loaded symbols for /lib64/libudev.so.0
Reading symbols from /lib64/libnss_files.so.2…(no debugging symbols found)…done.
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `/usr/libexec/gvfs-gdu-volume-monitor'.
Program terminated with signal 6, Aborted.
#0  0x000000314e0328a5 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install gvfs-1.4.3-12.el6.x86_64
(gdb) tr
Tracepoint 1 at 0x314e0328a5
(gdb) where
#0  0x000000314e0328a5 in raise () from /lib64/libc.so.6
#1  0x000000314e034085 in abort () from /lib64/libc.so.6
#2  0x000000314dc5ea0f in g_assertion_message () from /lib64/libglib-2.0.so.0
#3  0x000000314dc5efb0 in g_assertion_message_expr () from /lib64/libglib-2.0.so.0
#4  0x0000003154410086 in gdu_pool_get_presentables () from /usr/lib64/libgdu.so.0
#5  0x000000000040f1ae in ?? ()
#6  0x0000000000410165 in ?? ()
#7  0x000000314f8121fa in g_object_newv () from /lib64/libgobject-2.0.so.0
#8  0x000000314f8128ed in g_object_new_valist () from /lib64/libgobject-2.0.so.0
#9  0x000000314f812a4c in g_object_new () from /lib64/libgobject-2.0.so.0
#10 0x0000000000410290 in ?? ()
#11 0x00000000004103d1 in ?? ()
#12 0x000000314e01ecdd in __libc_start_main () from /lib64/libc.so.6
#13 0x0000000000407359 in ?? ()
#14 0x00007fffd39a19d8 in ?? ()
#15 0x000000000000001c in ?? ()
#16 0x0000000000000001 in ?? ()
#17 0x00007fffd39a2b26 in ?? ()
#18 0x0000000000000000 in ?? ()
(gdb)

We note the "Missing separate debuginfos" message above, so we follow what it says and run the suggested command to install more debugging symbols.

[root@mbpc cores]# which debuginfo-install
/usr/bin/debuginfo-install
[root@mbpc cores]# debuginfo-install gvfs-1.4.3-12.el6.x86_64

Then we rerun the gdb command above, which prints more detail but also lists more debuginfo packages we can install:

Missing separate debuginfos, use: debuginfo-install dbus-glib-0.86-6.el6_4.x86_64 libselinux-2.0.94-5.3.el6.x86_64
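
Since gdb may print this hint more than once as symbols get installed, collecting the package names can be scripted. The following is a small Python sketch, assuming the gdb output has been saved as text; the function name missing_debuginfo_packages is hypothetical and only the hint format shown above is relied on.

```python
def missing_debuginfo_packages(gdb_output):
    """Return the packages named on gdb's 'Missing separate debuginfos' lines."""
    marker = "Missing separate debuginfos, use: debuginfo-install "
    packages = []
    for line in gdb_output.splitlines():
        if line.startswith(marker):
            for pkg in line[len(marker):].split():
                # Keep first-seen order and skip repeats across several hints.
                if pkg not in packages:
                    packages.append(pkg)
    return packages


if __name__ == "__main__":
    text = (
        "Missing separate debuginfos, use: debuginfo-install "
        "dbus-glib-0.86-6.el6_4.x86_64 libselinux-2.0.94-5.3.el6.x86_64\n"
    )
    print(" ".join(missing_debuginfo_packages(text)))
```

The resulting list can be passed to debuginfo-install in one go.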

And we run gdb again to see all the messages available.  Sure enough, when we run gdb with the same executable and core file as above, we now get this message:

Core was generated by `/usr/libexec/gvfs-gdu-volume-monitor'.
Program terminated with signal 6, Aborted.
#0  0x000000314e0328a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb)

so we trace it further to get:

(gdb) where
#0  0x000000314e0328a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x000000314e034085 in abort () at abort.c:92
#2  0x000000314dc5ea0f in IA__g_assertion_message (domain=<value optimized out>, file=0x315442c5c5 "gdu-pool.c", line=<value optimized out>,
    func=0x315442d710 "gdu_pool_get_presentables", message=0x1683570 "assertion failed: (pool != NULL)") at gtestutils.c:1302
#3  0x000000314dc5efb0 in IA__g_assertion_message_expr (domain=0x315442a2a4 "libgdu", file=0x315442c5c5 "gdu-pool.c", line=2565,
    func=0x315442d710 "gdu_pool_get_presentables", expr=<value optimized out>) at gtestutils.c:1313
#4  0x0000003154410086 in gdu_pool_get_presentables (pool=<value optimized out>) at gdu-pool.c:2565
#5  0x000000000040f1ae in update_drives (monitor=0x1671810, emit_changes=0) at ggduvolumemonitor.c:1228
#6  update_all (monitor=0x1671810, emit_changes=0) at ggduvolumemonitor.c:1004
#7  0x0000000000410165 in g_gdu_volume_monitor_constructor (type=<value optimized out>, n_construct_properties=<value optimized out>,
    construct_properties=<value optimized out>) at ggduvolumemonitor.c:455
#8  0x000000314f8121fa in IA__g_object_newv (object_type=23524304, n_parameters=0, parameters=0x0) at gobject.c:1171
#9  0x000000314f8128ed in IA__g_object_new_valist (object_type=23524304, first_property_name=0x0, var_args=0x7fffd39a17c0) at gobject.c:1323
#10 0x000000314f812a4c in IA__g_object_new (object_type=23524304, first_property_name=0x0) at gobject.c:1086
#11 0x0000000000410290 in monitor_try_create () at gvfsproxyvolumemonitordaemon.c:2045
#12 0x00000000004103d1 in g_vfs_proxy_volume_monitor_daemon_main (argc=<value optimized out>, argv=<value optimized out>,
    dbus_name=0x4142b0 "org.gtk.Private.GduVolumeMonitor", volume_monitor_type=23524304) at gvfsproxyvolumemonitordaemon.c:2088
#13 0x000000314e01ecdd in __libc_start_main (main=0x407420 <main>, argc=1, ubp_av=0x7fffd39a19e8, init=<value optimized out>, fini=<value optimized out>,
    rtld_fini=<value optimized out>, stack_end=0x7fffd39a19d8) at libc-start.c:226
#14 0x0000000000407359 in _start ()
(gdb)
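
The key facts in a backtrace like this are the assertion text and the source location that raised it. As a rough sketch, assuming the output of where has been saved as plain text, the following Python pulls both out. The function name assertion_details is made up, and the patterns only target the message=... and file=..., line=... arguments that GLib's assertion frames show above.

```python
import re


def assertion_details(backtrace):
    """Extract the GLib assertion message and its source location, if present."""
    msg = re.search(r'message=0x[0-9a-f]+ "([^"]*)"', backtrace)
    # Only accept a numeric line; frames showing <value optimized out> are skipped.
    loc = re.search(r'file=0x[0-9a-f]+ "([^"]+)", line=(\d+)', backtrace)
    return {
        "message": msg.group(1) if msg else None,
        "file": loc.group(1) if loc else None,
        "line": int(loc.group(2)) if loc else None,
    }
```

For the trace above this points at gdu-pool.c line 2565 and the failed check pool != NULL, which is the place to start reading the gvfs and libgdu sources.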

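So now we get a full view of the message and the issue: the volume monitor aborts on the assertion pool != NULL in gdu_pool_get_presentables (gdu-pool.c:2565), reached from update_drives while the monitor is being constructed.  The resolution, I suspect, would require some coding and research into the source files.  In the meantime, to get a chance to read the message that flashes by at shutdown, we'll set the kernel panic parameter like this: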

[root@mbpc log]# cat /proc/sys/kernel/panic
60
[root@mbpc log]# echo 300 > /proc/sys/kernel/panic
[root@mbpc log]# cat /proc/sys/kernel/panic
300
[root@mbpc log]#
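
For reference, the value in /proc/sys/kernel/panic is a timeout in seconds. Below is a small Python sketch of how the value reads, under the assumption that only zero and positive values matter here (negative values are left to the kernel documentation for the running version). Both function names are hypothetical.

```python
def describe_panic_timeout(raw):
    """Explain a value read from /proc/sys/kernel/panic."""
    seconds = int(raw.strip())
    if seconds == 0:
        return "no automatic reboot after a panic"
    if seconds > 0:
        return "reboot %d seconds after a panic" % seconds
    # Negative values are not covered by this sketch.
    return "negative value: check the sysctl documentation for this kernel"


def read_panic_timeout(path="/proc/sys/kernel/panic"):
    """Read and describe the live setting (Linux only)."""
    with open(path) as handle:
        return describe_panic_timeout(handle.read())
```

Note that a value written with echo lasts only until the next boot.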

 

The kernel default is 0, which means the kernel does not reboot automatically after a panic and just waits there; on this system it had been 60, as the first cat shows.  So we raise it to see if we can spot the message and whether it has anything to do with the gvfs stuff above.  Once this was set, we can see the following message snippet on the monitor:

lockdep.c:2465

but that isn't really giving us much.  So we install the RHEL crash utility using yum install crash* to pull in all associated tools at once, instead of getting stuck on missing items later.  Having installed crash, I now look for the vmcore files on the system under /var/crash like this:

[root@mbpc crash]# ls -altri
total 52
134654 drwxr-xr-x.  2 root root 4096 Sep  2  2012 127.0.0.1-2012-09-02-09:09:08
133608 drwxr-xr-x.  2 root root 4096 Sep  3  2012 127.0.0.1-2012-09-03-19:16:57
131458 drwxr-xr-x.  2 root root 4096 May  8 02:33 127.0.0.1-2013-05-08-02:31:19
131073 drwxr-xr-x. 22 root root 4096 May 24 22:50 ..
131378 drwxr-xr-x.  2 root root 4096 Jun 30 01:05 127.0.0.1-2013-06-30-01:02:15
131134 drwxr-xr-x.  2 root root 4096 Jul  4 00:01 127.0.0.1-2013-07-04-00:00:56
131108 drwxr-xr-x.  2 root root 4096 Jul  4 00:06 127.0.0.1-2013-07-04-00:06:41
131147 drwxr-xr-x.  2 root root 4096 Jul  7 15:37 127.0.0.1-2013-07-07-15:36:39
131127 drwxr-xr-x.  2 root root 4096 Jul  7 16:03 127.0.0.1-2013-07-07-16:03:20
131156 drwxr-xr-x.  2 root root 4096 Jul  7 17:20 127.0.0.1-2013-07-07-17:20:18
131158 drwxr-xr-x.  2 root root 4096 Jul  7 17:59 127.0.0.1-2013-07-07-17:59:14
132681 drwxr-xr-x. 13 root root 4096 Jul  7 18:04 .
131161 drwxr-xr-x.  2 root root 4096 Jul  7 18:05 127.0.0.1-2013-07-07-18:04:52
[root@mbpc crash]# cd 127.0.0.1-2013-07-07-18\:04\:52/
[root@mbpc 127.0.0.1-2013-07-07-18:04:52]# pwd
/var/crash/127.0.0.1-2013-07-07-18:04:52
[root@mbpc 127.0.0.1-2013-07-07-18:04:52]# ls -altri
total 47260
132681 drwxr-xr-x. 13 root root     4096 Jul  7 18:04 ..
131168 -rw-------.  1 root root 48380959 Jul  7 18:05 vmcore
131161 drwxr-xr-x.  2 root root     4096 Jul  7 18:05 .
[root@mbpc 127.0.0.1-2013-07-07-18:04:52]#

So now let's run some analysis on what we found.  Running crash gets me this:

crash: namelist argument required
crash: cannot find booted kernel -- please enter namelist argument

The second one means that you did not specify a vmcore file as an argument when running crash.  For the first one, we need to install some more packages (the namelist is the debug vmlinux, which comes from the kernel debuginfo packages).  So let's do that.

[root@mbpc crash]# cat /etc/yum.repos.d/rhel-debuginfo.repo
[rhel-debuginfo]
name=Red Hat Enterprise Linux $releasever - $basearch - Debug
baseurl=ftp://ftp.redhat.com/pub/redhat/linux/enterprise/$releasever/en/os/$basearch/Debuginfo/
enabled=0
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release
[root@mbpc crash]#

 

but that turned up nothing.  So we switch to the Scientific Linux repo files (sl-updates.repo) instead.

[root@mbpc yum.repos.d]# yum install kernel-debuginfo.x86_64 kernel-debug-debuginfo.x86_64
=====================================================================================================================

 Package                                 Arch            Version                         Repository       Size
===============================================================================================================
Installing:
 kernel-debug-debuginfo                  x86_64          2.6.32-358.11.1.el6             sl-debuginfo    250 M
 kernel-debuginfo                        x86_64          2.6.32-358.11.1.el6             sl-debuginfo    244 M
Installing for dependencies:
 kernel-debuginfo-common-x86_64          x86_64          2.6.32-358.11.1.el6             sl-debuginfo     37 M

Transaction Summary
===============================================================================================================
Install       3 Package(s)

Total download size: 531 M
Installed size: 3.0 G
Is this ok [y/N]:

Answer y and proceed.  sl-other.repo should have the above repository defined as follows:

[sl-debuginfo]
name=Scientific Linux Debuginfo
baseurl=http://ftp.scientificlinux.org/linux/scientific/$releasever/archive/debuginfo/
                http://ftp1.scientificlinux.org/linux/scientific/$releasever/archive/debuginfo/
                http://ftp2.scientificlinux.org/linux/scientific/$releasever/archive/debuginfo/
                ftp://ftp.scientificlinux.org/linux/scientific/$releasever/archive/debuginfo/
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-sl file:///etc/pki/rpm-gpg/RPM-GPG-KEY-dawson

If enabled=0, change it to 1 or use the --enablerepo=sl-debuginfo command line option to yum to enable it temporarily at runtime.  If you have another debuginfo repo enabled, you might want to move it out of the way or change its priority using priority=0 within the repo or similar.  Now you could use either:

# debuginfo-install kernel
# yum install kernel-debuginfo.x86_64 kernel-debug-debuginfo.x86_64

to install the debuginfo items as per above.  The latter worked for us but takes some time (a 500+ MB compressed download).  Once installed, verify everything exists:

[root@mbpc debug]# ls -al /usr/lib/debug/lib/modules/2.6.32-358.11.1.el6.x86_64.debug/
total 134808
drwxr-xr-x.  4 root root      4096 Jul  7 23:53 .
drwxr-xr-x.  4 root root      4096 Jul  7 23:53 ..
drwxr-xr-x. 11 root root      4096 Jul  7 23:53 kernel
drwxr-xr-x.  2 root root      4096 Jul  7 23:53 vdso
-rwxr-xr-x.  1 root root 138022520 Jun 11 18:57 vmlinux
[root@mbpc debug]#

and run crash in this manner:

# crash vmcore /usr/lib/debug/lib/modules/2.6.32-358.11.1.el6.x86_64.debug/vmlinux

got me:

crash: cannot resolve: "xtime"

which I then resolved by upgrading the entire OS from 6.3 to 6.4 and then getting this package, whose changelog mentions the fix:

http://rpmfind.net/linux/RPM/centos/6.4/x86_64/Packages/crash-devel-6.1.0-1.el6.x86_64.html

* Thu Aug 23 2012 Dave Anderson <anderson@redhat.com> - 6.0.9-1.el6
  - Fix for "crash: cannot resolve: xtime" session invocation failure.
    Resolves: rhbz#843093
  - Enhance "struct" command to accept -o option with an address argument.
    Resolves: rhbz#834260
  - Rebase to upstream version 6.0.9.
  - Support for compressed/filtered ppc64 firmware-assisted dump (fadump).
    Resolves: rhbz#840051

So we get the latest packages and install them using rpm -Uvh:

[root@mbpc Linux]# rpm -aq|grep -i crash
crash-devel-6.1.0-1.el6.x86_64
crash-6.1.0-1.el6.x86_64
crash-trace-command-1.0-4.el6.x86_64
crash-gcore-command-1.0-3.el6.x86_64
crash-debuginfo-6.1.0-1.el6.x86_64
crash-trace-command-debuginfo-1.0-4.el6.x86_64
[root@mbpc Linux]# ls -altri
total 2468
41943045 -rw-r--r--.  1 root root   28468 Jun 24  2012 crash-trace-command-1.0-4.el6.x86_64.rpm
41943044 -rw-r--r--.  1 root root   36608 Jun 24  2012 crash-gcore-command-1.0-3.el6.x86_64.rpm
41943043 -rw-r--r--.  1 root root   65736 Feb 23 12:42 crash-devel-6.1.0-1.el6.x86_64.rpm
       2 drwxr-xr-x. 25 root root    4096 Jul  8 00:39 ..
41943042 -rw-r--r--.  1 root root 2380992 Jul  8 00:39 crash-6.1.0-1.el6.x86_64.rpm
41943041 drwxr-xr-x.  2 root root    4096 Jul  8 00:43 .
[root@mbpc Linux]#

And we try again but get:

crash: page excluded: kernel virtual address: ffffffff81a97918  type: "pv_init_ops"
crash: page excluded: kernel virtual address: ffffffff81eafda8  type: "timekeeper xtime"
crash: page excluded: kernel virtual address: ffffffff81a8e944  type: "init_uts_ns"
crash: /usr/lib/debug/lib/modules/2.6.32-358.11.1.el6.x86_64/vmlinux and vmcore do not match!

What we need is the debuginfo for 2.6.32-358.6.2, the kernel that produced the vmcore, not 2.6.32-358.11.1, so we download and install these manually:

kernel-debuginfo-common-x86_64-2.6.32-358.6.2.el6.x86_64.rpm
kernel-debug-debuginfo-2.6.32-358.6.2.el6.x86_64.rpm

kernel-debuginfo-2.6.32-358.6.2.el6.x86_64.rpm

And we try again with the above crash command (why yum picked 358.11.1 from the above repo I'm not sure, but it probably defaulted to the latest), only to get these results:

crash: page excluded: kernel virtual address: ffffffff81a97918  type: "pv_init_ops"
crash: page excluded: kernel virtual address: ffffffff81eafda8  type: "timekeeper xtime"
crash: page excluded: kernel virtual address: ffffffff81a8e944  type: "init_uts_ns"
crash: /usr/lib/debug/lib/modules/2 and vmcore do not match!

This is because I hadn't yet installed the kernel-debug-debuginfo package listed above, and the crashed kernel was the .debug variant.  With that installed, we finally have a winner:

[root@mbpc 127.0.0.1-2013-07-07-18:04:52]# crash vmcore /usr/lib/debug/lib/modules/2.6.32-358.6.2.el6.x86_64.debug/vmlinux

crash 6.1.0-1.el6
Copyright (C) 2002-2012  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

GNU gdb (GDB) 7.3.1
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"…

      KERNEL: /usr/lib/debug/lib/modules/2.6.32-358.6.2.el6.x86_64.debug/vmlinux
    DUMPFILE: vmcore  [PARTIAL DUMP]
        CPUS: 2
        DATE: Sun Jul  7 18:03:30 2013
      UPTIME: 00:03:31
LOAD AVERAGE: 1.85, 1.42, 0.59
       TASKS: 118
    NODENAME: mbpc
     RELEASE: 2.6.32-358.6.2.el6.x86_64.debug
     VERSION: #1 SMP Thu May 16 11:38:53 CDT 2013
     MACHINE: x86_64  (2310 Mhz)
      MEMORY: 4 GB
       PANIC: ""
         PID: 31
     COMMAND: "khubd"
        TASK: ffff88012d8e0800  [THREAD_INFO: ffff88012d8de000]
         CPU: 1
       STATE: TASK_RUNNING (PANIC)

crash>
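
The earlier "do not match" errors come down to the vmlinux path not belonging to the kernel named on the RELEASE line above. Here is a minimal Python sketch of that check, assuming debuginfo is laid out under /usr/lib/debug/lib/modules/<release>/vmlinux as in the paths used in this post; the function names are made up for illustration.

```python
from pathlib import PurePosixPath

# Location used by the kernel debuginfo packages in this post.
DEBUG_MODULES = PurePosixPath("/usr/lib/debug/lib/modules")


def vmlinux_for(release):
    """Build the debug vmlinux path for a kernel release string."""
    return str(DEBUG_MODULES / release / "vmlinux")


def release_of(vmlinux_path):
    """Recover the kernel release from a debug vmlinux path."""
    return PurePosixPath(vmlinux_path).parent.name


def same_kernel(vmlinux_path, crashed_release):
    """True when the vmlinux belongs to the kernel that wrote the vmcore."""
    return release_of(vmlinux_path) == crashed_release
```

The release string includes the trailing .debug when the crashed kernel was the debug variant, so the plain kernel-debuginfo vmlinux would fail this check as well.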

On to the debugging:

crash> log

…..
sd 9:0:0:0: [sdg] Synchronizing SCSI cache
sd 9:0:0:0: [sdg] Stopping disk
sd 5:0:0:0: [sdf] Stopping disk
sd 4:0:0:0: [sde] Stopping disk
sd 3:0:0:0: [sdd] Stopping disk
sd 2:0:0:0: [sdc] Stopping disk
sd 1:0:0:0: [sdb] Stopping disk
sd 0:0:0:0: [sda] Stopping disk
e1000 0000:04:06.0: PCI INT A disabled
e1000 0000:04:06.0: PME# enabled
pci 0000:00:14.4: wake-up capability enabled by ACPI
r8169 0000:03:00.0: PME# enabled
ACPI: Preparing to enter system sleep state S5
usb 3-3: USB disconnect, device number 2
general protection fault: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/block/dm-0/dm/name
CPU 1
Modules linked in: ebtable_nat ebtables bonding xt_CHECKSUM bridge fuse nfsd lockd nfs_acl auth_rpcgss autofs4 sunrpc powernow_k8 freq_table mperf 8021q garp stp llc ipt_REJECT ipt_LOG xt_multiport ip6t_REJECT ipv6 xfs exportfs dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan tun kvm_amd kvm uinput snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_event snd_seq_midi_emul snd_emu10k1 snd_rawmidi snd_ac97_codec ac97_bus snd_util_mem raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx e1000 microcode sg serio_raw edac_core edac_mce_amd k10temp shpchp i2c_piix4 r8169 mii snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc ext4 mbcache jbd2 sr_mod cdrom firewire_ohci firewire_core crc_itu_t sd_mod crc_t10dif pata_acpi ata_generic pata_jmicron ahci radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mod [last unloaded: cpufreq_ondemand]

Pid: 31, comm: khubd Tainted: G        W  ---------------    2.6.32-358.6.2.el6.x86_64.debug #1 Gigabyte Technology Co., Ltd. GA-890XA-UD3/GA-890XA-UD3
RIP: 0010:[]  [] sysfs_addrm_start+0x3f/0xd0
RSP: 0018:ffff88012d8dfbf0  EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88012d8dfc20 RCX: 0000000000000000
RDX: 2222222222222222 RSI: 0000000000000000 RDI: ffff88012d8dfb70
RBP: ffff88012d8dfc10 R08: 0000000000000000 R09: 0000000000000001
R10: 2222222222222222 R11: 2222222222222222 R12: 6b6b6b6b6b6b6b6b
R13: ffff88012d8dfc80 R14: ffff8801284b60f0 R15: 000000000000001f
FS:  00007fcc31ada700(0000) GS:ffff88002c200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007fcc311ebeb0 CR3: 00000001294bd000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process khubd (pid: 31, threadinfo ffff88012d8de000, task ffff88012d8e0800)
Stack:
 ffff88012d8dfc80 ffff8801284b6090 ffff88012d8dfc20 6b6b6b6b6b6b6b6b
<d> ffff88012d8dfc60 ffffffff81217e88 6b6b6b6b6b6b6b6b 0000000000000000
<d> 0000000000000000 0000000000000000 000000001f3d17bb ffff88012e7f4980
Call Trace:
 [] sysfs_hash_and_remove+0x38/0x90
 [] sysfs_remove_link+0x21/0x30
 [] device_remove_sys_dev_entry+0x65/0x90
 [] device_del+0x1b0/0x1e0
 [] usb_disconnect+0x103/0x1f0
 [] hub_thread+0x6ac/0x1a60
 [] ? trace_hardirqs_on_caller+0x14d/0x190
 [] ? autoremove_wake_function+0x0/0x40
 [] ? hub_thread+0x0/0x1a60
 [] kthread+0x96/0xa0
 [] child_rip+0xa/0x20
 [] ? restore_args+0x0/0x30
 [] ? kthread+0x0/0xa0
 [] ? child_rip+0x0/0x20
Code: 89 f4 48 c7 47 08 00 00 00 00 48 c7 47 10 00 00 00 00 48 c7 47 18 00 00 00 00 48 c7 c7 40 34 8b 81 48 89 33 31 f6 e8 21 70 32 00 <49> 8b 74 24 78 48 8b 3d c5 6c c7 01 4c 89 e1 48 c7 c2 f0 94 21
RIP  [] sysfs_addrm_start+0x3f/0xd0
 RSP <ffff88012d8dfbf0>
crash>
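
One detail in the register dump is worth a closer look. R12 holds 6b6b6b6b6b6b6b6b, and 0x6b is the byte the kernel's slab debugging uses to fill freed objects (POISON_FREE in include/linux/poison.h). On a debug kernel that often points at code touching an object after it was freed. That is a reading of the dump rather than something confirmed in this post, and it assumes slab poisoning is enabled on this kernel. A small Python sketch for spotting such registers in a saved log follows; the function name is hypothetical.

```python
import re

# 0x6b repeated across 8 bytes: the slab "freed object" fill pattern.
POISON_FREE = "6b" * 8

# Matches general-purpose register dumps such as "R12: 6b6b6b6b6b6b6b6b".
REGISTER = re.compile(r"\b(R[A-Z0-9]{1,2}):\s+([0-9a-f]{16})\b")


def poisoned_registers(oops_text):
    """Return the names of registers whose value is entirely poison bytes."""
    return [name for name, value in REGISTER.findall(oops_text)
            if value == POISON_FREE]
```

If a register used as a pointer shows up here, the general protection fault is explained by the kernel dereferencing that non-canonical address.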

What we quickly notice is that there is a:

general protection fault: 0000 [#1] SMP
comm: khubd Tainted: G

The first suggests a possible memory issue.  However, let's open the other crash dumps we've been collecting and see what those show.  The result: every one of them shows the khubd kernel thread crashing the system:

[root@mbpc 127.0.0.1-2013-07-07-16:03:20]# ps -ef|grep -i khubd
root        31     2  0 07:22 ?        00:00:00 [khubd]
root      4302 25687  0 18:39 pts/0    00:00:00 grep -i khubd
[root@mbpc 127.0.0.1-2013-07-07-16:03:20]#

 

crash> log

…..

e1000 0000:04:06.0: PCI INT A disabled
e1000 0000:04:06.0: PME# enabled
pci 0000:00:14.4: wake-up capability enabled by ACPI
r8169 0000:03:00.0: PME# enabled
ACPI: Preparing to enter system sleep state S5
usb 3-3: USB disconnect, device number 2
general protection fault: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/block/dm-0/dm/name
CPU 0
Modules linked in: ebtable_nat ebtables bonding xt_CHECKSUM bridge fuse nfsd lockd nfs_acl auth_rpcgss autofs4 sunrpc powernow_k8 freq_table mperf 8021q garp stp llc ipt_REJECT ipt_LOG xt_multiport ip6t_REJECT ipv6 xfs exportfs dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan tun kvm_amd kvm uinput snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_event snd_seq_midi_emul snd_emu10k1 snd_rawmidi snd_ac97_codec ac97_bus snd_util_mem raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx e1000 microcode serio_raw sg edac_core edac_mce_amd k10temp shpchp i2c_piix4 r8169 mii snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc ext4 mbcache jbd2 sr_mod cdrom firewire_ohci firewire_core crc_itu_t sd_mod crc_t10dif pata_acpi ata_generic pata_jmicron ahci radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mod [last unloaded: cpufreq_ondemand]

Pid: 31, comm: khubd Tainted: G        W  ---------------    2.6.32-358.6.2.el6.x86_64.debug #1 Gigabyte Technology Co., Ltd. GA-890XA-UD3/GA-890XA-UD3
RIP: 0010:[]  [] sysfs_addrm_start+0x3f/0xd0
RSP: 0018:ffff88012d8dfbf0  EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88012d8dfc20 RCX: 0000000000000000
RDX: 2222222222222222 RSI: 0000000000000000 RDI: ffff88012d8dfb70
RBP: ffff88012d8dfc10 R08: 0000000000000000 R09: 0000000000000001
R10: 2222222222222222 R11: 2222222222222222 R12: 6b6b6b6b6b6b6b6b
R13: ffff88012d8dfc80 R14: ffff8801284af0f0 R15: 000000000000001f
FS:  00007f6a5fce8700(0000) GS:ffff88002c000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007f6a5f3f9eb0 CR3: 0000000125420000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process khubd (pid: 31, threadinfo ffff88012d8de000, task ffff88012d8e0800)
Stack:
 ffff88012d8dfc80 ffff8801284af090 ffff88012d8dfc20 6b6b6b6b6b6b6b6b
<d> ffff88012d8dfc60 ffffffff81217e88 6b6b6b6b6b6b6b6b 0000000000000000
<d> 0000000000000000 0000000000000000 0000000036563398 ffff88012e7f4980
Call Trace:
 [] sysfs_hash_and_remove+0x38/0x90
 [] sysfs_remove_link+0x21/0x30
 [] device_remove_sys_dev_entry+0x65/0x90
 [] device_del+0x1b0/0x1e0
 [] usb_disconnect+0x103/0x1f0
 [] hub_thread+0x6ac/0x1a60
 [] ? trace_hardirqs_on_caller+0x14d/0x190
 [] ? autoremove_wake_function+0x0/0x40
 [] ? hub_thread+0x0/0x1a60
 [] kthread+0x96/0xa0
 [] child_rip+0xa/0x20
 [] ? restore_args+0x0/0x30
 [] ? kthread+0x0/0xa0
 [] ? child_rip+0x0/0x20
Code: 89 f4 48 c7 47 08 00 00 00 00 48 c7 47 10 00 00 00 00 48 c7 47 18 00 00 00 00 48 c7 c7 40 34 8b 81 48 89 33 31 f6 e8 21 70 32 00 <49> 8b 74 24 78 48 8b 3d c5 6c c7 01 4c 89 e1 48 c7 c2 f0 94 21
RIP  [] sysfs_addrm_start+0x3f/0xd0
 RSP <ffff88012d8dfbf0>
crash>
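
Comparing dumps by eye gets tedious, so the call trace can be reduced to a list of function names and compared directly. A rough Python sketch, assuming each log output was saved as text with plain ASCII 0x offsets; frames the kernel marks with ? are treated as unreliable and dropped. The function name trace_signature is made up.

```python
import re

# A frame looks like " [<address>] name+0xOFF/0xLEN", optionally with "? " before the name.
FRAME = re.compile(
    r"^\s*\[[^\]]*\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+",
    re.M,
)


def trace_signature(log_text):
    """Return the function names of the reliable call trace frames, in order."""
    return [name for uncertain, name in FRAME.findall(log_text) if not uncertain]
```

Two dumps with the same signature most likely died on the same code path, which is what the two khubd traces above show.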

The vmcore-incomplete files give the following error:

crash: seek error: kernel virtual address: ffffffff81565ae0  type: "cpu_possible_mask"

so we can't do much with those.  The only dump that showed a different cause was an md0 RAID error that crashed the kernel at 2013-06-30-01:02:15:

crash> log
…….
md/raid:md0: read error corrected (8 sectors at 14640 on sdb)
md/raid:md0: read error corrected (8 sectors at 14648 on sdb)
__ratelimit: 3 callbacks suppressed
sd 1:0:0:0: [sdb] Unhandled sense code
sd 1:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 1:0:0:0: [sdb] Sense Key : Medium Error [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
        00 00 39 45
sd 1:0:0:0: [sdb] Add. Sense: Unrecovered read error - auto reallocate failed
sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 00 00 38 08 00 04 00 00
__ratelimit: 3 callbacks suppressed
md/raid:md0: Disk failure on sdb, disabling device.
md/raid:md0: Operation continuing on 5 devices.
md/raid:md0: read error NOT corrected!! (sector 14736 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14744 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14752 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14760 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14768 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14776 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14784 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14792 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14800 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14808 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14816 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14824 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14832 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14840 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14848 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14856 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14864 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14872 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14880 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14888 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14896 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14904 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14912 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14920 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14928 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14936 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14944 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14952 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14960 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14968 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14976 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14984 on sdb).
md/raid:md0: read error NOT corrected!! (sector 14992 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15000 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15008 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15016 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15024 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15032 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15040 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15048 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15056 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15064 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15072 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15080 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15088 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15096 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15104 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15112 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15120 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15128 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15136 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15144 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15152 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15160 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15168 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15176 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15184 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15192 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15200 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15208 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15216 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15224 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15232 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15240 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15248 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15256 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15264 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15272 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15280 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15288 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15296 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15304 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15312 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15320 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15328 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15336 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15344 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15352 on sdb).
md/raid:md0: read error NOT corrected!! (sector 15360 on sdb).
ata2: EH complete
------------[ cut here ]------------
kernel BUG at drivers/md/raid5.c:3013!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/module/ip_tables/initstate
CPU 0
Modules linked in: iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ebtable_nat ebtables xt_CHECKSUM bridge vhost_net macvtap macvlan tun kvm_amd kvm bonding fuse nfsd lockd nfs_acl auth_rpcgss autofs4 sunrpc cpufreq_ondemand powernow_k8 freq_table mperf 8021q garp stp llc ipt_REJECT ipt_LOG xt_multiport ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 xfs exportfs dm_mirror dm_region_hash dm_log uinput snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_event snd_seq_midi_emul snd_emu10k1 snd_rawmidi snd_ac97_codec ac97_bus snd_util_mem raid456 async_raid6_recov async_pq e1000 raid6_pq async_xor xor async_memcpy async_tx microcode serio_raw k10temp edac_core edac_mce_amd sg shpchp i2c_piix4 r8169 mii snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc ext4 mbcache jbd2 firewire_ohci firewire_core crc_itu_t sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic pata_jmicron ahci radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_mod [last unloaded: ip_tables]

Pid: 1059, comm: md0_raid6 Tainted: G        W  ---------------    2.6.32-358.6.2.el6.x86_64.debug #1 Gigabyte Technology Co., Ltd. GA-890XA-UD3/GA-890XA-UD3
RIP: 0010:[]  [] handle_stripe+0x262c/0x2a90 [raid456]
RSP: 0000:ffff88012c75dbe0  EFLAGS: 00010297
RAX: 0000000000000001 RBX: ffff880106aed778 RCX: 0000000000000001
RDX: 0000000000000005 RSI: 0000000000000002 RDI: 0000000000000002
RBP: ffff88012c75dd50 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000006 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f3af7fff700(0000) GS:ffff88002c000000(0000) knlGS:00000000f773f6c0
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000004271ac8 CR3: 000000010a0f9000 CR4: 00000000000007f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process md0_raid6 (pid: 1059, threadinfo ffff88012c75c000, task ffff880127cd0d00)
Stack:
 ffffffffa03bc82e ffffffff81057e62 ffff88012c75dc50 ffff88012c75dc70
<d> 0000000000000000 ffffffff81057e62 ffff880100000000 ffff88012736a9c8
<d> 0000000000000286 ffff880126afa8a8 ffff880126afa8c0 0000000000000046
Call Trace:
 [] ? handle_stripe+0x7e/0x2a90 [raid456]
 [] ? __wake_up+0x32/0x70
 [] ? __wake_up+0x32/0x70
 [] ? raid5d+0x46a/0x650 [raid456]
 [] raid5d+0x42c/0x650 [raid456]
 [] ? trace_hardirqs_on+0xd/0x10
 [] md_thread+0x116/0x150
 [] ? autoremove_wake_function+0x0/0x40
 [] ? md_thread+0x0/0x150
 [] kthread+0x96/0xa0
 [] child_rip+0xa/0x20
 [] ? _spin_unlock_irq+0x30/0x40
 [] ? restore_args+0x0/0x30
 [] ? kthread+0x0/0xa0
 [] ? child_rip+0x0/0x20
Code: ff 44 8b 9d c0 fe ff ff 0f 84 9f e0 ff ff 49 8b 54 24 58 48 29 55 c0 e9 91 e0 ff ff f0 80 4b 41 04 f0 80 4b 40 01 e9 b1 fb ff ff <0f> 0b 66 90 eb fc 48 8b 53 20 4c 89 f7 48 c1 e7 04 48 8b 92 28
RIP  [] handle_stripe+0x262c/0x2a90 [raid456]
 RSP <ffff88012c75dbe0>
crash>

Time to upgrade the kernel to 2.6.32-358.11.1.el6 and try booting again.  Hopefully we see fewer issues and I won't have to use the debug kernel anymore.
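
The long run of "read error NOT corrected" lines in the log above is easier to judge once condensed.  Below is a minimal sketch, not part of the original troubleshooting, that groups those lines by array and device and reports the count, the first and last sector, and the spacing between consecutive sectors.  The function name summarize and the sample input are made up for illustration; the regular expression is based only on the message format shown in this post.

```python
import re

# Matches the md read-error lines in the form shown in the log above.
ERR_RE = re.compile(
    r"md/raid:(?P<array>\w+): read error NOT corrected!! "
    r"\(sector (?P<sector>\d+) on (?P<dev>\w+)\)"
)


def summarize(lines):
    """Group uncorrected read errors by (array, device).

    Returns a dict mapping (array, device) to the error count, the first
    and last sector reported, and the distinct gaps between sorted sectors.
    """
    groups = {}
    for line in lines:
        m = ERR_RE.search(line)
        if m:
            key = (m.group("array"), m.group("dev"))
            groups.setdefault(key, []).append(int(m.group("sector")))

    summary = {}
    for key, sectors in groups.items():
        sectors.sort()
        strides = {b - a for a, b in zip(sectors, sectors[1:])}
        summary[key] = {
            "count": len(sectors),
            "first": sectors[0],
            "last": sectors[-1],
            "strides": sorted(strides),
        }
    return summary


if __name__ == "__main__":
    # Hypothetical sample; in practice feed in dmesg or the crash log output.
    sample = [
        "md/raid:md0: read error NOT corrected!! (sector 15024 on sdb).",
        "md/raid:md0: read error NOT corrected!! (sector 15032 on sdb).",
        "md/raid:md0: read error NOT corrected!! (sector 15040 on sdb).",
        "ata2: EH complete",
    ]
    for (array, dev), info in summarize(sample).items():
        print(array, dev, info)
```

In the excerpt above every error is on sdb and the sectors step by 8, which would be 4 KiB per step if the sectors are 512 bytes (an assumption, not something the log states).  One contiguous failing range on a single disk points at that disk or its cabling, so it is worth checking sdb itself in addition to upgrading the kernel.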

Cheers,
TK

P.S.  A great source.

  Copyright © 2003 - 2013 Tom Kacperski (microdevsys.com). All rights reserved.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 Unported License