

failed command: READ FPDMA QUEUED FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE

So the last Seagate SATA drive in my RAID 6 array died spectacularly, taking out my 4.8.4 kernel and locking up my storage to the point where the only way I could get to it was via the kernel boot parameter init=/bin/bash. Going by its Power_On_Hours, the disk lasted about 5.762 years of powered-on time:

[root@rfc1178-01 log]# smartctl -A /dev/sdd
smartctl 6.1 2013-03-16 r3800 [i686-linux-3.10.5-201.fc19.i686] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   082   082   006    Pre-fail  Always       -       49816764
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       358
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   082   060   030    Pre-fail  Always       -       199979728
  9 Power_On_Hours          0x0032   043   043   000    Old_age   Always       -       50479
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       173
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       665
188 Command_Timeout         0x0032   099   099   000    Old_age   Always       -       65540
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   069   059   045    Old_age   Always       -       31 (Min/Max 23/31)
194 Temperature_Celsius     0x0022   031   041   000    Old_age   Always       -       31 (0 20 0 0 0)
195 Hardware_ECC_Recovered  0x001a   039   018   000    Old_age   Always       -       49816764
197 Current_Pending_Sector  0x0012   099   098   000    Old_age   Always       -       42
198 Offline_Uncorrectable   0x0010   099   098   000    Old_age   Offline      -       42
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       266288022969
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       1037691197
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       1219786117

[root@rfc1178-01 log]#  hdparm -i /dev/sdd

/dev/sdd:

 Model=ST31000520AS, FwRev=CC32, SerialNo=9VX0WJKA
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=unknown, MaxMultSect=16, MultSect=off
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=1953525168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 AdvancedPM=yes: unknown setting WriteCache=enabled
 Drive conforms to: unknown:  ATA/ATAPI-4,5,6,7

 * signifies the current active mode

[root@rfc1178-01 log]#
[root@rfc1178-01 log]#
[root@rfc1178-01 log]# fdisk -l /dev/sdd

Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes, 1953525168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

[root@rfc1178-01 log]#
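
For anyone who wants to check the lifetime figure, here is a minimal Python sketch that pulls raw values out of `smartctl -A` style text and converts Power_On_Hours to years. The rows in SAMPLE are copied from the output above; the function name parse_smart_attributes and the 8760 hours-per-year constant (non-leap years) are my own assumptions, not anything smartctl provides.

```python
# Sketch only: parse a few `smartctl -A` rows and convert power-on hours to years.
HOURS_PER_YEAR = 24 * 365  # assumption: ignores leap days

# Rows copied from the smartctl output above.
SAMPLE = """\
  9 Power_On_Hours          0x0032   043   043   000    Old_age   Always       -       50479
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       665
197 Current_Pending_Sector  0x0012   099   098   000    Old_age   Always       -       42
198 Offline_Uncorrectable   0x0010   099   098   000    Old_age   Offline      -       42
"""


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for rows that start with a numeric ID."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows have 10+ columns; the raw value starts at column 10.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                attrs[fields[1]] = int(fields[9])
            except ValueError:
                continue  # skip rows whose raw value is not a plain integer
    return attrs


attrs = parse_smart_attributes(SAMPLE)
years = attrs["Power_On_Hours"] / HOURS_PER_YEAR
print(f"{years:.3f} years powered on")  # prints "5.762 years powered on"
print("pending sectors:", attrs["Current_Pending_Sector"])
```

The 42 pending and 42 offline-uncorrectable sectors, together with 665 reported uncorrectable errors, are the counters that line up with the media errors in the logs.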

It went out with these errors in /var/log/messages ( /root/spectacular-failure-messages ):

Mar 19 15:49:09 mbpc-pc kernel: ata4.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0
Mar 19 15:49:09 mbpc-pc kernel: ata4.00: irq_stat 0x40000008
Mar 19 15:49:09 mbpc-pc kernel: ata4.00: failed command: READ FPDMA QUEUED
Mar 19 15:49:09 mbpc-pc kernel: ata4.00: cmd 60/40:70:40:2b:b2/05:00:3b:00:00/40 tag 14 ncq dma 688128 in
Mar 19 15:49:09 mbpc-pc kernel:         res 41/40:40:ff:2b:b2/00:05:3b:00:00/00 Emask 0x409 (media error) <F>
Mar 19 15:49:09 mbpc-pc kernel: ata4.00: status: { DRDY ERR }
Mar 19 15:49:09 mbpc-pc kernel: ata4.00: error: { UNC }
Mar 19 15:49:09 mbpc-pc kernel: qla2xxx [0000:04:00.0]-680a:20: Loop down - seconds remaining 160.
Mar 19 15:49:09 mbpc-pc kernel: ata4.00: configured for UDMA/133
Mar 19 15:49:09 mbpc-pc kernel: sd 7:0:0:0: [sdd] tag#14 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Mar 19 15:49:09 mbpc-pc kernel: sd 7:0:0:0: [sdd] tag#14 Sense Key : Medium Error [current]
Mar 19 15:49:09 mbpc-pc kernel: sd 7:0:0:0: [sdd] tag#14 Add. Sense: Unrecovered read error - auto reallocate failed
Mar 19 15:49:09 mbpc-pc kernel: sd 7:0:0:0: [sdd] tag#14 CDB: Read(10) 28 00 3b b2 2b 40 00 05 40 00
Mar 19 15:49:09 mbpc-pc kernel: blk_update_request: I/O error, dev sdd, sector 1001532415
Mar 19 15:49:09 mbpc-pc kernel: ata4: EH complete
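
The CDB line in that block can be decoded by hand to see which range of the disk the failed read covered. Below is a minimal Python sketch, assuming the standard READ(10) layout (opcode 0x28, a 4-byte big-endian LBA in bytes 2 to 5, a 2-byte transfer length in bytes 7 to 8) and the 512-byte logical sector size that fdisk reported. The helper name decode_read10 is hypothetical.

```python
# Sketch only: decode the READ(10) CDB from the kernel log and check it
# against the failing sector and the DMA size reported on the cmd line.
SECTOR_SIZE = 512  # logical sector size, per the fdisk output


def decode_read10(cdb_hex):
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: starting LBA
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, blocks


lba, blocks = decode_read10("28 00 3b b2 2b 40 00 05 40 00")
failed_sector = 1001532415  # from the blk_update_request line

print(lba, blocks, blocks * SECTOR_SIZE)      # prints "1001532224 1344 688128"
print(lba <= failed_sector < lba + blocks)    # prints "True"
```

The 688128 bytes match the "ncq dma 688128 in" figure on the cmd line, and the failing sector 1001532415 sits 191 sectors into the request.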

 

About five minutes later, more of the same followed, along with hung-task warnings:

Mar 19 15:54:20 mbpc-pc kernel: blk_update_request: I/O error, dev sdd, sector 1001534264
Mar 19 15:54:20 mbpc-pc kernel: ata4: EH complete
Mar 19 15:54:24 mbpc-pc kernel: ata4.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0
Mar 19 15:54:24 mbpc-pc kernel: ata4.00: irq_stat 0x40000008
Mar 19 15:54:24 mbpc-pc kernel: ata4.00: failed command: READ FPDMA QUEUED
Mar 19 15:54:24 mbpc-pc kernel: ata4.00: cmd 60/08:28:48:34:b2/00:00:3b:00:00/40 tag 5 ncq dma 4096 in
Mar 19 15:54:24 mbpc-pc kernel:         res 41/40:08:48:34:b2/00:00:3b:00:00/00 Emask 0x409 (media error) <F>
Mar 19 15:54:24 mbpc-pc kernel: ata4.00: status: { DRDY ERR }
Mar 19 15:54:24 mbpc-pc kernel: ata4.00: error: { UNC }
Mar 19 15:54:24 mbpc-pc kernel: ata4.00: configured for UDMA/133
Mar 19 15:54:24 mbpc-pc kernel: sd 7:0:0:0: [sdd] tag#5 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Mar 19 15:54:24 mbpc-pc kernel: sd 7:0:0:0: [sdd] tag#5 Sense Key : Medium Error [current]
Mar 19 15:54:24 mbpc-pc kernel: sd 7:0:0:0: [sdd] tag#5 Add. Sense: Unrecovered read error - auto reallocate failed
Mar 19 15:54:24 mbpc-pc kernel: sd 7:0:0:0: [sdd] tag#5 CDB: Read(10) 28 00 3b b2 34 48 00 00 08 00
Mar 19 15:54:24 mbpc-pc kernel: blk_update_request: I/O error, dev sdd, sector 1001534536
Mar 19 15:54:24 mbpc-pc kernel: ata4: EH complete
Mar 19 15:54:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-e872:20: qlt_24xx_atio_pkt_all_vps: qla_target(0): type d ox_id 0000
Mar 19 15:54:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-e82e:20: IMMED_NOTIFY ATIO
Mar 19 15:54:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-f826:20: qla_target(0): Port ID: 0x00:00:01 ELS opcode: 0x03
Mar 19 15:54:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-e81c:20: Sending TERM ELS CTIO (ha=ffff88010ef90000)
Mar 19 15:54:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-f897:20: Linking sess ffff8800c3f84b40 [0] wwn 50:01:43:80:16:77:99:38 with PLOGI ACK to wwn 50:01:43:80:16:77:99:38 s_id 01:00:00, ref=1
Mar 19 15:54:24 mbpc-pc kernel: qla2xxx [0000:04:00.0]-e862:20: qla_target(0): Unexpected NOTIFY_ACK received
Mar 19 15:54:26 mbpc-pc kernel: INFO: task kworker/1:2:96 blocked for more than 120 seconds.
Mar 19 15:54:26 mbpc-pc kernel:      Not tainted 4.8.4 #2
Mar 19 15:54:26 mbpc-pc kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 15:54:26 mbpc-pc kernel: kworker/1:2     D ffff8801115db718     0    96      2 0x00000000
Mar 19 15:54:26 mbpc-pc kernel: Workqueue: qla_tgt_wq qlt_do_work [qla2xxx]
Mar 19 15:54:26 mbpc-pc kernel: ffff8801115db718 ffff8801115db688 ffff88011a83a300 ffff88011fc17a80
Mar 19 15:54:26 mbpc-pc kernel: ffff8801115d20c0 ffff880100000001 ffffffff8109075d 0000000000000000
Mar 19 15:54:26 mbpc-pc kernel: ffff88011ffdc5c0 ffff880100000000 0000000000000011 ffff880100000000
Mar 19 15:54:26 mbpc-pc kernel: Call Trace:
Mar 19 15:54:26 mbpc-pc kernel: [] ? ttwu_do_wakeup+0x1d/0xf0
Mar 19 15:54:26 mbpc-pc kernel: [] ? get_page_from_freelist+0x573/0x8a0
Mar 19 15:54:26 mbpc-pc kernel: [] ? ttwu_do_activate+0x7a/0x90
Mar 19 15:54:26 mbpc-pc kernel: [] schedule+0x40/0xb0
Mar 19 15:54:26 mbpc-pc kernel: [] rwsem_down_read_failed+0xae/0x100
Mar 19 15:54:26 mbpc-pc kernel: [] ? xfs_file_buffered_aio_read+0x5b/0xe0 [xfs]
Mar 19 15:54:26 mbpc-pc kernel: [] call_rwsem_down_read_failed+0x18/0x30
Mar 19 15:54:26 mbpc-pc kernel: [] down_read+0x24/0x40
Mar 19 15:54:26 mbpc-pc kernel: [] xfs_ilock+0xbe/0x120 [xfs]
Mar 19 15:54:26 mbpc-pc kernel: [] xfs_file_buffered_aio_read+0x5b/0xe0 [xfs]
Mar 19 15:54:26 mbpc-pc kernel: [] xfs_file_read_iter+0x77/0xd0 [xfs]
Mar 19 15:54:26 mbpc-pc kernel: [] vfs_iter_read+0x8b/0xd0
Mar 19 15:54:26 mbpc-pc kernel: [] fd_do_rw+0x11b/0x1e0 [target_core_file]
Mar 19 15:54:26 mbpc-pc kernel: [] fd_execute_rw+0x1aa/0x2ac [target_core_file]
Mar 19 15:54:26 mbpc-pc kernel: [] sbc_execute_rw+0x22/0x30 [target_core_mod]
Mar 19 15:54:26 mbpc-pc kernel: [] __target_execute_cmd+0x8a/0xa0 [target_core_mod]
Mar 19 15:54:26 mbpc-pc kernel: [] target_execute_cmd+0xb7/0xf0 [target_core_mod]
Mar 19 15:54:26 mbpc-pc kernel: [] transport_generic_new_cmd+0x103/0x250 [target_core_mod]
Mar 19 15:54:26 mbpc-pc kernel: [] transport_handle_cdb_direct+0x39/0x90 [target_core_mod]
Mar 19 15:54:26 mbpc-pc kernel: [] target_submit_cmd_map_sgls+0x153/0x240 [target_core_mod]
Mar 19 15:54:26 mbpc-pc kernel: [] ? dequeue_task_fair+0x6e/0x850
Mar 19 15:54:26 mbpc-pc kernel: [] target_submit_cmd+0x59/0x60 [target_core_mod]
Mar 19 15:54:26 mbpc-pc kernel: [] ? put_prev_entity+0x31/0x3a0
Mar 19 15:54:26 mbpc-pc kernel: [] tcm_qla2xxx_handle_cmd+0x88/0xc0 [tcm_qla2xxx]
Mar 19 15:54:26 mbpc-pc kernel: [] __qlt_do_work+0x143/0x290 [qla2xxx]
Mar 19 15:54:26 mbpc-pc kernel: [] qlt_do_work+0x62/0x80 [qla2xxx]
Mar 19 15:54:26 mbpc-pc kernel: [] process_one_work+0x189/0x4e0
Mar 19 15:54:26 mbpc-pc kernel: [] ? del_timer_sync+0x4c/0x60
Mar 19 15:54:26 mbpc-pc kernel: [] ? maybe_create_worker+0x8e/0x110
Mar 19 15:54:26 mbpc-pc kernel: [] ? schedule+0x40/0xb0
Mar 19 15:54:26 mbpc-pc kernel: [] worker_thread+0x16d/0x520
Mar 19 15:54:26 mbpc-pc kernel: [] ? default_wake_function+0x12/0x20
Mar 19 15:54:26 mbpc-pc kernel: [] ? __wake_up_common+0x56/0x90
Mar 19 15:54:26 mbpc-pc kernel: [] ? maybe_create_worker+0x110/0x110
Mar 19 15:54:26 mbpc-pc kernel: [] ? schedule+0x40/0xb0
Mar 19 15:54:26 mbpc-pc kernel: [] ? maybe_create_worker+0x110/0x110
Mar 19 15:54:26 mbpc-pc kernel: [] kthread+0xcc/0xf0
Mar 19 15:54:26 mbpc-pc kernel: [] ? schedule_tail+0x1e/0xc0
Mar 19 15:54:26 mbpc-pc kernel: [] ret_from_fork+0x1f/0x40
Mar 19 15:54:26 mbpc-pc kernel: [] ? kthread_freezable_should_stop+0x70/0x7
Mar 19 15:54:26 mbpc-pc kernel: INFO: task kworker/u16:4:262 blocked for more than 120 seconds.
Mar 19 15:54:26 mbpc-pc kernel:      Not tainted 4.8.4 #2
Mar 19 15:54:26 mbpc-pc kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 15:54:26 mbpc-pc kernel: kworker/u16:4   D ffff88011086fa18     0   262      2 0x00000000
Mar 19 15:54:26 mbpc-pc kernel: Workqueue: tmr-fileio target_tmr_work [target_core_mod]
Mar 19 15:54:26 mbpc-pc kernel: ffff88011086fa18 0000000000000400 ffff8800bfe1a500 ffff88011086f998
Mar 19 15:54:26 mbpc-pc kernel: ffff880110862000 ffffffff81f99ca0 ffffffff81f998ef ffff880100000000
Mar 19 15:54:26 mbpc-pc kernel: ffffffff812f27d9 ffff880100000000 ffffffff8109a2f8 0000000000000000
Mar 19 15:54:26 mbpc-pc kernel: Call Trace:
Mar 19 15:54:26 mbpc-pc kernel: [] ? number+0x2e9/0x310
Mar 19 15:54:26 mbpc-pc kernel: [] ? update_cfs_rq_load_avg+0x3d8/0x430
Mar 19 15:54:26 mbpc-pc kernel: [] schedule+0x40/0xb0
Mar 19 15:54:26 mbpc-pc kernel: [] ? start_flush_work+0x49/0x180
Mar 19 15:54:26 mbpc-pc kernel: [] schedule_timeout+0x9c/0xe0
Mar 19 15:54:26 mbpc-pc kernel: [] ? flush_work+0x1a/0x40
Mar 19 15:54:26 mbpc-pc kernel: [] ? console_unlock+0x35c/0x380
Mar 19 15:54:26 mbpc-pc kernel: [] wait_for_completion+0xc0/0xf0
Mar 19 15:54:26 mbpc-pc kernel: [] ? try_to_wake_up+0x260/0x260
Mar 19 15:54:26 mbpc-pc kernel: [] __transport_wait_for_tasks+0xb4/0x1b0 [target_core_mod]
Mar 19 15:54:26 mbpc-pc kernel: [] ? vprintk_default+0x1f/0x30
Mar 19 15:54:26 mbpc-pc kernel: [] ? printk+0x46/0x48
Mar 19 15:54:26 mbpc-pc kernel: [] transport_wait_for_tasks+0x44/0x60 [target_core_mod]
Mar 19 15:54:26 mbpc-pc kernel: [] core_tmr_abort_task+0xf2/0x160 [target_core_mod]
Mar 19 15:54:26 mbpc-pc kernel: [] target_tmr_work+0x154/0x160 [target_core_mod]
Mar 19 15:54:26 mbpc-pc kernel: [] process_one_work+0x189/0x4e0
Mar 19 15:54:26 mbpc-pc kernel: [] ? del_timer_sync+0x4c/0x60
Mar 19 15:54:26 mbpc-pc kernel: [] ? maybe_create_worker+0x8e/0x110
Mar 19 15:54:26 mbpc-pc kernel: [] ? schedule+0x40/0xb0
Mar 19 15:54:26 mbpc-pc kernel: [] worker_thread+0x16d/0x520
Mar 19 15:54:26 mbpc-pc kernel: [] ? default_wake_function+0x12/0x20
Mar 19 15:54:26 mbpc-pc kernel: [] ? __wake_up_common+0x56/0x90
Mar 19 15:54:26 mbpc-pc kernel: [] ? maybe_create_worker+0x110/0x110
Mar 19 15:54:26 mbpc-pc kernel: [] ? schedule+0x40/0xb0
Mar 19 15:54:26 mbpc-pc kernel: [] ? maybe_create_worker+0x110/0x110
Mar 19 15:54:26 mbpc-pc kernel: [] kthread+0xcc/0xf0
Mar 19 15:54:26 mbpc-pc kernel: [] ? schedule_tail+0x1e/0xc0
Mar 19 15:54:26 mbpc-pc kernel: [] ret_from_fork+0x1f/0x40
Mar 19 15:54:26 mbpc-pc kernel: [] ? kthread_freezable_should_stop+0x70/0x70
Mar 19 15:54:26 mbpc-pc kernel: INFO: task kworker/u16:8:294 blocked for more than 120 seconds.
Mar 19 15:54:26 mbpc-pc kernel:      Not tainted 4.8.4 #2
Mar 19 15:54:26 mbpc-pc kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 19 15:54:26 mbpc-pc kernel: kworker/u16:8   D ffff8801109d7a18     0   294      2 0x0000000
Mar 19 15:54:26 mbpc-pc kernel: Workqueue: tmr-fileio target_tmr_work [target_core_mod]
Mar 19 15:54:26 mbpc-pc kernel: ffff8801109d7a18 0000000000000400 ffff88011a84c380 ffff8801109d7998
Mar 19 15:54:26 mbpc-pc kernel: ffff88011090a240 ffffffff81f99ca0 ffffffff81f998ef ffff880100000000
Mar 19 15:54:26 mbpc-pc kernel: ffffffff812f27d9 0000000000000000 0000000000000000 0000000000000000
Mar 19 15:54:26 mbpc-pc kernel: Call Trace:
Mar 19 15:54:26 mbpc-pc kernel: [] ? number+0x2e9/0x310
Mar 19 15:54:26 mbpc-pc kernel: [] schedule+0x40/0xb0
Mar 19 15:54:26 mbpc-pc kernel: [] ? start_flush_work+0x49/0x180
Mar 19 15:54:26 mbpc-pc kernel: [] schedule_timeout+0x9c/0xe0
Mar 19 15:54:26 mbpc-pc kernel: [] ? flush_work+0x1a/0x40
Mar 19 15:54:26 mbpc-pc kernel: [] ? console_unlock+0x35c/0x380
Mar 19 15:54:26 mbpc-pc kernel: [] wait_for_completion+0xc0/0xf0
Mar 19 15:54:26 mbpc-pc kernel: [] ? try_to_wake_up+0x260/0x260
Mar 19 15:54:26 mbpc-pc kernel: [] __transport_wait_for_tasks+0xb4/0x1b0 [target_core_mod]
Mar 19 15:54:26 mbpc-pc kernel: [] ? vprintk_default+0x1f/0x30
Mar 19 15:54:26 mbpc-pc kernel: [] ? printk+0x46/0x48
Mar 19 15:54:26 mbpc-pc kernel: [] transport_wait_for_tasks+0x44/0x60 [target_core_mod]
Mar 19 15:54:26 mbpc-pc kernel: [] core_tmr_abort_task+0xf2/0x160 [target_core_mod]
Mar 19 15:54:26 mbpc-pc kernel: [] target_tmr_work+0x154/0x160 [target_core_mod]
Mar 19 15:54:26 mbpc-pc kernel: [] process_one_work+0x189/0x4e0
Mar 19 15:54:26 mbpc-pc kernel: [] ? del_timer_sync+0x4c/0x60
Mar 19 15:54:26 mbpc-pc kernel: [] ? maybe_create_worker+0x8e/0x110
Mar 19 15:54:26 mbpc-pc kernel: [] ? schedule+0x40/0xb0
Mar 19 15:54:26 mbpc-pc kernel: [] worker_thread+0x16d/0x520
Mar 19 15:54:26 mbpc-pc kernel: [] ? default_wake_function+0x12/0x20
Mar 19 15:54:26 mbpc-pc kernel: [] ? __wake_up_common+0x56/0x90
Mar 19 15:54:26 mbpc-pc kernel: [] ? maybe_create_worker+0x110/0x110
Mar 19 15:54:26 mbpc-pc kernel: [] ? schedule+0x40/0xb0
Mar 19 15:54:26 mbpc-pc kernel: [] ? maybe_create_worker+0x110/0x110
Mar 19 15:54:26 mbpc-pc kernel: [] kthread+0xcc/0xf0
Mar 19 15:54:26 mbpc-pc kernel: [] ? schedule_tail+0x1e/0xc0
Mar 19 15:54:26 mbpc-pc kernel: [] ret_from_fork+0x1f/0x40
Mar 19 15:54:26 mbpc-pc kernel: [] ? kthread_freezable_should_stop+0x70/0x70

…..

So with that goes the last disk of its kind in this array, with NO data loss to the array itself over the last 8 years.

Cheers,
TK

 

 

 

 



     
  Copyright © 2003 - 2013 Tom Kacperski (microdevsys.com). All rights reserved.

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 Unported License