<html><head><meta http-equiv="Content-Type" content="text/html; charset=ANSI_X3.4-1968"><title>Chapter 7. ATA errors and exceptions</title><meta name="generator" content="DocBook XSL Stylesheets V1.78.1"><link rel="home" href="index.html" title="libATA Developer's Guide"><link rel="up" href="index.html" title="libATA Developer's Guide"><link rel="prev" href="API-ata-scsi-dev-rescan.html" title="ata_scsi_dev_rescan"><link rel="next" href="exrec.html" title="EH recovery actions"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="chapter"><div class="titlepage"><div><div><h1 class="title"><a name="ataExceptions"></a>Chapter 7. 
ATA errors and exceptions</h1></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl class="toc"><dt><span class="sect1"><a href="ataExceptions.html#excat">Exception categories</a></span></dt><dd><dl><dt><span class="sect2"><a href="ataExceptions.html#excatHSMviolation">HSM violation</a></span></dt><dt><span class="sect2"><a href="ataExceptions.html#excatDevErr">ATA/ATAPI device error (non-NCQ / non-CHECK CONDITION)</a></span></dt><dt><span class="sect2"><a href="ataExceptions.html#excatATAPIcc">ATAPI device CHECK CONDITION</a></span></dt><dt><span class="sect2"><a href="ataExceptions.html#excatNCQerr">ATA device error (NCQ)</a></span></dt><dt><span class="sect2"><a href="ataExceptions.html#excatATAbusErr">ATA bus error</a></span></dt><dt><span class="sect2"><a href="ataExceptions.html#excatPCIbusErr">PCI bus error</a></span></dt><dt><span class="sect2"><a href="ataExceptions.html#excatLateCompletion">Late completion</a></span></dt><dt><span class="sect2"><a href="ataExceptions.html#excatUnknown">Unknown error (timeout)</a></span></dt><dt><span class="sect2"><a href="ataExceptions.html#excatHoplugPM">Hotplug and power management exceptions</a></span></dt></dl></dd><dt><span class="sect1"><a href="exrec.html">EH recovery actions</a></span></dt><dd><dl><dt><span class="sect2"><a href="exrec.html#exrecClr">Clearing error condition</a></span></dt><dt><span class="sect2"><a href="exrec.html#exrecRst">Reset</a></span></dt><dt><span class="sect2"><a href="exrec.html#exrecReconf">Reconfigure transport</a></span></dt></dl></dd></dl></div><p> This chapter identifies the error and exception conditions that exist for ATA/ATAPI devices and describes how they should be handled in an implementation-neutral way. </p><p> The term 'error' describes conditions where either the device reports an explicit error condition or a command has timed out. 
 </p><p> The term 'exception' is used either to describe exceptional conditions which are not errors (say, power or hotplug events), or to describe both errors and non-error exceptional conditions. Where an explicit distinction between error and exception is necessary, the term 'non-error exception' is used. </p><div class="sect1"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="excat"></a>Exception categories</h2></div></div></div><p> Exceptions are described primarily with respect to the legacy taskfile + bus master IDE interface. If a controller provides another, better mechanism for error reporting, mapping it onto the categories described below shouldn't be difficult. </p><p> In the following sections, two recovery actions - reset and reconfiguring transport - are mentioned. 
These are described further in <a class="xref" href="exrec.html" title="EH recovery actions">the section called “EH recovery actions”</a>. </p><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="excatHSMviolation"></a>HSM violation</h3></div></div></div><p> This error is indicated when the STATUS value doesn't match the host state machine (HSM) requirement while issuing or executing any ATA/ATAPI command. </p><div class="itemizedlist"><p class="title"><b>Examples</b></p><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p> ATA_STATUS doesn't contain !BSY && DRDY && !DRQ while trying to issue a command. </p></li><li class="listitem"><p> !BSY && !DRQ during PIO data transfer. </p></li><li class="listitem"><p> DRQ on command completion. </p></li><li class="listitem"><p> !BSY && ERR after CDB transfer starts but before the last byte of CDB is transferred. The ATA/ATAPI standard states that "The device shall not terminate the PACKET command with an error before the last byte of the command packet has been written" in the error outputs description of the PACKET command, and the state diagram doesn't include such transitions. </p></li></ul></div><p> In these cases, the HSM is violated and not much information regarding the error can be acquired from the STATUS or ERROR register. In other words, this error can be anything - a driver bug, or a faulty device, controller and/or cable. </p><p> As the HSM is violated, a reset is necessary to restore a known state. Reconfiguring the transport for a lower speed might help too, as transmission errors sometimes cause this kind of error. </p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="excatDevErr"></a>ATA/ATAPI device error (non-NCQ / non-CHECK CONDITION)</h3></div></div></div><p> These are errors detected and reported by ATA/ATAPI devices indicating device problems. 
For this type of error, the STATUS and ERROR register values are valid and describe the error condition. Note that some ATA bus errors are detected by ATA/ATAPI devices and reported using the same mechanism as device errors. Those cases are described later in this section. </p><p> For ATA commands, this type of error is indicated by !BSY && ERR during command execution and on completion. </p><p>For ATAPI commands,</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p> !BSY && ERR && ABRT right after issuing PACKET indicates that the PACKET command is not supported and falls in this category. </p></li><li class="listitem"><p> !BSY && ERR(==CHK) && !ABRT after the last byte of CDB is transferred indicates CHECK CONDITION and doesn't fall in this category. </p></li><li class="listitem"><p> !BSY && ERR(==CHK) && ABRT after the last byte of CDB is transferred *probably* indicates CHECK CONDITION and doesn't fall in this category. </p></li></ul></div><p> Of the errors detected as above, the following are not ATA/ATAPI device errors but ATA bus errors and should be handled according to <a class="xref" href="ataExceptions.html#excatATAbusErr" title="ATA bus error">the section called “ATA bus error”</a>. </p><div class="variablelist"><dl class="variablelist"><dt><span class="term">CRC error during data transfer</span></dt><dd><p> This is indicated by the ICRC bit in the ERROR register and means that corruption occurred during data transfer. Up to ATA/ATAPI-7, the standard specifies that this bit is only applicable to UDMA transfers, but ATA/ATAPI-8 draft revision 1f says that the bit may be applicable to multiword DMA and PIO. 
 </p></dd><dt><span class="term">ABRT error during data transfer or on completion</span></dt><dd><p> Up to ATA/ATAPI-7, the standard specifies that ABRT could be set on ICRC errors and in cases where a device is not able to complete a command. Combined with the fact that MWDMA and PIO transfer errors aren't allowed to use the ICRC bit up to ATA/ATAPI-7, this seems to imply that the ABRT bit alone could indicate transfer errors. </p><p> However, ATA/ATAPI-8 draft revision 1f removes the part stating that ICRC errors can turn on ABRT. So this is something of a gray area, and some heuristics are needed here. </p></dd></dl></div><p> ATA/ATAPI device errors can be further categorized as follows. </p><div class="variablelist"><dl class="variablelist"><dt><span class="term">Media errors</span></dt><dd><p> This is indicated by the UNC bit in the ERROR register. ATA devices report a UNC error only after a certain number of retries fail to recover the data, so there's not much to do other than notify the upper layer. </p><p> READ and WRITE commands report the CHS or LBA of the first failed sector, but the ATA/ATAPI standard specifies that the amount of data transferred on error completion is indeterminate, so we cannot assume that the sectors preceding the failed sector have been transferred and thus cannot complete those sectors successfully as SCSI does. </p></dd><dt><span class="term">Media changed / media change requested error</span></dt><dd><p> <<TODO: fill here>> </p></dd><dt><span class="term">Address error</span></dt><dd><p> This is indicated by the IDNF bit in the ERROR register. Report to the upper layer. </p></dd><dt><span class="term">Other errors</span></dt><dd><p> This can be an invalid command or parameter, indicated by the ABRT ERROR bit, or some other error condition. Note that the ABRT bit can indicate a lot of things, including ICRC and Address errors. Heuristics are needed. 
 </p></dd></dl></div><p> Depending on the command, not all STATUS/ERROR bits are applicable. These non-applicable bits are marked with "na" in the output descriptions, but up to ATA/ATAPI-7 no definition of "na" can be found. However, ATA/ATAPI-8 draft revision 1f describes "N/A" as follows. </p><div class="blockquote"><blockquote class="blockquote"><div class="variablelist"><dl class="variablelist"><dt><span class="term">3.2.3.3a N/A</span></dt><dd><p> A keyword that indicates a field has no defined value in this standard and should not be checked by the host or device. N/A fields should be cleared to zero. </p></dd></dl></div></blockquote></div><p> So, it seems reasonable to assume that "na" bits are cleared to zero by devices and thus need no explicit masking. </p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="excatATAPIcc"></a>ATAPI device CHECK CONDITION</h3></div></div></div><p> An ATAPI device CHECK CONDITION error is indicated by a set CHK bit (ERR bit) in the STATUS register after the last byte of CDB is transferred for a PACKET command. For this kind of error, sense data should be acquired to gather information regarding the error. The REQUEST SENSE packet command should be used to acquire sense data. </p><p> Once sense data is acquired, this type of error can be handled similarly to other SCSI errors. Note that sense data may indicate an ATA bus error (e.g. Sense Key 04h HARDWARE ERROR && ASC/ASCQ 47h/00h SCSI PARITY ERROR). In such cases, the error should be considered an ATA bus error and handled according to <a class="xref" href="ataExceptions.html#excatATAbusErr" title="ATA bus error">the section called “ATA bus error”</a>. 
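</p><p>
As a rough illustration of the decision described above, the following C sketch classifies the outcome of a PACKET command once the last byte of CDB has been transferred. All names here (classify_packet, the ATAPI_* results, the ST_* masks) are hypothetical and are not part of libata; the sense values are assumed to come from a REQUEST SENSE issued by the caller.
</p><pre class="programlisting">
```c
/* Hypothetical names throughout: a sketch, not libata's actual API. */

/* Bits of the ATA STATUS register used by this sketch. */
enum {
    ST_ERR = 0x01, /* ERR, called CHK for PACKET commands */
    ST_BSY = 0x80  /* BSY: other STATUS bits are not valid while set */
};

enum atapi_result {
    ATAPI_OK,           /* no CHECK CONDITION reported */
    ATAPI_BUSY,         /* BSY still set, nothing to classify yet */
    ATAPI_SCSI_ERROR,   /* CHECK CONDITION, handle like other SCSI errors */
    ATAPI_ATA_BUS_ERROR /* CHECK CONDITION whose sense data points at the bus */
};

/*
 * Classify a PACKET command after the last CDB byte was transferred.
 * sense_key, asc and ascq are only consulted when CHK is set; the
 * caller is assumed to have fetched them with REQUEST SENSE.
 */
static enum atapi_result classify_packet(unsigned char status,
                                         unsigned char sense_key,
                                         unsigned char asc,
                                         unsigned char ascq)
{
    if (status & ST_BSY)
        return ATAPI_BUSY;
    if (!(status & ST_ERR))
        return ATAPI_OK;
    /* Sense Key 04h HARDWARE ERROR with ASC/ASCQ 47h/00h SCSI PARITY ERROR */
    if (sense_key == 0x04 && asc == 0x47 && ascq == 0x00)
        return ATAPI_ATA_BUS_ERROR;
    return ATAPI_SCSI_ERROR;
}
```
</pre><p>
Only the Sense Key 04h with ASC/ASCQ 47h/00h combination mentioned above is mapped to a bus error in this sketch; a real handler would likely need to recognize more sense codes.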
 </p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="excatNCQerr"></a>ATA device error (NCQ)</h3></div></div></div><p> An NCQ command error is indicated by a cleared BSY bit and a set ERR bit during the NCQ command phase (one or more NCQ commands outstanding). Although the STATUS and ERROR registers will contain valid values describing the error, READ LOG EXT is required to clear the error condition, determine which command has failed and acquire more information. </p><p> READ LOG EXT Log Page 10h reports which tag has failed and the taskfile register values describing the error. With this information, the failed command can be handled as a normal ATA command error as in <a class="xref" href="ataExceptions.html#excatDevErr" title="ATA/ATAPI device error (non-NCQ / non-CHECK CONDITION)">the section called “ATA/ATAPI device error (non-NCQ / non-CHECK CONDITION)”</a>, and all other in-flight commands must be retried. Note that this retry should not be counted - it's likely that commands retried this way would have completed normally if it were not for the failed command. </p><p> Note that ATA bus errors can be reported as ATA device NCQ errors. These should be handled as described in <a class="xref" href="ataExceptions.html#excatATAbusErr" title="ATA bus error">the section called “ATA bus error”</a>. </p><p> If READ LOG EXT Log Page 10h fails or reports NQ, we're thoroughly screwed. This condition should be treated according to <a class="xref" href="ataExceptions.html#excatHSMviolation" title="HSM violation">the section called “HSM violation”</a>. </p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="excatATAbusErr"></a>ATA bus error</h3></div></div></div><p> An ATA bus error means that data corruption occurred during transmission over the ATA bus (SATA or PATA). 
This type of error can be indicated by </p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p> ICRC or ABRT error as described in <a class="xref" href="ataExceptions.html#excatDevErr" title="ATA/ATAPI device error (non-NCQ / non-CHECK CONDITION)">the section called “ATA/ATAPI device error (non-NCQ / non-CHECK CONDITION)”</a>. </p></li><li class="listitem"><p> Controller-specific error completion with error information indicating a transmission error. </p></li><li class="listitem"><p> On some controllers, command timeout. In this case, there may be a mechanism to determine that the timeout is due to a transmission error. </p></li><li class="listitem"><p> Unknown/random errors, timeouts and all sorts of weirdness. </p></li></ul></div><p> As described above, transmission errors can cause a wide variety of symptoms, ranging from a device ICRC error to random device lockup, and in many cases there is no way to tell whether an error condition is due to a transmission error; therefore, it's necessary to employ some kind of heuristic when dealing with errors and timeouts. For example, encountering repeated ABRT errors for a command known to be supported is likely to indicate an ATA bus error. </p><p> Once it's determined that ATA bus errors have possibly occurred, lowering the ATA bus transmission speed is one of the actions which may alleviate the problem. See <a class="xref" href="exrec.html#exrecReconf" title="Reconfigure transport">the section called “Reconfigure transport”</a> for more information. </p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="excatPCIbusErr"></a>PCI bus error</h3></div></div></div><p> Data corruption or other failures during transmission over PCI (or another system bus). For standard BMDMA, this is indicated by the Error bit in the BMDMA Status register. 
This type of error must be logged, as it indicates something is very wrong with the system. Resetting the host controller is recommended. </p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="excatLateCompletion"></a>Late completion</h3></div></div></div><p> This occurs when a timeout occurs and the timeout handler finds out that the timed out command has completed successfully or with an error. This is usually caused by lost interrupts. This type of error must be logged. Resetting the host controller is recommended. </p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="excatUnknown"></a>Unknown error (timeout)</h3></div></div></div><p> This is when a timeout occurs and the command is still processing or the host and device are in an unknown state. When this occurs, the HSM could be in any valid or invalid state. To bring the device to a known state and make it forget about the timed out command, a reset is necessary. The timed out command may be retried. </p><p> Timeouts can also be caused by transmission errors. Refer to <a class="xref" href="ataExceptions.html#excatATAbusErr" title="ATA bus error">the section called “ATA bus error”</a> for more details. 
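</p><p>
The distinction between a late completion and an unknown error can be sketched in C as follows. This is a minimal sketch under stated assumptions: the names (classify_timeout, struct timeout_action, the TMO_* values) are hypothetical, and how a timeout handler learns that the command has completed is controller-specific, so it is passed in as a plain flag.
</p><pre class="programlisting">
```c
/* Hypothetical names throughout: a sketch, not libata's actual API. */

enum timeout_class {
    TMO_LATE_COMPLETION, /* command had finished; usually a lost interrupt */
    TMO_UNKNOWN          /* still processing, or state unknown */
};

struct timeout_action {
    enum timeout_class cls;
    int must_log;     /* late completions must be logged */
    int reset_device; /* needed to bring the device to a known state */
    int reset_host;   /* recommended after a late completion */
    int may_retry;    /* the timed out command may be retried */
};

/*
 * Decide what the timeout handler should do. cmd_completed is nonzero
 * when the handler found that the timed out command has completed,
 * either successfully or with an error.
 */
static struct timeout_action classify_timeout(int cmd_completed)
{
    struct timeout_action act;

    if (cmd_completed) {
        act.cls = TMO_LATE_COMPLETION;
        act.must_log = 1;
        act.reset_device = 0;
        act.reset_host = 1;
        /* Assumption of this sketch: the command already has a result. */
        act.may_retry = 0;
    } else {
        act.cls = TMO_UNKNOWN;
        act.must_log = 0;
        act.reset_device = 1;
        act.reset_host = 0;
        act.may_retry = 1;
    }
    return act;
}
```
</pre><p>
A handler built this way would still need the heuristics from the ATA bus error section to decide whether repeated timeouts point at a transmission problem.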
 </p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="excatHoplugPM"></a>Hotplug and power management exceptions</h3></div></div></div><p> <<TODO: fill here>> </p></div></div></div></body></html>