<html><head><meta http-equiv="Content-Type" content="text/html; charset=ANSI_X3.4-1968"><title>Chapter&#160;7.&#160;ATA errors and exceptions</title><meta name="generator" content="DocBook XSL Stylesheets V1.78.1"><link rel="home" href="index.html" title="libATA Developer's Guide"><link rel="up" href="index.html" title="libATA Developer's Guide"><link rel="prev" href="API-ata-scsi-dev-rescan.html" title="ata_scsi_dev_rescan"><link rel="next" href="exrec.html" title="EH recovery actions"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="chapter"><div class="titlepage"><div><div><h1 class="title"><a name="ataExceptions"></a>Chapter&#160;7.&#160;ATA errors and exceptions</h1></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl class="toc"><dt><span class="sect1"><a href="ataExceptions.html#excat">Exception categories</a></span></dt><dd><dl><dt><span class="sect2"><a href="ataExceptions.html#excatHSMviolation">HSM violation</a></span></dt><dt><span class="sect2"><a href="ataExceptions.html#excatDevErr">ATA/ATAPI device error (non-NCQ / non-CHECK CONDITION)</a></span></dt><dt><span class="sect2"><a href="ataExceptions.html#excatATAPIcc">ATAPI device CHECK CONDITION</a></span></dt><dt><span class="sect2"><a href="ataExceptions.html#excatNCQerr">ATA device error (NCQ)</a></span></dt><dt><span class="sect2"><a href="ataExceptions.html#excatATAbusErr">ATA bus error</a></span></dt><dt><span class="sect2"><a href="ataExceptions.html#excatPCIbusErr">PCI bus error</a></span></dt><dt><span class="sect2"><a href="ataExceptions.html#excatLateCompletion">Late completion</a></span></dt><dt><span class="sect2"><a href="ataExceptions.html#excatUnknown">Unknown error (timeout)</a></span></dt><dt><span class="sect2"><a href="ataExceptions.html#excatHoplugPM">Hotplug and power management exceptions</a></span></dt></dl></dd><dt><span class="sect1"><a href="exrec.html">EH recovery actions</a></span></dt><dd><dl><dt><span class="sect2"><a href="exrec.html#exrecClr">Clearing error condition</a></span></dt><dt><span class="sect2"><a href="exrec.html#exrecRst">Reset</a></span></dt><dt><span class="sect2"><a href="exrec.html#exrecReconf">Reconfigure transport</a></span></dt></dl></dd></dl></div><p>
This chapter identifies the error and exception conditions that exist
for ATA/ATAPI devices and describes how they should be handled in
an implementation-neutral way.
</p><p>
The term 'error' describes conditions where either the device
reports an explicit error condition or a command has
timed out.
</p><p>
The term 'exception' describes either exceptional
conditions that are not errors (say, power or hotplug events), or
both errors and non-error exceptional conditions.  Where
an explicit distinction between error and exception is necessary, the
term 'non-error exception' is used.
</p><div class="sect1"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="excat"></a>Exception categories</h2></div></div></div><p>
Exceptions are described primarily with respect to the legacy
taskfile + bus master IDE interface.  If a controller provides
another, better mechanism for error reporting, mapping it onto the
categories described below shouldn't be difficult.
</p><p>
In the following sections, two recovery actions, reset and
reconfiguring transport, are mentioned.  These are described
further in <a class="xref" href="exrec.html" title="EH recovery actions">the section called &#8220;EH recovery actions&#8221;</a>.
</p><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="excatHSMviolation"></a>HSM violation</h3></div></div></div><p>
This error is indicated when the STATUS value doesn't match the HSM
requirement while issuing or executing any ATA/ATAPI command.
</p><div class="itemizedlist"><p class="title"><b>Examples</b></p><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
ATA_STATUS doesn't contain !BSY &amp;&amp; DRDY &amp;&amp; !DRQ while trying
to issue a command.
</p></li><li class="listitem"><p>
!BSY &amp;&amp; !DRQ during PIO data transfer.
</p></li><li class="listitem"><p>
DRQ on command completion.
</p></li><li class="listitem"><p>
!BSY &amp;&amp; ERR after CDB transfer starts but before the
last byte of CDB is transferred.  The ATA/ATAPI standard states
that "The device shall not terminate the PACKET command
with an error before the last byte of the command packet has
been written" in the error outputs description of the PACKET
command, and the state diagram doesn't include such
transitions.
</p></li></ul></div><p>
In these cases, the HSM is violated and not much information
regarding the error can be acquired from the STATUS or ERROR
register.  In other words, this error can be anything: a driver bug,
or a faulty device, controller and/or cable.
</p><p>
As the HSM is violated, a reset is necessary to restore a known state.
Reconfiguring the transport for a lower speed might help too,
as transmission errors sometimes cause this kind of error.
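</p><p>
The STATUS checks listed in the examples above can be sketched in C as
follows.  This is a minimal illustration, not driver code: the bit values
are the standard taskfile STATUS bits, while the helper names are
hypothetical and are not part of libATA.
</p><pre class="programlisting">
```c
/* STATUS register bits of the taskfile interface. */
enum {
    ATA_ERR  = 0x01,  /* ERR (CHK for ATAPI) */
    ATA_DRQ  = 0x08,
    ATA_DRDY = 0x40,
    ATA_BUSY = 0x80
};

/* Hypothetical helper: nonzero if BSY and DRQ are clear and DRDY is set,
 * which is the state required before a command may be issued. */
static int hsm_ready_to_issue(unsigned char status)
{
    return (status & (ATA_BUSY | ATA_DRDY | ATA_DRQ)) == ATA_DRDY;
}

/* Hypothetical helper: nonzero if both BSY and DRQ are clear in the
 * middle of a PIO data transfer, which the HSM does not allow. */
static int hsm_violation_in_pio_transfer(unsigned char status)
{
    return (status & (ATA_BUSY | ATA_DRQ)) == 0;
}

/* Hypothetical helper: nonzero if DRQ is still set on command completion. */
static int hsm_violation_on_completion(unsigned char status)
{
    return (status & ATA_DRQ) != 0;
}
```
</pre><p>
A caller might evaluate these helpers on each STATUS read and, when any of
them reports a violation, fall back to the reset described above.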
</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="excatDevErr"></a>ATA/ATAPI device error (non-NCQ / non-CHECK CONDITION)</h3></div></div></div><p>
These are errors detected and reported by ATA/ATAPI devices
that indicate device problems.  For this type of error, the STATUS
and ERROR register values are valid and describe the error
condition.  Note that some ATA bus errors are detected by
ATA/ATAPI devices and reported using the same mechanism as
device errors.  Those cases are described later in this
section.
</p><p>
For ATA commands, this type of error is indicated by !BSY
&amp;&amp; ERR during command execution and on completion.
</p><p>For ATAPI commands,</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
!BSY &amp;&amp; ERR &amp;&amp; ABRT right after issuing PACKET
indicates that the PACKET command is not supported and falls in
this category.
</p></li><li class="listitem"><p>
!BSY &amp;&amp; ERR(==CHK) &amp;&amp; !ABRT after the last
byte of CDB is transferred indicates CHECK CONDITION and
doesn't fall in this category.
</p></li><li class="listitem"><p>
!BSY &amp;&amp; ERR(==CHK) &amp;&amp; ABRT after the last byte
of CDB is transferred *probably* indicates CHECK CONDITION and
doesn't fall in this category.
</p></li></ul></div><p>
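The ATAPI cases above, together with the HSM violation case for an error
reported in the middle of the CDB transfer, could be expressed as one
classification function.  The sketch below is only an illustration: the
phase enumeration and all function and type names are hypothetical, and
combinations that the text does not cover are reported as unclassified
rather than guessed.
</p><pre class="programlisting">
```c
enum { ATA_ERR = 0x01, ATA_BUSY = 0x80 };  /* STATUS bits */
enum { ATA_ABORTED = 0x04 };                /* ABRT bit in ERROR */

/* Hypothetical description of how far a PACKET command has progressed. */
enum packet_phase { PACKET_ISSUED, CDB_IN_PROGRESS, CDB_DONE };

enum atapi_class {
    ATAPI_NO_ERROR,
    ATAPI_DEVICE_ERROR,     /* PACKET not supported */
    ATAPI_HSM_VIOLATION,    /* error before the last CDB byte */
    ATAPI_CHECK_CONDITION,  /* sense data must be fetched */
    ATAPI_UNCLASSIFIED      /* not covered by the text */
};

static enum atapi_class classify_atapi(enum packet_phase phase,
                                       unsigned char status,
                                       unsigned char error)
{
    /* Nothing to classify while busy or when ERR/CHK is clear. */
    if ((status & ATA_BUSY) || !(status & ATA_ERR))
        return ATAPI_NO_ERROR;

    switch (phase) {
    case PACKET_ISSUED:
        /* ERR with ABRT right after issuing PACKET. */
        return (error & ATA_ABORTED) ? ATAPI_DEVICE_ERROR
                                     : ATAPI_UNCLASSIFIED;
    case CDB_IN_PROGRESS:
        return ATAPI_HSM_VIOLATION;
    case CDB_DONE:
        /* With or without ABRT this is treated as CHECK CONDITION;
         * the ABRT case is only "probably" one. */
        return ATAPI_CHECK_CONDITION;
    }
    return ATAPI_UNCLASSIFIED;
}
```
</pre><p>
</p><p>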
Of the errors detected as above, the following are not ATA/ATAPI
device errors but ATA bus errors and should be handled
according to <a class="xref" href="ataExceptions.html#excatATAbusErr" title="ATA bus error">the section called &#8220;ATA bus error&#8221;</a>.
</p><div class="variablelist"><dl class="variablelist"><dt><span class="term">CRC error during data transfer</span></dt><dd><p>
This is indicated by the ICRC bit in the ERROR register and
means that corruption occurred during data transfer.  Up to
ATA/ATAPI-7, the standard specifies that this bit is only
applicable to UDMA transfers, but ATA/ATAPI-8 draft revision
1f says that the bit may be applicable to multiword DMA and
PIO.
</p></dd><dt><span class="term">ABRT error during data transfer or on completion</span></dt><dd><p>
Up to ATA/ATAPI-7, the standard specifies that ABRT could be
set on ICRC errors and in cases where a device is not able
to complete a command.  Combined with the fact that MWDMA
and PIO transfer errors aren't allowed to use the ICRC bit up to
ATA/ATAPI-7, this seems to imply that the ABRT bit alone could
indicate transfer errors.
</p><p>
However, ATA/ATAPI-8 draft revision 1f removes the part
saying that ICRC errors can turn on ABRT.  So this is something of a
gray area.  Some heuristics are needed here.
</p></dd></dl></div><p>
ATA/ATAPI device errors can be further categorized as follows.
</p><div class="variablelist"><dl class="variablelist"><dt><span class="term">Media errors</span></dt><dd><p>
This is indicated by the UNC bit in the ERROR register.  ATA
devices report a UNC error only after a certain number of
retries have failed to recover the data, so there's not much
else to do other than notify the upper layer.
</p><p>
READ and WRITE commands report the CHS or LBA of the first
failed sector, but the ATA/ATAPI standard specifies that the
amount of transferred data on error completion is
indeterminate, so we cannot assume that sectors preceding
the failed sector have been transferred and thus cannot
complete those sectors successfully as SCSI does.
</p></dd><dt><span class="term">Media changed / media change requested error</span></dt><dd><p>
&lt;&lt;TODO: fill here&gt;&gt;
</p></dd><dt><span class="term">Address error</span></dt><dd><p>
This is indicated by the IDNF bit in the ERROR register.
Report to the upper layer.
</p></dd><dt><span class="term">Other errors</span></dt><dd><p>
This can be an invalid command or parameter indicated by the ABRT
ERROR bit, or some other error condition.  Note that the ABRT
bit can indicate a lot of things, including ICRC and Address
errors.  Heuristics needed.
</p></dd></dl></div><p>
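As a rough sketch, the categories above map onto the ERROR register bits
as shown below.  The bit values are the standard ERROR register bits; the
function name, the result enumeration and the order in which the bits are
tested are assumptions made for this example.  ABRT is deliberately left in
the catch-all category because, as noted, deciding what it means needs
heuristics.
</p><pre class="programlisting">
```c
/* ERROR register bits. */
enum {
    ATA_ABORTED = 0x04,  /* ABRT */
    ATA_IDNF    = 0x10,  /* address not found */
    ATA_UNC     = 0x40,  /* uncorrectable media error */
    ATA_ICRC    = 0x80   /* interface CRC error */
};

/* Hypothetical result type for this example. */
enum err_class { ERR_ATA_BUS, ERR_MEDIA, ERR_ADDRESS, ERR_OTHER };

/* Classify a non-NCQ device error from the ERROR register value.
 * ICRC is tested first because it is handled as an ATA bus error. */
static enum err_class classify_error_register(unsigned char error)
{
    if (error & ATA_ICRC)
        return ERR_ATA_BUS;
    if (error & ATA_UNC)
        return ERR_MEDIA;
    if (error & ATA_IDNF)
        return ERR_ADDRESS;
    return ERR_OTHER;  /* ABRT or anything else; heuristics needed */
}
```
</pre><p>
</p><p>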
Depending on the command, not all STATUS/ERROR bits are
applicable.  These non-applicable bits are marked with
"na" in the output descriptions, but up to ATA/ATAPI-7
no definition of "na" can be found.  However,
ATA/ATAPI-8 draft revision 1f describes "N/A" as
follows.
</p><div class="blockquote"><blockquote class="blockquote"><div class="variablelist"><dl class="variablelist"><dt><span class="term">3.2.3.3a N/A</span></dt><dd><p>
A keyword the indicates a field has no defined value in
this standard and should not be checked by the host or
device. N/A fields should be cleared to zero.
</p></dd></dl></div></blockquote></div><p>
So, it seems reasonable to assume that "na" bits are
cleared to zero by devices and thus need no explicit masking.
</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="excatATAPIcc"></a>ATAPI device CHECK CONDITION</h3></div></div></div><p>
An ATAPI device CHECK CONDITION error is indicated by the CHK bit
(ERR bit) being set in the STATUS register after the last byte of CDB is
transferred for a PACKET command.  For this kind of error,
sense data should be acquired to gather information about
the error.  The REQUEST SENSE packet command should be used to
acquire sense data.
</p><p>
Once sense data is acquired, this type of error can be
handled similarly to other SCSI errors.  Note that sense data
may indicate an ATA bus error (e.g. Sense Key 04h HARDWARE ERROR
&amp;&amp; ASC/ASCQ 47h/00h SCSI PARITY ERROR).  In such
cases, the error should be considered an ATA bus error and
handled according to <a class="xref" href="ataExceptions.html#excatATAbusErr" title="ATA bus error">the section called &#8220;ATA bus error&#8221;</a>.
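</p><p>
The sense data check mentioned above might look like the following
sketch.  It assumes fixed-format sense data (response code 70h or 71h,
with the sense key in byte 2 and ASC/ASCQ in bytes 12 and 13) and
recognizes only the single example given in the text; the function names
are hypothetical.
</p><pre class="programlisting">
```c
/* Nonzero for Sense Key 04h (HARDWARE ERROR) with ASC/ASCQ 47h/00h
 * (SCSI PARITY ERROR), the example combination named in the text. */
static int sense_is_ata_bus_error(unsigned char sense_key,
                                  unsigned char asc,
                                  unsigned char ascq)
{
    return sense_key == 0x04 && asc == 0x47 && ascq == 0x00;
}

/* Same check on a raw fixed-format sense buffer.  Buffers that are too
 * short or not in fixed format are reported as "not a bus error". */
static int fixed_sense_is_ata_bus_error(const unsigned char *sense, int len)
{
    unsigned char response_code;

    if (len < 14)
        return 0;
    response_code = sense[0] & 0x7f;
    if (response_code != 0x70 && response_code != 0x71)
        return 0;
    return sense_is_ata_bus_error(sense[2] & 0x0f, sense[12], sense[13]);
}
```
</pre><p>
Other sense codes may also point at transmission problems; which ones to
include is a policy decision that this sketch does not try to make.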
</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="excatNCQerr"></a>ATA device error (NCQ)</h3></div></div></div><p>
An NCQ command error is indicated by a cleared BSY bit and a set ERR bit
during the NCQ command phase (one or more NCQ commands
outstanding).  Although the STATUS and ERROR registers will
contain valid values describing the error, READ LOG EXT is
required to clear the error condition, determine which command
has failed and acquire more information.
</p><p>
READ LOG EXT Log Page 10h reports which tag has failed and the
taskfile register values describing the error.  With this
information, the failed command can be handled as a normal ATA
command error as in <a class="xref" href="ataExceptions.html#excatDevErr" title="ATA/ATAPI device error (non-NCQ / non-CHECK CONDITION)">the section called &#8220;ATA/ATAPI device error (non-NCQ / non-CHECK CONDITION)&#8221;</a>, and all
other in-flight commands must be retried.  Note that this
retry should not be counted: it's likely that commands
retried this way would have completed normally if it were not
for the failed command.
</p><p>
Note that ATA bus errors can be reported as ATA device NCQ
errors.  These should be handled as described in <a class="xref" href="ataExceptions.html#excatATAbusErr" title="ATA bus error">the section called &#8220;ATA bus error&#8221;</a>.
</p><p>
If READ LOG EXT Log Page 10h fails or reports NQ, we're
thoroughly screwed.  This condition should be treated
according to <a class="xref" href="ataExceptions.html#excatHSMviolation" title="HSM violation">the section called &#8220;HSM violation&#8221;</a>.
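</p><p>
A minimal sketch of decoding Log Page 10h is shown below.  The byte
offsets used here (NQ in bit 7 and the tag in bits 4:0 of byte 0, STATUS
in byte 2, ERROR in byte 3, and a checksum that makes all 512 bytes sum to
zero) follow the NCQ Command Error log layout as best understood and
should be verified against the standard.  The structure and function names
are hypothetical, and rejecting the page on a bad checksum is a choice made
for this example.
</p><pre class="programlisting">
```c
/* Hypothetical container for the fields this example extracts. */
struct ncq_error_info {
    int tag;               /* tag of the failed command, 0..31 */
    unsigned char status;  /* STATUS value describing the error */
    unsigned char error;   /* ERROR value describing the error */
};

/* Decode a 512-byte Log Page 10h buffer.  Returns 0 on success, or -1
 * if the page cannot be used (bad checksum or NQ set), in which case
 * the condition is treated like an HSM violation. */
static int parse_log_page_10h(const unsigned char *buf,
                              struct ncq_error_info *out)
{
    unsigned char csum = 0;
    int i;

    for (i = 0; i < 512; i++)
        csum = (unsigned char)(csum + buf[i]);
    if (csum != 0)
        return -1;

    if (buf[0] & 0x80)  /* NQ: the error was not for a queued command */
        return -1;

    out->tag = buf[0] & 0x1f;
    out->status = buf[2];
    out->error = buf[3];
    return 0;
}
```
</pre><p>
On success, the command with the reported tag would be failed with the
decoded STATUS and ERROR values, and every other in-flight command would be
retried without counting the retry against it.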
</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="excatATAbusErr"></a>ATA bus error</h3></div></div></div><p>
An ATA bus error means that data corruption occurred during
transmission over the ATA bus (SATA or PATA).  This type of error
can be indicated by
</p><div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem"><p>
ICRC or ABRT error as described in <a class="xref" href="ataExceptions.html#excatDevErr" title="ATA/ATAPI device error (non-NCQ / non-CHECK CONDITION)">the section called &#8220;ATA/ATAPI device error (non-NCQ / non-CHECK CONDITION)&#8221;</a>.
</p></li><li class="listitem"><p>
Controller-specific error completion with error information
indicating transmission error.
</p></li><li class="listitem"><p>
On some controllers, command timeout.  In this case, there may
be a mechanism to determine that the timeout is due to
transmission error.
</p></li><li class="listitem"><p>
Unknown/random errors, timeouts and all sorts of weirdities.
</p></li></ul></div><p>
As described above, transmission errors can cause a wide variety
of symptoms ranging from a device ICRC error to random device
lockup, and, in many cases, there is no way to tell whether an
error condition is due to a transmission error;
therefore, it's necessary to employ some kind of heuristic
when dealing with errors and timeouts.  For example,
repeated ABRT errors for a command known to be supported
are likely to indicate an ATA bus error.
</p><p>
Once it's determined that ATA bus errors have possibly
occurred, lowering the ATA bus transmission speed is one of the
actions that may alleviate the problem.  See <a class="xref" href="exrec.html#exrecReconf" title="Reconfigure transport">the section called &#8220;Reconfigure transport&#8221;</a> for more information.
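</p><p>
One possible form of the ABRT heuristic mentioned above is a simple
counter of consecutive ABRT failures for commands that are known to be
supported.  Everything in this sketch is an assumption made for
illustration: the structure, the function name and, in particular, the
threshold of three, which is arbitrary.
</p><pre class="programlisting">
```c
/* Arbitrary illustration value, not taken from any standard or driver. */
enum { ABRT_SUSPECT_THRESHOLD = 3 };

/* Hypothetical per-device bookkeeping. */
struct abrt_tracker {
    unsigned int consecutive_abrt;
};

/* Record the outcome of a command that is known to be supported.
 * got_abrt is nonzero if the command failed with ABRT.  Returns nonzero
 * once enough consecutive ABRT failures have been seen to suspect an
 * ATA bus error. */
static int abrt_tracker_update(struct abrt_tracker *t, int got_abrt)
{
    if (!got_abrt) {
        t->consecutive_abrt = 0;  /* a clean completion resets the count */
        return 0;
    }
    t->consecutive_abrt++;
    return t->consecutive_abrt >= ABRT_SUSPECT_THRESHOLD;
}
```
</pre><p>
When the function starts returning nonzero, the caller could treat the
condition as a suspected ATA bus error and consider lowering the
transmission speed.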
</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="excatPCIbusErr"></a>PCI bus error</h3></div></div></div><p>
Data corruption or other failures during transmission over PCI
(or another system bus).  For standard BMDMA, this is indicated
by the Error bit in the BMDMA Status register.  This type of
error must be logged, as it indicates something is very wrong
with the system.  Resetting the host controller is recommended.
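</p><p>
For standard BMDMA, the check amounts to testing one bit of the BMDMA
Status register, as in the short sketch below.  The bit values are those
of the standard bus master IDE status register; the helper name is
hypothetical.
</p><pre class="programlisting">
```c
/* BMDMA Status register bits. */
enum {
    ATA_DMA_ACTIVE = 0x01,
    ATA_DMA_ERR    = 0x02,  /* Error bit */
    ATA_DMA_INTR   = 0x04
};

/* Hypothetical helper: nonzero if the BMDMA Status value reports an
 * error on the PCI (or other system) bus side of the transfer. */
static int bmdma_reports_bus_error(unsigned char bmdma_status)
{
    return (bmdma_status & ATA_DMA_ERR) != 0;
}
```
</pre><p>
A handler that sees this bit set would log the event and, as recommended
above, reset the host controller.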
</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="excatLateCompletion"></a>Late completion</h3></div></div></div><p>
This occurs when a timeout occurs and the timeout handler finds
that the timed-out command has completed, successfully or
with an error.  This is usually caused by lost interrupts.  This
type of error must be logged.  Resetting the host controller is
recommended.
</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="excatUnknown"></a>Unknown error (timeout)</h3></div></div></div><p>
This is when a timeout occurs and the command is still
processing or the host and device are in an unknown state.  When
this occurs, the HSM could be in any valid or invalid state.  To
bring the device to a known state and make it forget about the
timed-out command, a reset is necessary.  The timed-out
command may be retried.
</p><p>
Timeouts can also be caused by transmission errors.  Refer to
<a class="xref" href="ataExceptions.html#excatATAbusErr" title="ATA bus error">the section called &#8220;ATA bus error&#8221;</a> for more details.
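</p><p>
The difference between a late completion and an unknown timeout can be
sketched as a decision made in the timeout handler.  The example below
assumes, for simplicity, that the handler judges completion from the
STATUS register alone (BSY and DRQ both clear means the command has
finished); a real controller may offer better information.  All names are
hypothetical.
</p><pre class="programlisting">
```c
enum { ATA_ERR = 0x01, ATA_DRQ = 0x08, ATA_BUSY = 0x80 };  /* STATUS bits */

/* Hypothetical outcome of inspecting a timed-out command. */
enum timeout_class {
    TIMEOUT_LATE_COMPLETION_OK,   /* finished successfully, interrupt lost */
    TIMEOUT_LATE_COMPLETION_ERR,  /* finished with an error, interrupt lost */
    TIMEOUT_UNKNOWN               /* still processing or state unknown */
};

static enum timeout_class classify_timeout(unsigned char status)
{
    /* BSY or DRQ still set: the command has not completed. */
    if (status & (ATA_BUSY | ATA_DRQ))
        return TIMEOUT_UNKNOWN;
    return (status & ATA_ERR) ? TIMEOUT_LATE_COMPLETION_ERR
                              : TIMEOUT_LATE_COMPLETION_OK;
}
```
</pre><p>
Both late completion results would be logged, while the unknown result
leads to the reset described above, after which the command may be retried.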
</p></div><div class="sect2"><div class="titlepage"><div><div><h3 class="title"><a name="excatHoplugPM"></a>Hotplug and power management exceptions</h3></div></div></div><p>
&lt;&lt;TODO: fill here&gt;&gt;
</p></div></div></div></body></html>