Lines Matching refs:the

3 A CRC is a long-division remainder.  You append the CRC to the message,
4 and the whole thing (message+CRC) is a multiple of the given
5 CRC polynomial. To check the CRC, you can either check that the
6 CRC matches the recomputed value, *or* you can check that the
7 remainder computed on the message+CRC is 0. This latter approach
9 protocols put the end-of-frame flag after the CRC.
11 It's actually the same long division you learned in school, except that
12 - We're working in binary, so the digits are only 0 and 1, and
15 the difference between adding and subtracting.
17 Like all division, the remainder is always smaller than the divisor.
18 To produce a 32-bit CRC, the divisor is actually a 33-bit CRC polynomial.
19 Since it's 33 bits long, bit 32 is always going to be set, so usually the
20 CRC is written in hex with the most significant bit omitted. (If you're
21 familiar with the IEEE 754 floating-point format, it's the same idea.)
24 to decide on the endianness of the bits within each byte. To get
25 the best error-detecting properties, this should correspond to the
27 little-endian; the most significant bit (sometimes used for parity)
29 do it in the right order, matching the endianness.
32 At each step of the division, you take one more digit (bit) of the dividend
33 and append it to the current remainder. Then you figure out the
34 appropriate multiple of the divisor to subtract to bring the remainder
36 and to make the XOR cancel, it's just a copy of bit 32 of the remainder.
38 When computing a CRC, we don't care about the quotient, so we can
39 throw the quotient bit away, but subtract the appropriate multiple of
40 the polynomial from the remainder and we're back to where we started,
41 ready to process the next bit.
49 Notice how, to get at bit 32 of the shifted remainder, we look
50 at bit 31 of the remainder *before* shifting it.
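The division step described above can be sketched as follows. This is only an illustration, not the listing elided from this view: the function name poly_remainder, the byte-buffer interface, and the most-significant-bit-first bit order are assumptions, and the bit extraction stands in for next_input_bit().

```c
#include <stddef.h>
#include <stdint.h>

#define CRCPOLY_BE 0x04C11DB7u /* CRC-32 polynomial with bit 32 omitted */

/*
 * Remainder of the bit string in buf (most significant bit of each
 * byte first) after division by the 33-bit polynomial.
 */
static uint32_t poly_remainder(const uint8_t *buf, size_t len)
{
    uint32_t remainder = 0;

    for (size_t i = 0; i < len * 8; i++) {
        /* Stand-in for next_input_bit(). */
        uint32_t bit = (buf[i / 8] >> (7 - i % 8)) & 1u;
        /* Bit 32 after the shift is bit 31 before it. */
        uint32_t multiple = (remainder & 0x80000000u) ? CRCPOLY_BE : 0;

        remainder = ((remainder << 1) | bit) ^ multiple;
    }
    return remainder;
}
```

To use this as a CRC, the caller divides the message followed by 32 zero bits, then writes the remainder into those 32 bits; dividing the resulting message+CRC leaves a remainder of 0.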
52 But also notice how the next_input_bit() bits we're shifting into
53 the remainder don't actually affect any decision-making until
54 32 bits later. Thus, the first 32 cycles of this are pretty boring.
55 Also, to add the CRC to a message, we need a 32-bit-long hole for it at
56 the end, so we have to add 32 extra cycles shifting in zeros at the
59 These details lead to a standard trick: postpone merging in the
60 next_input_bit() until the moment it's needed. Then the first 32 cycles
61 can be precomputed, and merging in the final 32 zero bits to make room
62 for the CRC can be skipped entirely. This changes the code to:
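A minimal sketch of that rearrangement, under the same assumptions as before (hypothetical function name crc32_be_bitwise, byte buffer in place of next_input_bit(), most significant bit first). Each input bit is XORed into bit 31 just before the decision is made, so no trailing zero bits are needed.

```c
#include <stddef.h>
#include <stdint.h>

#define CRCPOLY_BE 0x04C11DB7u /* CRC-32 polynomial with bit 32 omitted */

/* Bit-at-a-time CRC with the input merged in at the last moment. */
static uint32_t crc32_be_bitwise(uint32_t remainder, const uint8_t *buf,
                                 size_t len)
{
    for (size_t i = 0; i < len * 8; i++) {
        uint32_t bit = (buf[i / 8] >> (7 - i % 8)) & 1u;

        /* Merge the new bit where the decision is taken. */
        remainder ^= bit << 31;
        uint32_t multiple = (remainder & 0x80000000u) ? CRCPOLY_BE : 0;

        remainder = (remainder << 1) ^ multiple;
    }
    return remainder;
}
```

Passing 0 as the starting remainder gives the plain polynomial remainder of the message shifted left by 32 bits; the starting value is a parameter so that a caller can chain calls or use a different preset.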
70 With this optimization, the little-endian code is particularly simple:
77 The most significant coefficient of the remainder polynomial is stored
78 in the least significant bit of the binary "remainder" variable.
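The little-endian form might look like the sketch below. The name crc32_le_bitwise and the buffer interface are assumptions; 0xEDB88320 is the bit-reversed form of 0x04C11DB7, and bits are taken from each byte least significant bit first.

```c
#include <stddef.h>
#include <stdint.h>

#define CRCPOLY_LE 0xEDB88320u /* 0x04C11DB7 with the bit order reversed */

/*
 * Little-endian bit-at-a-time CRC.  The highest-order coefficient of
 * the remainder lives in bit 0, so the decision bit is the low bit
 * and the shift goes to the right.
 */
static uint32_t crc32_le_bitwise(uint32_t remainder, const uint8_t *buf,
                                 size_t len)
{
    for (size_t i = 0; i < len * 8; i++) {
        uint32_t bit = (buf[i / 8] >> (i % 8)) & 1u;

        remainder ^= bit;
        uint32_t multiple = (remainder & 1u) ? CRCPOLY_LE : 0;

        remainder = (remainder >> 1) ^ multiple;
    }
    return remainder;
}
```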
82 As long as next_input_bit is returning the bits in a sensible order, we don't
83 *have* to wait until the last possible moment to merge in additional bits.
102 If the input is a multiple of 32 bits, you can even XOR in a 32-bit
103 word at a time and increase the inner loop count to 32.
105 You can also mix and match the two loop styles, for example doing the
107 for any fractional bytes at the end.
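As an illustration of merging input early, the sketch below XORs in a whole byte and then runs eight shift steps. It assumes the little-endian convention and a hypothetical function name, crc32_le_bytewise.

```c
#include <stddef.h>
#include <stdint.h>

#define CRCPOLY_LE 0xEDB88320u /* bit-reversed CRC-32 polynomial */

/* Merge one input byte at a time, then do 8 single-bit steps. */
static uint32_t crc32_le_bytewise(uint32_t crc, const uint8_t *buf,
                                  size_t len)
{
    while (len--) {
        crc ^= *buf++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ ((crc & 1u) ? CRCPOLY_LE : 0);
    }
    return crc;
}
```

The result is identical to feeding the same bits one at a time, because XORing a bit in early does not change anything until that bit reaches the decision position.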
109 To reduce the number of conditional branches, software commonly uses
110 the byte-at-a-time table method, popularized by Dilip V. Sarwate,
114 Here, rather than just shifting one bit of the remainder to decide
115 on the correct multiple to subtract, we can shift a byte at a time.
117 and the correct multiple of the polynomial to subtract is found using
118 a 256-entry lookup table indexed by the high 8 bits.
120 (The table entries are simply the CRC-32 of the given one-byte messages.)
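A minimal sketch of the byte-at-a-time table method, assuming the little-endian convention; the names crc_table, crc_table_init and crc32_le_table are hypothetical. In this representation the "high 8 bits" of the remainder polynomial are the low byte of the variable, which is why the index is taken from the low byte.

```c
#include <stddef.h>
#include <stdint.h>

#define CRCPOLY_LE 0xEDB88320u /* bit-reversed CRC-32 polynomial */

static uint32_t crc_table[256];

/* Entry i is the remainder produced by the one-byte message i. */
static void crc_table_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t r = i;

        for (int bit = 0; bit < 8; bit++)
            r = (r >> 1) ^ ((r & 1u) ? CRCPOLY_LE : 0);
        crc_table[i] = r;
    }
}

/* One table lookup replaces eight conditional shift steps. */
static uint32_t crc32_le_table(uint32_t crc, const uint8_t *buf, size_t len)
{
    while (len--)
        crc = (crc >> 8) ^ crc_table[(crc ^ *buf++) & 0xFF];
    return crc;
}
```

crc_table_init() must run once before the first call to crc32_le_table().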
127 more importantly, too much of the L1 cache.
130 See "High Octane CRC Generation with the Intel Slicing-by-8 Algorithm",
133 This does not change the number of table lookups, but does increase
134 the parallelism. With the classic Sarwate algorithm, each table lookup
135 must be completed before the index of the next can be computed.
137 A "slicing by 2" technique would shift the remainder 16 bits at a time,
139 lookup in a 65536-entry table, the two high bytes are looked up in
140 two different 256-entry tables. Each contains the remainder required
141 to cancel out the corresponding byte. The tables are different because the
143 x^32 to x^39, while the other goes from x^40 to x^47.
147 twice as fast as the basic Sarwate algorithm.
150 Each step, 32 bits of data is fetched, XORed with the CRC, and the result
151 broken into bytes and looked up in the tables. Because the 32-bit shift
152 leaves the low-order bits of the intermediate remainder zero, the
153 final CRC is simply the XOR of the 4 table look-ups.
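The 32-bits-at-a-time step can be sketched as below. This is an illustration under assumptions, not the code this text describes: it uses the little-endian convention, hypothetical names (slice_table, slice_init, crc32_le_slice4), and assembles each word from bytes so that it does not depend on host byte order or alignment.

```c
#include <stddef.h>
#include <stdint.h>

#define CRCPOLY_LE 0xEDB88320u /* bit-reversed CRC-32 polynomial */

static uint32_t slice_table[4][256];

/* slice_table[k][i]: remainder of byte i followed by k zero bytes. */
static void slice_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t r = i;

        for (int bit = 0; bit < 8; bit++)
            r = (r >> 1) ^ ((r & 1u) ? CRCPOLY_LE : 0);
        slice_table[0][i] = r;
    }
    for (int k = 1; k < 4; k++)
        for (uint32_t i = 0; i < 256; i++) {
            uint32_t prev = slice_table[k - 1][i];

            slice_table[k][i] = (prev >> 8) ^ slice_table[0][prev & 0xFF];
        }
}

static uint32_t crc32_le_slice4(uint32_t crc, const uint8_t *buf, size_t len)
{
    while (len >= 4) {
        /* Fetch 32 bits of data and XOR them with the CRC. */
        crc ^= (uint32_t)buf[0] | (uint32_t)buf[1] << 8 |
               (uint32_t)buf[2] << 16 | (uint32_t)buf[3] << 24;
        /* The new CRC is the XOR of four independent lookups. */
        crc = slice_table[3][crc & 0xFF] ^
              slice_table[2][(crc >> 8) & 0xFF] ^
              slice_table[1][(crc >> 16) & 0xFF] ^
              slice_table[0][crc >> 24];
        buf += 4;
        len -= 4;
    }
    while (len--) /* leftover bytes: plain byte-at-a-time step */
        crc = (crc >> 8) ^ slice_table[0][(crc ^ *buf++) & 0xFF];
    return crc;
}
```

The four lookups in one step do not depend on each other, which is where the extra parallelism comes from; slice_init() must run once before the first call.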
156 look-ups cannot begin until the previous group's 4 table look-ups have all
157 been completed. Thus, the processor's load/store unit is sometimes idle.
159 To make maximum use of the processor, "slicing by 8" performs 8 look-ups
160 in parallel. Each step, the 32-bit CRC is shifted 64 bits and XORed
162 those 8 bytes are simply copies of the input data; they do not depend
163 on the previous CRC at all. Thus, those 4 table look-ups may commence
164 immediately, without waiting for the previous loop iteration.
169 Two more details about CRC implementation in the real world:
174 a CRC to detect this condition, it's common to invert the CRC before
175 appending it. This makes the remainder of the message+CRC come out not
176 as zero, but some fixed non-zero value. (The CRC of the inversion
179 The same problem applies to zero bits prepended to the message, and a
180 similar solution is used. Instead of starting the CRC computation with
182 you start the same way on decoding, it doesn't make a difference.
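Both details can be seen in a small sketch. It assumes the common little-endian CRC-32 convention (start from all ones, invert before appending, append least significant byte first); the helper names crc32_le_raw, crc32_std and crc32_append are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define CRCPOLY_LE 0xEDB88320u /* bit-reversed CRC-32 polynomial */

/* Raw register update: no preset and no final inversion. */
static uint32_t crc32_le_raw(uint32_t crc, const uint8_t *buf, size_t len)
{
    while (len--) {
        crc ^= *buf++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ ((crc & 1u) ? CRCPOLY_LE : 0);
    }
    return crc;
}

/* Start from all ones, invert the result before it is appended. */
static uint32_t crc32_std(const uint8_t *buf, size_t len)
{
    return ~crc32_le_raw(0xFFFFFFFFu, buf, len);
}

/* Write the CRC after the message; frame needs len + 4 bytes. */
static void crc32_append(uint8_t *frame, size_t len)
{
    uint32_t crc = crc32_std(frame, len);

    for (int i = 0; i < 4; i++)
        frame[len + i] = (uint8_t)(crc >> (8 * i));
}
```

With this convention, running crc32_le_raw() from all ones over a whole frame (message plus appended CRC) leaves the same non-zero register value for every valid frame (0xDEBB20E3 under these assumptions), so a receiver can compare against that constant instead of 0.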