Wednesday, August 25, 2021

Tech in Plain Site: Check Digits and Human Error

Bar code shown in a 3D plain in Vaporwave Aesthetic

Computers in working order and with correct software don’t make mistakes. People, however, make plenty of mistakes (including writing bad software or breaking computers). In quality circles, there’s a Japanese term, poka yoke, which roughly means ‘error avoidance’. The idea is to avoid errors by making them too obvious for them to occur. For example, consider a SIM card in your phone. The little diagonal corner means it only goes in one way. If you put it in the wrong way, it is obviously wrong.

To be successful at poka yoke, you have to be able to imagine what a user might do wrong and then come up with some way to make it obvious that it is wrong. There are examples of this all around us and we sometimes don’t even know it. For example, what do your credit card number, your car’s VIN code, and a UPC code on a can of beans have in common?

The answer is that they all are long strings of digits which are notoriously difficult for humans to enter correctly. People miss numbers or transpose them. So people who write applications that take numbers like this often want to check to make sure the person didn’t make a mistake.

Of course, a number is a number, right? If I tell you to enter a five digit zip code, I can figure out if you put in four digits or six, but it is hard to know if 77508 or 77580 is the one you meant. That’s why long and important numbers have one or more check digits.

A check digit is like a checksum or a CRC — you compute it from the other digits in the number and if your computation doesn’t match the check digit you got, then something was put in wrong.

A Simple Example

Just to keep things simple, suppose you have a a four digit PIN number 0000 to 9999 and we want to make a five digit code with a check digit. A simple way to do it would be to add all the digits together and throw away all but the last digit (that is, the remainder after dividing by 10 or modulo 10).

For example 0052 becomes 00527 and 9522 becomes 95228. Simple, right? Now you know that 10118 is not a valid number. Of course, 00527 is valid but so is 00257 or 52007. So maybe we can do better.

Real Life

In real life, algorithms try to take the position of the digits into account. There are a few ways to do this and, as you might expect, there’s a lot of math to decide what’s best. Many systems use a weighted algorithm where each digit has a different weight, usually a 1, 3, 7, or 9 with no two adjacent digits having the same weight. Since those numbers are coprime with 10, any single digit change causes a different check digit. Using that weighting also catches about 90% of single transpositions, other than those involving a 5 and a 2 (since 5 and 2 multiply to 10).

For example, the ubiquitous UPC code uses weights of 1 and 3 for alternate numbers with digit 1 being the rightmost digit (other than the check digit) followed by digits 2, 3, and so on moving to the left. The algorithm is:

  1. Ignoring the check digit, start at the right and add all digits in odd-numbered positions together
  2. Multiply the sum by 3 (the weight for odd digits)
  3. Ignoring the check digit, add the remaining digits to the running total
  4. Take the last digit of the sum (that is, the remainder after dividing by 10); if that digit is not zero subtract it from 10

For example, I have a can of spray air on my desk with a UPC of 681131309516. The first six digits are unique to the company. The next five digits are a unique ID and the final is the check digit. That means the odd-position digits are 1, 9, 3, 3, 1, and 6. The even-position digits are 5, 0, 1, 1, and 8. The first sum, then is 23 and multiplying by 3 gives 69. The even digits results in 15 for a grand total of 84 and a preliminary check digit of 4. Since this isn’t a zero, the real check digit is 10-4 or 6. Try changing any number or swapping any two numbers between the groups and see what result you get.

Even Better

ISBN-10 is even more robust. It uses a ten digit number where each digit has a different weight from 1 to 10 and takes the remainder after dividing by 11. This catches all common errors, but can result in a check digit of 10 which is represented by an X.

There are other even more robust algorithms such as Damm, Verhoeff, and Luhn. You can also add more check digits to get better performance, just as more bits of a CRC are usually more robust.

Meaning

These check digits aren’t made to act as a security device. Usually, the algorithm is well-known and easy to figure out. So it isn’t that bad guys can’t figure out how to make fake credit card numbers because of the check digits. They simply provide a little poka yoke so a program can immediately spot common errors in numbers like these. Something to remember next time you design an interface or anything else prone to human errors.

Of course, if you want to protect against computer errors, you are better off with a CRC. There are also other ways to catch errors if you aren’t worried about humans calculating the check digits.


No comments:

Post a Comment