What is integer overflow and why it occurs?
An integer overflow, or integer wrapping, is a potential problem in a program based upon the fact that the value that can be held in a numeric data type is limited by the data type's size in bytes. ANSI C uses the following minimum sizes:
data type size (bytes)
char 1
short 2
int 2
long 4
In practice, many compilers use a 4-byte int. It also should be noted that the actual ranges for the data types depend on whether or not they are signed. for instance, a signed 2-byte short may be between -32767 and 32767, while an unsigned short may be between 0 and 65535. See your [include]/limits.h file for specific numbers for your compiler.
Why should you care?
If you try to put a value into a data type that is too small to hold it, the high-order bits are dropped, and only the low-order bits are stored. Another way of saying that is that modulo-arithmetic is performed on the value before storing it to make sure it fits within the data type. Taking our unsigned short example:
Limit: 65535 or 1111 1111 1111 1111
Too big: 65536 or 1 0000 0000 0000 0000
What's stored: 0 or 0000 0000 0000 0000
As the above makes evident, that result is because the high-order (or left-most) bit of the value that's too big is dropped. Or you could say that what's stored is the result of
Stored = value % (limit + 1) or 65536 % (65535 + 1) = 0
In signed data types, the result is a little different and results in some seemingly weird behavior:
Positive limit: 32767 or 0111 1111 1111 1111
Too big: 32768 or 1000 0000 0000 0000
What's stored: -32768
Why's that?
It's because of "2′s compliment," which is how negative numbers are represented in binary. To make a long story short, the first half of the range (0 thru 0111 1111 1111 1111) is used for positive numbers in order of least to greatest. The second half of the range is then used for negative numbers in order of least to greatest. So, the negative range for a signed 2-byte short is -32768 thru -1, in that order.
When it occurs?
Integer overflow happens because computers use fixed width to represent integers. So which are the operations that result in overflow? Bitwise and logical operations cannot overflow, while cast and arithmetic operations can. For example, ++ and += operators can overflow, whereas && or & operators (or even <> operators) cannot.
Regarding arithmetic operators, it is obvious that operations like addition, subtraction and multiplication can overflow. How about operations like (unary) negation, division and mod (remainder)? For unary negation, -MIN_INT is equal to MIN_INT (and not MAX_INT), so it overflows. Following the same logic, division overflows for the expression (MIN_INT / -1). How about a mod operation? It does not overflow. The only possible overflow case (MIN_INT % -1) is equal to 0 (verify this yourself—the formula for % operator is a % b = a – ((a / b) * b)).
What happens when it occur?
Suppose memory is being allocated based on an unsigned integer data type's value. If that value has wrapped around, it may be that far too little memory will be made available. Or if a comparison is being made between a signed integer value and some other number, assuming that the former should be less than the latter, if that value has over flown into the negative, the comparison would pass. But are things going to behave the way the programmer intended? Probably not.
Sorting and searching techniques are most important for the programmer and we will use it in most of the programs we write. Among them the most widely used are binary search and quick sort for efficiency. In these algorithms we use mean of two elements. If the elements are integers then the mean is prone to be integer overflow condition. This results in undesirable results or bugs. So, let's look at some techniques to detect an overflow before it occurs.
For the statement int k = (i + j);
• If i and j are of different signs, it cannot overflow.
• If i and j are of same signs (- or +), it can overflow.
• If i and j are positive integers, then their sign bit is zero. If k is negative, it means its sign bit is 1—it indicates the value of (i + j) is too large to represent in k, so it overflows.
• If i and j are negative integers, then their sign bit is one. If k is positive, it means its sign bit is 0—it indicates that the value of (i + j) is too small to represent in k, so it overflows.
How to check for the overflow?
To check for overflow, we have to provide checks for conditions 3 and 4. Here is the straightforward conversion of these two statements into code. The function isSafeToAdd(), returns true or false, after checking for overflow.
/* Is it safe to add i and j without overflow? Return value 1 indicates there is no overflow; else it is overflow and not safe to add i and j */
int isSafeToAdd(int i, int j)
{
if( (i < 0 && j =0) || (i > 0 && j > 0) && k INT_MAX) or if ((i + j) INT_MAX) || ((i + j) INT_MAX), we can check the condition (i > INT_MAX – j) by moving j to the RHS of the expression. So, the condition in isSafeToAdd can be rewritten as:
if( (i > INT_MAX – j) || (i < INT_MIN – j) )
return 0;
That works! But can we simplify it further? From condition 2, we know that for an overflow to occur, the signs of i and j should be the same. If you notice the conditions in 3 and 4, the sign bit of the result (k) is different from (i and j). Does this strike you as the check that the ^ operator can be used? How about this check:
int k = (i + j);
if( ((i ^ k) & (j ^ k)) < 0)
return 0;
Let us check it. Assume that i and j are positive values and when it overflows, the result k will be negative. Now the condition (i ^ k) will be a negative value—the sign bit of i is 0 and the sign bit of k is 1; so ^ of the sign bit will be 1 and hence the value of the expression (i ^ k) is negative. So is the case for (j ^ k) and when the & of two values is negative; hence, the condition check with < 0 becomes true when there is overflow. When i and j are negative and k is positive, the condition again is < 0 (following the same logic described above).
So, yes, this also works! Though the if condition is not very easy to understand, it is correct and is also an efficient solution!
No comments:
Post a Comment