
Sunday, September 18, 2011

C's volatile usage

Have you experienced any of the following in your C or C++ embedded code?

  • Code that works fine--until you enable compiler optimizations
  • Code that works fine--until interrupts are enabled
  • Flaky hardware drivers
  • RTOS tasks that work fine in isolation--until some other task is spawned

If you answered yes to any of the above, it's likely that you didn't use the C keyword volatile. You aren't alone. The use of volatile is poorly understood by many programmers. Unfortunately, most books about the C programming language dismiss volatile in a sentence or two.

C's volatile keyword is a qualifier that is applied to a variable when it is declared. It tells the compiler that the value of the variable may change at any time--without any action being taken by the code the compiler finds nearby. The implications of this are quite serious. However, before we examine them, let's take a look at the syntax.

volatile keyword syntax

To declare a variable volatile, include the keyword volatile before or after the data type in the variable definition. For instance, both of these declarations declare foo to be a volatile integer:

volatile int foo;
int volatile foo;

Now, it turns out that pointers to volatile variables are very common, especially with memory-mapped I/O registers. Both of these declarations declare pReg to be a pointer to a volatile unsigned 8-bit integer:

volatile uint8_t * pReg;
uint8_t volatile * pReg;

Volatile pointers to non-volatile data are very rare (I think I've used them once), but I'd better go ahead and give you the syntax:

int * volatile p; 

And just for completeness, if you really must have a volatile pointer to a volatile variable, you'd write:

int volatile * volatile p; 

Incidentally, for a great explanation of why you have a choice of where to place volatile and why you should place it after the data type (for example, int volatile * foo), read Dan Sak's column "Top-Level cv-Qualifiers in Function Parameters" (Embedded Systems Programming, February 2000, p. 63).

Finally, if you apply volatile to a struct or union, the entire contents of the struct/union are volatile. If you don't want this behavior, you can apply the volatile qualifier to the individual members of the struct/union.
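For example, here is a sketch of both forms; the register names, layout, and the address 0x40001000 are hypothetical:

```c
#include <stdint.h>

/* volatile applied to the struct: every member is treated as volatile. */
typedef struct
{
    uint8_t status;
    uint8_t data;
} UartRegs;

UartRegs volatile * const pUart = (UartRegs volatile *) 0x40001000u;

/* volatile applied to individual members: only status is re-read each time. */
typedef struct
{
    uint8_t volatile status;   /* updated by hardware */
    uint8_t          shadow;   /* plain software copy */
} MixedRegs;
```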

Proper use of volatile

A variable should be declared volatile whenever its value could change unexpectedly. In practice, only three types of variables can change unexpectedly:

1. Memory-mapped peripheral registers

2. Global variables modified by an interrupt service routine

3. Global variables accessed by multiple tasks within a multi-threaded application

We'll talk about each of these cases in the sections that follow.

Peripheral registers

Embedded systems contain real hardware, usually with sophisticated peripherals. These peripherals contain registers whose values may change asynchronously to the program flow. As a very simple example, consider an 8-bit status register that is memory mapped at address 0x1234. It is required that you poll the status register until it becomes non-zero. The naive and incorrect implementation is as follows:

uint8_t * pReg = (uint8_t *) 0x1234; 
// Wait for register to become non-zero
while (*pReg == 0)
{
}

// Do something else

This will almost certainly fail as soon as you turn compiler optimization on, since the compiler will generate assembly language that looks something like this:

  mov ptr, #0x1234
  mov a, @ptr
loop:
  bz loop

The rationale of the optimizer is quite simple: having already read the variable's value into the accumulator (on the second line of assembly), there is no need to reread it, since the value will always be the same. Thus, in the third line, we end up with an infinite loop. To force the compiler to do what we want, we modify the declaration to:

uint8_t volatile * pReg = (uint8_t volatile *) 0x1234; 

The assembly language now looks like this:

  mov ptr, #0x1234
loop:
  mov a, @ptr
  bz loop

The desired behavior is achieved.

Subtler problems tend to arise with registers that have special properties. For instance, a lot of peripherals contain registers that are cleared simply by reading them. Extra (or fewer) reads than you intended can cause quite unexpected results in these cases.
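A common defensive pattern with read-to-clear registers is to read the register exactly once into a local variable and then test the copy. In this sketch the register is emulated by an ordinary variable so the clearing behavior is visible; on real hardware, the read itself would clear the register and the address would come from the datasheet:

```c
#include <stdint.h>

/* Stand-in for a read-to-clear interrupt-flag register; on real
   hardware this would be a fixed memory-mapped address, and the
   hardware itself would clear it on read. */
uint8_t volatile irq_flags_reg;

uint8_t read_irq_flags(void)
{
    uint8_t v = irq_flags_reg;   /* the one and only read */
    irq_flags_reg = 0;           /* emulate read-to-clear */
    return v;
}

int handled0, handled1;

void service_irqs(void)
{
    /* Copy once, then test the copy. Testing the register directly in
       each 'if' would perform a fresh (clearing!) read every time. */
    uint8_t flags = read_irq_flags();

    if (flags & 0x01u) { handled0++; }
    if (flags & 0x02u) { handled1++; }
}
```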

Interrupt service routines

Interrupt service routines often set variables that are tested in mainline code. For example, a serial port interrupt may test each received character to see if it is an ETX character (presumably signifying the end of a message). If the character is an ETX, the ISR might set a global flag. An incorrect implementation of this might be:
int etx_rcvd = FALSE;

void main()
{
    ...
    while (!etx_rcvd)
    {
        // Wait
    }
    ...
}

interrupt void rx_isr(void)
{
    ...
    if (ETX == rx_char)
    {
        etx_rcvd = TRUE;
    }
    ...
}

With compiler optimization turned off, this code might work. However, any half-decent optimizer will "break" the code. The problem is that the compiler has no idea that etx_rcvd can be changed within an ISR. As far as the compiler is concerned, the expression !etx_rcvd is always true, and, therefore, you can never exit the while loop. Consequently, all the code after the while loop may simply be removed by the optimizer. If you are lucky, your compiler will warn you about this. If you are unlucky (or you haven't yet learned to take compiler warnings seriously), your code will fail miserably. Naturally, the blame will be placed on a "lousy optimizer."

The solution is to declare the variable etx_rcvd to be volatile. Then all of your problems (well, some of them anyway) will disappear.
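On a hosted system you can watch the same fix in action with a signal handler standing in for the ISR (the embedded `interrupt` keyword is compiler-specific). This sketch uses the standard volatile sig_atomic_t type for the flag:

```c
#include <signal.h>

/* volatile: the compiler must re-read the flag in the loop.
   sig_atomic_t: the access itself cannot be torn by the "interrupt". */
volatile sig_atomic_t etx_rcvd = 0;

void rx_handler(int sig)
{
    (void) sig;
    etx_rcvd = 1;            /* set from interrupt (signal) context */
}

int wait_for_etx(void)
{
    signal(SIGINT, rx_handler);
    raise(SIGINT);           /* stand-in for the receive interrupt */

    while (!etx_rcvd)
    {
        /* Wait */
    }
    return 1;
}
```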

Multi-threaded applications

Despite the presence of queues, pipes, and other scheduler-aware communications mechanisms in real-time operating systems, it is still fairly common for two tasks to exchange information via a shared memory location (that is, a global). Even as you add a preemptive scheduler to your code, your compiler has no idea what a context switch is or when one might occur. Thus, another task modifying a shared global is conceptually identical to the problem of interrupt service routines discussed previously. So all shared global variables should be declared volatile. For example, this is asking for trouble:

int cntr;

void task1(void)
{
    cntr = 0;

    while (cntr == 0)
    {
        sleep(1);
    }
    ...
}

void task2(void)
{
    ...
    cntr++;
    sleep(10);
    ...
}

This code will likely fail once the compiler's optimizer is enabled. Declaring cntr to be volatile is the proper way to solve the problem.

How offsetof() works

The offsetof() macro is an ANSI-required macro that should be found in stddef.h. Simply put, the offsetof() macro returns the number of bytes of offset before a particular element of a struct or union.

The declaration of the macro varies from vendor to vendor and depends upon the processor architecture. Browsing through the compilers on my computer, I found the example declarations shown in Listing 1. As you can see, the definition of the macro can get complicated.

// Keil 8051 compiler
#define offsetof(s,m) (size_t)&(((s *)0)->m)

// Microsoft x86 compiler (version 7)
#define offsetof(s,m) (size_t)(unsigned long)&(((s *)0)->m)

// Diab Coldfire compiler
#define offsetof(s,memb) ((size_t)((char *)&((s *)0)->memb - (char *)0))

Listing 1. A representative set of offsetof() macro definitions

Regardless of the implementation, the offsetof() macro takes two parameters. The first parameter is the structure name; the second, the name of the structure element. (I apologize for using a term as vague as "structure name." I'll refine this shortly.) A straightforward use of the macro is shown in Listing 2.

typedef struct
{
    int   i;
    float f;
    char  c;
} SFOO;

void main(void)
{
    printf("Offset of 'f' is %u", offsetof(SFOO, f));
}

Listing 2. A straightforward use of offsetof()

To better understand the magic of the offsetof() macro, consider the details of Keil's definition. The various operators within the macro are evaluated in an order such that the following steps are performed:

  • ((s *)0): takes the integer zero and casts it as a pointer to s.
  • ((s *)0)->m: dereferences that pointer to point to structure member m.
  • &(((s *)0)->m): computes the address of m.
  • (size_t)&(((s *)0)->m): casts the result to an appropriate data type.

By definition, the structure itself resides at address 0. It follows that the address of the field pointed to (Step 3 above) must be the offset, in bytes, from the start of the structure. At this point, we can make several observations:

  • We can be a bit more specific about the term "structure name." In a nutshell, if the structure name you use, call it s, results in a valid C expression when written as ((s *)0)->m, you can use s in the offsetof() macro. The examples shown in Listings 3 and 4 will help clarify that point.
  • The member expression, m, can be of arbitrary complexity; indeed, if you have nested structures, then the member field can be an expression that resolves to a parameter deeply nested within a structure
  • It's easy enough to see why this macro also works with unions
  • The macro won't work with bitfields, as you can't take the address of a bitfield member of a structure or union

Listings 3 and 4 contain simple variations on the usage of this macro. These should help you get comfortable with the offsetof() basics.

struct sfoo
{
    int   i;
    float f;
    char  c;
};

void main(void)
{
    printf("Offset of 'f' is %u", offsetof(struct sfoo, f));
}

Listing 3. A struct without a typedef

typedef struct
{
    long  l;
    short s;
} SBAR;

typedef struct
{
    int   i;
    float f;
    SBAR  b;
} SFOO;

void main(void)
{
    printf("Offset of 'l' is %u", offsetof(SFOO, b.l));
}

Listing 4. Nested structs

Now that you understand the semantics of the macro, it's time to take a look at a few use examples.

struct padding bytes

Most 16-bit and larger processors require that data structures in memory be aligned on a multibyte (for example, 16-bit or 32-bit) boundary. Sometimes the requirement is absolute, and sometimes it's merely recommended for optimal bus throughput. In the latter case, the flexibility is offered because the designers recognized that you may wish to trade off memory access time with other competing issues such as memory size and the ability to transfer (perhaps via a communications link or direct memory access) the memory contents directly to another processor that has a different alignment requirement.

For cases such as these, it's often necessary to resort to compiler directives to achieve the required level of packing. As the C structure declarations can be quite complex, working out how to achieve this can be daunting. Furthermore, after poring over the compiler manuals, I'm always left with a slight sense of unease about whether I've really achieved what I set out to do.

The most straightforward solution to this problem is to write a small piece of test code. For instance, consider the moderately complex declaration given in Listing 5.

typedef union
{
    int   i;
    float f;
    char  c;
    struct
    {
        float  g;
        double h;
    } b;
} UFOO;

void main(void)
{
    printf("Offset of 'h' is %u", offsetof(UFOO, b.h));
}

Listing 5. A union containing a struct

If you need to know where field b.h resides in the structure, then the simplest way to find out is to write some test code such as that shown in Listing 5.

This is all well and good, but what about portability? Writing code that relies on offsets into structures can be risky—particularly if the code gets ported to a new target at a later date. Adding a comment is of course a good idea—but what one really needs is a means of forcing the compiler to tell you if the critical members of a structure are in the wrong place. Fortunately, one can do this using the offsetof() macro and the technique in Listing 6.

typedef union
{
    int   i;
    float f;
    char  c;
    struct
    {
        float  g;
        double h;
    } b;
} UFOO;

static union
{
    char wrong_offset_i[offsetof(UFOO, i) == 0];
    char wrong_offset_f[offsetof(UFOO, f) == 0];
    ...
    char wrong_offset_h[offsetof(UFOO, b.h) == 2];  // Error
};

Listing 6. An anonymous union to check struct offsets

The technique works by attempting to declare a union of one-char arrays. If any test evaluates to false, its array will be of zero size, and a compiler error will result. One compiler I tested generated the specific error "Invalid dimension size [0]" on the line defining array wrong_offset_h[].

Thus the offsetof() macro can be used both to determine and to validate the packing of elements within C structs.
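If your compiler supports C11, the same checks can be written with static_assert, which reports a message of your choosing instead of a cryptic zero-size-array error. A sketch (only the offsets that a union actually guarantees are asserted here):

```c
#include <assert.h>   /* static_assert macro (C11) */
#include <stddef.h>   /* offsetof */

typedef union
{
    int   i;
    float f;
    char  c;
    struct { float g; double h; } b;
} UFOO;

/* All members of a union start at offset 0, so these must hold. */
static_assert(offsetof(UFOO, i) == 0, "i is not at offset 0");
static_assert(offsetof(UFOO, f) == 0, "f is not at offset 0");
/* A false condition, e.g. offsetof(UFOO, b.h) == 2 on most targets,
   stops the build and prints the message. */
```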

Nonvolatile memory layouts

Many embedded systems contain some form of nonvolatile memory, which holds configuration parameters and other device-specific information. One of the most common types of nonvolatile memory is serial EEPROM. Normally, such memories are byte addressable. The result is often a serial EEPROM driver that provides an API that includes a read function that looks like this:

ee_rd(uint16_t offset, uint16_t nBytes, uint8_t * dest)

In other words, read nBytes from offset offset in the EEPROM and store them at dest. The problem is knowing what offset in EEPROM to read from and how many bytes to read (in other words, the underlying size of the variable being read). The most common solution to this problem is to declare a data structure that represents the EEPROM and then declare a pointer to that struct and assign it to address 0x0000000. This technique is shown in Listing 7.

typedef struct
{
    int   i;
    float f;
    char  c;
} EEPROM;

EEPROM * const pEE = (EEPROM *) 0x0000000;

ee_rd(&(pEE->f), sizeof(pEE->f), dest);

Listing 7. Accessing data in serial EEPROM via a pointer

This technique has been in use for years. However, I dislike it precisely because it does create an actual pointer to a variable that supposedly resides at address 0. In my experience, this can create a number of problems including:

  • Someone maintaining the code can get confused into thinking that the EEPROM data structure really does exist
  • You can write perfectly legal code (for example, pEE->f = 3.2) and get no compiler warnings that what you're doing is disastrous
  • The code doesn't describe the underlying hardware well

A far better approach is to use the offsetof() macro. An example is shown in Listing 8.

typedef struct
{
    int   i;
    float f;
    char  c;
} EEPROM;

ee_rd(offsetof(EEPROM, f), 4, dest);

Listing 8. Use offsetof() to access data stored in serial EEPROM

However, there's still a bit of a problem. The size of the parameter has been entered manually (in this case "4"). It would be a lot better if we could have the compiler work out the size of the parameter as well. No problem, you say, just use the sizeof() operator. However, the sizeof() operator doesn't work the way we would like it to. That is, we cannot write sizeof(EEPROM.f) because EEPROM is a data type and not a variable.

Normally, one always has at least one instance of a data type so that this is not a problem. In our case, the data type EEPROM is nothing more than a template for describing how data are stored in the serial EEPROM. So, how can we use the sizeof() operator in this case? Well, we can simply use the same technique used to define the offsetof() macro. Consider the definition:

#define SIZEOF(s,m) ((size_t) sizeof(((s *)0)->m))

This looks a lot like the offsetof() definitions in Listing 1. We take the value 0 and cast it to "pointer to s." This gives us a variable to point to. We then point to the member we want and apply the regular sizeof() operator. The net result is that we can get the size of any member of a typedef without having to actually declare a variable of that data type.
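As a quick sanity check, here is the macro alongside offsetof(); note that no EEPROM variable is ever declared:

```c
#include <stdio.h>
#include <stddef.h>

/* sizeof() on a member of a *type* (no variable needed) -- the same
   null-pointer trick the offsetof() definitions use. */
#define SIZEOF(s,m) ((size_t) sizeof(((s *)0)->m))

typedef struct
{
    int   i;
    float f;
    char  c;
} EEPROM;

void show_f_layout(void)
{
    /* No EEPROM variable exists anywhere -- the type alone is enough. */
    printf("'f' occupies %u bytes at offset %u\n",
           (unsigned) SIZEOF(EEPROM, f),
           (unsigned) offsetof(EEPROM, f));
}
```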

Thus, we can now refine our read from the serial EEPROM as follows:

ee_rd(offsetof(EEPROM, f), SIZEOF(EEPROM, f), &dest);

At this point, we're using two macros in the function call, with both macros taking the same two parameters. This leads to an obvious refinement that cuts down on typing and errors:

#define EE_RD(M,D) \     ee_rd(offsetof(EEPROM,M), SIZEOF(EEPROM,M), D) 

Now our call to the EEPROM driver becomes much more intuitive:

EE_RD(f, &dest);

That is, read f from the EEPROM and store its contents at location dest. The location and size of the parameter is handled automatically by the compiler, resulting in a clean, robust interface.

Protect nonvolatile memory

Many embedded systems contain directly addressable nonvolatile memory, such as battery-backed SRAM. It's usually important to detect if the contents of this memory have been corrupted. I usually group the data into a structure, compute a CRC (cyclic redundancy check) over that structure, and append it to the data structure. Thus, I often end up with something like this:

struct nv
{
    short    param_1;
    float    param_2;
    char     param_3;
    uint16_t crc;
} nvram;

The intent of the crc field is that it contain a CRC of all the parameters in the data structure with the exception of itself. This seems reasonable enough. Thus, the question is, how does one compute the CRC? If we assume we have a function, crc16( ), that computes the CRC-16 over an array of bytes, then we might be tempted to use the following:

nvram.crc =      crc16((char *) &nvram, sizeof(nvram)-sizeof(nvram.crc)); 

This code is only guaranteed to work with compilers that pack all data on byte boundaries. For compilers that don't do this, the code will almost certainly fail. To see that this is the case, let's look at this example structure for a compiler that aligns everything on a 32-bit boundary. The effective structure could look like that in Listing 9.

struct nv
{
    short    param_1;  // offset = 0
    char     pad1[2];  // 2 byte pad
    float    param_2;  // offset = 4
    char     param_3;  // offset = 8
    char     pad2[3];  // 3 byte pad
    uint16_t crc;      // offset = 12
    char     pad3[2];  // 2 byte pad
} nvram;

Listing 9. An example struct for a compiler that aligns everything on a 32-bit boundary

The first two pads are expected. However, why is the compiler adding two bytes onto the end of the structure? It does this because it has to handle the case when you declare an array of such structures. Arrays are required to be contiguous in memory, too. So to meet this requirement and to maintain alignment, the compiler pads the structure out as shown.
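You can see this directly: the element stride of an array equals sizeof, trailing pad included. A small sketch:

```c
#include <stddef.h>
#include <stdint.h>

struct nv
{
    short    param_1;
    float    param_2;
    char     param_3;
    uint16_t crc;
};

/* Arrays must be contiguous, so sizeof(struct nv) includes whatever
   trailing padding is needed to keep pair[1] correctly aligned. */
struct nv pair[2];
```

The exact amount of padding is compiler-dependent; the point is that the distance from `&pair[0]` to `&pair[1]` is always `sizeof(struct nv)`, pad and all.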

On this basis, we can see that sizeof(nvram) is 16 bytes. Now our naive CRC computation above covers sizeof(nvram) - sizeof(nvram.crc) = 16 - 2 = 14 bytes. Thus the stored crc field now includes its previous value in its own computation! We certainly haven't achieved what we set out to do.

struct nv
{
    struct data
    {
        short param_1;  // offset = 0
        float param_2;  // offset = 4
        char  param_3;  // offset = 8
    } data;
    uint16_t crc;       // offset = 12
} nvram;

Listing 10. Nested data structures

The most common solution to this problem is to nest data structures as shown in Listing 10. Now we can compute the CRC using:

nvram.crc =      crc16((uint8_t *) &nvram.data, sizeof(nvram.data)); 

This works well and is indeed the technique I've used over the years. However, it introduces an extra level of nesting within the structure—purely to overcome an artifact of the compiler. Another alternative is to place the CRC at the top of the structure. This overcomes most of the problems but feels unnatural to many people. On the basis that structures should always reflect the underlying system, a technique that doesn't rely on artifice is preferable—and that technique is to use the offsetof() macro.

Using the offsetof() macro, one can simply use the following (assuming the original structure definition):

nvram.crc =      crc16((uint8_t *) &nvram, offsetof(struct nv, crc));

Friday, May 20, 2011

C bit wise operators

As an old school C programmer, I use all these operators somewhat frequently. I thought I'd write this tutorial to illustrate typical situations where I use them, and provide some sample code snippets.

% MODULO

The modulo operator gives you the remainder after integer division. For example, 16 % 3 is 1. Why? If you divide 16 by 3, you get 5, with 1 left over. The "left over" part is what the modulo operator gives you. If you use modulo with an ascending series of numbers, it produces a characteristic repeating "stripe" pattern that goes from 0 to N-1, like so:

0 % 3 == 0
1 % 3 == 1
2 % 3 == 2
3 % 3 == 0
4 % 3 == 1
5 % 3 == 2
6 % 3 == 0
7 % 3 == 1
etc.
For positive numbers, the results of modulo are predictable.

Unfortunately, for negative numbers, the results vary, depending on the language. Javascript has the following pattern, which is arithmetically correct:

-1 % 3 == -1
-2 % 3 == -2
-3 % 3 == 0
-4 % 3 == -1
-5 % 3 == -2
-6 % 3 == 0
-7 % 3 == -1
etc.

But other languages, like Perl, have the following pattern, which essentially continues the pattern you see in the positive numbers:

-1 % 3 == 2
-2 % 3 == 1
-3 % 3 == 0
-4 % 3 == 2
-5 % 3 == 1
-6 % 3 == 0
-7 % 3 == 2
etc.

You can make arguments for either case, since they both have their uses. Since I personally have a hard time remembering these kinds of language-specific details (and I program in a variety of different languages), I tend to avoid using Modulo with negative numbers in my work. For now, I'll concentrate on the positive numbers, and then show a trick which avoids the problem with the negative numbers.
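For the record, C itself settled this question: C99 requires integer division to truncate toward zero, so % takes the sign of the left operand, matching the JavaScript pattern above (C90 had left the rounding direction implementation-defined). A minimal check:

```c
/* C99 6.5.5 guarantees (a/b)*b + a%b == a with division truncating
   toward zero, so the remainder carries the sign of a. */
int mod3(int a)
{
    return a % 3;
}
```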

That basic stripe pattern of 0,1,2, 0,1,2, 0,1,2... is a very useful thing, and I tend to employ it when I have software that has right and left-arrow navigation. For example, there might be a user-interface element which contains a "right arrow" which takes you to the "next page". If I'm on the last page, I often want to "wrap around" to the first page. "Wrap around" is an ideal use for the modulo operator.

If the page numbers are numbered 0 thru N-1 (where N is the number of pages), and the current page number is represented by curPage, then pressing the right arrow can invoke a script which does something like this:

curPage = (curPage + 1) % N; 
If you were not familiar with the modulo operator, you might code this like so:
curPage = curPage + 1;
if (curPage == N)
  curPage = 0;

If N is 3, both code snippets produce the following transitions:

0 -> 1
1 -> 2
2 -> 0
... But the modulo version is only one line. This, to me, is a classic use of the modulo operator. Typically, I will internally represent the page number using a zero-based number, to make it easier to use the modulo operator in this way. But even if your page numbers start at 1, you can still use modulo, by changing the formula as follows:

curPage = (((curPage-1) + 1) % N) + 1; 

which simplifies to:

curPage = (curPage % N) + 1; 
If N is three, this produces the following transitions.
1 -> 2
2 -> 3
3 -> 1
Now let's consider how you can handle a left-arrow, which takes you to the previous page, and also loops around from the first page to the last page. If your page numbers are 0 - N-1, you might start with the following formula:
curPage = (curPage - 1) % N; 
But notice there is a problem if curPage is 0 (zero), depending on the behavior of the modulo operator. In the Perl implementation of modulo, this code works (which is probably why Perl implemented modulo that way), but in the JavaScript implementation, which is arithmetically correct, it does not, so you have to special-case it:
if (curPage == 0)
  curPage = N - 1;
else
  curPage = (curPage - 1) % N;
Here's a cool trick that avoids the special case:
curPage = (curPage + (N-1)) % N; 
Notice this still has the following transition table for N=3:
0 -> 2
1 -> 0
2 -> 1
If your page numbers start at 1, then convert it to a zero-based number first (by subtracting one), perform the operation, and then convert it back (by adding one):
curPage = ((curPage-1) + (N-1)) % N + 1; 
which has the following transition table for N=3:
1 -> 3
2 -> 1
3 -> 2
You can generalize both of these directions by using a delta variable, which keeps track of the direction of travel. If traveling to the left, delta = -1, if traveling to the right, delta = 1. In this case, you would use the following code for zero-based page numbers (from 0 to N-1):
function gotoNextPage(delta) {
  curPage = (curPage + N + delta) % N;
}
and the following code for one-based page numbers (from 1 to N):
function gotoNextPage(delta) {
  curPage = ((curPage - 1) + N + delta) % N + 1;
}
You can use modulo anytime where that "stripe" effect is useful - even for making stripes in graphics software! I've also used it for game software where a spaceship needs to wrap around the screen - reappearing on the left side after disappearing on the right. This looks something like this:
spaceship.x = (spaceship.x + screenWidth + horizontalSpeed) % screenWidth;
spaceship.y = (spaceship.y + screenHeight + verticalSpeed) % screenHeight;

& Bitwise AND
| Bitwise OR
~ Bitwise NOT

Here are some tables showing the effect of these bitwise logic operators. I've included the ^ XOR operator as well, which I'll discuss further down.
0 & 0 == 0
0 & 1 == 0
1 & 0 == 0
1 & 1 == 1

0 | 0 == 0
0 | 1 == 1
1 | 0 == 1
1 | 1 == 1

0 ^ 0 == 0
0 ^ 1 == 1
1 ^ 0 == 1
1 ^ 1 == 0

~0 == 1
~1 == 0
These operators are called "bitwise" because they are applied to every single bit in the operands. So for example, in binary
00101101 & 00011100 == 00001100 
which in decimal comes to
45 & 28 == 12 
I most commonly use these bitwise logic operators for maintaining a set of flags, which are stored in a single numeric variable. To make sense of these operators, you have to visualize the binary representation of the numbers involved.

Let's say I have a game which involves different characteristics of animals, which are tracked using flags.

kHasTeeth   = 1;
kHasHair    = 2;
kHasScales  = 4;
kHasLungs   = 8;
kHasGills   = 16;
kHasTusks   = 32;

Note that these flags correspond to single bits in binary numbers:

            // decimal    binary
kHasTeeth   = 1;    // 00000001
kHasHair    = 2;    // 00000010
kHasScales  = 4;    // 00000100
kHasLungs   = 8;    // 00001000
kHasGills   = 16;   // 00010000
kHasTusks   = 32;   // 00100000
We can use the Bitwise OR (|) operator to combine these flags into a single "characteristics" variable.
if (animal == "sheep")
    characteristics = kHasTeeth | kHasHair | kHasLungs;
else if (animal == "snake")
    characteristics = kHasScales | kHasLungs;
else if (animal == "goldfish")
    characteristics = kHasScales | kHasGills;
else if (animal == "shark")
    characteristics = kHasTeeth | kHasScales | kHasGills;
else if (animal == "elephant")
    characteristics = kHasTusks | kHasLungs | kHasHair;
We can use the Bitwise AND (&) operator to test whether a particular flag is turned on.
if (characteristics & kHasTeeth)
    message = "This animal has teeth.";
Here's a more complex example:
if ((characteristics & kHasScales) && (characteristics & kHasLungs))
    message = "This animal is probably a reptile.";
else if ((characteristics & kHasScales) && (characteristics & kHasGills))
    message = "This animal is probably a fish.";
The Bitwise Negation (~) operator gives you the opposite bit pattern of its operand, typically extended up to 32 bits.
X = 1;     // X = 00000000000000000000000000000001
X = ~X;    // X = 11111111111111111111111111111110
This operator is commonly combined with the Bitwise AND (&) operator to clear individual bits from flags. For example, the following code turns off the kHasHair bit in characteristics.
characteristics = (characteristics & ~kHasHair); 
This can be shortened to
characteristics &= ~kHasHair; 
Note that this code will turn the kHasHair bit off, if it is on, and will have no effect otherwise. If you want to TOGGLE the value of that bit, use the XOR (^) operator, without the negation.
characteristics ^= kHasHair; 
More on XOR below...

^ XOR

The bitwise XOR operator is one of my favorites, since it often lies at the center of some interesting arcane trickery.

The effect of XOR

A = A ^ B 
or
A ^= B 
is to toggle any bits in A which are set (or turned on) in B. If those bits are turned off in A, they are turned on. If they are turned on in A, they are turned off.

Interestingly, if you apply it twice, you get the same result back. For example, if you execute the following code

myVar = (myVar ^ X) ^ X 
or
myVar ^= X;
myVar ^= X;
where X is any integer value, then the value of myVar is unchanged.

As described above, the toggle nature of XOR makes it ideal for toggling flags.

myFlags = (myFlags ^ SomeFlag); 
or
myFlags ^= SomeFlag; 
A more obscure use of XOR I've used is to swap the values of two integer variables, without requiring a third temporary variable. Normally, if you wanted to swap two integer variables, you'd do this:
var temp = x;
x = y;
y = temp;
However, back in the days when I cared about such things as minimizing the numbers of registers, I would write code like this:
x ^= y;
y ^= x;
x ^= y;
..which had the same effect. Let's examine what is going on with some real values to get an intuitive understanding of why. Let's say X is equal to 6 and Y is equal to 5. The following tables shows what is happening in binary.
                 X     Y
original values  110   101
x ^= y           011   101   // x = x ^ y
y ^= x           011   110   // y = y ^ (x^y), i.e., y = original x
x ^= y           101   110   // x = (x^y) ^ x, i.e., x = original y

Voila! They've been swapped!

XOR is often used in simple (and easily crackable) encryption algorithms, since you can apply it to a series of character codes to first encrypt, and then decrypt them.

<< Shift Left
>> Shift Right

The shift left operator shifts all the bits in a number to the left. For example:
x = 6;          // X is 110 in binary
x = (x << 1);   // X is now 1100 in binary, or 12 in decimal.
Notice that shifting a number to the left by 1 is the same as multiplying it by two. Shifting to the left by N bits is the same as multiplying it by 2 to the N. Similarly, the shift right operator is the same as integer-dividing by 2 to the N.

Back when CPU speed was a major issue, you could get speed gains by substituting left and right shifts for multiplies and divides in this manner. However, in high level languages like Javascript and Actionscript, there is no real savings.
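The equivalence is easy to demonstrate; the one caveat worth a comment is that right-shifting a negative signed value is implementation-defined in C, so keep shift operands unsigned:

```c
/* x << n multiplies by 2^n; x >> n divides by 2^n (for unsigned x).
   Right shifts of negative signed values are implementation-defined
   in C, so these helpers stick to unsigned operands. */
unsigned times8(unsigned x) { return x << 3; }
unsigned half(unsigned x)   { return x >> 1; }
```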

I do often use the shift operator to convert ordinal numbers to flags. For example, in my Sudoku software, I use a single number to indicate which digits (from 1-9) are possibilities for a particular cell.

I can set a digit as a possibility using code like the following:

possibles = (possibles | (1 << digit)); 
or
possibles |= (1 << digit); 
and I can clear a digit as a possibility using code like this:
possibles = (possibles & ~(1 << digit)); 
or
possibles &= ~(1 << digit); 
If there are 9 possible digits (1-9), I can set all the possible digits at once, using this:
N = 9;
possibles = (1 << N) - 1;
Note however, that this code assumes I am using bit-numbers 0 thru N-1, rather than bit-numbers 1 thru N. As an old-school programmer, I pretty much always use zero-based counting systems, when I can, since it simplifies the resulting code.

Another very common way I use the shift and bitwise logic operators is for converting individual R,G and B values into colors.

compositeColor = (red << 16) | (green << 8) | blue; 
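Going the other way, a shift plus a mask pulls the channels back out. A small sketch (the 0xRRGGBB layout assumed, as above):

```c
#include <stdint.h>

uint32_t pack_rgb(uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t) r << 16) | ((uint32_t) g << 8) | b;
}

/* Shift the channel down to the low byte, then mask off the rest. */
uint8_t red_of(uint32_t c)   { return (uint8_t) ((c >> 16) & 0xFFu); }
uint8_t green_of(uint32_t c) { return (uint8_t) ((c >> 8)  & 0xFFu); }
uint8_t blue_of(uint32_t c)  { return (uint8_t) ( c        & 0xFFu); }
```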



Monday, May 9, 2011

is main() function required for C program?

No, the ISO C standard states that a main function is only required for a hosted environment (one with an underlying OS).

For a freestanding environment like an embedded system (or an operating system itself), it's implementation defined. From C99 5.1.2:

Two execution environments are defined: freestanding and hosted. In both cases, program startup occurs when a designated C function is called by the execution environment.

In a freestanding environment (in which C program execution may take place without any benefit of an operating system), the name and type of the function called at program startup are implementation-defined.

As to how Linux itself starts, the start point for the Linux kernel is start_kernel, though for a more complete picture of the entire boot process you should start here.

and

The operating system's loader has to call a single entry point; with the GNU compiler, the entry point is defined in the crt0.o linked object file, whose source is the assembler file crt0.s. It invokes main() after performing various run-time start-up tasks (such as establishing a stack and performing static initialisation). So when building an executable that links the default crt0.o, you must have a main(); otherwise you will get a linker error, since in crt0.o main() is an unresolved symbol.

It would be possible (if somewhat perverse and unnecessary) to modify crt0.s to call a different entry point. Just make sure that you make such an object file specific to your project rather than modifying the default version, or you will break every build on that machine.

The OS itself has its own C runtime start-up (which will be called from the bootloader), so it can call any entry point it wishes. I have not looked at the Linux source, but I imagine that it has its own crt0.s that calls whatever the C code entry point is.

where are constant variables stored in C

How they are stored is an implementation detail (depends on the compiler).

For example, in the GCC compiler, on most machines, read-only variables, constants, and jump tables are placed in the text section.

Consider the code:

const int i = 0;
static const int k = 99;

int function(void)
{
    const int j = 37;
    totherfunc(&j);
    totherfunc(&i);
    //totherfunc(&k);
    return (j + 3);
}

Generally, i can be stored in the text segment (it's a read-only variable with a fixed value). If it is not in the text segment, it will be stored beside the global variables. Given that it is initialized to zero, it might be in the 'bss' (block started by symbol) section, where zero-initialized variables are usually allocated, or in the 'data' section, where initialized variables are usually allocated.

If the compiler is convinced that k is unused (which it could be, since k is local to a single file), it might not appear in the object code at all. If the call to totherfunc() that references k were not commented out, then k would have to be allocated an address somewhere; it would likely be in the same segment as i.

The constant (if it is a constant, is it still a variable?) j will most probably appear on the stack of a conventional C implementation. (If you were asking in the comp.std.c newsgroup, someone would mention that the standard doesn't say that automatic variables appear on the stack; fortunately, SO isn't comp.std.c!)

Note that I forced the variables to appear because I passed them by reference - presumably to a function expecting a pointer to a constant integer. If the addresses were never taken, then j and k could be optimized out of the code altogether. To remove i, the compiler would have to know all the source code for the entire program - it is accessible in other translation units (source files), and so cannot as readily be removed. Doubly not if the program indulges in dynamic loading of shared libraries - one of those libraries might rely on that global variable.

(Stylistically - the variables i and j should have longer, more meaningful names; this is only an example!)

Thursday, March 17, 2011

Using extern in c and machine endianness

declaration and definition of a variable/function

Declaration of a variable/function simply declares that the variable/function exists somewhere in the program, but no memory is allocated for it. A declaration still serves an important role: it establishes the type of the variable/function. When a variable is declared, the program knows its data type. In the case of a function declaration, the program knows the arguments to that function, their data types, the order of arguments, and the return type of the function. So that's all about declaration.

Coming to the definition: when we define a variable/function, apart from playing the role of a declaration, it also allocates memory for that variable/function. Therefore, we can think of a definition as a superset of a declaration (or a declaration as a subset of a definition). From this explanation, it should be obvious that a variable/function can be declared any number of times, but it can be defined only once. (Remember the basic principle that you can't have two locations for the same variable/function.) So that's all about declaration and definition.

Understanding "extern" keyword in C

It's essential to understand declaration/definition in order to understand the "extern" keyword. Let us first take the easy case: the use of extern with C functions. By default, the declaration and definition of a C function have "extern" prepended to them. It means that even though we don't use extern with the declaration/definition of C functions, it is present there. For example, when we write:

int foo(int arg1, char arg2);

There's an extern present in the beginning which is hidden and the compiler treats it as below.

extern int foo(int arg1, char arg2);

Same is the case with the definition of a C function (Definition of a C function means writing the body of the function). Therefore whenever we define a C function, an extern is present there in the beginning of the function definition. Since the declaration can be done any number of times and definition can be done only once, we can notice that declaration of a function can be added in several C/H files or in a single C/H file several times. But we notice the actual definition of the function only once (i.e. in one file only). And as the extern extends the visibility to the whole program, the functions can be used (called) anywhere in any of the files of the whole program provided the declaration of the function is known. (By knowing the declaration of the function, C compiler knows that the definition of the function exists and it goes ahead to compile the program). So that's all about extern with C functions.

Now let us take the second and final case, i.e. the use of extern with C variables. I feel it is more interesting and informative than the previous case, where extern is present by default with C functions. So let me ask the question: how would you declare a C variable without defining it? Many of you would see it as trivial, but it's an important question for understanding extern with C variables. The answer goes as follows.

extern int var;

Here, an integer type variable called var has been declared (remember no definition i.e. no memory allocation for var so far). And we can do this declaration as many times as needed. (remember that declaration can be done any number of times) So far so good. :)

Now how would you define a variable? I agree that it is the most trivial question in programming, and the answer is as follows.

int var;

Here, an integer type variable called var has been declared as well as defined (remember that definition is the superset of declaration), so the memory for var is also allocated. Now here comes the surprise: when we declared/defined a C function, we saw that an extern was present by default, and while defining a function we can prepend it with extern without any issues. But that is not the case with C variables. If extern were present by default on variables, memory would never be allocated for them; they would only ever be declared. Therefore, we put extern explicitly on a C variable when we want to declare it without defining it. Also, since extern extends visibility to the whole program, an externed variable can be used anywhere in the program, provided we know its declaration and the variable is defined somewhere.

Now let us try to understand extern with examples.

Example 1:
int var;
int main(void)
{
var = 10;
return 0;
}

Analysis: This program is compiled successfully. Here var is defined (and declared implicitly) globally.

Example 2:
extern int var;
int main(void)
{
return 0;
}

Analysis: This program is compiled successfully. Here var is only declared. Notice that var is never used, so there are no problems.

Example 3:
extern int var;
int main(void)
{
var = 10;
return 0;
}

Analysis: This program fails with an error (at the linking stage), because var is declared but not defined anywhere. Essentially, var is never allocated any memory, and the program is trying to set the value of a variable that doesn't exist at all to 10.

Example 4:
#include "somefile.h"
extern int var;
int main(void)
{
var = 10;
return 0;
}

Analysis: Assuming that somefile.h has the definition of var, this program compiles successfully.

Example 5:
extern int var = 0;
int main(void)
{
var = 10;
return 0;
}

Analysis: Will this program work? Well, here comes another surprise from the C standard: if a variable is only declared but an initializer is also provided with that declaration, then memory for that variable is allocated, i.e. that variable is considered defined. Therefore, as per the C standard, this program compiles successfully and works.

So that was a preliminary look at "extern" keyword in C.

I'm sure that you want to have some take away from the reading of this post. And I would not disappoint you. :)
In short, we can say

1. Declaration can be done any number of times but definition only once.
2. The "extern" keyword is used to extend the visibility of variables/functions.
3. Since functions are visible throughout the program by default, the use of extern is not needed in a function declaration/definition; its use is redundant.
4. When extern is used with a variable, it's only declared not defined.
5. As an exception, when an extern variable is declared with initialization, it is taken as definition of the variable as well.

Mystery of Bigendian and Littleendian

What are these?

Little and big endian are two ways of storing multibyte data types (int, float, etc.). In little endian machines, the least significant byte of a multibyte data type is stored first (at the lowest address). In big endian machines, the most significant byte is stored first.

Suppose an integer is stored as 4 bytes (for those using DOS-based compilers such as Turbo C++ 3.0, an integer is 2 bytes); then a variable x with value 0x01234567 will be stored as follows: on a little endian machine the bytes at increasing addresses are 67 45 23 01, while on a big endian machine they are 01 23 45 67.

How to see memory representation of multibyte data types on your machine?

Here is a sample C code that shows the byte representation of int, float and pointer.
#include <stdio.h>

/* function to show bytes in memory, from location start to start+n */
void show_mem_rep(char *start, int n)
{
    int i;
    for (i = 0; i < n; i++)
        printf(" %.2x", (unsigned char)start[i]);  /* cast avoids sign extension */
    printf("\n");
}

/* Main function to call above function for 0x01234567 */
int main()
{
    int i = 0x01234567;
    show_mem_rep((char *)&i, sizeof(i));
    getchar();
    return 0;
}

When the above program is run on a little endian machine, it gives "67 45 23 01" as output, while when run on a big endian machine it gives "01 23 45 67" as output.

Is there a quick way to determine endianness of your machine?

There are a number of ways to determine the endianness of your machine. Here is one quick way of doing so.
#include <stdio.h>
int main()
{
unsigned int i = 1;
char *c = (char*)&i;
if (*c)
printf("Little endian");
else
printf("Big endian");
getchar();
return 0;
}

In the above program, a character pointer c points to an integer i. Since the size of a character is 1 byte, dereferencing the character pointer yields only the first byte of the integer. If the machine is little endian then *c will be 1 (because the least significant byte is stored first), and if the machine is big endian then *c will be 0.

Does endianness matter for programmers?

Most of the time the compiler takes care of endianness; however, it becomes an issue in the following cases.

It matters in network programming: suppose you write integers to a file on a little endian machine and transfer this file to a big endian machine. Unless there is a little endian to big endian transformation, the big endian machine will read the file in reverse byte order. You can find such a practical example here.

Standard byte order for networks is big endian, also known as network byte order. Before transferring data on network, data is first converted to network byte order (big endian).

Sometimes it matters when you are using type casting, below program is an example.
#include <stdio.h>
int main()
{
unsigned char arr[2] = {0x01, 0x00};
unsigned short int x = *(unsigned short int *) arr;
printf("%d", x);
getchar();
return 0;
}

In the above program, a char array is cast to an unsigned short integer type. When I run the above program on a little endian machine, I get 1 as output, while on a big endian machine I get 512. To make programs endianness independent, this programming style should be avoided.

What are bi-endians?

Bi-endian processors can run in both little endian and big endian modes.

What are the examples of little, big endian and bi-endian machines ?
Intel-based processors are little endian. Older ARM processors were little endian; current generation ARM processors are bi-endian.

Motorola 68K processors are big endian. PowerPC (by Motorola) and SPARC (by Sun) processors were big endian; current versions of these processors are bi-endian.

Does endianness affect file formats?

File formats which have 1 byte as the basic unit are independent of endianness, e.g., ASCII files. Other file formats use some fixed endianness format, e.g., JPEG files are stored in big endian format.

Which one is better: little endian or big endian?

The terms little and big endian come from Gulliver's Travels by Jonathan Swift, in which two groups could not agree on which end an egg should be opened: the little or the big. Just like the egg issue, there is no technological reason to choose one byte-ordering convention over the other; hence the arguments degenerate into bickering about sociopolitical issues. As long as one of the conventions is selected and adhered to consistently, the choice is arbitrary.

Some of the puzzling things about c language

Can you guess why there is no distinct format specifier for 'double' in the printf/scanf format string, although it is one of the four basic data types? (Remember we use %lf for printing the double value in printf/scanf; %d is for integers).

Ans: In older versions of C, there was no 'double'—it was just 'long float' type—and that is the reason why it has the format specifier '%lf' ('%d' was already in use to indicate signed decimal values). Later, double type was added to indicate that the floating point type might be of 'double precision' (IEEE format, 64-bit value). So a format specifier for long float and double was kept the same.

Why is the output file of the C compiler called a.out?

Ans: The a.out stands for 'assembler.output' file [2]. The original UNIX was written using an assembler for the PDP-7 machine. The output of the assembler was a fixed file name, which was a.out to indicate that it was the output file from the assembler. No assembly needs to be done in modern compilers; instead, linking and loading of object files is done. However, this tradition continues and the output of cc is by default a.out!

what is the use of volatile keyword in c?

The meaning of volatile is a popular interview question, particularly for freshers, and I've read articles describing the properties of this keyword as if it had superpowers.

The volatile keyword does a very simple job. When a variable is marked as volatile, the programmer is instructing the compiler not to cache this variable in a register but instead to read the value of the variable from memory each and every time the variable is used. That's it – simple isn't it?

To illustrate the use of the keyword, consider the following example:

volatile int* vp = SOME_REGISTER_ADDRESS;
for(int i=0; i<100; i++)
foo(*vp);

In this simple example, the pointer vp points to a volatile int. The value of this int is read from memory for each loop iteration. If volatile was not specified then it is likely that the compiler would generate optimized code which would read the value of the int once, temporarily store this in a register and then use the register copy during each iteration.

I hope that after reading this you can reply confidently when the same question is asked of you in an interview.

Integer overflow and why it occurs?

What is integer overflow and why does it occur?

An integer overflow, or integer wrapping, is a potential problem in a program based upon the fact that the value that can be held in a numeric data type is limited by the data type's size in bytes. ANSI C uses the following minimum sizes:

data type size (bytes)
char 1
short 2
int 2
long 4

In practice, many compilers use a 4-byte int. It also should be noted that the actual ranges for the data types depend on whether or not they are signed. for instance, a signed 2-byte short may be between -32767 and 32767, while an unsigned short may be between 0 and 65535. See your [include]/limits.h file for specific numbers for your compiler.

Why should you care?

If you try to put a value into a data type that is too small to hold it, the high-order bits are dropped, and only the low-order bits are stored. Another way of saying that is that modulo-arithmetic is performed on the value before storing it to make sure it fits within the data type. Taking our unsigned short example:

Limit: 65535 or 1111 1111 1111 1111
Too big: 65536 or 1 0000 0000 0000 0000
What's stored: 0 or 0000 0000 0000 0000

As the above makes evident, that result is because the high-order (or left-most) bit of the value that's too big is dropped. Or you could say that what's stored is the result of
Stored = value % (limit + 1) or 65536 % (65535 + 1) = 0
In signed data types, the result is a little different and results in some seemingly weird behavior:

Positive limit: 32767 or 0111 1111 1111 1111
Too big: 32768 or 1000 0000 0000 0000
What's stored: -32768

Why's that?

It's because of "two's complement," which is how negative numbers are represented in binary. To make a long story short, the first half of the range (0 thru 0111 1111 1111 1111) is used for positive numbers in order of least to greatest. The second half of the range is then used for negative numbers, in order of least to greatest. So, the negative range for a signed 2-byte short is -32768 thru -1, in that order.

When does it occur?

Integer overflow happens because computers use a fixed width to represent integers. So which operations can result in overflow? Bitwise and logical operations cannot overflow, while casts and arithmetic operations can. For example, the ++ and += operators can overflow, whereas the && or & operators (or even the << and >> operators) cannot.

Regarding arithmetic operators, it is obvious that operations like addition, subtraction and multiplication can overflow. How about operations like (unary) negation, division and mod (remainder)? For unary negation, -MIN_INT is equal to MIN_INT (and not MAX_INT), so it overflows. Following the same logic, division overflows for the expression (MIN_INT / -1). How about the mod operation? It does not overflow. The only possible overflow case, (MIN_INT % -1), is equal to 0 (verify this yourself; the formula for the % operator is a % b = a - ((a / b) * b)).

What happens when it occurs?

Suppose memory is being allocated based on an unsigned integer data type's value. If that value has wrapped around, far too little memory may be made available. Or suppose a comparison is being made between a signed integer value and some other number, assuming the former should be less than the latter; if that value has overflowed into the negative, the comparison would pass. But are things going to behave the way the programmer intended? Probably not.

Sorting and searching techniques are among the most important for the programmer, and we use them in many of the programs we write. The most widely used, for efficiency, are binary search and quicksort. In these algorithms we take the mean of two elements; if the elements are integers, computing the mean is prone to integer overflow, which results in bugs. So, let's look at some techniques to detect an overflow before it occurs.

For the statement int k = (i + j);
1. If i and j are of different signs, it cannot overflow.
2. If i and j are of the same sign (- or +), it can overflow.
3. If i and j are positive integers, then their sign bit is zero. If k is negative, its sign bit is 1, which indicates that the value of (i + j) is too large to represent in k, so it overflows.
4. If i and j are negative integers, then their sign bit is one. If k is positive, its sign bit is 0, which indicates that the value of (i + j) is too small to represent in k, so it overflows.

How to check for the overflow?

To check for overflow, we have to provide checks for conditions 3 and 4. Here is a straightforward conversion of those two statements into code. The function isSafeToAdd() returns true or false after checking for overflow.

/* Is it safe to add i and j without overflow? Return value 1 indicates there is no overflow; else it is overflow and not safe to add i and j */
int isSafeToAdd(int i, int j)
{
    int k = i + j;
    if( ((i > 0 && j > 0) && k < 0) || ((i < 0 && j < 0) && k >= 0) )
        return 0;
    return 1;
}

But don't you think this is a roundabout way of checking? Instead of checking ((i + j) > INT_MAX) or ((i + j) < INT_MIN), we can check the condition (i > INT_MAX - j) by moving j to the RHS of the expression. So, the condition in isSafeToAdd can be rewritten as:

if( (i > INT_MAX – j) || (i < INT_MIN – j) )
return 0;

That works! But can we simplify it further? From condition 2, we know that for an overflow to occur, the signs of i and j must be the same. If you look at conditions 3 and 4, the sign bit of the result (k) differs from that of i and j. Doesn't that suggest the ^ operator could be used for the check? How about this:

int k = (i + j);
if( ((i ^ k) & (j ^ k)) < 0)
return 0;

Let us check it. Assume that i and j are positive values; when the addition overflows, the result k will be negative. Now the expression (i ^ k) will be a negative value: the sign bit of i is 0 and the sign bit of k is 1, so the ^ of the sign bits is 1, and hence the value of (i ^ k) is negative. The same holds for (j ^ k), and the & of two negative values is also negative; hence, the check with < 0 becomes true when there is overflow. When i and j are negative and k is positive, the condition is again < 0 (following the same logic described above).

So, yes, this also works! Though the if condition is not very easy to understand, it is correct and is also an efficient solution!