
Thursday, March 17, 2011

Using extern in C and machine endianness

Declaration and definition of a variable/function

Declaration of a variable/function simply announces that the variable/function exists somewhere in the program; no memory is allocated for it. But a declaration still serves an important role: it tells the compiler the type. When a variable is declared, the program knows the data type of that variable. In the case of a function declaration, the program knows the arguments to that function, their data types, the order of the arguments and the return type. Coming to the definition: when we define a variable/function, apart from playing the role of a declaration, it also allocates memory for that variable/function. We can therefore think of a definition as a superset of a declaration (or a declaration as a subset of a definition). From this it should be obvious that a variable/function can be declared any number of times but can be defined only once (remember the basic principle that you can't have two locations for the same variable/function). So that's all about declaration and definition.
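As a quick sketch (the names here are purely illustrative), the difference looks like this:

/* Declarations: announce the name and its type; no memory is allocated. */
int add(int a, int b);     /* function declaration (prototype) */

/* Definitions: memory (or the function body) is actually created. */
int counter;               /* variable definition (which is also a declaration) */
int add(int a, int b)      /* function definition */
{
    return a + b;
}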

Understanding "extern" keyword in C

It is necessary to understand declaration/definition before understanding the "extern" keyword. Let us first take the easy case: the use of extern with C functions. By default, the declaration and definition of a C function have "extern" prepended to them. Even though we don't write extern with the declaration/definition of a C function, it is present there. For example, when we write

int foo(int arg1, char arg2);

there is an implicit extern at the beginning, and the compiler treats it as below.

extern int foo(int arg1, char arg2);

Same is the case with the definition of a C function (defining a C function means writing its body). Therefore, whenever we define a C function, an extern is implicitly present at the beginning of the definition. Since a declaration can appear any number of times and a definition only once, the declaration of a function can appear in several .c/.h files (or several times in a single file), but the actual definition of the function appears only once, in one file. And since extern extends visibility to the whole program, the function can be used (called) anywhere in any file of the program, provided the declaration of the function is known. (By knowing the declaration, the C compiler knows that the definition of the function exists somewhere and goes ahead and compiles the program; the linker resolves the call later.) So that's all about extern with C functions.
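As a sketch with two hypothetical files, foo.c and main.c (compiled together, e.g. gcc foo.c main.c), it looks like this:

/* foo.c : the one definition of foo (extern is implicit here) */
int foo(int arg1, char arg2)
{
    return arg1 + arg2;
}

/* main.c : only the declaration is needed to call foo */
extern int foo(int arg1, char arg2);   /* the extern can be omitted */

int main(void)
{
    return foo(1, 'a');
}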

Now let us take the second and final case, i.e. the use of extern with C variables. I feel this is more interesting and informative than the previous case, where extern is present by default with C functions. So let me ask the question: how would you declare a C variable without defining it? Many of you would see it as trivial, but it's an important question for understanding extern with C variables. The answer goes as follows.

extern int var;

Here, an integer type variable called var has been declared (remember, no definition, i.e. no memory allocated for var so far). And we can do this declaration as many times as needed (remember that a declaration can be done any number of times). So far so good. :)

Now how would you define a variable? I agree that this is the most trivial question in programming, and the answer is as follows.

int var;

Here, an integer type variable called var has been declared as well as defined (remember that a definition is a superset of a declaration), so memory for var is also allocated. Now here comes the surprise. When we declared/defined a C function, we saw that an extern was present by default, and while defining a function we can prepend it with extern without any issue. But that is not the case with C variables. If extern were present on variables by default, memory would never be allocated for them; they would only ever be declared. Therefore, we write extern explicitly for C variables when we want to declare them without defining them. Also, since extern extends visibility to the whole program, by externing a variable we can use it anywhere in the program, provided we know its declaration and it is defined somewhere.

Now let us try to understand extern with examples.

Example 1:
int var;
int main(void)
{
    var = 10;
    return 0;
}

Analysis: This program compiles successfully. Here var is defined (and declared implicitly) globally.

Example 2:
extern int var;
int main(void)
{
    return 0;
}

Analysis: This program compiles successfully. Here var is only declared, and since it is never used, no problem arises.

Example 3:
extern int var;
int main(void)
{
    var = 10;
    return 0;
}

Analysis: This program fails, although at link time rather than during compilation: var is declared but never defined, so no memory is allocated for it and the linker cannot resolve the reference. Essentially, the program is trying to assign 10 to a variable that doesn't exist at all.

Example 4:
#include "somefile.h"
extern int var;
int main(void)
{
var = 10;
return 0;
}

Analysis: Supposing that somefile.h contains the definition of var, this program will compile and link successfully.
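For illustration, somefile.h could contain nothing more than the line below (hypothetical content; note that defining a variable in a header is safe only if the header is included in exactly one .c file, otherwise the linker sees multiple definitions):

/* somefile.h (hypothetical) */
int var;    /* definition: memory is allocated here */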

Example 5:
extern int var = 0;
int main(void)
{
    var = 10;
    return 0;
}

Analysis: Will this program work? Well, here comes another surprise from the C standard: if a variable is declared with extern and an initializer is also provided with that declaration, then memory for that variable is allocated, i.e. the variable is considered defined. Therefore, as per the C standard, this program compiles successfully and works (most compilers will warn that var is both initialized and declared extern).

So that was a preliminary look at the "extern" keyword in C.

I'm sure you want some takeaways from reading this post, and I would not disappoint you. :)
In short, we can say

1. Declaration can be done any number of times but definition only once.
2. "extern" keyword is used to extend the visibility of variables/functions().
3. Since functions are visible through out the program by default. The use of extern is not needed in function declaration/definition. Its use is redundant.
4. When extern is used with a variable, it's only declared not defined.
5. As an exception, when an extern variable is declared with initialization, it is taken as definition of the variable as well.
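Putting points 2, 4 and 5 together, here is a minimal two-file sketch (file names are illustrative): one file defines var, the other only declares it with extern and still uses it.

/* globals.c : the one and only definition of var */
int var = 5;

/* main.c : declaration only; the linker resolves var to globals.c */
extern int var;

int main(void)
{
    var = 10;   /* legal: var is defined in globals.c */
    return 0;
}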

Mystery of big endian and little endian

What are these?

Little endian and big endian are two ways of storing multibyte data types (int, float, etc.). In little endian machines, the least significant byte of a multibyte value is stored first, at the lowest address. In big endian machines, the most significant byte is stored first.

Suppose an integer is stored as 4 bytes (for those using DOS-based compilers such as Turbo C++ 3.0, an integer is 2 bytes); then a variable x with value 0x01234567 will be stored as follows, from lowest memory address to highest:

Little endian: 67 45 23 01
Big endian: 01 23 45 67

How to see memory representation of multibyte data types on your machine?

Here is a sample C program that shows the byte-by-byte memory representation of an int (the same function works for float, pointers, etc.).
#include <stdio.h>

/* function to show bytes in memory, from location start to start+n */
void show_mem_rep(unsigned char *start, int n)
{
    int i;
    for (i = 0; i < n; i++)
        printf(" %.2x", start[i]);
    printf("\n");
}

/* main function to call the above function for 0x01234567 */
int main(void)
{
    int i = 0x01234567;
    show_mem_rep((unsigned char *)&i, sizeof(i));
    getchar();
    return 0;
}

When the above program is run on a little endian machine, it gives "67 45 23 01" as output, while if it is run on a big endian machine it gives "01 23 45 67".

Is there a quick way to determine endianness of your machine?

There are a number of ways to determine the endianness of your machine. Here is one quick way of doing it.
#include <stdio.h>
int main(void)
{
    unsigned int i = 1;
    char *c = (char *)&i;
    if (*c)
        printf("Little endian");
    else
        printf("Big endian");
    getchar();
    return 0;
}

In the above program, a character pointer c points to an integer i. Since the size of a character is 1 byte, dereferencing the character pointer yields only the first byte of the integer. If the machine is little endian, *c will be 1 (because the least significant byte is stored first); if the machine is big endian, *c will be 0.
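An equivalent check (just a sketch of an alternative, not the only way) uses a union so the same storage can be viewed both as an unsigned int and as its individual bytes:

#include <stdio.h>

int main(void)
{
    union {
        unsigned int  i;
        unsigned char c[sizeof(unsigned int)];
    } u = { 1 };

    /* If the low-order byte of the value 1 sits at the lowest address,
       the machine is little endian. */
    printf(u.c[0] == 1 ? "Little endian\n" : "Big endian\n");
    return 0;
}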

Does endianness matter for programmers?

Most of the time the compiler takes care of endianness; however, it becomes an issue in the following cases.

It matters in network programming and when sharing binary files: suppose you write integers to a file on a little endian machine and then transfer the file to a big endian machine. Unless there is a little endian to big endian transformation, the big endian machine will read the integers with their bytes in reverse order.

The standard byte order for networks is big endian, also known as network byte order. Before data is transferred over the network, it is first converted to network byte order (big endian).
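On POSIX systems the socket API already provides helpers for this conversion (htonl/ntohl for 32-bit values, htons/ntohs for 16-bit values); a small sketch:

#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>   /* htonl(), ntohl() */

int main(void)
{
    uint32_t host = 0x01234567;

    uint32_t net  = htonl(host);   /* host byte order -> network (big endian) order */
    uint32_t back = ntohl(net);    /* network byte order -> host byte order */

    printf("host: %#x  network: %#x  back: %#x\n",
           (unsigned)host, (unsigned)net, (unsigned)back);
    return 0;
}

On a big endian machine these calls are no-ops; on a little endian machine they swap the byte order.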

Sometimes it matters when you are using type casts; the program below is an example.
#include <stdio.h>
int main(void)
{
    unsigned char arr[2] = {0x01, 0x00};
    unsigned short int x = *(unsigned short int *)arr;
    printf("%d", x);
    getchar();
    return 0;
}

In the above program, a char array is cast to an unsigned short integer type. When the program is run on a little endian machine the output is 1, while on a big endian machine the output is 256 (0x0100). To make programs independent of endianness, this style of casting should be avoided (it also makes alignment and aliasing assumptions that the C standard does not guarantee).
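A sketch of an endianness-independent alternative: decide explicitly which array element is the low byte and assemble the value with shifts, instead of reinterpreting the array through a pointer cast.

#include <stdio.h>

int main(void)
{
    unsigned char arr[2] = {0x01, 0x00};

    /* arr[0] is treated as the low byte by the programmer's choice,
       so the result is 1 on every machine, regardless of endianness. */
    unsigned short x = (unsigned short)(arr[0] | (arr[1] << 8));

    printf("%d\n", x);
    return 0;
}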

What are bi-endians?

Bi-endian processors can run in both modes, little endian and big endian.

What are the examples of little, big endian and bi-endian machines ?
Intel x86 processors are little endian. Older ARM processors were little endian; current generation ARM processors are bi-endian.

Motorola 68K processors are big endian. PowerPC (by Motorola) and SPARC (by Sun) processors were big endian; current versions of these processors are bi-endian.

Does endianness affect file formats?

File formats that use 1 byte as the basic unit are independent of endianness, e.g. ASCII files. Other file formats use a fixed endianness, e.g. JPEG files are stored in big endian format.

Which one is better: little endian or big endian?

The terms little endian and big endian come from Gulliver's Travels by Jonathan Swift, in which two groups could not agree on which end an egg should be opened: the little end or the big end. Just like the egg issue, there is no strong technological reason to choose one byte ordering convention over the other; hence the arguments degenerate into bickering over sociopolitical issues. As long as one of the conventions is selected and adhered to consistently, the choice is arbitrary.
