
Sunday, February 28, 2010

Unix Incompatibility Notes: Byte ordering and how to find machine endianness

FYI,

http://unixpapa.com/incnote/byteorder.html

 

Jan Wolter

Any program that writes binary files that may have to be read by another computer needs to be concerned about byte order issues. Different processors write integers differently.

There is a minority view that says if you code properly then you never need to know the endianness of your machine. You should certainly consider carefully if you can do so in your application.
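As a rough illustration of that view, the sketch below reads and writes a 32-bit integer one byte at a time using shifts, so the result is the same on any host and the code never needs to know the machine's byte order. The names put_u32_be() and get_u32_be() are hypothetical helpers invented for this example, and the choice of most-significant-byte-first for the file format is an assumption.

```c
#include <stdint.h>

/* Hypothetical helper: store v into out[] most significant byte first.
   Shifts operate on the value, not on memory, so the host's byte
   order never enters into it. */
void put_u32_be(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Hypothetical helper: rebuild the value from four bytes that were
   written most significant byte first. */
uint32_t get_u32_be(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |
            (uint32_t)in[3];
}
```

A program that does all of its binary I/O through helpers like these has no need for an endianness test, at the cost of a few shifts per integer.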

Terminology

Let's suppose we are writing out a four byte long integer 67305985. In hexadecimal, this is 0x04030201, so the most significant byte contains the hexadecimal value 04 and the least significant byte contains the hexadecimal value 01. Suppose this is written out to memory address x. The value will actually be written to four consecutive addresses, x through x+3. Which byte of data goes in which memory location? It depends on the processor. The alternatives are named after Lilliputian political parties:

  • Big-Endian systems save the most significant byte first. Sun and Motorola processors, IBM-370s and PDP-10s are big-endian. JPEG images contain big-endian values.

      Address:  x    x+1  x+2  x+3
      Value:    04   03   02   01

  • Little-Endian systems save the least significant byte first. The entire Intel x86 family, Vaxes, Alphas and PDP-11s are little-endian. GIF images contain little-endian values.

      Address:  x    x+1  x+2  x+3
      Value:    01   02   03   04

  • Middle-Endian or PDP-Endian systems save the most significant word first, with each word having the least significant byte first. For developers of new software, it is not only perfectly reasonable, but strongly recommended, to ignore this possibility. I don't think there ever was a processor that stored 32-bit integer values to memory in a middle-endian format, though middle-endianness has occasionally appeared in things like packed-decimal formats, floating point formats, and obscure communications protocols (it's used for the length of TCP/IP packets in Visa's "Visa Base I" protocol).

      Address:  x    x+1  x+2  x+3
      Value:    03   04   01   02
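The three layouts above can be observed directly by copying the bytes of 0x04030201 out of memory and comparing them with each pattern. This is only a sketch to make the terminology concrete: classify_layout() and bytes_of_sample() are hypothetical names made up for this example, and it assumes a platform that provides uint32_t.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: name the layout that a 4-byte dump of
   0x04030201 corresponds to, using the three patterns listed above. */
const char *classify_layout(const unsigned char b[4])
{
    static const unsigned char big[4]    = { 0x04, 0x03, 0x02, 0x01 };
    static const unsigned char little[4] = { 0x01, 0x02, 0x03, 0x04 };
    static const unsigned char middle[4] = { 0x03, 0x04, 0x01, 0x02 };

    if (memcmp(b, big, 4) == 0)    return "big-endian";
    if (memcmp(b, little, 4) == 0) return "little-endian";
    if (memcmp(b, middle, 4) == 0) return "middle-endian";
    return "unknown";
}

/* Hypothetical helper: copy the in-memory bytes of 0x04030201,
   from address x through x+3, into out[]. */
void bytes_of_sample(unsigned char out[4])
{
    uint32_t v = 0x04030201u;
    memcpy(out, &v, sizeof v);
}
```

Calling bytes_of_sample() and passing the result to classify_layout() reports which of the layouts the current machine uses.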

Some processors (PowerPC, MIPS, DEC Alpha) can be either big-endian or little-endian depending on software settings.

Network byte order is the standard used in packets sent over the internet. It is big-endian (except that technically it refers to the order in which bytes are transmitted, not the order in which they are stored). If you are going to choose an arbitrary order to standardize on, network byte order is a sensible choice.

The Unix functions htonl(), htons(), ntohl(), and ntohs() convert 32-bit "longs" and 16-bit "shorts" back and forth between the host byte order and network byte order. However, though they are widely available, they are not universally available.
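Where these functions are available, a round trip through network byte order might look like the sketch below. It assumes a POSIX system that declares htonl() and ntohl() in <arpa/inet.h>; encode_network32() and decode_network32() are hypothetical wrapper names used only for this example.

```c
#include <arpa/inet.h>   /* htonl(), ntohl() on POSIX systems */
#include <stdint.h>
#include <string.h>

/* Hypothetical wrapper: convert a host-order value to network byte
   order and copy its bytes into a buffer ready to write or send. */
void encode_network32(unsigned char out[4], uint32_t host_value)
{
    uint32_t net = htonl(host_value);
    memcpy(out, &net, sizeof net);
}

/* Hypothetical wrapper: read four network-order bytes back into a
   host-order value. */
uint32_t decode_network32(const unsigned char in[4])
{
    uint32_t net;
    memcpy(&net, in, sizeof net);
    return ntohl(net);
}
```

On a big-endian host htonl() and ntohl() leave the value unchanged, and on a little-endian host they swap the bytes, so the buffer holds the most significant byte first in both cases.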

Compile-time Tests

We'd usually prefer to determine endianness at compile time. Most modern Unix systems define the byte order in the sys/param.h include file. Some code I've seen references the endian.h or machine/endian.h files instead, but I think that if those exist, then sys/param.h always pulls the appropriate ones in. Note however that some older systems (including SunOS 4.1) have sys/param.h but it does not define any byte order information.

The sys/param.h header normally defines the symbols __BYTE_ORDER, __BIG_ENDIAN, __LITTLE_ENDIAN, and __PDP_ENDIAN. You can test endianness by doing something like:

   #include <sys/param.h>
 
   /* Only test the byte order if sys/param.h actually defined it. */
   #ifdef __BYTE_ORDER
   # if __BYTE_ORDER == __LITTLE_ENDIAN
   #  define I_AM_LITTLE_ENDIAN
   # else
   #  if __BYTE_ORDER == __BIG_ENDIAN
   #   define I_AM_BIG_ENDIAN
   #  else
   #   error "unknown byte order!"
   #  endif
   # endif
   #endif /* __BYTE_ORDER */
   
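Once the byte order is known at compile time, it can be used to select code with no run-time cost. The sketch below is one possible way to do that, not part of the original note: it builds a host-to-big-endian conversion that is a byte swap on little-endian hosts and a no-op on big-endian ones. HOST_IS_LITTLE, HOST_IS_BIG, swap32() and host_to_big32() are hypothetical names, and the fallback to the __BYTE_ORDER__ macro predefined by GCC and Clang is an assumption added here for systems where sys/param.h does not supply the symbols.

```c
#include <stdint.h>
#include <sys/param.h>

/* Prefer the symbols from sys/param.h. The fallback to the macros
   predefined by GCC and Clang is an assumption of this sketch. */
#if defined(__BYTE_ORDER) && defined(__LITTLE_ENDIAN) && defined(__BIG_ENDIAN)
# define HOST_IS_LITTLE (__BYTE_ORDER == __LITTLE_ENDIAN)
# define HOST_IS_BIG    (__BYTE_ORDER == __BIG_ENDIAN)
#elif defined(__BYTE_ORDER__)
# define HOST_IS_LITTLE (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
# define HOST_IS_BIG    (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#else
# error "no compile-time byte order information found"
#endif

/* Reverse the four bytes of a 32-bit value. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) |
           ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) |
           (v << 24);
}

/* Hypothetical helper: return a value whose in-memory bytes are in
   big-endian order. The branch is chosen by the preprocessor. */
uint32_t host_to_big32(uint32_t v)
{
#if HOST_IS_LITTLE
    return swap32(v);
#elif HOST_IS_BIG
    return v;
#else
# error "unsupported byte order"
#endif
}
```

Middle-endian hosts are deliberately rejected with a compile error here rather than handled.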

Friday, February 26, 2010