Programming: Managing the risk of pointers
What is a pointer?
To an assembly language programmer, memory is a sequence of locations (bytes or words), each of which has an address. There is not really the concept of a variable. Manipulating addresses is an everyday occurrence. It is the responsibility of the programmer to keep track of what type of data is stored in each memory location. The data might be a number or some text (which is just a sequence of numbers, of course), or it might be the address of another location, or possibly the address of an address, and so forth. There are also some high level languages (untyped languages) that operate in the same way; Forth and BCPL are examples that come to mind.
The majority of high level languages support data typing to a lesser or greater extent. This means, in effect, that the programmer specifies that a variable contains a specific type of data and the language only allows appropriate operations on that variable. A pointer to a variable incorporates its address, but also embodies "knowledge" of the type of the variable.
Pointers and integers
For most, but not all, modern CPUs, an address is the same bit size as a word of memory; i.e. most 32-bit CPUs have a 32-bit address space as well as favouring operations on 32-bit data. In this context, most, but again not quite all, CPUs allow addresses to be stored in memory locations and registers and be operated on like any other data.
It is generally possible to store the value of a pointer (i.e. an address) in an "ordinary" variable—like an unsigned integer. An example of where this might be done in an embedded application is in device driver code. Here is an example:
unsigned normal;
unsigned *pointer;
pointer = &normal;
normal = (unsigned)pointer;
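This would result in the variable normal containing its own address. The code works as described wherever an unsigned is at least as wide as a pointer, which is the case on most 32-bit targets; on many 64-bit targets an unsigned is only 32 bits, so the cast would truncate the address. Whether it is a good idea is another matter.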
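A more portable way to hold an address in an integer is sketched below. It assumes the toolchain provides uintptr_t in <stdint.h> (an optional C99 type, though commonly available); the function names pointer_to_integer, integer_to_pointer and round_trip_ok are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Store an address in an integer type that is wide enough to hold it.
   uintptr_t is optional in the C standard, so its presence is an assumption. */
uintptr_t pointer_to_integer(void *p)
{
    return (uintptr_t)p;
}

/* Recover a usable pointer from the stored integer. */
unsigned *integer_to_pointer(uintptr_t address)
{
    return (unsigned *)(void *)address;
}

/* Hypothetical check: returns 1 if a write through the recovered
   pointer lands in the original variable. */
int round_trip_ok(void)
{
    unsigned normal = 0;
    uintptr_t saved = pointer_to_integer(&normal);
    unsigned *recovered = integer_to_pointer(saved);

    assert(recovered == &normal);  /* the round trip preserved the address */
    *recovered = 42u;              /* write through the recovered pointer */
    return normal == 42u;
}
```

Calling round_trip_ok() from a test harness should return 1 on a conforming implementation. Unlike a plain unsigned, the integer type here is sized to match the pointer, so the address is not truncated.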
In broad terms, most of the time, code should:
1. perform the required function
2. be readable/maintainable
3. be readily portable to a different CPU
#3 may be considered less important in certain instances, such as device drivers in embedded systems.
Whether the example above satisfies #2 is questionable. If the programmer really needs to take control of typing, the code should be very carefully commented to make this clear.
Pointer arithmetic
Because pointers "know" about the data type to which they appertain, operations on pointers can appear confusing to the inexperienced, even though they are entirely logical.
Consider this code:
int x;
int *ptr;
ptr = &x;
ptr++;
...
If the variable x is located at address 0x80000000 and we are using a 32-bit processor (i.e. 4-byte integers), what value will ptr contain at the end of this code? The answer is 0x80000004. This makes sense, as the pointer is being advanced through memory by "1 unit", which in this case is an integer, which is 4 bytes.
What if you write the incrementing code like this?
ptr = &x;
ptr += 1;
...
or in this rather un-C-like fashion:
ptr = &x;
ptr = ptr + 1;
...
The answer is that the result is identical, even if intuitively you might expect the answer to be 0x80000001.
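The equivalence of the three forms can be checked with a small sketch like the one below. The helper names (byte_distance, step_with_increment and so on) are hypothetical, and the step is compared against sizeof(int) rather than a fixed 4, since the size of an int depends on the target.

```c
#include <assert.h>
#include <stddef.h>

/* Distance in bytes between two addresses within (or one past) the same object. */
ptrdiff_t byte_distance(const void *from, const void *to)
{
    return (const char *)to - (const char *)from;
}

/* Each function advances an int pointer by "1" in a different style
   and reports how many bytes the address actually moved. */
ptrdiff_t step_with_increment(void)
{
    int x = 0;
    int *ptr = &x;
    ptr++;
    return byte_distance(&x, ptr);
}

ptrdiff_t step_with_plus_equals(void)
{
    int x = 0;
    int *ptr = &x;
    ptr += 1;
    return byte_distance(&x, ptr);
}

ptrdiff_t step_with_addition(void)
{
    int x = 0;
    int *ptr = &x;
    ptr = ptr + 1;
    return byte_distance(&x, ptr);
}

/* All three forms move the pointer by one whole int. */
void check_all_forms_agree(void)
{
    assert(step_with_increment() == (ptrdiff_t)sizeof(int));
    assert(step_with_plus_equals() == (ptrdiff_t)sizeof(int));
    assert(step_with_addition() == (ptrdiff_t)sizeof(int));
}
```

The pointer is only advanced to one past x and never dereferenced there, which is permitted; reading or writing through it would not be.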
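But what if you really did want to increment this pointer by just one byte? The result of a cast is not an lvalue in standard C, so an expression such as ((unsigned)ptr)++ is rejected by a conforming compiler; the converted value has to be assigned back to the pointer. One way to do it might be:
ptr = (int *)((unsigned)ptr + 1);
or, perhaps slightly better:
ptr = (int *)((char *)ptr + 1);
Either way, the resulting pointer is unlikely to be correctly aligned for an int, so dereferencing it may fault or behave unpredictably on some CPUs.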
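When byte-level access is genuinely needed, one option is to do the stepping through an unsigned char pointer and never convert back to int *, which sidesteps the alignment question. The following is a minimal sketch under that assumption; byte_at and copy_int_bytewise are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Return a pointer 'offset' bytes past the start of the object p points to.
   The caller must keep offset within the object. */
unsigned char *byte_at(void *p, size_t offset)
{
    return (unsigned char *)p + offset;
}

/* Copy an int one byte at a time, stepping through memory in 1-byte units. */
int copy_int_bytewise(int *src)
{
    int copy;
    size_t i;

    for (i = 0; i < sizeof copy; i++) {
        *byte_at(&copy, i) = *byte_at(src, i);
    }
    assert(copy == *src);  /* every byte was transferred */
    return copy;
}
```

Because unsigned char has a size of 1, adding 1 to such a pointer moves the address by exactly one byte, and any object may be inspected through it.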