Using C++ as a Better C

by Glen McCluskey
<[email protected]>

Glen McCluskey is a consultant with 15 years of experience and has focused on programming languages since 1988. He specializes in Java and C++ performance, testing, and technical documentation.

C pointers are a major strength of the language, and also a major weakness and source of programming mistakes. Sometimes you really truly need to use a pointer, but at other times, for example, to implement multiple return values from a function, C++ references can provide a superior alternative. This column also looks at a related topic, initialization.

References

A reference is another name for an object. For example, in this code:

int i;
int& ir = i;

ir is another name for i. To see how references are useful, and also how they're implemented, consider writing a function that has two return values to pass back. In ANSI C, we might say:

void f(int a, int b, int* sum, int* prod)
{
  *sum = a + b;
  *prod = a * b;
}

void g()
{
  int s;
  int p;

  f(37, 47, &s, &p);
}

In C++, we would say:

void f(int a, int b, int& sum, int& prod)
{
  sum = a + b;
  prod = a * b;
}

void g()
{
  int s;
  int p;

  f(37, 47, s, p);
}

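One way of viewing references is as C pointers with one level of explicit indirection removed: the caller writes s rather than &s, and the function body writes sum rather than *sum. Pointers are a frequent source of errors in C, and each & or * that no longer has to be written is one fewer to get wrong.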
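
To make the comparison with pointers concrete, here is a small sketch that writes the same swap operation twice. The names swap_ptr, swap_ref, and check_swaps are illustrative only and do not come from any library.

```cpp
#include <cassert>

// Pointer version: the caller passes addresses, the body dereferences.
void swap_ptr(int* a, int* b)
{
  int tmp = *a;
  *a = *b;
  *b = tmp;
}

// Reference version: same effect, with the explicit level of
// indirection removed from both the caller and the body.
void swap_ref(int& a, int& b)
{
  int tmp = a;
  a = b;
  b = tmp;
}

void check_swaps()
{
  int x = 1;
  int y = 2;

  swap_ptr(&x, &y);          // caller must remember the & operators
  assert(x == 2 && y == 1);

  swap_ref(x, y);            // no explicit addresses
  assert(x == 1 && y == 2);

  int& xr = x;
  assert(&xr == &x);         // taking the address of xr yields the address of x
}
```

Calling check_swaps() from main() exercises both versions. The last assertion illustrates that a reference is another name for an object rather than a separate object.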

A reference must be initialized, and the object to which it refers cannot be changed after initialization. The reference itself cannot be made to refer to a different object, but the value of the referenced object can be changed through it, unless the reference is declared as a reference to a constant. So, for example:

int i = 0;
int& ir = i;
ir = -19; // i gets the value -19

is acceptable, while:

const int& irc = 47;
irc = -37; // error

is not. A constant reference bound to a value like 47 can be implemented using a temporary object that holds the value.
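
The temporary can be pictured roughly as follows. This is only a sketch of what a compiler might do, not a description of any particular implementation; tmp_for_47, irc2, and the two function names are hypothetical.

```cpp
#include <cassert>

int sum_of_both()
{
  const int& irc = 47;           // what the programmer writes

  const int tmp_for_47 = 47;     // roughly what a compiler may create...
  const int& irc2 = tmp_for_47;  // ...and then bind the reference to

  return irc + irc2;             // both references read the value 47
}

void check_temporary()
{
  assert(sum_of_both() == 94);   // 47 + 47
}
```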

References are especially useful in argument passing and return.
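
As a brief illustration of both uses, the sketch below passes a structure by constant reference to avoid copying it and returns a reference so that a function call can appear on the left side of an assignment. The Matrix type and the functions trace and at are invented for this example.

```cpp
#include <cassert>

struct Matrix {              // hypothetical type, used only for illustration
  double cell[4][4];
};

// Passing by const reference avoids copying the whole structure,
// and const keeps the function from modifying the caller's object.
double trace(const Matrix& m)
{
  double t = 0.0;
  for (int i = 0; i < 4; i++)
    t += m.cell[i][i];
  return t;
}

// Returning a reference gives the caller access to the element itself.
double& at(Matrix& m, int row, int col)
{
  return m.cell[row][col];
}

void check_matrix()
{
  Matrix m = {};             // all elements zero
  at(m, 0, 0) = 1.5;         // assignment through the returned reference
  at(m, 1, 1) = 2.5;
  assert(trace(m) == 4.0);
}
```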

Global Initialization

In C, usage such as:

int f() {return 37;}
int i = 47;
int j;

is legal at global scope. Typically, in an object file and an executable program, these kinds of definitions are grouped into sections with names like "text," "data," and "bss," meaning "program code," "data with an initializer," and "data with no initializer," respectively.

When the operating system loads a program for execution, a common scheme has the text and data stored within the binary file on disk that represents the program. The bss section is simply stored as an entry in a symbol table, and is created and zeroed dynamically when the program is loaded.
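
The following sketch annotates a few global definitions with the section each would typically land in under the scheme just described. Actual placement depends on the compiler, linker, and object file format, so the comments are assumptions rather than guarantees. The array big and the function check_load_time_values are added for illustration.

```cpp
#include <cassert>

int f() { return 37; }   // program code: typically "text"
int i = 47;              // data with an initializer: typically "data"
int j;                   // data with no initializer: typically "bss"
int big[10000];          // also typically "bss": no zeros need be stored on disk

void check_load_time_values()
{
  assert(f() == 37);
  assert(i == 47);
  assert(j == 0);          // zeroed before the program starts
  assert(big[9999] == 0);  // likewise for every element of the array
}
```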

There are variations on this scheme, such as shared libraries, that are not our concern here. Rather, I want to discuss the workings of an extension that C++ makes to this scheme, namely, general initializers for globals. For example, I can say:

int f() {return 37;}
int i = 47;
int j = f() + i;

In some simple cases, a clever compiler can compute the value that should go into j, but in general, such values are not computable at compile time. Note also that sequences like:

class A {
public:
  A();
  ~A();
};

A a;

are legal, with the global object a constructed before the program "really" starts and destructed "after" the program terminates.
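
A small sketch can make the timing visible. The counter live_objects and the function check_constructed_before_main are hypothetical additions. The constructor and destructor bodies are filled in only so that there is something to observe.

```cpp
#include <cassert>

int live_objects = 0;        // constant initializer, applied before any constructor

class A {
public:
  A()  { live_objects++; }   // runs during startup
  ~A() { live_objects--; }   // runs during program termination
};

A a;

void check_constructed_before_main()
{
  // By the time this function is called, a has already been constructed.
  assert(live_objects == 1);
}
```

Calling check_constructed_before_main() as the first statement of main() succeeds, even though no code in main() constructs an A.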

Because such values cannot, in general, be computed at compile time, they must be computed at runtime. How is this done? One way is to generate a dummy function per object file:

int f() {return 37;}
int i = 47;
int j; // = f() + i;
static void __startup()
{
  j = f() + i;
}

and a similar function for shutdown, as would be needed for calling destructors. Using a small tool that modifies binaries and an auxiliary data structure generated by the compiler, it's possible to link all these __startup() function instances together in a linked list that can be traversed when the program starts.

Typically, this is done by immediately generating a call from within main() to a C++ library function _main() that iterates over all the __startup() functions. On program exit, similar magic takes place, typically tied to exit() function processing. This approach is used in some compilers but is not required; the standard mandates "what" rather than "how."
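
The linked-list scheme can be simulated in ordinary C++ as a rough sketch. Everything here is hypothetical: InitNode, register_startup(), and run_startups() stand in for the compiler-generated data structure, the binary-modifying tool, and the library routine, and the two startup functions stand in for the per-object-file dummy functions.

```cpp
#include <cassert>

typedef void (*InitFn)();

struct InitNode {            // hypothetical record, one per object file
  InitFn fn;
  InitNode* next;
};

static InitNode* init_list = 0;   // head of the list

// Stand-in for the linking tool: add one node to the list.
void register_startup(InitNode* node)
{
  node->next = init_list;
  init_list = node;
}

// Stand-in for the library routine called at the top of main().
int run_startups()
{
  int count = 0;
  for (InitNode* p = init_list; p != 0; p = p->next) {
    p->fn();
    count++;
  }
  return count;
}

// First "object file"
int f() { return 37; }
int i = 47;
int j;                       // = f() + i;
static void startup_file1() { j = f() + i; }

// Second "object file" (invented for the example)
int k;                       // = 2 * i;
static void startup_file2() { k = 2 * i; }

InitNode node1 = { startup_file1, 0 };
InitNode node2 = { startup_file2, 0 };

void check_startup_list()
{
  register_startup(&node1);
  register_startup(&node2);
  assert(run_startups() == 2);
  assert(j == 84);           // 37 + 47
  assert(k == 94);           // 2 * 47
}
```

Because each node is pushed on the front of the list, the functions run in the reverse of registration order. That is harmless here because neither initializer depends on the other.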

Some aspects of this processing have precedent in C. For example, when a program starts, standard I/O streams stdin, stdout, and stderr are established for doing I/O.

Within a given translation unit (source file), objects are initialized in the order of occurrence and destructed in reverse order (last in, first out). No ordering is imposed between files.
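
The ordering within one translation unit can be observed with a small tracing class. This is a sketch only; Tracer, init_log, and init_count are names made up for the example.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

char init_log[8];            // static storage: zeroed before any constructor runs
int init_count = 0;

class Tracer {               // hypothetical helper that records construction order
public:
  Tracer(char tag) : tag_(tag) { init_log[init_count++] = tag; }
  ~Tracer() { std::printf("destroying %c\n", tag_); }
private:
  char tag_;
};

Tracer t1('a');
Tracer t2('b');
Tracer t3('c');              // at program exit: destructors run for t3, t2, t1

void check_init_order()
{
  // Constructors ran in order of definition.
  assert(std::strcmp(init_log, "abc") == 0);
}
```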

Some ambitious standards proposals have been made with regard to initialization ordering, but none has caught on. The draft standard says simply that all static objects in a translation unit (objects that persist for the life of the program) are zeroed, then constant initializers are applied (as in C), then dynamic general initializers are applied "before the first use of a function or object defined in that translation unit."

Calling the function abort() defined in the standard library will terminate the program without destructors for global static objects being called. Note that some libraries, for example, stream I/O, rely on destruction of global class objects as a hook for flushing I/O buffers, so output that is still buffered when abort() is called may be lost.

You should not rely on any particular order of initialization of global objects defined in different source files. Using a startup() function called from main(), just as in C, still can make sense as a program-structuring mechanism for initializing global objects.
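
A minimal sketch of the explicit startup() approach follows. The table squares, the flag started, and the guard against repeated calls are assumptions made for the example, not something the language requires.

```cpp
#include <cassert>

const int TABLE_SIZE = 8;
int squares[TABLE_SIZE];     // zeroed before main() runs
bool started = false;

// Explicit initialization, intended to be called early in main().
void startup()
{
  if (started)
    return;                  // make repeated calls harmless
  for (int n = 0; n < TABLE_SIZE; n++)
    squares[n] = n * n;
  started = true;
}

void check_startup()
{
  startup();
  startup();                 // second call does nothing
  assert(started);
  assert(squares[3] == 9);
  assert(squares[7] == 49);
}
```

With this structure, the order in which tables are filled is decided by the order of statements in startup(), not by the order in which object files happen to be linked.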

Jumping Past Initialization

C++ does much more with initializing objects than C does. For example, class objects have constructors, and global objects can have general initializers that cannot be evaluated at compile time.

Another difference between C and C++ is the restriction C++ places on transferring control past an initialization. For example, the following is valid C but invalid C++:

#include <stdio.h>

int main()
{
  goto xxx;
  {
    int x = 0;
 xxx:
    printf("%d\n", x);
  }
  return 0;
}

With one compiler, compiling and executing this program as C code results in a value of 512 being printed; that is, garbage is output, because the jump skips the initialization of x. Thus the restriction makes sense.

The use of goto statements is best avoided except in carefully structured situations such as jumping to the end of a block. Jumping over initializations can also occur with switch/case statements.
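
For the switch/case situation, one common remedy is to enclose the initialized variable in its own block, so that no case label lies within its scope. The function classify below is a hypothetical example of that arrangement.

```cpp
#include <cassert>

int classify(int code)
{
  int result = 0;

  switch (code) {
  case 1: {
    // The braces end the scope of x before the next label, so the
    // jump to "case 2" does not pass an initialization in scope.
    int x = 10;
    result = x + code;
    break;
  }
  case 2:
    result = -1;
    break;
  default:
    result = 0;
    break;
  }
  return result;
}

void check_classify()
{
  assert(classify(1) == 11);
  assert(classify(2) == -1);
  assert(classify(3) == 0);
}
```

Without the inner braces, a C++ compiler would reject the function, because control could reach the "case 2" label while x is in scope but uninitialized.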

 

First posted: 21st November 1997 efc
Last changed: 3 Dec 97 efc