Developing Programmers .com

Local Search:



This site is optimized for standards so you can use any standards compliant browser:

Valid XHTML 1.0 Transitional
Valid CSS!
(RSS) RSS Feed

Web Search:
Google


Friday, 27 January, 2006

The Standard Template Library  

Professional programmers avoid re-inventing the wheel unless they have a very, very good reason. This means you should have a good familiarity with the standard libraries available for your languages of choice, and you should at least browse downloadable libraries from time to time so you know what’s “out there”.

A surprising number of people who code in C++ are not familiar with the Standard Template Library, or “STL”. The STL has been a C++ standard for about 7 years now. Its most important components are the “string” class and the various “container” templates.

There are other libraries that offer similar functionality to the STL, such as the Core module of Qt, which is my personal favorite. I’ll probably write a tutorial on Qt some time, but even if you do prefer Qt you should know about the STL, since it is the standard and you can count on it being available in more situations than third party libraries like Qt.

How Long is a String?

The string class is a replacement for using “char*” or, worse, character arrays to represent strings. Even the university where I graduated I’m told they were still teaching character arrays as the primary way to handle strings in 2005.

Character arrays and “char*” do have their place in C++ programming, but the vast majority of security vulnerabilities and an awful lot of bugs are caused by their overuse: for a fixed sized string, character arrays make at least one and possibly two shaky assumptions about the string’s maximum size:

  1. That the string will never need to exceed your specified size, even when people start using your code in places you never dreamed of.
  2. That the string will never accidentally exceed your specified size, for example by “strcat()” adding something without checking (this happened at work recently), or by someone specifying a long path where you’d expected just a file name.

Remember, the whole reason you’re using a string variable instead of a literal is that you don’t want to make assumptions about what the value will be.

There are two common ways around strings with a fixed maximum size:

  • Allocate space dynamically as you go. Usually people who do this find themselves writing a small string library to support the habit.
  • Use a standard string library that solves all this for you and usually handles deallocation in nifty ways that make strings as convenient to allocate and manage as “ints” are.

The STL offers a class called “string“, or in full, “std::string” . The “std::” part is a C++ namespace, which can help you to avoid the library disasters you get when multiple libraries want to use symbols of the same name.

The “using” directive lets you write the short names for standard symbols like “string” in your code, and some people prefer to work that way. My view is that unless you take the time to read up on what exactly the pitfalls are you should stick to writing the fully qualified name.

Here’s an example of C++ strings (eg1.cpp):

    #include <iostream>
    #include <string>

    int main()
    {
            std::string path = "/home/sarah/examples";
            std::string filename = "foo.txt";

            std::string fullname = path + filename;

            std::cout << "The file is " << fullname << "\n";

            return 0;
    }

Running this program prints:

The file is /home/sarah/examplesfoo.txt

This is pretty intuitive. To create a string we need to have included the header, and we simply create our string variables of type std::string.

We can assign string literals or any string or char* to a string, although the “=” operator is actually copying the other string, not just pointing to it.

To illustrate this point, consider the code:

    char *str = "hello";
    char *foo = str;
    std::string bar = str;
    str[0] = 'H'; // Change first letter to uppercase
    std::cout << "foo = " << foo << "\n";
    std::cout << "bar = " << bar << "\n";

This will print:

Hello
hello

The “foo” variable pointed to the same storage as “str”, so it was affected by later changes to “str”. The “bar” variable made its own copy of “str”, which was not effected by the change.

The writers of the std::string class worked hard to make string act just like basic types like “int”, so you can copy with “=“, compare with “==“, and so on.

Also notice how memory allocation is done: it’s all implicit. The string class allocates and frees memory as needed, fully automatically, and even gets it right in tricky situations like storing intermediate results and return values.

You don’t need to mess with pointers, the storage space issues are dealt with “under the hood”. Copying and assigning all happens just like it does for “int” or “float”. This feature on its own has probably saved C++ developers from countless bugs.

Joining Two Pieces of String

The “+” operator makes little sense with a char* string, but for a std::string it is redefined by the string class to join the two strings end-to-end. This makes the “+=” operator very similar to strcat(), the main difference being that memory allocation is automatic and safe.

Consider this example (eg2.cpp):

    #include <string>
    #include <iostream>

    int main()
    {
            std::string sentence = " and ";
            std::string b = "beginning";
            sentence = b + sentence;
            sentence += "end.";
            std::cout << sentence << "\n";
            return 0;
    }

This will print out “beginning and end.”, having combined the three strings. A trap to watch out for is to make sure you don’t try to use std::string operators on char* strings. The addition operators on a std::string will concatenate the strings as desired, but on a char* they will attempt to add to the pointer. In most situations this will give compiler errors, but in certain situations where the pointer arithmetic happens to be legal, it’ll compile but do something very very wrong.

So:

    std::string foo = "apple " + "banana ";

would have to be rewritten to something like:

    std::string foo = "apple ";
    foo += "banana";

or to:

    std::string foo = std::string("apple ") + "banana ";

otherwise we are trying to add two char* types together, which just doesn’t work.

Same String?

STL strings have their own compare function that replaces strcmp():

    string1.compare(string2)

where string1 must have type std::string and string2 can be any of a range of stringish types including std::string or char*. Like strcmp(), this returns a negative number if string1 < string2, zero if they’re the same, or a positive number if string1 > string2.

Going beyond this however, most comparison operators have been set up to “just work” on strings, so you can say:

    if (string1 < string2)
            std::cout << "string1 first\n";
    else if (string1 == string2)
            std::cout << "string1 and string2 are equal\n";
    else if (string1 > string2)
            std::cout << "string2 first\n";
    else
            throw("Error, should not get here");

This is very convenient. If you tried this with char* strings,
the comparison operators would compile fine (since pointers can be compared), but would not give the answers you wanted.

Speedy Strings?

Given the convenience of string libraries like std::string and the hours you can waste debugging trivial problems like buffer overruns and making sure hand-allocated strings get freed, the only argument remaining in favor of char* or do-it-yourself strings is speed.

The std::string library optimizes speed in some very tricky ways, including using copy-on-demand to only really copy a string when it sees it should have copied it. This all happens behind the scenes, but reading up on the details of how they work will allow you to use the standard string functions in the fastest way possible if you’re worried about speed.

The standard string class is only a little slower than handling strings yourself, but is much less likely to produce bugs. For writing most applications, optimizing for minimum bugs is far more important than optimizing for speed. Truly speed critical situations like inner loops in action games are the only places where there’s a reason not to use std::string.

Vectors

Arrays in C++ are problematic, for much the same reasons that strings are. They’re so prone to size blowout and there is no bounds checking to keep you honest. If you say a[5]=2 on an array with only 3 elements, C will happily overwrite whatever was in memory beside the array. Lets just
hope it wasn’t anything important like the return address for the function you’re in…

Also, arrays are tricky to return from functions. An array that is local to a function has its memory recycled when the function returns, so if you’re passing back a pointer to that memory its only a matter of time before that recycled memory gets used for something else and the array filled with wrong data. The pain of debugging array problems can be avoided by using the modern C++ replacement: Vectors.

C++ defines a “std::vector” template. If you haven’t used templates before, they’re just entities in a C++ program that let you substitute in type names to customize them for a particular job.

So you can customize a vector to store floating point values by typing:

            std::vector<float> name;

or you can make a vector of strings using:

            std::vector<std::string> name;

Here’s a more detailed example which shows how to create a vector of integers (eg3.cpp):

    #include <iostream>
    #include <vector>

    int main()
    {
            std::vector<int> Vicky(5);

            Vicky[0] = 5;
            Vicky[1] = 9;
            Vicky[2] = 1;
            Vicky[3] = 3;
            Vicky[4] = 2;

            int i;
            for (i = 0; i < 5; i++)
            {
                    std::cout << "Element " << i
                            << " = " << Vicky[i] << "\n";
            }

            return 0;
    }

This will print out:

Element 0 = 5
Element 1 = 9
Element 2 = 1
Element 3 = 3
Element 4 = 2

The usage of a vector is similar to a normal array, so it’d be easy to replace an array in your program with this kind of vector.

As with STL strings, you can safely return vectors from functions and assign them with “=” and generally throw them around as if they were as simple a type as “int“.

One important difference between vectors and arrays is that vectors offer an “at()” operator. If you said “Vicky.at(3)” instead of “Vicky[3]” then the function would check for you whether the array was big enough to have an element with index 3. If not, it throws an exception. This means that whether your program handles the exception or not, it will at least not press on as though nothing was wrong.

Unlike arrays, vectors always know their size so you don’t have to pass a separate parameter for the size. You can ask a vector how many elements it has using the “size()” function: “int elements = Vicky.size();“.

You can also resize vectors, either using a “resize()” function to set a definite size, or simply by adding an element at a time and letting the vector grow to match.

    Vicky.resize(30); // Grow or shrink Vicky so it now has 30 elements

or

    Vicky.push_back(7); // Grow by one element at the end of the array,
            // and set the new element's value to 7.

There’s a matching pop_back() function that removes the last element.

Another advantage over normal arrays is that there are insert() and erase() functions that can add or delete array elements anywhere in the array, and automatically make room or compact surrounding elements back together to close a gap. If you’re using these a lot though, perhaps you should consider using a std::list

Lists

The “std::list” template is an easy way to handle linked lists. Compared with vectors, there’s a down side and an up side:

The down side is that locating an arbitrary element involves scanning through each element in turn from some known point in the list. The built in functions make this convenient, but not fast.

The up side is that you can insert or erase elements anywhere in the list very quickly, because in a linked list you can just add the element and re-arrange a few pointers. There’s no need to make room for a new element by shuffling all the other elements along.

Equivalent functions for vectors and lists have the same name. This is partly so that you can remember the function names, but it also allows you to change your mind about which data type you’re using, provided the new data type has the functions you need. If you find you’re inserting elements frequently, it’s easy to replace a vector with a list.

Here’s an example of a list (eg4.cpp):

    #include <iostream>
    #include <list>

    int main()
    {
            std::list<int> Linus;

            Linus.push_back(5);
            Linus.push_back(9);
            Linus.push_back(1);
            Linus.push_back(3);
            Linus.push_back(2);

            std::list<int>::iterator i;
            for (i = Linus.begin(); i != Linus.end(); ++i)
            {
                    std::cout << "Element is " << *i << "\n";
            }

            return 0;
    }

The “begin()” and “end()” style iteration setup is common to nearly all the container classes, so you can loop through vectors, lists and deques the same way.

Deques

“Deque” is short for “Double Ended Queue”.

A “std::deque” is like a vector because random access to elements is fast, but also like a list because you can add or remove elements quickly from the start or end of a deque.

The down side is that inserting in the middle of a deque may be slow (it is for queue-like entities after all), and you can’t assume the storage is kept in a continuous block like you can with vectors.

Deques are usually implemented internally as a kind of tree of small array “chunks”, but as with all the container classes the standards only specify what kinds of operations should be fast, and leave library writers to decide the details of how they work.

Here’s an example of a deque (eg5.cpp):

    #include <iostream>
    #include <deque>

    int main()
    {
            std::deque<int> David;

            David.push_back(5);
            David.push_back(9);
            David.push_back(1);
            David.push_back(3);
            David.push_back(2);

            while ( ! David.empty() )
            {
                    std::cout << "Element is " << David.front() << "\n";
                    David.pop_front();
            }

            return 0;
    }

Maps

I’ve saved the best for last. One way to think of arrays and vectors is as a “mapping” from index numbers to values. You put in a number, it gives you a value. The value can have any type because you can have arrays of any type.

What if you want to do the lookup by name instead of number? Or by date? That’s where maps come in.

The “std::map” template requires you to specify two types instead of one:

  1. A data type for stored values. This is the same as the value type for lists, vectors etc.
  2. A data type for the “key”, the value used to look up the data. For vectors this is always “int”, so you didn’t have to specify a type. For maps, it can be any type that you can compare.

For example, we can compare the “std::string” type because it has a less-than (”<“) operator defined. That means we can do neat tricks like this (eg6.cpp):

    #include <iostream>
    #include <map>
    #include <string>

    int main()
    {
            std::map<std::string, int> Matt;

            Matt["January"] = 1;
            Matt["February"] = 2;
            Matt["March"] = 3;
            Matt["April"] = 4;
            Matt["May"] = 5;
            Matt["June"] = 6;
            Matt["July"] = 7;
            Matt["August"] = 8;
            Matt["September"] = 9;
            Matt["October"] = 10;
            Matt["November"] = 11;
            Matt["December"] = 12;

            std::cout << "May is the " << Matt["May"] << "th month\n";

            return 0;
    }

If you’ve ever done any Perl programming, maps are like Perl’s “hash” feature. There’s all kinds of things you can do with them once you get in the habit of using them.

I’ve done word counts (map a string word to an integer count), file contents caching (map a string filename to a string storing the file’s contents), checking whether certain ID numbers have been seen before (mapping from an int ID number to bool seenflag map), etc.

Containers of Containers

Because you can safely throw containers around like any of the basic types, they are safe to place inside each other. You can have a map of a vector of lists of strings, and when it goes out of scope its destructor will magically delete all the vectors, lists and strings stored inside the map.

The only catch is if you store pointers to things, then the pointer itself is destroyed but the thing it points to isn’t.

This is partly the practicalities of deleting the item being stored, and partly necessary for situations where you are referring to another object that you don’t want to delete. For example, you might have a vector of strings representing the contents of a file, and have a map of pointers to strings letting you identify which line certain words appear on. If it was a map of strings, the strings would be deleted when the map was. But it’s a map of pointers to strings so the pointer is deleted without being followed, and the original vector of strings is left intact.

Wrap-up

The STL, and other (less standard) libraries like it, offer a convenient and less bug-prone way to throw around strings, linked lists, arrays, and more within your program. The algorithms are fast and the objects created can be passed as parameters or accepted as return values without worrying about pointers. The pointer stuff needed to make them fast is all handled for you so you can just focus on getting the job done, not on writing your hundredth variation on a linked list.

This was just a brief introduction to get you started with the STL, there’s a lot more to learn. Good resources include:

  • The Wikipedia entry on the STL.
  • CPP Reference .com has a page about the STL.
  • SGI have a good page on STL programming.
Posted by sarah at 12:16 am in: Libraries , Tutorials (5885 views)

1 Comment

  1. Hmm… how could I miss this one? A good reason to stop development and revise your knowledge.

    Comment by Alex — On 12-2-2007 at 4:41:08 PM

Please use the DP Forums for further discussion of this topic.