
Arrays and strings

Very often (let's say always) a programmer needs to group variables of the same type together: the most straightforward example is a string of text. We saw that we can define a variable of type char to contain one character, but we want our programs to deal with whole texts, not just one character at a time (it would be extremely boring to define a variable for each character of a text, wouldn't it?). Instead we can define an array of char to hold an entire text. An array of char is commonly referred to as a string.

Arrays are not limited to chars: you can define an array of any type. For example, an image might be treated as an array of integers.

Although I will not deal with this topic in detail here, I would like to point out briefly that arrays are closely related to pointers: in most expressions the name of an array is treated as the address of its first element. This is the first time we actually meet pointers, although in a somewhat hidden form: keep this in mind, because it will be important when I finally describe pointers in detail.

Arrays

You declare an array of a certain type in much the same way as you declare a variable: the only difference is that after the variable name you put a pair of square brackets and, inside them, the dimension of the array (i.e. how many variables of the same type the array will hold).

/* an array of 256 chars */
char str[256];

/* an array of 20142 integers */
int image[20142];

/* an array of 12 float */
float numbers[12];

/* an array of 32 double */
double realnumbers[32];

The first array is named str and can hold 256 chars. In the same way, image is an array that holds 20142 integers, numbers is an array capable of storing 12 floats and realnumbers is an array of 32 doubles.

With that notation you are telling the compiler to reserve enough space in memory to contain the whole array. The trick is the same one used for variables: the name of the array is the human-readable form of the starting address of the array, and the size of the array in memory is n times the size reserved for a single variable of the same type (where n is the dimension of the array, i.e. the number written in square brackets).

Bear in mind that this space is really reserved in memory: its size is fixed when the program is compiled and it is not available for other purposes, whether you use all of it or not. We will see later that there are techniques that allow you to grab space during the execution of the program, so that you do not have to overestimate the dimension of your arrays.

When you define an array, the compiler reserves a single contiguous block of memory for it, so that all the variables the array is composed of are stored one after another.

You can access every single element of an array by indexing it, i.e. you put a pair of square brackets after the name of the array and, inside them, the index of the element you want to access. The indexes of an array always start from 0 and go up to the dimension of the array minus 1. For example, suppose you defined an array of integers named img which has 20 elements (so you declared it with int img[20];): you access the first element with img[0] and the last with img[19]. You must not access the array outside those boundaries, and it is your responsibility not to exceed these limits. So pay attention!

The following example shows all these concepts about arrays:

#include <stdio.h>

/* program to show the use of arrays */

int main(void) {
    /* an array that contains 4 integer */
    int a[4];
    int b = 0;

    printf("size of array in bytes: %zu\n",sizeof(a));

    /* print all the integers of the array */
    printf("\na[0] = %d, a[1] = %d, a[2] = %d, a[3] = %d\n", a[0],a[1],a[2],a[3]);

    /* set the first integer in the array */
    a[0] = 4;

    /* set the second integer in the array */
    a[1] = 3;

    /* set the third integer in the array */
    a[2] = 2;

    /* set the fourth integer in the array */
    a[3] = 1;

    /* print all the integers of the array */
    printf("\na[0] = %d, a[1] = %d, a[2] = %d, a[3] = %d\n", a[0],a[1],a[2],a[3]);

    printf("\nb = %d\n",b);
    /* assign the content of the third element of the array to b */
    b = a[2];
    printf("\nb = %d\n",b);  
}

Just a couple of notes: when we print the elements of the array for the first time (the second printf of the program) we are using the array before its elements have been initialized. If you run the program you will typically see seemingly random numbers: they are just whatever happened to be in that memory, and of course you cannot rely on those values, i.e. you cannot assume anything about the content of the elements of an array before their initialization. (The very same happens for variables, and for this reason it is good practice to always initialize a variable or an array before using it.)

Then we are initializing the content of all the elements of the array one by one and when we write b = a[2]; we are using the value of the third element of the array (the index is 2 but we start counting from 0!) to put that value inside the variable b.

There is a way to initialize an array at the point where we define it, just as we do for a variable: in this case we provide a list of values enclosed in braces and separated by commas. For example, we can define (and so properly initialize) our array a by replacing line 7 of the example with the following one:

int a[4] = {3,4,5,6};

If you modify the example like this, you will see that the values of the array are no longer random: they are the values you wrote inside the braces.

If you use this technique you can omit the dimension of the array inside the square brackets: the compiler infers it from the list of values that follows.

int a[] = {3,4,5,6};

Stay within boundaries

An array with n elements has (integer) indexes from 0 to n-1, and indexing the array with a number outside these boundaries is forbidden.

It is important to understand why. I hope an image might help here.

Visual representation of a C array

Variables, functions and arrays (and everything else) live somewhere in memory, which is a continuous series of cells, each one identified by an address. The compiler identifies an array by its address and its dimension, and when you declare it the compiler reserves space for it somewhere in memory, as it does for everything else. So before the array starts and after the array ends there is something else in memory (a variable, another array or a function). If you use indexes outside the boundaries of your array you are accessing something which has no relation with the array itself. If you read it, your program gets meaningless data and will probably not work. If you write to it, you are overwriting something else and your program will probably crash. C is the lowest-level of the high-level programming languages, and it has no built-in protection for the indexes you use.

It is probably worth noting here how the elements of an array are accessed. The array is just the starting address of the memory reserved for it. When you ask for element 0, you get the content of what is stored at that address. When you ask for the element with index i, the compiler takes the starting address of the array and adds the displacement, i.e. i times the size of one element (the compiler knows this size because it knows the type of the variables in the array), and simply returns what it finds at that address (which is starting address + displacement).

The size of an array

You can obtain the dimension of an array using the sizeof operator, as we did for single variables. Pay attention: sizeof returns the size of the array in bytes. If you want to know how many elements the array holds, you need to divide the size of the array (in bytes) by the size of one element (in bytes), which of course can be obtained with sizeof again. The following short example shows the concept:

#include <stdio.h>

/* program to show the use of arrays */

int main(void) {
   int a[] = {2,13,41,4};

   printf("\n  length of the array in bytes:  %zu\n", sizeof(a));

   printf("  number of elements of the array: %zu\n", sizeof(a) / sizeof(int));
}

It turns out that you can apply sizeof not only to a variable but to a type too: that gives you the size of that type in bytes.

Strings

Strings in C are arrays of chars with a few peculiarities. The C approach to strings is low level and leaves several safety concerns to the programmer.

To understand the special nature of strings we need to focus on the problem they have to solve. Strings are made to deal with text, and texts have varying lengths which often cannot be known in advance; besides, operations on strings (concatenating different pieces of text, for example) can change the length of the text stored in the string. But a C array only knows its own dimension (i.e. the maximum number of characters it can contain), not the number of elements actually used by the text. To keep track of the length of the string (i.e. the number of characters of text actually stored, which must always be lower than the dimension of the array), the C language uses the concept of the null terminated string.

The idea is very simple: a special character is defined to signal the end of the text inside the string. So usually a string is defined as an array of char a little bigger than the text we want to handle and we put this special character at the end of our text, that is the signal that the text is finished.

If you want to store a text composed of n characters you need an array of chars with n+1 elements, because you need to reserve space for the termination character.

The termination character is written '\0' (with single quotes) and has the value 0 (remember that chars are integer numbers associated with characters). This value is the reason C strings are called null terminated: a 0 in the sequence of integers encoding the text signals that the string is finished.

You can define and initialize a string letter by letter:

char ch[] = {'a','b','c','d','e','f','g','h','i','j','\0'};

But that is boring and you must remember the termination character. More conveniently, you can simply enclose the text of your string in double quotes: in this case the termination character is added automatically.

char ch_1[] = "That is a string of text";

You can see that the space for the termination character is really reserved with:

printf("elements of ch_1: %zu\n", sizeof(ch_1)/sizeof(char));

although it is not displayed when you print the string (the placeholder for the string in the printf function is %s):

printf("%s\n",ch_1);

What happens if you forget or overwrite the termination character and you try to print the string? Something bad: printf prints the string from its start up to the first termination character; in other words, printf looks for a 0 and keeps printing whatever it finds after the string until it meets one. If there is a 0 in memory shortly after the string, some garbage (the content of the memory from the end of the string to this first 0) will probably be printed after the actual content of the string; but if no 0 happens to be nearby, printf will probably end up accessing memory that does not belong to your program, and your program will crash.

String functions and practices

In this paragraph I will show some functions commonly used in string manipulation along with some common practices. The complete example program is available at the end of the paragraph.

All these functions come from the standard C library (which is available by default): you just need to include the appropriate header file:

#include <string.h>

To reset a string and set all of its content to the null terminator you can use the memset function. It is good practice to memset your string to the null terminator at the beginning: this way you can fill the string with what you want and it will always be correctly null terminated (unless of course you write more chars than the string can contain, which is bad).

memset(str,'\0',sizeof(str));

The first argument of memset is the string, the second is the value you want to give to every element (you can use memset to set the whole string to whatever value you want) and the third is the number of bytes to set (here the whole length of the string, but you can also set only part of it).

Since the null terminator character '\0' has the value 0, you can achieve the same result with the following syntax:

memset(str2,0,sizeof(str2));

Now we have our strings "nullified" and we can write something inside. Can we simply assign a string with a text like this?

str = "prova";

The answer is no: our string str is an address in memory, and so is the text "prova" (a constant string that is not "assigned" to a variable name). So here we are asking the compiler to make two different locations in memory the same, and that is not possible: the compiler rejects the assignment, because the characters have to be copied into the array one by one.

To put something into our string we can print into it, like we do on the screen, using sprintf: this function works exactly like printf except that it has an additional first argument, which is the string we want to print to.

sprintf(str,"the size of str (in bytes) is %zu",sizeof(str));

Here comes a problem: what if we write something that is longer than the size of the string? Well, again, that is bad, because we overwrite memory which has no relation with the string (and our program will probably crash), and again it is the responsibility of the programmer to ensure that this does not happen. For this reason an "improved" version of the function is available: snprintf. The behavior is the same, but an additional parameter (the second, after the string to write to) tells the function the size of the string. This way the function can stop writing when it is about to overwrite something else.

snprintf(str2, sizeof(str2), " that is all folks! but this exceed the dimension of the string!");

snprintf writes at most n − 1 characters (where n is the size passed as the second argument) because it leaves room for the terminator character.

The fact that there are two versions of the same function is quite common for string functions. Because of the lack of safety in the C approach, new versions (with additional parameters) have been added over time to reduce potential problems. For some string functions the standard C library has both the original unsafe version (the one without n) and a bounded version (the one with n). Additional safer functions can be found in the libraries shipped with specific compilers.

To know how much text a string actually holds (i.e. not the space reserved in memory for the array, but the number of characters stored before the terminator) you can use the strlen function.

printf("length of str %zu\n",strlen(str));

You can concatenate two strings (i.e. appending the content of the second string to the first one) using the function strcat:

strcat(str,str2);

Or you can append at most the first n characters of the second string using the strncat version:

strncat(str2," only the first 5 characters will be appended",5);

There are many other string functions: have a look on the web to see what the C approach to string manipulation can offer, for example on a reference page about C strings.

Finally, this is the complete program that summarizes all the previous concepts:

#include <stdio.h>
#include <string.h>

/* program to demonstrate some string functions */

int main(void) {

    char ch[] = {'a','b','c','d','e','f','g','h','i','j','\0'};
    char ch_1[] = "That is a string of text";
    char str[256];
    char str2[25];
    /* this is a pointer (for the moment just use it!) */
    char *look_for;

    /* the elements of the strings */
    printf("elements of ch: %zu\n", sizeof(ch)/sizeof(char));
    printf("elements of ch_1: %zu\n", sizeof(ch_1)/sizeof(char));
    printf("elements of str: %zu\n", sizeof(str)/sizeof(char));
    printf("elements of str2: %zu\n", sizeof(str2)/sizeof(char));

    /* the content of the string */
    printf("%s\n",ch);
    printf("%s\n",ch_1);
    printf("str: %s\n",str);   /* very bad! printing uninitialized string */
    printf("str2: %s\n",str2); /* very bad! printing uninitialized string */

    /* memset the string to 0 */
    memset(str,'\0',sizeof(str));
    memset(str2,0,sizeof(str2));

    /* printing the strings */
    printf("str: %s\n",str);
    printf("str2: %s\n",str2);

    /* writing something on the strings*/
    sprintf(str,"the size of str (in bytes) is %zu",sizeof(str));
    snprintf(str2, sizeof(str2), " that is all folks! but this exceed the dimension of the string!");

    /* printing the strings */
    printf("str: %s\n",str);
    printf("str2: %s\n",str2);

    /* print the length of the text contained into the string */
    printf("length of str %zu\n",strlen(str));
    printf("length of str2: %zu\n", strlen(str2));

    snprintf(str2, sizeof(str2), " that is all folks!");
    /* concatenating 2 strings */
    strcat(str,str2);
    strncat(str2," only the first 5 characters will be appended",5);

    /* printing the strings */
    printf("str: %s\n",str);
    printf("str2: %s\n",str2);

    /* print the length of the text contained into the string */
    printf("length of str %zu\n",strlen(str));
    printf("length of str2: %zu\n", strlen(str2));

    look_for = strstr(str," that is all folks!");
    if(look_for != NULL) {
        printf("The \"%s\" string is present in the string \"%s\"\n", " that is all folks!", str);
        printf("look_for is a string: %s\n", look_for);
    }

    look_for = strstr(str,"NOT present");
    if(look_for != NULL) {
        /* that code will not be executed because look_for is null */
        printf("The \"%s\" string is present in the string \"%s\"\n", "NOT present", str);
        printf("look_for is a string: %s\n", look_for);
    }
    else {
        printf("String not found\n");
        /* you cannot print look_for here!!!!!!*/
    }
}

A couple of comments about printf placeholders

You probably noticed something unfamiliar in the format strings passed to the printf function. This is a short paragraph about that.

unsigned long placeholder

The placeholder to print an unsigned long is %lu (not %ul). sizeof and strlen return a value of type size_t, an unsigned integer type that on many platforms is the same as unsigned long, so %lu often works for printing sizes. The placeholder designed specifically for size_t is %zu (available since C99), and it is the portable choice.

printing double quotes

Double quotes have a special meaning: they enclose a string so if you actually want to print a double quote you cannot simply put a double quote inside the string because this will be seen as the end of the string. You have to escape the double quote (escape means that you are explaining to the compiler that the double quote has a different meaning): to do that you simply put in front of the double quote a \.

There are other characters that have to be escaped inside a string: we will see them as we meet them.