Copying Strings in C
In C, you cannot use the assignment operator = to copy the content of one string into another string.
For this purpose, the C standard library provides two functions: strcpy and strncpy.
These two functions are defined in the header file string.h and allow you to copy a string into another string. In this lesson, we will see how these two functions work and how we can use them safely.
Function strcpy - Copying a String
The strcpy function copies a string into another string. The destination string must be large enough to contain the source string.
To use it, you need to include the header file string.h:
#include <string.h>
Its prototype is as follows:
char *strcpy(char *dest, const char *src);
The function copies all characters of src into dest, including the termination character '\0', and returns a pointer to the destination string, that is, dest. For this reason, you must ensure that the destination string has enough space to contain the source string.
In particular, if the source string is composed of n characters, the destination string must have at least n + 1 characters to contain the source string and the termination character '\0'.
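The n + 1 rule can be made concrete by sizing the destination from the source. What follows is only a minimal sketch: the helper name duplicate_string is hypothetical, and it assumes that src points to a valid, terminated string.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of src,
 * or NULL if the allocation fails. The caller must free() it. */
char *duplicate_string(const char *src)
{
    size_t n = strlen(src);        /* characters before '\0' */
    char *dest = malloc(n + 1);    /* one extra location for '\0' */

    if (dest == NULL) {
        return NULL;
    }
    strcpy(dest, src);             /* fits: dest holds n + 1 characters */
    return dest;
}
```

Because the destination is allocated with exactly strlen(src) + 1 characters, the call to strcpy cannot write beyond it.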
The existence of this function in the standard library compensates for the fact that in C there is no assignment operator for strings. An array cannot be assigned to at all, and assigning one char pointer to another does not copy the content of the source string: it copies only its address. After such an assignment both names refer to the same string, so if you modify the string through one of them, the change is also visible through the other.
char s1[] = "Hello";
char *s2;
s2 = s1; // s2 points to the same string as s1
s1[0] = 'J'; // s2[0] becomes 'J'
printf("%s %s\n", s1, s2); // prints "Jello Jello"
Moreover, an assignment of this kind is incorrect, because an array cannot appear on the left side of =:
char s1[128];
s1 = "Hello"; // ERROR
The assignment operator = cannot be used to copy strings
Since a string in C is an array of characters terminated by '\0', and arrays cannot be assigned, the assignment operator = cannot be used to copy the content of one string into another string. Between char pointers, the assignment operator = copies only the address of the source string, not its characters.
The following code is incorrect:
/* ERROR */
char s1[128];
s1 = "Hello";
For this reason, in both cases it is necessary to use the strcpy function to copy a string into another string.
char s1[] = "Hello";
char s2[] = "World";
strcpy(s2, s1); // copies the string of s1 into s2
s1[0] = 'J'; // s2[0] remains 'H'
printf("%s %s\n", s1, s2); // prints "Jello Hello"
Note that, in the example above, s2 and s1 are two arrays with the same number of characters. Therefore, copying the string s1 into s2 does not cause problems.
char s1[128];
strcpy(s1, "Hello"); // copies the string "Hello" into s1
In most cases, the result of strcpy is discarded. There are cases where, instead, it is useful to use that result. For example, suppose we want to initialize two strings with the same string literal. We can nest two invocations of the strcpy function as follows:
char s1[128];
char s2[128];
strcpy(s2, strcpy(s1, "Hello")); // copies the string "Hello" into s1 and into s2
Function strcpy
The strcpy function copies a string into another string.
It is defined in the header file string.h:
#include <string.h>
Its signature is as follows:
char *strcpy(char *dest, const char *src);
- dest is a pointer to the destination string;
- src is a pointer to the source string.
The function copies the content of src into dest and returns dest. Specifically, the function copies all characters of src into dest, including the termination character '\0'.
When using the strcpy function, you must ensure that:
- src points to a valid location;
- dest has enough space to contain the source string.
The strcpy function is not safe
The strcpy function does not check if the destination string is large enough to contain the source string. If it is not, strcpy writes beyond the bounds of the destination string. This is undefined behavior: it can corrupt adjacent data or make the program crash, for example with a segmentation fault. For this reason, it is necessary to ensure that the destination string has enough space to contain the source string.
To limit these problems, the C standard library also provides the strncpy function, which takes the maximum number of characters to copy. When the length of the source string is not known in advance, it is advisable to use strncpy instead of strcpy, with the precautions described below.
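Where strcpy is used anyway, the size check that it lacks can be done by the caller. The following is a minimal sketch under stated assumptions: copy_if_fits is a hypothetical helper, and dest_size must be the real capacity of dest, terminator included.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dest only if it fits.
 * dest_size is the total capacity of dest, terminator included. */
bool copy_if_fits(char *dest, size_t dest_size, const char *src)
{
    if (strlen(src) + 1 > dest_size) {
        return false;              /* would overflow: copy nothing */
    }
    strcpy(dest, src);
    return true;
}
```

A caller can pass sizeof on the destination array as dest_size and handle a false result instead of risking a write beyond the array.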
Function strncpy - Copying a String with Control
The strncpy function copies a string into another string, so it is similar to the strcpy function. The difference is that the strncpy function accepts one more parameter, that is, the maximum number of characters to copy.
Its prototype is as follows:
char *strncpy(char *dest, const char *src, size_t n);
The function copies at most n characters of src into dest and returns a pointer to the destination string, that is, dest.
For example:
#define MAX_LEN 128
char s1[] = "Hello World!";
char s2[MAX_LEN];
/* Copies at most 128 characters of s1 into s2 */
strncpy(s2, s1, MAX_LEN);
In this example, s1 needs 13 characters, terminator included, so it fits in s2: strncpy copies the whole string and fills the remaining locations of s2 with '\0'. If s1 were too long to fit in s2, strncpy would copy only its first n characters, 128 in this case.
However, the strncpy function also hides a security problem. If the source string has n or more characters before its terminator, strncpy does not copy the termination character '\0', and the destination string is left unterminated. A safer way to invoke it is as follows:
#define MAX_LEN 128
char s1[] = "Hello World!";
char s2[MAX_LEN];
/* Copies at most 127 characters of s1 into s2 */
strncpy(s2, s1, MAX_LEN - 1);
/* Adds the string terminator */
s2[MAX_LEN - 1] = '\0';
This way we ensure that there is at least one space for the termination character '\0' which, however, we will have to insert manually. Using this technique, we will always be sure that the string s2 is terminated correctly.
Function strncpy
The strncpy function copies a string into another string, limiting the number of characters copied.
It is defined in the header file string.h:
#include <string.h>
Its signature is as follows:
char *strncpy(char *dest, const char *src, size_t n);
- dest is a pointer to the destination string;
- src is a pointer to the source string;
- n is the maximum number of characters to copy.
The function copies at most n characters of src into dest and returns dest. If the source string is shorter than n, the function copies all characters of src into dest, including the termination character '\0', and fills the rest of the n locations with '\0'. If the source string has n or more characters, the function copies only the first n characters of src into dest and does not add the termination character.
When using the strncpy function, you must ensure that:
- src points to a valid location;
- dest has enough space to contain at least n characters, including the termination character '\0'.
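The two cases can be observed by filling a buffer with a marker character before the copy. This is only an illustrative sketch: copy_over_marker is a hypothetical helper, and buf is assumed to have room for at least n characters.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fills buf with 'x' so that the characters written
 * by strncpy are easy to spot, then copies at most n characters of src. */
void copy_over_marker(char *buf, size_t n, const char *src)
{
    memset(buf, 'x', n);
    strncpy(buf, src, n);
}
```

With a buffer of 6 characters, copying "Hi" leaves 'H' and 'i' followed by four '\0' characters, with no 'x' left. Copying "Hello World" leaves the six characters "Hello " and no '\0' at all, so the buffer is not a valid string.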
The strncpy function is not safe
The strncpy function does not copy the termination character '\0' if the source string has n or more characters. For this reason, it is necessary to reserve one location of the destination string for the terminator and to write it explicitly after the call.
Using strncpy
A safe way to use the strncpy function is as follows:
- Assuming that the destination string can contain at most n characters (including the terminator '\0'), invoke the strncpy function with n - 1 as the third argument;
- After invoking strncpy, manually add the terminator '\0' at the end of the destination string.
#define MAX_LEN 128
char destination[MAX_LEN];
strncpy(destination, source, MAX_LEN - 1);
destination[MAX_LEN - 1] = '\0';
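The same two steps can be wrapped in a function that also reports whether the source was cut. This is a sketch, not a standard function: copy_truncating is a hypothetical name, and dest_size is assumed to be the real capacity of dest and greater than 0.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies src into dest, truncating if needed,
 * and always terminates dest. Returns true if src was truncated.
 * Assumes dest_size (capacity of dest, terminator included) is > 0. */
bool copy_truncating(char *dest, size_t dest_size, const char *src)
{
    strncpy(dest, src, dest_size - 1);   /* leave room for '\0' */
    dest[dest_size - 1] = '\0';          /* terminate in every case */
    return strlen(src) >= dest_size;
}
```

Returning the truncation flag lets the caller decide whether a shortened string is acceptable or should be treated as an error.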
Implementation of strcpy and strncpy
Although they are library functions, it is very instructive to implement strcpy and strncpy from scratch. Implementing them from scratch allows us to better understand how strings work in C language. It also allows us to understand why these functions are vulnerable and how we can protect ourselves from these problems.
Let's start with a possible implementation of strcpy which we will call my_strcpy:
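A minimal sketch of my_strcpy, laid out to be consistent with the line-by-line analysis that follows (line numbers count from the function signature). It is a reconstruction for illustration and assumes the parameter names destination and source used in that analysis; it is not the library's actual source.

```c
char *my_strcpy(char *destination, const char *source) {
    char *p = destination;

    while (*source != '\0') {
        *destination = *source;
        destination++;
        source++;
    }

    *destination = '\0';

    return p;
}
```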
Let's analyze the code:
- Line 2: We initialize a pointer p to the destination string. This will allow us to return the destination string at the end of the function.
- Lines 4-8: We copy the characters of the source string into the destination string until we reach the termination character '\0'. For this we use a while loop that scrolls through the characters of source and copies them into destination. Note that, at lines 6 and 7, we increment the pointers destination and source to move to the next character.
- Line 10: We add the termination character '\0' at the end of the destination string.
- Line 12: We return the destination string.
This is, more or less, how strcpy behaves in the C standard library, although real implementations are usually more optimized. If we observe it carefully, we can notice what the security problems of this function are:
- The first problem is that the function does not check the validity of the pointers. Specifically, it does not check if source and destination point to valid memory locations. If one of the two pointers is not valid, the behavior of the function is undefined.
- The second problem is that the function does not check if the destination string has enough space to contain the source string. After all, it would have no way to do so: in C, strings do not carry any information about the size of the array that holds them. For this reason, if the destination string is not large enough to contain the source string, the strcpy function will write beyond the bounds of the destination string, which is undefined behavior.
    Specifically, suppose that destination is an array that can contain at most 10 characters. If source is a string of 19 characters, for example "Hello, how are you?", this happens:
    - Initially, assuming that destination contains only null characters, the destination string is as follows:

    +----+----+----+----+----+----+----+----+----+----+
    | \0 | \0 | \0 | \0 | \0 | \0 | \0 | \0 | \0 | \0 |
    +----+----+----+----+----+----+----+----+----+----+
      ^
      |
      p, destination

      The pointers p and destination both point to the first location.
    - The function starts copying the first characters of source into destination. After copying the first 10 characters, the destination string is as follows:

    +----+----+----+----+----+----+----+----+----+----+
    | H  | e  | l  | l  | o  | ,  |    | h  | o  | w  |
    +----+----+----+----+----+----+----+----+----+----+
                                                   ^
                                                   |
                                                   last character written

      The tenth and last location of destination has just been written.
    - But, at this point, since the terminator of source has not been reached yet, the function continues to copy the characters of source into the adjacent memory locations:

    +----+----+----+----+----+----+----+----+----+----+    +    +    +
    | H  | e  | l  | l  | o  | ,  |    | h  | o  | w  |    | a  | r  |
    +----+----+----+----+----+----+----+----+----+----+    +    +    +
                                                                  ^
                                                                  |
                                                                  last character written

    - These locations, however, do not belong to the destination string. What happens in these cases is not predictable: the behavior is undefined. The program could silently corrupt other data or terminate with a segmentation fault.
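The arithmetic of the scenario above can be checked with a few lines of code. This is only a sketch: overflow_count is a hypothetical helper that computes how many characters strcpy would write past the end of the destination, without performing the copy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of characters strcpy would write past
 * the end of a destination of dest_size characters (0 if src fits). */
size_t overflow_count(size_t dest_size, const char *src)
{
    size_t needed = strlen(src) + 1;   /* characters plus '\0' */
    return needed > dest_size ? needed - dest_size : 0;
}
```

For a destination of 10 characters and the source "Hello, how are you?", which needs 19 characters plus the terminator, the function returns 10: ten characters would land outside the array.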
The strncpy function addresses the second problem by limiting the number of characters written. Let's see an implementation of strncpy which we will call my_strncpy:
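A minimal sketch of my_strncpy, laid out to be consistent with the line-by-line analysis that follows (line numbers count from the function signature, not from the #include). It is a reconstruction for illustration and assumes the parameter names destination, source and n used in that analysis; it is not the library's actual source.

```c
#include <stddef.h>

char *my_strncpy(char *destination, const char *source, size_t n) {
    char *p = destination;

    while (n > 0 && *source != '\0') {
        *destination = *source;
        destination++;
        source++;
        n--;
    }

    while (n > 0) {
        *destination = '\0';
        destination++;
        n--;
    }

    return p;
}
```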
Let's analyze the code:
- Line 2: We initialize a pointer p to the destination string. This will allow us to return the destination string at the end of the function.
- Lines 4-9: We copy at most n characters of the source string into the destination string, stopping earlier if we reach the termination character '\0'. To do this, we use a while loop that scrolls through the characters of source and copies them into destination. The loop terminates when we have copied n characters or when we reach the termination character '\0'.
    - Note that, at lines 6, 7 and 8, we increment the pointers destination and source to move to the next character and decrement n to keep track of the number of characters still to copy.
    - The condition of the while, in fact, checks that there are still characters to copy (n > 0) and that we have not reached the termination character '\0' (*source != '\0').
- Lines 11-15: At this point, two situations may have occurred:
    - We have copied all characters of source into destination and reached its termination character before using up n. In this case, we must add the termination character '\0' at the end of the destination string. To do this, we use a second while loop that writes the termination character '\0' a number of times equal to the remaining characters to copy (n).
    - We have copied n characters of source into destination without reaching its termination character. In this case n is 0, the second while loop is not executed, and the destination string is left without a terminator.
- Line 17: We return the destination string.
This is also, approximately, the implementation of strncpy in the C standard library. Although this function improves security compared to strcpy, it is not immune from problems. Let's see what they are:
- The strncpy function also does not check the validity of the pointers. If one of the two pointers is not valid, the behavior of the function is undefined.
- It does not check the validity of the parameter n. If n is greater than the actual size of destination, even in this case the function will write beyond the bounds of the destination string.
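One way to reduce the risk of passing a wrong n is to derive it from the destination array itself. The following is a sketch under a strong assumption: COPY_INTO_ARRAY is a hypothetical macro that works only when dst is a true char array; with a pointer, sizeof would give the size of the pointer and not of the buffer.

```c
#include <string.h>

/* Hypothetical macro: copies src into the array dst, deriving the limit
 * from sizeof so that n cannot exceed the real capacity of dst.
 * Assumption: dst is a char array, not a pointer. */
#define COPY_INTO_ARRAY(dst, src)                     \
    do {                                              \
        strncpy((dst), (src), sizeof(dst) - 1);       \
        (dst)[sizeof(dst) - 1] = '\0';                \
    } while (0)
```

For example, with char name[6], the call COPY_INTO_ARRAY(name, "Hello World") leaves the terminated string "Hello" in name.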
In Summary
In C we cannot use the assignment operator, =, to copy the content of one string into another string. This is because, in C, strings are arrays of characters terminated by '\0', and arrays cannot be assigned; between pointers, the assignment operator = copies only the address of the source string. To copy the content of one string into another string, we must use the library functions strcpy or strncpy.
- The strcpy function copies a string into another string. The destination string must be large enough to contain the source string.
- The strncpy function copies a string into another string, limiting the number of characters copied.
In the next lesson we will see another important function of the C standard library for string manipulation: strlen.