Comparing Strings in C
One of the most common operations that can be done with strings in C language is the comparison between two strings.
Comparing two strings means establishing whether one string is less than, greater than or equal to another. In C language, this is done with the strcmp and strncmp functions, which are defined in the string.h library.
In this lesson we will see how these two functions work and how it is possible to implement them manually.
Lexicographic order and string comparison
Before being able to see how it is possible to compare two strings in C language, it is necessary to dwell on what exactly this operation means.
Usually, when talking about comparing two sequences of characters, or strings, one immediately thinks of alphabetical order, the same order in which words appear in a dictionary.
In practice, if we have two words, for example tree and bag, we can say that tree is greater than bag, because the letter t, the initial letter of tree, comes after the letter b, the initial letter of bag, in alphabetical order.
In the case where the words we are comparing start with the same letter, then we must compare the second letter, and so on, until we find a difference.
For example, if we compare the words tree and trap, we can say that tree is greater than trap. This is because the two words start with the same two letters, t and r, but the third letter of tree is an e, while the third letter of trap is an a. Since the letter e comes after the letter a in alphabetical order, tree is greater than trap.
Note that the alphabetical comparison we have shown does not depend on the length of the two words: bag is less than tree, and apple is less than bag even though apple is longer. Length matters only when one word is a prefix of the other. For example, tree is less than trees: the first four letters are the same, but tree ends first, so the shorter word comes first.
In C language, as in the vast majority of programming languages, the principle behind string comparison is similar to what we have just described. However, the comparison is not done based on alphabetical order, but based on lexicographic order.
Lexicographic order
When comparing two strings in C language the following rules are respected:
- A string s1 is less than a string s2 if the first i characters of s1 are equal to the first i characters of s2, but the (i+1)-th character of s1 is less than the (i+1)-th character of s2;
- A string s1 is less than a string s2 if all the characters of s1 are equal to the first i characters of s2 (i being the length of s1), but s1 is shorter than s2.
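The index i that appears in both rules can be computed with a simple loop. Here is a minimal sketch with a hypothetical helper, first_mismatch (not a standard library function), that returns the first position at which two strings differ, treating the terminator \0 as part of each string:

```c
#include <stddef.h>

/* Hypothetical helper: returns the index i of the first position at which
   s1 and s2 differ. The terminator '\0' takes part in the comparison, so if
   one string is a prefix of the other, i is the length of the shorter one.
   If the strings are identical, i is their common length. */
size_t first_mismatch(const char *s1, const char *s2) {
    size_t i = 0;
    /* Advance while s1 has not ended and the characters still match
       (if s2 ends first, its '\0' differs from s1[i] and stops the loop) */
    while (s1[i] != '\0' && s1[i] == s2[i]) {
        i++;
    }
    return i;
}
```

For example, first_mismatch("tree", "trap") returns 2 and first_mismatch("tree", "trees") returns 4. Once i is known, the order is decided by the two characters at position i: the first rule applies when both are ordinary characters, the second when one of them is the terminator \0; if both are \0, the strings are equal.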
Let's leave aside for a moment what it means that a character is less than another. Let's focus, now, on the two rules we have just listed.
The first rule, substantially, is a generalization of the alphabetical ordering we saw before. In practice, the characters of the two strings are compared one at a time, starting from the first character. If a difference is found between two characters, then it can be said which of the two strings is greater.
The second rule, instead, concerns the case where one string is a prefix of the other. In this case, the shorter string is less than the longer one. For example, the string house is less than household: household has the same 5 initial letters as house, but is longer.
That said, now we need to understand what it means that a character is less than another. In C language, characters are, essentially, numbers. Each character is represented by an integer number, which is its ASCII code. Therefore, when comparing two characters, in reality their ASCII codes are compared.
Moreover, a string in C is not composed only of letters, but can contain numbers, symbols and control characters. In this case, the comparison between two strings takes place by comparing the ASCII codes of the characters one at a time.
Because of this, comparing two strings can lead to surprises. In ASCII, digits are represented by codes ranging from 48 to 57, uppercase letters by codes ranging from 65 to 90, and lowercase letters by codes ranging from 97 to 122. The consequence is that a digit is less than an uppercase letter, which in turn is less than a lowercase letter.
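To see these codes directly, here is a small sketch with two hypothetical helpers, char_code and print_codes (not part of any standard library). It assumes the execution character set is ASCII, as it is on virtually all current systems. The cast to unsigned char matches the way the C standard says strcmp interprets characters.

```c
#include <stdio.h>

/* Hypothetical helper: returns the numeric code used when comparing c.
   The cast to unsigned char mirrors how strcmp interprets characters. */
int char_code(char c) {
    return (unsigned char)c;
}

/* Hypothetical helper: prints every character of s next to its code. */
void print_codes(const char *s) {
    for (; *s != '\0'; s++) {
        printf("'%c' = %d\n", *s, char_code(*s));
    }
}
```

For example, print_codes("Az0") prints one line per character, with the codes 65, 122 and 48.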
For example, the string 123ABC is less than the string ABC, because the character 1 has an ASCII code less than the character A.
Similarly the string ABC is less than the string abc, because the character A has an ASCII code less than the character a.
Let's see other examples:
"House" < "house"
"123" < "ABC"
"123" < "abc"
"HOUSE" < "house"
"household" > "house"
"house" == "house"
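Punctuation and whitespace characters also fall into this mechanism. For example, the space ␣ has an ASCII code (32) lower than any letter or digit. Therefore, the string My House is less than the string MyHouse: the first two characters are equal, and at the third position the space comes before H.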
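The comparisons listed above, together with the space example, can be checked against the real strcmp. Since strcmp only guarantees the sign of its result, this sketch uses a hypothetical helper, cmp_sign, that reduces the result to -1, 0 or 1:

```c
#include <string.h>

/* Hypothetical helper: reduces the result of strcmp to -1, 0 or 1.
   strcmp only guarantees the sign of its result, not its magnitude. */
int cmp_sign(const char *s1, const char *s2) {
    int r = strcmp(s1, s2);
    return (r > 0) - (r < 0);
}
```

For instance, cmp_sign("House", "house") returns -1 and cmp_sign("household", "house") returns 1. Note also that cmp_sign("My House", "Household") returns 1: the very first characters, M and H, already decide that comparison, so the space is never reached.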
These rules collectively take the name of lexicographic order.
Recapping:
Lexicographic order
In C language the lexicographic order is used to compare two strings. A string s1 is less than a string s2 if:
- The first i characters of s1 are equal to the first i characters of s2, but the (i+1)-th character of s1 is less than the (i+1)-th character of s2;
- All the characters of s1 are equal to the first i characters of s2 (i being the length of s1), but s1 is shorter than s2.
The comparison between individual characters takes place by comparing their ASCII numerical codes.
That said, we can move on to see how it is possible to compare two strings in C language.
Function strcmp - String comparison
In C language, two strings are compared with the strcmp function, which is defined in the string.h library.
The strcmp function accepts two strings as arguments and returns an integer value that represents the result of the lexicographic comparison between the two strings.
The strcmp function has the following signature:
int strcmp(const char *s1, const char *s2);
Where s1 and s2 are the two strings to compare. The result of the comparison is returned as an integer value:
- If s1 is less than s2, then the returned value is less than zero;
- If s1 is greater than s2, then the returned value is greater than zero;
- If s1 is equal to s2, then the returned value is zero.
Let's see an example:
#include <stdio.h>
#include <string.h>

int main() {
    char s1[] = "house";
    char s2[] = "household";

    int result = strcmp(s1, s2);

    if (result < 0) {
        printf("The string s1 is less than s2\n");
    } else if (result > 0) {
        printf("The string s1 is greater than s2\n");
    } else {
        printf("The two strings are equal\n");
    }

    return 0;
}
In this example, the string s1 is less than s2, so the result of the comparison is less than zero. Consequently, the program prints the message "The string s1 is less than s2".
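Two mistakes are common when using strcmp: comparing two strings with == (which compares their addresses, not their contents) and testing the result against -1 instead of checking its sign. The following sketch wraps strcmp in two hypothetical helpers, str_equal and str_less, that avoid both:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when a and b contain the same characters.
   Writing a == b instead would compare the two addresses, not the contents. */
bool str_equal(const char *a, const char *b) {
    return strcmp(a, b) == 0;
}

/* Hypothetical helper: true when a comes before b in lexicographic order.
   The test is "< 0", not "== -1": strcmp may return any negative value. */
bool str_less(const char *a, const char *b) {
    return strcmp(a, b) < 0;
}
```

With the strings of the example, str_equal(s1, s2) is false and str_less(s1, s2) is true.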
Function strcmp
The strcmp function is used to compare two strings in C language.
It is defined in the string.h library:
#include <string.h>
The strcmp function has the following signature:
int strcmp(const char *s1, const char *s2);
Where s1 and s2 are the two strings to compare. The result of the comparison is returned as an integer value:
- If s1 is less than s2, then the returned value is less than zero;
- If s1 is greater than s2, then the returned value is greater than zero;
- If s1 is equal to s2, then the returned value is zero.
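A common practical use of strcmp is sorting. This sketch, with the hypothetical helper names compare_strings and sort_strings, passes a strcmp-based comparator to the standard qsort function (declared in stdlib.h) to sort an array of strings in lexicographic order:

```c
#include <stdlib.h>
#include <string.h>

/* Comparator for qsort: each element is a pointer to a char pointer,
   so both arguments are dereferenced once before calling strcmp. */
static int compare_strings(const void *a, const void *b) {
    const char *s1 = *(const char *const *)a;
    const char *s2 = *(const char *const *)b;
    return strcmp(s1, s2);
}

/* Hypothetical helper: sorts an array of n strings in lexicographic order. */
void sort_strings(const char *words[], size_t n) {
    qsort(words, n, sizeof words[0], compare_strings);
}
```

For example, sorting { "trees", "house", "Household", "123", "household", "tree" } produces the order "123", "Household", "house", "household", "tree", "trees": digits come before uppercase letters, uppercase before lowercase, and a prefix before the longer string.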
Function strncmp - String comparison with prefix
The strcmp function compares two strings entirely. However, sometimes it can be useful to compare only a prefix of the two strings.
To do this, we can use the strncmp function, also defined in the string.h library.
The strncmp function accepts three arguments: two strings and an integer of type size_t that represents the maximum number of characters to compare.
The strncmp function has the following signature:
int strncmp(const char *s1, const char *s2, size_t n);
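Where s1 and s2 are the two strings to compare and n is the maximum number of characters to compare: the comparison stops earlier at the first difference, and characters after a terminator \0 are never compared.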
For the rest, the function is similar to the strcmp function. The result of the comparison is returned as an integer value and follows the same rules as the strcmp function.
Let's see an example:
#include <stdio.h>
#include <string.h>

int main() {
    char s1[] = "house";
    char s2[] = "household";

    /* First comparison: compares only the first 5 characters */
    int result1 = strncmp(s1, s2, 5);
    if (result1 < 0) {
        printf("The first 5 characters of s1 are less than the first 5 characters of s2\n");
    } else if (result1 > 0) {
        printf("The first 5 characters of s1 are greater than the first 5 characters of s2\n");
    } else {
        printf("The first 5 characters of the two strings are equal\n");
    }

    /* Second comparison: compares only the first 6 characters */
    int result2 = strncmp(s1, s2, 6);
    if (result2 < 0) {
        printf("The first 6 characters of s1 are less than the first 6 characters of s2\n");
    } else if (result2 > 0) {
        printf("The first 6 characters of s1 are greater than the first 6 characters of s2\n");
    } else {
        printf("The first 6 characters of the two strings are equal\n");
    }

    return 0;
}
In this example, the program performs two comparisons between the strings s1 and s2:
- The first comparison compares only the first 5 characters of the two strings. The result is zero, since the first 5 characters of the two strings are equal. The program prints the message "The first 5 characters of the two strings are equal".
- The second comparison compares the first 6 characters of the two strings. The result is less than zero: s1 contains only 5 characters, so its sixth position holds the terminator \0, whose code 0 is less than the code of h, the sixth character of s2. Consequently, the program prints the message "The first 6 characters of s1 are less than the first 6 characters of s2".
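A typical use of strncmp is checking whether a string begins with a given prefix: it is enough to compare as many characters as the prefix contains. The following sketch uses a hypothetical helper, starts_with, built on strncmp and strlen from string.h:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when s begins with prefix.
   strncmp compares at most strlen(prefix) characters, so whatever
   follows the prefix in s is ignored. */
bool starts_with(const char *s, const char *prefix) {
    return strncmp(s, prefix, strlen(prefix)) == 0;
}
```

With the strings of the example, starts_with("household", "house") is true, while starts_with("house", "household") is false, for the same reason the second comparison above returns a negative value.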
Function strncmp
The strncmp function is used to compare two strings in C language, considering at most the first n characters.
It is defined in the string.h library:
#include <string.h>
The strncmp function has the following signature:
int strncmp(const char *s1, const char *s2, size_t n);
Where s1 and s2 are the two strings to compare and n is the maximum number of characters to compare. The result of the comparison is returned as an integer value:
- If the first n characters of s1 are less than the first n characters of s2, then the returned value is less than zero;
- If the first n characters of s1 are greater than the first n characters of s2, then the returned value is greater than zero;
- If the first n characters of s1 are equal to the first n characters of s2, then the returned value is zero.
Implementation of strcmp and strncmp
From an educational point of view, it is always interesting to see how the standard functions of a programming language work. So let's see how the strcmp and strncmp functions could be implemented.
Let's start with a possible implementation of the strcmp function which we will call my_strcmp:
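Here is a minimal sketch of my_strcmp, written to follow the two-part structure described in the next paragraphs: a loop that compares the characters one at a time, followed by a check on which string ended first. It returns -1, 0 or 1, which is enough because the standard strcmp only promises a negative, zero or positive value. The characters are read as unsigned char, as the C standard specifies for strcmp.

```c
/* Sketch of my_strcmp: returns -1, 0 or 1. */
int my_strcmp(const char *s1, const char *s2) {
    /* Compare as unsigned char, as the standard strcmp does */
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    /* First part: walk both strings while neither has ended */
    while (*p1 != '\0' && *p2 != '\0') {
        if (*p1 < *p2) {
            return -1;
        } else if (*p1 > *p2) {
            return 1;
        }
        p1++;
        p2++;
    }

    /* Second part: no difference found, so decide by which string ended */
    if (*p1 == '\0' && *p2 == '\0') {
        return 0;   /* both ended: the strings are equal */
    } else if (*p1 == '\0') {
        return -1;  /* s1 ended first: s1 is a prefix of s2 */
    } else {
        return 1;   /* s2 ended first: s2 is a prefix of s1 */
    }
}
```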
The implementation is very interesting. The function is divided into two parts because it must also handle the case in which one string is shorter than the other.
In the first part, the while loop, the function compares the characters of the two strings one at a time, as long as neither string has reached the terminator \0. If a difference is found, the function returns an integer value that represents the result of the comparison; otherwise, both pointers are incremented.
The second part, after the while loop, is reached when the loop ends without a value having been returned via return. This means that one of the following cases has occurred:
- Both strings are terminated. In this case the two strings are equal and the function returns zero; in fact, if there had been a difference, it would have been detected in the while loop;
- The string s1 is terminated, but the string s2 is not. In this case s1 is less than s2 and the function returns a value less than zero;
- The string s2 is terminated, but the string s1 is not. In this case s1 is greater than s2 and the function returns a value greater than zero.
Note that the characters are compared numerically, that is, simply by comparing their ASCII codes with the comparison operators < and >.
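Comparing with < and > is not the only option. A common alternative, shown here as a sketch under the hypothetical name my_strcmp_diff, returns the difference between the first pair of characters that differ. Because the terminator \0 takes part in the comparison, the prefix case needs no separate branch:

```c
/* Alternative sketch: returns the difference between the first pair of
   differing characters (read as unsigned char), or 0 if the strings are equal. */
int my_strcmp_diff(const char *s1, const char *s2) {
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    /* Stop at the first difference, or when s1 ends; if s2 ends first,
       its '\0' differs from *p1 and also stops the loop */
    while (*p1 != '\0' && *p1 == *p2) {
        p1++;
        p2++;
    }
    /* Both operands are promoted to int, so the result can be negative */
    return *p1 - *p2;
}
```

For example, my_strcmp_diff("tree", "trap") returns 4, the difference between the codes of e (101) and a (97). The magnitude is a by-product: only the sign carries the result of the comparison.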
Let's now move to the implementation of the strncmp function which we will call my_strncmp:
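Along the same lines, here is a minimal sketch of my_strncmp that follows the changes described below: a counter i of the characters compared, an extra condition on the while loop, and an initial check on i after the loop. It returns -1, 0 or 1, which satisfies the contract of the standard strncmp, where only the sign matters.

```c
#include <stddef.h>

/* Sketch of my_strncmp: compares at most n characters, returns -1, 0 or 1. */
int my_strncmp(const char *s1, const char *s2, size_t n) {
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;
    size_t i = 0;  /* number of characters compared so far */

    /* First part: stop at a difference, at the end of a string,
       or after n characters */
    while (i < n && *p1 != '\0' && *p2 != '\0') {
        if (*p1 < *p2) {
            return -1;
        } else if (*p1 > *p2) {
            return 1;
        }
        p1++;
        p2++;
        i++;
    }

    /* Second part: n characters compared without finding a difference */
    if (i == n) {
        return 0;
    }

    /* Otherwise at least one string ended before n characters */
    if (*p1 == '\0' && *p2 == '\0') {
        return 0;
    } else if (*p1 == '\0') {
        return -1;
    } else {
        return 1;
    }
}
```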
The implementation of the my_strncmp function is similar to that of the my_strcmp function. The main difference is that the my_strncmp function compares at most the first n characters of the two strings.
First, we added a counter i that keeps track of the number of characters compared. This counter is incremented at each iteration of the while loop. We also modified the condition of the while loop so that the loop terminates when the maximum number of characters to compare, n, is reached.
The second part of the function, after the while loop, is similar to that of the my_strcmp function. However, we added an initial condition that checks whether the number of characters compared is equal to n. In that case, the function returns zero, regardless of whether the two strings are terminated or not.
In Summary
We have studied in this lesson that:
- The comparison between two strings in C language takes place based on lexicographic order;
- The lexicographic order is based on the comparison of individual characters of the two strings, starting from the first character;
- The comparison between individual characters takes place by comparing their ASCII numerical codes;
- Based on lexicographic order, a string s1 is less than a string s2 if the first i characters of s1 are equal to the first i characters of s2, but the (i+1)-th character of s1 is less than the (i+1)-th character of s2, or if s1 is a prefix of s2 and is shorter than it;
- To compare two strings in C language, the strcmp function is used, which returns an integer value that represents the result of the comparison;
- The string.h library also provides the strncmp function, which allows comparing only a prefix of the two strings.
Moreover, we have seen how the strcmp and strncmp functions could be implemented in C language.