Comparing Strings in C

One of the most common operations on strings in C is comparing two of them.

Comparing two strings means establishing whether one string is less than, greater than, or equal to another. In C, this is done with the strcmp and strncmp functions, which are declared in the string.h header.

In this lesson we will see how these two functions work and how they can be implemented by hand.

Lexicographic order and string comparison

Before seeing how to compare two strings in C, it is worth clarifying what exactly this operation means.

Usually, when talking about comparing two sequences of characters, or strings, one immediately thinks of alphabetical order, the way words are ordered in a dictionary.

In practice, if we have two words, for example bag and tree, we can say that bag is less than tree because the letter b, the initial letter of bag, comes before the letter t, the initial letter of tree, in alphabetical order:

\underline{\text{b}}\text{ag} < \underline{\text{t}}\text{ree}

In the case where the words we are comparing start with the same letter, then we must compare the second letter, and so on, until we find a difference.

For example, if we compare the words tree and trap, we can say that tree is greater than trap. This is because the two words start with the same two letters, t and r, but the third letter of tree is an e, while the third letter of trap is an a. And since the letter e comes after the letter a in alphabetical order, we can say that tree is greater than trap:

\textbf{tr}\underline{\text{e}}\text{e} > \textbf{tr}\underline{\text{a}}\text{p}

Note that length matters only when one word is a prefix of the other. For example, if we compare the words tree and trees, no differing letter is found in the first four positions. In this case tree is less than trees because tree is a prefix of trees and is shorter, not because of a letter comparison.

In C language, as in the vast majority of programming languages, the principle behind string comparison is similar to what we have just described. However, the comparison is not done based on alphabetical order, but based on lexicographic order.

Lexicographic order

When comparing two strings in C language the following rules are respected:

  1. A string s1 is less than a string s2 if the first i characters of s1 are equal to the first i characters of s2, but character i+1 of s1 is less than character i+1 of s2;
  2. A string s1 is less than a string s2 if s1 has i characters, all equal to the first i characters of s2, and s2 is longer than s1.

Let's leave aside for a moment what it means that a character is less than another. Let's focus, now, on the two rules we have just listed.

The first rule, substantially, is a generalization of the alphabetical ordering we saw before. In practice, the characters of the two strings are compared one at a time, starting from the first character. If a difference is found between two characters, then it can be said which of the two strings is greater.

The second rule, instead, concerns the case where a string represents the prefix of the other. In this case, the shorter string is less than the longer one. For example, the string house is less than household. In fact, household has the same 5 initial letters of house, but is longer.

That said, we now need to understand what it means for a character to be less than another. In C, characters are essentially small integers. Each character is represented by a numeric code, which on virtually all current systems is its ASCII code. Therefore, comparing two characters really means comparing their codes.

Moreover, a string in C is not composed only of letters: it can also contain digits, symbols and control characters. The comparison still takes place by comparing the codes of the characters one at a time.

As a result, comparing two strings can lead to surprises. In ASCII, digits are represented by codes ranging from 48 to 57, uppercase letters by codes ranging from 65 to 90, and lowercase letters by codes ranging from 97 to 122. The consequence is that a digit is less than an uppercase letter, which in turn is less than a lowercase letter.

For example, the string 123ABC is less than the string ABC, because the character 1 has an ASCII code less than the character A.

Similarly the string ABC is less than the string abc, because the character A has an ASCII code less than the character a.

Let's see other examples:

"House" < "house"
"123" < "ABC"
"123" < "abc"
"HOUSE" < "house"
"household" > "house"
"house" == "house"

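Punctuation characters also follow this mechanism. For example, the space has an ASCII code (32) lower than that of any letter or digit. Therefore, the string House Rules is less than the string Household: the first five characters are equal, and the sixth character of House Rules is a space, which is less than the letter h.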

These rules collectively take the name of lexicographic order.

Recapping:

Definition

Lexicographic order

In C language the lexicographic order is used to compare two strings. A string s1 is less than a string s2 if:

  1. The first i characters of s1 are equal to the first i characters of s2, but character i+1 of s1 is less than character i+1 of s2;
  2. s1 has i characters, all equal to the first i characters of s2, and s2 is longer than s1.

The comparison between individual characters takes place by comparing their ASCII numerical codes.

That said, we can move on to see how it is possible to compare two strings in C language.

Function strcmp - String comparison

In C, two strings are compared with the strcmp function, which is declared in the string.h header.

The strcmp function accepts two strings as arguments and returns an integer value that represents the result of the lexicographic comparison between the two strings.

The strcmp function has the following signature:

int strcmp(const char *s1, const char *s2);

Where s1 and s2 are the two strings to compare. The result of the comparison is returned as an integer value:

  • If s1 is less than s2, then the returned value is less than zero;
  • If s1 is greater than s2, then the returned value is greater than zero;
  • If s1 is equal to s2, then the returned value is zero.

Let's see an example:

#include <stdio.h>
#include <string.h>

int main() {
    char s1[] = "house";
    char s2[] = "household";

    int result = strcmp(s1, s2);

    if (result < 0) {
        printf("The string s1 is less than s2\n");
    } else if (result > 0) {
        printf("The string s1 is greater than s2\n");
    } else {
        printf("The two strings are equal\n");
    }

    return 0;
}

In this example, the string s1 is less than s2, so the result of the comparison is less than zero. Consequently, the program prints to screen the message The string s1 is less than s2.

Definition

Function strcmp

The strcmp function is used to compare two strings in C language.

It is declared in the string.h header:

#include <string.h>

The strcmp function has the following signature:

int strcmp(const char *s1, const char *s2);

Where s1 and s2 are the two strings to compare. The result of the comparison is returned as an integer value:

  • If s1 is less than s2, then the returned value is less than zero;
  • If s1 is greater than s2, then the returned value is greater than zero;
  • If s1 is equal to s2, then the returned value is zero.

Function strncmp - String comparison with prefix

The strcmp function compares two strings entirely. However, sometimes it can be useful to compare only a prefix of the two strings.

To do this, the strncmp function can be used, which is also declared in the string.h header.

The strncmp function accepts three arguments: two strings and an integer that represents the number of characters to compare.

The strncmp function has the following signature:

int strncmp(const char *s1, const char *s2, size_t n);

Where s1 and s2 are the two strings to compare and n is the maximum number of characters to compare.

For the rest, the function is similar to the strcmp function. The result of the comparison is returned as an integer value and follows the same rules as the strcmp function.

Let's see an example:

#include <stdio.h>
#include <string.h>

int main() {
    char s1[] = "house";
    char s2[] = "household";

    /* First comparison: compares only the first 5 characters */
    int result1 = strncmp(s1, s2, 5);

    if (result1 < 0) {
        printf("The first 5 characters of s1 are less than the first 5 characters of s2\n");
    } else if (result1 > 0) {
        printf("The first 5 characters of s1 are greater than the first 5 characters of s2\n");
    } else {
        printf("The first 5 characters of the two strings are equal\n");
    }

    /* Second comparison: compares only the first 6 characters */
    int result2 = strncmp(s1, s2, 6);

    if (result2 < 0) {
        printf("The first 6 characters of s1 are less than the first 6 characters of s2\n");
    } else if (result2 > 0) {
        printf("The first 6 characters of s1 are greater than the first 6 characters of s2\n");
    } else {
        printf("The first 6 characters of the two strings are equal\n");
    }

    return 0;
}

In this example, the program performs two comparisons between the strings s1 and s2:

  1. The first comparison compares only the first 5 characters of the two strings. The result of the comparison is equal to zero since the first 5 characters of the two strings are equal. The program prints to screen the message The first 5 characters of the two strings are equal.
  2. The second comparison compares the first 6 characters of the two strings. The result of the comparison is less than zero since s1 is composed of only 5 characters, while s2 is composed of 9: the sixth character of s1 is the terminator \0, which is less than the letter h. Consequently, the program prints to screen the message The first 6 characters of s1 are less than the first 6 characters of s2.

Definition

Function strncmp

The strncmp function is used to compare two strings in C language, considering at most the first n characters.

It is declared in the string.h header:

#include <string.h>

The strncmp function has the following signature:

int strncmp(const char *s1, const char *s2, size_t n);

Where s1 and s2 are the two strings to compare and n is the number of characters to compare. The result of the comparison is returned as an integer value:

  • If the first n characters of s1 are less than the first n characters of s2, then the returned value is less than zero;
  • If the first n characters of s1 are greater than the first n characters of s2, then the returned value is greater than zero;
  • If the first n characters of s1 are equal to the first n characters of s2, then the returned value is zero.

Implementation of strcmp and strncmp

For educational purposes, it is always interesting to see how the standard functions of a programming language work. Let's see how the strcmp and strncmp functions could be implemented.

Let's start with a possible implementation of the strcmp function which we will call my_strcmp:

int my_strcmp(const char *s1, const char *s2) {
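    const unsigned char *p1 = (const unsigned char *)s1; /* strcmp compares characters as unsigned char */
    const unsigned char *p2 = (const unsigned char *)s2; /* keep const: the strings are not modified */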

    /* First part */
    while (*p1 != '\0' && *p2 != '\0') {
        if (*p1 < *p2) {
            return -1;
        } else if (*p1 > *p2) {
            return 1;
        }

        p1++;
        p2++;
    }

    /* Second part */
    if (*p1 == '\0' && *p2 == '\0') {
        return 0;
    } else if (*p1 == '\0') {
        return -1;
    } else {
        return 1;
    }
}

The function is divided into two parts because it must handle the case where one string is shorter than the other.

In the first part (the while loop, lines 6-15), the function compares the two strings one character at a time, as long as neither character is the terminator \0 and no difference has been found. If a difference is found, the function returns an integer value that represents the result of the comparison. Otherwise, both pointers are incremented.

The second part, lines 18-24, is reached when exiting the while loop, without a value having been returned via return. This means that one of the following cases has occurred:

  1. Both strings are terminated. In this case, the two strings are equal and the function returns zero: if there had been a difference, it would have been detected in the while loop;
  2. The string s1 is terminated, but the string s2 is not. In this case, the string s1 is less than s2 and the function returns a value less than zero;
  3. The string s2 is terminated, but the string s1 is not. In this case, the string s1 is greater than s2 and the function returns a value greater than zero.

Note that the characters, at lines 7 and 9, are compared numerically, that is, simply by comparing the codes of the two characters with the comparison operators < and >.

Let's now move to the implementation of the strncmp function which we will call my_strncmp:

int my_strncmp(const char *s1, const char *s2, size_t n) {
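    const unsigned char *p1 = (const unsigned char *)s1; /* strncmp compares characters as unsigned char */
    const unsigned char *p2 = (const unsigned char *)s2; /* keep const: the strings are not modified */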

    /* First part */
    size_t i = 0;
    while (*p1 != '\0' && *p2 != '\0' && i < n) {
        if (*p1 < *p2) {
            return -1;
        } else if (*p1 > *p2) {
            return 1;
        }

        p1++;
        p2++;
        i++;
    }

    /* Second part */
    if (i == n) {
        return 0;
    } else if (*p1 == '\0' && *p2 == '\0') {
        return 0;
    } else if (*p1 == '\0') {
        return -1;
    } else {
        return 1;
    }
}

The implementation of the my_strncmp function is similar to that of the my_strcmp function. The main difference is that the my_strncmp function compares at most the first n characters of the two strings.

First, we added a counter i that keeps track of the number of characters compared. This counter is incremented at each iteration of the while loop (line 16). We modified the condition of the while loop (line 7) so that the loop terminates when the maximum number of characters to compare, n, is reached.

The second part of the function, lines 20-28, is similar to that of the my_strcmp function. However, we added an initial condition (line 20) that checks if the number of characters compared is equal to n. In that case, the function returns zero, regardless of whether the two strings are terminated or not.

In Summary

We have studied in this lesson that:

  • The comparison between two strings in C language takes place based on lexicographic order;
  • The lexicographic order is based on the comparison of individual characters of the two strings, starting from the first character;
  • The comparison between individual characters takes place by comparing their ASCII numerical codes;
  • Based on lexicographic order, a string s1 is less than a string s2 if the first i characters of s1 are equal to the first i characters of s2, but character i+1 of s1 is less than character i+1 of s2, or if s1 is a prefix of s2 and is shorter than it;
  • To compare two strings in C language the strcmp function is used which returns an integer value that represents the result of the comparison;
  • The string.h library also provides the strncmp function which allows comparing only a prefix of the two strings.

Moreover, we have seen how the strcmp and strncmp functions could be implemented in C language.