Introduction to the scanf function in the C language

The scanf function is a standard library function of the C language that allows reading data from standard input (usually the keyboard) and storing it in variables.

It is one of the most commonly used input functions in C and is often paired with the printf function, which handles output. In this lesson we will focus on using scanf to read data from input.

Key Takeaways
  • The scanf function reads data from standard input and stores it in variables.
  • The scanf function uses a format string to specify the type of data to read.
  • The format specifiers of scanf are similar to those of printf.
  • It is necessary to pass the addresses of variables to scanf using the address operator &.
  • scanf automatically skips whitespace in the input before reading a number.
  • It is important that the number and types of format specifiers match the arguments passed to scanf.
  • Forgetting the address operator & with scanf is a common error that can lead to unpredictable program behavior.
  • scanf does not consume all input, leaving unread characters available for subsequent calls.
  • scanf performs pattern recognition to match input to format specifiers.
  • Ordinary characters in the format string influence how scanf reads input.

Syntax and usage of the scanf function

Just as the printf function prints data on output in a formatted way, the scanf function reads data from input according to a format specified by the user. To do this, the scanf function also expects as input a format string that specifies the type of data to read and how to interpret it.

The format strings used by scanf are similar to those of printf. They can contain both ordinary characters (which scanf tries to match against the input, as we will see later in this lesson) and format specifiers (which indicate the type of data to read). The format specifiers are almost identical to those used by printf.

However, there is a small difference: to read a double value you must use the %lf specifier (instead of %f as in printf). We will understand the reason for this difference later, in upcoming lessons.

In many cases, calls to scanf contain exclusively format specifiers. Let's consider the following example:

int a, b;
double x, y;

scanf("%d%d%lf%lf", &a, &b, &x, &y);

Suppose the user enters the following line of input followed by pressing the Enter key:

42 7 3.14 2.71

The scanf function will read the line and convert its characters into the numbers they represent. Afterwards, the converted values will be stored in the variables a, b, x and y. In this case, a will contain the value 42, b will contain the value 7, x will contain the value 3.14 and y will contain the value 2.71.

This pattern of using exclusively format specifiers is very common when using the scanf function. Conversely, with the printf function it is more common to use a combination of format specifiers and ordinary characters to format output in a readable way.

Definition

Function scanf

The scanf function reads data from standard input (usually the keyboard) and stores it in variables according to a format specified by the user.

The syntax of the function is as follows:

scanf(format_string, &argument1, &argument2, ...);

Where format_string is a string that specifies the format of the data to read, and argument1, argument2, ... are the variables in which to store the read data.

Although at first glance the use of scanf may seem simple, it hides several pitfalls and complexities that are important to know in order to avoid common errors when reading input.

First of all, when a programmer uses the scanf function, they must always ensure that the number of format specifiers matches the number of arguments passed to the function. Furthermore, it is essential that the data types specified in the format string match the types of variables in which the data will be stored. In fact, the C language standard does not impose checks on either the number of arguments or data types. Therefore, if the programmer makes an error in this regard, the program's behavior will be undefined.

The second important thing to note is that, unlike printf, the scanf function requires that the addresses of the variables in which to store the read data be passed to it. This is necessary because scanf must be able to modify the content of variables, and to do so it needs to know their address in memory. To get the address of a variable, you use the & operator (address operator). This operator is (almost) always necessary when using scanf. We will return to this point later when we study pointers.

Note

Never forget the address operator & with scanf

Forgetting to use the address operator & before scanf arguments is a common error that can lead to unpredictable if not disastrous program behavior. One of the most common results of this error is the crash of the program during execution.

Another possible result is that scanf writes the read data to wrong memory locations, while the original variable remains unchanged.

Omitting the address operator & is such a common programming error in C that many modern compilers emit a warning when they detect this error.

For example, gcc emits the following warning:

warning: format '%d' expects argument of type 'int *', but argument 2 has type 'int' [-Wformat=]

In this warning, the compiler reports that the format specifier %d expects an argument of type int * (pointer to integer), but the argument actually passed is of type int (integer). We will return to pointers later in the course.

The scanf function is a very powerful and flexible function, but its correct use requires attention to detail. Many professional C language programmers avoid using scanf and prefer other input reading techniques, such as reading entire lines that are then parsed with specific functions.

In this course, at least in the introductory lessons, we will continue to use scanf for simplicity, because it allows easy reading of numerical input from the console. However, many of the programs we will create in upcoming lessons do not behave correctly if the input is not exactly what is expected.

We will see, when we study input and output more thoroughly, how it is possible to check whether a call to scanf was successful and how to handle input errors more robustly. However, for now, we will omit such checks that would unnecessarily burden the example programs.

How scanf works

We have seen how to use the scanf function to read data from standard input. However, this function is much more powerful than it seems.

Essentially, you can think of scanf as a pattern-matching function: it tries to match groups of characters read from input to the format specifiers present in the format string.

Like the printf function, scanf is also controlled by the format string. When invoked, scanf begins processing the characters of the format string from left to right. Whenever it encounters a format specifier, scanf tries to locate in the input a group of characters that matches that format. In doing so, it automatically ignores whitespace (spaces, tabs and newline characters) present in the input if necessary.

When it finds an element that matches the specified format, scanf converts that group of characters into the appropriate data type and stores it in the variable whose address was passed as an argument. If, instead, it encounters a character that cannot be matched to the specified format, it stops immediately and leaves the remaining format specifiers unprocessed. If everything goes well, scanf continues until it has processed the entire format string.

Let's consider an example. Suppose we have the following code:

int x, y;
scanf("%d%d", &x, &y);

Suppose, first, that the user enters the following line of input:

  12   34

Therefore, the input consists of 2 initial spaces, followed by the number 12, followed by 3 spaces, followed by the number 34.

When scanf processes this input, it begins processing the format string. The first format specifier is %d, which indicates that it must read a decimal integer. scanf ignores the initial spaces and finds the number 12, converts it to an integer and stores it in variable x.

Subsequently, scanf continues processing the format string and encounters the second format specifier %d. It ignores the spaces after the number 12 and finds the number 34, converts it to an integer and stores it in variable y.

At this point, scanf has processed all format specifiers present in the string and terminates execution. In the end, variable x will contain the value 12 and variable y will contain the value 34.

Conversely, suppose the user enters the following line of input:

  56  abc78

In this case, scanf begins processing the format string and encounters the first format specifier %d. It ignores the initial spaces and finds the number 56, converts it to an integer and stores it in variable x.

Subsequently, scanf continues processing the format string and encounters the second format specifier %d. It ignores the spaces after the number 56 and finds the character sequence abc78. However, this sequence cannot be converted to a decimal integer, as it begins with a letter. Therefore, scanf stops at this point and does not process the format string any further. In the end, variable x will contain the value 56, while variable y will remain unchanged.

The important point is that, before a number, scanf skips any whitespace: by this we mean spaces, tabs and newline characters. Therefore, if we had passed as input:

78
   90

scanf would have still correctly read the numbers 78 and 90, ignoring the spaces and the newline character between them.

In practice, from the function's point of view, the above input corresponds to:

78\n   90

Where \n represents the newline character.

Successive calls to scanf

There is an important detail to consider when making successive calls to scanf. To know where an input item ends, scanf has to look at the character that follows it; if that character does not belong to the item, scanf leaves it in the input for the next read.

So, if for example we have the following code:

int a, b;

scanf("%d", &a);
scanf("%d", &b);

And the user enters the following input:

12
34

That is, the user enters the number 12, presses Enter, then enters the number 34 and presses Enter again. The first call to scanf reads the number 12 and stores the value in variable a. It then sees the newline character (\n) generated by pressing the Enter key and, since that character is not part of the number, leaves it in the input.

The subsequent call to scanf begins reading from the input precisely at that newline character. Since scanf automatically skips whitespace before a number, it discards the newline character and reads the number 34, storing it in variable b. The newline character generated by the second press of the Enter key, however, is left in the input.

This is a fundamental detail to remember, especially when, as we will see, you combine calls to scanf with other input functions that do not automatically ignore whitespace.

Definition

The scanf function does not consume all input

Often, beginner programmers expect the scanf function to consume all input up to the end of the line. However, this is not true. The scanf function reads only the characters necessary to satisfy the format specifiers present in the format string. Everything that remains in the input after reading, including the newline character generated by pressing the Enter key, is not consumed and remains available for subsequent calls to scanf or other input functions.

Rules of scanf for numbers

Now, we ask ourselves what are the specific rules that scanf adopts for reading numbers. In particular, we ask ourselves which characters scanf considers valid when reading a number.

When we ask the function to read an integer (for example, with the format specifier %d), it first skips any whitespace and then looks for an optional + or - sign followed by a digit. From there, it continues reading subsequent characters as long as it encounters valid digits. As soon as it encounters a character that is not a digit, scanf stops and converts the group of characters read into an integer.

Conversely, when we ask the function to read a floating-point number, the function looks for:

  • A + or - sign (optional), followed by
  • A sequence of digits possibly containing a single decimal point . (at least one digit is required, before or after the point), followed by
  • A scientific exponent (optional) starting with the letter e or E, followed by a + or - sign (optional) and a sequence of digits.

In this sense, the format specifiers for floating-point numbers, %lf, %lg and %le, are equivalent and do not distinguish between the various input formats. All three accept floating-point numbers in decimal or scientific notation.

When the scanf function encounters a character that cannot be part of a number, it puts it back in the input and stops. This means that the character is not consumed and remains available for subsequent format specifiers or for other calls to scanf or other input functions.

Let's clarify this concept with an example. Suppose we have the following code:

int a, b;
double x, y;

scanf("%d%d%lf%lf", &a, &b, &x, &y);

At this point, suppose the user enters the following input:

1-30.4-5.0e2a

Here's how scanf would process this input:

  1. The first format specifier is %d, so scanf expects an integer.

    The first non-whitespace character is 1, which is a valid digit. So, scanf reads the character 1 and continues reading subsequent characters. But the next character is -, which is not a valid digit.

    So, scanf stops here and puts back the character - in the input. Afterwards, it converts the character 1 into an integer, storing it in variable a.

  2. The second format specifier is %d, so scanf expects another integer.

    The first non-whitespace character is -, which is a valid sign for an integer. So, scanf reads the character - and continues reading subsequent characters. The next character is 3, which is a valid digit, so scanf also reads this character. The next character is 0, which is also a valid digit, so scanf also reads this character.

    The next character is ., which is not a valid digit. So, scanf stops here, puts back . in the input, and converts the character sequence -30 into an integer, storing it in variable b.

  3. The third format specifier is %lf, so scanf expects a floating-point number.

    The first non-whitespace character is ., which is a valid decimal point for a floating-point number. So, scanf reads the character . and continues reading subsequent characters. The next character is 4, which is a valid digit, so scanf also reads this character. The next character is -, which is neither a digit nor a valid decimal point.

    So, scanf stops here, puts back - in the input and converts the character sequence .4 into a floating-point number, storing it in variable x which now contains the value 0.4.

  4. The fourth format specifier is %lf, so scanf expects another floating-point number.

    The first non-whitespace character is -, which is a valid sign for a floating-point number. So, scanf reads the character - and continues reading subsequent characters. The next character is 5, which is a valid digit, so scanf also reads this character. The next character is ., which is a valid decimal point, so scanf also reads this character. The following characters are 0, e, 2, which are all valid for a floating-point number in scientific notation, so scanf also reads these characters.

    Finally, the next character is a, which is neither a digit, nor a decimal point, nor part of a valid scientific exponent. So, scanf stops here, puts back a in the input and converts the character sequence -5.0e2 into a floating-point number, storing it in variable y which now contains the value -500.0.

At the end of this processing, the variables will contain the following values:

  • a will contain the value 1
  • b will contain the value -30
  • x will contain the value 0.4
  • y will contain the value -500.0

In the input, instead, the character a and the newline character generated by pressing the Enter key will remain. These characters can be read by subsequent calls to scanf or other input functions.

Ordinary characters in the format string

The fact that the scanf function performs a sort of pattern matching implies that the format string can also contain ordinary characters, in addition to format specifiers.

The actions that the scanf function performs when it encounters an ordinary character in the format string depend on the type of character: whether or not it is whitespace. Let's see this in detail.

  • Whitespace:

    When scanf encounters one or more whitespace characters (space, tab or newline character) in the format string, it reads and ignores all whitespace characters present in the input until it encounters a non-whitespace character which is put back in the input to be processed by the next format specifier.

    The important thing to note is that the number of whitespace characters in the format string does not matter: one, two or more whitespace characters have the same effect.

    The consequence is that a whitespace character in the format string does not require the input to contain one. In fact, whitespace in the format string matches any amount of whitespace in the input, including none at all.

    Therefore, the following code:

    int a, b;
    scanf("%d %d", &a, &b);
    

    is equivalent to:

    int a, b;
    scanf("%d%d", &a, &b);
    

    That is, both codes read two integers separated by any amount of whitespace.

  • Non-whitespace ordinary characters:

    When scanf encounters a non-whitespace ordinary character in the format string, it compares this character with the next character in the input.

    If the two characters match, scanf consumes the input character (in other words, reads and discards it) and continues processing the format string.

    Conversely, if the two characters do not match, scanf puts the non-matching character back in the input and stops without further processing the format string.

To clarify this concept, let's consider an example where we want to read a date in format DD/MM/YYYY. To do this, we can use the following code:

int day, month, year;
scanf("%d/%d/%d", &day, &month, &year);

Suppose the user enters the following input:

     15/08/2023

The scanf function processes this input as follows:

  1. The first format specifier is %d, so scanf discards the initial whitespace and reads the number 15, storing it in variable day.
  2. Subsequently, scanf encounters the character / in the format string. It compares this character with the next character in the input, which is also /. Since the two characters match, scanf consumes both and continues processing the format string.
  3. The second format specifier is %d, so scanf reads the number 08, storing it in variable month.
  4. Subsequently, scanf again encounters the character / in the format string. It compares this character with the next character in the input, which is also /. Since the two characters match, scanf consumes both and continues processing the format string.
  5. The third format specifier is %d, so scanf reads the number 2023, storing it in variable year.

At the end of this processing, the variables will contain the following values:

  • day will contain the value 15
  • month will contain the value 8
  • year will contain the value 2023

However, the same code above would not work correctly if the user entered the following input:

15 / 08 / 2023

This is because, after reading the number 15, scanf encounters the character / in the format string and compares it with the next character in the input, which in this case is a space. Since the two characters do not match, scanf stops and puts back the space in the input, without further processing the format string.

To work correctly with this type of input, we would need to modify the format string to include whitespace before and after the / characters, as follows:

int day, month, year;
scanf("%d / %d / %d", &day, &month, &year);

This way, scanf will automatically ignore the whitespace present in the input before and after the / characters, allowing the date to be read correctly even if the user enters additional spaces.

Common errors with scanf

Although the printf and scanf functions are similar in syntax and use analogous format specifiers, there are some important differences that can lead to common errors, especially when the usage of one is confused with that of the other.

A first common error is to use the address operator & also with the printf function. As we have seen, printf does not require the addresses of variables, but their values. Therefore, using the & operator with printf is an error that leads to unpredictable program behavior:

/* ERROR */
int a = 42;
printf("The value of a is: %d\n", &a); // ERROR

What happens in this case is that printf interprets the address of a as an integer, printing a value that makes no sense.

A second aspect concerns the fact that scanf automatically skips whitespace in the input before reading a number. Therefore, the format strings of scanf should, in most cases, contain only format specifiers. Many errors arise because beginner programmers write scanf format strings the same way they would write printf format strings, which leads to unexpected behavior.

For example, consider the following code:

int a, b;
scanf("%d, %d", &a, &b);

In this example, scanf first looks for an integer (%d), which it stores in variable a. Next, because of the character , in the format string, it expects the next input character to be a comma. If the user enters a space instead, scanf stops, because the next character in the input does not match what is expected. As a result, variable a will contain the read value, while variable b will remain unchanged.

Note

Never insert a newline character at the end of the scanf format string

Although the format strings passed to the printf function often end with a newline character (\n), doing the same with the scanf function is not a good idea.

For the scanf function, in fact, a newline character in the format string is interpreted as whitespace. The effect is that scanf will try to ignore all whitespace characters in the input until it encounters a non-whitespace character.

This can lead to unexpected behaviors. For example, consider the following code:

int a;
scanf("%d\n", &a);

In this example, scanf reads an integer and stores it in variable a. However, due to the newline character (\n) at the end of the format string, scanf will try to ignore all subsequent whitespace characters in the input. Consequently, if the user presses Enter after entering the number, scanf will remain waiting for further input, as it considers the newline character as whitespace to ignore.

In other words, the program will appear to freeze, waiting for the user to enter further non-whitespace characters.

To avoid this problem, it is best to never include a newline character (\n) at the end of the scanf format string. Instead, you should simply use:

int a;
scanf("%d", &a);

Example

Let's conclude this lesson with a complete example that uses the scanf function to read input in a specific format.

Suppose we want to create a program that reads two fractions from input and calculates their product. A fraction consists of a numerator and a denominator, both integers. For simplicity, suppose the user enters the fractions in the format N/D, where N is the numerator and D is the denominator.

Here is a possible code to create this program:

#include <stdio.h>

int main() {
    // Numerator and denominator of the first fraction
    int num1, den1;
    // Numerator and denominator of the second fraction
    int num2, den2;
    // Numerator and denominator of the product
    int prod_num, prod_den;

    // Read the first fraction from input
    printf("Enter the first fraction (N/D): ");
    scanf("%d/%d", &num1, &den1);

    // Read the second fraction from input
    printf("Enter the second fraction (N/D): ");
    scanf("%d/%d", &num2, &den2);

    // Calculate the product of the two fractions
    prod_num = num1 * num2;
    prod_den = den1 * den2;

    // Print the result
    printf("The product of the fractions is: %d/%d\n", prod_num, prod_den);

    return 0;
}

After compiling and running this program, the user can enter two fractions in the format N/D, and the program will calculate and print their product. For example:

Enter the first fraction (N/D): 3/4
Enter the second fraction (N/D): 2/5
The product of the fractions is: 6/20