The char Type in C
The last fundamental, or primitive, type in the C language is the char type.
Through it we can represent a single alphanumeric character. Furthermore, the C language treats characters as integer numbers, allowing us to perform arithmetic operations on them.
In this lesson we will study the char type in detail, analyzing char variables, operations on char values and escape sequences.
char Type
The char type is a data type that represents a single character, such as a letter, a digit, a punctuation mark or a space.
The numeric value that represents a given character can vary from one system to another, since different systems can use different character sets.
However, nowadays the most widespread character set is ASCII (American Standard Code for Information Interchange), which represents alphanumeric characters in 7 bits. Therefore it is capable of representing 128 characters.
For example, in ASCII, the character '0' which represents the number 0 has the value 48, the character 'A' has the value 65, the character 'a' has the value 97, etc.
ASCII is often extended to 8 bits, exploiting the additional bit. In this way it is possible to represent up to 256 characters with a single byte. There is no single extension of ASCII, but there are different standards, such as ISO-8859-1 (also known as Latin-1), that extend ASCII to support accented characters and symbols.
In general, however, the char type is a data type that occupies exactly 1 byte of memory, which is 8 bits on virtually all modern systems (the C standard requires at least 8). On the vast majority of systems, the character set used is ASCII or one of its extensions. Therefore, when programming in C, it is usually reasonable to assume ASCII encoding.
There are other types of encodings capable of representing a greater number of characters, such as Unicode. We will deal with how to handle Unicode characters in C in future lessons.
To declare a variable of type char the syntax is as follows:
char character;
char constants are written between single quotes, for example:
char character1 = 'A';
char character2 = '0';
char space = ' ';
char Type
The char type in C language is a type capable of representing a single alphanumeric character:
char character;
The C language standard requires that a value of type char always occupies a single byte of memory.
char constants are written between single quotes:
char character = 'A';
Operations on Values of Type char
In C language, working with variables and values of type char is very simple.
This is due to the simple fact that in C char values are treated as integer numbers.
After all, since each individual character is encoded in binary, based on the ASCII standard or one of its derivatives, it follows that the corresponding binary code itself can be seen as an integer number.
In the ASCII standard, as we mentioned above, character values are integer numbers between 0 and 127.
When a character appears in an expression in C, the compiler substitutes it with the corresponding ASCII value. We can even assign a numeric value to a variable of type char, or assign a value of type char to a variable of type int.
Let's take the following example:
int i = 'A';
In this case, the variable i will contain the value 65, which is the ASCII value of the character 'A'. After the assignment, we can use the variable i in any numeric expression.
Let's take another example:
/* ch will be 'A', i.e. 65 */
char ch = 'A';
/* Increments the value of ch by 1 */
/* ch will be 'B', i.e. 66 */
ch++;
/* Increments the value of ch by 5 */
/* ch will be 'G', i.e. 71 */
ch += 5;
In this case, the variable ch is incremented by 1 and then by 5. Since characters are encoded sequentially, ch will go from 'A' to 'B' and then to 'G'.
char values, being numbers, can be used in relational expressions.
For example, the following code checks if a certain character is a lowercase letter:
char character;
/* ... */
if (character >= 'a' && character <= 'z') {
    printf("The character is a lowercase letter\n");
} else {
    printf("The character is not a lowercase letter\n");
}
When used in expressions of this type, the corresponding numeric value is used. This is possible thanks to the fact that characters are encoded sequentially. Therefore, for example, 'a' has a lower value than 'b', which has a lower value than 'c', etc.
We can exploit this characteristic to our advantage.
For example, we can easily write lines of code that transform an uppercase character into a lowercase character and vice versa.
char character;
/* ... */
if (character >= 'a' && character <= 'z') {
    character -= 32; /* Transforms the character to uppercase */
} else if (character >= 'A' && character <= 'Z') {
    character += 32; /* Transforms the character to lowercase */
}
Or, we can implement a for loop that iterates through all lowercase letters of the alphabet:
for (char c = 'a'; c <= 'z'; c++) {
    printf("%c ", c);
}
Obviously, treating char as numbers can also cause errors. For example, some expressions might be devoid of any meaning:
'a' * 'c' + 'd' / 'b';
Furthermore, another system may use an encoding other than ASCII, so code that relies on specific character values is less portable.
Signed and Unsigned Characters
Since C treats char as integer numbers, it should not be surprising that char can be signed or unsigned.
A signed char can typically represent numbers in the range from -128 to 127, while an unsigned char can typically represent numbers in the range from 0 to 255.
The C standard does not specify whether a char is signed or unsigned. Some compilers treat char as signed integers, others as unsigned integers. Other compilers even allow choosing whether to treat char as signed or unsigned integers.
In most cases this distinction does not matter, provided we use char to represent characters.
If, instead, we use char to represent small numbers, it is better to specify whether we want char to be signed or unsigned.
To do this, simply explicitly indicate whether we want a signed or unsigned char with the keywords signed and unsigned.
signed char c1 = -1;
unsigned char c2 = 255;
Never Assume That char Is Signed or Unsigned
Never make assumptions about whether the char type is signed or not. If necessary, explicitly specify whether you want a signed or unsigned char with the keywords signed and unsigned.
Since char values are integer numbers, C groups char together with short, int, long and long long as integer types.
Escape Sequences
A char constant is usually represented as a single character enclosed in single quotes.
However, there are special characters that cannot be represented directly with a single character for two reasons:
- they are invisible characters, i.e. non-printable;
- or they are characters that cannot be entered through the keyboard.
An example is the newline character which corresponds to the Enter key on the keyboard.
To allow programs to handle all possible characters, C uses a special notation to represent these special characters, called escape sequences.
Escape Sequences
Escape sequences are a notation that allows representing special characters that cannot be inserted directly in a string or character literal.
Escape sequences start with a backslash (\) followed by one or more characters.
Escape Sequences are divided into two types:
- Escape Sequences for Special Characters;
- Numeric Escape Sequences.
Let's see them in detail.
Escape Sequences for Special Characters
Escape sequences for special characters are represented by a backslash (\) followed by a character.
The following table lists the escape sequences recognized by the C language:
| Escape Sequence | Description |
|---|---|
| \' | Single Quote |
| \" | Double Quote |
| \? | Question Mark |
| \\ | Backslash |
| \a | Beep |
| \b | Backspace |
| \f | Form Feed |
| \n | Newline |
| \r | Carriage Return |
| \t | Tab |
| \v | Vertical Tab |
The first two sequences in the table represent characters that cannot be inserted directly in a character literal or a string, respectively, as they would be interpreted as delimiters.
For example, if we wanted to represent a string containing a double quote, we would have to use the escape sequence \".
printf("This is a \"quotation\"");
The escape sequence \? represents the question mark but is rarely used.
The sequence \\ allows inserting a backslash within a string, where a lone backslash would otherwise be interpreted as the beginning of an escape sequence.
The remaining sequences, instead, represent some ASCII control characters, such as the newline character (\n), the carriage return character (\r), the tab character (\t), etc.
Numeric Escape Sequences
Escape Sequences for special characters are very useful, but they do not cover all possible special characters.
They cover only the most common non-printable characters. Furthermore, these escape sequences are useless for representing characters that have values greater than the maximum value representable in ASCII, i.e. 127.
What to do, then, if we want to represent a character that is not covered by escape sequences?
The answer is: use Numeric Escape Sequences.
Through them it is possible to represent any character based on its ASCII value.
Let's take, for example, the special character ESC which, in ASCII, has value 27. We can represent this character through a numeric escape sequence in two ways:
- An Octal numeric escape sequence: just convert the corresponding number to octal and prepend the character \ or \0. Returning to the example of the ESC character, in octal its value is 33. Therefore, we can represent the ESC character as:
char esc = '\033';
or:
char esc = '\33';
- A Hexadecimal numeric escape sequence: just convert the corresponding number to hexadecimal and prepend the character \x. Returning to the example of the ESC character, in hexadecimal its value is 1B. Therefore, we can represent the ESC character as:
char esc = '\x1B';
Note that the x must be lowercase, while the number can be written in uppercase or lowercase:
char esc = '\x1b';
Numeric Escape Sequences
Numeric Escape Sequences are a special notation that allows representing any character based on its ASCII value.
They can be:
- Octal: the syntax is \ or \0 followed by an octal number;
- Hexadecimal: the syntax is \x followed by a hexadecimal number.
In Summary
In this lesson we introduced the char type in C language.
The main points to remember are:
- The char type is a data type that represents a single alphanumeric character;
- The char type occupies 1 byte of memory;
- char values are treated as integer numbers;
- char values can be signed or unsigned;
- Escape sequences allow representing special characters that cannot be inserted directly in a string or character literal;
- Escape sequences are divided into escape sequences for special characters and numeric escape sequences.
In the next lesson we will study the standard library functions for character manipulation.