ASCII character map
ASCII, the history
The most used computer standard is without doubt ASCII,
the American Standard Code for Information Interchange. When people started
to develop computers, they had to define a way to represent certain types of information in
a digital format. For numbers this was relatively easy, but text representation was
far more difficult. Morse code was developed in the 19th century, but
could not be easily adapted to the binary system in computers because the codes
used for characters have different lengths and there is no obvious sorting method.
In the 1960s IBM came up with its own solution, EBCDIC (Extended Binary Coded Decimal
Interchange Code), which is used on its mainframes and AS/400
systems. But this system had some drawbacks. The letters of the alphabet were placed in
non-contiguous blocks, which is not very useful for sorting.
At the same time that IBM was developing its EBCDIC solution, other computer developers
were creating their own. It soon became
evident that exchanging data between various computer systems would be a huge problem if this
diversity did not stop. It was for this reason that
Bob Bemer, now often called the Father of ASCII, compiled
all the different coding methods in a huge list.
It was this list that made computer manufacturers realize that something had to be done
about this situation quickly. Bob Bemer started standardization committees, and the first version
of ASCII was introduced in 1963. Extensions for foreign languages were added to ASCII in 1967, and in
1968 it finally became an official US government standard.
Nowadays nearly all computers use ASCII, or an encoding built on it, as their primary coding system. Most extensions
for foreign languages are coded as a superset of ASCII. Therefore we can say that ASCII is probably the
most used computer standard in the world.
ASCII character set table
The ASCII character set has been adopted as the standard in information
exchange. The first 32 characters and the last one are control codes, the
others are printable characters. The control codes DC1 (XON) and DC3 (XOFF)
are used in software flow control applications. The following table shows
the ASCII character set.
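The split between control codes and printable characters can also be produced by a few lines of code. The following is only a sketch: the helper names `ascii_name` and `ascii_table` and the 16 by 8 layout are my own choices, not part of the standard.

```python
# Short names of control codes 0-31; DEL (127) is handled separately below.
CONTROL_NAMES = [
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
]


def ascii_name(code: int) -> str:
    """Return the control-code name or the printable character for a 7-bit code."""
    if not 0 <= code <= 127:
        raise ValueError("ASCII codes run from 0 to 127")
    if code < 32:
        return CONTROL_NAMES[code]
    if code == 127:
        return "DEL"
    if code == 32:
        return "SP"  # shown by name, because a bare space is invisible in a table
    return chr(code)


def ascii_table() -> str:
    """Lay the 128 codes out in 16 rows of 8 columns (decimal value, then name)."""
    rows = []
    for row in range(16):
        cells = []
        for col in range(8):
            code = col * 16 + row
            cells.append(f"{code:3d} {ascii_name(code):<3}")
        rows.append("  ".join(cells))
    return "\n".join(rows)


print(ascii_table())
```

The first two columns of the output hold the control codes 0 to 31, and DEL ends the last column.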
ASCII control codes in detail
- 0 – NUL – Null character
- The NUL character in the ASCII character set was originally meant to be treated as
a NOP, a character to be ignored. This was useful on paper tapes where additional information
had to be added in between existing information. However, some printing devices
implemented NUL as a white space instead. Later on, the importance of the null character
increased significantly when it was defined as the string terminator in the C programming language.
This made it possible to define strings of arbitrary length, limited only by the available memory. Until then most languages,
like Pascal, defined a string as a length indicator followed by an array that contained the characters.
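The difference between the two string layouts can be sketched as follows. The one-byte length prefix is an assumption made for this illustration (it matches classic Pascal implementations, but the text above only speaks of "a length indicator"), and the function names are hypothetical.

```python
def c_string(text: str) -> bytes:
    """Encode text the C way: the characters followed by a terminating NUL."""
    return text.encode("ascii") + b"\x00"


def pascal_string(text: str) -> bytes:
    """Encode text with a one-byte length prefix (assumed Pascal-style layout)."""
    data = text.encode("ascii")
    if len(data) > 255:
        raise ValueError("a one-byte length prefix cannot describe more than 255 characters")
    return bytes([len(data)]) + data


def read_c_string(buffer: bytes) -> str:
    """Read characters up to, but not including, the first NUL."""
    end = buffer.index(b"\x00")
    return buffer[:end].decode("ascii")


def read_pascal_string(buffer: bytes) -> str:
    """Read the length byte, then exactly that many characters."""
    length = buffer[0]
    return buffer[1:1 + length].decode("ascii")
```

With the length prefix the maximum string size is fixed by the width of the prefix. With the NUL terminator there is no such limit, at the price that NUL itself can no longer be part of the text.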
- 1 – SOH – Start of heading
- If the communication primarily consists of commands and messages, the SOH can be used
to mark the beginning of each message header. In the original 1963 definition of the ASCII standard
the name start of message was used, which was renamed to start of heading
in the final release. Nowadays we often see the SOH used in serial RS232 communications
where there is a master-slave configuration. Each command from the master starts with the SOH.
This makes it possible for the slave or slaves to resynchronize on the next command when data errors
have occurred. Without a clear marking of the start of each command, a resync might be problematic to implement.
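Resynchronizing on SOH could look roughly like this. The frame layout is hypothetical and not taken from any real protocol: I assume each command is SOH followed by exactly four printable ASCII characters.

```python
SOH = 0x01
PAYLOAD_LEN = 4  # hypothetical fixed command length, chosen for this sketch only


def extract_commands(stream: bytes) -> list[bytes]:
    """Return every well-formed command found in a possibly corrupted stream."""
    commands = []
    i = 0
    while i < len(stream):
        if stream[i] != SOH:
            i += 1  # noise between commands: skip until the next SOH
            continue
        payload = stream[i + 1:i + 1 + PAYLOAD_LEN]
        if len(payload) == PAYLOAD_LEN and all(0x20 <= b <= 0x7E for b in payload):
            commands.append(payload)
            i += 1 + PAYLOAD_LEN
        else:
            i += 1  # damaged command: drop this SOH and resynchronize on the next
    return commands
```

A command that was cut short by a transmission error is dropped, and the receiver picks up again at the next SOH instead of misreading everything that follows.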
- 2 – STX – Start of text
3 – ETX – End of text
- A message based communication protocol
will probably use messages with a header containing addressing information, followed
by the actual content. The ASCII STX indicates the start of the content part in such a
message. This control code automatically ends a previous header, i.e. there is no control code to close
a header started by SOH. The end of the message content is signalled with control character
ETX. The actual contents of a message are not defined by the ASCII standard and are protocol dependent.
It is interesting to note that in the 1963 draft of the standard the names were different.
In that draft STX was called EOA, end of address, and
ETX started its life as EOM, end of message. This is because in
the original draft a message always contained a start and stop control character. The new definition made it possible
to use SOH alone to send a fixed length command, without the need to end the command
with a trailing control code. In fact, this is common in current serial protocols, where fixed
length messages are sent without a distinction between the header and content.
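A message with a header and a content part, framed as described above, can be sketched like this. ASCII only defines the three control codes; what goes into the header and the content is up to the protocol, so the example values and function names are invented.

```python
SOH, STX, ETX = b"\x01", b"\x02", b"\x03"


def build_message(header: str, content: str) -> bytes:
    """Frame a message as SOH header STX content ETX."""
    return SOH + header.encode("ascii") + STX + content.encode("ascii") + ETX


def parse_message(frame: bytes) -> tuple[str, str]:
    """Split a framed message back into (header, content)."""
    if not (frame.startswith(SOH) and frame.endswith(ETX)):
        raise ValueError("frame must start with SOH and end with ETX")
    # STX closes the header and opens the content in a single byte.
    header, stx, content = frame[1:-1].partition(STX)
    if not stx:
        raise ValueError("frame has no STX between header and content")
    return header.decode("ascii"), content.decode("ascii")
```

Note that this simple version assumes the header and content never contain the control codes themselves.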
- 4 – EOT – End of transmission
- 5 – ENQ – Enquiry
- 6 – ACK – Acknowledgment
- 7 – BEL – Audible bell
- The BEL code is an interesting one in the ASCII set as it is not primarily used for data
coding or device control. Instead it is used to attract human attention with an audible sound.
It was intended to be used on both computers and devices like printers. In the programming language C
the escape sequence \a produces the bell signal.
- 8 – BS – Backspace
- The functionality of the backspace has changed over time. In the beginning it was primarily meant to
move the cursor one character backwards on printers and teletypes to make accents on characters possible. For
example, to generate the character â one could send the sequence
a BS ^ to the printer.
This method was a practical copy of the way characters with accents were handled on mechanical typewriters,
but when CRTs were introduced it was no longer supported in that way. Therefore the backspace is now most
often used not only to reposition the cursor, but also to delete the actual contents at that position. You can
use this control character as \b in the C programming language.
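The two behaviours can be compared with a small model. Both functions are simplified sketches of my own, not descriptions of any particular printer or terminal: paper keeps everything that was struck on a position, while a screen cell keeps only the last character.

```python
BS = "\b"


def print_on_paper(text: str) -> list[str]:
    """Printer model: every column keeps all characters struck on it."""
    columns: list[str] = []
    pos = 0
    for ch in text:
        if ch == BS:
            pos = max(pos - 1, 0)  # move back without erasing anything
            continue
        if pos == len(columns):
            columns.append("")
        columns[pos] += ch
        pos += 1
    return columns


def show_on_screen(text: str) -> str:
    """CRT model: a character cell holds only the last character written to it."""
    cells: list[str] = []
    pos = 0
    for ch in text:
        if ch == BS:
            pos = max(pos - 1, 0)
            continue
        if pos == len(cells):
            cells.append(ch)
        else:
            cells[pos] = ch  # the new character replaces the old one
        pos += 1
    return "".join(cells)
```

Sending a, BS, ^ leaves both characters in the same column on paper, whereas on the screen model only the ^ survives.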
- 9 – HT – Horizontal tab
- The HT control character in the ASCII character set is defined for layout purposes.
It instructs the output device to proceed to the next tab stop, i.e. the next table column. The column width is flexible, but on many
devices the distance between tab stops defaults to 8 character positions. The use of the horizontal tab not only
reduced the work for data typists, but also introduced a method to reduce the amount of storage space
necessary for formatted texts. We may laugh about it now, but keep in mind that the ASCII standard was
developed in the 1960s, when every byte of storage was valuable and compression methods like ZIP didn't exist.
The control character HT is available as \t in the C programming language.
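The storage saving is easy to demonstrate. This sketch assumes the common default of 8 positions per column; `next_tab_stop` is a hypothetical helper, while `str.expandtabs` is the standard Python method.

```python
TAB_WIDTH = 8  # the common default mentioned above; real devices may differ


def next_tab_stop(column: int, width: int = TAB_WIDTH) -> int:
    """Return the first tab stop strictly to the right of a zero-based column."""
    return (column // width + 1) * width


line_with_tabs = "ID\tNAME\tCITY"
line_with_spaces = line_with_tabs.expandtabs(TAB_WIDTH)

print(repr(line_with_spaces))
print(len(line_with_tabs), "characters with HT,", len(line_with_spaces), "with spaces")
```

For this one line the tabbed form needs 12 characters where the space-padded form needs 20.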
- 10 – LF – Line feed
- The line feed character is one of the characters in the ASCII character set that has been misused.
Originally, the LF character was meant to move the head of a printer one line down. A second
control character, CR, would then be used to move the printing head to the left margin. This
is the way it was implemented in many serial protocols and in operating systems like MS-DOS and Windows. On the other hand,
the C programming language and the Unix operating system redefined this character as newline, which
meant a combination of line feed and carriage return. You can argue about which use is wrong.
The way C and Unix handle it is certainly more natural from a programming point of view. On the other hand,
the MS-DOS implementation is closer to the original definition. It would have been better if both line feed and
newline had been part of the original ASCII definition, because the first is a typical device control
function whereas the latter is a logical text separator. But the standard makes no such distinction. Nowadays people
tend to use the LF character mainly as a newline function, and most software that handles plain ASCII text
files is capable of handling both single LF and CR/LF combinations.
The control character is available as \n in the C programming language.
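Software that accepts both conventions usually converts one into the other when reading or writing. A minimal sketch, with function names of my own choosing:

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert every CR/LF pair to a single LF, leaving lone LFs untouched."""
    return data.replace(b"\r\n", b"\n")


def to_crlf(data: bytes) -> bytes:
    """Convert both conventions to the CR/LF pair used by MS-DOS and Windows."""
    return normalize_newlines(data).replace(b"\n", b"\r\n")


mixed = b"first\r\nsecond\nthird"
print(normalize_newlines(mixed))
print(to_crlf(mixed))
```

Normalizing first keeps `to_crlf` from turning an existing CR/LF into CR CR LF.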
- 11 – VT – Vertical tab
- The vertical tab was defined, like the horizontal tab, to reduce the amount of work for
creating layouts, and also to reduce the amount of storage space for formatted text pages. The VT
control code is used to jump to the next marked line. To be honest, I have never seen a situation or application
where this functionality was implemented. In most situations a sequence of LF codes is used instead.
- 12 – FF – Form feed
- The form feed code FF was designed to control the behaviour of printers. When receiving this code
the printer moves to the next sheet of paper. The behaviour of the control code on terminals depends
on the implementation. Some clear the screen, whereas others only display the ^L
characters or perform a line feed instead.
The shell environments Bash and Tcsh have implemented the ASCII form feed as a clear screen command.
The form feed is implemented as \f in the C programming language.
- 13 – CR – Carriage return
- The carriage return in the ASCII character set was originally meant to move the printing head
back to the left margin without moving to the next line. Over time this code has also been assigned to the
Enter key on keyboards to signal that the input of text is finished. With screen-oriented representation
of data, people wanted the cursor to move to the next line as well once data had been entered.
Therefore, in the C programming language and the Unix operating system, the LF
control code was redefined as newline. Software now often silently translates an entered
CR to the LF ASCII code when the data is stored.
- 14 – SO – Shift out
15 – SI – Shift in
- Even as early as the sixties, the people who defined the ASCII character set understood
that it would be valuable to make the character set available not only for the English alphabet,
but also for foreign ones. The shift in and shift out were defined for this
purpose. Originally they were meant to switch between the Cyrillic and Latin alphabets. The Cyrillic
ASCII definition which uses the shift characters is KOI-7. Later on these control codes were also
used to change the typeface on printers. In this use SO produced double wide characters,
whereas condensed printing was selected with SI.
- 16 – DLE – Data link escape
- It is sometimes necessary in an ongoing data communication to send control characters. There
are situations where those control characters might be understood as part of the normal data stream.
The DLE has been defined in the ASCII standard for these situations. If this character
is detected in a data stream, the receiving party knows that one or more of the following characters
must be interpreted in a different way than the other characters in the stream. The exact interpretation
of the following characters is not part of the ASCII definition, only the ability to break out of
a communication stream with the data link escape. In the Hayes communication protocol for modems, the
data link escape has been defined as silence+++silence. In my opinion
it would have been a better idea if the Hayes protocol had used the DLE instead, as
it does not need to be surrounded by communication silence, and it would fit within an existing standard.
However, the developers of Hayes decided otherwise and now the +++ sequence is used
far more often than the original DLE.
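One possible use of DLE is sketched below. The scheme is hypothetical, since ASCII leaves the interpretation to the protocol: I assume a frame of the form STX payload ETX in which every payload byte equal to STX, ETX or DLE is preceded by a DLE.

```python
STX, ETX, DLE = 0x02, 0x03, 0x10
SPECIAL = {STX, ETX, DLE}


def stuff(payload: bytes) -> bytes:
    """Frame payload as STX ... ETX, prefixing any special byte inside it with DLE."""
    out = bytearray([STX])
    for byte in payload:
        if byte in SPECIAL:
            out.append(DLE)
        out.append(byte)
    out.append(ETX)
    return bytes(out)


def unstuff(frame: bytes) -> bytes:
    """Reverse stuff(): strip the framing and drop the DLE escape markers."""
    if len(frame) < 2 or frame[0] != STX or frame[-1] != ETX:
        raise ValueError("frame must start with STX and end with ETX")
    out = bytearray()
    escaped = False
    for byte in frame[1:-1]:
        if not escaped and byte == DLE:
            escaped = True  # the next byte is data, whatever its value
            continue
        out.append(byte)
        escaped = False
    if escaped:
        raise ValueError("frame ends in the middle of an escape sequence")
    return bytes(out)
```

With such a scheme arbitrary binary data, including the control codes themselves, can travel inside a frame that is delimited by control codes.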
- 17 – DC1 – Device control 1 / XON – Transmission on
- Although originally defined as DC1, this ASCII control code is now better
known as the XON code used for software flow control
in serial communications. The main use is restarting the transmission
after the communication has been stopped by the XOFF control code. People who used to work with serial
terminals probably remember that sometimes when data errors occurred, it helped to hit the Ctrl-Q key. This
is because this key sequence in fact generates the XON control code, which unlocks a blocked communication
when the terminal or host computer has accidentally interpreted an erroneous character as XOFF.
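The sending side of software flow control can be modelled in a few lines. The `Transmitter` class is a hypothetical toy, not a serial driver, and `ctrl` shows the conventional way a terminal derives a control code from a key by keeping only the low five bits.

```python
from typing import Optional

XON, XOFF = 0x11, 0x13  # DC1 and DC3


def ctrl(key: str) -> int:
    """Return the code a terminal conventionally sends for Ctrl plus a letter key."""
    return ord(key.upper()) & 0x1F


class Transmitter:
    """Sends queued bytes one at a time unless the peer has asked it to pause."""

    def __init__(self, data: bytes) -> None:
        self.pending = list(data)
        self.paused = False

    def receive(self, byte: int) -> None:
        """Handle a byte arriving from the peer; only XON and XOFF matter here."""
        if byte == XOFF:
            self.paused = True
        elif byte == XON:
            self.paused = False

    def send_next(self) -> Optional[int]:
        """Return the next byte to transmit, or None when paused or finished."""
        if self.paused or not self.pending:
            return None
        return self.pending.pop(0)
```

If line noise is misread as XOFF, the transmitter in this model stays silent until something delivers an XON, which is what hitting Ctrl-Q does.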
- 18 – DC2 – Device control 2
- 19 – DC3 – Device control 3 / XOFF – Transmission off
- 20 – DC4 – Device control 4
- 21 – NAK – Negative acknowledgment
- 22 – SYN – Synchronous idle
- 23 – ETB – End of transmission block
- 24 – CAN – Cancel
- 25 – EM – End of medium
- The EM is used at the end of a serial storage medium like paper tape or magnetic reels.
It indicates the logical end of the data. It is not necessary that this is also the physical end of
the data carrier.
- 26 – SUB – Substitute character
- 27 – ESC – Escape
- The escape character is one of the inventions in the ASCII standard that was proposed by Bob Bemer.
It is used to start an extended sequence of control codes. In this way it was not necessary to put all
conceivable control codes in the ASCII standard. As new technologies would need new control commands, the
ESC would be available as the starting character of these multi-character commands. Escape
codes are widely used in printers and terminals to control device settings like fonts, text positioning and
colors. If ESC had been absent in the original ASCII definition, the standard would likely
have been superseded by some other standard in the past. The escape possibility allowed developers to literally
escape from the standard where necessary, but use it whenever possible.
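As one concrete family of such multi-character commands, the sketch below builds the escape sequences that ANSI-compatible terminals use for colors and cursor positioning. These sequences are my choice of illustration and come from the later terminal standards, not from ASCII itself; the helper names are hypothetical.

```python
ESC = "\x1b"


def sgr(*codes: int) -> str:
    """Build a 'Select Graphic Rendition' sequence: ESC [ codes m."""
    return f"{ESC}[{';'.join(str(c) for c in codes)}m"


def move_cursor(row: int, col: int) -> str:
    """Build a cursor-position sequence: ESC [ row ; col H (both 1-based)."""
    return f"{ESC}[{row};{col}H"


RESET, BOLD, RED = 0, 1, 31
print(sgr(BOLD, RED) + "warning" + sgr(RESET))
```

On a terminal that understands these sequences the word is shown in bold red. A device that does not will show the raw characters after the ESC instead.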
- 28 – FS – File separator
- The file separator FS is an interesting control code, as it gives us insight into the
way that computer technology was organized in the sixties. We are now used to random access media like RAM
and magnetic disks, but when the ASCII standard was defined, most data was serial. I am not only talking about
serial communications, but also about serial storage like punch cards, paper tape and magnetic tapes.
In such a situation it is clearly efficient to have a single control code to signal the separation of two files.
The FS was defined for this purpose.
- 29 – GS – Group separator
- Data storage was one of the main reasons for some control codes to get into the ASCII definition. Databases
are most of the time set up with tables containing records. All records in one table have the same type, but
records of different tables can be different. The group separator GS is defined to
separate tables in a serial data storage system. Note that the word table wasn't used at that time
and the ASCII people called it a group.
- 30 – RS – Record separator
- Within a group (or table) the records are separated with RS or record separator.
- 31 – US – Unit separator
- The smallest data items to be stored in a database are called units in the ASCII definition. We
would call them fields now. The unit separator separates these fields in a serial data storage
environment. Many current database implementations require that fields of most types have a fixed length.
Enough space in the record is allocated to store the largest possible member of each field, even if this is
not necessary in most cases. This costs a large amount of space in many situations. The US
control code allows all fields to have a variable length. If data storage space is limited, as it was
in the sixties, this is a good way to save valuable space. On the other hand, serial storage is
far less efficient than the table driven RAM and disk implementations of modern times. I can't imagine a
situation where modern SQL databases are run with the data stored on paper tape or magnetic reels...
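The hierarchy of the four separators can be sketched as nested joins and splits. The sample data and function names are invented, and the sketch assumes the separators never occur inside a field.

```python
FS, GS, RS, US = "\x1c", "\x1d", "\x1e", "\x1f"


def serialize(groups: list[list[list[str]]]) -> str:
    """Join fields with US, records with RS and groups with GS."""
    return GS.join(
        RS.join(US.join(record) for record in group)
        for group in groups
    )


def deserialize(data: str) -> list[list[list[str]]]:
    """Split one serialized file back into groups, records and fields."""
    return [
        [record.split(US) for record in group.split(RS)]
        for group in data.split(GS)
    ]


def split_files(stream: str) -> list[str]:
    """Split a serial stream, such as a tape, into its files at every FS."""
    return stream.split(FS)
```

Every field takes exactly as many characters as its value needs plus one separator, which is the space saving described above.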
- 32 – SP – White space
- You can argue about whether the space character is a real control character, as it is so widely used in normal
texts. But as the horizontal tab and backspace are also called control characters in the ASCII set, I think it
is most natural to call the white space or forward space
a control character as well. After all, it doesn't represent
a character by itself, but merely a command to the output device to proceed one position forward,
clearing the information in the current field. In many applications
like word processors the white space is also a character that can cause lines to wrap, and web browsers collapse
multiple spaces into just one output character. This strengthens my belief
that it does not just represent a unique character, but is an information carrier for devices and applications.
- 127 – DEL – Delete
- One might question why all control codes in the ASCII character set have low values, but
the DEL control code has value 127. This is because this specific character was defined for deleting data
on paper tapes. Most paper tapes at that time used 7 holes to code the data. The value 127 represents
a binary pattern where all seven bits are high, so when the DEL character is punched over an existing character on a paper tape, all holes
are punched and the existing data is erased.
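In terms of bits, punching extra holes can only turn zeros into ones, which is a bitwise OR. A small sketch, with `punch_all_holes` as a hypothetical name:

```python
DEL = 0x7F  # decimal 127


def punch_all_holes(code: int) -> int:
    """Overpunch a 7-bit character with DEL; holes can be added but never removed."""
    return code | DEL


print(format(DEL, "07b"))                  # the seven-hole pattern of DEL
print(punch_all_holes(ord("A")) == DEL)    # any character ends up as DEL
```

This works for every one of the 128 codes, which is why no other value would do as the delete character on tape.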