Giving Every Character a Number

You already know that computers store everything as binary — patterns of 0s and 1s. But how does a computer store text? How does it know the difference between the letter “A”, the number “7”, and a question mark?

The answer is surprisingly simple: every character is given a unique number. When you type the letter “A” on your keyboard, the computer does not store the shape of the letter — it stores the number 65. When it needs to display that character on screen, it looks up number 65 in a table and draws the corresponding shape.

This system of mapping characters to numbers is called character encoding. Think of it like a secret code book — everyone agrees that “A” means 65, “B” means 66, a space means 32, and so on. As long as every computer uses the same code book, text can be stored, transmitted, and displayed correctly everywhere.

There are two main character encoding standards you need to know for your GCSE:

Key Concept: Character Encoding Character encoding is an agreed-upon system that assigns a unique number to every character (letters, digits, punctuation, symbols). Computers store these numbers in binary. Without character encoding, a computer would have no way to know what text you typed — it would just see meaningless numbers.

ASCII — The Original Standard

ASCII stands for American Standard Code for Information Interchange. It was created in 1963 and quickly became the universal way to encode text in early computers. ASCII uses 7 bits per character, which means it can represent 2⁷ = 128 different characters (numbered 0 to 127).

Those 128 characters include:

  • The uppercase letters A–Z and the lowercase letters a–z
  • The digits 0–9
  • Punctuation marks and common symbols (full stop, comma, ?, !, @ and so on)
  • The space character
  • Invisible control characters, such as newline and tab

Key ASCII Code Ranges

You do not need to memorise the entire ASCII table, but you must know these key values:

  Character   ASCII Code   Binary (7-bit)
  Space       32           0100000
  0           48           0110000
  1           49           0110001
  9           57           0111001
  A           65           1000001
  B           66           1000010
  Z           90           1011010
  a           97           1100001
  b           98           1100010
  z           122          1111010

How ASCII Enables Alphabetical Sorting

Notice that the letters are in order: A=65, B=66, C=67, and so on. This is not a coincidence — it was designed this way deliberately. Because “A” has a smaller code number than “B”, a computer can sort words alphabetically simply by comparing the numbers. The computer does not need to “know” the alphabet — it just compares the codes.
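
This comparison-based sorting is easy to see in Python (used here purely for illustration — any language exposes the same codes through similar functions):

```python
# A computer sorts words by comparing character codes, not by "knowing" the alphabet.
words = ["Cherry", "Apple", "Banana"]

# sorted() compares the strings character by character using their code numbers,
# which puts them in alphabetical order automatically.
print(sorted(words))  # ['Apple', 'Banana', 'Cherry']

# ord() reveals the code number behind each character.
print(ord("A"), ord("B"), ord("C"))  # 65 66 67

# Because 65 < 66, "Apple" sorts before "Banana".
print(ord("A") < ord("B"))  # True
```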

The Uppercase/Lowercase Pattern

There is a neat relationship between uppercase and lowercase letters in ASCII:

ASCII Case Pattern
  A = 65    a = 97    difference = 32
  B = 66    b = 98    difference = 32
  C = 67    c = 99    difference = 32
  ...
  Z = 90    z = 122   difference = 32

  The lowercase version is ALWAYS 32 more than the uppercase.

This means that to convert an uppercase letter to lowercase, a computer simply adds 32 to the ASCII code. To convert lowercase to uppercase, it subtracts 32. This is incredibly efficient and still used in programming today.
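
The add-or-subtract-32 trick can be sketched in Python using ord() (character to code) and chr() (code back to character). The helper names here are just for illustration:

```python
def to_lowercase(letter):
    # Adding 32 to an uppercase code gives the lowercase code (valid for A-Z only).
    return chr(ord(letter) + 32)

def to_uppercase(letter):
    # Subtracting 32 from a lowercase code gives the uppercase code (valid for a-z only).
    return chr(ord(letter) - 32)

print(to_lowercase("A"))  # a  (65 + 32 = 97)
print(to_uppercase("z"))  # Z  (122 - 32 = 90)
```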

The Limitation of ASCII

With only 128 characters, ASCII can represent English text and nothing more. It has no room for:

  • Accented European letters such as é or ü
  • Non-Latin writing systems such as Chinese, Arabic, Hindi, Japanese, or Korean
  • Mathematical and currency symbols beyond the basic set
  • Emojis

In the 1960s this was fine — computers were mostly used by English-speaking scientists and engineers. But as computers spread around the world, 128 characters was nowhere near enough. Something far bigger was needed.

Did You Know? Before graphical interfaces existed, people used ASCII characters to create images called ASCII art. By carefully arranging letters, numbers, and symbols, artists could draw pictures using nothing but text. ASCII art is still popular today in programming culture — many command-line tools display ASCII art logos when they start up. Here is a tiny example:
  /\_/\
 ( o.o )
  > ^ <
 /|   |\
(_|   |_)

Unicode — The Universal Standard

As the internet connected people across the globe, it became clear that the world needed a single encoding system that could handle every writing system. The solution was Unicode, first published in 1991.

Unicode uses up to 32 bits per character, giving it the theoretical capacity to represent over 4 billion characters. In practice, Unicode currently defines over 149,000 characters covering:

  • Every modern writing system (Latin, Chinese, Arabic, Japanese, Hindi, Korean, and many more)
  • Historical scripts
  • Mathematical and technical symbols
  • Thousands of emojis

Backwards Compatibility with ASCII

A brilliant design decision: the first 128 Unicode characters are identical to ASCII. This means that A is still 65, a is still 97, and 0 is still 48 in Unicode. Any old ASCII text file is automatically valid Unicode too. This is called backwards compatibility — the new system works with everything the old system created.
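
This backwards compatibility is easy to demonstrate in Python (a quick sketch):

```python
# The first 128 Unicode code points are identical to ASCII.
print(ord("A"), ord("a"), ord("0"))  # 65 97 48 — the same codes as ASCII

# Bytes saved as pure ASCII decode perfectly as UTF-8, with no conversion needed.
ascii_bytes = "Hello".encode("ascii")
print(ascii_bytes.decode("utf-8"))  # Hello
```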

UTF-8: The Most Common Encoding

Unicode defines the numbers (called code points), but it needs an encoding scheme to turn those numbers into actual bytes stored on disk. The most widely used encoding is UTF-8, which uses a clever variable-length system:

  • Characters 0–127 (the ASCII set): 1 byte
  • Most other alphabets (accented letters, Greek, Cyrillic, Arabic, etc.): 2 bytes
  • Most Chinese, Japanese, and Korean characters: 3 bytes
  • Emojis and the rarest characters: 4 bytes

This is incredibly efficient: English text takes exactly the same space in UTF-8 as in ASCII (1 byte per character), while characters from other languages use only as many bytes as they need.
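
You can check UTF-8's variable lengths directly in Python with the built-in encode() method (a small sketch):

```python
# Each character's UTF-8 byte count depends on its code point.
for character in ["A", "é", "你", "😀"]:
    utf8_bytes = character.encode("utf-8")
    print(character, "->", len(utf8_bytes), "byte(s)")
# A -> 1 byte(s)   (ASCII range)
# é -> 2 byte(s)
# 你 -> 3 byte(s)
# 😀 -> 4 byte(s)
```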

Emojis in Unicode

Every emoji has a unique Unicode code point, written with a “U+” prefix followed by a hexadecimal number:

  Emoji   Name                    Code Point   UTF-8 Bytes
  😀      Grinning Face           U+1F600      4 bytes
  👍      Thumbs Up               U+1F44D      4 bytes
  ❤       Red Heart               U+2764       3 bytes
  🚀      Rocket                  U+1F680      4 bytes
  🎮      Video Game Controller   U+1F3AE      4 bytes

Why do emojis look different on different devices? The Unicode standard only defines the code point and a general description (e.g. “grinning face”). Each company — Apple, Google, Samsung, Microsoft — designs its own artwork for each emoji. That is why the same thumbs-up emoji (U+1F44D) can look quite different on an iPhone versus an Android phone. The code is the same, but the visual design is different.
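
Code points and characters can be converted back and forth in Python with chr() and ord() (a quick sketch; the 0x prefix marks a hexadecimal number):

```python
# chr() turns a Unicode code point into its character.
print(chr(0x1F44D))  # 👍 (Thumbs Up, U+1F44D)

# ord() goes the other way; hex() shows the familiar U+ style number.
print(hex(ord("😀")))  # 0x1f600
```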

Storage Comparison

How Much Space Does Text Take?
  Text: "Hello"
  ASCII:  H(72) e(101) l(108) l(108) o(111)  = 5 bytes
  UTF-8:  H(72) e(101) l(108) l(108) o(111)  = 5 bytes  (same!)

  Chinese text: 你好 ("hello" in Chinese)
  UTF-8:  你(3 bytes) 好(3 bytes)              = 6 bytes

  Emoji: 😀
  UTF-8:  😀(4 bytes)                        = 4 bytes

  Key insight: UTF-8 is efficient because simple characters
  use fewer bytes, and complex ones use more.
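
The byte counts in the comparison above can be verified with Python's encode() method (a small sketch):

```python
# Measure how many bytes each piece of text takes in UTF-8.
for text in ["Hello", "你好", "😀"]:
    size = len(text.encode("utf-8"))
    print(text, "=", size, "bytes")
# Hello = 5 bytes
# 你好 = 6 bytes
# 😀 = 4 bytes
```
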
Key Concept: ASCII vs Unicode Comparison

  Feature                   ASCII                        Unicode (UTF-8)
  Bits per character        7 bits (stored in 1 byte)    8 to 32 bits (1 to 4 bytes)
  Total characters          128                          Over 149,000
  Languages supported       English only                 Every written language
  Emojis                    None                         Thousands
  Storage (English text)    1 byte per character         1 byte per character
  Storage (other scripts)   Not supported                2–4 bytes per character
  Backwards compatible?     —                            Yes (first 128 = ASCII)

Did You Know? The Unicode Consortium (the organisation that manages Unicode) receives proposals for new emojis every year. Each proposal must include evidence of expected usage, explain why existing emojis cannot fill the role, and show that the emoji would be used frequently in text communication. It takes about two years from proposal to appearing on your phone. As of 2024, Unicode includes over 3,600 emoji characters!

Interactive Exercises

Exercise 1: ASCII Code Lookup

A random character will appear below. Type its ASCII code number.

Hint: A=65, a=97, 0=48. Letters and digits are in order from there.


Exercise 2: ASCII Decoder

The ASCII codes below spell out a word. Decode them and type the word.

Remember: A=65, a=97. Work out each letter from the code.

72 101 108 108 111

Exercise 3: Text to ASCII Converter

Type any text below and see its ASCII/Unicode codes, binary representation, and byte count in real time.

Test Yourself

Click on each question to reveal the answer. Try to work it out yourself first!

Q1: How many characters can 7-bit ASCII represent?

Answer: 128 characters.

With 7 bits, the number of possible combinations is 2⁷ = 128. These are numbered 0 to 127.

Q2: What is the ASCII code for 'A'? And for 'a'?

Answer: A = 65, a = 97.

The difference between an uppercase and its lowercase equivalent is always 32. So to go from ‘A’ (65) to ‘a’ (97), add 32. This pattern holds for every letter in the alphabet.

Q3: Why is ASCII limited? Give two reasons.

Answer:

  1. ASCII only supports English characters. It cannot represent letters from other languages such as Chinese, Arabic, Hindi, or even accented European letters like é or ü.
  2. With only 128 characters, there is no room for additional symbols, emojis, or the characters needed by the billions of people who do not write in English.
Q4: How does Unicode solve ASCII's limitations?

Answer: Unicode uses up to 32 bits per character, allowing it to represent over 149,000 characters. This includes every modern writing system (Chinese, Arabic, Japanese, Hindi, Korean, etc.), historical scripts, mathematical symbols, and emojis. Unicode is also backwards compatible with ASCII — the first 128 Unicode characters are identical to ASCII, so existing ASCII text works without any changes.

Q5: What is UTF-8 and why is it commonly used?

Answer: UTF-8 is the most widely used encoding scheme for Unicode. It uses a variable-length system: ASCII characters take only 1 byte, while other characters use 2, 3, or 4 bytes as needed. This makes it efficient because English text is no larger than in ASCII, but it can still represent every Unicode character. UTF-8 is used by the vast majority of websites, emails, and modern software.

Q6: Decode these ASCII codes: 80 121 116 104 111 110

Answer: Python

Working: 80=P, 121=y, 116=t, 104=h, 111=o, 110=n.

Method: P is the 16th letter of the alphabet, and uppercase letters start at 65, so P = 65 + 15 = 80. The lowercase letters: y = 97 + 24 = 121, t = 97 + 19 = 116, h = 97 + 7 = 104, o = 97 + 14 = 111, n = 97 + 13 = 110.
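
The same decoding can be done in one line of Python, which is a handy way to check your working (an illustrative sketch):

```python
# chr() maps each ASCII code back to its character.
codes = [80, 121, 116, 104, 111, 110]
word = "".join(chr(code) for code in codes)
print(word)  # Python
```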

Q7: A text file contains 500 ASCII characters. How many bytes is the file?

Answer: 500 bytes.

Each ASCII character is stored using 1 byte (8 bits, though only 7 are needed for the code — the 8th bit is typically 0). So 500 characters × 1 byte = 500 bytes. In UTF-8, these same 500 ASCII characters would also be 500 bytes, since UTF-8 uses 1 byte for any character in the 0–127 range.
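
This calculation can be confirmed in Python (a sketch; "A" * 500 simply builds a 500-character string):

```python
text = "A" * 500  # a 500-character ASCII string

print(len(text.encode("ascii")))  # 500 — one byte per character in ASCII
print(len(text.encode("utf-8")))  # 500 — the same in UTF-8, since every character is in the 0-127 range
```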

Key Vocabulary

Make sure you know all of these terms for your exam:

ASCII
  American Standard Code for Information Interchange — a 7-bit character encoding that represents 128 characters including English letters, digits, punctuation, and control characters.

Unicode
  A universal character encoding standard that can represent over 149,000 characters from every writing system in the world, plus emojis and symbols.

UTF-8
  The most common Unicode encoding scheme. Uses variable-length encoding: 1 byte for ASCII characters, 2–4 bytes for other characters. Backwards compatible with ASCII.

Character Encoding
  An agreed-upon system that assigns a unique number to each character (letter, digit, symbol) so that computers can store, transmit, and display text.

Code Point
  The unique number assigned to a character in Unicode, written with a “U+” prefix and a hexadecimal value (e.g. U+0041 for ‘A’, U+1F600 for the grinning face emoji).

Backwards Compatible
  When a newer system is designed to work correctly with data or files created for an older system. Unicode is backwards compatible with ASCII because its first 128 code points are identical.

Exam Tips

Exam Tip 1: Memorise Three Key Codes You do not need to memorise the full ASCII table, but you must know these three starting points: A = 65, a = 97, and 0 = 48. From these, you can work out any letter or digit. For example, ‘G’ is the 7th letter, so G = 65 + 6 = 71. The digit ‘5’ = 48 + 5 = 53.
Exam Tip 2: ASCII vs Unicode — The Compare Question A very common exam question asks you to compare ASCII and Unicode. A strong answer covers four points: (1) ASCII uses 7 bits per character, Unicode uses up to 32 bits. (2) ASCII represents 128 characters, Unicode represents over 149,000. (3) ASCII only covers English, Unicode covers every language and emojis. (4) Unicode is backwards compatible with ASCII. Always include a point about storage: ASCII files are smaller for the same English text, but Unicode is necessary for multilingual content.
Exam Tip 3: File Size Calculations In ASCII, each character = 1 byte, so the file size in bytes equals the number of characters. A 2,000-character ASCII file = 2,000 bytes. In UTF-8, English characters are still 1 byte each, but characters from other languages may be 2, 3, or 4 bytes. If a question specifies “ASCII text,” you can assume 1 byte per character. Always show your working: number of characters × bytes per character = file size.
Exam Tip 4: Common Mistakes to Avoid
  • Saying “Unicode uses 2 bytes per character.” This is a common misconception. UTF-8 uses a variable number of bytes (1 to 4). Only the older UTF-16 encoding uses a minimum of 2 bytes.
  • Forgetting backwards compatibility. If asked “what is the advantage of Unicode over ASCII,” always mention that Unicode is backwards compatible — it does not break existing ASCII files.
  • Confusing the character with its code. The digit character ‘7’ has ASCII code 55, not 7. The character ‘0’ has code 48, not 0. This trips up many students.

Past Paper Questions

Try these exam-style questions, then click to reveal the mark scheme answer.

Explain the difference between ASCII and Unicode character encoding. [2 marks]

Mark scheme:

  • ASCII uses 7 bits per character and can represent 128 characters (1)
  • Unicode uses up to 32 bits per character and can represent characters from every language in the world (1)
The ASCII code for the letter 'A' is 65. What is the ASCII code for the letter 'D'? [1 mark]

Mark scheme:

  • 68 (65 + 3) (1)
Give one advantage and one disadvantage of using Unicode instead of ASCII. [2 marks]

Mark scheme:

  • Advantage: Unicode can represent characters from all languages / supports emojis and special symbols (1)
  • Disadvantage: Unicode uses more bits per character so files are larger / requires more storage space (1)

Character Encoding in Everyday Life

Character encoding is something you use every single day without thinking about it:

  • Every text message, email, and social media post you send is encoded — almost always in UTF-8
  • Every emoji you type is a Unicode code point
  • Every web page you visit tells your browser which encoding to use so the text displays correctly
  • Every file name on your computer is stored as encoded text

Have you ever seen strange characters like “�” or “Ã©” on a website? That happens when the encoding is wrong — the browser is trying to decode bytes using one encoding scheme when the file was saved in a different one. For example, the letter “é” saved in UTF-8 but read using an older one-byte encoding appears as “Ã©”. Understanding character encoding helps you understand and fix these problems.

From the simple 128-character ASCII table of the 1960s to Unicode’s 149,000+ characters today, character encoding is a brilliant example of how computer science evolves to meet the needs of a connected world.
