Giving Every Character a Number
You already know that computers store everything as binary — patterns of 0s and 1s. But how does a computer store text? How does it know the difference between the letter “A”, the number “7”, and a question mark?
The answer is surprisingly simple: every character is given a unique number. When you type the letter “A” on your keyboard, the computer does not store the shape of the letter — it stores the number 65. When it needs to display that character on screen, it looks up number 65 in a table and draws the corresponding shape.
This system of mapping characters to numbers is called character encoding. Think of it like a secret code book — everyone agrees that “A” means 65, “B” means 66, a space means 32, and so on. As long as every computer uses the same code book, text can be stored, transmitted, and displayed correctly everywhere.
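You can see this mapping directly in Python, which exposes it through the built-in `ord()` and `chr()` functions:

```python
# ord() gives the code number for a character; chr() goes the other way.
print(ord("A"))   # 65 - the code for uppercase A
print(ord(" "))   # 32 - the code for a space
print(chr(66))    # B  - the character whose code is 66
```

Encoding and decoding are exact opposites: `chr(ord("A"))` gets you back to “A”.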
There are two main character encoding standards you need to know for your GCSE:
- ASCII — the original standard, designed in the 1960s for English-only text
- Unicode — the modern universal standard that covers every writing system on Earth (plus emojis!)
ASCII — The Original Standard
ASCII stands for American Standard Code for Information Interchange. It was created in 1963 and quickly became the universal way to encode text in early computers. ASCII uses 7 bits per character, which means it can represent 2⁷ = 128 different characters (numbered 0 to 127).
Those 128 characters include:
- Uppercase letters: A (65) to Z (90) — 26 characters
- Lowercase letters: a (97) to z (122) — 26 characters
- Digits: 0 (48) to 9 (57) — 10 characters
- Space: code 32
- Punctuation and symbols: ! @ # $ % & * ( ) and many more
- Control characters: codes 0–31 (things like “new line”, “tab”, and “backspace” — invisible characters that control formatting)
Key ASCII Code Ranges
You do not need to memorise the entire ASCII table, but you must know these key values:
| Character | ASCII Code | Binary (7-bit) |
|---|---|---|
| Space | 32 | 0100000 |
| 0 | 48 | 0110000 |
| 1 | 49 | 0110001 |
| 9 | 57 | 0111001 |
| A | 65 | 1000001 |
| B | 66 | 1000010 |
| Z | 90 | 1011010 |
| a | 97 | 1100001 |
| b | 98 | 1100010 |
| z | 122 | 1111010 |
How ASCII Enables Alphabetical Sorting
Notice that the letters are in order: A=65, B=66, C=67, and so on. This is not a coincidence — it was designed this way deliberately. Because “A” has a smaller code number than “B”, a computer can sort words alphabetically simply by comparing the numbers. The computer does not need to “know” the alphabet — it just compares the codes.
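Python (used here just to illustrate the idea) sorts strings in exactly this way, by comparing the character codes:

```python
# Comparing strings compares their character codes one by one,
# so alphabetical order falls out of the numeric order for free.
words = ["Cherry", "Apple", "Banana"]
print(sorted(words))  # ['Apple', 'Banana', 'Cherry']
print("A" < "B")      # True, because 65 < 66
```

One caveat: because every uppercase code (65–90) is smaller than every lowercase code (97–122), a plain code comparison puts “Zoo” before “apple”.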
The Uppercase/Lowercase Pattern
There is a neat relationship between uppercase and lowercase letters in ASCII:
| Uppercase | Code | Lowercase | Code | Difference |
|---|---|---|---|---|
| A | 65 | a | 97 | 32 |
| B | 66 | b | 98 | 32 |
| C | 67 | c | 99 | 32 |
| … | … | … | … | … |
| Z | 90 | z | 122 | 32 |
The lowercase version is ALWAYS 32 more than the uppercase.
This means that to convert an uppercase letter to lowercase, a computer simply adds 32 to the ASCII code. To convert lowercase to uppercase, it subtracts 32. This is incredibly efficient and still used in programming today.
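As a sketch, a case-conversion function in Python could use exactly this trick (the helper name `to_lower` is just illustrative):

```python
def to_lower(ch):
    """Convert a single uppercase letter to lowercase by adding 32 to its code."""
    if "A" <= ch <= "Z":
        return chr(ord(ch) + 32)
    return ch  # anything else is left unchanged

print(to_lower("G"))  # g
print(to_lower("g"))  # g (already lowercase, unchanged)
print(to_lower("!"))  # ! (not a letter, unchanged)
```

Real-world functions like Python's `str.lower()` handle far more than ASCII, but for the 26 English letters this is all that is happening underneath.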
The Limitation of ASCII
With only 128 characters, ASCII can represent English text and nothing more. It has no room for:
- Accented letters (like é, ü, ñ)
- Other alphabets (Chinese, Arabic, Japanese, Hindi, Greek, Russian…)
- Mathematical symbols beyond the basics
- Emojis
In the 1960s this was fine — computers were mostly used by English-speaking scientists and engineers. But as computers spread around the world, 128 characters was nowhere near enough. Something far bigger was needed.
Unicode — The Universal Standard
As the internet connected people across the globe, it became clear that the world needed a single encoding system that could handle every writing system. The solution was Unicode, first published in 1991.
Unicode uses up to 32 bits per character, giving it the theoretical capacity to represent over 4 billion characters. In practice, Unicode currently defines over 149,000 characters covering:
- Every modern writing system: Latin, Chinese, Arabic, Japanese (Kanji, Hiragana, Katakana), Hindi (Devanagari), Korean (Hangul), Cyrillic (Russian), Greek, Hebrew, Thai, and many more
- Historical scripts: Egyptian hieroglyphs, Cuneiform, Runic
- Mathematical symbols, musical notation, and technical symbols
- Emojis: thousands of them, from smiley faces to flags to food
Backwards Compatibility with ASCII
A brilliant design decision: the first 128 Unicode characters are identical to ASCII. This means that A is still 65, a is still 97, and 0 is still 48 in Unicode. Any old ASCII text file is automatically valid Unicode too. This is called backwards compatibility — the new system works with everything the old system created.
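You can check this in Python: a pure-ASCII string produces exactly the same bytes whichever of the two encodings you use:

```python
# A pure-ASCII string encodes to identical bytes in ASCII and UTF-8.
text = "Hello"
print(text.encode("ascii"))   # b'Hello'
print(text.encode("utf-8"))   # b'Hello'
print(text.encode("ascii") == text.encode("utf-8"))  # True
```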
UTF-8: The Most Common Encoding
Unicode defines the numbers (called code points), but it needs an encoding scheme to turn those numbers into actual bytes stored on disk. The most widely used encoding is UTF-8, which uses a clever variable-length system:
- 1 byte for ASCII characters (codes 0–127) — identical to ASCII
- 2 bytes for accented European letters, Greek, Cyrillic, Arabic, Hebrew
- 3 bytes for Chinese, Japanese, and Korean characters
- 4 bytes for emojis, rare historical scripts, and mathematical symbols
This is incredibly efficient: English text takes exactly the same space in UTF-8 as in ASCII (1 byte per character), while characters from other languages use only as many bytes as they need.
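Python's `encode()` method shows these sizes directly; taking `len()` of the encoded result counts the bytes:

```python
# UTF-8's variable-length sizes, one character from each category.
for ch in ["A", "é", "你", "😀"]:
    print(ch, len(ch.encode("utf-8")), "byte(s)")
# A 1 byte(s)    (ASCII range)
# é 2 byte(s)    (accented Latin)
# 你 3 byte(s)   (Chinese)
# 😀 4 byte(s)   (emoji)
```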
Emojis in Unicode
Every emoji has a unique Unicode code point, written with a “U+” prefix followed by a hexadecimal number:
| Emoji | Name | Code Point | UTF-8 Bytes |
|---|---|---|---|
| 😀 | Grinning Face | U+1F600 | 4 bytes |
| 👍 | Thumbs Up | U+1F44D | 4 bytes |
| ❤ | Red Heart | U+2764 | 3 bytes |
| 🚀 | Rocket | U+1F680 | 4 bytes |
| 🎮 | Video Game Controller | U+1F3AE | 4 bytes |
Why do emojis look different on different devices? The Unicode standard only defines the code point and a general description (e.g. “grinning face”). Each company — Apple, Google, Samsung, Microsoft — designs its own artwork for each emoji. That is why the same thumbs-up emoji (U+1F44D) can look quite different on an iPhone versus an Android phone. The code is the same, but the visual design is different.
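In Python, `ord()` reveals an emoji's code point, and `hex()` formats it in the same style as the “U+” notation:

```python
# ord() gives the code point; encoding shows the UTF-8 byte count.
thumbs = "👍"
print(hex(ord(thumbs)))             # 0x1f44d - matches U+1F44D
print(len(thumbs.encode("utf-8")))  # 4 bytes
```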
Storage Comparison
Text: “Hello”
- ASCII: H(72) e(101) l(108) l(108) o(111) = 5 bytes
- UTF-8: H(72) e(101) l(108) l(108) o(111) = 5 bytes (same!)

Chinese text: 你好 (“hello” in Chinese)
- UTF-8: 你 (3 bytes) + 好 (3 bytes) = 6 bytes

Emoji: 😀
- UTF-8: 1 character × 4 bytes = 4 bytes

Key insight: UTF-8 is efficient because simple characters use fewer bytes, and complex ones use more.
| Feature | ASCII | Unicode (UTF-8) |
|---|---|---|
| Bits per character | 7 bits (stored in 1 byte) | 8 to 32 bits (1 to 4 bytes) |
| Total characters | 128 | Over 149,000 |
| Languages supported | English only | Every written language |
| Emojis | None | Thousands |
| Storage (English text) | 1 byte per character | 1 byte per character |
| Storage (other scripts) | Not supported | 2–4 bytes per character |
| Backwards compatible? | — | Yes (first 128 = ASCII) |
Interactive Exercises
Exercise 1: ASCII Code Lookup
A random character will appear below. Type its ASCII code number.
Hint: A=65, a=97, 0=48. Letters and digits are in order from there.
Exercise 2: ASCII Decoder
The ASCII codes below spell out a word. Decode them and type the word.
Remember: A=65, a=97. Work out each letter from the code.
Exercise 3: Text to ASCII Converter
Type any text below and see its ASCII/Unicode codes, binary representation, and byte count in real time.
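Without the interactive page, a small Python sketch can do the same job (the function name `show_codes` is just illustrative):

```python
# For each character: its code point, its 8-bit binary (where it fits
# in one byte), and how many bytes it takes in UTF-8.
def show_codes(text):
    for ch in text:
        code = ord(ch)
        binary = format(code, "08b") if code < 256 else "-"
        print(f"{ch!r}: code {code}, binary {binary}, "
              f"{len(ch.encode('utf-8'))} UTF-8 byte(s)")

show_codes("Hi!")
```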
Test Yourself
Click on each question to reveal the answer. Try to work it out yourself first!
Question: How many different characters can 7-bit ASCII represent?
Answer: 128 characters.
With 7 bits, the number of possible combinations is 2⁷ = 128. These are numbered 0 to 127.
Question: What are the ASCII codes for ‘A’ and ‘a’?
Answer: A = 65, a = 97.
The difference between an uppercase and its lowercase equivalent is always 32. So to go from ‘A’ (65) to ‘a’ (97), add 32. This pattern holds for every letter in the alphabet.
Question: Give two limitations of ASCII.
Answer:
- ASCII only supports English characters. It cannot represent letters from other languages such as Chinese, Arabic, Hindi, or even accented European letters like é or ü.
- With only 128 characters, there is no room for additional symbols, emojis, or the characters needed by the billions of people who do not write in English.
Question: What is Unicode, and how does it overcome the limitations of ASCII?
Answer: Unicode uses up to 32 bits per character, allowing it to represent over 149,000 characters. This includes every modern writing system (Chinese, Arabic, Japanese, Hindi, Korean, etc.), historical scripts, mathematical symbols, and emojis. Unicode is also backwards compatible with ASCII — the first 128 Unicode characters are identical to ASCII, so existing ASCII text works without any changes.
Question: What is UTF-8, and why is it so widely used?
Answer: UTF-8 is the most widely used encoding scheme for Unicode. It uses a variable-length system: ASCII characters take only 1 byte, while other characters use 2, 3, or 4 bytes as needed. This makes it efficient because English text is no larger than in ASCII, but it can still represent every Unicode character. UTF-8 is used by the vast majority of websites, emails, and modern software.
Question: The ASCII codes 80, 121, 116, 104, 111, 110 spell out a word. What is it?
Answer: “Python”
Working: 80=P, 121=y, 116=t, 104=h, 111=o, 110=n.
Method: P is the 16th letter of the alphabet, and uppercase letters start at 65, so P = 65 + 15 = 80. The lowercase letters: y = 97 + 24 = 121, t = 97 + 19 = 116, h = 97 + 7 = 104, o = 97 + 14 = 111, n = 97 + 13 = 110.
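The decoding can be checked in one line of Python:

```python
# chr() turns each code back into its character; joining them rebuilds the word.
codes = [80, 121, 116, 104, 111, 110]
word = "".join(chr(c) for c in codes)
print(word)  # Python
```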
Question: A text file contains 500 ASCII characters. How many bytes does it take to store?
Answer: 500 bytes.
Each ASCII character is stored using 1 byte (8 bits, though only 7 are needed for the code — the 8th bit is typically 0). So 500 characters × 1 byte = 500 bytes. In UTF-8, these same 500 ASCII characters would also be 500 bytes, since UTF-8 uses 1 byte for any character in the 0–127 range.
Key Vocabulary
Make sure you know all of these terms for your exam:
| Term | Definition |
|---|---|
| ASCII | American Standard Code for Information Interchange — a 7-bit character encoding that represents 128 characters including English letters, digits, punctuation, and control characters. |
| Unicode | A universal character encoding standard that can represent over 149,000 characters from every writing system in the world, plus emojis and symbols. |
| UTF-8 | The most common Unicode encoding scheme. Uses variable-length encoding: 1 byte for ASCII characters, 2–4 bytes for other characters. Backwards compatible with ASCII. |
| Character Encoding | An agreed-upon system that assigns a unique number to each character (letter, digit, symbol) so that computers can store, transmit, and display text. |
| Code Point | The unique number assigned to a character in Unicode, written with a “U+” prefix and a hexadecimal value (e.g. U+0041 for ‘A’, U+1F600 for the grinning face emoji). |
| Backwards Compatible | When a newer system is designed to work correctly with data or files created for an older system. Unicode is backwards compatible with ASCII because its first 128 code points are identical. |
Exam Tips
Watch out for these common mistakes:
- Saying “Unicode uses 2 bytes per character.” This is a common misconception: UTF-8 uses a variable number of bytes (1 to 4). Only the older UTF-16 encoding uses a minimum of 2 bytes.
- Forgetting backwards compatibility. If asked “what is the advantage of Unicode over ASCII,” always mention that Unicode is backwards compatible — it does not break existing ASCII files.
- Confusing the character with its code. The digit character ‘7’ has ASCII code 55, not 7. The character ‘0’ has code 48, not 0. This trips up many students.
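A quick Python check makes that last point concrete:

```python
# The character '7' and the number 7 are different things.
print(ord("7"))             # 55 - the ASCII code of the character
print(int("7"))             # 7  - the numeric value after conversion
print(ord("7") - ord("0"))  # 7  - the classic digit-to-value trick
```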
Past Paper Questions
Try these exam-style questions, then click to reveal the mark scheme answer.
Explain the difference between ASCII and Unicode character encoding. [2 marks]
Mark scheme:
- ASCII uses 7 bits per character and can represent 128 characters (1)
- Unicode uses up to 32 bits per character and can represent characters from every language in the world (1)
The ASCII code for the letter 'A' is 65. What is the ASCII code for the letter 'D'? [1 mark]
Mark scheme:
- 68 (65 + 3) (1)
Give one advantage and one disadvantage of using Unicode instead of ASCII. [2 marks]
Mark scheme:
- Advantage: Unicode can represent characters from all languages / supports emojis and special symbols (1)
- Disadvantage: Unicode uses more bits per character so files are larger / requires more storage space (1)
Character Encoding in Everyday Life
Character encoding is something you use every single day without thinking about it:
- Every text message you send is encoded as a sequence of Unicode numbers, transmitted as bytes, and decoded back into characters on the other person’s phone.
- Every web page you visit declares its encoding (almost always UTF-8) in its HTML header, so your browser knows how to interpret the bytes into readable text.
- Every emoji you send is a Unicode code point. When you send a thumbs-up to a friend with a different phone, they receive the same code point (U+1F44D) but see their device’s own artwork for it.
- Every programming language uses character encoding when working with strings. Python 3, for example, uses Unicode by default, which is why you can write `print("你好")` and it works perfectly.
Have you ever seen strange characters like “�” or “Ã©” on a website? That happens when the encoding is wrong — the browser is trying to decode bytes using one encoding scheme when the file was saved in a different one. Understanding character encoding helps you understand and fix these problems.
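You can reproduce this effect deliberately in Python by decoding UTF-8 bytes with the wrong scheme:

```python
# Mojibake in miniature: encode with UTF-8, then decode with Latin-1.
data = "é".encode("utf-8")      # two bytes: 0xC3 0xA9
print(data.decode("latin-1"))   # Ã© - the classic garbled result
print(data.decode("utf-8"))     # é  - correct with the right encoding
```

Latin-1 treats each byte as one character, so the two UTF-8 bytes of “é” come out as two separate characters.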
From the simple 128-character ASCII table of the 1960s to Unicode’s 149,000+ characters today, character encoding is a brilliant example of how computer science evolves to meet the needs of a connected world.
Further Reading
- BBC Bitesize — Edexcel GCSE Computer Science — Full specification coverage for data representation including character encoding
- Isaac Computer Science — Data Representation — In-depth explanations of ASCII, Unicode, and text encoding
- GCSE Topic 2: Data Representation — Interactive revision tools and character encoding activities