How does text become numbers?
We said a computer only knows numbers. Yet we type letters, and letters show up on the screen. How does a machine that only knows numbers handle letters at all?
Every letter has a number on it
The secret is simpler than you'd think.
People got together
and assigned each letter a number.
A is 65,
B is 66,
and so on.
So when you press A on the keyboard,
the computer doesn't get a letter,
it gets the number 65.
Press a letter and its number appears.
From A to Z,
lowercase too,
digits and symbols
all have their own number.
This agreement that numbers the letters
is called ASCII.
One of the earliest tables people agreed on
so computers could handle letters.
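The letter-to-number lookup can be tried in a few lines of Python. This is only a minimal sketch: `ord` and `chr` are Python built-ins, and the sample characters are picked just for illustration.

```python
# ord() gives the number assigned to a character; chr() goes the other way.
samples = ["A", "B", "a", "0"]
codes = {ch: ord(ch) for ch in samples}

for ch, code in codes.items():
    print(ch, "->", code)

# Going back from a number to its letter.
letter = chr(65)
print(65, "->", letter)
```

Pressing A and getting 65 is the same lookup `ord("A")` performs here.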
The number becomes 0s and 1s too
But think back to last time.
We said a computer stores numbers
as 0s and 1s too.
So the number 65
ends up stored as
eight switches being on or off.
One letter
becomes eight 0s and 1s.
A is number 65, and that becomes eight 0s and 1s.
So one letter usually takes
eight switches,
that is, one byte.
The byte we learned last time
shows up again here.
Letters, in the end, were 0s and 1s too.
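To see the eight switches for the letter A, the number can be written out in binary. A small sketch using Python's built-in `format`; the variable names are made up for this example.

```python
# Write the number for "A" as eight binary digits, padded with leading zeros.
number = ord("A")              # 65
bits = format(number, "08b")   # one character per switch: 1 = on, 0 = off
print(number, "->", bits)

# Reading the eight digits back as a base-2 number gives the same number.
restored = int(bits, 2)
print(bits, "->", restored, "->", chr(restored))
```

Each of the eight digits stands for one switch, so the whole string is one byte.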
Let's open up the table
Let me show you
what the ASCII table
actually looks like.
Letters line up in number order.
Tap one
and you can check its number.
Uppercase A is 65,
lowercase a is 97,
the digit 0 is 48.
Lowercase numbers run higher than uppercase, by 32.
Here's the fun part:
uppercase and lowercase
are exactly 32 apart.
A is 65, a is 97.
Not a coincidence.
32 is exactly one switch,
so changing case
means flipping just that one switch.
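One way to see why a gap of exactly 32 is convenient: 32 is a single switch in the eight-switch pattern, so uppercase and lowercase differ in just one position. The sketch below assumes plain English letters only, and `toggle_case` is a hypothetical helper written for this example.

```python
CASE_BIT = 32  # 0b00100000, the one switch that separates "A" from "a"

def toggle_case(letter: str) -> str:
    """Flip the case of one English letter by flipping a single bit.

    Illustrative only: it is meaningful for A-Z and a-z, nothing else.
    """
    return chr(ord(letter) ^ CASE_BIT)

print(format(ord("A"), "08b"))  # pattern for uppercase A
print(format(ord("a"), "08b"))  # pattern for lowercase a, one digit differs
print(toggle_case("A"), toggle_case("z"))
```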
Eight switches aren't enough
Eight switches
can represent
256 things.
ASCII uses only the first 128 of them
for the English alphabet, digits, and symbols,
so there's still room to spare.
At first this was enough,
since it was made by English speakers.
From 0 to 255, that's 256.
But the world
isn't only English.
Korean alone has over ten thousand characters,
Chinese has tens of thousands.
Add emoji on top.
256 doesn't even come close.
So a new agreement was needed.
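The arithmetic behind "256 doesn't even come close" can be checked directly. In this sketch, `fits_in_one_byte` is a hypothetical helper, and `ord` reports the number each sample character has in the larger table Python uses for text.

```python
# Eight on/off switches give 2 ** 8 distinct patterns, numbered 0 to 255.
patterns = 2 ** 8
largest = patterns - 1

def fits_in_one_byte(number: int) -> bool:
    """True if the number can be stored in a single eight-switch slot."""
    return 0 <= number <= largest

for ch in ["A", "한", "😀"]:
    print(ch, ord(ch), fits_in_one_byte(ord(ch)))
```

The English letter fits, while the Korean character and the emoji have numbers far above 255.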
Bigger characters use more bytes
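The fix has two parts.
First, a giant table that numbers
every character in the world.
That table is called Unicode.
Second, a rule for storing those numbers:
if one slot isn't enough,
use several.
The most common such rule is called UTF-8.
In UTF-8, English stays one byte,
Korean usually takes three,
and most emoji take four or more.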
How many slots a character takes depends on the character.
So even a single character
can take up different room.
One English letter, one slot,
one Korean letter, three.
That's exactly why the number of characters you see in a message
can differ from the number of bytes the computer counts.
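The gap between the two counts is easy to measure. This sketch assumes the text is stored as UTF-8, a very common way of writing Unicode numbers into bytes; `utf8_size` and the sample message are made up for the example.

```python
def utf8_size(text: str) -> int:
    """Number of bytes the text occupies when stored as UTF-8."""
    return len(text.encode("utf-8"))

# One character each, but different amounts of room.
for ch in ["A", "한", "😀"]:
    print(ch, utf8_size(ch))

# What a person counts versus what the computer stores.
message = "Hi 한😀"
print("characters:", len(message))
print("bytes:", utf8_size(message))
```

Here `len` counts characters the way a reader would for this message, while `utf8_size` counts slots.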
A sentence, too, is just a line of numbers
Now let's look at a short line
all at once.
Each letter turns into a number,
and each number turns into 0s and 1s.
Every message we send,
if you look inside,
is a long line of numbers like this.
How a computer sees "Hi".
When letters appear on screen, the same steps run backward.
It reads the 0s and 1s to find the number,
then looks up that number's letter
in the table and draws it.
To our eyes it's letters,
but inside it's numbers from start to finish.
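The round trip for "Hi" can be sketched as follows. It is a simplified illustration that assumes plain English text, where every letter fits in one byte; the variable names are invented for the example.

```python
text = "Hi"

# Forward: letters -> numbers -> eight 0s and 1s each.
numbers = [ord(ch) for ch in text]
bits = [format(n, "08b") for n in numbers]
print(numbers)
print(bits)

# Backward: 0s and 1s -> numbers -> letters, as when text is drawn on screen.
decoded = "".join(chr(int(b, 2)) for b in bits)
print(decoded)
```

Drawing the actual letter shapes is a separate step that this sketch leaves out; it stops at recovering which letters the numbers stand for.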