What the heck is UTF-8?

An introduction to character encoding for anyone!
by

Andrew Rankin

on 5 August 2011

Transcript of What the heck is UTF-8?

What the heck is UTF-8? And other character encoding related questions…

01001001001000000110110001101111011101100110010100100000011110010110111101110101

I have no idea what this kid is saying! He's going to have to learn ASCII.

American Standard Code for Information Interchange

100000110000101000011
ABC
ABC
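The two bit strings on these slides can be checked with a few lines of Python. This is only a sketch: the function names `ascii_encode` and `ascii_decode` are made up for illustration, and it assumes the short string packs each character into 7 bits while the long opening string pads each character to a full 8-bit byte.

```python
def ascii_encode(text: str, width: int = 7) -> str:
    """Return the bits of each character, `width` bits per character."""
    groups = []
    for ch in text:
        code = ord(ch)
        if code >= 2 ** width:
            raise ValueError(f"{ch!r} does not fit in {width} bits")
        groups.append(format(code, f"0{width}b"))
    return "".join(groups)


def ascii_decode(bits: str, width: int = 7) -> str:
    """Split a bit string into `width`-bit groups and map each to a character."""
    if len(bits) % width != 0:
        raise ValueError("bit string length is not a multiple of the width")
    return "".join(
        chr(int(bits[i:i + width], 2)) for i in range(0, len(bits), width)
    )


# The short string from the ASCII slide: three characters, 7 bits each.
print(ascii_encode("ABC"))  # 100000110000101000011

# The opening string, read as ten 8-bit groups (assumption: padded bytes).
message = (
    "01001001" "00100000" "01101100" "01101111" "01110110"
    "01100101" "00100000" "01111001" "01101111" "01110101"
)
print(ascii_decode(message, width=8))  # I love you
```

Reading the long string in 7-bit groups instead would not divide evenly (80 is not a multiple of 7), which is a hint that it was written as whole bytes.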

ASCII encoder
ASCII decoder

But what is ASCII?
ASCII code was developed from telegraph codes to encode letters, numbers, symbols (and some other stuff) into simple binary code.
It uses 7 bits (ones and zeros) to represent each character, which gives 128 possible characters.

That's my boy!
ASCII is #$!?*%@ great… but now I want to travel the world!
Be careful out there son, and ahhm, watch your language!
!!??
Je m'appelle Amélie, je suis française.
Tu veux du café?
I am so s.m.r.t.
In Europe you're going to need more bits. Come inside to extend your ASCII!

ISO-8859-1 (Latin-1)
Covers most of the other characters needed to express Western European languages.
By adding an extra bit, you can have twice as many characters: 256 instead of 128.
Ich bin so Europäischen…
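The effect of the extra bit can be seen with Python's built-in `ascii` and `latin-1` codecs. This is a small sketch rather than anything from the presentation; the helper name `can_encode` is hypothetical.

```python
ASCII_SLOTS = 2 ** 7    # 128 code values with 7 bits
LATIN1_SLOTS = 2 ** 8   # 256 code values with 8 bits


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


text = "Tu veux du caf\u00e9?"  # "Tu veux du café?" with a precomposed é

print(LATIN1_SLOTS // ASCII_SLOTS)  # 2
print(can_encode(text, "ascii"))    # False: é is outside the 7-bit range
print(can_encode(text, "latin-1"))  # True

encoded = text.encode("latin-1")
print(len(text), len(encoded))      # 16 16: still one byte per character

# é sits at position 233, which needs the eighth bit.
print(format(encoded[text.index("\u00e9")], "08b"))  # 11101001
```

Every ASCII character keeps the same value in ISO-8859-1; only the upper half (128 to 255) is new.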
C'est génial. ¿Qué te parece?
¡¿?!
¡Oh, no!
I see you're confused. Worry not, for Unicode will help you.
I want to go home!

Unicode
A huge list of all characters known to man, woman and android-kind!
Hmm… I knew I was going to need more bits!
Start off with basic ASCII…
…continue with ISO-8859-1…
…then add everything else…

Character sets needed so far:
ASCII
ISO-8859-1

Character sets learnt so far:
ASCII
ISO-8859-1

Fully supported by ISO-8859-1:

Afrikaans
Albanian
Basque
Breton
Catalan
English (UK and US)
Faroese
Galician
German
Icelandic
Irish (new orthography)
Italian
Kurdish (The Kurdish Unified Alphabet)
Latin (basic classical orthography)
Leonese
Luxembourgish (basic classical orthography)
Norwegian (Bokmål and Nynorsk)
Occitan
Portuguese (European and Brazilian)
Rhaeto-Romanic
Scottish Gaelic
Spanish
Swahili
Swedish
Walloon

Almost fully supported:

Danish
Dutch
Estonian
Finnish
French
Hungarian
Irish (traditional orthography)
Latin with macrons
Māori
Welsh

HEALTH WARNING
To prevent bloating, you'll need to use a clever way to represent Unicode.
I suggest UTF!

ISO-8859-1
Hello!
010010000110010101101100011011000110111100100001
48 bits (6 bytes)

I'm going to need more bits!

Raw Unicode*
Hello!
000000000000010010000000000000000110010100000000000001101100000000000000011011000000000000000110111100000000000000100001
120 bits (15 bytes)
* All you need is 20 bits per character (strictly 21 to reach the last code point, U+10FFFF). Never really done!

Hmm… this Unicode seems like a waste of space! How is it possible to:
Represent all Unicode characters
Be backwards compatible with ASCII
Keep file sizes sensible?

0 1
Rebellious teenager

Café

Unicode positions
C = 67 = 01000011
a = 97 = 01100001
f = 102 = 01100110
é = 233 = 11101001 → 11000011 10101001

UTF-8 translation
0100001101100001011001101100001110101001
40 bits (5 bytes)

Ah, that's better
That's the power of UTF-8
Now you know!
Just use 8 bits for the original ASCII characters. Then do some really clever stuff to represent all the rest.
That'll keep American Dad happy!

UTF-8: Universal Character Set (UCS) Transformation Format, 8-bit

Character sets needed so far:
ASCII
ISO-8859-1
ISO-8859-5 (Cyrillic)

Character sets needed so far:
ASCII
ISO-8859-1
ISO-8859-5 (Cyrillic)
ISO-8859-7 (Greek)
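The "really clever stuff" behind the Café slide can be worked through by hand. The sketch below assumes the standard UTF-8 bit layout: one byte `0xxxxxxx` for ASCII, and for larger code points a lead byte starting `110`, `1110` or `11110` followed by continuation bytes starting `10`. The function names are invented for illustration, and Python's built-in `str.encode("utf-8")` is used only as a cross-check.

```python
def utf8_encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point as 1 to 4 UTF-8 bytes."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {cp:#x}")
    if cp < 0x80:       # up to 7 bits: plain ASCII, one byte
        return bytes([cp])
    if cp < 0x800:      # up to 11 bits: 110xxxxx 10xxxxxx
        return bytes([0b11000000 | (cp >> 6),
                      0b10000000 | (cp & 0b111111)])
    if cp < 0x10000:    # up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0b11100000 | (cp >> 12),
                      0b10000000 | ((cp >> 6) & 0b111111),
                      0b10000000 | (cp & 0b111111)])
    # up to 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0b11110000 | (cp >> 18),
                  0b10000000 | ((cp >> 12) & 0b111111),
                  0b10000000 | ((cp >> 6) & 0b111111),
                  0b10000000 | (cp & 0b111111)])


def utf8_encode(text: str) -> bytes:
    """Encode a whole string, one code point at a time."""
    return b"".join(utf8_encode_code_point(ord(ch)) for ch in text)


def to_bits(data: bytes) -> str:
    """Render bytes as a continuous string of ones and zeros."""
    return "".join(format(byte, "08b") for byte in data)


cafe = utf8_encode("Caf\u00e9")  # "Café" with a precomposed é
print(to_bits(cafe))             # 0100001101100001011001101100001110101001
print(len(cafe) * 8)             # 40

# "Hello!" is pure ASCII, so UTF-8 costs the same as ISO-8859-1,
# against 120 bits at the slide's 20 bits per character.
hello = "Hello!"
print(len(hello.encode("latin-1")) * 8)  # 48
print(len(utf8_encode(hello)) * 8)       # 48
print(len(hello) * 20)                   # 120
```

Only é needs two bytes here: its 8 bits (11101001) are split as 00011 and 101001 and slotted into the patterns 110xxxxx and 10xxxxxx, giving 11000011 10101001 as on the slide.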