jostein.kjønigsen.net

Got Unicode?

Let's get this straight: I'm a Unicode-whore, I'm proud of it, and I think I have plenty of reasons to be.

Here's our scenario

One planet, one internet, a gazillion different nations, God knows how many languages, and unfortunately, more character encodings than any sane person can memorize. Does this make any sense?

A few examples

As long as you stick to a single language while on the net, this shouldn't present any problems whatsoever. The trouble comes when you try to combine different languages.

Want to quote a German person on a Norwegian page? I'm sure there is a character encoding that allows for this. Off the top of my head I would assume ISO-8859-1.

Want that funky, Russian backwards "Я" somewhere in your page? Look up the character encoding for Cyrillic languages. You'll need it.

Want to make a fansite for Yaguchi Mari (矢口真里)? I'm sure JIS (ISO-2022-JP), SJIS (Shift JIS) or whatever encoding you choose will make that possible. Oh. Your surname is "Kjønigsen"? Forget about it. Can't be done.

At least not without Unicode. By now I hope you get the point.
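
To make the examples above concrete, here is a minimal Python sketch. The names (samples, CODECS, can_encode, supported_codecs) are made up for illustration, and the list of legacy encodings is my own pick (KOI8-R stands in for "the Cyrillic encoding"), not anything official. It simply asks each codec whether it can represent a piece of text at all.

```python
# Hypothetical sample strings, one per language mentioned above.
samples = {
    "Norwegian": "Kjønigsen",
    "Russian": "Я",
    "Japanese": "矢口真里",
}

# An illustrative (assumed) selection of legacy encodings, plus UTF-8.
CODECS = ["iso-8859-1", "koi8-r", "shift_jis", "utf-8"]


def can_encode(text, codec):
    """Return True if every character in text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def supported_codecs(text):
    """List the codecs from CODECS that can represent the whole text."""
    return [codec for codec in CODECS if can_encode(text, codec)]


# Each language on its own has at least one legacy encoding that works.
for label, text in samples.items():
    print(label, supported_codecs(text))

# Put all three on the same page and only UTF-8 is left standing.
mixed = " ".join(samples.values())
print("Mixed", supported_codecs(mixed))
```

Each sample alone fits in some legacy encoding, but of the four codecs tried here, the mixed string only survives in UTF-8.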

My point

When this planet has one internet which we all use to communicate, having more than one character encoding is just stupid.

Yes, I said it: Stupid. If you are one of the people who insist on using some native character encoding, you are one of the people holding the net back. A bold statement, sure, but I stand by it.

Sticking with something antiquated because it hasn't caused trouble yet isn't exactly forward-looking.

Ever tried to communicate with people in different languages over IRC? Then you already know what I mean.

The cure

UTF-8 is a character encoding for Unicode that is backward compatible with plain ASCII: any ASCII text is already valid UTF-8, byte for byte. It lets you encode characters from any language with the same encoding.

The reason I can have Norwegian characters, Russian characters and Japanese characters all on the same page here? It's called UTF-8.
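
If you want to see both claims for yourself, a small Python sketch like the following will do. The helper name utf8_len and the sample strings are just illustrative assumptions. It checks that ASCII text comes out byte-identical in UTF-8, and shows that other characters simply take a few more bytes each.

```python
def utf8_len(ch):
    """Number of bytes UTF-8 uses to encode the given string."""
    return len(ch.encode("utf-8"))


# Pure ASCII text is byte-for-byte identical in ASCII and UTF-8.
ascii_text = "Got Unicode?"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")
print("ASCII text unchanged in UTF-8:", same_bytes)

# Characters outside ASCII use multi-byte sequences instead.
for ch in ["K", "ø", "Я", "矢"]:
    print(ch, utf8_len(ch), "byte(s)")

# Norwegian, Russian and Japanese in one string, one encoding, round trip.
page = "Kjønigsen, Я, 矢口真里"
data = page.encode("utf-8")
round_trip_ok = data.decode("utf-8") == page
print("Round trip OK:", round_trip_ok)
```

The ASCII letter takes one byte, "ø" and "Я" take two each, and the kanji takes three, which is why existing ASCII content keeps working untouched.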

You may not need it on your site, but seriously. One net, one encoding. It makes sense, doesn't it?

So stop using whatever silly native character encoding you are using and help bring the net one step further. Get on the Unicode bandwagon already.

For users of Linux and other *nixes out there, a Dutch guy called Fruit has made an excellent guide, which should get you going in no time.

It's specific to Debian, but most of the stuff worked fine on my FreeBSD machine.
