---
created_at: '2014-11-23T12:06:50.000Z'
title: UTF-8 history (2003)
url: https://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt
author: olalonde
points: 55
story_text: ''
comment_text:
num_comments: 7
story_id:
story_title:
story_url:
parent_id:
created_at_i: 1416744410
_tags:
- story
- author_olalonde
- story_8648541
objectID: '8648541'
year: 2003
---
[Source](https://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt)
Subject: UTF-8 history
From: "Rob 'Commander' Pike"
Date: Wed, 30 Apr 2003 22:32:32 -0700 (Thu 06:32 BST)
To: mkuhn (at) acm.org, henry (at) spsystems.net
Cc: ken (at) entrisphere.com

Looking around at some UTF-8 background, I see the same incorrect story being repeated over and over. The incorrect version is:

1. IBM designed UTF-8.
2. Plan 9 implemented it.

That's not true. UTF-8 was designed, in front of my eyes, on a placemat in a New Jersey diner one night in September or so 1992.

What happened was this. We had used the original UTF from ISO 10646 to make Plan 9 support 16-bit characters, but we hated it. We were close to shipping the system when, late one afternoon, I received a call from some folks, I think at IBM - I remember them being in Austin - who were in an X/Open committee meeting. They wanted Ken and me to vet their FSS/UTF design. We understood why they were introducing a new design, and Ken and I suddenly realized there was an opportunity to use our experience to design a really good standard and get the X/Open guys to push it out. We suggested this and the deal was, if we could do it fast, OK.

So we went to dinner, Ken figured out the bit-packing, and when we came back to the lab after dinner we called the X/Open guys and explained our scheme. We mailed them an outline of our spec, and they replied saying that it was better than theirs (I don't believe I ever actually saw their proposal; I know I don't remember it) and how fast could we implement it? I think this was a Wednesday night and we promised a complete running system by Monday, which I think was when their big vote was.

So that night Ken wrote packing and unpacking code and I started tearing into the C and graphics libraries. The next day all the code was done and we started converting the text files on the system itself. By Friday some time Plan 9 was running, and only running, what would be called UTF-8. We called X/Open and the rest, as they say, is slightly rewritten history.
Why didn't we just use their FSS/UTF? As I remember, it was because in that first phone call I sang out a list of desiderata for any such encoding, and FSS/UTF was lacking at least one - the ability to synchronize a byte stream picked up mid-run, with less than one character being consumed before synchronization. Because that was lacking, we felt free - and were given freedom - to roll our own.

I think the "IBM designed it, Plan 9 implemented it" story originates in RFC2279. At the time, we were so happy UTF-8 was catching on we didn't say anything about the bungled history. Neither of us is at the Labs any more, but I bet there's an e-mail thread in the archive there that would support our story and I might be able to get someone to dig it out.

So, full kudos to the X/Open and IBM folks for making the opportunity happen and for pushing it forward, but Ken designed it with me cheering him on, whatever the history books say.

-rob

Date: Sat, 07 Jun 2003 18:44:05 -0700
From: "Rob `Commander' Pike"
To: Markus Kuhn
cc: henry (at) spsystems.net, ken (at) entrisphere.com, Greger Leijonhufvud
Subject: Re: UTF-8 history

I asked Russ Cox to dig through the archives. I have attached his message. I think you'll agree it supports the story I sent earlier. The mail we sent to X/Open (I believe Ken did the editing and mailing of that document) includes a new desideratum #6 about discovering character boundaries. We'll never know how much the original X/Open proposal influenced us; the two proposals are very different but do share some characteristics. I don't remember looking at it in detail, but it was a long time ago. I very clearly remember Ken writing on the placemat and wished we had kept it!

-rob

From: Russ Cox
To: r (at) google.com
Subject: utf digging
Date-Sent: Saturday, June 07, 2003 7:46 PM -0400

bootes's /sys/src/libc/port/rune.c changed from the division-heavy old utf on sep 4 1992. the version that made it into the dump is dated 19:51:55.
it was commented the next day but otherwise remained unchanged until nov 14 1996, when runelen was sped up by inspecting the