---
created_at: '2014-11-23T12:06:50.000Z'
title: UTF-8 history (2003)
url: https://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt
author: olalonde
points: 55
story_text: ''
comment_text: 
num_comments: 7
story_id: 
story_title: 
story_url: 
parent_id: 
created_at_i: 1416744410
_tags:
- story
- author_olalonde
- story_8648541
objectID: '8648541'
year: 2003

---
[Source](https://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt "Permalink to ")

Subject: UTF-8 history
From: "Rob 'Commander' Pike"
Date: Wed, 30 Apr 2003 22:32:32 -0700 (Thu 06:32 BST)
To: mkuhn (at) acm.org, henry (at) spsystems.net
Cc: ken (at) entrisphere.com

Looking around at some UTF-8 background, I see the same incorrect story being repeated over and over. The incorrect version is:

1. IBM designed UTF-8.
2. Plan 9 implemented it.

That's not true. UTF-8 was designed, in front of my eyes, on a placemat in a New Jersey diner one night in September or so 1992.

What happened was this. We had used the original UTF from ISO 10646 to make Plan 9 support 16-bit characters, but we hated it. We were close to shipping the system when, late one afternoon, I received a call from some folks, I think at IBM - I remember them being in Austin - who were in an X/Open committee meeting. They wanted Ken and me to vet their FSS/UTF design. We understood why they were introducing a new design, and Ken and I suddenly realized there was an opportunity to use our experience to design a really good standard and get the X/Open guys to push it out. We suggested this and the deal was, if we could do it fast, OK. So we went to dinner, Ken figured out the bit-packing, and when we came back to the lab after dinner we called the X/Open guys and explained our scheme. We mailed them an outline of our spec, and they replied saying that it was better than theirs (I don't believe I ever actually saw their proposal; I know I don't remember it) and how fast could we implement it? I think this was a Wednesday night and we promised a complete running system by Monday, which I think was when their big vote was.

So that night Ken wrote packing and unpacking code and I started tearing into the C and graphics libraries. The next day all the code was done and we started converting the text files on the system itself. By Friday some time Plan 9 was running, and only running, what would be called UTF-8. We called X/Open and the rest, as they say, is slightly rewritten history.
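Ken's "bit-packing" and the "packing and unpacking code" can be illustrated with a short Python sketch. It is an assumption-laden illustration, not the 1992 Plan 9 code: it follows the layout later standardized in RFC 3629 (at most four bytes, code points up to U+10FFFF), whereas Plan 9's characters were 16-bit at the time and the original scheme allowed longer sequences. The names `encode_rune`, `decode_rune`, and `next_boundary` are hypothetical. The lead byte's high bits give the sequence length and every continuation byte begins with `10`, which is what lets a reader dropped into the middle of a stream reach the next character boundary after skipping at most three bytes.

```python
# Hypothetical helper names; a sketch of the modern UTF-8 layout (RFC 3629),
# not the 1992 Plan 9 rune.c code.

def encode_rune(cp: int) -> bytes:
    """Pack one code point into 1-4 bytes; the lead byte's high bits give the length."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"invalid code point {cp:#x}")
    if cp < 0x80:                                   # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                                  # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                                # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),                # 11110xxx + three 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def decode_rune(buf: bytes, i: int) -> tuple[int, int]:
    """Unpack the character starting at buf[i]; return (code point, byte length).

    Simplified: rejects bad lead/continuation bytes and truncation, but does
    not check for overlong forms, surrogates, or out-of-range values.
    """
    b0 = buf[i]
    if b0 < 0x80:
        return b0, 1
    if b0 >> 5 == 0b110:
        n, cp = 2, b0 & 0x1F
    elif b0 >> 4 == 0b1110:
        n, cp = 3, b0 & 0x0F
    elif b0 >> 3 == 0b11110:
        n, cp = 4, b0 & 0x07
    else:
        raise ValueError(f"no lead byte at offset {i}")
    if i + n > len(buf):
        raise ValueError("truncated sequence")
    for b in buf[i + 1:i + n]:
        if b >> 6 != 0b10:                          # every tail byte is 10xxxxxx
            raise ValueError("bad continuation byte")
        cp = (cp << 6) | (b & 0x3F)
    return cp, n


def next_boundary(buf: bytes, i: int) -> int:
    """Resynchronize: from any offset, skip continuation bytes to the next character start."""
    while i < len(buf) and buf[i] >> 6 == 0b10:     # at most 3 bytes in valid UTF-8
        i += 1
    return i


if __name__ == "__main__":
    data = "a\u4e16b".encode("utf-8")              # bytes 61 E4 B8 96 62
    start = next_boundary(data, 2)                 # offset 2 is inside U+4E16
    print(start, decode_rune(data, start))         # 4 (98, 1), i.e. the 'b'
```

Decoding needs only shifts and masks, and resynchronization needs no look-behind at all: a reader that starts at an arbitrary offset loses at most the one partial character it landed in.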
Why didn't we just use their FSS/UTF? As I remember, it was because in that first phone call I sang out a list of desiderata for any such encoding, and FSS/UTF was lacking at least one - the ability to synchronize a byte stream picked up mid-run, with less than one character being consumed before synchronization. Because that was lacking, we felt free - and were given freedom - to roll our own.

I think the "IBM designed it, Plan 9 implemented it" story originates in RFC2279. At the time, we were so happy UTF-8 was catching on we didn't say anything about the bungled history. Neither of us is at the Labs any more, but I bet there's an e-mail thread in the archive there that would support our story and I might be able to get someone to dig it out.

So, full kudos to the X/Open and IBM folks for making the opportunity happen and for pushing it forward, but Ken designed it with me cheering him on, whatever the history books say.

-rob

Date: Sat, 07 Jun 2003 18:44:05 -0700
From: "Rob `Commander' Pike"
To: Markus Kuhn
cc: henry (at) spsystems.net, ken (at) entrisphere.com, Greger Leijonhufvud
Subject: Re: UTF-8 history

I asked Russ Cox to dig through the archives. I have attached his message. I think you'll agree it supports the story I sent earlier. The mail we sent to X/Open (I believe Ken did the editing and mailing of that document) includes a new desideratum #6 about discovering character boundaries. We'll never know how much the original X/Open proposal influenced us; the two proposals are very different but do share some characteristics. I don't remember looking at it in detail, but it was a long time ago. I very clearly remember Ken writing on the placemat and wish we had kept it!

-rob

From: Russ Cox
To: r (at) google.com
Subject: utf digging
Date-Sent: Saturday, June 07, 2003 7:46 PM -0400

bootes's /sys/src/libc/port/rune.c changed from the division-heavy old utf on sep 4 1992. the version that made it into the dump is dated 19:51:55. it was commented the next day but otherwise remained unchanged until nov 14 1996, when runelen was sped up by inspecting the