---
created_at: '2015-05-15T11:01:20.000Z'
title: Erik Naggum on Attributes in SGML, XML, Lisp (2001)
url: http://www.schnada.de/grapt/eriknaggum-enamel.html
author: networked
points: 72
story_text: ''
comment_text:
num_comments: 54
story_id:
story_title:
story_url:
parent_id:
created_at_i: 1431687680
_tags:
- story
- author_networked
- story_9549841
objectID: '9549841'
year: 2001
---
[Source](http://www.schnada.de/grapt/eriknaggum-enamel.html "Permalink to Erik Naggum on attributes in SGML/XML, Enamel (NML), Lisp")
# Erik Naggum on attributes in SGML/XML, Enamel (NML), Lisp
Newsgroups: [comp.lang.lisp][2]
Subject: Re: XML and lisp
From: [Erik Naggum][3] <e...@naggum.net>
Message-ID: <3207626455633924@naggum.net>
Organization: Naggum Software, Oslo, Norway
User-Agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/20.7
Date: Fri, 24 Aug 2001 07:21:01 GMT
NNTP-Posting-Date: Fri, 24 Aug 2001 09:21:01 MET DST
* Tim Bradshaw <t...@tfeb.org>
> ((:reply :title "Lisp is not just a programming language")
> (:body
> (:p "It is also a text-markup language,
> and many other things, as you can see here"
> "For instance with a suitable (small) macro, this is quite legal
> Lisp syntax, which is compiled to *ML. I have written significantly-sized
> documents in this notation."))
> (:signature "--tim"))
As long as we think aloud in alternative syntaxes, I actually prefer to
break the _incredibly_ stupid syntactic-only separation of elements and
attribute values. SGML and its descendants have made a crucial mistake:
For every level of container (there are about 7 of them), there is a new
syntax for _two_ properties of the container: (1) the contents is wrapped
in one syntax, but (2) the "writing on the box" is in quite another.
This means that information and meta-information are massively different
concepts, and this artificial separation runs through the whole SGML
design. Each level offers a new way to write the two differently. This
is what makes it so goddamn hard to reason about SGML documents and to do
reasonably intelligent transformations on them without working your butt
off specifying all sorts of irrelevant stuff that does _nothing_ but get
in your way.
I have come to _loathe_ the half-assed hybrid that some XML-in-Lisp tools
use and produce, because it makes XML just as evil in Lisp as it was in
XML to begin with, and we have gained absolutely nothing in either power
of processing or in abstraction, which is so very un-Lisp-like.

    <foo bar="zot">quux</foo>

should be read as

    (foo (bar "zot") "quux")

and most definitely _NOT_ as ((:foo :bar "zot") "quux"), which turns this
fairly reasonable structure into a morass of complexity worse than it was
to begin with. And it does _NOT_ help to represent empty elements only
with a keyword. Using three different levels of nesting to represent a
single concept is Just Plain Wrong. Also, using keywords is not a good
idea because there needs to be a lot of related information associated
with elements and attributes, in different contexts, not to mention all
the things they do with their funny "namespaces" these days.
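
A minimal sketch of the reading proposed above, written in Python rather than Lisp so it is easy to run: attributes and child elements are both emitted as ordinary sublists, with no separate attribute layer. The names `xml_to_sexp`, `to_sexp`, and `lisp_string` are illustrative and do not come from the post. Using the standard-library `xml.etree.ElementTree` parser is an assumption of this sketch; it rejects undeclared entities such as `&quux;`, so entity handling is left out.

```python
import xml.etree.ElementTree as ET


def lisp_string(text):
    """Quote text as a Lisp string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_sexp(elem):
    """Render an element so that attributes and children are uniform sublists."""
    parts = [elem.tag]
    # An attribute becomes a (name "value") sublist, just like a child element.
    for name, value in elem.attrib.items():
        parts.append("(" + name + " " + lisp_string(value) + ")")
    # Whitespace-only text between tags is dropped (an assumption of this sketch).
    if elem.text and elem.text.strip():
        parts.append(lisp_string(elem.text))
    for child in elem:
        parts.append(to_sexp(child))
        if child.tail and child.tail.strip():
            parts.append(lisp_string(child.tail))
    return "(" + " ".join(parts) + ")"


def xml_to_sexp(source):
    """Parse an XML string and return its s-expression form as text."""
    return to_sexp(ET.fromstring(source))


print(xml_to_sexp('<foo bar="zot">quux</foo>'))  # (foo (bar "zot") "quux")
```

Note that `<foo bar="zot"/>` and `<foo><bar>zot</bar></foo>` both come out as `(foo (bar "zot"))` here, which is the point being argued: the attribute/element distinction carries no structural information of its own.
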
Whether something is an attribute or element is _completely_ arbitrary.
It is based on some arbitrary choices in the design process that reveal
absolutely no inherent qualities. For purely pragmatic reasons, SGML
folks will use attributes for some things and elements for others because
their tools can deal with some things in attributes and some things in
elements. The faulty idea that attributes say something "about" the
element and sub-elements somehow constitute their contents is the same
premature structuring that premature optimization of code suffers from.
The whole language is incredibly misdesigned in making that distinction.
As for writing SGML/XML/HTML/whatever, I have a simple way to get rid of
the annoying verbosity of these stupid languages while _retaining_ that
mistake between attribute values and elements, because it is quite hard
to make simple regular expression-based conversions retain enough data
about an element to decide what should be attribute and element. An
element has the form <name [attributes] | [contents]>. Attributes have
the form <name | value>. Internal whitespace is only for readability.

    XML                           Enamel (NML)           CL
    <foo/>                        <foo>                  (foo)
    <foo bar="zot"/>              <foo <bar|zot>>        (foo (bar "zot"))
    <foo>zot</foo>                <foo|zot>              (foo "zot")
    <foo bar="zot">quux</foo>     <foo <bar|zot> |quux>  (foo (bar "zot") "quux")
    <foo>Hey, &quux;!</foo>       <foo|Hey, [quux]!>     (foo "Hey, " quux "!")
    <foo>AT&amp;T you will</foo>  <foo|AT&T you will>    (foo "AT&T you will")
    <foo><bar>zot</bar></foo>     <foo|<bar|zot>>        (foo (bar "zot"))

So I have almost none of the annoying and arbitrary quote/escape mania in
attribute values or contents alike, either. Entities I write as [name],
and they end up in the Lisp version as symbols if not the character they
represent purely for syntactic reasons. Writing "code" in this language
is actually amazingly painless compared to the produced noise. Besides,
with a few simple modify-syntax-entry calls in Emacs, I get < and > to
match and blink and I can move up and down the structure very easily.
For processing this stuff in Common Lisp, it is _sometimes_ neat to
convert the single | attribute/content marker into the zero-length
symbol, ||, so pathological cases like

    <foo bar="zot"><bar>"zot"</bar></foo>

which could have been written like this to show how arbitrary the
syntactic distinction in SGML/XML is

    <foo <bar|zot>|<bar|zot>>

come out as

    (foo (bar "zot") || (bar "zot"))
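
The Enamel forms described so far, including the optional `||` marker, can be sketched as a small recursive-descent reader. This is a Python illustration under stated assumptions, not the author's implementation: `EnamelReader`, `read_enamel`, and the `NAME` pattern are hypothetical, the post does not define what characters a name may contain, and the rule used here for when to emit `||` (only when the contents begin with a nested element) is a guess that happens to reproduce the example above.

```python
import re

NAME = re.compile(r"[A-Za-z_][\w.-]*")  # assumed name syntax; the post does not define one


def lisp_string(text):
    """Quote text as a Lisp string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


class EnamelReader:
    """Hypothetical reader: one Enamel element in, s-expression text out."""

    def __init__(self, text, marker=False):
        self.text, self.pos, self.marker = text, 0, marker

    def peek(self):
        return self.text[self.pos:self.pos + 1]  # "" at end of input

    def expect(self, ch):
        if self.peek() != ch:
            raise ValueError(f"expected {ch!r} at offset {self.pos}")
        self.pos += 1

    def name(self):
        match = NAME.match(self.text, self.pos)
        if not match:
            raise ValueError(f"expected a name at offset {self.pos}")
        self.pos = match.end()
        return match.group()

    def skip_space(self):
        while self.peek().isspace():
            self.pos += 1

    def element(self):
        self.expect("<")
        parts = [self.name()]
        self.skip_space()
        while self.peek() == "<":  # attributes: elements that precede the "|"
            parts.append(self.element())
            self.skip_space()
        if self.peek() == "|":
            self.pos += 1
            contents = self.contents()
            # Assumption: emit || only where the contents start with an element,
            # the one case where attributes and contents look alike.
            if self.marker and contents and contents[0].startswith("("):
                parts.append("||")
            parts.extend(contents)
        self.expect(">")
        return "(" + " ".join(parts) + ")"

    def contents(self):
        parts, run = [], []

        def flush():
            if run:
                parts.append(lisp_string("".join(run)))
                run.clear()

        while self.peek() not in ("", ">"):
            ch = self.peek()
            if ch == "<":
                flush()
                parts.append(self.element())
            elif ch == "[":  # entity reference, kept as a bare symbol
                flush()
                self.pos += 1
                parts.append(self.name())
                self.expect("]")
            else:
                run.append(ch)
                self.pos += 1
        flush()
        return parts


def read_enamel(source, marker=False):
    """Read exactly one top-level Enamel element from source."""
    reader = EnamelReader(source, marker)
    result = reader.element()
    if source[reader.pos:].strip():
        raise ValueError("trailing text after the top-level element")
    return result


print(read_enamel("<foo <bar|zot> |quux>"))                    # (foo (bar "zot") "quux")
print(read_enamel("<foo <bar|zot>|<bar|zot>>", marker=True))   # (foo (bar "zot") || (bar "zot"))
```

Without `marker=True`, `<foo <bar|zot>>` and `<foo|<bar|zot>>` both read as `(foo (bar "zot"))`, which is why some marker is needed whenever the distinction has to survive a round trip.
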
The really interesting thing is that writing in Enamel and producing XML
is so easy that a simple Perl or Lisp function that takes an Enamel
string as argument and produces XML is quite simple and straightforward.
This makes for some interesting-looking "scripting" that blows
the mind of the miserable little wrecks that think they have to type the
endtag, the quotes and all the other user-inimical features of SGML/XML.
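
A rough sketch of such a converter, here in Python rather than Perl or Lisp. Everything in it is an assumption layered on the forms shown earlier: the function names (`enamel_to_xml` and the underscore-prefixed helpers) are made up, names are matched by a guessed pattern, attribute values are treated as plain text with no nested markup or entities, and an element written without `|` is emitted as an empty-element tag.

```python
import re
from xml.sax.saxutils import escape, quoteattr

NAME = re.compile(r"[A-Za-z_][\w.-]*")  # assumed name syntax; the post does not define one


def _expect(s, pos, ch):
    if s[pos:pos + 1] != ch:
        raise ValueError(f"expected {ch!r} at offset {pos}")
    return pos + 1


def _name(s, pos):
    match = NAME.match(s, pos)
    if not match:
        raise ValueError(f"expected a name at offset {pos}")
    return match.group(), match.end()


def _skip_space(s, pos):
    while s[pos:pos + 1].isspace():
        pos += 1
    return pos


def _attribute(s, pos):
    # <name|value> in attribute position; the value is taken as plain text.
    pos = _expect(s, pos, "<")
    name, pos = _name(s, pos)
    pos = _expect(s, pos, "|")
    end = s.find(">", pos)
    if end < 0:
        raise ValueError("unterminated attribute")
    return f" {name}={quoteattr(s[pos:end])}", end + 1


def _contents(s, pos):
    out = []
    while s[pos:pos + 1] not in ("", ">"):
        ch = s[pos]
        if ch == "<":
            child, pos = _element(s, pos)
            out.append(child)
        elif ch == "[":  # [name] becomes the entity reference &name;
            entity, pos = _name(s, pos + 1)
            pos = _expect(s, pos, "]")
            out.append(f"&{entity};")
        else:
            out.append(escape(ch))  # a bare & becomes &amp;
            pos += 1
    return "".join(out), pos


def _element(s, pos):
    pos = _expect(s, pos, "<")
    name, pos = _name(s, pos)
    pos = _skip_space(s, pos)
    attrs = ""
    while s[pos:pos + 1] == "<":
        attr, pos = _attribute(s, pos)
        attrs += attr
        pos = _skip_space(s, pos)
    if s[pos:pos + 1] != "|":  # no contents: empty-element tag
        pos = _expect(s, pos, ">")
        return f"<{name}{attrs}/>", pos
    body, pos = _contents(s, pos + 1)
    pos = _expect(s, pos, ">")
    return f"<{name}{attrs}>{body}</{name}>", pos


def enamel_to_xml(source):
    """Convert one top-level Enamel element to an XML string."""
    xml, pos = _element(source, 0)
    if source[pos:].strip():
        raise ValueError("trailing text after the top-level element")
    return xml


print(enamel_to_xml("<foo <bar|zot> |quux>"))  # <foo bar="zot">quux</foo>
print(enamel_to_xml("<foo|AT&T you will>"))    # <foo>AT&amp;T you will</foo>
```

The end tag, the attribute quotes, and the `&amp;` escape are all generated, which is exactly the typing the Enamel notation is meant to spare the writer.
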
In my personal view, Lisp "markup" has the disadvantage of needing lots
of quotes, while Enamel has the strong advantage that in <xxx|yyy>, xxx
is always symbolic and yyy is always a string of characters subject to
interpretation by whatever the symbolic part instructs in context.
Since the key feature of markup languages is the separation of text from
markup, the simple idea in Enamel should carry enough force to make this
a fully realizable goal without making an artificial syntactic separation
between information and meta-information at any level. If the syntax is
good enough for the information, it should be good enough for the meta-
information, and I think Enamel is. Fortunately, I do not have to create
a whole new international following and engage in godawful politics to
use a better syntax for XML and the like, since XML and the like are only
used as interchange syntaxes these days. Nobody in their right mind
actually writes anything by hand in such stupid languages that require so
much attention to incredibly insignificant details and incomprehensibly
irrelevant redundancy, anyway, do they? :)
Finally, note that in Enamel, a complete element is enclosed in <...> and
that means it can be subject to a nice little Common Lisp reader macro,
and it can be taught to recognize other stuff, as well, such as the neat
concept of interpolating expression values where {expression} occurs.
Still at "internal use" stage, I plan to publish some stuff about Enamel
not too far into the future.
[1]: http://www.schnada.de/impressum.html "legal notice / disclosure"
[2]: http://groups.google.com/group/comp.lang.lisp/msg/4917ba734ce860c4
[3]: http://naggum.no/
[4]: http://www.schnada.de/contact.html
[5]: http://www.schnada.de/quotes/contempt.html
[6]: http://www.schnada.de/index.html
[7]: http://www.schnada.de/index.html#conts
[8]: http://www.schnada.de/hylin/colsa.html#indoors
[9]: http://www.schnada.de/bilders/scan/baalpfig.html
[10]: http://www.schnada.de/impressum.html
[11]: http://validator.w3.org/check?uri=referer
[12]: http://www.htmlhelp.org/cgi-bin/validate.cgi?url=http%3A%2F%2Fwww.schnada.de%2F&warnings=yes&spider=yes
[13]: http://validator.w3.org/checklink?uri=https%3A%2F%2Fwww.schnada.de%2Fgrapt%2Feriknaggum-enamel.html&hide_type=all&depth=&check=Check
[14]: http://www.htmlhelp.org/tools/valet/linktest.cgi?url=http%3A%2F%2Fwww.schnada.de%2F&date=2006-01-01&type=Full
[15]: http://jigsaw.w3.org/css-validator/validator?uri=http%3A%2F%2Fwww.schnada.de%2Fsty%2Fstengel.css&warning=2&profile=css2