
created_at: 2016-02-08T06:53:40.000Z
title: What's new in purely functional data structures since Okasaki? (2010)
url: http://cstheory.stackexchange.com/questions/1539/whats-new-in-purely-functional-data-structures-since-okasaki
author: r4um
points: 154
num_comments: 42
created_at_i: 1454914420
_tags: story, author_r4um, story_11056704
objectID: 11056704

Source

reference request - What's new in purely functional data structures since Okasaki? - Theoretical Computer Science Stack Exchange

Stack Exchange Network

Stack Exchange network consists of 172 Q&A communities including Stack Overflow, the largest, most trusted online community for developers to learn, share their knowledge, and build their careers.

Visit Stack Exchange

current community

[

Theoretical Computer Science ]4

  • [

Theoretical Computer Science Meta ]7

your communities

Sign up or log in to customize your list.

more stack exchange communities

company blog

  1. 15
  2. 16
  3. 17
  4. 18

Log In Sign Up

Theoretical Computer Science

_

Theoretical Computer Science Stack Exchange is a question and answer site for theoretical computer scientists and researchers in related fields. Join them; it only takes a minute:

Sign up

Here's how it works:

  1. Anybody can ask a question
  2. Anybody can answer
  3. The best answers are voted up and rise to the top

What's new in purely functional data structures since Okasaki?


Since Chris Okasaki's 1998 book "Purely functional data structures", I haven't seen too many new exciting purely functional data structures appear; I can name just a few:

  • IntMap (also invented by Okasaki in 1998, but not present in that book)
  • Finger trees (and their generalization over monoids)

There are also some interesting ways of implementing already known data structures, such as using "nested types" or "generalized algebraic datatypes" to ensure tree invariants.
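
For example, a minimal Haskell sketch of both ideas (all names here are illustrative): a nested (non-regular) datatype whose shape only admits perfectly balanced trees, and a GADT indexed by a type-level depth that enforces the same invariant.

```haskell
{-# LANGUAGE GADTs, DataKinds, KindSignatures #-}

-- Nested (non-regular) datatype: each Fork level pairs up elements,
-- so only perfectly balanced shapes (sizes 2^k) are representable.
data Nested a = NLeaf a | NFork (Nested (a, a))

-- GADT indexed by depth: both children of a node must carry the same
-- depth index, so an unbalanced tree is a type error.
data Nat = Z | S Nat

data Perfect (d :: Nat) a where
  Leaf :: a -> Perfect 'Z a
  Node :: Perfect d a -> Perfect d a -> Perfect ('S d) a
```

Red-black trees whose color and black-height invariants live in the index are a standard larger example of the same trick.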

Which other new ideas have appeared since 1998 in this area?

reference-request big-list ds.data-structures functional-programming

asked Sep 21 '10 at 8:04 by jkff

  • Nice question. I just had a student asking me about this, and didn't know the answer. – Suresh Venkat♦ Sep 21 '10 at 8:07
  • This is OK for here, but you might get better answers on Stack Overflow. If you ask there, be sure to link to the discussion here. – Charles Stewart Sep 21 '10 at 13:01
  • Well, the Haskell Reddit has seen this, so there will be some good answers coming in from there too, but excellent question. Being just halfway through Okasaki's book, I was wondering the same thing myself. +1 – Robert Massaioli Sep 21 '10 at 22:52
  • This inspired me to ask a related question. – jbapple Sep 22 '10 at 18:44
  • Okasaki's blog post from 2008 for the 10th anniversary of the book. – Kaveh Jan 4 '16 at 12:33

6 Answers

Accepted answer (528 votes):

New purely functional data structures published since 1998:

Known in 1997, but not discussed in Okasaki's book:

  • Many other styles of balanced search tree. AVL, brother, rank-balanced, bounded-balance, and many other balanced search trees can be (and have been) implemented purely functionally by path copying. Perhaps deserving special mention are:

    • **Biased Search Trees, by Samuel W. Bent, Daniel D. Sleator, and Robert E. Tarjan**: A key element in Brodal et al.'s 2006 paper and Demaine et al.'s 2008 paper.
  • **Infinite sets that admit fast exhaustive search, by Martín Escardó**: Perhaps not a data structure per se.

  • **Three algorithms on Braun Trees, by Chris Okasaki**: Braun trees offer many stack operations in worst-case O(lg n). This bound is surpassed by many other data structures, but Braun trees have a cons operation lazy in its second argument, and so can be used as infinite stacks in some ways that other structures cannot.

  • **The relaxed min-max heap: A mergeable double-ended priority queue and The KD heap: An efficient multi-dimensional priority queue, by Yuzheng Ding and Mark Allen Weiss**: These happen to be purely functional, though this is not discussed in the papers. I do not think the time bounds achieved are any better than those that can be achieved by using finger trees (of Hinze & Paterson or Kaplan & Tarjan) as k-dimensional priority queues, but I think the structures of Ding & Weiss use less space.

  • **The Zipper, by Gérard Huet**: Used in many other data structures (such as Hinze & Paterson's finger trees), this is a way of turning a data structure inside-out.

  • Difference lists are O(1) catenable lists with an O(n) transformation to usual cons lists. They have apparently been known since antiquity in the Prolog community, where they also have an O(1) transformation to usual cons lists. That O(1) transformation seems to be impossible in traditional functional programming, but Minamide's hole abstraction, from POPL '98, discusses a way of allowing O(1) append and O(1) transformation within pure functional programming. Unlike the usual functional programming implementations of difference lists, which are based on function closures (a closure-based sketch follows this list), hole abstractions are essentially the same, in both their use and their implementation, as Prolog difference lists. However, it seems that for years the only person who noticed this was one of Minamide's reviewers.

  • Uniquely represented dictionaries support insert, update, and lookup with the restriction that no two structures holding the same elements can have distinct shapes. To give an example, sorted singly-linked lists are uniquely represented, but traditional AVL trees are not. Tries are also uniquely represented. Tarjan and Sundar, in "Unique binary search tree representations and equality-testing of sets and sequences", showed a purely functional uniquely represented dictionary that supports searches in logarithmic time and updates in O(√n) time. However, it uses Θ(n lg n) space. There is a simple representation using Braun trees that uses only linear space but has update time of Θ(√(n lg n)) and search time of Θ(lg² n).
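
The closure-based representation mentioned in the difference-list item above is tiny in Haskell; a minimal sketch (names illustrative):

```haskell
-- A difference list is a function that prepends its elements to whatever
-- list it is eventually applied to.
newtype DList a = DList ([a] -> [a])

fromList :: [a] -> DList a
fromList xs = DList (xs ++)

-- O(1) append: compose the two prepending functions.
append :: DList a -> DList a -> DList a
append (DList f) (DList g) = DList (f . g)

-- O(n) conversion back to an ordinary cons list.
toList :: DList a -> [a]
toList (DList f) = f []
```

Hole abstraction, roughly speaking, replaces the closure with a partially built cons list containing an explicit hole, which is how the Prolog version also gets its O(1) conversion.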

Mostly functional data structures, before, during, and after Okasaki's book:

  • Many procedures for making data structures persistent, fully persistent, or confluently persistent: Haim Kaplan wrote an excellent survey on the topic. See also above the work of Demaine et al., who demonstrate a fully persistent array in O(m) space (where m is the number of operations ever performed on the array) and O(lg lg n) expected access time.
  • 1989: **Randomized Search Trees by Cecilia R. Aragon and Raimund Seidel**: These were discussed in a purely functional setting by Guy E. Blelloch and Margaret Reid-Miller in Fast Set Operations Using Treaps and by Dan Blandford and Guy Blelloch in Functional Set Operations with Treaps (code). They provide all of the operations of purely functional finger trees and biased search trees, but require a source of randomness, making them not purely functional (a minimal insertion sketch follows this list). This may also invalidate the time complexity of the operations on treaps, assuming an adversary who can time operations and repeat the long ones. (This is the same reason why imperative amortization arguments aren't valid in a persistent setting, but it requires an adversary with a stopwatch.)
  • 1997: **Skip-trees, an alternative data structure to Skip-lists in a concurrent approach, by Xavier Messeguer and Exploring the Duality Between Skip Lists and Binary Search Trees, by Brian C. Dean and Zachary H. Jones**: Skip lists are not purely functional, but they can be implemented functionally as trees. Like treaps, they require a source of random bits. (It is possible to make skip lists deterministic, but, after translating them to a tree, I think they are just another way of looking at 2-3 trees.)
  • 1998: All of the amortized structures in Okasaki's book! Okasaki invented this new method for mixing amortization and functional data structures, which were previously thought to be incompatible. It depends upon memoization, which, as Kaplan and Tarjan have sometimes mentioned, is actually a side effect. In some cases (such as PFDS on SSDs for performance reasons), this may be inappropriate.
  • 1998: **Simple Confluently Persistent Catenable Lists, by Haim Kaplan, Chris Okasaki, and Robert E. Tarjan**: Uses modification under the hood to give amortized O(1) catenable deques, presenting the same interface as an earlier (purely functional, but with memoization) version appearing in Okasaki's book. Kaplan and Tarjan had earlier created a purely functional O(1) worst-case structure, but it is substantially more complicated.
  • 2007: As mentioned in another answer on this page, semi-persistent data structures and persistent union-find by Sylvain Conchon and Jean-Christophe Filliâtre
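
A minimal insertion sketch for the treap item above (names illustrative; the priority is passed in explicitly, which is exactly where the source of randomness enters):

```haskell
-- A treap node stores a key and a priority: a binary search tree on keys
-- and a max-heap on priorities.
data Treap k = Tip | Node (Treap k) k Int (Treap k)

insert :: Ord k => k -> Int -> Treap k -> Treap k
insert k p Tip = Node Tip k p Tip
insert k p t@(Node l k' p' r)
  | k < k'    = rotR (Node (insert k p l) k' p' r)
  | k > k'    = rotL (Node l k' p' (insert k p r))
  | otherwise = t
  where
    -- rotate only when the freshly inserted child violates the heap order
    rotR (Node (Node a x px b) y py c)
      | px > py = Node a x px (Node b y py c)
    rotR n = n
    rotL (Node a y py (Node b x px c))
      | px > py = Node (Node a y py b) x px c
    rotL n = n
```

In use, the priority would be drawn from a pseudorandom generator that has to be threaded through (or hidden in a monad), which is the impurity the item above points out.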

Imperative data structures or analyses not discussed in Okasaki's book, but related to purely functional data structures:

  • The Soft Heap: An Approximate Priority Queue with Optimal Error Rate, by Bernard Chazelle: This data structure does not use arrays, and so has tempted first the #haskell IRC channel and later Stack Overflow users, but it includes delete in o(lg n), which is usually not possible in a functional setting, and imperative amortized analysis, which is not valid in a purely functional setting.
  • Balanced binary search trees with O(1) finger updates. In Making Data Structures Persistent, James R. Driscoll, Neil Sarnak, Daniel D. Sleator, and Robert E. Tarjan present a method for grouping the nodes in a red-black tree so that persistent updates require only O(1) space. The purely functional deques and finger trees designed by Tarjan, Kaplan, and Mihaescu all use a very similar grouping technique to allow O(1) updates at both ends. AVL-trees for localized search by Athanasios K. Tsakalidis works similarly.
  • Faster pairing heaps or better bounds for pairing heaps: Since Okasaki's book was published, several new analyses of imperative pairing heaps have appeared, including Pairing heaps with O(log log n) decrease cost by Amr Elmasry and Towards a Final Analysis of Pairing Heaps by Seth Pettie. It may be possible to apply some of this work to Okasaki's lazy pairing heaps. (A plain pairing-heap sketch follows this list.)
  • Deterministic biased finger trees: In Biased Skip Lists, by Amitabha Bagchi, Adam L. Buchsbaum, and Michael T. Goodrich, a design is presented for deterministic biased skip lists. Through the skip list/tree transformation mentioned above, it may be possible to make deterministic biased search trees. The finger biased skip lists described by John Iacono and Özgür Özkan in Mergeable Dictionaries might then be possible on biased skip trees. A biased finger tree is suggested by Demaine et al. in their paper on purely functional tries (see above) as a way to reduce the time and space bounds on finger update in tries.
  • The String B-Tree: A New Data Structure for String Search in External Memory and its Applications by Paolo Ferragina and Roberto Grossi is a well-studied data structure combining the benefits of tries and B-trees.
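
For the pairing-heap item above, a minimal sketch of the plain, strict pairing heap that these analyses concern (names illustrative):

```haskell
-- A pairing heap: a root element plus a list of sub-heaps.
data Heap a = Empty | Heap a [Heap a]

-- O(1) merge: the smaller root adopts the other heap as a new child.
merge :: Ord a => Heap a -> Heap a -> Heap a
merge Empty h = h
merge h Empty = h
merge h1@(Heap x hs1) h2@(Heap y hs2)
  | x <= y    = Heap x (h2 : hs1)
  | otherwise = Heap y (h1 : hs2)

insert :: Ord a => a -> Heap a -> Heap a
insert x = merge (Heap x [])

findMin :: Heap a -> Maybe a
findMin Empty      = Nothing
findMin (Heap x _) = Just x

-- deleteMin does the "pairing" pass over the children; this is where the
-- interesting amortized bounds come from.
deleteMin :: Ord a => Heap a -> Heap a
deleteMin Empty       = Empty
deleteMin (Heap _ hs) = mergePairs hs
  where
    mergePairs (a : b : rest) = merge (merge a b) (mergePairs rest)
    mergePairs [a]            = a
    mergePairs []             = Empty
```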

community wiki answer by jbapple (41 revs, 5 users); last edited May 23 '17 at 11:33

  • I don't remember checking the "community wiki" box on this answer. Is there any way to undo that? – jbapple Sep 22 '10 at 20:10
  • @jbapple: after a certain number of edits, all posts become community wiki. That's an impressively thorough review there. Thank you. – Novelocrat Sep 23 '10 at 3:07
  • Great list! Which makes me wish Okasaki would publish a second edition. – Radu GRIGore Sep 30 '10 at 10:33
  • Note that Isabelle/HOL can generate code for SML, OCaml, Haskell, and Scala. The Haskabelle tool can also import Haskell into Isabelle/HOL. – Makarius Mar 4 '13 at 11:31
  • The terminology of "program extraction" comes from Coq: you take a constructive proof and make an executable program from it, stripping away some things. In Isabelle this is called "code generation" and works differently, using the HOL specifications as pseudo-code, not the proofs. Proof extraction in Isabelle/HOL according to Berghofer does work like Coq, but is rarely used these days. – Makarius Mar 4 '13 at 11:34

Answer (59 votes):

To the excellent notes already made, I'll add Zippers.

Huet, Gérard. "Functional Pearl: The Zipper." Journal of Functional Programming 7(5): 549–554, September 1997.

Wikipedia: Zipper (data structure)
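
A minimal sketch of the idea for binary trees (names illustrative): the focus is a subtree paired with the path of contexts back to the root, so moving the focus and editing at it are O(1).

```haskell
data Tree a = Leaf | Node (Tree a) a (Tree a)

-- A context records everything around the focus: either we are at the
-- root, or we came down from a node whose label and other child we keep.
data Ctx a
  = Top
  | InL a (Tree a) (Ctx a)   -- we descended into the left child
  | InR (Tree a) a (Ctx a)   -- we descended into the right child

type Zipper a = (Tree a, Ctx a)

downLeft, downRight, up :: Zipper a -> Maybe (Zipper a)
downLeft  (Node l x r, c) = Just (l, InL x r c)
downLeft  _               = Nothing
downRight (Node l x r, c) = Just (r, InR l x c)
downRight _               = Nothing
up (t, InL x r c) = Just (Node t x r, c)
up (t, InR l x c) = Just (Node l x t, c)
up _              = Nothing

-- Replace the focused subtree in O(1); the rest of the tree is shared.
put :: Tree a -> Zipper a -> Zipper a
put t (_, c) = (t, c)
```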

answered Sep 21 '10 at 18:06 by Matt Might; edited Nov 13 '13 at 3:11 by Rory O'Kane

  • Zippers are AWESOME. For many use cases, they allow tree-based representations to become the "right" choice for many kinds of data where otherwise it'd be a bit more complicated. – Carter Tazio Schonwald Dec 1 '10 at 22:52
  • An example of their use for XML manipulation: anti-xml.org/zippers.html – Mechanical snail Aug 22 '12 at 20:29

Answer (36 votes):

Conchon and Filliâtre, "A Persistent Union-Find Data Structure" and "Semi-Persistent Data Structures."
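
The Conchon–Filliâtre structure reroots a mutable array behind a persistent front end; as a sketch of the interface only (names illustrative, with none of the paper's rerooting or efficiency), a naive persistent union-find over an immutable IntMap:

```haskell
import qualified Data.IntMap.Strict as IM

-- Each version of the structure is just an immutable parent map; roots
-- are the elements with no entry.  No path compression or union by rank,
-- so find is O(n) in the worst case -- this only illustrates the API.
newtype UF = UF (IM.IntMap Int)

empty :: UF
empty = UF IM.empty

find :: UF -> Int -> Int
find uf@(UF m) x = maybe x (find uf) (IM.lookup x m)

-- union returns a new version; the old version remains usable.
union :: Int -> Int -> UF -> UF
union x y uf@(UF m)
  | rx == ry  = uf
  | otherwise = UF (IM.insert rx ry m)
  where rx = find uf x
        ry = find uf y
```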

answered Sep 21 '10 at 8:13 by Radu GRIGore; edited Apr 26 '14 at 17:07 by Huge

  • Wow, a persistent UNION-FIND! Thanks! – jkff Sep 21 '10 at 8:17
  • Well, kind of... See the article. – Radu GRIGore Sep 21 '10 at 8:18
  • ... or, if you prefer, see some code (by Matt Parkinson): github.com/septract/jstar/blob/master/src/utils/… – Radu GRIGore Sep 21 '10 at 8:50
  • Now I see why the "kind of..." comment had an upvote. They have good performance only when one almost exclusively either does not use persistence or backtracks all the time: if you often use both "new" and "old" versions, you're screwed. Cool rerooting idea, though. – jkff Sep 22 '10 at 11:13

Answer (18 votes):

I'd add McBride's version of zippers as derivatives of data types.
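
A minimal sketch of the correspondence (names illustrative): one-hole contexts obey the rules of differentiation, and for lists the derivative gives exactly the familiar list-zipper context.

```haskell
-- For the pair functor F x = x * x, the derivative d/dx (x^2) = 2x:
-- a one-hole context for a pair is the other component, tagged with
-- which side the hole is on.
data PairCtx a = HoleLeft a | HoleRight a

-- For lists, L x = 1 + x * L x.  Differentiating and solving gives
-- dL/dx = L x * L x: the elements before the hole (stored nearest-first)
-- and the elements after it -- precisely the list zipper's context.
data ListCtx a = ListCtx [a] [a]

-- Plugging an element back into the hole recovers the whole list.
plug :: a -> ListCtx a -> [a]
plug x (ListCtx before after) = reverse before ++ [x] ++ after
```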

answered Sep 22 '10 at 2:05 by "none"

  • I love that stuff. It's just so cool that the derivative has an application so wildly different from finding rates of change! – SamB Sep 22 '10 at 22:00
  • SamB, you might also be interested in derivatives of regular expressions (if you didn't already know about them). – jbapple Sep 22 '10 at 23:33
  • Here's the original paper on derivatives of regular expressions. – jameshfisher Jun 16 '13 at 10:37

Answer (12 votes):

Rangemaps

It is a specialized data structure, but it can be used as a substitute for Martin Erwig's DIET, with slightly different properties, so at least there is one existing data structure to compare it to. The DIET itself was described in an article in JFP in 1998, so perhaps it is not included in Purely Functional Data Structures.
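
A minimal sketch of the shared idea (names illustrative, and using a flat list of runs rather than Erwig's balanced tree, so it says nothing about the DIET's complexity):

```haskell
-- A set of Ints stored as maximal runs [lo..hi], kept sorted, disjoint,
-- and non-adjacent.  Erwig's DIET keeps these runs in a search tree;
-- a plain list keeps the sketch short.
type Ranges = [(Int, Int)]

member :: Int -> Ranges -> Bool
member x = any (\(lo, hi) -> lo <= x && x <= hi)

insert :: Int -> Ranges -> Ranges
insert x [] = [(x, x)]
insert x rs@((lo, hi) : rest)
  | x + 1 <  lo = (x, x) : rs        -- new singleton run before this one
  | x + 1 == lo = (x, hi) : rest     -- extend this run downward
  | x <= hi     = rs                 -- already covered
  | x == hi + 1 = fuse (lo, x) rest  -- extend upward, maybe fuse with next
  | otherwise   = (lo, hi) : insert x rest
  where
    fuse (a, b) ((c, d) : ys) | b + 1 == c = (a, d) : ys
    fuse r ys = r : ys
```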

answered Jan 28 '11 at 9:07 by Complicated see bio

Answer (5 votes):

Following up on the 2012 paper linked above, the work on RRB vectors has since been extended and published in ICFP'15.

RRB vector: a practical general purpose immutable sequence http://dl.acm.org/citation.cfm?id=2784739

answered Jun 28 '16 at 16:57 by Mike Rainey


