hn-classics/_stories/2010/9672257.md

19 KiB
Raw Permalink Blame History

created_at title url author points story_text comment_text num_comments story_id story_title story_url parent_id created_at_i _tags objectID year
2015-06-06T21:04:18.000Z Why Vector Clocks Are Easy (2010) http://basho.com/why-vector-clocks-are-easy/ ninjakeyboard 72 9 1433624658
story
author_ninjakeyboard
story_9672257
9672257 2010

Source

Vector Clocks Explained

Basho logo

  • ACADEMY
  • DOWNLOAD RIAK
  • DOCS
  • CONTACT
  • [ENGLISH][5]
    • [English][6]

    • [日本語][7]

    • [Deutsch][8]

    • [Français][9]

    • [Products][10]

      • [Overview][10]
      • [Riak® KV][11]
        • [Use Cases][12]
        • [Features][13]
        • [Commercial V. OSS][14]
        • [Take Feature Tour][15]
      • [Riak® TS][16]
        • [Use Cases][17]
        • [Features][18]
        • [Commercial V. OSS][19]
        • [Take Feature Tour][20]
    • [Integrations][21]

      •         * [Apache Spark][22]
        
        • [Apache Solr][23]
        • [Apache Mesos][24]
        • [Redis Caching][25]
      •         * [Riak S2][26]
          * [Features][27]
          * [Commercial V. OSS][28]
        
    • Services

      • [Support][29]
      • [Professional Services][30]
    • [Industries][31]

      • [Gaming & Betting][32]
      • [Retail & eCommerce][33]
      • [Telco & Service Providers][34]
      • [Health Services][35]
      • [Software & Technology][36]
      • [Security & Services][37]
      • [Media & Entertainment][38]
      • [Transport, Manufacture & Utilities][39]
    • [Use Cases][40]

      • [IoT/Sensor][41]
      • [Session Data][42]
      • [Time Series][43]
      • [Content & Docs][44]
      • [Business Continuity][45]
      • [Edge Analytics][46]
      • [Messaging & Chat][47]
      • [Metrics / Analytics][48]
    • [Resources][49]

      •         * [Resource Center][49]
          * [Webinars][50]
          * [DataSheets][51]
          * [WhitePapers][52]
          * [Solution Briefs][53]
          * [Case Studies][54]
          * [User Stories][55]
          * [Customer Testimonials][56]
        
      •         * [Customers][57]
        
        • [Video Playlist][58]
        • [LEARN ABOUT:][59]
          • [Big Data Databases][60]
          • [Document Databases][61]
          • [High Availability Databases][62]
          • [Key-Value Databases][63]
          • [NoSQL Databases][64]
          • [Time Series Databases][65]
    • [Company][66]

      •         * [Developers][67]
          * [Docs][3]
          * [Community][68]
          * [Download Riak][3]
          * [Academy][2]
        
      •         * [About Basho][66]
          * [Governance][69]
        
      •         * [Newsroom][70]
          * [Basho Press Releases][71]
          * [Basho In the News][72]
        
        • [Events][73]
        • [Careers][74]
      •         * [Contact][4]
        
        • [Customers][57]
        • [Partners][75]
      •         * [About Our Products][10]
          * [Overview][10]
          * [Riak KV][11]
          * [Riak TS][16]
          * [Integrations][21]
        
    • [Blog][76]

      • [Technical][77]
      • [Business][78]

Basho Blog

  • [Home][79]
  • [Blog][76]

Why Vector Clocks are Easy

Posted January 29, 2010

[by Bryan Fink][80]

Category: [Technical Blog][77]

January 29, 2010

[Vector clocks][81] are confusing the first time youre introduced to them. Its not clear what their benefits are, nor how it is you derive said benefits. Indeed, each Riak developer has had his own set of false starts in making them behave.

The truth, though, is that vector clocks are actually very simple, and a couple of quick rules will get you all the power you need to use them effectively.

The simple rule is: assign each of your actors an ID, then make sure you include that ID and the last vector clock you saw for a given value whenever to store a modification.

The rest of this post will explain why and how to follow that simple rule. First, Ill explain how vector clocks work with a very simple example, and then show how to use them easily in Riak.

Vector Clocks by Example

Weve all had this problem:

Alice, Ben, Cathy, and Dave are planning to meet next week for
dinner. The planning starts with Alice suggesting they meet on
Wednesday. Later, Dave discuss alternatives with Cathy, and they
decide on Thursday instead. Dave also exchanges email with Ben, and
they decide on Tuesday. When Alice pings everyone again to find out
whether they still agree with her Wednesday suggestion, she gets
mixed messages: Cathy claims to have settled on Thursday with Dave,
and Ben claims to have settled on Tuesday with Dave. Dave cant be
reached, and so no one is able to determine the order in which these
communications happened, and so none of Alice, Ben, and Cathy know
whether Tuesday or Thursday is the correct choice.

The story changes, but the end result is always the same: you ask two people for the latest version of a piece of information, and they reply with two different answers, and theres no way to tell which one is really the most recent.

Vector clocks to the rescue, but how? Simple: tag the date choice with a vector clock, and then have each party member update the clock whenever they alter the choice. Start with Alices initial message:

date = Wednesday
vclock = Alice:1

Alice says, “Lets meet Wednesday,” and tags that value as the first version of the message that she has seen. Now Dave and Ben start talking. Ben suggests Tuesday:

date = Tuesday
vclock = Alice:1, Ben:1

Ben left Alices mark alone, but added a mark specifying that it was the first version of the message that he had seen. Dave replies, confirming Tuesday:

date = Tuesday
vclock = Alice:1, Ben:1, Dave:1

Just like Bens modification, Dave just adds his own first-revision mark. Now Cathy gets into the act, suggesting Thursday:

date = Thursday
vclock = Alice:1, Cathy:1

But wait, what happened to Bens and Daves marks? Cathy didnt have a version of the object that had been modified by Ben or Dave, so their marks cant appear in her modification. This means that Dave has two conflicting objects:

date = Tuesday
vclock = Alice:1, Ben:1, Dave:1

and

date = Thursday
vclock = Alice:1, Cathy:1

Dave can tell that these versions are in conflict, because neither vclock “descends” from the other. In order for vclock B to be considered a descendant of vclock A, each marker in vclock A must have a corresponding marker in B that has a revision number greater than or equal to the marker in vclock A. Markers not contained in a vclock can be considered to have revision number zero. So, since the Tuesday value has a Cathy revision of zero while Thursday has a Cathy revision of one, _Tuesday _cannot descend from Thursday. But, since Thursday has Ben and Dave revisions of zero while Tuesday has Bend and Dave revisions of one, Thursday is also not descended from Tuesday. Neither succeeds the other, so Dave has a conflict to sort out.

Luckily, Daves a reasonable guy, and chooses Thursday:

date = Thursday
vclock = Alice:1, Ben:1, Cathy:1, Dave:2

Dave also created a vector clock that is successor to all previously-seen vector clocks: it has revision numbers for every actor equal to or greater than the last revision number he saw for that actor. He emails this value back to Cathy.

So now when Alice asks Ben and Cathy for the latest decision, the replies she receive are, from Ben:

date = Tuesday
vclock = Alice:1, Ben:1, Dave:1

and from Cathy:

date = Thursday
vclock = Alice:1, Ben:1, Cathy:1, Dave:2

From this, she can tell that Dave intended his correspondence with Cathy to override the decision he made with Ben. All Alice has to do is show Ben the vector clock from Cathys message, and Ben will know that he has been overruled. (Dave will, almost certainly, blame his broken email software for failing to inform Ben of the change.)

How to do this in Riak

Now that you understand vector clocks, using them with Riak is easy. Ill use the raw HTTP interface to illustrate.

First, whenever you store a value, include an X-Riak-ClientId header to identify your actor. For Alices first message above, youd say:

curl -X PUT -H "X-Riak-ClientId: Alice" -H "content-type: text/plain"
http://localhost:8098/raw/plans/dinner --data "Wednesday"

When Ben, Cathy, and Dave each GET Alices plans, theyll get the same vector clock (Ive removed some of the other headers for brevity):

curl -i http://localhost:8098/raw/plans/dinner
HTTP/1.1 200 OK
X-Riak-Vclock: a85hYGBgzGDKBVIsrLnh3BlMiYx5rAzLJpw7wpcFAA==
Content-Type: text/plain
Content-Length: 9

Wednesday

The X-Riak-Vclock header contains an encoded version of a vclock that is the same as out earlier example: Alice has modified this value once.

Now when Ben sends his change to Dave, he includes both the vector clock he pulled down (in the X-Riak-Vclock header), and his own X-Riak-Client-Id:

curl -X PUT -H "X-Riak-ClientId: Ben" -H "content-type: text/plain"
-H "X-Riak-Vclock: a85hYGBgzGDKBVIsrLnh3BlMiYx5rAzLJpw7wpcFAA=="
http://localhost:8098/raw/plans/dinner --data "Tuesday"

Dave pulls down a fresh copy, and then confirms Tuesday:

curl -i http://localhost:8098/raw/plans/dinner
...
X-Riak-Vclock: a85hYGBgymDKBVIsrLnh3BlMiYx5rAymfeeO8EGFWRLl30GF/00ACmcBAA==
...
curl -X PUT -H "X-Riak-ClientId: Dave" -H "content-type: text/plain"
-H "X-Riak-Vclock: a85hYGBgymDKBVIsrLnh3BlMiYx5rAymfeeO8EGFWRLl30GF/00ACmcBAA=="
http://localhost:8098/raw/plans/dinner --data "Tuesday"

Cathy, on the other hand, hasnt pulled down a new version, and instead merely updated the plans with her suggestion of Thursday:

curl -X PUT -H "X-Riak-ClientId: Cathy" -H "content-type: text/plain"
-H "X-Riak-Vclock: a85hYGBgzGDKBVIsrLnh3BlMiYx5rAzLJpw7wpcFAA=="
http://localhost:8098/raw/plans/dinner --data "Thursday"

(Thats the same vector clock that Ben used, in that encoded gibberish is making your eyes cross.)

Now, when Dave goes to grab this new copy (after Cathy tells him she has posted it), hell see one of two things. If the “plans” Riak bucket has the allow_mult property set to false, hell see just Cathys update. If allow_mult is true for the “plans” bucket, hell see both his last update and Cathys. Im going to show the allow_mult=true version below, because I think it illustrates the flow better.

curl -i -H "Accept: multipart/mixed" http://localhost:8098/raw/plans/dinner
HTTP/1.1 300 Multiple Choices
X-Riak-Vclock: a85hYGBgzWDKBVIsrLnh3BlMiYx5rAymfeeO8EGFWRLl30GF1fsRwsypF59BhT0mIoTZ/1SYQIUrEcJszUksu9R6kCWyAA==
Content-Type: multipart/mixed; boundary=ZZ3eyjUllBi7GXRRMJsUublFxjn
Content-Length: 368

--ZZ3eyjUllBi7GXRRMJsUublFxjn
Content-Type: text/plain

Tuesday
--ZZ3eyjUllBi7GXRRMJsUublFxjn
Content-Type: text/plain

Thursday
--ZZ3eyjUllBi7GXRRMJsUublFxjn--

Dave sees two values because the vclock that Cathy generated wasnt a successor to the vclock that Dave had generated with his last modification. Riak couldnt choose between them, and therefore kept both values.

Dave picks Thursday, and updates the object, resolving the conflict. Riak has already computed a unified, descendant vector clock for Dave, so he uses the vector clock from the multi-value version he just pulled down, just like before:

curl -X PUT -H "X-Riak-ClientId: Dave" -H "content-type: text/plain"
-H "X-Riak-Vclock: a85hYGBgzWDKBVIsrLnh3BlMiYx5rAymfeeO8EGFWRLl30GF1fsRwsypF59BhT0mIoTZ/1SYQIUrEcJszUksu9R6kCWyAA=="
http://localhost:8098/raw/plans/dinner --data "Thursday"

Now when Alice check for the latest version, she just sees the final decision:

curl -i http://localhost:8098/raw/plans/dinner
HTTP/1.1 200 OK
X-Riak-Vclock: a85hYGBgzWDKBVIsrLnh3BlMiYx5rAymfeeO8EGFWRLl30GF1fvhwmzNSSy71HqgEpUTEerZ/1SYYBFmTr34DCjMBBTOnQwUzgIA
Content-Type: text/plain
Content-Length: 7

Thursday

While Riak couldnt decide whether to choose Cathys modification over Daves earlier modification, it was easy to choose Daves latest modification, because the vclock created was a successor to the vclock in place.

Review

So, vclocks are easy: assign each of your actors an ID (“Alice”, “Ben”, “Cathy”, and “Dave” in these examples), then make sure you include that ID and the last vector clock you saw for a given value whenever to store a modification.

If two actors store changes with vector clocks that dont descend from each other, Riak will store and hand back both values. When descendancy can be calculated, values stored with vector clocks that have been succeeded will be removed.

[-Bryan][82]

[blog][83], [Riak][84], [vector clocks][85]

Share:

Recent Articles

![][86]

[DC/OS 1.9 and Riak Mesos Framework][87]

March 13, 2017

[Pavel Hardak][88]

We are pleased today to announce with Mesosphere, the launch of DC/OS 1.9 and a major upgrade of the integration…

[Read More][87]

[Traditional "Data Lake" Approach May Not Be A Good Choice for IoT Data][89]

March 1, 2017

[Dorothy Pults][90]

The momentum around Apache Spark continues. Spark Summit East was a big success and Basho's own Pavel Hardak was among…

[Read More][89]

[Basho Academy: Install, Code and Go][91]

February 27, 2017

[Dorothy Pults][90]

Are you new to NoSQL and want to find out more about Riak? Basho Academy may be the right place for…

[Read More][91]

[Create Your First Riak TS Table][92]

February 21, 2017

[Dorothy Pults][90]

Customers like Intellicore are doing real-time analysis of IoT data using Riak TS. We want to make it easy for…

[Read More][92]

  • [Products][10]
  • [Integrations][21]
  • [Resources][49]
  • [NoSQL Explained][64]
  • [Industries][31]
  • [Use Cases][40]
  • [About Basho][66]
  • [Newsroom][70]
  • [Blog][76]
  • [DEVELOPERS][67]
  • [Events][73]
  • [Careers][74]
  • Contact
  • [Partners][75]

![][93]

![][94]

![][95]

[5]: javascript: void(0); [6]: http://www.basho.com [7]: http://jp.basho.com/posts/technical/why-vector-clocks-are-easy/ [8]: http://de.basho.com/posts/technical/why-vector-clocks-are-easy/ [9]: http://fr.basho.com/posts/technical/why-vector-clocks-are-easy/ [10]: http://basho.com/products/ [11]: http://basho.com/products/riak-kv/ [12]: http://basho.com/products/riak-kv/#use-cases [13]: http://basho.com/products/riak-kv/#features [14]: http://basho.com/products/riak-kv/#commercial [15]: http://basho.com/products/riak-kv/resiliency/ [16]: http://basho.com/products/riak-ts/ [17]: http://basho.com/products/riak-ts/#use-cases [18]: http://basho.com/products/riak-ts/#features [19]: http://basho.com/products/riak-ts/#commercial [20]: http://basho.com/products/riak-ts/resiliency/ [21]: http://basho.com/products/integrations/ [22]: http://basho.com/products/apache-spark/ [23]: http://basho.com/products/solr/ [24]: http://basho.com/products/apache-mesos/ [25]: http://basho.com/products/redis/ [26]: http://basho.com/products/riak-s2/ [27]: http://basho.com/products/riak-s2/#features [28]: http://basho.com/products/riak-s2/#commercial [29]: http://basho.com/products/support/ [30]: http://basho.com/products/professional-services/ [31]: http://basho.com/industries/ [32]: http://basho.com/industries/gaming-betting/ [33]: http://basho.com/industries/retail-ecommerce/ [34]: http://basho.com/industries/telco-service-providers/ [35]: http://basho.com/industries/health-services/ [36]: http://basho.com/industries/software-technology/ [37]: http://basho.com/industries/security-companies/ [38]: http://basho.com/industries/media-entertainment/ [39]: http://basho.com/industries/transport-manufacture-utilities/ [40]: http://basho.com/use-cases/ [41]: http://basho.com/use-cases/iot-sensor-device-data/ [42]: http://basho.com/use-cases/session-data/ [43]: http://basho.com/use-cases/time-stamped-data-feeds/ [44]: http://basho.com/use-cases/content-and-documents/ [45]: http://basho.com/use-cases/business-continuity/ [46]: http://basho.com/use-cases/edge-analytics/ [47]: http://basho.com/use-cases/messaging/ [48]: http://basho.com/use-cases/metrics-log-analytics/ [49]: http://basho.com/resources/ [50]: http://basho.com/resources/?resource=types&t=WEBINAR [51]: http://basho.com/resources/?resource=types&t=DATASHEET [52]: http://basho.com/resources/?resource=types&t=WHITEPAPER [53]: http://basho.com/resources/?resource=types&t=SOLUTION%20BRIEF [54]: http://basho.com/resources/?resource=types&t=CASE%20STUDY [55]: http://basho.com/resources/?resource=types&t=USER%20STORY [56]: http://basho.com/resources/?resource=types&t=CUSTOMER%20TESTIMONIAL [57]: http://basho.com/about/customers/ [58]: http://basho.com/resources/video/ [59]: http://basho.com# [60]: http://basho.com/resources/big-data-databases/ [61]: http://basho.com/resources/document-databases/ [62]: http://basho.com/resources/high-availability-databases/ [63]: http://basho.com/resources/key-value-databases/ [64]: http://basho.com/resources/nosql-databases/ [65]: http://basho.com/resources/time-series-databases/ [66]: http://basho.com/about/ [67]: http://basho.com/community [68]: http://basho.com/community/ [69]: http://basho.com/about/governance/ [70]: http://basho.com/newsroom/ [71]: http://basho.com/category/press/ [72]: http://basho.com/category/news/ [73]: http://basho.com/events/ [74]: http://basho.com/careers/ [75]: http://basho.com/partners/ [76]: http://basho.com/blog/ [77]: http://basho.com/category/technical/ [78]: http://basho.com/category/business/ [79]: http://basho.com/ [80]: http://basho.com/posts/author/bryan-fink/ [81]: http://en.wikipedia.org/wiki/Vector_clock [82]: http://twitter.com/hobbyist [83]: http://basho.com/tag/blog/ [84]: http://basho.com/tag/riak/ [85]: http://basho.com/tag/vector-clocks/ [86]: http://basho.com/wp-content/themes/basho-v2/assets/images/blog/divider-line.png [87]: http://basho.com/posts/technical/dcos-and-riak-mesos-framework/ [88]: http://basho.com/posts/author/pavelhardak/ [89]: http://basho.com/posts/technical/traditional-data-lake-approach-may-not-be-a-good-choice-for-iot-data/ [90]: http://basho.com/posts/author/dorothy-pults/ [91]: http://basho.com/posts/technical/basho-academy-install-code-and-go/ [92]: http://basho.com/posts/technical/create-your-first-riak-ts-table/ [93]: http://basho.com/wp-content/themes/basho-v2/dist/images/footer-logo.jpg [94]: http://basho.com/wp-content/themes/basho-v2/dist/images/back-top.png [95]: //googleads.g.doubleclick.net/pagead/viewthroughconversion/941303400/?value=0&guid=ON&script=0