We've been working for the past few days on a Mesh4x adapter that can synchronize a potentially big KML file at a very granular level (styles, placemarks, folders, etc.) so that you can collaboratively edit these large files without having to resolve spurious "conflicts".
From Ed's blog post:
This could be synchronized peer-to-peer (a KML on your disk to a KML on a USB drive or someone else's box) as well as via a 'cloud' web service. Note this is changing the data inside the KML, it is not just 'file sharing'. The adapter knows about KML and keeps track of versions of fine-grained elements (pushpins, placemarks, polygons) inside the same file. It is an example of how a data mesh could be used to synchronize fine-grained data between applications.
Update: Read more about the latest version (including single-file storage, KMZ support, etc.).
I believe this is one of the first instances of a mesh-style synchronization that really proves the point and possibilities of FeedSync and also Live Mesh. Something that Joel Spolsky clearly didn't get.
This technology is going to change the way we think about applications, data ownership and sharing. It's actually a pity that some people are focusing on the one *sample* application that Microsoft is showing (file/folder sharing) to evaluate it.
I already mentioned why I think Live Mesh is cool and that I think its most important part, FeedSync, is being largely ignored by reviewers. Fortunately, there's an extensive interview with the team that goes quite deep into FeedSync and how it works. Go watch it, it's good info.
At the most basic level, FeedSync is a mechanism to associate versioning "headers" with arbitrary objects (items), plus an algorithm to merge them and detect conflicts based on that header information. Replace "header" with "extension element" and "arbitrary object" with "RSS/Atom Item" and you have the XML feed version of it:
<item xmlns:sx='http://feedsync.org/2007/feedsync'>
  <title>Buy groceries</title>
  <sx:sync id='0a7903db47fb0fff' updates='1'>
    <sx:history sequence='1' by='kzu'/>
  </sx:sync>
</item>
The sx:sync element is the versioning header. Every time the item is updated (say, by another user/device), a new sx:history element is added and the updates attribute is incremented:
<item>
  <title>Buy groceries</title>
  <sx:sync id='0a7903db47fb0fff' updates='2'>
    <sx:history sequence='2' by='vga'/>
    <sx:history sequence='1' by='kzu'/>
  </sx:sync>
</item>
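To make the update behavior concrete, here's a minimal Python sketch using the standard library's ElementTree. It assumes the simplest rule consistent with the two snippets above: increment updates, then prepend an sx:history whose sequence equals the new updates value. The full spec also records details such as a `when` timestamp, which this sketch omits. `record_update` is a hypothetical helper, not part of any FeedSync library.

```python
import xml.etree.ElementTree as ET

SX = "http://feedsync.org/2007/feedsync"  # namespace URI from the example above


def record_update(item: ET.Element, by: str) -> None:
    """Apply the (simplified) update behavior to an item's sx:sync header, in place."""
    sync = item.find(f"{{{SX}}}sync")
    updates = int(sync.get("updates")) + 1
    sync.set("updates", str(updates))
    # Simplification: sequence mirrors the new updates value; 'when' is omitted.
    entry = ET.Element(f"{{{SX}}}history", {"sequence": str(updates), "by": by})
    sync.insert(0, entry)  # newest history entry goes first


item = ET.fromstring(
    "<item xmlns:sx='http://feedsync.org/2007/feedsync'>"
    "<title>Buy groceries</title>"
    "<sx:sync id='0a7903db47fb0fff' updates='1'>"
    "<sx:history sequence='1' by='kzu'/>"
    "</sx:sync></item>"
)
record_update(item, by="vga")
history = [(h.get("sequence"), h.get("by")) for h in item.iter(f"{{{SX}}}history")]
print(history)  # [('2', 'vga'), ('1', 'kzu')]
```

Note that only the header changes; the payload (the title) is whatever the editing application wrote.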
If v1 of the item was updated simultaneously by two users and you try to merge them, you'd end up with something like the following:
<item>
  <title>Buy groceries</title>
  <sx:sync id='0a7903db47fb0fff' updates='2'>
    <sx:history sequence='2' by='kzu'/>
    <sx:history sequence='1' by='kzu'/>
    <sx:conflicts>
      <item>
        <title>Buy icecream</title>
        <customer id='1' />
        <sx:sync id='0a7903db47fb0fff' updates='2'>
          <sx:history sequence='2' by='vga'/>
          <sx:history sequence='1' by='kzu'/>
        </sx:sync>
      </item>
    </sx:conflicts>
  </sx:sync>
</item>
Notice the added sx:conflicts element, which contains the conflicting item in its entirety. The algorithm provides a consistent selection of a default "winner" from the merge (in this case my v2 item, rather than Victor's). This gives you a consistent state across the mesh. But note that the merge operation with conflict does NOT resolve the conflict. It just surfaces it in a predictable way.
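The detection and winner selection can be sketched in a few lines of Python. This is a simplified illustration, not the normative FeedSync algorithm. It assumes an item "already knows" another item's latest change when its history contains an entry with the same `by` and an equal or higher `sequence`. It breaks ties between concurrent versions by comparing (`updates`, topmost `by`), an ordering chosen here only for illustration. Under that assumed ordering Victor's version wins, unlike in the example above. What matters is that every replica picks the same winner regardless of merge direction.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class History:
    sequence: int
    by: str


@dataclass(frozen=True)
class Item:
    title: str
    updates: int
    history: Tuple[History, ...]          # newest entry first
    conflicts: Tuple["Item", ...] = ()


def subsumes(a: Item, b: Item) -> bool:
    """Simplified check: does a's history already include b's latest change?"""
    top = b.history[0]
    return any(h.by == top.by and h.sequence >= top.sequence for h in a.history)


def order_key(item: Item) -> Tuple[int, str]:
    # ASSUMED tie-break for illustration only; the spec defines its own ordering.
    return (item.updates, item.history[0].by)


def merge(a: Item, b: Item) -> Tuple[Item, bool]:
    """Return (merged item, conflict flag). The conflict is surfaced, not resolved."""
    if subsumes(a, b):
        return a, False
    if subsumes(b, a):
        return b, False
    # Concurrent edits: pick the same winner whatever the argument order.
    winner, loser = (a, b) if order_key(a) >= order_key(b) else (b, a)
    conflicts = winner.conflicts + loser.conflicts + (replace(loser, conflicts=()),)
    return replace(winner, conflicts=conflicts), True


v1 = (History(1, "kzu"),)
mine = Item("Buy groceries", 2, (History(2, "kzu"),) + v1)
theirs = Item("Buy icecream", 2, (History(2, "vga"),) + v1)

m1, conflict1 = merge(mine, theirs)
m2, conflict2 = merge(theirs, mine)
print(conflict1, m1 == m2, m1.title, [c.title for c in m1.conflicts])
# True True Buy icecream ['Buy groceries']
```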
Needless to say, the default winner might not be what the end user wants, so at that point the application can surface the conflict (e.g. with a different icon for the item) and allow him to resolve it in a different way (e.g. doing a full content diff and picking the new state). Alternatively, a specific implementation of an item store/adapter (e.g. the FeedSync-enabled store for files) can provide automatic resolution if it wishes to, such as auto-merging the file contents.
Pablo has more info on how it works and the way the Microsoft Sync Framework exposes this.
It seems really simple at first glance, doesn't it? The devil is in the details, as usual. It's obviously a RESTful approach to synchronization, where the item payload is the actual representation of the state. This means that upon conflict, all you have is the state. Implementing automatic conflict resolution can be quite tricky as a consequence.
Automatic Conflict Resolution?
As part of our Mesh4x implementation with InSTEDD, we're thinking about ways to increase the chances of automatic conflict resolution, as it's generally desirable and you don't want to be bothering your users for every apparently conflicting change. Think about a very simple example: tags. Say both user A and B add a tag at the same time, and then they synchronize:
Initial state of both A and B (say, created by yet another user, C):
<item>
  <title>Buy groceries</title>
  <sx:sync id='0a7903db47fb0fff' updates='1'>
    <sx:history sequence='1' by='C'/>
  </sx:sync>
</item>
User A changes item by adding a category "to-do" (again, notice the new sx:history):
<item>
  <title>Buy groceries</title>
  <category>to-do</category>
  <sx:sync id='0a7903db47fb0fff' updates='2'>
    <sx:history sequence='2' by='A'/>
    <sx:history sequence='1' by='C'/>
  </sx:sync>
</item>
Whereas user B changes it by adding a different category "supermarket":
<item>
  <title>Buy groceries</title>
  <category>supermarket</category>
  <sx:sync id='0a7903db47fb0fff' updates='2'>
    <sx:history sequence='2' by='B'/>
    <sx:history sequence='1' by='C'/>
  </sx:sync>
</item>
When they sync, the algorithm will detect a conflict, as we have the same sequence number but with different "by". It will pick a default winner (say, A's change). But clearly this is a case where we could apply auto-merge semantics if we knew the operation each user performed.
You might be tempted to say: just do a content merge and that's it! But the semantics of a merge of state cannot be applied generically. For example, a Company element gets an IsBankrupt='true' value from one user, and a new Invoice element from another: at the business logic level, the two changes cannot be merged, because if the company went bankrupt, you can no longer receive invoices from it. And it's quite tricky even with plain XML elements that can be moved around, renamed, deleted and re-inserted across multiple versions, etc.
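To see why a generic state merge is not enough, consider this Python sketch of a plain three-way, key-level merge applied to the Company example. This is a generic technique, not part of FeedSync. The field names, the sample data and the `violates_business_rules` check are all hypothetical. The merge finds no clash because each user touched a different key, yet the combined state breaks the business rule.

```python
def three_way_merge(base: dict, a: dict, b: dict) -> dict:
    """Generic key-level merge: keep whichever side changed a key; fail on true clashes."""
    merged = dict(base)
    for key in set(a) | set(b):
        va, vb, vbase = a.get(key), b.get(key), base.get(key)
        if va == vb:
            merged[key] = va
        elif va == vbase:
            merged[key] = vb
        elif vb == vbase:
            merged[key] = va
        else:
            raise ValueError(f"both sides changed {key!r}")
    return merged


def violates_business_rules(base: dict, merged: dict) -> bool:
    """Hypothetical domain rule: a bankrupt company cannot receive new invoices."""
    new_invoices = set(merged["invoices"]) - set(base["invoices"])
    return merged["is_bankrupt"] and bool(new_invoices)


base = {"name": "Contoso", "is_bankrupt": False, "invoices": ()}
user_a = {**base, "is_bankrupt": True}       # one user flags the bankruptcy
user_b = {**base, "invoices": ("INV-1",)}     # the other adds an invoice

merged = three_way_merge(base, user_a, user_b)
print(merged)                                 # no textual conflict at all...
print(violates_business_rules(base, merged))  # ...but the result is invalid: True
```

Only knowledge of the domain (here, the rule function) can tell these two changes apart from a genuinely compatible pair.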
So one approach we're considering is to add command-pattern-like hints that an adapter/store can use for automatic conflict resolution and for reconstructing state when merging. In the case above, the XML adapter might express user A's operation as a combination of the item payload, the sync versioning header, and diff information about what was changed (an element added immediately after the first child element of the root, in terms of XPath):
<item>
  <title>Buy groceries</title>
  <category>to-do</category>
  <sx:sync id='0a7903db47fb0fff' updates='2'>
    <sx:history sequence='2' by='A'/>
    <sx:history sequence='1' by='C'/>
  </sx:sync>
  <diff xmlns=".../xmldiff">
    <add match="/*[1]">
      <category>to-do</category>
    </add>
  </diff>
</item>
The new diff element would be very similar in spirit to the XML Diff and Patch tool from Microsoft and would make it easier for the XML adapter to determine whether a given change is compatible with auto-merge (e.g. it's not a change to a node that was removed, etc.).
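Here's a rough Python sketch of how an adapter could use such hints. It assumes a much simpler hint format than the diff element above: each hint is a hypothetical ("add", parent path, new child XML) tuple, the parent path is an ElementTree path relative to the item root, and the sync header is left out of the payload. If every hint from both sides can be replayed on the common ancestor, the merge is automatic. If a hint targets a node that no longer exists, or names an operation the adapter has no rule for, the adapter gives up and the FeedSync conflict is surfaced as usual.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical, simplified hint: (operation, parent path, new child XML).
# Only "add" is modelled; paths are ElementTree paths relative to the item root.
Hint = Tuple[str, str, str]


def try_auto_merge(base_xml: str, hints_a: List[Hint], hints_b: List[Hint]) -> str:
    """Replay both sides' hints on the common ancestor, or raise so the conflict is surfaced."""
    root = ET.fromstring(base_xml)
    for op, parent_path, child_xml in hints_a + hints_b:
        if op != "add":
            raise ValueError(f"no auto-merge rule for {op!r}")
        parent = root if parent_path == "." else root.find(parent_path)
        if parent is None:  # the target node was removed or moved: not safe
            raise ValueError(f"target {parent_path!r} no longer exists")
        parent.append(ET.fromstring(child_xml))
    return ET.tostring(root, encoding="unicode")


base = "<item><title>Buy groceries</title></item>"          # common ancestor (v1, by C)
from_a = [("add", ".", "<category>to-do</category>")]        # user A's hint
from_b = [("add", ".", "<category>supermarket</category>")]  # user B's hint
merged = try_auto_merge(base, from_a, from_b)
print(merged)
# <item><title>Buy groceries</title><category>to-do</category><category>supermarket</category></item>
```

Replaying the tags example yields both categories. A real implementation would also need to replay the two sides in a deterministic order (for example, sorted by each change's `by`) so that every replica produces the same document.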
I think different adapters might benefit from surfacing different conflict resolution hints. A database may include schema manipulation statements to perform upgrades, for example.
This is exploratory territory as we try to provide the best possible user experience. We don't want to end up with the Groove approach, where your choices are: save your changes as a different item, or discard your changes :S.
What do you think?
You have probably read or listened to all the (maybe a bit vague) information about the Mesh Operating Environment (MOE): a platform that will allow multiple applications and devices to participate in the Live Mesh.
Let's get to the more technical details now, and leave the end-user marketing to someone else ;).
At the core of Live Mesh is FeedSync, a public spec by Microsoft that evolved from the "old" days of SSE. IMO it didn't get the attention it deserved back then, even though you could put together the pieces from Ray's announcement and its later adoption by other products and figure out that Live Mesh was coming sooner or later. Ed Jezierski got me interested in it WAY back, and as a result, I created the first open source implementation of the spec.
To understand its importance you need to first understand what FeedSync is. The most succinct definition I could come up with is:
FeedSync is a definition of a versioning header (or metadata) and its associated creation, update and deletion behavior as well as a generic conflict detection algorithm based on it, which can be attached to an arbitrary piece of data.
The interesting part of it, as you are probably realizing already, is NOT the XML microformat itself, or its representation in an RSS/Atom feed, but rather the behavior and semantics associated with it, which you can rely on to be present at compliant endpoints.
The specification defines the behavior for merging data (arguably the most important behavior) between two "feeds" (think of these as "streams of data with associated versioning header/metadata") in a consistent manner. It's two-way sync for the masses.
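At the feed level, the merge reduces to walking the union of item ids. The following Python sketch assumes items are keyed by their sx:sync id and takes the per-item merge as a parameter. In the usage below, items are hypothetical (updates, title) tuples and a deliberately naive stand-in (higher updates wins) is passed in, where a real endpoint would use the conflict-aware item merge. The sketch also ignores deletions, which FeedSync represents as tombstoned items rather than missing ones.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def merge_feeds(local: Dict[str, T], remote: Dict[str, T],
                merge_item: Callable[[T, T], T]) -> Dict[str, T]:
    """Two-way merge keyed by sx:sync id; both endpoints then store the result."""
    merged: Dict[str, T] = {}
    for sync_id in sorted(set(local) | set(remote)):
        if sync_id not in remote:
            merged[sync_id] = local[sync_id]     # only the local feed has it
        elif sync_id not in local:
            merged[sync_id] = remote[sync_id]    # only the remote feed has it
        else:
            merged[sync_id] = merge_item(local[sync_id], remote[sync_id])
    return merged


def pick_newer(x, y):
    """Naive stand-in for the real per-item merge: higher `updates` wins."""
    return x if x[0] >= y[0] else y


local = {"a1": (2, "Buy groceries (edited)"), "b2": (1, "Call Victor")}
remote = {"a1": (1, "Buy groceries"), "c3": (1, "Fix KML adapter")}
result = merge_feeds(local, remote, pick_newer)
print(result)
# {'a1': (2, 'Buy groceries (edited)'), 'b2': (1, 'Call Victor'), 'c3': (1, 'Fix KML adapter')}
```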
Of course, specs without implementations are seldom of any use. Microsoft is providing their own with the Microsoft Synchronization Framework (which does other things in addition to supporting FeedSync), and now Live Mesh. But nothing forbids the community from building compliant cross-platform open source alternative implementations.
Being an open source fan myself, and being lucky enough to work for an organization whose goal is to produce open source software to help in the humanitarian space, I'm glad to announce that we have already been working actively in this space, and we just released early versions of our ongoing projects at Google Code.
Mesh for X
Or Mesh4x for "short", is an umbrella project by InSTEDD under which we're producing Mesh4j and Mesh4n (Java and .NET versions, respectively) of a unified implementation and library design/API for FeedSync synchronization between arbitrary repositories of data (e.g. databases, files, spreadsheets, etc.). We may very well come up with Mesh4py (Python), Mesh4r (Ruby) or whatever we or the community need to satisfy data synchronization scenarios.
The Mesh4n version is coming straight from its old home on CodePlex, but you can expect it to evolve together with the Mesh4j version. Our goal is to keep feature parity between the two. This version of FeedSync for .NET is being used by Microsoft Humanitarian Systems (MHS) in Afghanistan to synchronize disparate data sources in extreme and low-connectivity conditions, as mentioned by Ted Okada (i.e. to achieve two-way synchronization of disconnected Access databases, Excel files and plain RSS files on a pendrive, between any of them!).
At this time, both libraries provide the basic synchronization algorithm and Sync (metadata) data model, as well as the core interfaces to create your own repository adapters to expose data from arbitrary sources for FeedSync synchronization. We'll be building concrete repository adapters during the next few weeks. The most interesting one for me is the Mobile repository, which should allow you to sync two repositories over cellphones, without a cellphone internet connection or even a data plan (which, needless to say, is not very common in the underdeveloped or even the developing world).
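To illustrate what a repository adapter boils down to, here is a Python sketch of a hypothetical adapter contract. It is not the Mesh4j/Mesh4n API: in this sketch a store only has to list its items, keyed by sync id, and save merged ones back. `RepositoryAdapter`, `InMemoryAdapter` and `synchronize` are names made up for this example, and the per-item merge is again passed in as a parameter.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict


class RepositoryAdapter(ABC):
    """Hypothetical adapter contract (not the Mesh4x API): items keyed by sync id."""

    @abstractmethod
    def get_all(self) -> Dict[str, dict]:
        """Return every item, including its sync metadata."""

    @abstractmethod
    def save(self, sync_id: str, item: dict) -> None:
        """Insert or replace an item in the underlying store."""


class InMemoryAdapter(RepositoryAdapter):
    """Toy store; a database or spreadsheet adapter would implement the same two methods."""

    def __init__(self) -> None:
        self._items: Dict[str, dict] = {}

    def get_all(self) -> Dict[str, dict]:
        return dict(self._items)

    def save(self, sync_id: str, item: dict) -> None:
        self._items[sync_id] = item


def synchronize(a: RepositoryAdapter, b: RepositoryAdapter,
                merge_item: Callable[[dict, dict], dict]) -> None:
    """Two-way sync: afterwards both repositories hold the same items."""
    items_a, items_b = a.get_all(), b.get_all()
    for sync_id in set(items_a) | set(items_b):
        if sync_id in items_a and sync_id in items_b:
            merged = merge_item(items_a[sync_id], items_b[sync_id])
        else:
            merged = items_a[sync_id] if sync_id in items_a else items_b[sync_id]
        a.save(sync_id, merged)
        b.save(sync_id, merged)


db, sheet = InMemoryAdapter(), InMemoryAdapter()
db.save("a1", {"title": "Buy groceries", "updates": 1})
sheet.save("b2", {"title": "Call Victor", "updates": 1})
synchronize(db, sheet, lambda x, y: x if x["updates"] >= y["updates"] else y)
print(db.get_all() == sheet.get_all())  # True
```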
We'll also be looking for refactoring opportunities to simplify certain scenarios and accommodate the future roadmap for FeedSync.
Some people have argued this is just another consumer-oriented Microsoft thingy that will hardly make a difference in anything. I believe that's a mistake, as it ignores the underlying technology, the fact that it's a public specification, and that it can be applied to really free the data once and for all, regardless of platform, application, format, language, etc. This is the very reason I'm not excited at all by all the Atom Publishing Protocol fuss that's happening these days. Web 2.0/AJAX + APP is SO client-server... but that's for another post, another day.
Stay tuned, as I'll be posting more detailed (and practical) information about Mesh4x in the coming weeks, as well as the cool stuff we're doing with InSTEDD as part of its goal of helping in the humanitarian space by applying technology.