Shared data architectures

From: Rowan 8 Sep 2012 14:36
To: ALL 1 of 20
Hello, again, everyone.

I have a hypothetical situation where a number of desktop clients (let's say 100s) all on a local network want to display and modify a shared set of data (let's say it's small, like 1MB total). Basic requirements:
  • Clients pick up changes to the state in near real time

  • Any client can modify the state, causing all others to update

  • Conflicts arising from multiple incompatible updates to the same data at the same time are resolved in such a way that the user can understand (and, if necessary, amend) them

  • Clients can easily join and leave the group at run time

  • Low enough traffic to not flood the network

  • Clients can't rely on external services being present on the machine - they need to be able to launch them if they need them
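
One way the conflict requirement could be met is optimistic versioning: every entry carries a version number, and an update is only accepted if it was based on the latest version. This is a minimal sketch, not a recommendation. The names `SharedState`, `Entry` and `Conflict`, and the key `"room-12/10:00"`, are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    value: object
    version: int = 0


@dataclass
class Conflict:
    """Returned when an update was based on a stale version."""
    key: str
    attempted: object      # what this client tried to write
    current: object        # what another client wrote first
    current_version: int


@dataclass
class SharedState:
    entries: dict = field(default_factory=dict)

    def update(self, key, new_value, base_version):
        """Apply the update only if the caller saw the latest version."""
        entry = self.entries.setdefault(key, Entry(value=None))
        if base_version != entry.version:
            # Hand both values back so the user can see what happened.
            return Conflict(key, new_value, entry.value, entry.version)
        entry.value = new_value
        entry.version += 1
        return entry.version


state = SharedState()
v1 = state.update("room-12/10:00", "Alice", base_version=0)   # accepted
clash = state.update("room-12/10:00", "Bob", base_version=0)  # stale, rejected
```

The rejected update comes back with both the attempted and the current value, which is what a UI would need to explain the clash and let the user amend it.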



I think this must be a pretty common scenario, so I wondered what people's thoughts were on possible solutions to it?

Having one master and several slaves is probably one; when an update is made on a client, a command message is sent to the server, which then relays it on to all the clients. How would people implement that? Some kind of pub/sub setup using AMQP? Roll your own? Something else?
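
To make the relay idea concrete, here is a rough in-memory sketch with no networking at all. It assumes commands are simple `(op, key, value)` tuples and that the master also keeps a copy of the state so it can hand a snapshot to late joiners; `Master`, `Client` and `apply_command` are hypothetical names, not from any library.

```python
def apply_command(state, command):
    """Apply one command tuple to a plain dict."""
    op, key, value = command
    if op == "set":
        state[key] = value
    elif op == "delete":
        state.pop(key, None)


class Master:
    def __init__(self):
        self.state = {}
        self.clients = []

    def join(self, client):
        # A newcomer starts from a snapshot, then follows the relay.
        client.state = dict(self.state)
        self.clients.append(client)

    def leave(self, client):
        self.clients.remove(client)

    def submit(self, command):
        # Commands are applied and relayed one at a time, so every
        # client sees them in the same order.
        apply_command(self.state, command)
        for client in list(self.clients):
            client.receive(command)


class Client:
    def __init__(self, master):
        self.state = {}
        self.master = master
        master.join(self)

    def send(self, op, key, value=None):
        # Local edits are not applied directly; they go via the master.
        self.master.submit((op, key, value))

    def receive(self, command):
        apply_command(self.state, command)


master = Master()
a, b = Client(master), Client(master)
a.send("set", "title", "draft")
late = Client(master)            # joins after the first update
b.send("set", "title", "final")
master.leave(a)
late.send("delete", "title")     # a has left, so it never sees this
```

In a real deployment the method calls would be messages over a socket or a broker, and the single master is the obvious point of failure.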

Master/slave seems a bit passé these days, though. Are there any distributed solutions? I briefly wondered if a distributed database like Riak or Cassandra could work, but joining / leaving the group seems like a faff, and they don't seem to have full replication in mind (as they're set up to deal with huge data sets). Something like a gossip protocol feels like a good way to pass around the updates, though.
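
As a sketch of what gossip dissemination might look like, here is a toy push-pull version: each round, every node picks a random peer and the two merge their stores, keeping the entry with the higher version. Everything here (`Node`, the `(version, origin, value)` layout, swapping the whole store each time) is an assumption made for illustration; I'd expect a real system to exchange digests or deltas rather than full state.

```python
import random


class Node:
    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (version, origin, value)

    def write(self, key, value):
        version = self.store.get(key, (0, "", None))[0] + 1
        self.store[key] = (version, self.name, value)

    def merge(self, other_store):
        # Keep whichever entry has the higher (version, origin) pair;
        # the origin name only breaks ties between equal versions.
        for key, entry in other_store.items():
            mine = self.store.get(key)
            if mine is None or entry[:2] > mine[:2]:
                self.store[key] = entry


def gossip_round(nodes, rng):
    for node in nodes:
        peer = rng.choice([n for n in nodes if n is not node])
        # Push-pull: both sides end the exchange with the merged view.
        peer.merge(node.store)
        node.merge(peer.store)


def gossip_until_converged(nodes, rng, max_rounds=200):
    """Return the number of rounds taken, or None if the cap was hit."""
    for rounds in range(1, max_rounds + 1):
        gossip_round(nodes, rng)
        if all(n.store == nodes[0].store for n in nodes):
            return rounds
    return None


nodes = [Node(f"n{i}") for i in range(8)]
nodes[0].write("room-12", "booked")
nodes[5].write("room-7", "free")
rounds = gossip_until_converged(nodes, random.Random(1))
```

Joining is just a matter of learning one peer's address, and a node that leaves simply stops being picked, which is part of the appeal.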

This seems like it should be a well-solved problem, and there should be a tool (or set of tools that can be plugged together) to achieve all the grunt work, but I'm not sure what it is.

Anyone have any thoughts?
From: Peter (BOUGHTONP) 8 Sep 2012 14:46
To: Rowan 2 of 20
*waves*

( http://incubator.apache.org/wave/ )
From: Rowan 8 Sep 2012 14:57
To: Peter (BOUGHTONP) 3 of 20
*particles*
From: CHYRON (DSMITHHFX) 8 Sep 2012 15:17
To: Rowan 4 of 20
This sounds sort of like git or some other such revision/version control something something.
EDITED: 8 Sep 2012 15:18 by DSMITHHFX
From: Dan (HERMAND) 8 Sep 2012 15:58
To: CHYRON (DSMITHHFX) 5 of 20

That's probably not a bad idea.

Rowan, can you give us some real world context?

From: Rowan 8 Sep 2012 16:10
To: CHYRON (DSMITHHFX) 6 of 20
Hmm. I don't think DVCS stuff is a good fit, really:
  1. There's no requirement for version control (beyond whatever's necessary for conflict resolution, which I imagine is minimal).

  2. They're not really set up to disseminate changes to a bunch of peers, AFAIK

  3. The ideal solution would keep it all in memory for rapid access

  4. It'd also be easy to integrate into an application (of course it'd be possible to do that with Git, etc, but it's not what the designers had in mind)

From: Rowan 8 Sep 2012 16:13
To: Dan (HERMAND) 7 of 20

Can't really give you context, no, as it's just a hypothetical.

How about as the basis for a system for booking meeting rooms in a large office building? I realise there are solutions for this already, but just as an example. Lots of users need to read and manipulate the data at the same time, and changes need to propagate out to all the users quickly.

From: Lucy (X3N0PH0N) 8 Sep 2012 17:37
To: Rowan 8 of 20
Sounds like what Wave was doing (being able to make multi-user apps and handling concurrency and conflict elegantly), in part. Which isn't much use to you, I realise.
From: Lucy (X3N0PH0N) 8 Sep 2012 17:46
To: Lucy (X3N0PH0N) 9 of 20
Following that train of Google-thought: http://codoxware.com/downloads/cesdk ?
From: Lucy (X3N0PH0N) 8 Sep 2012 17:46
To: Lucy (X3N0PH0N) 10 of 20
Or for web apps: http://sharejs.org/ ?
From: Rowan 8 Sep 2012 18:09
To: Lucy (X3N0PH0N) 11 of 20

Yeah, I guess... But, saying it's "like Wave" makes it sound cutting edge, or something. I don't think it's as exotic as all that, really. I mean, you could kinda say the same about IRC (which just has a conflict resolution system of "meh").

I'm sure there are tonnes of applications that do this sort of thing, and I guess most of them roll their own master/slave thing.

From: Lucy (X3N0PH0N) 8 Sep 2012 18:17
To: Rowan 12 of 20
I dunno, I don't think many places have got to the point where they're collaborating in anything like realtime over the/a net.
From: Lucy (X3N0PH0N) 8 Sep 2012 18:19
To: Rowan 13 of 20
But no, what I meant about Wave is that it was a framework for making apps and the framework has conflict/concurrency stuff built in.

Googling for alternatives led me to: http://en.wikipedia.org/wiki/Operational_transformation
From: Rowan 8 Sep 2012 23:00
To: Lucy (X3N0PH0N) 14 of 20

Sure they have! Obvious examples are IM clients, but that sort of stuff is everywhere now (e.g. presence stuff in Steam). Even Explorer will magically update itself if you're viewing a remote folder whose content changes.

Ta for the OT link - already had it open in a browser for later reading.

From: Peter (BOUGHTONP) 8 Sep 2012 23:58
To: Rowan 15 of 20
IM clients aren't really collaboration, are they?


Why is your situation hypothetical?
From: Rowan 9 Sep 2012 08:47
To: Peter (BOUGHTONP) 16 of 20

I've not used the word collaboration (as I suspect it has some specific technical meaning). IM clients do have shared/synchronised state, though - the presence data is a better example than the actual chat, as it's not append-only. Consider a user logged in to the same account on multiple clients and they change their name on one; all the clients can change that state, and as soon as one does, the change propagates to the others.

I don't understand how to answer your second question. I have no actual need to solve this problem, I'm merely interested.

From: Rowan 9 Sep 2012 08:58
To: Lucy (X3N0PH0N) 17 of 20

Okay, so that operational transformation thing looks like it's just a jargon-heavy way of consistently serialising the kind of commands I suggested in my first post.

I guess that's just the way everyone tackles this sort of thing, 'cos it's simpler, and there's no real reason not to at this scale. Ho hum.
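
For reference, the part of OT that goes beyond putting commands in order is the transform step: an operation made against an older copy of the data is adjusted before being applied on top of a concurrent one. A minimal sketch, assuming the only operation is inserting text into a string; `Insert`, `apply`, `transform` and the `site` tie-breaker are illustrative names rather than any particular library's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: int  # hypothetical client id, used only to break ties


def apply(doc, op):
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op, against):
    """Rewrite `op` so it can be applied after `against` has been applied."""
    shifts = against.pos < op.pos or (
        against.pos == op.pos and against.site < op.site
    )
    if shifts:
        # `against` added text at or before our position, so move right.
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


doc = "room free"
a = Insert(0, "big ", site=1)   # client 1 edits the start
b = Insert(5, "is ", site=2)    # client 2 edits the middle, concurrently

# Each client applies its own op first, then the other's transformed op.
at_client_1 = apply(apply(doc, a), transform(b, a))
at_client_2 = apply(apply(doc, b), transform(a, b))
```

Both clients end up with the same string even though they applied the two edits in opposite orders. Handling deletes and longer histories is where it gets fiddly.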

From: CHYRON (DSMITHHFX) 9 Sep 2012 12:26
To: Rowan 18 of 20
I was thinking about this last night, and two probably irrelevant buzzwords popped into my head: cloud, and cluster.
From: DSLPete (THE_TGG) 13 Sep 2012 23:10
To: Rowan 19 of 20

Along the lines of AMQP, take a look at www.rabbitmq.com. It's a doddle to set up, lightning quick (we regularly run at ~80,000 messages/sec between Europe and the US over the public internet), will run in RAM or on disk and supports quite a few message distribution models, including allowing an ad-hoc number of consumers to connect and receive messages - so each client would just register its own queue update consumer when it was fired up, and when an update is made in a client it could be published out to all.

It's worth doing the 6 tutorials so you can see just what it can do - a solution may present itself to you.

From: Rowan 14 Sep 2012 08:20
To: DSLPete (THE_TGG) 20 of 20
Yeah, I've looked at RabbitMQ in the past (and I suspect it's much nicer than Apache's ActiveMQ, which is the only AMQP thingum I've used). It has a nice set of friendly client-side libraries, too, for various languages, so it'd probably be what I'd plump for if I went that route.

It only takes care of the messaging part, though; I feel like there should be something out there that builds a layer on top of this and automagically syncs data. Nothing quite seems to fit the bill, though.
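
To show the sort of layer I mean, here is a rough sketch of a dict-like object that syncs itself over whatever messaging sits underneath. `InMemoryBus` is a stand-in for a broker delivering every message to every subscriber (something like a fanout exchange); it and `SyncedDict` are invented for this example, and the sketch assumes the bus delivers messages to all replicas in the same order, which a real setup may not guarantee.

```python
import json


class InMemoryBus:
    """Stand-in for a broker: every subscriber gets every message."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, body):
        for callback in list(self.subscribers):
            callback(body)


class SyncedDict:
    """Dict-like view whose writes are broadcast as JSON messages."""

    def __init__(self, bus):
        self._data = {}
        self._bus = bus
        bus.subscribe(self._on_message)

    def __setitem__(self, key, value):
        # The write takes effect when the message comes back from the bus,
        # so every replica applies changes in the order the bus delivers.
        self._bus.publish(json.dumps({"op": "set", "key": key, "value": value}))

    def __delitem__(self, key):
        self._bus.publish(json.dumps({"op": "del", "key": key}))

    def __getitem__(self, key):
        return self._data[key]

    def __contains__(self, key):
        return key in self._data

    def _on_message(self, body):
        msg = json.loads(body)
        if msg["op"] == "set":
            self._data[msg["key"]] = msg["value"]
        elif msg["op"] == "del":
            self._data.pop(msg["key"], None)


bus = InMemoryBus()
alice, bob = SyncedDict(bus), SyncedDict(bus)
alice["room-12"] = "booked"
bob["room-7"] = "free"
del bob["room-12"]
```

This leaves out the hard parts (a snapshot for late joiners, conflict handling, reconnects), which is really the bit I was hoping something off the shelf would cover.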