• Require us to have multiple copies of data which need to keep synchronized.

  • Enterprise architect is full of platforms and frameworks which are distributed by nature.

  • Type of platform/framework and example:

    • Databases: Cassandra, HBase, Riak
    • Message Brokers: Kafka, Pulsar
    • Infrastructure: Kubernetes, Mesos, Zookeeper, etcd, Consul
    • In Memory Data/Compute Grids: Hazelcast, Pivotal Gemfire
    • Stateful Microservices: Akka Actors, Axon
    • File Systems: HDFS, Ceph
  • They run on multiple servers

  • they manage data

  • Several things can go wrong when data is stored on multiple servers.

  • Process crashes

  • the bottom line is that if the process is responsible for storing data, it should be designed to give durability guarantee for the data store on the servers.

  • Because flushing data to the disk is one of the most time-consuming operations, not every insert or update to the storage can be flushed to disk. So most databases have in-memory storage structures which are only periodically flushed to disk. This poses a risk of losing all the data if the process abruptly crashes.

  • WAL

network delays

In the TCP/IP protocol stack, there is no upper bound on delays caused in transmitting messages across a network. Problems:

  • a particular cannot wait to know if another server is crashed. Solution is Heart Beat Pattern
  • split brain problem. https://www.quora.com/What-is-split-brain-in-distributed-systems We must ensure that two set of servers which are disconnected from each other, should be able to make progress independently. To ensure this, every action the server takes, considered is successful if the majority of server s can confirm the action. If the servers cannot get the majority, they will not be able to provide required services, and some groups of clients might not be receiving the service, but servers in the cluster will always be in consistent state.
  • Quorum
  • inconsistency resolution: versioning
  • gossip protocol