Placeholder for Distributed Systems papers


#1

Here’s a place to put links to relevant papers and other references about how to
build distributed systems that actually work. My initial thought is that since debugging
of distributed systems is difficult, we should consider approaches involving formal
specification and verification of the protocols and the design (perhaps with a closely-related model being used for performance modelling, simulation, and performance projections
for different hardware configurations to optimize price-performance). So I’m starting out
trying to read about formal verification of distributed systems.

An Empirical Study on the Correctness of Formally Verified Distributed Systems

They look at some formally verified distributed systems, and find that the protocols
are indeed bug-free (in contrast to non-formally-verified systems) - but that there are
still bugs arising from interfaces between the verified code and other parts (e.g. the OS),
viz “assumptions about the unverified code, unverified libraries, resources implicitly
used by verified code, verification infrastructure, and specification”.

Verdi: A Framework for Implementing and Formally Verifying Distributed Systems

Automating Formal Proofs for Reactive Systems

Todd Montgomery on Protocol Design, Security, Formal Verification Tools

SPIN model checker

NuSMV symbolic model checker

http://nusmv.fbk.eu/

Revisiting Actor Programming in C++ (CAF = C++ Actor Framework)

Slides about typical datacenter network architecture (inter-rack bandwidth is often
4x lower than aggregate bandwidth of servers within a rack, and there may be
higher-level bottlenecks as well).

http://www.ieee802.org/3/ad_hoc/bwa/public/may11/kipp_01_0511.pdf

"Amazon Web Services has used TLA+ since 2011. TLA+ model checking uncovered bugs in DynamoDB, S3, EBS, and an internal distributed lock manager; some bugs required state traces of 35 steps. Model checking was also used to verify aggressive optimizations. In addition, TLA+ specifications were found to hold value as documentation and design aids.[4][23]

Microsoft Azure used TLA+ to design Cosmos DB, a globally-distributed database with five different consistency models."

http://www.avispa-project.org/

AVISPA stands for Automated Validation of Internet Security Protocols and Applications.
The AVISPA project aims at developing a push-button, industrial-strength technology for the analysis of large-scale Internet security-sensitive protocols and applications.