Chapter 8

Truth knowledge and Lies, after this, you should know which design does not work fundamentally

Single Node

  • Single computer system, result is deterministic, hardware problem - > blue screen, this is a good choice(deliberate), we prefer to a computer to crash completely rather than returning a wrong result

Scientific Computing: a giant compter

Commercial Data Center:

  • Cost efficiency matters, use cheap commodity hardwares

  • something is always broken

Unreliable network

  • request may have been lost

  • request may be waiting in a queue

  • the remote node may have failed

  • the remote node may have temporarily stopped

  • no absolute reliable fault detector

  • widely acceptable indicator is timeout, could not tell if it's node failure or network failure, retry few times when possible

  • a request could be called several times and results in disastrous outcomes. Idempotent. https://www.i3geek.com/archives/841 call 多次,返回相同结果

  • 如何让一个system 变得idempotent?设计API的时候

  • network partitions: 写到A, 读B

  • without testing, the network failure could break the fundamental assumptions

  • network partition naturally happens, we can only resolve it when it happens, we can not prevent it from happening. CAP theory -> CP or AP (consistency, availability and partition)

network delays are chosen by design

  • traditionally telephones lines are almost guaranteed no delays, focus on latency

  • network use patterns are burst mode, system design is focusing on the maximum throughput / resource utilization

  • similar thing happens with retail time OS

Unreliable Clock

  1. has this request time out yet?

  2. what's the 99th percentile response time of this service? https://engineering.linkedin.com/performance/who-moved-my-99th-percentile-latency

  3. how many queries per second did this service handle on average in the last five minutes?

  4. how long did the user spend on our site?

  5. when was this article published?

  6. at what date and time should the reminder email be sent?

  7. when does this cache entry expire?

  8. what is the timestamp on this error message in the log file?

Most commonly used mechanism is the Network Time Protocol (NTP) Individual NTP is not reliable.

Two type of clock

  1. time-of-day. It returns the current date and time according to some calendar. Java System.currentTimeMillis() and Linux: clock_gettime(CLOCK_REALTIME)

    1. highly unreliable, could jump forward, backward.

    2. in google spanner, time drift assumption is 30 secs per 24 hrs

  2. . monotonic clock is suitable for measuring a duration (time interval).

    1. clock_gettime(CLOCK_MONOTONIC) on linux and System.nanoTime() in Java

    2. duration could be measured relatively accurate

Why assumption based on clock is very dangerous

  • most of the components in distributed system provides accurate deterministic result of crash. Not clock

Caveat of last write win (Cassandra, can not used for bank transaction) / DynamoDB can write client config resolution

  • 后来的request的time clock比其他的慢,request会被覆盖

  • distributed id generator (UUID) -> very difficult, mac id + machine local time, different machine has different time clock

Process pause

  • Java virtual machine has a garbage collector which occasionally needs to stop all running threads. "Stop-the-world" GC pauses have sometimes been know to last for several minutes.

  • When the OS context-switches to another thread, or when the hypervisor switches to a different virtual machine, the current running thread can be paused at any arbitrary point in the code.

  • if the application performs synchronous disk access, a thread may be paused waiting for a slot disk I/O operation to complete, Java class loader lazy load first time.

  • Thrashing: swapping to disk (paging)

  • real time OS: every library call has a worst time guaranteed.

Redlock: fault tolerant distributed locks

how to do distributed locking: martin kleppmann

A real case in HBase (Redis)

ZooKeeper transaction ID

Truths

Quorum

Lies: Byzantine failures - unreliable node / network

Different models in distributed system

Last updated