Chapter 8
Knowledge, Truth, and Lies. After this chapter, you should know which designs fundamentally do not work.
Single Node
Single computer system: the result is deterministic. A hardware problem -> total failure (e.g. blue screen). This is a deliberate design choice: we prefer a computer to crash completely rather than return a wrong result.
Scientific computing: one giant computer
Commercial Data Center:
Cost efficiency matters, so use cheap commodity hardware
something is always broken
Unreliable network
request may have been lost
request may be waiting in a queue
the remote node may have failed
the remote node may have temporarily stopped
there is no absolutely reliable fault detector
the widely accepted indicator is a timeout. It cannot tell whether the cause is a node failure or a network failure. Retry a few times when possible
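with retries, a request could be executed several times and lead to disastrous outcomes. Fix: make it idempotent, i.e. calling it multiple times returns the same result. https://www.i3geek.com/archives/841
How do you make a system idempotent? Decide this when designing the API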
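One common way to make a retried call safe is an idempotency key chosen by the client. The sketch below is a toy simulation under that assumption: PaymentServer, charge and call_with_retry are hypothetical names, and the lost response is simulated rather than caused by a real network.

```python
import uuid


class PaymentServer:
    """Toy server that remembers the result for each idempotency key."""

    def __init__(self):
        self.balance = 100
        self.seen = {}                       # idempotency key -> result already returned

    def charge(self, key, amount):
        if key in self.seen:                 # duplicate request: replay the stored result
            return self.seen[key]
        self.balance -= amount
        self.seen[key] = self.balance
        return self.balance


def call_with_retry(server, amount, lost_responses=1, max_attempts=3):
    key = str(uuid.uuid4())                  # one key shared by all attempts of this request
    for attempt in range(max_attempts):
        result = server.charge(key, amount)  # the request reaches the server
        if attempt < lost_responses:
            continue                         # simulated: response lost, client times out, retries
        return result
    raise TimeoutError("gave up after retries")


server = PaymentServer()
final_balance = call_with_retry(server, 30)
print(final_balance, server.balance)
```

The server executed the request twice but charged only once, because the second attempt carried the same key.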
network partitions: write to A, read from B
without testing, a network failure could break the system's fundamental assumptions
network partitions happen naturally. We can only handle them when they happen; we cannot prevent them. CAP theorem -> CP or AP (consistency, availability and partition tolerance)
variable network delays are a consequence of design choices, not a law of nature
traditional telephone lines (circuit-switched) give an almost guaranteed bounded delay; the design focuses on latency
data network traffic is bursty, so the design (packet-switched) focuses on maximum throughput / resource utilization
a similar trade-off shows up with real-time OSes
Unreliable Clock
has this request timed out yet?
what's the 99th percentile response time of this service? https://engineering.linkedin.com/performance/who-moved-my-99th-percentile-latency
how many queries per second did this service handle on average in the last five minutes?
how long did the user spend on our site?
when was this article published?
at what date and time should the reminder email be sent?
when does this cache entry expire?
what is the timestamp on this error message in the log file?
The most commonly used synchronization mechanism is the Network Time Protocol (NTP). An individual NTP server is not reliable.
Two types of clocks
time-of-day clock: returns the current date and time according to some calendar. Java: System.currentTimeMillis(); Linux: clock_gettime(CLOCK_REALTIME)
highly unreliable for measuring elapsed time: it can jump forward or backward.
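per the Google Spanner paper, the assumed clock drift is 200 ppm: about 6 ms for a clock resynchronized every 30 secs, or about 17 secs for a clock resynchronized once every 24 hrs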
monotonic clock: suitable for measuring a duration (time interval).
clock_gettime(CLOCK_MONOTONIC) on Linux and System.nanoTime() in Java
durations can be measured relatively accurately
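A minimal sketch of measuring a duration with a monotonic clock. It uses Python's time.monotonic(), which plays the same role as the Java and Linux calls named above; the helper name timed is hypothetical and only for illustration.

```python
import time


def timed(fn):
    """Run fn and return (result, elapsed seconds) using the monotonic clock."""
    start = time.monotonic()        # never jumps backward, unlike time.time()
    result = fn()
    elapsed = time.monotonic() - start
    return result, elapsed


result, elapsed = timed(lambda: sum(range(100_000)))
print(f"took {elapsed:.6f} s")
```

The absolute value of a monotonic clock is meaningless; only the difference between two readings on the same machine is useful.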
Why assumptions based on clocks are very dangerous
most components in a distributed system either give a correct result or fail visibly (crash). A clock does not: a wrong clock keeps running and fails silently
Caveat of last write wins (LWW): used by Cassandra, cannot be used for bank transactions. Dynamo-style stores can instead leave conflict resolution to the client
if a later request comes from a node whose clock is behind the others, that request gets overwritten (silently dropped)
distributed ID generator (UUID) -> very difficult. MAC address + local machine time, but different machines have different clocks
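A toy illustration of the lost write, as a sketch only. The 5 ms skew, the timestamps and the lww_merge helper are made-up assumptions, not taken from any real database.

```python
def lww_merge(current, incoming):
    """Last write wins: keep the (timestamp_ms, value) pair with the larger timestamp."""
    return incoming if incoming[0] > current[0] else current


# Assumption: node A's clock is accurate, node B's clock runs 5 ms behind.
skew_b_ms = -5

write_a = (1000, "x = 1")                # real time 1000 ms, stamped by node A
write_b = (1002 + skew_b_ms, "x = 2")    # real time 1002 ms, but stamped 997 by node B

stored = write_a
stored = lww_merge(stored, write_b)      # the causally later write loses
print(stored)
```

The write from node B happened later in real time, yet it is discarded without any error being reported to the client.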
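As one concrete example of this scheme, Python's standard uuid.uuid1() builds an ID from a host identifier (usually the MAC address) plus the local clock. The sketch below only inspects those fields; it does not solve the ordering problem.

```python
import uuid

u = uuid.uuid1()        # version 1 UUID: host ID + timestamp + clock sequence
node = u.node           # 48-bit host identifier, usually the MAC address
timestamp = u.time      # 100 ns intervals since 1582-10-15, read from the local clock

print(u, hex(node), timestamp)
```

The IDs are unique across machines, but because each timestamp comes from a different local clock, sorting IDs from different machines does not give the real order of events.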
Process pause
The Java virtual machine has a garbage collector which occasionally needs to stop all running threads. "Stop-the-world" GC pauses have sometimes been known to last for several minutes.
When the OS context-switches to another thread, or when the hypervisor switches to a different virtual machine, the current running thread can be paused at any arbitrary point in the code.
if the application performs synchronous disk access, a thread may be paused waiting for a slow disk I/O operation to complete. This can happen unexpectedly, e.g. the Java class loader lazily loads a class the first time it is used.
Thrashing: swapping to disk (paging)
real-time OS: every library call has a guaranteed worst-case execution time.
Redlock: a fault-tolerant distributed lock algorithm for Redis
How to do distributed locking (Martin Kleppmann's critique of Redlock)
A real case in HBase (Redis)
ZooKeeper transaction ID (zxid): can be used as a fencing token
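A minimal sketch of how a monotonically increasing number such as the ZooKeeper transaction ID can be used as a fencing token. The FencedStorage class and the token values 33 and 34 are hypothetical; a real storage service would have to perform this check itself.

```python
class FencedStorage:
    """Toy storage service that rejects writes carrying an older token."""

    def __init__(self):
        self.highest_token = 0
        self.data = None

    def write(self, token, value):
        if token < self.highest_token:
            return False               # stale lock holder: reject the write
        self.highest_token = token
        self.data = value
        return True


storage = FencedStorage()
# Client 1 acquires the lock with token 33, then stalls in a long GC pause.
# Its lease expires; client 2 acquires the lock with token 34 and writes.
ok_client2 = storage.write(34, "from client 2")
# Client 1 wakes up, still believing it holds the lock, and tries to write.
ok_client1 = storage.write(33, "from client 1")
print(ok_client2, ok_client1, storage.data)
```

The point of the design is that the paused client is not trusted to notice its own expired lease; the resource being protected checks the token.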
Truths
Quorum
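A small sketch of the arithmetic behind quorums, under the usual definitions: a majority quorum needs more than half of the n nodes, and read/write quorums of sizes r and w are guaranteed to overlap when w + r > n. The function names are hypothetical.

```python
def is_majority(votes, n):
    """True if votes is more than half of n nodes."""
    return votes > n // 2


def quorums_overlap(n, w, r):
    """True if every write quorum of size w shares a node with every read quorum of size r."""
    return w + r > n


print(is_majority(3, 5))         # 3 of 5 is a majority
print(is_majority(2, 4))         # 2 of 4 is a tie, not a majority
print(quorums_overlap(3, 2, 2))  # n=3, w=2, r=2 always overlap
print(quorums_overlap(3, 1, 1))  # n=3, w=1, r=1 may miss each other
```

Because two majorities of the same cluster must share at least one node, there cannot be two conflicting majority decisions at the same time.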
Lies: Byzantine failures - unreliable node / network
Different system models in distributed systems