程式人蔘: Lessons from Erlang

Tuesday, December 25, 2018

Lessons from Erlang

(1) Remote procedure call is not a good idea
What seems like a good, simple idea on the surface -- hiding networks and messages behind a more familiar application-development idiom -- often causes far more harm than good.

Request-response is a network-level message exchange pattern, whereas RPC is an application-level abstraction intended to hide the network.

Equating RPC with synchronous messaging means the latter wrongly suffers the same criticisms as RPC.

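RPC has two main problems:
(1) It offers no reasonable abstraction for handling network failure.
(2) It keeps engineers from noticing the cost of calling a procedure over the network, which makes it easy to design systems with excessive response times.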

Erlang does not provide remote procedure call as a language primitive. It provides only primitives for message passing and failure handling, namely send/receive and link.
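To make the contrast concrete, here is a minimal sketch in Python rather than Erlang. It builds request-response out of plain message sends, so a missing reply shows up as an ordinary value instead of hiding behind a function call. The names `Mailbox`, `echo_server` and `request` are hypothetical, threads stand in for Erlang processes, and nothing here models `link`.

```python
import queue
import threading


class Mailbox:
    """A tiny stand-in for a process mailbox (hypothetical helper)."""

    def __init__(self):
        self._q = queue.Queue()

    def send(self, msg):
        # Asynchronous: the sender never blocks waiting for the receiver.
        self._q.put(msg)

    def receive(self, timeout):
        # The caller must pick a timeout, so a lost peer is an explicit case.
        try:
            return self._q.get(timeout=timeout)
        except queue.Empty:
            return ("timeout",)


def echo_server(inbox):
    # Answers each request by sending a message to the reply mailbox.
    while True:
        msg = inbox.receive(timeout=1.0)
        if msg[0] in ("stop", "timeout"):
            return
        _tag, reply_to, payload = msg
        reply_to.send(("reply", payload.upper()))


def request(server_inbox, payload, timeout):
    # Request-response built from two sends; failure comes back as a value.
    me = Mailbox()
    server_inbox.send(("request", me, payload))
    return me.receive(timeout)


server_inbox = Mailbox()
worker = threading.Thread(target=echo_server, args=(server_inbox,), daemon=True)
worker.start()

ok = request(server_inbox, "ping", timeout=1.0)
server_inbox.send(("stop",))
worker.join()

# Nobody reads this mailbox any more, so the caller gets a timeout
# instead of hanging inside what looks like a local function call.
dead = request(server_inbox, "ping", timeout=0.1)
```

The point of the design is that the caller is forced to decide what a missing reply means, which is exactly the decision an RPC-style call tends to hide.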

(2) What scalable, fault-tolerant, and hot-upgradable systems have in common
For a system to satisfy all three properties at once, it really needs to implement the following primitives:
(a) detect failure
(b) move state from one node to another

When designing a system for fail-over, scalability, and dynamic code upgrade, we have to think about the following:

What information do I need to recover from a failure?
How can we replicate the information we need to recover from a failure?
How can we mask failures/code_upgrades/scaling operations from the clients?
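The two primitives and the three questions above can be sketched together in a toy model. It is written in Python with logical time instead of a real network, and `Node`, `Cluster` and the heartbeat timeout are all assumptions made for illustration, not any real cluster API.

```python
import copy


class Node:
    """Hypothetical node holding client state."""

    def __init__(self, name):
        self.name = name
        self.state = {}
        self.last_heartbeat = 0.0
        self.alive = True

    def heartbeat(self, now):
        # A dead node simply stops reporting.
        if self.alive:
            self.last_heartbeat = now


class Cluster:
    def __init__(self, primary, standby, timeout):
        self.primary = primary
        self.standby = standby
        self.timeout = timeout

    def write(self, key, value):
        # Clients talk to the cluster, never to a node: this masks fail-over.
        self.primary.state[key] = value
        # Replicate what is needed for recovery before acknowledging.
        self.standby.state = copy.deepcopy(self.primary.state)

    def read(self, key):
        return self.primary.state.get(key)

    def check(self, now):
        # (a) detect failure: the primary missed its heartbeat deadline.
        if now - self.primary.last_heartbeat > self.timeout:
            # (b) the state already lives on the standby, so promote it.
            self.primary, self.standby = self.standby, self.primary
            return True
        return False


a, b = Node("a"), Node("b")
cluster = Cluster(a, b, timeout=3.0)
cluster.write("cart", ["book"])

a.alive = False  # simulate a crash of the primary
failover_at = None
for now in (1.0, 2.0, 3.0, 4.0, 5.0):
    a.heartbeat(now)
    b.heartbeat(now)
    if cluster.check(now) and failover_at is None:
        failover_at = now

recovered = cluster.read("cart")
```

Copying the whole state on every write is deliberately naive. What matters is the order: the information needed for recovery reaches the standby before the failure is detected.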

(3) Cooperative multitasking is fragile
Cooperative multitasking is a "fragile" design. (That said, cooperative multitasking usually performs somewhat better than preemptive multitasking.)

The weakness of the cooperative model is its fragility in server settings. If one of the tasks in the task queue monopolizes the CPU, hangs or blocks, then the impact is worse throughput, higher latency, or a deadlocked server. The fragility has to be avoided in large-scale systems, so the Erlang runtime is built preemptively. The normal code is “instrumented” such that any call which may block automatically puts that process to sleep, and switches in the next one on the CPU.
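A small Python simulation shows the fragility. In the cooperative scheduler a task runs until it chooses to yield, so one greedy task delays everyone else. In the second scheduler the scheduler itself decides when to switch, using a per-turn work budget. Both schedulers and the task names are hypothetical, and the budget is only a loose analogy, not a description of how the Erlang runtime is implemented.

```python
from collections import deque


def polite(name, steps, trace):
    # Yields after every unit of work, as a cooperative task is trusted to.
    for _ in range(steps):
        trace.append(name)
        yield


def greedy(name, steps, trace):
    # Does all of its work before handing the CPU back.
    for _ in range(steps):
        trace.append(name)
    yield


def run_cooperative(tasks):
    # Round-robin: each task runs until it decides to yield.
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
            ready.append(task)
        except StopIteration:
            pass


def run_with_budget(jobs, budget):
    # The scheduler, not the task, decides when to switch.
    ready = deque(jobs.items())
    trace = []
    while ready:
        name, remaining = ready.popleft()
        used = min(budget, remaining)
        trace.extend([name] * used)
        if remaining > used:
            ready.append((name, remaining - used))
    return trace


coop_trace = []
run_cooperative([polite("P", 2, coop_trace), greedy("G", 4, coop_trace)])
budget_trace = run_with_budget({"P": 2, "G": 4}, budget=1)
```

In `coop_trace` the second step of `P` waits behind all four steps of `G`. In `budget_trace` the two tasks interleave until `P` is done.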

(4) Stacking Theory for Systems Design
Define several distinct modes of operation for a service's availability, for example:
The VM is up and running: this is level 0.
The database connection is established: this is level 1.

When certain conditions are met, the system moves up one operational level. When an error occurs, it automatically drops down one operational level.

By "stacking" service, we can go back to level 0 and start best-effort transitions to level 1 again. This structure is a ratchet-mechanism in Erlang systems: once at a higher level, we continue operating there. Errors lead to the fault-tolerance handling acting upon the system. It moves our operating level down — by resetting and restarting the affected processes — and continues operation at the lower level.

With a design built on multiple operational modes like this, the system gains fault tolerance more easily.
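The stacked levels can be sketched as a small state machine. This is a minimal illustration in Python, assuming a hypothetical `Service` class whose `checks` list holds one condition per level transition. In an Erlang system this role would be played by supervisors restarting processes, which the sketch does not model.

```python
class Service:
    """Hypothetical service with stacked operational levels."""

    def __init__(self, checks):
        # checks[i] must pass to move from level i to level i + 1.
        self.checks = checks
        self.level = 0  # level 0: the VM is up

    def try_advance(self):
        # Best effort: climb as far as the checks currently allow.
        while self.level < len(self.checks):
            try:
                ok = self.checks[self.level]()
            except Exception:
                ok = False
            if not ok:
                break
            self.level += 1
        return self.level

    def on_error(self):
        # An error drops one level; the service keeps running there.
        if self.level > 0:
            self.level -= 1
        return self.level


db = {"up": False}  # stand-in for a real database connection check
svc = Service([lambda: db["up"]])

levels = [svc.try_advance()]      # database down: stay at level 0
db["up"] = True
levels.append(svc.try_advance())  # connection established: level 1
db["up"] = False
levels.append(svc.on_error())     # error: back to level 0
levels.append(svc.try_advance())  # still down: remain at level 0
db["up"] = True
levels.append(svc.try_advance())  # recovered: level 1 again
```

The service never refuses to start because the database is missing. It starts at level 0 and keeps trying to climb.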

A system assuming the presence of other systems has an unwritten dependency chain. You have to boot your production systems in a certain order or things will not work. This is often bound for trouble.

(5) Processes are the units of error encapsulation --- errors occurring in a process will not affect other processes in the system. We call this property strong isolation.

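There has always been a great deal of debate about how to modularize software. Compiler designers usually imagine the hardware to be perfect and argue that static, compile-time type checking provides good isolation. Operating system designers, in contrast, argue for run-time checks and advocate processes as the unit of protection and the unit of failure.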
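The operating-system view is easy to try out. The sketch below uses Python and OS processes as a stand-in, with a hypothetical helper `run_isolated`. Erlang processes are lightweight and live inside the VM, so this is only an analogy for the isolation property, not for how Erlang achieves it.

```python
import subprocess
import sys


def run_isolated(source):
    # Run the code in a separate OS process; its crash cannot corrupt ours.
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout.strip()


# The first child dies with an uncaught exception.
crash_code, _ = run_isolated("raise RuntimeError('boom')")

# The parent is unaffected and can start more work right away.
ok_code, ok_out = run_isolated("print(6 * 7)")
```

The failure reaches the parent only as a non-zero exit code, which it is free to handle, log, or answer with a restart.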