程式人蔘: Erlang 的啟發 --- part 4

Friday, January 11, 2019

Erlang 的啟發 --- part 4

全互連網路的限制

基本的 Erlang 叢集配置是全互連網路，叢集內的每一個節點都跟所有的其它節點相連。也因此，叢集的大小通常會受限於大約 50 個節點這個大小：超過這個數量之後，用來保持節點互相知道彼此存在的訊息 (message) 數量會太多，開始影響到真正工作用的訊息。

full-mesh network

The basic arrangement in a distributed Erlang cluster is a full-mesh network; every node connected to every other. As a result, cluster sizes are typically limited to somewhere around 50: in this area (depending on hardware, network, user code, and so on) the number of messages being sent through the cluster just to keep it functioning starts to overwhelm a node’s ability to do real work. In other words, heartbeats start contesting with RPCs for VM time and bandwidth, which results in a flaky cluster.

分布式計算繆誤導致的問題

1. The Network is Reliable
考算網路不穩定之後，就可以想到「跨越 node 來設定 linking 或是 monitoring 」可能導致嚴重的副作用。
Linking and monitoring across nodes can be dangerous. In the case of a network failure, all remote links and monitors are triggered at once. This might then generate thousands and thousands of signals and messages to various processes, which puts a heavy and unexpected load on the system.

2. There is no Latency
記得設定 timeout

3. Bandwidth is infinite
因為頻寬不是無限的，跨越節點發送太大的訊息，有可能影響到節點彼此之間的 heartbeat ，進而讓節點之間互相以為對方已死。
If, for some reason, you need to be sending large messages, be extremely careful. The way Erlang distribution and communication works over many nodes is especially sensitive to large messages. If two nodes are connected together, all their communications will tend to happen over a single TCP connection. Because we generally want to maintain message ordering between two processes (even across the network), messages will be sent sequentially over the connection. That means that if you have one very large message, you might be blocking the channel for all the other messages.