Showing posts with label design. Show all posts

Friday, January 11, 2019

Inspiration from Erlang --- part 4

Limits of the full-mesh network

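The basic Erlang cluster configuration is a full-mesh network: every node in the cluster is connected to every other node. As a result, cluster size is usually limited to roughly 50 nodes. Beyond that, the messages used to keep nodes aware of one another become so numerous that they start to crowd out the messages that do real work.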

full-mesh network


The basic arrangement in a distributed Erlang cluster is a full-mesh network; every node connected to every other. As a result, cluster sizes are typically limited to somewhere around 50: in this area (depending on hardware, network, user code, and so on) the number of messages being sent through the cluster just to keep it functioning starts to overwhelm a node’s ability to do real work. In other words, heartbeats start contesting with RPCs for VM time and bandwidth, which results in a flaky cluster.
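To get a feel for why trouble starts around 50 nodes, it helps to count connections. A full mesh of n nodes has n(n-1)/2 links, so keep-alive traffic grows quadratically with cluster size. The sketch below is only that arithmetic; the class and method names are made up for illustration, and the real threshold depends on hardware, network, and user code, as the quote says.

```java
// Hypothetical helper (not part of Erlang or any library): counts links in a full mesh.
final class FullMeshMath {
    // Each of n nodes connects to the other n - 1 nodes; every link is shared by two nodes.
    static long links(int nodes) {
        return (long) nodes * (nodes - 1) / 2;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 50, 100}) {
            System.out.println(n + " nodes -> " + links(n) + " links");
        }
    }
}
```

Going from 10 to 50 nodes multiplies the node count by 5 but the link count by about 27 (45 versus 1225).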

Problems that follow from the fallacies of distributed computing

1. The Network is Reliable
Once you accept that the network is unreliable, it follows that setting up links or monitors across nodes can have serious side effects.
Linking and monitoring across nodes can be dangerous. In the case of a network failure, all remote links and monitors are triggered at once. This might then generate thousands and thousands of signals and messages to various processes, which puts a heavy and unexpected load on the system.

2. There is no Latency
Remember to set a timeout on every call that crosses the network.
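As a rough illustration of what "set a timeout" means in code, here is a minimal Java sketch that waits for a reply but never longer than a fixed bound. `TimeoutDemo`, `awaitReply`, and the fallback string are hypothetical names chosen for this example; only `CompletableFuture.get(long, TimeUnit)` is a real JDK call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical example class: bounds the wait for a remote reply.
final class TimeoutDemo {
    // Returns the reply, or `fallback` if nothing arrives within timeoutMillis.
    static String awaitReply(CompletableFuture<String> reply, long timeoutMillis, String fallback) {
        try {
            return reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return fallback; // the remote side is slow or unreachable
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the remote call itself failed
        }
    }

    public static void main(String[] args) {
        CompletableFuture<String> neverAnswers = new CompletableFuture<>();
        System.out.println(awaitReply(neverAnswers, 50, "timed out"));
        System.out.println(awaitReply(CompletableFuture.completedFuture("pong"), 50, "timed out"));
    }
}
```

The first call prints `timed out` after about 50 ms instead of blocking forever; the second prints `pong`.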

3. Bandwidth is infinite
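Because bandwidth is not infinite, sending very large messages across nodes can delay the heartbeats between those nodes, so that each side wrongly concludes the other is dead.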
If, for some reason, you need to be sending large messages, be extremely careful. The way Erlang distribution and communication works over many nodes is especially sensitive to large messages. If two nodes are connected together, all their communications will tend to happen over a single TCP connection. Because we generally want to maintain message ordering between two processes (even across the network), messages will be sent sequentially over the connection. That means that if you have one very large message, you might be blocking the channel for all the other messages.
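The blocking effect can be estimated with simple arithmetic. The sketch below assumes a single FIFO connection with a fixed throughput and asks how long a tiny heartbeat waits when it is queued behind a large message. The 10 MB/s link speed and the message sizes are assumptions picked for the example, not measurements, and `HeadOfLineBlocking` is a hypothetical name.

```java
// Hypothetical model of one FIFO connection shared by all traffic between two nodes.
final class HeadOfLineBlocking {
    // Milliseconds until a message of `bytes` has been fully sent, given that
    // `queuedBytes` are already ahead of it on a link of `bytesPerSecond`.
    static long deliveryDelayMillis(long queuedBytes, long bytes, long bytesPerSecond) {
        return (queuedBytes + bytes) * 1000 / bytesPerSecond;
    }

    public static void main(String[] args) {
        long bandwidth = 10_000_000L;      // assumed: 10 MB/s between the two nodes
        long heartbeat = 100L;             // assumed: a 100-byte heartbeat
        long bigMessage = 500_000_000L;    // assumed: one 500 MB message queued first

        System.out.println("alone:  " + deliveryDelayMillis(0, heartbeat, bandwidth) + " ms");
        System.out.println("queued: " + deliveryDelayMillis(bigMessage, heartbeat, bandwidth) + " ms");
    }
}
```

Under these assumptions the heartbeat goes from well under a millisecond to 50 seconds, which is the kind of delay that can make a healthy peer look dead.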

Friday, December 28, 2018

How to build stable systems

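How to build stable systems is a comprehensive article: it covers many aspects of software development and lays out the practices the author considers reasonable and effective.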

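On preparation, he argues that a project trying out new technology should make only a single gamble. Do not try too many new technologies at once, or you raise the project's risk more than necessary.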
A project usually have a single gamble only. Doing something you’ve never done before or has high risk/reward is a gamble. Picking a new programming language is a gamble. Using a new framework is a gamble. Using some new way to deploy the application is a gamble. Control for risk by knowing where you have gambled and what is the stable part of the software. Be prepared to re-roll (mulligan) should the gamble come out unfavorably.

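On system planning:
Because the author comes from an Erlang background, his thinking takes micro-services as the default, so he argues that modules should communicate with each other through a protocol. (Note: conventional advice is usually to build the system as a monolith first, with modules talking to each other through APIs, and only later turn modules into independently running services as the need arises.)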
Your system is a flat set of modules with loose coupling. Each module have one responsibility and manages that for the rest of the software. Modules communicate loosely via a protocol, which means any party in a communication can be changed, as long as they still speak the protocol in the same way. Design protocols for future extension. Design each module for independence. Design each module so it could be ripped out and placed in another system and still work.

He advocates the end-to-end principle: only the endpoints need complex logic.
In a communication chain, the end points have intelligence and the intermediaries just pass data on. That is, exploit parametricity on multiple levels: build systems in which any opaque blob of data is accepted and passed on. Avoid intermediaries parsing and interpreting on data. Code and data changes over time, so by being parametric over the data simplifies change.
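One way to picture "parametric over the data" in Java is a generic intermediary that has no way to look inside its payload. This is a minimal sketch; `Relay` and `RelayDemo` are hypothetical names, and the payload string is an arbitrary example.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical intermediary: generic in T, so it cannot parse or interpret the payload.
final class Relay<T> {
    private final Consumer<T> next;

    Relay(Consumer<T> next) {
        this.next = next;
    }

    // Passes the payload on untouched.
    void forward(T payload) {
        next.accept(payload);
    }
}

final class RelayDemo {
    public static void main(String[] args) {
        List<byte[]> received = new ArrayList<>();      // the end point
        Relay<byte[]> hop2 = new Relay<>(received::add);
        Relay<byte[]> hop1 = new Relay<>(hop2::forward);

        // The format can change without touching either hop.
        hop1.forward("v2;any-future-format".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(received.get(0), StandardCharsets.UTF_8));
    }
}
```

Because neither hop knows the payload format, changing that format only requires changing the two end points.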

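On configuration, he argues that unless the system is truly huge, you do not need fully dynamic configuration. There is no need to adopt etcd/Consul and the like too early; putting the configuration file on S3 is enough.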
Avoid the temptation of too early etcd/Consul/chubby setups. Unless you are large, you don’t need fully dynamic configuration systems. A file on S3 downloaded at boot will suffice in many cases.
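A minimal sketch of "load once at boot" in Java might look like the following. It assumes the file has already been downloaded by some earlier step (the download itself is not shown), and the class name `BootConfig` and the keys `db.host`, `pool.size`, and `cache.ttl` are invented for the example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical static configuration: read once at startup, never reloaded.
final class BootConfig {
    private final Properties props = new Properties();

    BootConfig(Reader source) {
        try {
            props.load(source); // standard java.util.Properties key=value format
        } catch (IOException e) {
            throw new UncheckedIOException(e); // fail fast: do not boot without configuration
        }
    }

    String get(String key, String defaultValue) {
        return props.getProperty(key, defaultValue);
    }

    public static void main(String[] args) {
        // Stand-in for the file fetched at boot.
        Reader file = new StringReader("db.host=db.internal\npool.size=8\n");
        BootConfig config = new BootConfig(file);
        System.out.println(config.get("db.host", "localhost"));
        System.out.println(config.get("cache.ttl", "60"));
    }
}
```

Changing a setting then means uploading a new file and restarting, which is a deliberate trade of flexibility for simplicity.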

Tuesday, December 25, 2018

Inspiration from Erlang

(1) Remote procedure call is not a good idea
What seems like a good, simple idea on the surface -- hiding networks and messages behind a more familiar application-development idiom -- often causes far more harm than good.

Request-response is a network-level message exchange pattern, whereas RPC is an application-level abstraction intended.

Equating RPC with synchronous messaging means the latter wrongly suffers the same criticisms as RPC.

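RPC has two main problems:
(1) It offers no reasonable abstraction for handling network failure.
(2) It hides the cost of calling a procedure over the network, so engineers easily end up designing systems with overly long response times.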

Erlang does not provide remote procedure call as a language primitive. It provides only primitives for message passing and failure handling, namely send/receive and link.

(2) What scalable, fault-tolerant, and hot-upgradable systems have in common
A system that has to satisfy all three at once really needs to implement the following primitives:
(a) detect failure
(b) move state from one node to another

When designing a system for fail-over, scalability, dynamic code upgrade we have to think about the following:

What information do I need to recover from a failure?
How can we replicate the information we need to recover from a failure?
How can we mask failures/code_upgrades/scaling operations from the clients

(3) Cooperative multitasking is fragile
Cooperative multitasking is a fragile design. (That said, cooperative multitasking usually performs somewhat better than preemptive multitasking.)

The weakness of the cooperative model is its fragility in server settings. If one of the tasks in the task queue monopolizes the CPU, hangs or blocks, then the impact is worse throughput, higher latency, or a deadlocked server. The fragility has to be avoided in large-scale systems, so the Erlang runtime is built preemptively. The normal code is “instrumented” such that any call which may block automatically puts that process to sleep, and switches in the next one on the CPU.

(4) Stacking Theory for Systems Design
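For the availability of a service, define several distinct modes of operation, for example:
The VM is up: this is level 0.
The database connection is established: this is level 1.

When certain conditions are met, the system moves up one operational level. When an error occurs, it automatically drops down one operational level.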

By "stacking" service, we can go back to level 0 and start best-effort transitions to level 1 again. This structure is a ratchet-mechanism in Erlang systems: once at a higher level, we continue operating there. Errors lead to the fault-tolerance handling acting upon the system. It moves our operating level down — by resetting and restarting the affected processes — and continues operation at the lower level.

A design with several operational modes like this makes it easier for a system to be fault-tolerant.
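The ratchet described above can be sketched as a very small state machine. This is an illustration in Java rather than Erlang, and it leaves out supervisors and restarts entirely; `OperatingLevel`, `tryClimb`, and `onError` are hypothetical names.

```java
import java.util.function.BooleanSupplier;

// Hypothetical model of stacked operating levels (0 = VM is up, 1 = database connected, ...).
final class OperatingLevel {
    private int level = 0;

    int current() {
        return level;
    }

    // Best-effort transition: climb one level only if the precondition for it holds.
    void tryClimb(BooleanSupplier ready) {
        if (ready.getAsBoolean()) {
            level++;
        }
    }

    // An error drops the system one level; it never goes below level 0.
    void onError() {
        if (level > 0) {
            level--;
        }
    }

    public static void main(String[] args) {
        OperatingLevel service = new OperatingLevel();
        service.tryClimb(() -> false);  // database not reachable yet
        System.out.println(service.current());
        service.tryClimb(() -> true);   // database connection established
        System.out.println(service.current());
        service.onError();              // connection lost: fall back and keep running
        System.out.println(service.current());
    }
}
```

The program prints 0, 1, 0: the service keeps running at level 0 while it retries the move to level 1, so it does not depend on the database being up at boot.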

A system assuming the presence of other systems has an unwritten dependency chain. You have to boot your production systems in a certain order or things will not work. This is often bounds for trouble.

(5) Processes are the units of error encapsulation --- errors occurring in a process will not affect other processes in the system. We call this property strong isolation.

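There has always been debate about how to modularize software. Compiler designers usually imagine the hardware to be perfect and argue that static compile-time type checking provides good isolation. Operating system designers, by contrast, argue for run-time checks and for making processes the units of protection and of failure.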

Wednesday, September 5, 2018

The Legacy Code Change Algorithm

  • Identify change point
  • Identify test point
  • Break dependencies
  • Write test
  • Make change and refactor
Notice that the LAST thing you do is write new code. Most of your work with legacy code is understanding and carefully selecting where to make changes so as to minimize risk and minimize the amount of new test code you have to write to make your target change safely.


Friday, August 24, 2018

OO design for testability (Miško Hevery) --- notes (continued)

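Over the last two days I rewatched the first 15 minutes of OO design for testability (Miško Hevery) and came away with a new insight.

I used to think the point of the whole talk was to avoid four common mistakes. After going through it again, I think the point is a more abstract design principle: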

When designing an OO program, split the classes into two broad categories:
(1) Object graph construction & lookup, for example a Factory. These classes contain the new operator.
(2) Business logic. These classes contain conditionals and loops.

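With this design, when you want to test the business logic, the unit test can wire in fake objects and exercise the business logic against them.

And when you want to test a factory object, you can test it independently of the business logic.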
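The split between the two kinds of class might look like this in Java. Everything here is invented for illustration (`RateSource`, `PriceCalculator`, `PricingFactory`, `FixedRates`, and the rate values); the only point is where `new` and where `if` appear.

```java
// Hypothetical dependency of the business logic.
interface RateSource {
    double rateFor(String currency);
}

// Business logic: conditionals and arithmetic, no `new`.
final class PriceCalculator {
    private final RateSource rates;

    PriceCalculator(RateSource rates) {
        this.rates = rates;
    }

    double inCurrency(double usd, String currency) {
        if (currency.equals("USD")) {
            return usd;
        }
        return usd * rates.rateFor(currency);
    }
}

// Stand-in for a production rate source; the value 1.5 is a placeholder, not real data.
final class FixedRates implements RateSource {
    @Override
    public double rateFor(String currency) {
        return 1.5;
    }
}

// Object graph construction: `new`, no conditionals.
final class PricingFactory {
    static PriceCalculator create() {
        return new PriceCalculator(new FixedRates());
    }
}

final class PricingWiringDemo {
    public static void main(String[] args) {
        // A test wires in a fake rate source instead of the production one.
        PriceCalculator underTest = new PriceCalculator(currency -> 2.0);
        System.out.println(underTest.inCurrency(10.0, "EUR"));
        System.out.println(underTest.inCurrency(10.0, "USD"));
    }
}
```

The test never touches `PricingFactory`, and a test of `PricingFactory` only needs to check that the graph is assembled, not that prices are right.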



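Testable code should have this property: you can control the software's control flow purely through the way the whole application is assembled. In the speaker's words: You can control the path flow of your software purely by the way the application is wired together.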

This brings up another term, the seam:
A seam is a place where you can alter behavior in your program without editing it in that place.

A term that goes together with seam is the enabling point:
Every seam has an enabling point, a place where you can make a decision to use one behavior or another.

Object Seam


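Incidentally, I later looked up the term seam and found that it comes from the book WELC. In Working Effectively with Legacy Code, besides the object seam, which the author recommends most, he also introduces two other kinds of seam: the preprocessing seam and the link seam.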

The enabling point of an object seam is the place where we create the object under test.

In Java, a link seam can use the classpath environment variable as its enabling point. The book describes the link seam like this:
The enabling point for a link seam is always outside the program text. Sometimes it is in a build or a deployment script. This makes the use of link seams somewhat hard to notice.

The enabling point of a preprocessing seam is a preprocessor define.
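Of the three kinds, the object seam is the easiest to show in Java. In the sketch below, the call to `deliver` is the seam, and the line that creates the object is the enabling point. `ReportSender` and its methods are hypothetical names for this example.

```java
// Hypothetical class with an object seam.
class ReportSender {
    // The call to deliver() below is the seam: what it does can change
    // without editing this method, because it is dispatched on the runtime class.
    String send(String report) {
        return deliver("REPORT: " + report);
    }

    protected String deliver(String body) {
        throw new IllegalStateException("would talk to a real mail server");
    }
}

final class ObjectSeamDemo {
    public static void main(String[] args) {
        // Enabling point: the place where the object under test is created.
        ReportSender sender = new ReportSender() {
            @Override
            protected String deliver(String body) {
                return "captured -> " + body;
            }
        };
        System.out.println(sender.send("Q1"));
    }
}
```

This prints `captured -> REPORT: Q1`; the test chose the behavior of `deliver` without changing a line of `send`.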

References:
(1) Testing Effectively With Legacy Code
(2) How to use link seams in CommonJS
(3) Seams in Javascript







Saturday, July 28, 2018

OO design for testability (Miško Hevery)

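I stumbled on a YouTube video, OO design for testability, and it helped me a lot. After watching it, things suddenly clicked: "Oh, so this is how object-oriented programs should be designed!" "So this is how dependency injection is meant to be used!"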

Early in the video, the speaker points out four common mistakes. Commit any of them and you end up with code that is hard to test:
1. Location of the new operator
2. Work in constructor
3. Global State
4. Law of Demeter violation

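Before explaining the four mistakes in detail, the speaker gives a basic overview of testing: you write an object, and then you want to test it. That is where the difficulty starts, because the object has many dependencies, and those dependencies can come into being in several ways:
(a) the dependency may be a global object
(b) the dependency may be an object created with new inside the class
(c) the dependency may be an injected object
Only (c) does not cause serious testing problems.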

Mistake 1 & mistake 2
Hard-to-test example:
class Car {
  Engine engine;
  Car(File file) {
    String model = readEngineModel(file);
    engine = new EngineFactory().create(model);
  }

  // ...
}

=============================================
Easy-to-test example:
class Car {
  Engine engine;
  @Inject
  Car(Engine engine) {
    this.engine = engine;
  }
}

@Provides
Engine getEngine(
    EngineFactory engineFactory,
    @EngineModel String model) {
  return engineFactory.create(model);
}
=============================================

From the comparison above we can conclude:
(*) A constructor should do as little as possible. The only code that belongs in it is assigning the dependencies to the fields of the class. (In constructor, you should do as little work in constructor as possible. You should do nothing else but assign your dependencies to your fields.)

(*) How to check whether an object-oriented program is designed for testability:
Look at the constructor and make sure there is no work in there. Look at the constructor to make sure that all it's doing is assigning its dependencies to the fields. Check to make sure that the dependencies that you actually save into your fields are actually the dependencies that you truly need. This is where we are going to get into law of Demeter violation.

What should code look like if it follows the guidelines above?
(*) Good test code should be full of new operators and nulls.
(*) Good production code should contain no new and no null.
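Here is a small sketch of what "test code full of new and null" can look like. It uses a two-argument variant of the `Car` class from the examples above; the `Logger` interface, the `drive` and `reportFault` methods, and `CarTest` are hypothetical additions made only for this illustration.

```java
interface Engine {
    String start();
}

// Hypothetical second collaborator, unused by the method under test.
interface Logger {
    void log(String line);
}

final class Car {
    private final Engine engine;
    private final Logger logger;

    // The constructor only assigns dependencies to fields.
    Car(Engine engine, Logger logger) {
        this.engine = engine;
        this.logger = logger;
    }

    String drive() {
        return "driving with " + engine.start();
    }

    void reportFault(String fault) {
        logger.log(fault);
    }
}

final class CarTest {
    public static void main(String[] args) {
        // Test code: `new` for the object under test and its fake collaborator...
        Engine fakeEngine = new Engine() {
            @Override
            public String start() {
                return "fake engine";
            }
        };
        // ...and `null` for a collaborator that drive() never touches.
        Car car = new Car(fakeEngine, null);

        String result = car.drive();
        if (!result.equals("driving with fake engine")) {
            throw new AssertionError(result);
        }
        System.out.println("ok: " + result);
    }
}
```

Passing `null` is safe here only because the constructor does no work; a constructor that called `logger.log(...)` would make this test impossible to write so cheaply.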

================================================
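Example of mistake 3 (global state)

An API that hides its dependencies: Deceptive API

testCharge() {
  // Hidden global dependencies: these calls must run first, in this order,
  // but nothing in the CreditCard API says so.
  Database.connect(...);
  OfflineQueue.start(...);
  CreditCardProcessor.init(...);
  CreditCard cc;
  cc = new CreditCard("12..34");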
  cc.charge(100);
}

================================================
An API that states its dependencies explicitly: Better API

testCharge() {
  Database db = new Database();
  OfflineQueue queue = new OfflineQueue(db);
  CCProcessor ccProc = new CCProcessor(queue);
  CreditCard cc;
  cc = new CreditCard("12..34", ccProc);
  cc.charge(100);
}
================================================
Mistake 4 (Law of Demeter violation)

Bad example: more than one dot in a single expression
class LoginPage {
  RPCClient client;
  HttpServletRequest request;

  LoginPage(RPCClient client,
      HttpServletRequest request) {
    this.client = client;
    this.request = request;
  }

  boolean login() {
    String cookie = request.getCookie();
    return client.getAuthenticator().authenticate(cookie);
  }
}

==========================================
Improved example
class LoginPage {
  ...
 
  LoginPage(@Cookie String cookie,
            Authenticator authenticator) {
    this.cookie = cookie;
    this.authenticator = authenticator;
  }

  boolean login() {
    return authenticator.authenticate(cookie);
  }
}

Monday, January 22, 2018

Simple Made Easy

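For a long time I have had one problem: I cannot read OOP-style code written by other people. After learning Clojure, this inability seems to have an explanation.

In OOP-style code, when a set of classes/objects is defined, it can carry too many possible semantics, so there are too many possibilities to guess at.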

It might be:
1. A value object. In Clojure this would be a plain hash map or a record.
2. An entity object. In Clojure this would be an atom.
3. A function with state. In Clojure this would be a lexical closure.
4. Purely a namespace: encapsulation for its own sake.
5. Multimethod semantics. In Clojure this would be defmulti, or protocol + reify.
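The first three meanings can be told apart even in Java if each one gets its own construct instead of a generic class. The sketch below is only a rough analogy to the Clojure constructs named above, and `ObjectSemantics` and its method names are made up for the example.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.IntSupplier;

// Hypothetical illustration: three different meanings, three different constructs.
final class ObjectSemantics {
    // 1. Value: immutable data compared by content (rough analogue of a Clojure map).
    static Map<String, Integer> point(int x, int y) {
        return Map.of("x", x, "y", y);
    }

    // 2. Entity: one identity whose current value changes over time (rough analogue of an atom).
    static AtomicReference<Map<String, Integer>> entity(Map<String, Integer> initial) {
        return new AtomicReference<>(initial);
    }

    // 3. Function with state: a closure over a private counter.
    static IntSupplier counter() {
        int[] count = {0};
        return () -> ++count[0];
    }

    public static void main(String[] args) {
        System.out.println(point(1, 2).equals(point(1, 2)));

        AtomicReference<Map<String, Integer>> position = entity(point(1, 2));
        position.set(point(3, 4)); // same identity, new value
        System.out.println(position.get().get("x"));

        IntSupplier next = counter();
        next.getAsInt();
        System.out.println(next.getAsInt());
    }
}
```

This prints `true`, `3`, and `2`. A reader who sees `Map.of`, `AtomicReference`, or a lambda knows which of the meanings is intended, which is exactly the information a bare class declaration hides.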

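Recently I happened to rewatch Rich Hickey's Simple Made Easy and found that he had already discussed a similar idea. In that talk, Rich classifies objects as a complex construct from which it is hard to build simple artifacts. Things that are too complex overload human cognition, because too many components have to be considered at once.

The talk contains many illuminating sentences and ideas: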
1. All too often we do what is easy, at the expense of what is simple.
2. The benefits of simplicity are: ease of understanding, ease of change, ease of debugging, flexibility.
3. Reliability tools: testing, refactoring, type systems are good but secondary. They do not enforce simplicity. They are just safety nets.
4. Build simple systems by choosing constructs that generate simple artifacts and by abstracting: design by answering questions related to what, who, how, where, when, and why.

Friday, January 19, 2018

Never build an Application

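I recently read an article on software design, Never build an Application. Its main ideas are roughly these:
1. Never build an application. Build a library of reusable functions specific to the problem domain instead. That is, do not design a monolithic application as the solution to a requirement. Instead, design a reusable library for the requirement and build the solution out of that library.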

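2. The library built for the requirement can be seen as a domain specific language, and it can take many forms: the services offered by a set of classes, some metadata, XML config files, and so on. In other words, something does not need special syntax to count as a DSL. What matters is building an abstraction layer that models the problem domain well. The application then becomes very small, because it is built on top of that abstraction layer.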

3. UML has an obvious problem when applied to design: it models the solution domain, not the problem domain. The DSL approach works precisely because a DSL models the problem domain.
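Points 1 and 2 can be sketched in a few lines of Java: a small library of functions in the vocabulary of the problem domain, and an "application" that is little more than one expression written in that vocabulary. The pricing domain and all names here (`Pricing`, `percentOff`, `atLeast`, `inOrder`, `SummerSale`) are invented for the example and do not come from the article.

```java
import java.util.function.DoubleUnaryOperator;

// The "library": small reusable functions that model the problem domain (pricing rules).
final class Pricing {
    static DoubleUnaryOperator percentOff(double percent) {
        return price -> price * (100 - percent) / 100;
    }

    static DoubleUnaryOperator atLeast(double floor) {
        return price -> Math.max(price, floor);
    }

    // Applies the rules one after another, left to right.
    static DoubleUnaryOperator inOrder(DoubleUnaryOperator... rules) {
        return price -> {
            double current = price;
            for (DoubleUnaryOperator rule : rules) {
                current = rule.applyAsDouble(current);
            }
            return current;
        };
    }
}

// The "application": a single rule expressed with the library's vocabulary.
final class SummerSale {
    static final DoubleUnaryOperator RULE =
            Pricing.inOrder(Pricing.percentOff(50), Pricing.atLeast(30));

    public static void main(String[] args) {
        System.out.println(RULE.applyAsDouble(200.0));
        System.out.println(RULE.applyAsDouble(40.0));
    }
}
```

This prints `100.0` and `30.0`. No special syntax is involved, yet `inOrder(percentOff(50), atLeast(30))` reads like a statement about the domain rather than about the program.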

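In fact, the core idea of this article is nearly identical to the programming bottom-up idea that Paul Graham describes in the preface of On Lisp. From my own experience writing Clojure, though, what I took away is this: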

(*) REPL-driven development is the primary precondition for programming bottom-up.

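Programming languages differ in expressive power, but a more important point is that they also differ in how well they can express unit tests. In OOP, the smallest unit of a unit test is usually already an object, so the smallest abstraction layer usually has to be expressed with objects and interfaces. In a Lisp, REPL-driven development lets you test a single function or expression, so you can often build abstraction components that are about as small as they can theoretically be.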

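In other words, the main reason Lisp leads to elegant design is REPL-driven development; having the most powerful macro system matters somewhat less. With other programming languages, as long as you consistently keep the component under test as small as possible, you can build a high-quality abstraction layer just the same.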