程式人蔘: January 2019

Thursday, January 31, 2019

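Picking Datomic Back Up

I recently took on a project and plan to build the web application with Luminus. After weighing several front-end options, I still think ClojureScript + re-frame is the most advanced choice. For the database, Datomic is naturally the best fit.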

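Datomic references
1. Learn Datalog Today --- an interactive tutorial site for practicing Datalog, the query DSL that Datomic uses
2. Missing Link Datomic Tutorial --- easier to follow than the official Datomic tutorial because it leaves out the finer details
3. How to setup Datomic Free with Clojure --- after reading a tutorial you need to build something yourself, and following this one is the quickest way
4. Using Datomic in your app: a practical guide --- practical lessons from using Datomic
5. The ten rules of schema growth --- how to handle schema migration, a common problem with OLTP databases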

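When is it a good fit?
When is Datomic a good fit? Datomic suits OLTP applications and iterative development. In other words, if you start out with many unknowns about the problem and expect many unpredictable schema changes, Datomic is a good choice.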

Two things about Datomic schema that stood out to me:
(1) A Datomic schema does not need a separate, parallel set of version-controlled migration files.
Datomic is uniquely suited for iterative development. Change is easy, due to the granular data model and small but powerful schema. And change is always tracked within the database itself, so you do not need a parallel infrastructure of version-controlled migration files as your application evolves.

(2) Schema installation is idempotent, so it can go in the server startup code.
In Datomic, installing your schema consists of submitting a regular transaction. Attribute installation transactions are idempotent, so you can just write your schema installation transaction in your application code and transact in your server startup code.
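To make the idea of idempotent installation concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the Datomic API: the `db` dict, `install_attributes`, `start_server`, and the attribute names are all made up for illustration.

```python
# Toy in-memory model of idempotent schema installation.
# This is NOT the Datomic API; every name here is hypothetical.

def install_attributes(db, schema):
    """Install each attribute unless an identical definition is already present.

    Returns the idents that were actually added or changed.
    """
    installed = []
    for attr in schema:
        ident = attr["ident"]
        if db.get(ident) == attr:
            continue  # same definition already installed: nothing to do
        db[ident] = dict(attr)
        installed.append(ident)
    return installed


# Hypothetical schema, kept in application code.
SCHEMA = [
    {"ident": "user/email", "value_type": "string", "cardinality": "one"},
    {"ident": "user/name", "value_type": "string", "cardinality": "one"},
]


def start_server(db):
    """Startup hook: running the installation on every boot is safe."""
    install_attributes(db, SCHEMA)
    return db
```

Calling `start_server` twice on the same `db` leaves it unchanged after the first call, which is why the installation can run on every startup.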

Tuesday, January 15, 2019

Dealing with Confusing Dependencies in Clojure

This is my second time building a web application with Clojure's Luminus. Both times I hit a similar problem right at the start: confusing dependencies.

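Here is how the problem arises:
A project always pulls in a fair number of libraries through :dependencies in project.clj, and those libraries usually have dependencies of their own. When the dependencies from the second level down (implicit dependencies) overlap and ask for different versions of the same library, the version that gets selected automatically may not work for all of them, and the program fails to start.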

For example:
project.clj -> A -> C [version 1.0]
project.clj -> B -> C [version 1.5]

In this example, C is a confusing dependency.

Fortunately, there is a handy command that helps sort this out quickly:

lein deps :tree

Running this command prints every dependency in the project, both explicit and implicit. It also suggests exclusions that you can use to choose a dependency version by hand.
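The check itself is simple to picture. The sketch below is in Python and only illustrates the idea; it is not how Leiningen works internally. `find_conflicts` and `PATHS` are hypothetical names, and the versions given for A and B are made up, since the example above only gives versions for C.

```python
from collections import defaultdict


def find_conflicts(paths):
    """Return {library: versions} for libraries requested at more than one version.

    Each path is a list of (name, version) pairs, from a direct dependency
    of the project down to a leaf.
    """
    versions = defaultdict(set)
    for path in paths:
        for name, version in path:
            versions[name].add(version)
    # Versions are sorted as plain strings, which is enough for display.
    return {name: sorted(vs) for name, vs in versions.items() if len(vs) > 1}


# The A/B/C example from above; the versions of A and B are invented.
PATHS = [
    [("A", "0.1"), ("C", "1.0")],
    [("B", "0.1"), ("C", "1.5")],
]

if __name__ == "__main__":
    print(find_conflicts(PATHS))
```

Here only C is reported, because it is the only library requested at two different versions.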

Saturday, January 12, 2019

Lessons from Erlang --- part 5

If you want to build a distributed system, what are the options?
(1) FaaS
(2) Microservices + PaaS
(3) Elixir umbrella apps

Thinking about it, this seems a lot like choosing how to build a website, web page, or web application:
(1) Use a site builder such as wix/weebly --- fully graphical
(2) Drupal/Wordpress/Joomla + shared hosting
(3) PHP/RoR/Node.js

=> The higher up the list, the more you depend on a particular company's solution. The lower down, the closer you get to a hacker's solution.

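< How to add a periodic task to an application >
A while ago I deployed a Riemann setup that needed a periodic task. Without much thought, I wrote a python script and deployed it as a cronjob. Looking back, that does not seem like a good approach.

Riemann is written in Clojure, so I could have handled the periodic task by spawning a thread inside Riemann. Instead, I took something the application could do by itself, a tightly related service that has to be deployed together with it, and split it out into an OS cronjob. That made development slightly easier, because I am more familiar with cronjob, but it seems to make deployment more of a hassle.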

So the two different solutions mainly come down to two issues:
(1) Development complexity vs. Operational complexity
(2) Law of the instrument --- If all you have is a hammer, everything looks like a nail.
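
For comparison, the in-application approach to a periodic task can be sketched as follows. This is a Python sketch of the idea only, not Riemann or Clojure code, and `start_periodic` is a hypothetical helper name.

```python
import threading


def start_periodic(task, interval_s):
    """Run task() every interval_s seconds on a daemon thread.

    Returns a threading.Event; set it to stop the loop.
    """
    stop = threading.Event()

    def loop():
        # wait() returns False on timeout (time to run the task)
        # and True once stop has been set (time to exit).
        while not stop.wait(interval_s):
            task()

    threading.Thread(target=loop, daemon=True).start()
    return stop


if __name__ == "__main__":
    import time

    stop = start_periodic(lambda: print("tick"), 1.0)
    time.sleep(3.5)
    stop.set()
```

The task now starts and stops with the application, so there is no separate crontab entry to deploy. The cost is that the scheduling logic becomes part of the application code.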

Friday, January 11, 2019

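Lessons from Erlang --- part 4

Limits of the fully connected network

The basic Erlang cluster arrangement is a fully connected network: every node in the cluster is connected to every other node. As a result, cluster size is usually limited to around 50 nodes. Beyond that, the number of messages needed just to keep nodes aware of one another grows too large and starts to interfere with the messages that do real work.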

full-mesh network

The basic arrangement in a distributed Erlang cluster is a full-mesh network; every node connected to every other. As a result, cluster sizes are typically limited to somewhere around 50: in this area (depending on hardware, network, user code, and so on) the number of messages being sent through the cluster just to keep it functioning starts to overwhelm a node’s ability to do real work. In other words, heartbeats start contesting with RPCs for VM time and bandwidth, which results in a flaky cluster.
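
A quick calculation shows why a full mesh gets expensive. With n nodes there are n(n-1)/2 connections, and each node holds n-1 of them. The Python sketch below only counts connections; the function names are hypothetical, and it does not model how much heartbeat traffic each connection carries.

```python
def full_mesh_links(nodes):
    """Total number of connections when every node connects to every other."""
    return nodes * (nodes - 1) // 2


def links_per_node(nodes):
    """Number of connections each single node has to maintain."""
    return nodes - 1


if __name__ == "__main__":
    for n in (10, 50, 100):
        print(n, full_mesh_links(n), links_per_node(n))
```

Going from 10 to 50 nodes multiplies the node count by 5 but takes the number of connections from 45 to 1225, so the traffic needed just to keep the cluster alive grows much faster than the cluster itself.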

Problems caused by the fallacies of distributed computing

1. The Network is Reliable
Once you take an unreliable network into account, you can see that setting up linking or monitoring across nodes can have serious side effects.
Linking and monitoring across nodes can be dangerous. In the case of a network failure, all remote links and monitors are triggered at once. This might then generate thousands and thousands of signals and messages to various processes, which puts a heavy and unexpected load on the system.

2. There is no Latency
Remember to set timeouts.
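As a reminder of what that looks like in practice, here is a minimal Python sketch of waiting for a reply with a timeout instead of blocking forever. The mailbox is modelled with a standard `queue.Queue`, and `wait_for_reply` is a hypothetical helper, not an Erlang or OTP function.

```python
import queue


def wait_for_reply(mailbox, timeout_s):
    """Wait up to timeout_s seconds for a reply instead of blocking forever."""
    try:
        return ("ok", mailbox.get(timeout=timeout_s))
    except queue.Empty:
        # No reply in time: report it and let the caller decide what to do.
        return ("error", "timeout")


if __name__ == "__main__":
    mailbox = queue.Queue()
    print(wait_for_reply(mailbox, 0.1))
    mailbox.put("pong")
    print(wait_for_reply(mailbox, 0.1))
```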

3. Bandwidth is infinite
Because bandwidth is not infinite, sending an overly large message across nodes can delay the heartbeats between them and make each node think the other is dead.
If, for some reason, you need to be sending large messages, be extremely careful. The way Erlang distribution and communication works over many nodes is especially sensitive to large messages. If two nodes are connected together, all their communications will tend to happen over a single TCP connection. Because we generally want to maintain message ordering between two processes (even across the network), messages will be sent sequentially over the connection. That means that if you have one very large message, you might be blocking the channel for all the other messages.
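
The blocking effect can be shown with a small model. The Python sketch below assumes a single connection that sends messages strictly in order at a fixed bandwidth. `arrival_times` is a hypothetical helper, and the sizes and bandwidth in the example are made-up figures, not measurements.

```python
def arrival_times(message_sizes, bytes_per_second):
    """Arrival time of each message when all are sent in order over one connection."""
    clock = 0.0
    arrivals = []
    for size in message_sizes:
        # The next message cannot start until this one has been fully sent.
        clock += size / bytes_per_second
        arrivals.append(clock)
    return arrivals


if __name__ == "__main__":
    # Made-up figures: a heartbeat, a 200 MB message, another heartbeat,
    # all over a 50 MB/s link.
    sizes = [64, 200_000_000, 64]
    print(arrival_times(sizes, 50_000_000))
```

With these figures the second heartbeat cannot arrive until the 200 MB message has finished, roughly 4 seconds later. If the other node expects heartbeats more often than that, it may decide its peer is dead.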

Wednesday, January 9, 2019

Lessons from Erlang --- part 3

From Joe Armstrong's thesis:

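Non-functional properties
Error recovery and changing a system's code while it is running are two typical non-functional properties that many real systems need. Mainstream programming languages and systems give strong support for writing code with well-defined functional behavior, but poor support for the non-functional parts.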

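Application operating system
In a sense, the operating system provides "what the programming language designer forgot". In a language like Erlang, however, the operating system is hardly needed. The OS gives Erlang only device drivers. Erlang needs none of its other services, such as processes, message passing, process scheduling, and memory management.

The problem with using OS mechanisms to make up for a language's shortcomings is that the underlying OS mechanisms cannot easily be changed.

On the other hand, given lightweight processes and primitive mechanisms for error detection and handling, application writers can easily design and implement the application operating system they need, tailored to the characteristics of their specific problem. The OTP system (an application written in Erlang) is one example.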

Tuesday, January 1, 2019

Lessons from Erlang --- part 2

While studying Joe Armstrong's Erlang thesis, I found Programming Rules and Conventions in the final appendix. It is one of the few sets of rules and conventions that I found truly enlightening.

For example: Don't make assumptions about what the caller will do with the results of a function

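In the WrongSample example, the error string is printed directly to standard output. In the CorrectSample example, an error descriptor is returned to the application that uses the module. The application can choose whether or not to call the error_report function. The point is that only the application has the authority to decide how errors are handled.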
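
The two samples themselves are not reproduced here, so the following is a Python sketch of the same contrast rather than the original Erlang code. `parse_port_wrong` and `parse_port_correct` are hypothetical stand-ins for WrongSample and CorrectSample, and this `error_report` returns a message instead of printing it, which is a simplification.

```python
def parse_port_wrong(text):
    """WrongSample style: the function decides for the caller and prints the error."""
    try:
        return int(text)
    except ValueError:
        print("error: port must be a number")
        return None


def parse_port_correct(text):
    """CorrectSample style: return a result or an error descriptor."""
    try:
        return ("ok", int(text))
    except ValueError:
        return ("error", ("not_a_number", text))


def error_report(descriptor):
    """Optional helper: turn an error descriptor into a readable message."""
    reason, value = descriptor
    return f"{reason}: {value!r}"


if __name__ == "__main__":
    status, result = parse_port_correct("http")
    if status == "error":
        # The application decides: print, log, retry, or ignore.
        print(error_report(result))
```

With the second style, the module never writes to standard output on its own. The application receives the descriptor and chooses whether to call `error_report` at all.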