程式人蔘: 2019

Monday, November 11, 2019

A Datomic caveat: do not expose entity ids

I happened to come across a question on this topic on Stack Overflow.

When designing a system API, it is very important that the input never contains raw entity ids, only external ids. The reasons are:

You don't have an entity id until you transact the entity, so you can't create an entity and all its indexes together. You must first transact the entities, see what their entity ids were, then transact the composite index.
It's no longer easy to do your own renumbering. There are datomic techniques which involve taking datoms from one database and transacting them to another, or recreating a db from its datoms or transaction log. The reason these techniques work fairly easily and robustly is because all pointers (refs) are known and can be renumbered. If you store an entity id in a value, you now need more complicated schema-specific logic to re-transact the datoms correctly.
Cognitect has said not to rely on stable entity ids because they haven't ruled out the possibility of renumbering them in the future. Right now, however, no renumbering occurs, but I would rather not tempt fate.
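
The advice above can be pictured with a toy sketch, written in Python rather than against Datomic itself. The names here (UserStore, create_user, get_user, renumber) are hypothetical and exist only for illustration. The point is that callers only ever hold an external id, so the store stays free to renumber its internal entity ids. In Datomic, that role is usually played by an attribute marked :db.unique/identity.

```python
import itertools
import uuid


class UserStore:
    """Toy store: internal entity ids stay private, callers see external ids."""

    def __init__(self):
        self._next_eid = itertools.count(1000)  # stand-in for db-assigned ids
        self._by_external = {}                  # external id -> internal entity id
        self._entities = {}                     # internal entity id -> attributes

    def create_user(self, email):
        eid = next(self._next_eid)
        external_id = str(uuid.uuid4())
        self._entities[eid] = {"user/id": external_id, "user/email": email}
        self._by_external[external_id] = eid
        return external_id  # the only id that ever leaves the store

    def get_user(self, external_id):
        eid = self._by_external[external_id]
        return dict(self._entities[eid])

    def renumber(self, offset):
        """Simulate the database renumbering its internal entity ids."""
        self._entities = {eid + offset: attrs
                          for eid, attrs in self._entities.items()}
        self._by_external = {ext: eid + offset
                             for ext, eid in self._by_external.items()}


store = UserStore()
ext = store.create_user("ecoboy@qwerty.com")
before = store.get_user(ext)
store.renumber(500)          # internal ids change...
after = store.get_user(ext)  # ...but the external id still resolves
```

Because the API never handed out an internal id, nothing outside the store breaks when renumbering happens.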

Thursday, October 24, 2019

nREPL debug

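I have recently been looking into debugging through nREPL, and a few points stand out:

1. nREPL lets you connect to a running process. For security reasons, nREPL usually listens only on 127.0.0.1.

2. To debug from vim on your local machine, you need ssh local port forwarding.

3. A very common situation is that the deployment machine cannot be reached directly over ssh, because a Kerberos machine sits in between.

In that case, there are still two options:

(a) What I personally do is install vim and vim-fireplace directly on the deployment machine.

https://github.com/humorless/dotfiles/issues/8

(b) Use DrawBridge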

Tuesday, October 8, 2019

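Company of One book review --- "Skills and Viability Testing"

Book: Company of One
Author: Paul Jarvis

"Growth is a leading cause of failure for many startups, and even for many top companies."

"Enough is the opposite of growth."

"The measure of success should not be rising quarterly profits, ever-growing customer acquisition, or a successful exit strategy. Instead, we can focus on an 'exist strategy' --- built on persistence, profitability, and serving customers as well as possible."

My favorite passage, excerpted below:
"When most successful business people give keynote talks and share how wise they were to choose a more passion-filled working life, I notice that they leave out two key factors. First, before making the leap they were already highly skilled at what they did, and those skills were in great demand. The second missing factor is that, before climbing to the biggest stage, they were able to take a small step first, testing the waters for the big leap to come (making sure there was enough demand for what they offered)."

This classic passage covers the key elements of starting a business: skills and viability testing.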

Monday, September 30, 2019

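Inspiration from the TREVOR JIM blog

I skimmed nearly ten posts on the TREVOR JIM blog and found them quite thought-provoking.

For example:
1. Remote work is a moon shot. (Everyone should simply work remotely; it is a much better bet than self-driving cars.)
2. We should stop using C/C++, because languages of this kind do great harm to security. We should use memory-safe languages instead.
3. Parsing is the weakest link in software security. Parsing security is even more important than buffer security!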

Tuesday, September 24, 2019

Clojure made simple / What is good software?

Clojure made simple is a talk Rich Hickey gave to explain Clojure's value proposition. After watching it closely, I find the analysis in the first 10 minutes very insightful. It lays out Rich Hickey's view of what good software is.

Early on, Rich Hickey points out that programming is an economic activity; in other words, you have customers or a boss. So what do customers and bosses want? Two things:
(1) Something good
(2) Soon

So what kind of software counts as something good? How should it be defined? Rich Hickey gives three conditions. (He specifically stresses that types and tests are only means to the goal and cannot be the definition of the goal.)
(a) Does what it is supposed to do
(b) Meets operational requirements
(c) Is flexible enough to accommodate change

These three conditions of good software can be broken down further.
First, for (a):
It is very difficult to determine if large or elaborate or stateful programs do what they are supposed to do.
One of Clojure's design goals is making it easier to understand whether or not your program is going to do what it is supposed to do by making it substantially smaller and also by making it more functional.

For (b), there are three goals:
=> 1. Deployment/environment
=> 2. security
=> 3. performance
Clojure meets these by being hosted on Java/JavaScript.

For (c), the key to reaching the goal is loose coupling.

Monday, September 2, 2019

Million Dollar Consulting book review

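If I had to summarize the whole book in one sentence, I would say it is "Peter Drucker x solo consulting practice." Many of Peter Drucker's abstract management concepts find their best practices here, under the premise of a one-person consulting business. For example, Peter Drucker holds that the two most important things for a company are innovation and marketing. The author, Alan Weiss, likewise stresses continually developing unique IP (intellectual property), which corresponds to innovation. As for marketing, many chapters of the book explain again and again how to "create need in the buyer."

(1) A personal website: keep it simple; a basic introduction is enough.

Why? For consulting, the more effective way to develop business is through person-to-person contact, which is what builds enough trust for a client to consider you. A website only serves to give clients more confidence when they look you up. Building a site that persuades clients to buy directly, with a lot of SEO work on top, costs too much.

The basic introduction includes:
1. Your name or company name and photo
2. A brief list of typical client results
3. Testimonials, preferably video and brief, from buyers
4. Your value proposition

(2) The book covers many ways to strengthen marketing gravity, but Alan Weiss states without hesitation that only three of them work in the early days: speaking, networking, and referrals. All three involve direct, face-to-face contact.

(3) One chapter teaches how to write a proposal. Alan Weiss argues that if your consulting creates 10 dollars of value for the client, you should state that in the proposal, and also state in the proposal that you will charge 1 dollar. The author also wrote a book called Million Dollar Proposal.

(4) The book argues that in a consulting business, labor should decrease over time while fees increase. Business acquisition is itself a huge form of labor, and referrals effectively reduce the labor of acquisition. Referrals are therefore an extremely important lever for business development, and the book has a dedicated chapter on referral techniques. The author also wrote a book called Million Dollar Referrals.

(5) Revenue growth model

The author believes you should aim for 80% repeat business and 20% new business each year. At the same time, every 18 months, drop the lowest-quality 15% of your revenue.

(6) Profit sharing

If a consulting business is developed together with partners, the author believes the credit (that is, the profit) should be split as follows: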

Acquisition = 50%, Methodology = 30%, Delivery = 20%

Sunday, June 16, 2019

Architecture: Cut horizontally v.s. Cut vertically

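On software architecture, I recently noticed an area I had never thought through clearly: the comparison between cutting horizontally and cutting vertically. On reflection, this is not a particularly new idea. The division of labor inside a company, for instance, follows two similar concepts.

Cutting vertically resembles splitting a company into business units: department A is responsible for product a, so department A owns its own profit and loss. In software, this means carving out components or bounded contexts. It is also called package by component or package by feature.

Cutting horizontally resembles centralized procurement and contracting inside a company: the company sometimes completes work together with third-party vendors, and the relationship with those third parties goes through dedicated procurement and contracting staff. In software, this means separating out layers such as Application and Adapter. It can be called package by layer. A neat analogy for cutting horizontally is harder to find, but it should be easier to picture once contrasted with cutting vertically.

Whether you cut vertically or horizontally, you need to keep the components or modules decoupled.

When cutting horizontally, the main technique is Dependency Injection.
When cutting vertically, the techniques are events, a shared kernel, and the like.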
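
The two decoupling techniques can be contrasted in a small Python sketch. Everything in it (SignupService, InMemoryMailer, EventBus, the "order/placed" topic) is a hypothetical name chosen for illustration, not taken from any framework. The first half shows a layer receiving its adapter through Dependency Injection; the second half shows two components that communicate only through events.

```python
# Cut horizontally: the application layer receives its adapter from outside.
class InMemoryMailer:
    """Adapter-layer stand-in; a real one would talk to an SMTP server."""

    def __init__(self):
        self.sent = []

    def send(self, to, body):
        self.sent.append((to, body))


class SignupService:
    """Application layer: knows only that the mailer has a send() method."""

    def __init__(self, mailer):
        self._mailer = mailer  # injected dependency

    def signup(self, email):
        self._mailer.send(email, "welcome")


# Cut vertically: components never call each other, they share an event bus.
class EventBus:
    def __init__(self):
        self._handlers = {}

    def subscribe(self, topic, handler):
        self._handlers.setdefault(topic, []).append(handler)

    def publish(self, topic, payload):
        for handler in self._handlers.get(topic, []):
            handler(payload)


mailer = InMemoryMailer()
SignupService(mailer).signup("ecoboy@qwerty.com")

bus = EventBus()
shipped = []                                  # state owned by a "shipping" component
bus.subscribe("order/placed", shipped.append)
bus.publish("order/placed", {"order": 1})     # emitted by an "ordering" component
```

In both halves the caller depends on a small contract (a send method, a topic name) instead of on a concrete module, which is what keeps the pieces replaceable.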

Monday, April 15, 2019

tmux session manager --- tmuxp

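Without quite noticing, I reached the point where developing a web application means opening several background windows in tmux: for example, one running npm run start for the frontend and another running lein repl for Clojure. Typing the same fixed commands every time gets tedious. Fortunately, a tool already exists for exactly this problem --- a tmux session manager.

I installed the Python one. Installing ruby/gem on Ubuntu was not as easy as I expected, so I immediately looked for a Python or Node.js alternative. In my experiments, the Python-based tmuxp was the easiest to install.

After installing tmuxp, I have only ever used two tmuxp commands:
1. First, build the tmux session by hand the traditional way, splitting windows and running the different programs.
Then run one command to write the current tmux session to a file. Remember to save the file under the ~/.tmuxp/ folder, which makes things simpler later.
tmuxp freeze session-name

2. Next, make suitable edits to the frozen yaml file. To start the session again, run
tmuxp load session-name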

Saturday, March 30, 2019

vim skill & use cases

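Here are a few vim techniques I only recently figured out. Paired with use cases, they are no longer so hard to understand.

< Multiple clipboards >
Registers are essentially multiple clipboards. In practice I rarely want to use several clipboards at once; where they really help is when a yank and a delete conflict.

A common situation goes like this:

Yank line 15, intending to paste it at line 28. => yy
Before pasting, delete line 27. => dd
Press p at line 28 and it fails! What gets pasted is not the original line 15.

The fix is to change the last operation to => "0p
The "0 register always holds the most recently yanked text. If no register is specified, the default (unnamed) register is used, and in the situation above the default register has been overwritten by the deleted text.

Type :reg to see the contents of every register.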
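
The yank/delete conflict can be replayed with a deliberately simplified model of the registers, sketched here in Python. The Registers class is hypothetical and covers only the behavior described above: a yank fills both the unnamed register and "0, while a line delete fills the unnamed register and shifts into "1 through "9. Small deletes, named registers, and everything else vim does are left out.

```python
class Registers:
    """Simplified model of vim's unnamed, "0 and "1-"9 registers."""

    def __init__(self):
        self.unnamed = None  # what a plain `p` pastes
        self.yank0 = None    # register "0: most recent yank
        self.numbered = []   # registers "1.."9: recent line deletes

    def yank(self, text):
        """Model `yy`: fills "0 and the unnamed register."""
        self.yank0 = text
        self.unnamed = text

    def delete_line(self, text):
        """Model `dd`: fills "1 (shifting older deletes) and the unnamed register."""
        self.numbered = [text] + self.numbered[:8]
        self.unnamed = text

    def paste(self, register=None):
        """Model `p` (register=None) or `"0p` (register="0")."""
        return self.yank0 if register == "0" else self.unnamed


regs = Registers()
regs.yank("line 15")         # yy on line 15
regs.delete_line("line 27")  # dd on line 27
plain_paste = regs.paste()       # p    -> "line 27", the deleted text
yank_paste = regs.paste("0")     # "0p  -> "line 15", the yanked text
```

The delete overwrites only the unnamed register, so the yanked line is still available in "0.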

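< Refactoring code and renaming variables >
For this, vimgrep seems to work quite well.

:vimgrep /PATTERN/ ** => find every PATTERN in the current directory and its subdirectories

Companion commands:
:cnext, :cn => jump to the next match of PATTERN
:cprevious, :cp => jump to the previous match of PATTERN
:copen => open the Quickfix window listing all results
:cclose, :ccl => close the Quickfix window

< Fuzzy search without Ctrl - P >

:e **/*partial-filename
:vsp **/*partial-filename

For how this works, see :help starstar-wildcard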

Tuesday, February 19, 2019

Building a Clojure native image with Graalvm

An experiment on Ubuntu 18:

1. Download and install Graalvm

2. Add the following to ~/.lein/profiles.clj

{:user {:plugins [[io.taylorwood/lein-native-image "0.3.0"]]
        :native-image {:graal-bin "/path/to/graalvm-1.0.0-rc1"}}}

This makes the lein native-image command available.

3. Graalvm needs the following packages:

sudo apt-get install gcc

sudo apt-get install zlib1g-dev

4. Run lein native-image, and you get the native image.

Friday, February 8, 2019

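From SQL to Datomic query (datalog)

To understand a new concept, it is generally easier to connect it to old, familiar ones. Likewise, to get people to accept a new solution, first showing that the old problems can all be solved by it is an effective way to build confidence.

When I develop web applications with SQL, which SQL queries do I use most? Not necessarily complex join operations. The most common are the plain queries that look at one or more rows:
(1) SELECT * FROM A
(2) SELECT * FROM A WHERE A.col = b

So how do such simple SQL queries map onto the Datomic world? I found a piece of query code in the official Datomic mbrainz-sample, and after using it I found that it maps very easily onto the two cases above. The sample code is below.

An example:
Suppose we have the concept of a "user" to model in a database. A user has three attributes: email, password, and name. We also need a database query that looks up the user for a given email.

1. Modeled in an RDB, the SQL query looks like this:
SELECT * FROM user WHERE user.email = 'ecoboy@qwerty.com';

2. Modeled in Datomic, once the utility functions above are available, the query can be produced by the following function call.
(find-by db :user/email "ecoboy@qwerty.com")

Notes:
(a) In SQL best practices, for efficiency reasons, putting a select * query directly into production code is usually discouraged.
(b) In Datomic best practices, because Datomic queries do not need a client-server round trip and an Entity is lazily evaluated, the generally recommended approach is to fetch the Entity directly, which corresponds to a row in SQL.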
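
To make the mapping between the two query styles concrete, here is a toy model in Python. It is not the mbrainz-sample code and not the Datomic API: the database is just a list of (entity, attribute, value) tuples, and find_by, find_all_by, and entity are hypothetical helpers written for this sketch. The sample data is invented as well.

```python
# Toy "database": a list of (entity id, attribute, value) datoms.
DATOMS = [
    (1, "user/email", "ecoboy@qwerty.com"),
    (1, "user/name", "Eco Boy"),
    (1, "user/password", "hashed-secret"),
    (2, "user/email", "other@example.com"),
    (2, "user/name", "Other"),
]


def entity(datoms, eid):
    """Collect every attribute of one entity, like fetching a whole row."""
    return {attr: value for e, attr, value in datoms if e == eid}


def find_all_by(datoms, attr, value):
    """SELECT * FROM A WHERE A.col = b: all entities with attr == value."""
    eids = sorted({e for e, a, v in datoms if a == attr and v == value})
    return [entity(datoms, eid) for eid in eids]


def find_by(datoms, attr, value):
    """Same lookup, but return only the first match (or None)."""
    matches = find_all_by(datoms, attr, value)
    return matches[0] if matches else None


user = find_by(DATOMS, "user/email", "ecoboy@qwerty.com")
missing = find_by(DATOMS, "user/email", "nobody@example.com")
```

The lookup finds entity ids through a single attribute and then returns the whole entity, which is the same shape as selecting a row by one column.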

Monday, February 4, 2019

From REST to CQRS with Clojure, Kafka, and Datomic

From REST to CQRS with Clojure, Kafka, and Datomic, a conference talk from Clojure/conj 2015, is the first talk I have seen in 2019 that deeply inspired me. A video is available on YouTube.

The speaker first examines some problems with RESTful APIs:
1. modeling - jamming every operation into CRUD on some resources is often unnatural, creating impedance mismatch.
A RESTful API can be seen as imitating a database's CRUD interface (post, get, patch, put), but this model is not inherently suited to every kind of problem.
2. mutability semantics
A RESTful API implies mutable semantics.
3. the kingdom of nouns for distributed system
You end up with too many APIs, and then need an API documentation system such as swagger ...
4. integration
Integration by API is somewhat better than integration by ETL, but it still has many drawbacks.

The speaker (Bobby Calderwood) then proposes an architecture that applies CQRS + Event sourcing to system design (the Commander Pattern), illustrated by a diagram in the original post.

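The new architecture addresses many of the old problems:
1. modeling
The web services layer models only the communication semantics, so there are just three kinds of endpoint rather than resources * CRUD. The communication semantics are /command, /update, and /query.
2. immutable semantics
The /command API writes to a command log. The command log is a complete record of the users' intentions; compared with the data stored in the database, it tells a fuller command story.
3. too many APIs
The new architecture has only three kinds of HTTP endpoint: /command, /update, and /query.
4. integration
The new architecture allows integration by events, because the command log is stored. To integrate, other systems subscribe to the command log and can obtain the system's complete state.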
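
The idea of an append-only command log that other systems subscribe to can be sketched in a few lines of Python. This is a toy in-memory stand-in, not the Kafka-based design from the talk; CommandLog, apply_to_balances, and the deposit/withdraw commands are hypothetical names invented for the example.

```python
class CommandLog:
    """Append-only log: entries are recorded once and never modified."""

    def __init__(self):
        self._entries = []
        self._subscribers = []

    def append(self, command):
        """Record a command and notify every subscriber; return its offset."""
        entry = dict(command, offset=len(self._entries))
        self._entries.append(entry)
        for notify in self._subscribers:
            notify(entry)
        return entry["offset"]

    def subscribe(self, handler, from_offset=0):
        """Replay the existing history to the handler, then keep it updated."""
        for entry in self._entries[from_offset:]:
            handler(entry)
        self._subscribers.append(handler)


def apply_to_balances(balances, entry):
    """A downstream system's own view, derived purely from the log."""
    if entry["type"] == "deposit":
        delta = entry["amount"]
    elif entry["type"] == "withdraw":
        delta = -entry["amount"]
    else:
        return  # ignore commands this view does not care about
    balances[entry["account"]] = balances.get(entry["account"], 0) + delta


log = CommandLog()
log.append({"type": "deposit", "account": "a", "amount": 10})

balances = {}  # state owned by a second system that integrates later
log.subscribe(lambda entry: apply_to_balances(balances, entry))
log.append({"type": "withdraw", "account": "a", "amount": 3})
```

The subscriber joined after the first command was written, yet it ends up with the full state because the log replays history before delivering new entries.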

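Reflections:
(*) end-to-end principle
In this architecture, the web services layer handles only communication. The resource names of a RESTful API are really a kind of domain semantics and belong in the business logic layer. This closely resembles the end-to-end principle of networking: the client and the business logic are like the two endpoints of a conversation, and the features designed for the application should be implemented at those two ends rather than in the web services layer in between.

(*) We shape our tools and thereafter our tools shape us.
RESTful API design feels like what you come up with when your mental model has been shaped by SQL/relational databases, whereas the Commander Pattern is what you come up with when your mental model has been shaped by Datomic.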

Thursday, January 31, 2019

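Returning to Datomic

I recently took on a project and plan to build the web application with Luminus. After weighing several frontend options, I still think ClojureScript + re-frame is the most advanced choice. For the database, Datomic is naturally the best fit.

Datomic references
1. Learn Datalog Today --- an interactive tutorial site for practicing Datalog, the DSL used with Datomic
2. Missing Link Datomic Tutorial --- easier to follow than the official Datomic tutorial because it leaves out the finer details
3. How to setup Datomic Free with Clojure ---- after reading a tutorial you need to build something yourself, and following this is the fastest way
4. Using Datomic in your app: a practical guide --- practical experience with Datomic
5. The ten rules of schema growth --- how to handle schema migration, a common problem with OLTP databases

When is it a good fit?
When is Datomic suitable? Datomic suits OLTP applications and iterative development. In other words, if you start out not understanding many details of the problem and expect many hard-to-predict schema changes, Datomic is a very good option.

Two particular observations about Datomic schema:
(1) A Datomic schema does not need a separate, parallel set of version-controlled migration files.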
Datomic is uniquely suited for iterative development. Change is easy, due to the granular data model and small but powerful schema. And change is always tracked within the database itself, so you do not need a parallel infrastructure of version-controlled migration files as your application evolves.

(2) Schema installation is idempotent, so it can go into the server startup code.
In Datomic, installing your schema consists of submitting a regular transaction. Attribute installation transactions are idempotent, so you can just write your schema installation transaction in your application code and transact in your server startup code.
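
What idempotent installation buys you can be shown with a toy model in Python. This is only an illustration of the property, not how Datomic implements it: the "database" is a plain dict, and install_schema and SCHEMA are hypothetical names. Running the installer a second time changes nothing, which is why it is safe to call on every server start.

```python
SCHEMA = [
    {"db/ident": "user/email", "db/valueType": "string", "db/cardinality": "one"},
    {"db/ident": "user/name", "db/valueType": "string", "db/cardinality": "one"},
]


def install_schema(db, schema):
    """Install attribute definitions; return the idents that actually changed."""
    changed = []
    for attr in schema:
        ident = attr["db/ident"]
        if db.get(ident) != attr:  # already installed and identical: no-op
            db[ident] = dict(attr)
            changed.append(ident)
    return changed


db = {}
first_run = install_schema(db, SCHEMA)   # installs both attributes
second_run = install_schema(db, SCHEMA)  # nothing left to do
```

A startup routine can therefore call the installer unconditionally instead of tracking which migrations have already run.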

Tuesday, January 15, 2019

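Dealing with confusing dependencies in Clojure

This is the second time I have developed a web application with Clojure's Luminus. Both times, right at the start, I ran into a similar problem: confusing dependencies.

Here is how the problem arises:
When developing software, you reference quite a few libraries in :dependencies in project.clj. Those libraries usually have dependencies of their own. When the dependencies from the second level down (implicit dependencies) overlap, and the overlapping ones are at different versions, it can happen that no single version that everyone can share is selected automatically, and the program fails to start.

For example:
project.clj -> A -> C [version 1.0]
project.clj -> B -> C [version 1.5]

In this example, C is a confusing dependency.

Fortunately, there is a handy command that helps sort this out quickly:

lein deps :tree

This command prints every dependency in the project, both explicit and implicit. It also makes suggestions, recommending exclusions to pin or select dependencies by hand.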
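
The A/B/C example above boils down to a simple check: does any transitive dependency appear at more than one version? The Python sketch below performs that check on a hand-written tree. It is only a model of the idea; find_conflicts and the tree layout are hypothetical, and lein deps :tree remains the tool that inspects a real project.

```python
def find_conflicts(tree):
    """Return transitive dependencies requested at more than one version.

    tree maps each top-level dependency to {transitive dependency: version}.
    The result maps each conflicting name to {version: [who requested it]}.
    """
    requested = {}
    for parent, children in tree.items():
        for name, version in children.items():
            requested.setdefault(name, {}).setdefault(version, []).append(parent)
    return {name: versions
            for name, versions in requested.items()
            if len(versions) > 1}


# project.clj -> A -> C [version 1.0]
# project.clj -> B -> C [version 1.5]
tree = {
    "A": {"C": "1.0"},
    "B": {"C": "1.5", "D": "2.0"},
}
conflicts = find_conflicts(tree)
```

Here C is reported because A and B ask for different versions, while D is not, since only one version of it is requested.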

Saturday, January 12, 2019

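Inspiration from Erlang --- part 5

If you want to build a distributed system, what are the options?
(1) FaaS
(2) Microservices + PaaS
(3) Elixir umbrella apps

Thinking about it, this seems much like asking which approach to use for building a website, web page, or web application:
(1) A site builder such as wix/weebly --- fully graphical operation
(2) Drupal/Wordpress/Joomla + shared hosting
(3) PHP/RoR/Node.js

=> The higher on the list, the more you depend on a specific company's solutions; the lower, the closer you are to hacker solutions.

< How to add a periodic task to an application >
I once deployed a Riemann program that needed a periodic task. Without much thought, I wrote a Python script and deployed it as a cronjob. Looking back, that does not seem like a good approach.

Since Riemann is written in Clojure, I could have handled the periodic task by spawning a thread inside Riemann. Taking something the application could do itself, a tightly related service that has to be deployed alongside it, and splitting it out into an OS cronjob was slightly easier to develop, because I know cronjobs better, but it seems to make deployment more troublesome.

So the two different solutions mainly come down to two issues: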
(1) Development complexity v.s. Operational complexity
(2) Law of the instrument --- If all you have is a hammer, everything looks like a nail.
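
The in-process alternative to a cronjob, a periodic task running on a thread owned by the application, can be sketched as follows. The sketch is in Python rather than in Clojure inside Riemann, and PeriodicTask is a hypothetical helper built only on the standard threading module.

```python
import threading


class PeriodicTask:
    """Run a callable every interval_seconds on a background thread."""

    def __init__(self, interval_seconds, task):
        self._interval = interval_seconds
        self._task = task
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait returns False when the interval elapses (run the task)
        # and True as soon as stop() has been called (leave the loop).
        while not self._stop.wait(self._interval):
            self._task()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()


fired = threading.Event()
task = PeriodicTask(0.01, fired.set)  # a real task would do actual work
task.start()
fired.wait(timeout=2.0)               # block until the task has run once
task.stop()
```

The task starts and stops together with the application, so there is no separate crontab entry to deploy or keep in sync.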

Friday, January 11, 2019

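Inspiration from Erlang --- part 4

Limits of a full-mesh network

The basic Erlang cluster arrangement is a full-mesh network: every node in the cluster is connected to every other node. As a result, cluster size is usually limited to around 50 nodes. Beyond that, the messages needed just to keep nodes aware of one another become too numerous and start to interfere with the messages that do real work.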

full-mesh network

The basic arrangement in a distributed Erlang cluster is a full-mesh network; every node connected to every other. As a result, cluster sizes are typically limited to somewhere around 50: in this area (depending on hardware, network, user code, and so on) the number of messages being sent through the cluster just to keep it functioning starts to overwhelm a node’s ability to do real work. In other words, heartbeats start contesting with RPCs for VM time and bandwidth, which results in a flaky cluster.
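
A quick calculation shows why a full mesh gets expensive: with n nodes there are n(n-1)/2 node-to-node links to keep alive. The Python helper below (full_mesh_links is a hypothetical name) just evaluates that formula. It counts links only; the actual message load also depends on hardware, network, and user code, as the quote says.

```python
def full_mesh_links(nodes):
    """Number of node-to-node connections in a full-mesh cluster."""
    return nodes * (nodes - 1) // 2


for n in (10, 50, 100):
    print(n, full_mesh_links(n))
# 10 45
# 50 1225
# 100 4950
```

Doubling the cluster from 50 to 100 nodes roughly quadruples the number of links, which is the quadratic growth behind the practical limit.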

Problems caused by the fallacies of distributed computing

1. The Network is Reliable
Once you account for an unreliable network, you can see that setting up linking or monitoring across nodes may cause serious side effects.
Linking and monitoring across nodes can be dangerous. In the case of a network failure, all remote links and monitors are triggered at once. This might then generate thousands and thousands of signals and messages to various processes, which puts a heavy and unexpected load on the system.

2. There is no Latency
Remember to set timeouts.

3. Bandwidth is infinite
Because bandwidth is not infinite, sending very large messages across nodes can interfere with the heartbeats between nodes, leading nodes to believe each other dead.
If, for some reason, you need to be sending large messages, be extremely careful. The way Erlang distribution and communication works over many nodes is especially sensitive to large messages. If two nodes are connected together, all their communications will tend to happen over a single TCP connection. Because we generally want to maintain message ordering between two processes (even across the network), messages will be sent sequentially over the connection. That means that if you have one very large message, you might be blocking the channel for all the other messages.

Wednesday, January 9, 2019

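Inspiration from Erlang --- part 3

Taken from Joe Armstrong's thesis:

Non-functional properties
Fault recovery and changing a system's code while it runs are two typical non-functional properties that many real systems need. Conventional programming languages and systems provide strong support for writing code with well-defined functional behavior, but poor support for the non-functional parts.

Application operating system
In a sense, the operating system provides "what the programming language designer forgot." In a language like Erlang, however, the operating system is hardly needed. All the OS provides to Erlang is device drivers; Erlang needs none of the other OS services such as processes, message passing, process scheduling, and memory management.

The problem with using OS mechanisms to make up for deficiencies in a programming language is that the underlying OS mechanisms cannot easily be changed.

On the other hand, by providing lightweight processes and primitive mechanisms for error detection and handling, Erlang lets application writers easily design and implement the application operating system they need, one tailored to the characteristics of their specific problem. The OTP system (an application written in Erlang) is one example.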

Tuesday, January 1, 2019

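Inspiration from Erlang --- part 2

While studying Joe Armstrong's Erlang thesis, I found Programming Rules and Conventions in the final appendix. It is one of the few sets of rules and conventions that has really inspired me.

For example: Don't make assumptions about what the caller will do with the results of a function

In the WrongSample example, the error string is printed directly to standard output. In the CorrectSample example, an error descriptor is returned to the application that uses the module. The application can choose to call the error_report function or not. The point is that the application is the one with the authority to decide how errors are handled.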
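
The contrast can be illustrated with a short Python sketch. This is not the Erlang code from the thesis: parse_port_wrong, parse_port, and this version of error_report are hypothetical stand-ins written for the example. The first function decides for the caller by printing; the second returns a tagged result and leaves the decision to the caller.

```python
def parse_port_wrong(text):
    """Assumes the caller wants the error printed: it cannot opt out."""
    try:
        return int(text)
    except ValueError:
        print("error: not a port number")  # side effect chosen for the caller
        return None


def parse_port(text):
    """Returns ("ok", value) or ("error", descriptor); the caller decides."""
    try:
        return ("ok", int(text))
    except ValueError:
        return ("error", {"reason": "not_a_number", "input": text})


def error_report(descriptor):
    """Optional helper that turns an error descriptor into a message."""
    return f"{descriptor['reason']}: {descriptor['input']!r}"


good = parse_port("8080")
status, descriptor = parse_port("abc")
message = error_report(descriptor) if status == "error" else None
```

An application that logs to a file, shows a dialog, or silently retries can all reuse parse_port, whereas parse_port_wrong only suits a program that wants text on standard output.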