Friday, January 5, 2018

git log analysis --- finding the hot spots

https://codescene.io/ is written in Clojure.
It is an online data analysis tool. What it analyzes is the git commit log.

It can answer the following questions, which I think are meaningful for a technical manager:
1  How shall we prioritize improvements to our codebase?
2  How can we follow-up on the effects of the improvements we do? Did we really get an effect out of that design improvement?

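It analyzes the git log through hot spot analysis: it finds the files in the git repo that change most frequently and that also have a comparatively large number of lines. These files are the hot spots of the git repo.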
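
To make the idea concrete, here is a minimal sketch of a hot spot ranking. It is not CodeScene's actual algorithm, which this post does not describe. The function names `change_counts` and `rank_hotspots`, the sample file names, and the scoring rule (number of commits touching a file multiplied by its current line count) are all assumptions chosen for illustration.

```python
from collections import Counter


def change_counts(log_text):
    """Count how many commits touched each file.

    Expects text shaped like the output of
        git log --name-only --pretty=format:
    that is, one path per line with blank lines between commits.
    """
    counts = Counter()
    for line in log_text.splitlines():
        path = line.strip()
        if path:
            counts[path] += 1
    return counts


def rank_hotspots(counts, line_counts, top=10):
    """Rank files by change frequency times line count.

    The product is an assumed scoring rule, used here only for illustration.
    Returns tuples of (path, changes, lines, score), highest score first.
    """
    scored = [
        (path, n, line_counts.get(path, 0), n * line_counts.get(path, 0))
        for path, n in counts.items()
    ]
    scored.sort(key=lambda row: row[3], reverse=True)
    return scored[:top]


# Hypothetical sample data: three commits and made-up line counts.
sample_log = """\
billing.py
api.py

billing.py

billing.py
api.py
utils.py
"""
sample_lines = {"billing.py": 400, "api.py": 120, "utils.py": 50}

for row in rank_hotspots(change_counts(sample_log), sample_lines):
    print(row)
```

On a real repository, the log text could come from running `git log --name-only --pretty=format:` and the line counts from reading the files in the working tree. Files that were renamed or deleted would need extra handling that this sketch leaves out.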

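At the same time, hot spots are often the areas that are
(1) most likely to produce bugs,
(2) most likely to need refactoring, or to be worth refactoring,
(3) highest in business value,
(4) most likely to cause delivery delays, and
(5) most worth code reviewing.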

How can hot spots be used?
  • Developers use hotspots to identify maintenance problems. Complicated code that we have to work with often is no fun. The hotspots give you information on where those parts are. Use that information to prioritize re-designs.
  • Hotspots point to code review candidates. At Empear we’re big fans of code reviews. Code reviews are also an expensive and manual process so we want to make sure it’s time well invested. In this case, use the hotspots map to identify your code review candidates.
  • Hotspots are input to exploratory tests. A Hotspot Map is an excellent way for a skilled tester to identify parts of the codebase that seem unstable with lots of development activity. Use that information to select your starting points and focus areas for exploratory tests.

-------------
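After thinking this tool over several times, I feel that finding hot spots through the git log is in essence a bit like unit tests or a static type system: it is basically an auxiliary aid, one that can provide "hints" but cannot provide "insight".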

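To take a deliberately extreme example: suppose I write a web application that uses an ORM. The ORM code may not need frequent modification, and there is not much code to change by hand. So the ORM-related parts will not be flagged as a hot spot by git log analysis. Yet an ORM is by nature complex in the extreme, because it entangles objects with SQL. It is without doubt an extremely complicated area.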