得寸进尺

《Clean Code》代码整洁之道

2020-07-11 · 29 min read
读书 方法论 笔记

前七章内容阅读与纸质书,后面的内容源于下面的链接:

代码整洁之道

HOW DO YOU WRITE FUNCTIONS LIKE THIS? 如何写出这样的函数

Writing software is like any other kind of writing. When you write a paper or an article, you get your thoughts down first, then you massage it until it reads well. The first draft might be clumsy and disorganized, so you wordsmith it and restructure it and refine it until it reads the way you want it to read.

写代码和写别的东西很像。在写论文或文章时,你先想什么就写什么,然后再打磨它。初稿也许粗陋无序,你就斟酌推敲,直至达到你心目中的样子。

When I write functions, they come out long and complicated. They have lots of indenting and nested loops. They have long argument lists. The names are arbitrary, and there is duplicated code. But I also have a suite of unit tests that cover every one of those clumsy lines of code.

我写函数时,一开始都冗长而复杂。有太多缩进和嵌套循环。有过长的参数列表。名称是随意取的,也会有重复的代码。不过我会配上一套单元测试,覆盖每行丑陋的代码。

So then I massage and refine that code, splitting out functions, changing names, eliminating duplication. I shrink the methods and reorder them. Sometimes I break out whole classes, all the while keeping the tests passing.

然后我打磨这些代码,分解函数、修改名称、消除重复。我缩短和重新安置方法。有时我还拆散类。同时保持测试通过。最后,遵循本章列出的规则,我组装好这些函数。

In the end, I wind up with functions that follow the rules I’ve laid down in this chapter. I don’t write them that way to start. I don’t think anyone could.

我并不从一开始就按照规则写函数。我想没人做得到。

  • 编程艺术是且一直就是语言设计的艺术。
  • 注释的恰当用法是弥补我们在用代码表达意图时遭遇的失败。注意,我用了“失败”一词。我是说真的。注释总是一种失败。我们总无法找到不用注释就能表达自我的方法,所以总要有注释,这并不值得庆贺。
    • Ps. 所以还是得写,因为我现在还无法避免所有的“失败”
  • 我为什么要极力贬低注释?因为注释会撒谎。也不是说总是如此或有意如此,但出现得实在太频繁。注释存在的时间越久,就离其所描述的代码越远,越来越变得全然错误。原因很简单。程序员不能坚持维护注释。

Again, we see the complimentary nature of these two definitions; they are virtual opposites! This exposes the fundamental dichotomy between objects and data structures:

我们再次看到这两种定义的本质;它们是截然对立的。这说明了对象与数据结构之间的二分原理:

Procedural code (code using data structures) makes it easy to add new functions without changing the existing data structures. OO code, on the other hand, makes it easy to add new classes without changing existing functions.

过程式代码(使用数据结构的代码)便于在不改动既有数据结构的前提下添加新函数。面向对象代码便于在不改动既有函数的前提下添加新类。

The complement is also true:

反过来讲也说得通:

Procedural code makes it hard to add new data structures because all the functions must change. OO code makes it hard to add new functions because all the classes must change.

过程式代码难以添加新数据结构,因为必须修改所有函数。面向对象代码难以添加新函数,因为必须修改所有类。

So, the things that are hard for OO are easy for procedures, and the things that are hard for procedures are easy for OO!

所以,对于面向对象较难的事,对于过程式代码却较容易,反之亦然!

In any complex system there are going to be times when we want to add new data types rather than new functions. For these cases objects and OO are most appropriate. On the other hand, there will also be times when we’ll want to add new functions as opposed to data types. In that case procedural code and data structures will be more appropriate.

在任何一个复杂系统中,都会有需要添加新数据类型而不是新函数的时候。这时,对象和面向对象就比较适合。另一方面,也会有想要添加新函数而不是数据类型的时候。在这种情况下,过程式代码和数据结构更合适。

Mature programmers know that the idea that everything is an object is a myth. Sometimes you really do want simple data structures with procedures operating on them.

Wrappers like the one we defined for ACMEPort can be very useful. In fact, wrapping third-party APIs is a best practice. When you wrap a third-party API, you minimize your dependencies upon it: You can choose to move to a different library in the future without much penalty. Wrapping also makes it easier to mock out third-party calls when you are testing your own code.

类似我们为 ACMEPort 定义的这种打包类非常有用。实际上,将第三方 API 打包是个良好的实践手段。当你打包一个第三方 API,你就降低了对它的依赖:未来你可以不太痛苦地改用其他代码库。在你测试自己的代码时,打包也有助于模拟第三方调用。

One final advantage of wrapping is that you aren’t tied to a particular vendor’s API design choices. You can define an API that you feel comfortable with. In the preceding example, we defined a single exception type for port device failure and found that we could write much cleaner code.

打包的好处还在于你不必绑死在某个特定厂商的 API 设计上。你可以定义自己感觉舒服的 API。在上例中,我们为 port 设备错误定义了一个异常类型,然后发现这样能写出更整洁的代码。

Often a single exception class is fine for a particular area of code. The information sent with the exception can distinguish the errors. Use different classes only if there are times when you want to catch one exception and allow the other one to pass through.

对于代码的某个特定区域,单一异常类通常可行。伴随异常发送出来的信息能够区分不同错误。如果你想要捕获某个异常,并且放过其他异常,就使用不同的异常类。


在此之前的内容主要是通过纸质书阅读的,笔记难以整理(其实是懒)就零散放在上面了,接下来几章主要通过大佬的在线翻译(见顶部链接)阅读的,比较好进行记录。

第八章 边界

  • 我们没有测试第三方代码的职责,但为要使用的第三方代码编写测试,可能最符合我们的利益。

Learning the third-party code is hard. Integrating the third-party code is hard too. Doing both at the same time is doubly hard. What if we took a different approach? Instead of experimenting and trying out the new stuff in our production code, we could write some tests to explore our understanding of the third-party code. Jim Newkirk calls such tests learning tests.1

学习第三方代码很难。整合第三方代码也很难。同时做这两件事难上加难。如果我们采用不同的做法呢?不要在生产代码中试验新东西,而是编写测试来遍览和理解第三方代码。Jim Newkirk 把这叫做学习性测试(learning tests)。

In learning tests we call the third-party API, as we expect to use it in our application. We’re essentially doing controlled experiments that check our understanding of that API. The tests focus on what we want out of the API.

在学习性测试中,我们如在应用中那样调用第三方代码。我们基本上是在通过核对试验来检测自己对那个 API 的理解程度。测试聚焦于我们想从 API 得到的东西。

第九章 单元测试

Some of you reading this might sympathize with that decision. Perhaps, long in the past, you wrote tests of the kind that I wrote for that Timer class. It’s a huge step from writing that kind of throw-away test, to writing a suite of automated unit tests. So, like the team I was coaching, you might decide that having dirty tests is better than having no tests.

有些读者可能会同意这种做法。或许,在很久以前,你也用过我为那个 Timer 类写测试的方法。从编写那种用后即扔的测试到编写全套自动化单元测试是一大进步。所以,就像那个我指导过的团队一样,你或许也会认为脏测试好过没测试。

What this team did not realize was that having dirty tests is equivalent to, if not worse than, having no tests. The problem is that tests must change as the production code evolves. The dirtier the tests, the harder they are to change. The more tangled the test code, the more likely it is that you will spend more time cramming new tests into the suite than it takes to write the new production code. As you modify the production code, old tests start to fail, and the mess in the test code makes it hard to get those tests to pass again. So the tests become viewed as an ever-increasing liability.

这个团队没有意识到的是,脏测试等同于——如果不是坏于的话 ——没测试。问题在于,测试必须随生产代码的演进而修改。测试越脏,就越难修改。测试代码越缠结,你就越有可能花更多时间塞进新测试,而不是编写新生产代码。修改生产代码后,旧测试就会开始失败,而测试代码中乱七八糟的东西将阻碍代码再次通过。于是,测试变得就像是不断翻番的债务。

9.3-整洁的测试

The BUILD-OPERATE-CHECK2 pattern is made obvious by the structure of these tests. Each of the tests is clearly split into three parts. The first part builds up the test data, the second part operates on that test data, and the third part checks that the operation yielded the expected results.

这些测试显然呈现了构造-操作-检验(BUILD-OPERATE-CHECK)模式。每个测试都清晰地拆分为三个环节。第一个环节构造测试数据,第二个环节操作测试数据,第三个部分检验操作是否得到期望的结果。

9.5-F.I.R.S.T

Timely The tests need to be written in a timely fashion. Unit tests should be written just before the production code that makes them pass. If you write tests after the production code, then you may find the production code to be hard to test. You may decide that some production code is too hard to test. You may not design the production code to be testable.

及时(Timely)测试应及时编写。单元测试应该恰好在使其通过的生产代码之前编写。如果在编写生产代码之后编写测试,你会发现生产代码难以测试。你可能会认为某些生产代码本身难以测试。你可能不会去设计可测试的代码。

第十章-类

在类这一章节之中,我试着将其中大部分的概念迁移到Go中的interface{}进行理解。

The name of a class should describe what responsibilities it fulfills. In fact, naming is probably the first way of helping determine class size. If we cannot derive a concise name for a class, then it’s likely too large. The more ambiguous the class name, the more likely it has too many responsibilities. For example, class names including weasel words like Processor or Manager or Super often hint at unfortunate aggregation of responsibilities.

类的名称应当描述其权责。实际上,命名正是帮助判断类的长度的第一个手段。如果无法为某个类命以精确的名称,这个类大概就太长了。类名越含混,该类越有可能拥有过多权责。例如,如果类名中包括含义模糊的词,如 Processor 或 Manager 或 Super,这种现象往往说明有不恰当的权责聚集情况存在。

内聚

  • 当类丧失了内聚性,就拆分它!

So breaking a large function into many smaller functions often gives us the opportunity to split several smaller classes out as well. This gives our program a much better organization and a more transparent structure.

所以,将大函数拆为许多小函数,往往也是将类拆分为多个小类的时机。程序会更加有组织,也会拥有更为透明的结构。

10.3-为了修改而组织

If a system is decoupled enough to be tested in this way, it will also be more flexible and promote more reuse. The lack of coupling means that the elements of our system are better isolated from each other and from change. This isolation makes it easier to understand each element of the system.

如果系统解耦到足以这样测试的程度,也就更加灵活,更加可复用。部件之间的解耦代表着系统中的元素互相隔离得很好。隔离也让对系统每个元素的理解变得更加容易。

By minimizing coupling in this way, our classes adhere to another class design principle known as the Dependency Inversion Principle (DIP).5 In essence, the DIP says that our classes should depend upon abstractions, not on concrete details.

通过降低连接度,我们的类就遵循了另一条类设计原则,依赖倒置原则(Dependency Inversion Principle,DIP)。本质而言,DIP 认为类应当依赖于抽象而不是依赖于具体细节。

第11章-系统

It is a myth that we can get systems “right the first time.” Instead, we should implement only today’s stories, then refactor and expand the system to implement new stories tomorrow. This is the essence of iterative and incremental agility. Test-driven development, refactoring, and the clean code they produce make this work at the code level.

“一开始就做对系统”纯属神话。反之,我们应该只去实现今天的用户故事,然后重构,明天再扩展系统、实现新的用户故事。这就是迭代和增量敏捷的精髓所在。测试驱动开发、重构以及它们打造出的整洁代码,在代码层面保证了这个过程的实现。

不要过早动手

We all know it is best to give responsibilities to the most qualified persons. We often forget that it is also best to postpone decisions until the last possible moment. This isn’t lazy or irresponsible; it lets us make informed choices with the best possible information. A premature decision is a decision made with suboptimal knowledge. We will have that much less customer feedback, mental reflection on the project, and experience with our implementation choices if we decide too soon.

众所周知,最好是授权给最有资格的人。但我们常常忘记了,延迟决策至最后一刻也是好手段。这不是懒惰或不负责;它让我们能够基于最有可能的信息做出选择。提前决策是一种预备知识不足的决策。如果决策太早,就会缺少太多客户反馈、关于项目的思考和实施经验。

Whether you are designing systems or individual modules, never forget to use the simplest thing that can possibly work.

无论是设计系统或单独的模块,别忘了使用大概可工作的最简单方案。

第 12 章 Emergence 迭进

According to Kent, a design is “simple” if it follows these rules:

据 Kent 所述,只要遵循以下规则,设计就能变得“简单”:

  • Runs all the tests
  • Contains no duplication
  • Expresses the intent of the programmer
  • Minimizes the number of classes and methods

运行所有测试;不可重复;表达了程序员的意图;尽可能减少类和方法的数量;

The rules are given in order of importance.

以上规则按其重要程度排列。

Remarkably, following a simple and obvious rule that says we need to have tests and run them continuously impacts our system’s adherence to the primary OO goals of low coupling and high cohesion. Writing tests leads to better designs.

遵循有关编写测试并持续运行测试的简单、明确的规则,系统就会更贴近 OO 低耦合度、高内聚度的目标。编写测试引致更好的设计。

During this refactoring step, we can apply anything from the entire body of knowledge about good software design. We can increase cohesion, decrease coupling, separate concerns, modularize system concerns, shrink our functions and classes, choose better names, and so on. This is also where we apply the final three rules of simple design: Eliminate duplication, ensure expressiveness, and minimize the number of classes and methods.

在重构过程中,可以应用有关优秀软件设计的一切知识。提升内聚性,降低耦合度,切分关注面,模块化系统性关注面,缩小函数和类的尺寸,选用更好的名称,如此等等。这也是应用简单设计后三条规则的地方:消除重复,保证表达力,尽可能减少类和方法的数量。

Most of us have had the experience of working on convoluted code. Many of us have produced some convoluted code ourselves. It’s easy to write code that we understand, because at the time we write it we’re deep in an understanding of the problem we’re trying to solve. Other maintainers of the code aren’t going to have so deep an understanding.

我们中的大多数人都经历过费解代码的纠缠。我们中的许多人自己就编写过费解的代码。写出自己能理解的代码很容易,因为在写这些代码时,我们正深入于要解决的问题中。代码的其他维护者不会那么深入,也就不易理解代码。

所以有一个角度的阅读源码的方式就是从测试开始(当然前提是准备阅读的目标拥有足够优秀的测试代码,并且覆盖率够高。)

Well-written unit tests are also expressive. A primary goal of tests is to act as documentation by example. Someone reading our tests should be able to get a quick understanding of what a class is all about.

编写良好的单元测试也具有表达性。测试的主要目的之一就是通过实例起到文档的作用。读到测试的人应该能很快理解某个类是做什么的。

第 13 章 Concurrency 并发编程

“Objects are abstractions of processing. Threads are abstractions of schedule.”

—James O. Coplien1

“对象是过程的抽象。线程是调度的抽象。”——James O

Here are a few more balanced sound bites regarding writing concurrent software:

下面是一些有关编写并发软件的中肯说法:

  • Concurrency incurs some overhead, both in performance as well as writing additional code.
  • Correct concurrency is complex, even for simple problems.
  • Concurrency bugs aren’t usually repeatable, so they are often ignored as one-offs2 instead of the true defects they are.
  • Concurrency often requires a fundamental change in design strategy.

并发会在性能和编写额外代码上增加一些开销;正确的并发是复杂的,即便对于简单的问题也是如此;并发缺陷并非总能重现,所以常被看做偶发事件而忽略,未被当做真的缺陷看待;并发常常需要对设计策略的根本性修改。

  • Concurrency-related code has its own life cycle of development, change, and tuning.
  • Concurrency-related code has its own challenges, which are different from and often more difficult than nonconcurrency-related code.
  • The number of ways in which miswritten concurrency-based code can fail makes it challenging enough without the added burden of surrounding application code.

并发相关代码有自己的开发、修改和调优生命周期;开发相关代码有自己要对付的挑战,和非并发相关代码不同,而且往往更为困难;即便没有周边应用程序增加的负担,写得不好的并发代码可能的出错方式数量也已经足具挑战性。

Recommendation: Keep your concurrency-related code separate from other code.6

建议:分离并发相关代码与其他代码。

第 14 章 Successive Refinement 逐步改进

Let me set your mind at rest. I did not simply write this program from beginning to end in its current form. More importantly, I am not expecting you to be able to write clean and elegant programs in one pass. If we have learned anything over the last couple of decades, it is that programming is a craft more than it is a science. To write clean code, you must first write dirty code and then clean it.

先放松一下神经。这段程序并非从一开始就写成现在的样子。更重要的是,我也没指望你能够一次过写出整洁、漂亮的程序。如果说我们从过去几十年里面学到什么东西的话,那就是编程是一种技艺甚于科学的东西。要编写整洁代码,必须先写肮脏代码,然后再清理它。

Incrementalism demanded that I get this working quickly before making any other changes. Indeed, the fix was not too difficult. I just had to move the check for null. It was no longer the boolean being null that I needed to check; it was the ArgumentMarshaller.

渐进主义要求我在做其他修改之前迅速修正这个问题。修正并不费劲。我只是把对 null 值的检查移了个位置。再也不用检测 bollean 是否为 null,而是检查 ArgumentMarshaler 是否为 null。

14,,15,16 三章基本都是实践内容,但是由于本书的例子是基于Java构建的,而本人对Java确实不太熟悉(案例必然涉及到Java世界的一些特有特性或者轮子),便决定不在这三章中琢磨太多时间。直接将重心放回了最后总结的章节。

第 17 章 Smells and Heuristics 味道与启发

本章内容相当于之前内容的一个总结,所以英文原文我就暂且不一并摘录了。另外,全文都不保证完整,毕竟这只是我的一篇笔记。一般情况下我只会记下让我有感触的部分。

  • 无关或不正确的注释就是废弃的注释。注释会很快过时。最好别编写将被废弃的注释。如果发现废弃的注释,最好尽快更新或删除掉。废弃的注释会远离它们曾经描述的代码,变成代码中无关和误导的浮岛。
  • 如果注释描述的是某种充分自我描述了的东西,那么注释就是多余的。

下方的G系列来自 17.4 一般性问题

G2:明显的行为未被实现

  • 遵循“最小惊异原则”(The Principle of Least Surprise)[2],函数或类应该实现其他程序员有理由期待的行为。例如,考虑一个将日期名称翻译为表示该日期的枚举的函数。

    Day day = DayDate.StringToDay(String dayName);
    
  • 如果明显的行为未被实现,读者和用户就不能再依靠他们对函数名称的直觉。他们不再信任原作者,不得不阅读代码细节。

G3:不正确的边界行为

  • 没什么可以替代谨小慎微。每种边界条件、每种极端情形、每个异常都代表了某种可能搞乱优雅而直白的算法的东西。别依赖直觉。追索每种边界条件,并编写测试。

G5:重复

  • 每次看到重复代码,都代表遗漏了抽象。重复的代码可能成为子程序或干脆是另一个类。将重复代码叠放进类似的抽象,增加了你的设计语言的词汇量。其他程序员可以用到你创建的抽象设施。编码变得越来越快,错误越来越少,因为你提升了抽象层级。
  • 更隐蔽的形态是采用类似算法但具体代码行不同的模块。这也是一种重复,可以使用模板方法模式[4]或策略模式[5]来修正。

G6:在错误的抽象层级上的代码

  • 良好的软件设计要求分离位于不同层级的概念,将它们放到不同容器中。有时,这些容器是基类或派生类,有时是源文件、模块或组件。无论哪种情况,分离都要完整。较低层级概念和较高层级概念不应混杂在一起。

G10: Vertical Separation

  • 变量和函数应该在靠近被使用的地方定义。本地变量应该正好在其首次被使用的位置上面声明,垂直距离要短。本地变量不该在其被使用之处几百行以外声明。
  • 私有函数应该刚好在其首次被使用的位置下面定义。私有函数属于整个类,但我们还是要限制调用和定义之间的垂直距离。找个私有函数,应该只是从其首次被使用处往下看一点那么简单。

G11:前后不一致

  • 如果在特定函数中用名为 response 的变量来持有 HttpServletResponse 对象,则在其他用到 HttpServletResponse 对象的函数中也用同样的变量名。如果将某个方法命名为 processVerificationRequest,则给处理其他请求类型的方法取类似的名字,例如 processDeletion Request。

G17:位置错误的权责

  • 最小惊异原则在这里起作用了。代码应该放在读者自然而然期待它所在的地方。PI 常量应该在出现在声明三角函数的地方。OVERTIME_RATE 常量应该在 HourlyPayCalculator 类中声明。

G20:函数名称应该表达其行为

  • 如果你必须查看函数的实现(或文档)才知道它是做什么的,就该换个更好的函数名,或者重新安排功能代码,放到有较好名称的函数中。

G26:准确

  • 期望某个查询的第一次匹配就是唯一匹配可能过于天真。用浮点数表示货币几近于犯罪。因为你不想做并发更新就避免使用锁和/或事务管理往好处说也是一种懒惰行为。在可以用 List 的时候非要把变量声明为 ArrayList 就过分拘束了。把所有变量设置为 protected 却不够自律。
  • 在代码中做决定时,确认自己足够准确。明确自己为何要这么做,如果遇到异常情况如何处理。别懒得理会决定的准确性。如果你打算调用可能返回 null 的函数,确认自己检查了 null 值。如果查询你认为是数据库中唯一的记录,确保代码检查不存在其他记录。如果要处理货币数据,使用整数[11],并恰当地处理四舍五入。如果可能有并发更新,确认你实现了某种锁定机制。
  • 代码中的含糊和不准确要么是意见不同的结果,要么源于懒惰。无论原因是什么,都要消除。

G28:封装条件

  • 如果没有 if 或 while 语句的上下文,布尔逻辑就难以理解。应该把解释了条件意图的函数抽离出来。

    // 例如:
    
    if (shouldBeDeleted(timer))
    is preferable to
    
    // 好于
    
    if (timer.hasExpired() && !timer.isRecurrent())
    

G33:封装边界条件

  • 边界条件难以追踪。把处理边界条件的代码集中到一处,不要散落于代码中。我们不想见到四处散见的+1 和-1 字样。

    if(level + 1 < tags.length)
    {
      parts = new Parse(body, tags, level + 1, offset + endTag);
      body = null;
    }
    
    // 注意,level + 1 出现了两次。这是个应该封装到名为 nextLevel 之类的变量中的边界条件。
    
    int nextLevel = level + 1;
    if(nextLevel < tags.length)
    {
      parts = new Parse(body, tags, nextLevel, offset + endTag);
      body = null;
    }
    

17.6 名称

N2:名称应与抽象层级相符

  • 不要取沟通实现的名称;取反映类或函数抽象层级的名称。这样做不容易。人们擅长于混杂抽象层级。每次浏览代码,你总会发现有些变量的名称层级太低。你应当趁机为之改名。要让代码可读,需要持续不断的改进。看看下面的 Modem 接口:

N5:为较大作用范围选用较长名称

  • 名称的长度应与作用范围的广泛度相关。对于较小的作用范围,可以用很短的名称,而对于较大作用范围就该用较长的名称。
    • 类似 i 和 j 之类的变量名对于作用范围在 5 行之内的情形没问题。