Introducing an Auto-formatter on a Big Codebase

Posted on Feb 17, 2021

rant codeStyle java

Over the past few years, auto-formatting utilities (that do code formatting for you) such as gofmt, black or google-java-format have become increasingly popular. While such tools undoubtedly provide some benefits when used from the start of a project, changing the code style of an entire codebase is hard. The larger it is, the harder it is to break through its inertia… Which raises the question: is it worth it?

At my company, we decided to convert our mono-repo to google-java-format. For the record, we have several hundred developers working on a total of just above 1M lines of Java code. I initiated and then oversaw this transition, and this is an account of how things went. If you’re wondering whether to make a similar change, I hope this post will help you make a more informed decision… And if you’re already set on doing it, I hope it will provide you with extra arguments to convince people around you!

Why Use an Auto-formatter?

Code style always has been, is, and will always be a sensitive topic. Something as rigid as an auto-formatter is bound to invite passionate, and often endless, debate (Is two-space indentation superior to four-space indentation? Where should braces be placed? Etc.). In this context, it is important to take a step back, and always focus on the WHY: using an auto-formatter is not about improving code locally by making such and such file “prettier” as much as it is about improving code globally.


There are indeed some objective arguments in favor of auto-formatting tools when it comes to impacting the entire codebase:

they save you time and mental energy, as you mostly do not have to think about code formatting anymore,
they make sure legacy code is up to current standards,
they can be enforced by the CI, which ensures future code will stay up to the standards too,
they generate a constant layout for a given block of code, no matter who writes it, while keeping the code at least reasonably readable.

Enforcing the code style via CI is a great way to simplify communication: you won’t have to manually make sure everyone is aware of the style guidelines anymore (which can be a pain when you onboard 10+ new developers per month)!
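As a rough sketch of what such a CI gate boils down to: format every file, and fail the build if the output differs from what was committed. The Java below is illustrative only. FormatCheck, unformattedFiles and the toy formatter are hypothetical names, and the UnaryOperator<String> stands in for whatever really formats your code (in practice, most likely a build plugin wrapping the formatter rather than hand-written code).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

/** Sketch of a CI-style formatting gate. All names here are hypothetical. */
class FormatCheck {

  /**
   * Returns the paths whose content differs from the formatter's output.
   * The formatter parameter is a stand-in for the real tool.
   */
  static List<String> unformattedFiles(
      Map<String, String> sources, UnaryOperator<String> formatter) {
    List<String> offenders = new ArrayList<>();
    // Copy into a TreeMap so the report is sorted by path, whatever map was passed in.
    for (Map.Entry<String, String> entry : new TreeMap<>(sources).entrySet()) {
      String formatted = formatter.apply(entry.getValue());
      if (!formatted.equals(entry.getValue())) {
        offenders.add(entry.getKey());
      }
    }
    return offenders;
  }

  public static void main(String[] args) {
    // Toy stand-in formatter: strips trailing spaces and tabs on each line.
    UnaryOperator<String> toyFormatter = src -> src.replaceAll("(?m)[ \\t]+$", "");

    Map<String, String> sources =
        Map.of(
            "Clean.java", "class Clean {}\n",
            "Messy.java", "class Messy {}   \n");

    List<String> offenders = unformattedFiles(sources, toyFormatter);
    offenders.forEach(path -> System.out.println("Not formatted: " + path));
    // A real CI job would exit with a non-zero status here when offenders is not empty.
  }
}
```

The check only works because the formatter is deterministic: the same input always produces the same layout, so "formatted output equals committed file" is a well-defined test.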

I also want to stress the importance of having a constant layout: it lets developers come up with expectations about what a given block of code should look like, in turn speeding up their reading and comprehension of any part of the codebase¹. Let me illustrate this with a simple example. Say the auto-formatter has a rule where all function calls with three or more arguments are written with one argument per line. You might think this is stupid: it makes sense if the arguments are too long to fit on one line, but what about short names?

// You probably think this:
var x = func(a, b, c);

// Is more readable than this:
var x = func(
  a,
  b,
  c);

It is true that the first example may be more readable. Locally, in the context of that statement, it makes more sense. But consider the global impact: if you know that you can expect all function calls with three or more arguments to be formatted in the same way, it becomes a lot quicker to know at a glance how many arguments are passed to a given function.

// Quick! What's the second argument to this function?
var x = func(A.builder().param(a, b).build(), b, c.method(d), e);

// The same call, with one argument per line:
var x = func(
  A.builder().param(a, b).build(),
  b,
  c.method(d),
  e);

This example is arguably a little simplistic, but I hope I got the point across. Consistency is key, and auto-formatters provide just that.
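To make the reading cost a bit more concrete, here is a toy sketch of the work your eyes have to do on the one-line version: a comma only separates arguments when it is not nested inside parentheses, so you have to track nesting depth as you scan. ArgumentSplitter and topLevelArguments are hypothetical names, and this is in no way how a real formatter works (those operate on a parsed syntax tree, not on raw strings).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration only: splits a call's argument list at top-level commas. */
class ArgumentSplitter {

  /** Returns the arguments separated by commas that are not nested inside parentheses. */
  static List<String> topLevelArguments(String argumentList) {
    List<String> arguments = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    int depth = 0;
    for (char c : argumentList.toCharArray()) {
      if (c == '(') {
        depth++;
      } else if (c == ')') {
        depth--;
      }
      if (c == ',' && depth == 0) {
        // A top-level comma ends the current argument.
        arguments.add(current.toString().trim());
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    if (!current.toString().isBlank()) {
      arguments.add(current.toString().trim());
    }
    return arguments;
  }

  public static void main(String[] args) {
    List<String> arguments =
        topLevelArguments("A.builder().param(a, b).build(), b, c.method(d), e");
    // Print the call with one argument per line, as in the example above.
    System.out.println("func(\n  " + String.join(",\n  ", arguments) + ");");
  }
}
```

With one argument per line, that depth tracking is already done for you: the second argument is simply the second line.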

The Psychology of Changing Code Styles

People tend to resist change, especially when it affects their daily life, and developers are no different. Making sure everyone understands why you are introducing an auto-formatter is a first step towards overcoming this resistance, but it will probably not be enough. Some people will raise concerns and disagree with you, and it is important to not brush them aside. In my experience, you will face three main types of complaints.

“It will cause merge conflicts” (aka “It adds overhead”)

For people to accept the change of code styles, friction has to be minimal. Not only does this mean that the post-change workflow has to be seamless (IDE integration, on-save plugins, commit hooks…), but also that the change itself has to be as painless as possible. Our plan of action was something along the lines of:

Deciding on a date about 1 month in the future,
Communicating a lot about said date, and what the future workflow will look like,
Giving advice on how to minimize inconvenience (“please merge everything before the date”, “use git filters to help with merge conflicts”, etc.),
On the day, running the formatter and merging changes early to avoid more merge conflicts,
Prioritizing helping people and answering their questions for the first few days after the change.

Enabling the format changes on only one part of the codebase for some time can also be a way to gather feedback ahead of time, and we actually did that for about a month. It is a double-edged sword though, as having two formats side by side in the same repository causes additional pain.

“It is different from what I’m used to”

These complaints can be expressed in many ways, but they boil down to developers not wanting to change their habits. You could of course spend hours debating the comparative merits of different indentation styles, but it is usually not worth it: in my experience, such complaints disappear pretty fast after the change is done. It is hard to get things moving, but once you do, it’s fine.

“I don’t think we need an auto-formatter”

Unlike the two other types of complaints, this time, people disagree with your reasons for introducing an auto-formatter. It is important to set up a time and a place for such people to openly discuss the topic, and to avoid making them feel like they are being ignored. But be prepared: because code style is a religious issue, things can get heated. Keep a cool head! And do not attempt to do this alone: instead, assemble a team around you to manage the change of code style, so you can provide support to each other during the hard times.

Should Your Auto-formatter Be Configurable?

Faced with multiple complaints regarding a specific rule of the formatter (like the indentation style), it is tempting to change its configuration. After all, the point is to be consistent, so why not compromise on some rules? While it may sound like a smart way to sweeten the pill, it opens Pandora’s box: do you really want to discuss and debate every single formatting rule? Besides, many auto-formatters deliberately offer little or no configurability, so changing a rule could mean creating and maintaining your own fork.

Another controversial topic is the use of @formatter comments, which switch formatting off for a region of code: use them carelessly, and you risk losing all benefits of an opinionated formatter. Most of the time, when people ask for them, they will come up to you with a specific example and say: “look how bad the formatter makes this snippet look”. But even if that specific example may be annoying, is it annoying enough, often enough, to justify peppering your codebase with magic comments?

// Unformatted
for (var locallyMonitoredGroup : PeakGlobalDecider.groupBy10(locallyMonitoredScopes)) {
  sender().tell(new PeakGlobalDecider.LocalMonitoringV2(channelId, locallyMonitoredGroup, true), self());
}

// After formatting
for (var locallyMonitoredGroup : PeakGlobalDecider.groupBy10(locallyMonitoredScopes)) {
  sender()
      .tell(
          new PeakGlobalDecider.LocalMonitoringV2(channelId, locallyMonitoredGroup, true),
          self());
}

The above is a real-world example that was brought to me with the complaint that the line breaks introduced by the formatter made it harder to read. It is true that the sender().tell() call is now split over multiple lines. But is it really the formatter’s fault?

for (var locallyMonitoredGroup : PeakGlobalDecider.groupBy10(locallyMonitoredScopes)) {
  var msg = new PeakGlobalDecider.LocalMonitoringV2(channelId, locallyMonitoredGroup, true);
  sender().tell(msg, self());
}

Introducing another variable not only solved the formatting issue, but also made the code better. Sometimes, producing badly formatted code is a code smell!

I am personally opposed to @formatter comments, but in our case, we ended up authorizing them in one specific scenario: a DSL where indentation had a semantic meaning. But it was out of laziness more than necessity, as we could have reworked that DSL’s syntax to work better with the formatter.

One Month Later: Was It Worth It?

We started enforcing google-java-format over our entire codebase in late November 2020, and the transition itself went surprisingly smoothly. New code is easy to write thanks to the tooling working fine, and most of the old code became easier to read. There were some minor technical hiccups (IntelliJ’s configuration sometimes conflicts with code formatting…) and a few people still dislike the new code format, but the feedback is pretty good overall.

A surprising number of people who were not particularly in favor of switching to an auto-formatter expressed their satisfaction, especially regarding not having to think about the code formatting anymore. On the flip side, there was a single case where old code became very hard to read, but closer inspection showed that it was due to a lot of nested lambdas resulting in crazy vertical spacing: in other words, it exposed some flaws in the code that could have been fixed with a short refactor.
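To give an idea of what such a refactor can look like, here is a made-up example (not the code in question; NestedLambdas, labelsNested, labelsExtracted and evenLabels are all hypothetical names). The first method nests one stream pipeline inside another, which forces any formatter to indent deeper and deeper; the second extracts the inner pipeline into a named method.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Made-up example of flattening nested lambdas. All names are hypothetical. */
class NestedLambdas {

  // Before: a pipeline nested inside a lambda, so each level adds more indentation.
  static List<String> labelsNested(List<List<Integer>> groups) {
    return groups.stream()
        .map(
            group ->
                group.stream()
                    .filter(n -> n % 2 == 0)
                    .map(n -> "even:" + n)
                    .collect(Collectors.joining(",")))
        .collect(Collectors.toList());
  }

  // After: the inner pipeline has a name, and the outer one fits on a single line.
  static List<String> labelsExtracted(List<List<Integer>> groups) {
    return groups.stream().map(NestedLambdas::evenLabels).collect(Collectors.toList());
  }

  /** Joins the even numbers of one group into a comma-separated label. */
  static String evenLabels(List<Integer> group) {
    return group.stream()
        .filter(n -> n % 2 == 0)
        .map(n -> "even:" + n)
        .collect(Collectors.joining(","));
  }

  public static void main(String[] args) {
    List<List<Integer>> groups = List.of(List.of(1, 2, 4), List.of(3));
    // Both versions compute the same result; only the shape of the code differs.
    System.out.println(labelsNested(groups).equals(labelsExtracted(groups)));
  }
}
```

As with the msg variable earlier, the extracted version is not just easier on the formatter: the inner step now has a name that says what it does.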

As always, shoot me a message or tweet @nicol4s_c if you want to chat about any of this, if you spotted any mistakes or typos, or if you’d like me to cover anything else! Have a great day :)

¹ Steve McConnell makes a very interesting analogy between chess players and developers on this topic in Code Complete.