Pandora's Box

Operator! Get me the president of the world!

Dependencies

Dependencies are great. They allow developers to reuse code, a pillar of software engineering. This can be code developed by another company, like the omnipresent Google Guava library. Or it can be an internal library that developers publish for other teams.

Dependencies are scary. Dependencies tend to have dependencies. Managing this graph of dependencies is impossible.

An example is the "Diamond Dependency Problem". Library A depends on libraries B and C. B and C both depend on library D, but different versions of D. What version of library D should library A use?

Programmers have implemented different strategies to deal with the "Diamond Dependency" and none have solved it. Build tools like Maven are able to recognize the problem. But after recognizing it, Maven simply picks a version (by default, the one nearest the root of the dependency tree) and tosses it on the classpath. There is no guarantee that version will work, and failures often surface at runtime.
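
To make the runtime failure concrete, here is a minimal sketch with made-up coordinates: imagine library C was compiled against D 2.0, whose com.libd.Widget class added a render() method, while dependency mediation put D 1.0 on the classpath.

    public class DiamondCheck {
        public static void main(String[] args) {
            try {
                // C was compiled against D 2.0 and expects this class and method.
                Class<?> widget = Class.forName("com.libd.Widget");
                widget.getMethod("render"); // added in D 2.0, absent in D 1.0
                System.out.println("the D on the classpath satisfies C");
            } catch (ClassNotFoundException | NoSuchMethodException e) {
                // With D 1.0 on the classpath, C blows up at runtime instead,
                // typically as a NoSuchMethodError when the call site executes.
                System.out.println("C's expectations are broken: " + e);
            }
        }
    }

The compiler never sees the conflict; it only surfaces when the class is loaded and the call site actually runs.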

NPM takes a different approach. Each dependency has a copy of all of its dependencies, and these copies are not shared. Library B has its own copy of library D and library C has its own copy of D. This is kicking the can down the road. Now developers discover breaking changes at runtime in horrible fashion. For example, library A asks for a model object from library B and passes it to library C. What if library D defines this model? There are now two versions of library D's model floating around, which can lead to awful serialization problems.
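
The Java analogue of NPM's nested copies is loading the same class from two places. A hedged sketch, with hypothetical jar paths and class names:

    import java.net.URL;
    import java.net.URLClassLoader;

    public class TwoCopiesOfD {
        public static void main(String[] args) throws Exception {
            // B and C each carry their own private copy of library D.
            URLClassLoader dForB = new URLClassLoader(
                    new URL[] { new URL("file:b/libs/lib-d-1.0.jar") }, null);
            URLClassLoader dForC = new URLClassLoader(
                    new URL[] { new URL("file:c/libs/lib-d-2.0.jar") }, null);

            Object modelFromB = dForB.loadClass("com.libd.Model")
                                     .getDeclaredConstructor().newInstance();
            Class<?> modelClassInC = dForC.loadClass("com.libd.Model");

            // Same fully qualified name, two distinct classes. Handing B's
            // model to C fails, much like the serialization mismatches above.
            System.out.println(modelClassInC.isInstance(modelFromB)); // false
        }
    }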

Dependencies, man.

Leaky Concurrency

And I would play with fire to break the ice.

Asynchronous

It was time to start breaking up the Monolith into microservices. The Monolith was a massive application which contained all our project's code and it was almost impossible to maintain. As we considered our options for our next generation tech stack, I heard the word "asynchronous" a lot.

We chose a microservice framework which exposed an asynchronous API. This was a radical change from our old tech, where each request had its own thread. Following the hip new trend, I thought our database driver should also use an asynchronous API.

There were performance and scaling benefits with the new tech, but there was also a large increase in bugs. What happened?

I needed to dive deep under the abstraction layers to understand what was going on. What is "asynchronous"? Is it worth it?

There Is No Thread

At the lowest level of computers everything is asynchronous. This was somewhat of a surprise after years of writing synchronous code. It probably has something to do with the world being asynchronous though.

The path from high level language code to bits on the wire is complex. But just knowing the gist is helpful.

Some synchronous code makes a blocking I/O call. The language runtime library translates that into some kernel commands. The kernel instructs a hardware device, through a device driver, to perform the actual I/O. At this point the kernel moves on and the device is busy sending signals.

Asynchronously.

When the device finishes its I/O it interrupts the kernel. The kernel makes a note to pass that message back up to user land. The language runtime is waiting for that signal so the synchronous code can continue its "thread" of execution.
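
Here is a minimal Java sketch of that hand-off, using the standard NIO.2 API ("data.txt" is a placeholder file). The read is submitted, the calling thread moves on, and the completion arrives later:

    import java.nio.ByteBuffer;
    import java.nio.channels.AsynchronousFileChannel;
    import java.nio.channels.CompletionHandler;
    import java.nio.file.Paths;

    public class NoThread {
        public static void main(String[] args) throws Exception {
            AsynchronousFileChannel channel =
                    AsynchronousFileChannel.open(Paths.get("data.txt"));
            ByteBuffer buffer = ByteBuffer.allocate(1024);

            // The read is handed down toward the kernel; no application
            // thread sits blocked waiting on the device.
            channel.read(buffer, 0, buffer, new CompletionHandler<Integer, ByteBuffer>() {
                @Override
                public void completed(Integer bytesRead, ByteBuffer buf) {
                    // Runs later, once the completion bubbles back up.
                    System.out.println("read " + bytesRead + " bytes");
                }

                @Override
                public void failed(Throwable exc, ByteBuffer buf) {
                    exc.printStackTrace();
                }
            });

            Thread.sleep(1000); // crude: keep the JVM alive for the callback
        }
    }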

So down at the lower levels there are no threads. Threads are a higher level abstraction that developers have been working on for the past fifty years.

Structured Programming

Back in the '70s, the creation of this abstraction was a big deal. A debate raged between the computer science heavyweights on whether we should restrict our code to make it easier to reason about. This is when developers started to frown on GOTO jumps, despite their power.

GOTO leads to spaghetti code which is hard to reason about. So developers created and adopted some structures to keep things simple. These exist in all major languages today. Things like control flow (if/then/else), code blocks, and subroutines (functions and call stacks).

This also solidified the causality of code. If you see the following: g(); f(); you would assume that the function g runs before f. Programmers take this concept for granted these days.

Programmers have spent a lot of time and effort building up this "thread" concept. But these new fancy asynchronous APIs with their callbacks look an awful lot like a GOTO.

Performance and Scalability

Asynchronous implementations get sold on their performance and scalability. How much of each you gain, though, depends on the use case.

Let's take the case of a monolith application broken up into microservices. A service gains performance if it can query other services in parallel. And a service is more scalable if it can service hundreds of I/O requests on one thread since it takes less memory.
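
A small sketch of the parallel-query win, with stand-ins for the downstream service calls:

    import java.util.concurrent.CompletableFuture;

    public class ParallelCalls {
        // Stand-ins for calls to two downstream services (hypothetical).
        static String fetchUsers()  { return "users"; }
        static String fetchOrders() { return "orders"; }

        public static void main(String[] args) {
            // Both calls are in flight at once instead of back to back.
            CompletableFuture<String> users =
                    CompletableFuture.supplyAsync(ParallelCalls::fetchUsers);
            CompletableFuture<String> orders =
                    CompletableFuture.supplyAsync(ParallelCalls::fetchOrders);

            System.out.println(users.join() + ", " + orders.join());
        }
    }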

For the database driver, a single request did not often query the database in parallel. But a service as a whole did perform hundreds of parallel I/O requests to the database. In our old tech stack, each request had its own thread. These threads would block on database I/O, wasting the memory they consumed.

The nature of this application meant it spent most of its time waiting on I/O. It would also spend some CPU marshalling data around, but was by no means CPU bound. This is a case where it makes sense to have one thread manage all this waiting and light CPU work.

So we use our limited resources more efficiently. But at what cost?

What Have We Lost

Our code is now full of callbacks. Callbacks shatter structured programming. Exception handling no longer works. Try-with-resources no longer works. And what's worse is that these fail silently. The compiler isn't going to tell you that the code you wrote won't actually catch any exceptions.
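
A minimal sketch of the silent failure, with a stub standing in for an asynchronous driver call. The try block has already exited by the time the callback throws, and nothing ever reports the exception:

    import java.util.concurrent.CompletableFuture;

    public class LostException {
        // Stand-in for an asynchronous database call (hypothetical).
        static CompletableFuture<String> queryDatabase() {
            return CompletableFuture.supplyAsync(() -> "row-1");
        }

        public static void main(String[] args) throws Exception {
            try {
                queryDatabase().thenAccept(row -> {
                    // Thrown later, on a pool thread, after the try block exits.
                    throw new IllegalStateException("bad row: " + row);
                });
            } catch (IllegalStateException e) {
                System.out.println("caught " + e); // never reached
            }
            Thread.sleep(100); // let the callback run; the exception vanishes
        }
    }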

We have also lost the free back pressure. If ten threads are running synchronous code and the database hiccups, all ten threads will pause. They will no longer accept new work and this back pressure propagates upstream. Asynchronous code keeps accepting new work even though none is getting done.

Arguably the worst loss, however, is causality. While some asynchronous frameworks guarantee all code is run on one thread, removing a large set of concurrency bugs, it is not obvious in what order that code will run. There are many different possible logical threads of execution. g(); f(); no longer means what a developer thinks it does.
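
A tiny sketch of the broken assumption:

    import java.util.concurrent.CompletableFuture;

    public class NoCausality {
        static void g() { System.out.println("g"); }
        static void f() { System.out.println("f"); }

        public static void main(String[] args) {
            // Textually g comes first, but it is only *scheduled* first.
            CompletableFuture<Void> later = CompletableFuture.runAsync(NoCausality::g);
            f(); // may print before or after g

            later.join();
        }
    }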

A Leaky Abstraction

Developers struggled with the loss of causality when migrating to the async API of the database driver. When was code being executed? And from where?

The callbacks were exposed as Futures (fancy callbacks) which readily accepted more GOTOs to be tacked on (in the form of functions). What wasn't obvious was that these GOTOs would be run by the underlying event loop implementation. Slowing down the event loop thread caused hard-to-debug problems.
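
A sketch of the trap, assuming a hypothetical driver call backed by CompletableFuture. Work chained with thenApply can run on whichever thread completes the future; in a real driver, that is the event loop:

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class EventLoopPitfall {
        static final ExecutorService workerPool = Executors.newFixedThreadPool(4);

        // Stand-in for an asynchronous driver call (hypothetical).
        static CompletableFuture<String> query(String sql) {
            return CompletableFuture.supplyAsync(() -> "row");
        }

        static String expensiveTransform(String row) {
            return row.toUpperCase(); // imagine heavy marshalling here
        }

        public static void main(String[] args) {
            // thenApply: the transform may run on the thread completing the
            // future, stalling every other request queued behind it.
            query("SELECT 1").thenApply(EventLoopPitfall::expensiveTransform).join();

            // thenApplyAsync: the transform is moved to an explicit pool.
            query("SELECT 1")
                    .thenApplyAsync(EventLoopPitfall::expensiveTransform, workerPool)
                    .join();

            workerPool.shutdown();
        }
    }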

This burden of making sure code was being run in the correct spot was new. And as code gets more complex, keeping track of these logical threads of execution becomes more difficult.

The thread has become a leaky abstraction.

An Old Hope

So I don't like asynchronous interfaces, but it's undeniable that there are cases where operating system threads are not the best concurrency model.

Maybe coroutines are the best of both worlds.

Each of these "threads of execution" has its own stack frame. They have all the same characteristics as normal threads, but have the potential to be run concurrently.

This isn't free though. The Rust language actually used to have coroutines as first-class citizens, but deprecated them. Not only must the compiler be able to turn functions into state machines, but a runtime is needed to schedule these coroutines. Rust, being a low-level systems language, didn't want the burden of this runtime scheduler.

A language like Go doesn't mind it though. Maybe the future is here.

Where All the Code Should Live

They would take their software out and race it in the black desert of the electronic night.

Monorepo

I was once part of a developer holy war.

The team could not decide how to organize our code. Should it live in one repository, a monorepo, or should there be a repository per project?

The war was ignited by a challenge we were facing: scaling developer productivity as we grew.

Everyone agreed that code organization could help combat this productivity loss, but which strategy should we take?

Much to the dismay of some, we ended up with the monorepo.

And it was the right call.

Productivity Breakdown

We developers face scaling challenges all the time. Some are easy to predict. An application might work fine for ten users, but we wouldn't expect it to hold up to millions of users without some changes. Other challenges are not as obvious.

At one point in time, all the developers at Fitbit were in a single room. I took for granted a lot of properties that come with a team that size. We merged code straight to master and resolved conflicts in person. Even if other developers were not working on code related to a change, they had a good gut instinct about the effects it would have. This instinct allowed us to detect breaking changes before they got to production.

However, as the team grew, errors began to happen at an exponential rate in development and production.

It's tough to say at what team size the project began to degrade: 10 devs, 30 devs, or 100 devs. But changes that used to be easy began to require hours of coordination and were error prone. The size of our team was taking a toll on productivity.

And that is when the monorepo versus multiple repo debate took off.

The Influence of Code Organization

Code organization has the potential to influence how easy or difficult it is for a developer to discover code, build code, and test code.

Discover: Where does this code live? Where is this code used?

Build: How do I build this project? How do I manage its dependencies?

Test: How do I test this code? How do I test the code that depends on this code?

Developer productivity would remain high, in the face of a growing team, if these tasks remained easy. So which code organization strategy influences these the most?

Discover

Finding usages of code is marginally easier in a monorepo, since all code can be grep'd at once. But simple tools applied to the multirepo approach produce the same effect.

Relative to the other tasks, it's a wash.

Build and Test

Building and testing code is where a monorepo shines, because a monorepo can enable faster failures and help avoid technical debt.

To enable faster failures, we must leverage a monorepo's one inherent advantage: atomic commits. A developer can make a change affecting more than one project in one all-or-nothing (a.k.a. atomic) step. The multiple repository process to push out a change often follows the open source pattern. A developer patches a project and uploads it to a central location. At a later time, a dependent project pulls down the new version. There is a layer of indirection which forces the process to have multiple steps.

So to perform a library update with multiple repository code organization, waves of builds and tests have to be run. First, a developer patches the original library and publishes it to a central location. The downstream projects which depend on the library need to test the new version. A tool, or an unfortunate developer, needs to update the downstream projects and test them all. If there is a break, the original library patch needs to be rolled back.

And what about downstream projects of the downstream projects? The waves of building and testing continue. Each wave adds complexity and brittleness, especially in the face of rollbacks.

Using atomic commits in a monorepo, we avoid the waves of builds and tests. Instead of publishing a new version of a library and then coordinating the testing of affected projects, we do it in one step. The library and all affected projects are tested on the revision containing the change. This allows dependent projects to fail fast on changes.

Avoiding Debt

If a developer is used to the open source, multiple repository model, this monorepo approach sounds like a lot of work. To update a library I have to update all dependent projects at the same time? Why me? The answer is you, because the developer best equipped to deal with a breaking change is the one making the change.

An Unnecessary Interface

At some level of scale it makes sense to break a monolith application into microservices. Microservices accept increased system complexity (more than one live version of code, service discovery, load balancing) compared to a monolith. But in this case, the complexity can be worth it.

Is there added complexity for multirepos? The trials of building and testing code exist, but there is also a social element. Conway's Law states that the structure of projects that people build reflects the social structure of the people that build them. In software engineering, this often manifests itself as code interfaces between projects. And these interfaces are often where bugs occur.

Multiple repository code organization encourages another interface within a system, whereas a monorepo discourages it. One less interface to cause problems.

Embrace the Monorepo

A monorepo has its faults and doesn't solve everything, but it has the higher potential to maintain developer productivity as a team grows.

P.S. Deploying

As soon as a project requires more than one machine to run on, it will have to deal with artifact versioning in production.

It sounds weird to have a monorepo with microservices, but deployment and code organization are orthogonal strategies.