Open Source Software developer @Codethink

Redis caching


An in-memory data structure store that provides fast reads and writes, persistence, replication, and automatic failover.

Redis is not strictly a cache, as it supports many complex data types, but it can be configured to behave as one with a few specific options.

AWS supports replication and automatic failover:

#### Replication


Replication in AWS can be configured in two different ways: cluster mode on or cluster mode off. With cluster mode off you get one primary replication group, consisting of one primary node and up to five replica (read) nodes. The replica nodes are separate nodes in AWS, which can be spread across subnets to improve redundancy. You can also enable automatic failover, which allows AWS to detect problems with the primary node, promote a read node to be the new primary, and replace the failed node with a new read node.
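The cluster-mode-off setup above can be sketched with the AWS CLI. The group ID, node type, and node count below are placeholder values, not recommendations:

```shell
# Create a cluster-mode-off replication group: one primary plus two
# read replicas, with automatic failover enabled.
# (Sketch: IDs, node type and replica count are placeholders.)
aws elasticache create-replication-group \
  --replication-group-id my-redis-cache \
  --replication-group-description "Redis cache with failover" \
  --engine redis \
  --cache-node-type cache.t3.small \
  --num-cache-clusters 3 \
  --automatic-failover-enabled
```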

With cluster mode on, you get the same features as the cluster-mode-off configuration, but with multiple primary nodes. This aids redundancy and spreads the load horizontally. However, the consistency guarantees between the multiple primary nodes are not strong, meaning you can get cache misses when data has been written to one primary node but has not yet been replicated to the others.

Once a Redis instance begins to fill up, in some use cases you may not care, or may simply increase its size or destroy it. For a cache, however, this is a problem, as writes to the cache suddenly begin to fail. Redis handles this by offering eviction policies, which use some metric to delete items in the cache that are not being used often. It supports several different policies, which are best described in the Redis documentation.
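Bounding Redis as a cache comes down to two settings: a memory limit and an eviction policy. A minimal sketch against a running instance (the 2gb limit is a placeholder; these can also be set in redis.conf):

```shell
# Cap memory usage; once the limit is reached, evict the least
# recently used keys across the whole keyspace (allkeys-lru).
redis-cli CONFIG SET maxmemory 2gb
redis-cli CONFIG SET maxmemory-policy allkeys-lru
```

Other policies, such as `allkeys-lfu` or `volatile-ttl`, trade eviction accuracy differently; which fits depends on your access pattern.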

BuildStream cache blog

Deploying a BuildStream Cache via Docker


Deploying a cache provides many benefits for your project: it enables your team to work in a distributed manner without hitting dependency hell, and it provides a large speed boost when building complex projects with lots of churn.

Deploying a cache in a secure manner, however, can be quite a tricky task, especially with the added overhead of Docker. In this guide I will try to streamline the process and document any gotchas I encountered while doing so.

BuildStream Cache

The BuildStream project kindly provides a Docker image published on Docker Hub; you can find this image here. This image is configured as a "testing" deployment; it is not configured for production use.
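Getting the testing image running locally is a one-liner. Note the image name and port below are assumptions for illustration — check the project's Docker Hub page for the real values:

```shell
# Pull and run the cache image locally for testing.
# (Sketch: image name and port are placeholders.)
docker pull buildstream/artifact-cache
docker run --detach --publish 11001:11001 buildstream/artifact-cache
```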

A cache is not a very complex endpoint, but it does require some configuration for a production environment, namely:

  • A dedicated compute machine
  • A volume (preferably detachable)
  • A domain name
  • SSL certificates


The cache is designed to be performant; however, depending on your use case, you must ensure the machine can handle multiple data streams at once, as it is common for a cache to be reading and writing artifacts for multiple BuildStream clients at any given time. Networking performance is also a consideration here.

Artifact Directory

Again, depending on your use case, a suitably sized volume will be needed for writing artifacts. There are some considerations to be had here too: the speed at which the volume can write data, the size of the volume, and, if you are using cloud machines, how redundant that volume will be.
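To keep artifacts off the container's ephemeral writable layer, a named Docker volume can be created and mounted into the container. A sketch, assuming hypothetical image and mount-path names:

```shell
# Create a dedicated, detachable volume for artifacts and mount it
# into the cache container. (Sketch: the image name, mount path and
# port are placeholders -- match them to your actual image.)
docker volume create bst-artifacts
docker run --detach \
  --publish 11001:11001 \
  --volume bst-artifacts:/artifacts \
  buildstream/artifact-cache
```

On a cloud provider, mounting a network block device (e.g. an attached disk) at the volume's location gives the "detachable" property: the artifacts survive the machine being replaced.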

A cache is not only there to serve artifacts to developers who are currently rebuilding or developing the code base; it also serves as an archive of previous builds.

If you feel your product could benefit from reproducibility in the future, you may want to consider a larger volume, so that BuildStream can preserve your artifacts for a longer period of time.

By default, BuildStream will prune your cache if it detects your volume is becoming full, in order to provide a smoother user experience; however, this pruning may result in prebuilt artifacts being removed.


The BuildStream cache supports a few different configurations, each with varying levels of authentication.

The default BuildStream image is not configured for production use, as stated earlier, and we will discuss why this is shortly.

The cache supports two modes, push and pull; as you can probably guess, one is configured to write artifacts and one to read them. Each instance can also support authentication and encryption.

Encryption is important, as otherwise your source code is accessible to the public, and artifacts could be tampered with in transit without your knowledge, so it is recommended that you at least encrypt your traffic.
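For a quick start, a self-signed certificate and key pair can be generated with openssl (for production you would use a certificate for your real domain; the subject and validity below are examples):

```shell
# Generate a 4096-bit RSA key and a self-signed certificate valid
# for one year. Replace the CN with your cache's domain.
openssl req -new -newkey rsa:4096 -x509 -sha256 -days 365 -nodes \
  -subj "/CN=cache.example.com" \
  -out server.crt -keyout server.key
```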

Depending on your use case, you may also want to limit the users that can pull from or push to your cache. You can configure key pairs that, if a user holds one, allow them to push/pull to your cache.
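Tying this together, a TLS-enabled, push-restricted server invocation might look like the sketch below. This is based on BuildStream's `bst-artifact-server`; verify the flag names against your installed version, and note the file names and path are assumptions:

```shell
# Serve artifacts over TLS, allowing push only to clients whose
# certificates are listed in authorized.crt.
# (Sketch: flag names, file names and repo path are assumptions.)
bst-artifact-server --port 11002 \
  --server-key server.key \
  --server-cert server.crt \
  --client-certs authorized.crt \
  --enable-push /artifacts
```

A common pattern is to run two instances: a pull-only server open to everyone on one port, and a push-enabled, client-authenticated server like the one above on another.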