Link Dump #3 - 2020-04-17

In less tardy fashion, a third link-dump - a place for me to offload the interesting articles or pieces of software I come across while trawling the internet.

Orchestration

Pueue: A modern, Rust-based alternative to task-spooler which provides a (local) subset of the kind of batch queueing functionality provided by schedulers such as SLURM.

Compression

orz: A stand-alone, general-purpose compression tool - not much more to say than that; performance numbers are impressive, so the only reason I can see not to use it is the fact that alternatives have better adoption and so are more likely to be maintained long-term.

zstd: A fast lossless compression algorithm implemented in a library and CLI utility, which has started to gain widespread adoption within tools which have a compression component such as SquashFS. One particularly noteworthy feature is the ability to "train" the compression algorithm using similar data, which is particularly beneficial for inputs which are only a few KB per file.
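To show why a shared dictionary helps with tiny inputs, here is a minimal sketch using the preset-dictionary support in Python's standard-library zlib module. To be clear, this is not zstd and there is no training step: the "dictionary" is a single hand-written sample record, and the record contents are invented for illustration. The underlying idea is the same though - a small file compresses far better when the compressor has already seen data that looks like it.

```python
import zlib

# Hypothetical sample data: a hand-written "dictionary" standing in for what
# zstd would learn by training on many similar small files.
DICTIONARY = b'{"user": "alice", "status": "active", "region": "eu-west-1", "quota_mb": 1024}'
RECORD = b'{"user": "carol", "status": "active", "region": "eu-west-1", "quota_mb": 2048}'


def compress(data: bytes, zdict: bytes = b"") -> bytes:
    """Compress data with zlib, optionally primed with a preset dictionary."""
    if zdict:
        compressor = zlib.compressobj(level=9, zdict=zdict)
    else:
        compressor = zlib.compressobj(level=9)
    return compressor.compress(data) + compressor.flush()


def decompress(blob: bytes, zdict: bytes = b"") -> bytes:
    """Reverse compress(); the same dictionary must be supplied again."""
    if zdict:
        decompressor = zlib.decompressobj(zdict=zdict)
    else:
        decompressor = zlib.decompressobj()
    return decompressor.decompress(blob) + decompressor.flush()


plain = compress(RECORD)
primed = compress(RECORD, DICTIONARY)
print(f"raw={len(RECORD)}B  no-dict={len(plain)}B  with-dict={len(primed)}B")
```

The catch, which applies equally to a trained zstd dictionary, is that whoever decompresses the data needs the exact same dictionary to hand.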

zfp/fpzip: Compression tools targeting scientific data - specifically, arrays of floating-point data with some form of regularity.

Hosting

Caddy: Having recently been upgraded to version 2, Caddy offers a simple yet flexible web server distributed as a static binary.

Configuration

Dhall: Fitting somewhere between a configuration file syntax and a full programming language, Dhall provides advantages like variable substitution and simple logic without the complexity of Turing-completeness.

Development

VSCodium: Releases of the VSCode editor, pre-compiled but without being encumbered by Microsoft tracking.

VisiData: A TUI for data science, aimed at making data cleaning/reformatting simpler from the command line.

Challenges

Project Euler: Most of my early practice at writing Python was via attempts to solve Project Euler challenges. The main downside is the bias towards mathematics, meaning that there aren't really many opportunities to learn entirely new patterns. Nowadays, I have more than enough ideas for interesting projects to keep me busy with learning "real" software development in the rare moments when the mood takes me.

ROSALIND: Similar to Project Euler, but with the focus shifted from mathematics to bioinformatics, and with more structured learning built in.

Encryption

gocryptfs: An encrypted FUSE file-system, where any files added are stored 1:1 as encrypted files on the host.

Link Dump #2 - 2020-04-14

The second, greatly delayed, link dump - a place for me to offload the interesting articles or pieces of software I come across while trawling the internet.

Benchmarking & Stress-testing

Tools to evaluate the performance of commands and the systems they run on.

Hyperfine: A wrapper command to perform statistical analysis on the performance of other CLI tools.
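The basic idea behind this kind of tool can be sketched in a few lines of Python: run the command several times, time each run, and report summary statistics rather than a single number. This is only a rough illustration of the approach, not how Hyperfine itself is implemented; the function name and its parameters are made up for the example.

```python
import statistics
import subprocess
import sys
import time


def benchmark(command: list[str], runs: int = 10, warmup: int = 1) -> dict[str, float]:
    """Time repeated runs of a command and summarise the wall-clock results."""
    for _ in range(warmup):  # untimed runs to warm up disk caches etc.
        subprocess.run(command, check=True, capture_output=True)

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, check=True, capture_output=True)
        timings.append(time.perf_counter() - start)

    return {
        "mean": statistics.mean(timings),
        "stdev": statistics.stdev(timings) if runs > 1 else 0.0,
        "min": min(timings),
        "max": max(timings),
    }


if __name__ == "__main__":
    # Benchmark starting the current Python interpreter and doing nothing.
    stats = benchmark([sys.executable, "-c", "pass"], runs=5)
    print(f"{stats['mean'] * 1000:.1f} ms ± {stats['stdev'] * 1000:.1f} ms")
```

Even this crude version makes the point: a single timing tells you very little, whereas the spread between runs tells you how far to trust the mean.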

Stress-ng: As a rule, I try to avoid CLI applications which have hyphens in their command names. Not really sure why. Regardless, this one is too good to pass up - it offers a highly configurable way to apply resource stress to a machine (lots of CPU usage, lots of memory accesses, etc).

Backups & Storage

Restic: "Backups done right" - a tagline I whole-heartedly agree with. Restic allows storage to be backed up as encrypted deduplicated snapshots to a hosted server or directly to object storage.
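For anyone unfamiliar with deduplicated snapshots, the concept can be sketched as a content-addressed chunk store. This is a toy model under heavy assumptions: fixed-size chunks, no encryption, everything held in memory, and a class name I made up. It is not Restic's actual repository format, but it shows why a second snapshot of mostly-unchanged data costs almost nothing.

```python
import hashlib

CHUNK_SIZE = 4  # absurdly small, purely so the example is easy to follow


class ChunkStore:
    """Content-addressed store: each unique chunk is kept once, keyed by hash."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def snapshot(self, data: bytes) -> list[str]:
        """Store data and return the list of chunk hashes that rebuilds it."""
        manifest = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # already-seen chunks are not stored again
            manifest.append(digest)
        return manifest

    def restore(self, manifest: list[str]) -> bytes:
        """Rebuild the original data from a snapshot manifest."""
        return b"".join(self.chunks[digest] for digest in manifest)


store = ChunkStore()
monday = store.snapshot(b"AAAABBBBCCCC")
tuesday = store.snapshot(b"AAAABBBBDDDD")  # only the last chunk changed
print(f"{len(store.chunks)} unique chunks stored for 2 snapshots of 3 chunks each")
```

Each snapshot is just a list of hashes, so both days can be restored in full even though the shared chunks exist only once.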

Borg: An alternative to Restic with compression as well as encryption and deduplication, but without the ability to save to object storage.

MinIO (+ RADIO): MinIO is a simple object storage server for self-hosting, with configurable erasure coding across multiple disks and servers. RADIO is a higher-level abstraction which provides caching as well as distributing data (via mirroring, as well as erasure coding) across multiple object storage targets.
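Erasure coding sounds more exotic than it is. The simplest possible instance is a single XOR parity block, sketched below. This is an illustration of the principle only, with function names of my own choosing; real object stores use stronger codes which can survive several simultaneous losses, whereas this toy version can repair exactly one.

```python
from functools import reduce
from typing import Optional


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks: list[bytes]) -> list[bytes]:
    """Return the data blocks followed by one parity block."""
    return data_blocks + [xor_blocks(data_blocks)]


def recover(stored: list[Optional[bytes]]) -> list[bytes]:
    """Rebuild at most one missing block (marked None) from the survivors."""
    missing = [i for i, block in enumerate(stored) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost block")
    repaired = list(stored)
    if missing:
        survivors = [block for block in stored if block is not None]
        # XOR of all surviving blocks (data + parity) equals the lost block.
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired[:-1]  # drop the parity block, keep the data


disks = encode([b"abcd", b"efgh", b"ijkl"])  # three data "disks" plus one parity
disks[1] = None  # simulate losing a disk
print(recover(disks))
```

The storage overhead here is one extra block for three blocks of data, compared with the threefold cost of keeping full mirrors to survive the same single failure.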

Standalone CLI Utilities

A catch-all segment for cool CLI applications I use somewhat frequently.

UPX: A slightly "meta" choice - UPX is used to reduce the size of binaries. Together with the "strip" command, this can make a huge difference to the size of application downloads, and so is something I hope to see used more often.

Monolith: A tool for archiving full websites as a single file, for data hoarding or offline reading purposes.

FirefoxSend CLI: Firefox Send has become my go-to way of moving files between machines where shared-access object storage or NFS mounts aren't available. Being able to limit accesses to a single download, on a single day, behind password encryption makes me much more comfortable about what is effectively "dump this thing on the internet to grab it later". The official website is handy when on a Windows desktop, but for any other use-cases my default is to use this CLI tool.

Creating CLI Utilities

Turning a set of scripts or functions into a tool can range from "easy peasy" (Rust, Go) to "pain in the ass" (Bash, Python) - these links should help with the latter.

The Many Layers of Packaging: Returning to Python - this is the best guide I've come across on how to convert projects into distributable applications.
AppImage: One of the techniques listed in the previous link on Python packaging has more general usefulness, and can be used to distribute arbitrary executables along with their own dependencies as a single binary (actually an executable SquashFS image).

Security, Isolation & Containerization

I have documented previously that I've tried to use LXD and Multipass to manage development environments. Personally I'm happy with those two, but here are some alternatives which might be more appropriate in other situations.

Firejail: A CLI tool to isolate untrusted applications into their own sandbox, using the full range of isolation capabilities already available via the Linux kernel.

Toolbox: A CLI tool for generating development environments as OCI containers, with a focus on providing the software environment while allowing easy access to the underlying user file-system.

Schemas

Aside from StandardNotes - which has now become my go-to text editor and scratchpad - I use a couple of other tools to offload ideas from my brain to a more reliable medium.

The most long-standing of these is Todoist. It's been at least 4 years since I started using this tool, with no significant breaks. Throughout that time, my only real struggle has been when trying to apply a new organisational schema, such as using projects and subprojects in an effective way. Recent updates to the application could theoretically have helped with this, but instead just left me stuck between two not-quite-ideal, overly complex arrangements. After a week or so of struggling with the adjustments I made trying to accommodate new functionality into my workflow, I not only reverted to my previous technique, but took a further simplifying step by culling my subprojects entirely.

A more recent addition to my list of personal tools is Notion. Initially, Notion replaced Airtable as a database tool, but quickly also absorbed some of the content I was writing in Simplenote. Trying to create more structure using a mixture of lists, tables and kanban boards in Notion soon turned into an unsatisfactory experience; personal and work projects rarely have the same level of depth required, and I often ended up with unused headings or blank pages - ie structure simply for the sake of structure. I've since rescued my Notion usage by:

  • combining it with Pocket as a bookmark/reading-material aggregator - Notion is used for preserving reference material I am likely to come back to multiple times, while Pocket is used for one-off reading material such as blogs or time-sensitive news articles;
  • focusing on the use of simple list-pages at the top level of a project or other structure, with tasks exclusively placed on kanban boards when they (mostly) fit with SMART criteria;
  • using existing project pages as a simplified scratchpad for notes which do not yet have a home, or need to be merged/split - once these notes gain some value and context, they can be re-shuffled into an updated project.

With each of the tools I've adopted, the main limitation on how effective I found them initially was due to my own tendency to over-complicate. In each case I have struggled to keep my focus on the content being generated, instead becoming obsessed with finding the correct structure.

To break out of this cycle, I have adopted the idea of the "last responsible moment" to decide on a structure for data, and defaulting to the simplest possible format or collection of metadata. Both Simplenote and now StandardNotes have helped with this by enforcing the use of free-form tags as the only real form of organisation.

The lesson I've taken from this back-and-forth journey across multiple productivity tools is that no single schema can adequately fit my work tasks, CPD activities and other speculative personal projects. I've wasted enough time on organization at the expense of "doing", so now things only get written down when they are at risk of slipping through the cracks - and ideas only have a structure applied to them where it can help with delivering results.

Nudge

"Drink more water" is one of those aspirational items which has been hanging around my to-do lists, project planning notes and habit tracking apps for what feels like half a lifetime. I've known it would be a healthy living "quick win" all along; a feeling cemented a few years ago when I tracked various aspects of my day-to-day life and found that I occasionally went for several months without drinking a beverage that wasn't either alcoholic or caffeinated.

Regardless of whether I was at home with unlimited free access to water and drinking vessels, or working away in unfamiliar places where deliberate meal choices were easier to deal with, I always struggled. I tried several times to find a suitable nudge mechanism, but the habit never stuck.

That seems to have changed. A couple of weeks ago when attending a conference, I got a free cupanion water bottle. I didn't need another water bottle, but I took one anyway...because it was free. In fact, with the bare minimum of encouragement from the people at the desk, I took two. They look great, hold a sensible volume, and have a nice feeling when carried. Since picking up the bottles, I have consistently drunk at least a litre of water every day.

What I've discovered is that I actually needed two nudges to form this habit. The positive aesthetic and tactile experiences of using the bottle are enough to get me drinking most of the time, but to fill in the gaps in my attention I needed some form of gamification. The cupanion bottles achieve this via a barcode sticker which can be scanned using a smartphone app - each scan results in a charitable donation to some good cause related to providing clean water to those who might not otherwise have access.

The fact that scanning the barcode is beneficial to a good cause isn't just incidental - without it, I would've lost interest in the app within a few days, and these would just be another pair of bottles littering my house. As things stand though, I have both selfish and selfless reasons to keep drinking water - together these should finally allow me to maintain the habit.

Development Environments

I've never really settled on a good way of managing software-related projects - both things I'm putting together myself, and downloaded resources I'm experimenting with. Not being a software developer, it has never become a priority, and so any structure which might be detectable in the contents of my "Projects" directory has thus far been purely coincidental. However, over the last year I've had several halting attempts at trying to change that.

The end goal of my recent experiments is to determine what the structure of my personal laptop OS is likely to be when Ubuntu 20.04 rolls around - I've settled on having an LTS distro on the device itself, and so my "development" workflow is likely to require more attention. With that in mind, here is a summary of the things I've tried while stumbling towards some sort of sane, professional setup.

Arbitrarily labelled directories: A sadly-too-common fall-back; but not very neat or easy to manage after a period of non-activity. Backups become opaque ("Have I archived this one already?"), and nothing ever truly feels finished. The one exception to this, shining like a beacon of sanity, is the directory which contains example code I've put together as part of an effort to learn Rust. I adore Cargo, and the Rust toolchain management experience in general. It's nice to see that some lessons have been learned from the shit-show that is Python packaging.

Git repositories: A significant step up from raw directories - the trick I haven't mastered is figuring out when something is "significant" enough to require version control. The correct answer is probably to use git for everything immediately, shifting the decision back a step - ie to determining what remote origin to use (if any). Git-by-default is the new habit I'm trying to form. Fortunately I love using git - I just wish I worked on more projects which allowed me to experiment with the more advanced bits of functionality.
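A habit like git-by-default is easier to keep with a small audit script. The sketch below lists which project directories are not yet under version control; the "~/Projects" location and the function name are assumptions for the sake of the example, and it deliberately stops short of running "git init" on anything.

```python
from pathlib import Path


def unversioned_projects(root: Path) -> list[Path]:
    """List immediate subdirectories of root that are not git repositories."""
    return sorted(
        child
        for child in root.iterdir()
        if child.is_dir() and not (child / ".git").exists()
    )


if __name__ == "__main__":
    # "~/Projects" is an assumed location; adjust to taste.
    projects_root = Path.home() / "Projects"
    if projects_root.is_dir():
        for project in unversioned_projects(projects_root):
            print(f"no git repo yet: {project}")
```

Checking for a ".git" entry only catches repositories rooted directly in each project directory, which is enough for a flat "Projects" layout.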

Python virtualenvs: More of a necessary evil than a deliberate choice, Python virtual environments have probably introduced more hassle than benefit in the small use-cases I have had to date. I am certainly guilty of over-complicating things when using Python - my default position in the past has always been to install the Anaconda distribution, and occasionally Intel Python as well. In reality I hardly ever used Intel Python, or the high-performance functionality of Anaconda. As a result I've concluded that in future I'll stick to the system Python, or use Intel Python/Anaconda exclusively inside a VM or container. More generally, I need to do some more tests to decide between simply using pip+venv and using poetry, since these seem to be the best current options.
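For reference, the "simple" end of that comparison needs nothing beyond the standard library. A minimal sketch, assuming a per-project ".venv" directory (a common convention, not something mandated by Python) and a helper name of my own invention:

```python
import venv
from pathlib import Path


def create_project_env(project_dir: Path) -> Path:
    """Create a .venv directory inside project_dir and return its path."""
    env_dir = project_dir / ".venv"
    # with_pip=False keeps this fast and offline; pass True to bootstrap pip.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    env = create_project_env(Path.cwd())
    print(f"created {env}")
```

This is the programmatic equivalent of running "python3 -m venv .venv"; everything beyond that (dependency resolution, lock files, publishing) is where a tool like poetry would have to earn its place.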

Docker containers: Using Docker for local development environments has never really appealed to me. Given that my work doesn't involve the development of persistent services, the benefits of matching a development workflow to a production one are mostly lost. Without this benefit, Docker really becomes just a convoluted and opaque alternative to running in a VM - as such I have tended to avoid the hassle.

Other application containers: Working in the HPC space, I have a natural disposition towards containers other than Docker - namely Singularity, Charliecloud, and more recently Sarus. While I toyed with the idea of Singularity-based development environments, the default approach to bind-mounting host directories wasn't really what I wanted. Similarly, immutable container images are what I want when building an executable - but not when doing ad-hoc experimentation with software packages.

Vagrant VMs: For a time, managing VirtualBox VMs with Vagrant was my preferred option for personal projects. However, I never really liked that the VirtualBox application itself had a license structure which effectively precluded professional use, or that the Vagrant templates themselves were Ruby syntax. While I have nothing against the language, it seemed unnecessarily burdensome that I needed to know anything about a particular programming language rather than just using a standard, well-documented config file syntax like YAML or TOML. Using alternative Vagrant providers had some appeal, but I soon decided that keeping a bunch of plugins up to date was an unnecessary hassle.

LXD/LXC: In recent months, LXD with ZFS-backed storage has become my go-to solution for isolated environments. Creating a system container is fast, and the set of available images is acceptable (albeit rather biased towards Ubuntu). LXD is about as lightweight an interface as I could ask for without skimping on any features - my one minor complaint is that customization is only really possible via quite "raw" methods like using cloud-init.

Multipass: I only became aware of Multipass very recently (late 2019), but quickly concluded that if LXD wasn't the right answer for me, then this might do the trick instead. Most of the ease-of-use features I want (copying files in/out, mounting directories) are available in Multipass by default, while LXD requires tweaks or hacks to the host OS. These are trivial enough, but knowing that the problem is being handled in an "official" way is nice. The downside is that despite now being in a 1.0 release, there still seem to be significant bugs - I've yet to get a multipass VM to remain accessible across host sessions when suspending or rebooting.

The One True Answer (for now, at least)

Realistically, I'll probably just use either Multipass or LXD as the "outer" layer for projects, with Python virtualenvs internally where necessary. Both have an acceptable approach to bind-mounting directories from outside, though the automation of UID/GID mapping in Multipass probably means that it will win out in the long run if the problems can be resolved. If it turns out using LXD is still part of my workflow in the coming months, it will almost certainly be because of ZFS and the ease of managing storage. I'm currently considering the best way to manage an isolated ZFS pool (based on the same approach as LXD, ie a file-system in a file) which can be shared between Multipass VMs. However, the fact that this is best achieved by just using a directory mounted from the host raises the question of what benefit ZFS is actually adding in this instance.

Regardless of the choice of Multipass or LXD for workspace isolation, enforcing git-by-default and a simpler Python workflow will definitely help to make my life easier.

The Desktop App

Recently (that is, for the last 2-3 years), I've resisted using the desktop version of browser-based applications. Either they tend to be incredibly bloated and slow compared to the browser itself, or they have a shoddy half-finished feel about them.

This one seems different. While writing this short blog, I got distracted by testing the creation of AppImage binaries (of which the StandardNotes app is one). When I came back to finish off the post, I forgot which window I had been in a couple of times, and flipped back and forth between my "home" browser tabs and the desktop version. The fact that I couldn't really tell the difference means that everything is working as intended.

Aesthetic

It's now been just under two weeks since signing up for StandardNotes.

Boy - do I owe the creator an apology. "Slightly uglier Simplenote"? Now I've adjusted to the layout, I can't imagine anything better. Over this two-week period I've written more - and been inspired to structure my ideas more - than at any time over the last several years. That impulse has leaked over into my use of other tools, in a positive way.

I'm still on the fence about whether this should be the "permanent home" for a blog. A GitHub/GitLab static site has a certain appeal due to being more familiar - but avoiding the need to generate the static pages and push them to a repo is nice. The reduced friction of being able to just type something, hit publish, and "fuhgeddaboudit" is somehow quite freeing.

Link Dump #1 - 2019-12-05

The first of (hopefully) many link dumps - a place for me to offload the interesting articles or pieces of software I come across while trawling the internet.

Python

A consistent frustration for me has been the difficulty of using even moderately complicated Python projects across multiple devices. A lot of my recent reading has involved a search for whatever is the canonical/best approach to bundling dependencies together as a portable development environment, and also creating a simple binary distribution.

I haven't come across the ultimate solution yet - but these are some of the tools or guides which will hopefully ease the burden the next time I have to tangle with Python.

Pex: A Python tool for generating executables from Python environments or individual packages.
Voila: A tool for converting Jupyter notebooks to interactive web dashboards, while retaining a functional compute kernel.
Pipx: A tool for consuming already-published Python packages as regular binaries.
Poetry: A popular alternative package/environment manager. Although I haven't used it extensively, my feeling is that this tool is most likely the one which will allow me to consider my frustrations with Python "solved".
PyOxidizer: The most modern in a selection of tools which promises the ability to generate application binaries from Python projects.
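As a baseline for comparison with the tools above, the standard library's zipapp module already covers the most basic version of "turn a Python project into one runnable file". The sketch below is not Pex or PyOxidizer: it bundles neither an interpreter nor third-party dependencies, and the "hello_app" project is a throwaway invented for the demo.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path


def build_pyz(source_dir: Path, target: Path) -> Path:
    """Bundle a directory containing __main__.py into a runnable .pyz file."""
    zipapp.create_archive(source_dir, target=target, interpreter="/usr/bin/env python3")
    return target


def demo() -> str:
    """Build a throwaway one-file app, run it, and return its output."""
    with tempfile.TemporaryDirectory() as tmp:
        app_dir = Path(tmp) / "hello_app"  # hypothetical project name
        app_dir.mkdir()
        (app_dir / "__main__.py").write_text('print("hello from a zipapp")\n')
        archive = build_pyz(app_dir, Path(tmp) / "hello.pyz")
        result = subprocess.run(
            [sys.executable, str(archive)], capture_output=True, text=True, check=True
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print(demo())
```

The resulting file still needs a compatible Python on the target machine, which is exactly the gap the more ambitious tools in this list are trying to close.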

Chess

I recently started playing chess (using the Lichess Android app). Having progressed from "beaten every time by the most basic AI" to "find the basic AI annoyingly bad in its stupidity", I'm ready to step up to Stockfish 2! Thinking about how modern chess engines compare to real humans sent me down a rabbit hole of learning about how they are developed.

Stockfish: The home of the most popular chess engine, which is also one of the strongest.
Chessprogramming Wiki: A handy resource for anyone exploring the idea of writing their own chess engine, or those with an interest in the techniques used to make the engine more or less competent.

Rust

Learning some form of low-level programming language has been on my to-do list for far too long. Having never really progressed beyond useful-but-not-professional Python, I recently decided that targeting C/C++/Fortran to do anything numerical was probably a waste of time; anything I need will be far better served via existing open-source libraries than whatever novice version I could put together.

That decision led me towards Go and Rust, both of which have the appeal of being potentially useful for writing CLI applications - an area where I'm far more likely to be able to put my learning to real productive use.

The Rust Programming Language: A free online version of the (definitive?) guide to Rust. I have a copy of the physical book, and so far it is probably the best text I've come across about a particular programming language.

Encryption

A topic I find fascinating, but don't have much scope to experiment with on a day-to-day basis. One of my general concerns has been that gpg is easy enough after a bit of practice or memory-jogging, but not simple enough to be a comfortable default choice when only using it on a very occasional basis.

Toplip: A CLI tool for encryption with steganography and plausible-deniability features.

Raspberry Pi

My current count is 10 Raspberry Pi devices scattered around the house. One is permanently occupied as a bastion host/generic testing device, while the others tend to spend a few weeks in a cluster configuration before I get bored and put them aside, only to come back to them a few months later. My plans for the next round include a k3s cluster and a set of lightweight file-sync nodes. In that vein...

k3sup: A tool to simplify the deployment of k3s. While k3s is already a seemingly much more straightforward version of Kubernetes, impatient people like me can benefit from the "quick wins" a tool like this provides, and use it as a staging point to start learning more.
Resilio Sync on a Raspberry Pi: Many moons ago I was a BTSync user, but moved away from it as I was lacking a use-case. More recently I have decided to try to ditch Dropbox and Google Drive, so a private p2p sync tool seemed like a logical consideration. I tried Syncthing, but wasn't impressed by its performance or stability. While Resilio is proprietary, the improved user experience seems to more than make up for this so far.

Operating Systems

I've now been a preferential user of Linux for over 10 years. One thing I've consistently overlooked is the world of open-source, or generally non-Windows operating systems outside the Linux space. People who have worked with a modern Unix OS seem to swear by them; so these are the two I intend to try when time allows.

OmniOS: A Unix OS, which might be useful as a means to learn about Illumos-based systems, and also make a nice hypervisor for Linux VMs/containers, with ZFS as the underlying storage.
SmartOS: Another Unix OS, seemingly an even better choice for a pure hypervisor deployment.

Hello World!

I'm experimenting with StandardNotes, after seeing it mentioned on HackerNews and being sucked in by a Black Friday deal.

Initial impressions weren't so great; it felt like I'd just signed up to pay for a slightly uglier version of Simplenote, which I've already used for years.

This feature might change my mind, however...