Why you should use a Reference Manager, as a Software Engineer
April 20, 2021 • 964 words
The little secret no one told the non-academic folks.
We tend to see the world through the lens of the accumulated media that influence us. The same goes for your Software Engineering career; every technical blog post, research paper, book, online course, video, etc. you consume shapes the way you tackle a problem. Some pieces of information stand out more than others. And some pieces even give you that Aha! moment where a foggy concept becomes crystal clear. It would be great to keep those eye-opening resources at hand whenever we need a refresher or want to share them with a friend.
How can we keep track of those valuable chunks of information spread out across the Internet?
Our digital knowledge is scattered among various services and keeping track of a piece of information depends on the format of the media.
You might add the last book you read to your Goodreads bookshelf. If it's a YouTube video, you like it and maybe add it to a playlist. Research papers you read, annotate, and then never come back to because you forgot what they relate to. Maybe you use Pocket for interesting blog articles and read 1 out of 10 bookmarked posts. Each of those systems has its own metadata system, and information retrieval becomes tedious.
A use case - Indexing research papers
I tried several techniques to organize my PDFs. For research papers, I used to have a tag system with the colored labels on macOS. Red for an unread paper, green for read, yellow for outstanding. But this one-dimensional approach is limited. If you store all those papers in a single directory you quickly get overwhelmed, so why not use several folders? Ok, let's take an example: I have two folders; one for papers related to Distributed Computing, one for Machine Learning. Now let's say I find a great paper on managing ML models in a distributed environment, where do I put it? In the Distributed Computing folder? Or in the Machine Learning folder? Do I create a Distributed Machine Learning folder? Every one of those answers makes information retrieval harder.
A solution, using Reference Management Software
We can take inspiration from the Academic World. Academics read conference proceedings and other long, scary PDFs almost every day. Their career is built on their intellectual capital and their ability to connect the dots between results of past experiments to come up with innovative ways to address a problem and move Science forward. To achieve this goal, they need tools to organize their knowledge with few distractions. We share at least this minimal set of features with Academic folks:
Storing a reference and its metadata - a Write
Retrieving a reference efficiently with multiple search criteria - a Read
In Database speak, the system we aim to design should sustain a heavy read rate and a low write rate. It's ok that it takes more time for us to "write" a new record in the reference management system (adding metadata: author, publisher, source url, year, keywords, abstract, etc.) since it allows cheap read operations. The faster you retrieve information, the more likely you'll re-use the tool over time.
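To make the write/read trade-off concrete, here is a minimal sketch in Python. It is not how Bookends or Zotero work internally; the `Reference` and `ReferenceStore` names, the keyword index, and the two sample entries are all made up for illustration. The "write" does a bit of extra work (indexing keywords) so that the "read" can combine several criteria cheaply.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Reference:
    """One record: the metadata we pay for at write time (hypothetical schema)."""
    title: str
    authors: list[str]
    year: int
    url: str = ""
    keywords: set[str] = field(default_factory=set)


class ReferenceStore:
    """Toy in-memory store: slightly costlier writes in exchange for cheap reads."""

    def __init__(self) -> None:
        self._records: list[Reference] = []
        self._by_keyword: dict[str, set[int]] = {}

    def add(self, ref: Reference) -> None:
        """The 'write': store the record and index each of its keywords."""
        position = len(self._records)
        self._records.append(ref)
        for keyword in ref.keywords:
            self._by_keyword.setdefault(keyword.lower(), set()).add(position)

    def search(self, *keywords: str, author: Optional[str] = None) -> list[Reference]:
        """The 'read': records matching every keyword and, optionally, an author."""
        positions = set(range(len(self._records)))
        for keyword in keywords:
            positions &= self._by_keyword.get(keyword.lower(), set())
        results = [self._records[i] for i in sorted(positions)]
        if author is not None:
            needle = author.lower()
            results = [r for r in results if any(needle in a.lower() for a in r.authors)]
        return results


# Placeholder entries, not real papers.
store = ReferenceStore()
store.add(Reference(
    title="Managing ML models in a distributed environment",
    authors=["Doe, J."],
    year=2020,
    keywords={"distributed computing", "machine learning"},
))
store.add(Reference(
    title="A distributed computing primer",
    authors=["Roe, R."],
    year=2018,
    keywords={"distributed computing"},
))

# One record can carry both keywords, so the "which folder?" question goes away.
both = store.search("machine learning", "distributed computing")
print([r.title for r in both])
```

A record lives in one place and is reachable from every keyword attached to it, which is the property the folder-based approach lacks.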
It literally takes less than a minute to add an entry to your reference manager. You're helping your future self by doing this and adding those tiny bits of metadata that make that paper, blog, or video meaningful to you.
A reference manager gives you that one-stop shop for the pieces of information that build up your intellectual capital. Usage of such a tool pays off in the long term, like 2+ years IMO. Admittedly, a folder-based approach is good enough to get started. But once you realize how hard it is to maintain your knowledge base, consider using a reference manager.
Helping your future self...
...and others along the way
So what are the benefits of using a reference manager?
It enforces a consistent metadata system and naming conventions, which speeds up information retrieval.
You now have your one-stop shop for all the resources that have contributed to your intellectual growth. Books, papers, but also YouTube videos and other non-academic formats. I even have a record for this amazing YouTube video distilling the intuition behind the Fourier Transform. I could never get such a visual intuition through a textbook. (I just typed Fourier in my Bookends to get that link :) )
It becomes easier to share a reference with a friend. Discussing string encoding with a colleague? It's likely they'll be interested in reading Spolsky, J. (2004). The absolute minimum every software developer absolutely, positively must know about unicode and character sets (no excuses!). Retrieved 2021-04-19, from https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
Low on disk space? It's ok, you can store the metadata of a book/paper/PDF/video and not have the actual data on your disk. The wise man knows where to look it up.
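As an illustration of what a consistent naming convention can look like, here is a small Python sketch that derives a predictable key from a record's metadata. The `citation_key` function and its author-year-first-word scheme are assumptions made for this example, not the format Bookends or Zotero actually use.

```python
import re
import unicodedata

# Short words skipped when picking the first meaningful word of a title.
STOP_WORDS = {"a", "an", "the", "on", "of", "in", "for", "and", "to"}


def _ascii_lower(text: str) -> str:
    """Strip accents and lowercase, so 'Müller' becomes 'muller'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii").lower()


def citation_key(surname: str, year: int, title: str) -> str:
    """Build a key from surname, year, and the first non-stop word of the title."""
    words = re.findall(r"[a-z0-9]+", _ascii_lower(title))
    first_word = next((w for w in words if w not in STOP_WORDS), "untitled")
    clean_surname = re.sub(r"[^a-z]", "", _ascii_lower(surname))
    return f"{clean_surname}{year}{first_word}"


key = citation_key(
    "Spolsky",
    2004,
    "The absolute minimum every software developer absolutely, positively "
    "must know about unicode and character sets (no excuses!)",
)
print(key)  # spolsky2004absolute
```

Because the key is computed from the metadata rather than typed by hand, two people (or you, two years apart) end up with the same name for the same reference.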
Getting practical, two interesting reference management systems
I hesitated between Zotero and Bookends but ultimately went with Bookends as my reference manager on macOS. They offer very similar features and if I were a Linux user, Zotero would be my go-to solution.
Their interfaces have few distractions, so you can focus on the content. The many panels and options make the tools overwhelming at first, but the key features of adding and searching for a document are intuitive to use.
Zotero is open source and completely free, while Bookends has a price tag of 63€ as a one-time purchase (I'd never go for a subscription service for something I know I'll use my whole career if I can answer the need through a one-time payment).
After I tried them both, the UI and iCloud sync feature of Bookends made me choose it.