Spark Tseung

Tech/non-tech notes and random thoughts. Views and opinions expressed here are mine only. They do not represent those of my former and current employers, or any other organizations I am affiliated with.

Minimal Example for Python Multiprocessing

The following is a minimal example of multiprocessing in Python. A very useful guide written by Jan Bodnar is here.

    from multiprocessing import Process

    def fun(i):
        return i

    def main():
        proc = []
        # Start one process per input value, 1 through 100.
        for i in range(1, 101):
            p = Process(target=fun, args=(i, ))
            p.start()
            proc.append(p)
        # Wait for every process to finish.
        for p in proc:
            p.join()

    if __name__ == '__main__':
        main()

...
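In the example above, the return value of fun is discarded, because Process does not pass a target's return value back to the parent. The following is a minimal sketch of one way to collect the results, assuming a Pool fits the use case; the worker count of 4 is an arbitrary choice for illustration.

```python
from multiprocessing import Pool


def fun(i):
    return i


def main():
    # Pool.map splits the inputs across worker processes and
    # returns the results in the same order as the inputs.
    with Pool(processes=4) as pool:
        return pool.map(fun, range(1, 101))


if __name__ == "__main__":
    results = main()
    print(sum(results))
```

The if __name__ == "__main__" guard matters here as well: on platforms that start workers by re-importing the main module, it keeps the workers from launching pools of their own.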

# %% scripts are superior to Jupyter Notebooks

Jupyter notebooks, while popular as an entry point to data science, have many shortcomings. The drawbacks and dangers of over-reliance on Jupyter notebooks are best summarized in this 2018 talk by Joel Grus. The percent format for Python scripts, denoted by # %%, is a great way to replace Jupyter notebooks. The percent format is supported by many editors, most notably VS Code. The official guide offers intuitive examples as an introduction to the percent format and allows for a smooth transition from no...
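As a rough illustration of the percent format, the sketch below is an ordinary Python script in which each # %% line starts a new cell. The data and variable names are made up for this example, and the # %% [markdown] marker for text cells follows the convention used by VS Code and Jupytext.

```python
# %%
# Cell 1: imports and some toy data.
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]

# %%
# Cell 2: compute summary statistics.
mean = statistics.mean(values)
stdev = statistics.pstdev(values)

# %% [markdown]
# Text cells are plain comments placed under a `# %% [markdown]` marker.

# %%
# Cell 3: report the results.
print(f"mean={mean}, stdev={stdev}")
```

Because the cell markers are only comments, the same file runs top to bottom with a plain python command, and it diffs cleanly under version control.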

venv for Python Virtual Environments

The venv module is part of the Python standard library. It creates isolated virtual environments, which helps keep package dependencies reproducible. Suppose you have a folder for a Python project. The following command creates a new virtual environment.

    python -m venv /path/to/new/virtual/environment

Then, use the following command to activate (or "enter") this virtual environment.

    source /path/to/new/virtual/environment/bin/activate

Now we are working in an isolated Python environment: everything Python-...
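The same module can also be driven from Python. The sketch below creates an environment in a temporary directory and prints its pyvenv.cfg file. The create_env helper and the demo-env folder name are hypothetical, and with_pip=False is chosen only to keep the example fast and offline (the command-line form installs pip by default).

```python
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path) -> Path:
    """Create a virtual environment at env_dir and return the path to its config file."""
    # with_pip=False skips bootstrapping pip into the new environment.
    venv.create(env_dir, with_pip=False)
    return env_dir / "pyvenv.cfg"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_env(Path(tmp) / "demo-env")
        # pyvenv.cfg records which base interpreter the environment was built from.
        print(cfg.read_text())
```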

Book Review: Data Science at the Command Line

Back in 2022, I came across a free e-book called Data Science at the Command Line, written by Jeroen Janssens. I had a quick and light read. My immediate thoughts after reading it were: Well, this is interesting... I didn't know you could do this much data manipulation and analysis through the command line alone. Nah... I will probably stick with scripts and notebooks. The command line still has its limits. Overall, the book is quite well-written: a friendly and easy introduction, abundant code exam...

Whoogle: Google but no ads

Whoogle is an open-source project that allows you to get "clean" Google search results, e.g., no ads or sponsored content, no JavaScript, no cookies, limited tracking, etc. It is very easy to set up locally with Docker - literally just two commands to pull it down and spin it up. I've been using it for weeks with a great experience - primarily to remove ads, disassociate search results from my Google account, and remove URL tracking when clicking results. However, my IP address is still visibl...

VB-Cable for Virtual Audio Cables

VB-Cable is probably the simplest way to set up a virtual audio cable between two applications. The software is free and easy to download and install. It adds two virtual audio devices to your computer: CABLE Input (a playback device, like a virtual speaker) and CABLE Output (a recording device, like a virtual microphone). The following is a typical use case. You have some program to add filters and tweaks to your microphone input, e.g. using OBS or specialized software. You prefer this processed audio and would like to use it el...

Inconsistency in Python library names

It is well known (or perhaps not) that some Python packages have seemingly inconsistent names. Keeping this in mind might save you a lot of unnecessary debugging headaches. For example, the package multiprocessing can speed up certain tasks through parallelism. It is imported as follows.

    import multiprocessing

Meanwhile, a nonsensical error occurs if you try to install said package with pip install multiprocessing. It turns out the installation should be: pip install multiproces...