How I Structure Python Scripts

Python scripts can quickly become complex beasts, especially when nested if statements and long routines come into play. But what if there were a way to tame this complexity and make your code more readable and maintainable? I'm going to describe a way to structure a Python script so that it stays easy and clear to follow regardless of its complexity.

This pattern may not suit every scenario, but for most processes that are contained in one or two scripts I find it very useful. The guiding principle is to make it easy to understand what the code is doing.

I've seen many scripts that just run from top to bottom, with multiple levels of nested if statements and very few functions to handle repetitive routines. When someone new looks at one (including you, six months later), they can spend hours just trying to decipher it. Such scripts are rarely well commented, either. But even with comments, you tend to end up scrolling up and down repeatedly, trying to piece together the big picture.

Describing the big picture in code is the essence of this pattern. Most textbooks start with a table of contents that outlines the available topics and lets readers jump straight to the section that interests them. Setting up your script in a similar way gives the same benefit.

Here's a simple example:

# imports go at the very top
import os

# Any constants or global variables follow the imports
important_global_variable = "Hi! 👋🏻"

def main():
    """The main function is the high level outline of the script.
    It is placed at the top of the script after any global variables.
    This makes sure it is the first thing a developer sees when they open
    the file."""

    # The names of the functions themselves are self describing
    do_this_first()
    then_do_this()
    and_then_do_this()

def do_this_first():
    """Every major step is defined inside a function"""
    # Do stuff here
    return

def then_do_this():
    """The functions have descriptive names to make it clear what they do"""
    # Do some other complicated stuff here
    return

def and_then_do_this():
    """The function names read like commands clarifying their intended actions"""
    # Finally do the last stuff
    return

# This if statement goes at the very end of the script.
# It should read exactly like this.
if __name__ == '__main__':
    # The only thing it needs to do is call the main() function
    main()
    # But during development you can also swap main()
    # for one of the other functions if you want to test
    # one of the other steps in isolation, e.g.
    # then_do_this()

The key components of this structure are the main() function at the top and the if statement at the very end that calls main().[1] The name main is simply a convention. You could name this function anything you like, such as my_script(); you just need to make sure you call it at the end. Note that even if you name it something else, the if condition stays exactly the same. When Python runs a file as a script (for example, python my_script.py), it sets that module's built-in __name__ variable to the string '__main__'.[2] When the same file is imported by another script, __name__ is set to the module's name instead, so the condition is false.
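To see the two values of __name__ side by side, here is a small sketch. It assumes a throwaway file with the hypothetical name my_script.py, written to a temporary directory. runpy.run_path with run_name="__main__" stands in for running the file directly, and importlib.import_module stands in for another script importing it.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

# Hypothetical one-line script that records the value of __name__ it sees.
SCRIPT_SOURCE = "seen_name = __name__\n"

with tempfile.TemporaryDirectory() as tmp:
    script_path = Path(tmp) / "my_script.py"
    script_path.write_text(SCRIPT_SOURCE)

    # Run the file as the top-level script, roughly what `python my_script.py` does.
    as_script = runpy.run_path(str(script_path), run_name="__main__")["seen_name"]

    # Import the same file as a module, the way another script would.
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        as_module = importlib.import_module("my_script").seen_name
    finally:
        # Clean up so the demo leaves no trace in the import system.
        sys.path.remove(tmp)
        sys.modules.pop("my_script", None)

print(as_script)  # __main__
print(as_module)  # my_script
```

Only the first case satisfies if __name__ == '__main__':, which is why main() runs when you execute the script but not when another script imports it.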

The next crucial component of the pattern is that every major step of the script is defined inside a function. You might wonder how main() can call these other functions even though they are defined after it. The reason is that Python executes a function's body only when the function is called, and it looks up the names used inside that body at that moment. When this script runs, the interpreter reads (i.e., executes) each line from top to bottom as it always does. When it reaches def main(): or def do_this_first():, it is being instructed to define those functions, not to run them. It isn't until Python reaches the end of the script that it is instructed to actually do something, i.e., run main(). By then, all the functions have been defined, so main() is free to call any of them.
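A quick way to convince yourself of this is the sketch below, which uses a hypothetical helper() function. Calling main() before helper() is defined raises a NameError, while the same call after the def statement succeeds, because the name is resolved when main() runs rather than when it is defined.

```python
def main():
    # The name `helper` is looked up only when main() runs, not when it is defined.
    return helper()

# Calling main() before helper() exists fails at call time...
try:
    main()
except NameError as exc:
    early_error = str(exc)

def helper():
    return "helper ran"

# ...but once every def has executed, main() can call helper() freely.
late_result = main()

print(early_error)  # name 'helper' is not defined
print(late_result)  # helper ran
```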

My sample script is probably too simple for the majority of real-world scenarios. Typically, the contents of the main() function are a little more sophisticated. For example, you might need to pass arguments to the major functions or return a value. You could also have some light control flow logic (if or try).

def main():
    # db_name, sql_query and the helper functions are defined elsewhere in the script
    connection = get_db_connection(db_name)
    data = get_some_data(connection, sql_query)
    num_rows = count_the_rows(data)
    if num_rows == 0:
        print("No data found")
        # Let's quit. Return now instead of continuing
        return
    data = clean_up_data(data)
    save_results_to_csv(data, "results.csv")

Resist the temptation to add too much implementation detail to main(). Recall that the guiding principle is to make it easy to see what the script is intended to do. In the previous example, the whole main() function reads almost like pseudo-code. One quick scan gives a clear picture of what the script does, while the implementation details are relegated to the other functions later in the script.
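For a version of the example that actually runs, here is one possible fleshing-out using Python's built-in sqlite3 and csv modules. Apart from the step names in main(), everything is an assumption made for illustration: the in-memory database, the demo table that the connection helper seeds, the query, the cleanup rules, and the choice to have main() return the number of rows written so the result can be checked.

```python
import csv
import sqlite3

# Hypothetical stand-ins for the undefined db_name and sql_query in the example.
DB_NAME = ":memory:"
SQL_QUERY = "SELECT name, score FROM results ORDER BY name"


def main(db_name=DB_NAME, output_path="results.csv"):
    """High-level outline: each line is one step of the script."""
    connection = get_db_connection(db_name)
    data = get_some_data(connection, SQL_QUERY)
    connection.close()
    if count_the_rows(data) == 0:
        print("No data found")
        return 0
    data = clean_up_data(data)
    save_results_to_csv(data, output_path)
    return len(data)


def get_db_connection(db_name):
    """Open the database. For this sketch only, also seed a small demo table."""
    connection = sqlite3.connect(db_name)
    connection.execute("CREATE TABLE IF NOT EXISTS results (name TEXT, score REAL)")
    connection.executemany(
        "INSERT INTO results VALUES (?, ?)",
        [(" alice ", 9.5), ("bob", None), ("carol ", 7.0)],
    )
    return connection


def get_some_data(connection, sql_query):
    """Run the query and return all rows as a list of tuples."""
    return connection.execute(sql_query).fetchall()


def count_the_rows(data):
    return len(data)


def clean_up_data(data):
    """Trim whitespace from names and drop rows with a missing score."""
    return [(name.strip(), score) for name, score in data if score is not None]


def save_results_to_csv(data, path):
    """Write a header row followed by the cleaned rows."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "score"])
        writer.writerows(data)
```

main() still reads like the outline, while all of the SQL and file handling lives further down. A real script would end with the usual if __name__ == '__main__': main() block; it is left out here so that merely loading the sketch does not write a file.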

This example also demonstrates a small bonus tip that this pattern enables. Did you notice the return in the middle of the function? When your code runs, you frequently need to validate what is happening to avoid errors and undesired side effects if something unexpected occurs. This typically takes the form of if statements checking variables for expected values.

Here's another way this sample code might have been written. Note that this version is not defined inside a function like main().

connection = get_db_connection(db_name)
if connection is not None:
    data = get_some_data(connection, sql_query)
    num_rows = count_the_rows(data)
    if num_rows > 0:
        data = clean_up_data(data)
        save_results_to_csv(data, "results.csv")
    else:
        print("No data found")

This toy example is not difficult to comprehend, but try to extrapolate it to a more complex use case. Imagine there are 50 or 100 lines of code between the first and second if statements. Notice the progressive indentation as we go further down, and imagine there were 3, 4, or even 10 other if conditions along the way. The contents and order of the lines barely differ between this version and the main() version, yet the main() version is vastly easier for a person to read and comprehend.

Because the original version wraps the main code flow inside a function (i.e., main()), if something happens that I don't like, I can simply exit the function with a return; otherwise, it keeps going. The steps of the normal code flow also stay at the same level of indentation, which makes them easier to read.
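The same idea, reduced to a minimal sketch with hypothetical process_nested() and process_with_early_returns() functions: both return the same result for every input, but the early-return version keeps the normal flow at one indentation level and handles each problem case as soon as it is detected.

```python
def process_nested(connection, rows):
    """Nested style: each check pushes the normal flow one level deeper."""
    if connection is not None:
        if len(rows) > 0:
            return [row.strip() for row in rows]
        else:
            return "No data found"
    else:
        return "No connection"


def process_with_early_returns(connection, rows):
    """Early-return style: bail out on problems, keep the normal flow flat."""
    if connection is None:
        return "No connection"
    if len(rows) == 0:
        return "No data found"
    return [row.strip() for row in rows]


# Both versions agree on every case, including the two failure paths.
cases = [(None, [" a "]), ("conn", []), ("conn", [" a ", "b "])]
for connection, rows in cases:
    assert process_nested(connection, rows) == process_with_early_returns(connection, rows)

print(process_with_early_returns("conn", [" a ", "b "]))  # ['a', 'b']
```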

  1. Technically, the if statement is not required for this pattern to work; a bare main() call at the end of the script would also run it. But the if statement is a common convention, and it becomes more important once other scripts start importing this script to reuse its functions. In that scenario, the value of __name__ is the module's name rather than '__main__', so the guard prevents the import from running main() when you may not have intended that. It's up to you, but it's probably a good habit to maintain.
  2. python - What does if __name__ == "__main__": do? - Stack Overflow


More from Matt Carter