I consider myself a moderately fluent Pythonista — I like to think I can decorate, iterate, & comprehend lists with the best of ‘em. However, organizing Python source — especially for bigger projects — has always been a major blind spot for me. Due to Fear!!, Terror!!, Dread!!, and Severe Tilting!!! over cryptic, intransigent python import errors and big long stackoverflow pages bereft of copy/paste miracle magic I’ve always just fallen back to keeping everything in a single monster scripts — worse yet, duplicating everything across a wicked bramble of sibling script files.

humorous, relatable memes

Organizing and writing with a library mindset is the better way to go:

  • source can be stored as small, digestible pieces with descriptive names,
  • source can be reused better so there’s less duplicated effort and shotgun surgery, and
  • source can be pampered with Good Practices like unit testing and formal documentation.

So, this year I’ve had a bit of a sit-down with myself to try to build some skills in this department with Python. After rounding my usual miserable circuit of fruitless fiddling and googling, I clocked a new (to me) idea that finally helped me switch over to a library organization model for Python in my projects.

Before we get to my One Weird Trick, let me bore you with a brief description of my long-held project organization dream and

🔗 My Dream K̶i̶t̶c̶h̶e̶n̶ Project

Let’s say that I’m working on a very swell codebase called my_yeet_project. I’d like to be able to organize all my reusable python components into a single folder — maybe called yeetpylib/. I’m going to want to use these components from some Jupyter notebooks. Lately, I’ve taken to shoving these all into a folder called binder/ and hosting them via https://mybinder.org. I usually also want to use these components from a variety of scripts that will be invoked directly from the command-line. I’d like to organize them in a few folders named based on what they’re actually used to do — typically stuff along the lines of postprocessing/ and the like. I’ll probably have a few non-python odds and ends (bash :roll_eyes:) living inside postprocessing/ too.

Here’s what a file tree for that project might look like.

my_yeet_project
├── binder
│   ├── cheuggy_graphs_n_stats.ipynb
│   └── sheeshish_graphs_n_stats.ipynb
├── postprocessing
│   ├── gaggy_bash_shenanigans.sh
│   └── sheesh_n_cheugify_data.py
└── yeetpylib
    ├── calculate_cheug.py
    ├── calculate_sheesh.py
    └── __init__.py

🔗 Why We Can’t Have Nice Things

Wouldn’t it be nice if we could just access yeetpylib by relative import from our scripts and notebooks, something like

binder/cheuggy_graphs_n_stats.ipyhb or postprocessing/sheesh_n_cheugify_data.py:

import ..yeetpylib.calculate_cheug.cheugrify

Wouldn’t it, indeed. :thinking:

Unfortunately, we’ll run into a ImportError: attempted relative import with no known parent package.

We could just install yeetpylib locally with pip, but in my lived experience this arrangement quickly unravels into an unholy nightmare to try to keep the installed version up-to-date, especially in situations where the library’s actively being developed side-by-side with an end-script or when you want scripting to be guaranteed-consistent with your source at a particular revision.

Taking this one step further, we could also bundle yeetpylib as a PyPi package. I’ve had moderate payoff spamming PyPi with little nuggets I want to re-use across projects (like keyname, iterpop, and iterdub). However, this only works well for well-defined components are one-and-done-able so you can just pin and forget them. The local install frustrations mentioned above would pair dreadfully with the aperitif of constantly (forgetting to) deploy your latest changes.

🔗 My One Weird Trick (TM): Symlinking

Soft symlinks are a special kind of file that redirects your operating system to go grab a file (or folder) form somewhere else. For example, if I had this file tree where my_symlink was set up to redirect to ../my_file

.
├── my_folder
│   └── my_symlink
└── my_file

and I were to run cat my_folder/my_symlink the contents of my_file would be printed out. Because they’re a redirect rather than a copy, the contents of my_symlink will always be up to date with my_file (even if we change it later) without us having to do anything!

As a bonus, because they’re basically just a file that stores a path to redirect to they work good with Git. (So long as your redirect path is relative and doesn’t stretch outside the scope of your repository.)

Soft symlinks are very useful as a get-out-of-jail-free card for Python import errors. If we symlink yeetpylib into all the folders with scripts or notebooks where we want to use it, we can freely

import yeetpylib.calculate_cheug.cheugrify

to our heart’s content!

Here’s what a file tree for my_yeet_project might look like with symlinks in place.

my_yeet_project
├── binder
│   ├── cheuggy_graphs_n_stats.ipynb
│   ├── sheeshish_graphs_n_stats.ipynb
│   └── yeetpylib -> ../yeetpylib
├── postprocessing
│   ├── gaggy_bash_shenanigans.sh
│   ├── sheesh_n_cheugify_data.py
│   └── yeetpylib -> ../yeetpylib
└── yeetpylib
    ├── calculate_cheug.py
    ├── calculate_sheesh.py
    └── __init__.py

To set up these soft symlinks, you can jump into the directories you want to read yeetpylib from and run

ln -s ../yeetpylib yeetpylib

🔗 Let’s Chat

I would love to hear your thoughts on Python project organization!! I’m really looking to glow-up my wisdom on wholesome, anti-cringe practices in this space.

I started a twitter thread (right below) so we can chat :phone: :phone: :phone:

Pop on there and drop me a line :fishing_pole_and_fish: or make a comment :raising_hand_woman: