I consider myself a moderately fluent Pythonista — I like to think I can decorate, iterate, & comprehend lists with the best of ‘em. However, organizing Python source — especially for bigger projects — has always been a major blind spot for me. Due to Fear!!, Terror!!, Dread!!, and Severe Tilting!!! over cryptic, intransigent python import errors and big long stackoverflow pages bereft of copy/paste miracle magic I’ve always just fallen back to keeping everything in a single monster scripts — worse yet, duplicating everything across a wicked bramble of sibling script files.
Organizing and writing with a library mindset is the better way to go:
- source can be stored as small, digestible pieces with descriptive names,
- source can be reused better so there’s less duplicated effort and shotgun surgery, and
- source can be pampered with Good Practices like unit testing and formal documentation.
So, this year I’ve had a bit of a sit-down with myself to try to build some skills in this department with Python. After rounding my usual miserable circuit of fruitless fiddling and googling, I clocked a new (to me) idea that finally helped me switch over to a library organization model for Python in my projects.
Before we get to my One Weird Trick, let me bore you with a brief description of my long-held project organization dream and
🔗 My Dream K̶i̶t̶c̶h̶e̶n̶ Project
Let’s say that I’m working on a very swell codebase called
I’d like to be able to organize all my reusable python components into a single folder — maybe called
I’m going to want to use these components from some Jupyter notebooks.
Lately, I’ve taken to shoving these all into a folder called
binder/ and hosting them via https://mybinder.org.
I usually also want to use these components from a variety of scripts that will be invoked directly from the command-line.
I’d like to organize them in a few folders named based on what they’re actually used to do — typically stuff along the lines of
postprocessing/ and the like.
I’ll probably have a few non-python odds and ends (bash ) living inside
Here’s what a file tree for that project might look like.
my_yeet_project ├── binder │ ├── cheuggy_graphs_n_stats.ipynb │ └── sheeshish_graphs_n_stats.ipynb ├── postprocessing │ ├── gaggy_bash_shenanigans.sh │ └── sheesh_n_cheugify_data.py └── yeetpylib ├── calculate_cheug.py ├── calculate_sheesh.py └── __init__.py
🔗 Why We Can’t Have Nice Things
Wouldn’t it be nice if we could just access
yeetpylib by relative import from our scripts and notebooks, something like
Wouldn’t it, indeed.
Unfortunately, we’ll run into a
ImportError: attempted relative import with no known parent package.
We could just install
yeetpylib locally with
pip, but in my lived experience this arrangement quickly unravels into an unholy nightmare to try to keep the installed version up-to-date, especially in situations where the library’s actively being developed side-by-side with an end-script or when you want scripting to be guaranteed-consistent with your source at a particular revision.
Taking this one step further, we could also bundle
yeetpylib as a PyPi package.
I’ve had moderate payoff spamming PyPi with little nuggets I want to re-use across projects (like keyname, iterpop, and iterdub).
However, this only works well for well-defined components are one-and-done-able so you can just pin and forget them.
The local install frustrations mentioned above would pair dreadfully with the aperitif of constantly (forgetting to) deploy your latest changes.
🔗 My One Weird Trick (TM): Symlinking
Soft symlinks are a special kind of file that redirects your operating system to go grab a file (or folder) form somewhere else.
For example, if I had this file tree where
my_symlink was set up to redirect to
. ├── my_folder │ └── my_symlink └── my_file
and I were to run
cat my_folder/my_symlink the contents of
my_file would be printed out.
Because they’re a redirect rather than a copy, the contents of
my_symlink will always be up to date with
my_file (even if we change it later) without us having to do anything!
As a bonus, because they’re basically just a file that stores a path to redirect to they work good with Git. (So long as your redirect path is relative and doesn’t stretch outside the scope of your repository.)
Soft symlinks are very useful as a get-out-of-jail-free card for Python import errors.
If we symlink
yeetpylib into all the folders with scripts or notebooks where we want to use it, we can freely
to our heart’s content!
Here’s what a file tree for
my_yeet_project might look like with symlinks in place.
my_yeet_project ├── binder │ ├── cheuggy_graphs_n_stats.ipynb │ ├── sheeshish_graphs_n_stats.ipynb │ └── yeetpylib -> ../yeetpylib ├── postprocessing │ ├── gaggy_bash_shenanigans.sh │ ├── sheesh_n_cheugify_data.py │ └── yeetpylib -> ../yeetpylib └── yeetpylib ├── calculate_cheug.py ├── calculate_sheesh.py └── __init__.py
To set up these soft symlinks, you can jump into the directories you want to read
yeetpylib from and run
ln -s ../yeetpylib yeetpylib
🔗 Let’s Chat
I would love to hear your thoughts on Python project organization!! I’m really looking to glow-up my wisdom on wholesome, anti-cringe practices in this space.
I started a twitter thread (right below) so we can chat
wow guess my blog's been on a bit of a Divine Comedy kick lately 🥴— mmore500 (@mmore500) May 15, 2021
anyways this one's on symlinking my way out of #Python ImportError hell 🔥🔥🐍🔥
aight catch u later gotta get back 2 simping for virgilhttps://t.co/QnSAo7IsVZ https://t.co/03JTezsHcy pic.twitter.com/dUqjrIhihI
Pop on there and drop me a line or make a comment