Profiling slow imports in Python
This week at $WORK I went on a bit of a side quest investigating why pytest collection was so slow. Just running pytest --collect-only from our project root was taking a painfully long time.
Following the advice in awesome-pytest-speedup, I discovered that imports were taking up a significant amount of time after running:
python -X importtime -m pytest &> profile
Then I pasted the output into a nice profile viewer: https://kmichel.github.io/python-importtime-graph
The output looks something like this:
[Image: screenshot of the profile viewer output]
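If you'd rather stay in the terminal, the same data can be summarised with a few lines of Python. This is only a rough sketch: it assumes the usual `import time: self [us] | cumulative | imported package` line format that `-X importtime` writes, and the function names `parse_importtime` and `slowest_imports` are my own, not part of any tool.

```python
import re

# Matches data lines such as:
#   import time:       123 |        456 |   package.module
# The header row has words instead of numbers, so it does not match.
_LINE = re.compile(r"^import time:\s+(\d+)\s+\|\s+(\d+)\s+\|\s+(\S+)\s*$")


def parse_importtime(text):
    """Return (self_us, cumulative_us, module_name) tuples from -X importtime output."""
    rows = []
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:  # skips the header row and any unrelated pytest output
            self_us, cumulative_us, name = match.groups()
            rows.append((int(self_us), int(cumulative_us), name))
    return rows


def slowest_imports(text, limit=10):
    """Return the rows with the largest cumulative time, slowest first."""
    rows = parse_importtime(text)
    return sorted(rows, key=lambda row: row[1], reverse=True)[:limit]
```

Reading the `profile` file written by the command above and passing its contents to `slowest_imports()` should give a quick list of the heaviest imports. Cumulative times include nested imports, so a parent package and its submodules will usually show up together.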
One surprising culprit I found was the stripe library. Take the following example script:
import stripe  # noqa

def main():
    print('hello world')

if __name__ == "__main__":
    main()
If we benchmark this script against a standard hello world program, the results are pretty shocking:
$ hyperfine --warmup 10 'python main.py' 'python with_stripe.py'
Benchmark 1: python main.py
Time (mean ± σ): 14.5 ms ± 0.4 ms [User: 10.9 ms, System: 3.4 ms]
Range (min … max): 13.6 ms … 17.3 ms 190 runs
Benchmark 2: python with_stripe.py
Time (mean ± σ): 761.0 ms ± 4.4 ms [User: 722.8 ms, System: 33.9 ms]
Range (min … max): 754.2 ms … 768.0 ms 10 runs
Summary
python main.py ran
52.58 ± 1.64 times faster than python with_stripe.py
With the seemingly innocuous stripe import, the script takes a whopping 761 ms to run, compared to 14.5 ms without it.
Not only will this slow down pytest collection, but it will also impact the startup speed of any applications or scripts that transitively depend on stripe.
Unless you've carefully architected your application, you might find that a lot of code paths are impacted.
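To check whether other dependencies carry a similar cost without reaching for hyperfine, something like the following can give a ballpark figure. It's a minimal sketch, not a rigorous benchmark: the helper names `time_import` and `import_overhead` are made up for illustration, and it simply times a fresh interpreter with and without the import.

```python
import subprocess
import sys
import time


def _time_command(code, runs):
    """Best-of-N wall-clock seconds to run `python -c code` in a fresh interpreter."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", code], check=True)
        timings.append(time.perf_counter() - start)
    return min(timings)


def time_import(module_name, runs=5):
    """Seconds to start Python and import module_name (includes interpreter start-up)."""
    return _time_command(f"import {module_name}", runs)


def import_overhead(module_name, runs=5):
    """Rough import cost: time with the import minus bare interpreter start-up."""
    baseline = _time_command("pass", runs)
    return time_import(module_name, runs) - baseline
```

Running each import in a separate process matters here, because a second import in the same process is just a lookup in sys.modules and would look almost free. For small modules the difference can drown in noise, so treat the numbers as indicative only.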
Solutions
Lazy Imports 🥱
One solution here is to defer this import so that it only happens at runtime, when it's first needed. E.g.:
STRIPE = None

def _stripe():
    global STRIPE
    if STRIPE is None:
        # Import on first use and cache the module for later calls
        import stripe
        STRIPE = stripe
    return STRIPE
Any code that needs to use the stripe namespace can instead do:
def create_product():
    # Trigger the import
    stripe_client = _stripe()
    # API_KEY is assumed to be defined elsewhere in the application
    starter_subscription = stripe_client.Product.create(
        name="Starter Subscription",
        description="$12/Month subscription",
        api_key=API_KEY,
    )
    return starter_subscription
The trade-off here is that we've swapped the upfront penalty of loading this library ahead of time for incurring the cost at runtime.
So our app might start faster, and tests might be collected faster, but the first test or request in each process that calls the create_product() function will still incur the 700ms overhead of importing the stripe module.
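Another way to get the same deferral without threading a helper function through your code is importlib.util.LazyLoader from the standard library. The sketch below follows the lazy import recipe in the importlib documentation; I haven't verified that it plays nicely with stripe specifically, so treat that part as an assumption and test it against your own dependencies.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module object whose real import is postponed until first attribute access."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    # Wrap the real loader so that executing the module is deferred
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # With LazyLoader this only prepares the module; the body runs on first use
    loader.exec_module(module)
    return module
```

With this in place, a module-level `colorsys = lazy_import("colorsys")` returns immediately, and the module body only executes when an attribute such as `colorsys.rgb_to_hsv` is first touched. The cost doesn't disappear, though: whichever code path touches the module first still pays for the import.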
Roll your own 🧵
It might be worth considering whether you really need to be importing this entire library in the first place.
In the case of stripe, if you're only using a subset of functionality, it might be trivial to instead call their REST API directly, and come up with your own wrapper/bindings.
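As a rough illustration of what such a wrapper could look like, here's a sketch using only the standard library. The endpoint URL, the form-encoded body and the Bearer authentication are my understanding of Stripe's public API documentation, so double check them there before relying on this; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: endpoint and auth scheme as described in Stripe's public API docs
STRIPE_PRODUCTS_URL = "https://api.stripe.com/v1/products"


def build_create_product_request(api_key, name, description):
    """Build (but do not send) a form-encoded POST request that creates a product."""
    body = urllib.parse.urlencode({"name": name, "description": description})
    return urllib.request.Request(
        STRIPE_PRODUCTS_URL,
        data=body.encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}"},
        method="POST",
    )


def create_product(api_key, name, description):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_create_product_request(api_key, name, description)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the wrapper easy to unit test without touching the network. The obvious downside is that you now own error handling, retries and API version changes that the official library would otherwise take care of.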
Conclusion
I knew Python's import system was slow, and I knew third-party modules are often pretty bloated. But I was pretty shocked to see just how slow things could get.
At $WORK, stripe isn't the only culprit; there are many other big-name offenders.
Typically you'll encounter a problem (I need to interact with a payment provider), pull in a library and go about your day.
On the face of it the library appears to be completely harmless. You'll probably not notice any tangible difference in start-up time before/after.
Then one day you'll realise your app is slow as molasses because you've got 10 third-party modules, each with its own associated overhead.
I guess the takeaway (obvious in hindsight) is that everything you import has a cost. In the future I'll think twice before pulling in a dependency, and consider whether it's really worth the trade-off.