Jack's blog

Profiling slow imports in Python

This week at $WORK I went on a bit of a side quest investigating why pytest collection was so slow.

Just running pytest --collect-only from our project root was taking a painfully long time.

Following the advice in awesome-pytest-speedup, I discovered that imports were taking up a significant amount of that time after running:

python -X importtime -m pytest &> profile

And pasting the output into a nice profile viewer: https://kmichel.github.io/python-importtime-graph

The output of which looks something like this:

[Image: profile output]
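If you'd rather stay in the terminal, the same file can be summarised with a few lines of Python. This is a minimal sketch: it assumes the file contains the usual "import time: self [us] | cumulative | imported package" lines that -X importtime writes to stderr, and the function names (parse_importtime, slowest) are my own invention for illustration.

```python
import re

# Matches lines such as:
#   import time:       200 |     750000 | stripe
# The header line and any pytest output mixed into the file do not match.
LINE_RE = re.compile(r"^import time:\s+(\d+)\s+\|\s+(\d+)\s+\|\s+(\S+)\s*$")


def parse_importtime(text):
    """Return (module, self_us, cumulative_us) tuples found in the text."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue
        self_us, cumulative_us, name = match.groups()
        rows.append((name, int(self_us), int(cumulative_us)))
    return rows


def slowest(rows, n=10):
    """Return the n rows with the largest cumulative import time."""
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]
```

Calling slowest(parse_importtime(open("profile").read())) should then list the heaviest imports first. Cumulative time includes everything a module imports itself, so a parent package and its children will both show up near the top.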

One surprising culprit I found was the stripe library. Take the following example script:

import stripe # noqa


def main():
    print('hello world')


if __name__ == "__main__":
    main()

If we benchmark this script against a standard hello world program, the results are pretty shocking:

$ hyperfine --warmup 10 'python main.py' 'python with_stripe.py'
Benchmark 1: python main.py
  Time (mean ± σ):      14.5 ms ±   0.4 ms    [User: 10.9 ms, System: 3.4 ms]
  Range (min … max):    13.6 ms …  17.3 ms    190 runs

Benchmark 2: python with_stripe.py
  Time (mean ± σ):     761.0 ms ±   4.4 ms    [User: 722.8 ms, System: 33.9 ms]
  Range (min … max):   754.2 ms … 768.0 ms    10 runs

Summary
  python main.py ran
   52.58 ± 1.64 times faster than python with_stripe.py

The seemingly innocuous stripe import pushes the run time to a whopping 761ms, roughly 750ms of which is the import itself.
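If hyperfine isn't to hand, a rough equivalent can be put together with the standard library. This is only a sketch, not a replacement for a proper benchmarking tool: time_import is a made-up helper name, and it simply times a fresh interpreter importing one module, with no warmup or statistics.

```python
import subprocess
import sys
import time


def time_import(module_name, runs=5):
    """Return the best wall-clock time (seconds) for a fresh interpreter to import a module."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # A new process each time, so nothing is cached in sys.modules.
        subprocess.run([sys.executable, "-c", f"import {module_name}"], check=True)
        timings.append(time.perf_counter() - start)
    return min(timings)
```

Comparing time_import("stripe") against time_import("sys") (sys is built in, so that roughly measures bare interpreter startup) should give a similar picture to the benchmark above.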

Not only will this slow down pytest collection, but it will also impact the startup speed of any applications or scripts that transitively depend on stripe.

Unless you've carefully architected your application, you might find that a lot of code paths are impacted.

Solutions

Lazy Imports 🥱

One solution here is to defer this import so that it only happens when it's first needed at runtime. E.g.:

STRIPE = None


def _stripe():
    global STRIPE
    if STRIPE is None:
        # Deferred import: the cost is paid on the first call only
        import stripe

        STRIPE = stripe
    return STRIPE

Any code that needs to use the stripe namespace can instead do:

def create_product():
    # Trigger the import
    stripe_client = _stripe()

    starter_subscription = stripe_client.Product.create(
        name="Starter Subscription",
        description="$12/Month subscription",
        api_key=API_KEY,  # assumed to be defined elsewhere in the module
    )
    return starter_subscription

The trade-off here is that we've swapped the upfront penalty of loading this library at startup for paying the same cost later, at runtime.

So our app might start faster, and tests might be collected faster, but the first test or request that calls create_product() will still incur the roughly 750ms overhead of importing the stripe module.
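The standard library also has a more general tool for this, importlib.util.LazyLoader. The sketch below follows the lazy_import recipe from the importlib documentation. I haven't tested it against stripe specifically, so treat it as something to experiment with rather than a drop-in fix.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module object whose real import is deferred until first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # With LazyLoader this only sets up the deferral; the module body runs later.
    loader.exec_module(module)
    return module
```

With this, stripe = lazy_import("stripe") at module level should be cheap, and the real import happens the first time something like stripe.Product is touched. The cost is moved, not removed, exactly as with the hand-rolled version.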


Roll your own 🧵

It might be worth considering whether you really need to be importing this entire library in the first place.

In the case of stripe, if you're only using a subset of functionality, it might be trivial to instead call their REST API directly, and come up with your own wrapper/bindings.
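As a rough idea of what that could look like, here is a minimal sketch of the product creation call using only the standard library. The assumptions here are mine: that products are created with a form-encoded POST to https://api.stripe.com/v1/products and that the secret key is sent as a Bearer token. Check both against Stripe's API reference before relying on it. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint; verify against Stripe's API reference.
STRIPE_PRODUCTS_URL = "https://api.stripe.com/v1/products"


def build_create_product_request(api_key, name, description):
    """Build (but do not send) the HTTP request for creating a product."""
    body = urllib.parse.urlencode({"name": name, "description": description})
    return urllib.request.Request(
        STRIPE_PRODUCTS_URL,
        data=body.encode("ascii"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def create_product(api_key, name, description):
    """Send the request and return the decoded JSON response."""
    request = build_create_product_request(api_key, name, description)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

Splitting out the request builder keeps the network call in one small function, which makes the rest easy to test. You also lose everything the official library does for you (retries, error types, pagination), so it's worth measuring the import time of your replacement before declaring victory.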

Conclusion

I knew Python's import system was slow, and I knew third-party modules are often pretty bloated. But I was pretty shocked to see just how slow things could get.

At $WORK stripe isn't the only culprit; there are many other big-name offenders.

Typically you'll encounter a problem (I need to interact with a payment provider), pull in a library and go about your day.

On the face of it the library appears to be completely harmless. You'll probably not notice any tangible difference in start-up time before/after.

One day you'll realise your app is slow as molasses because you've got 10 third-party modules, each with its own associated overhead.

I guess the takeaway (obvious in hindsight) is that everything you import has a cost. In the future I'll think twice before pulling in a dependency, and consider whether it's really worth the trade-off.