Performance Considerations
Note: these are guidelines used by the SCD Cloud Operations Group. They are presented externally because they may be useful to those developing their own software; they are not provided with the intention of the Cloud Team reviewing your code, unless you are contributing to one of our repositories.
There are many factors that can affect the performance of a program, and some of them are far from obvious. This page offers recommendations on techniques to improve the performance of your code, and on how to diagnose problem areas using profiling.
Profiling in Python
Profiling is a process where the efficiency of your code is measured line by line, which is incredibly useful for identifying areas of your code that aren't running as well as others. By default, Python comes packaged with two profilers: cProfile and profile. Both can be imported into a program or invoked from the command line. cProfile is recommended for general use; profile is really only needed if you are extending the profiler itself or if cProfile cannot be used.

cProfile can be called within code to test certain functions, or invoked when running a Python script from the command line. An example of each case can be found below:
```python
import cProfile
import random

def randgen():
    y = []
    for x in range(1000):
        y.append(random.randint(1, 10) + random.randint(1, 10))
        y.sort()
    print(y)

cProfile.run("randgen()")
```

Or, from the command line:

```
python -m cProfile randgen.py
```
In both cases, replace randgen() with the name of the function you mean to profile, or randgen.py with the name of the file you want to profile. Either way, you should get an output that looks something like this:
```
19189 function calls in 0.008 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.008    0.008 <string>:1(<module>)
        1    0.001    0.001    0.008    0.008 Profiling.py:4(randgen)
     2000    0.001    0.000    0.002    0.000 random.py:235(_randbelow_with_getrandbits)
     2000    0.002    0.000    0.004    0.000 random.py:284(randrange)
     2000    0.001    0.000    0.004    0.000 random.py:358(randint)
     6000    0.000    0.000    0.000    0.000 {built-in method _operator.index}
        1    0.000    0.000    0.008    0.008 {built-in method builtins.exec}
        1    0.000    0.000    0.000    0.000 {built-in method builtins.print}
     1000    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
     2000    0.000    0.000    0.000    0.000 {method 'bit_length' of 'int' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     3184    0.000    0.000    0.000    0.000 {method 'getrandbits' of '_random.Random' objects}
     1000    0.002    0.000    0.002    0.000 {method 'sort' of 'list' objects}
```
This overview shows you a few important things: ncalls is how many times each function was called; tottime is the total time spent in the given function, excluding time spent in other functions it called; percall is tottime divided by ncalls; and cumtime is the cumulative time taken across all of the calls, including time spent in sub-functions. With a chart like this you can start to identify which functions are called the most, which take the longest, and where to optimise.
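If you want to sort or filter these results rather than reading the default dump, the standard-library pstats module works alongside cProfile. A minimal sketch, reusing the randgen() function from above:

```python
import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()
randgen()  # the function defined earlier
profiler.disable()

# Sort by cumulative time and show only the ten most expensive entries
stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(10)
```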
As an aside, if you ever want a profile like this in a visual form, python-call-graph can be used to generate a graph of the calls your code makes. Be careful, however, not to use the original pycallgraph package that it was derived from: that package has been deprecated for years and no longer functions properly without some additional work. The repo for python-call-graph can be found here.
However, if the randgen() function is scaled up far enough, to around 300,000 iterations, memory usage grows incredibly high, and it keeps climbing as the list gets longer and longer.
This is a good chance to talk about the next tip for optimisation: Generators
Generators in Python
When an array becomes larger and larger, it hits a point where it is no longer practical to keep all of it loaded in memory at once. In cases such as reading a large file or fetching large amounts of information from a database, having that much data in memory at one time can often lead to an almost immediate MemoryError. So how do we get around this? By using a Python construct called a generator.
Generators are much more memory efficient than lists when it comes to large datasets because, unlike a list, a generator doesn't load everything into memory at once; instead it yields one item at a time. Let's take a look at a basic generator function (a minimal sketch, counting upwards forever):
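```python
def infinite_numbers():
    # A minimal sketch (the name is illustrative): count upwards
    # forever, yielding one value at a time
    number = 0
    while True:
        yield number
        number += 1
```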
You'll probably notice that this seems no different from a normal function you'd declare, which is mostly true, yet one thing is different: the use of yield rather than return. This makes the function produce a generator object, effectively telling the interpreter to return one entry of the stored data at a time. Unlike with return, the interpreter stays in the function and remembers its previous state (i.e. the point in the sequence it last reached) so it can proceed from there. Let's take a look at the results by trying to print a variable born from the function!
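Something like the following, where the printed address will vary (the variable name is illustrative):

```python
gen = infinite_numbers()
print(gen)
# <generator object infinite_numbers at 0x...>
```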
As you can tell, this isn't exactly very useful. All we get back is an address in memory, but this is where the peculiarity of a generator comes into play! A generator object is meant to be iterated through, so we'll add a simple loop like this:
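```python
# Iterate over the generator, pulling one value at a time
for number in infinite_numbers():
    print(number)
```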
And all of a sudden…
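```
0
1
2
3
...
```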
Infinite numbers! This function will theoretically keep printing numbers forever, without running into a memory limitation. While this is a rather abstract use of a generator, you can start to imagine how useful they can be if you replaced the number integer with an array of any sort!
You can even step through a generator manually using the built-in next() function, like this:
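```python
gen = infinite_numbers()
print(next(gen))  # 0
print(next(gen))  # 1
print(next(gen))  # 2
```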
Seeing all this, generators are an amazing way to cut down memory costs when using massive datasets. Keep in mind, though, that in cases where the list is small, generators may actually incur a performance cost instead of a benefit, so you only need to rely on them when working with large amounts of data.
Additional considerations
Paying attention to caching
Have you ever been running code with a lot of repeated calculations to keep track of? Often this is the case when running complex calculations that rely on results from previous operations, and you'll find that functions like those take much longer than you would think. This can be solved by caching (memoising) a function's results through a specific decorator. Importing the functools library gives you access to the @functools.lru_cache(maxsize=X) decorator: place it above a function whose results are worth caching and set maxsize to your desired amount (I use 128, which is also the default) to control how many results are kept. Repeated calls with the same arguments will then return almost instantly, which can massively speed up the process.
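A minimal sketch using the classic recursive Fibonacci function:

```python
import functools

@functools.lru_cache(maxsize=128)
def fibonacci(n):
    # Without the cache this recursion recomputes the same
    # sub-results an exponential number of times
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(100))  # returns near-instantly thanks to the cache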
Always use built-in features where possible
Built-in features in Python tend to be incredibly well optimised and fast already; if it's possible to use one, you definitely should do so before writing your own function.
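As a quick illustrative comparison (using the standard timeit module; the timings you see will vary by machine):

```python
import timeit

numbers = list(range(1_000_000))

def manual_sum(values):
    total = 0
    for value in values:
        total += value
    return total

# The built-in sum() runs in optimised C and is typically
# several times faster than the equivalent hand-written loop
print(timeit.timeit(lambda: sum(numbers), number=10))
print(timeit.timeit(lambda: manual_sum(numbers), number=10))
```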
Avoid overuse of global variables
While this is common sense to some people, it's important to check how many global variables you have and to keep them under control: each global variable takes up space in memory for as long as the program is running, so if they can be localised, they should be.
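A small sketch of the idea:

```python
# Lives for the entire lifetime of the program
RESULTS = [x ** 2 for x in range(1_000_000)]

def total_squares():
    # Lives only while the function runs, then can be freed
    results = [x ** 2 for x in range(1_000_000)]
    return sum(results)
```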
Data types
Certain data structures perform better than others. The obvious example of this would be a linked list, which allows items to be stored in different segments of memory, linked together by pointers. This sounds unimportant until you realise that Python lists allocate more contiguous memory than they currently need so they can grow, which can add up if the list grows large over time. With a linked list, memory is instead allocated as items come in rather than up front.
The downside of using a linked list is that lookup times are bound to be slower, so if your list is already small or guaranteed to be a certain size, a normal list is probably the better choice. In the case that you already know all of the items needed for the list, consider using a tuple instead; this will also likely decrease memory usage in the long run. If you don't know about the data types you can use, or how to use them, refer to the first two lessons of the Python training page, which you can find here.
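Python has no built-in linked list, so here is a minimal sketch of the idea (the class names are illustrative; in practice the standard library's collections.deque covers many of the same use cases):

```python
class Node:
    """A single element holding a value and a pointer to the next node."""
    def __init__(self, value):
        self.value = value
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, value):
        # Memory for each node is allocated only when the item arrives
        node = Node(value)
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node

    def __iter__(self):
        current = self.head
        while current is not None:
            yield current.value
            current = current.next

items = LinkedList()
for i in range(5):
    items.append(i)
print(list(items))  # [0, 1, 2, 3, 4]
```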
List comprehension
When manipulating lists in a way you can't predict, your first thought may be to immediately write a conditional or a loop to apply the filters and changes you want to make. However, this isn't always needed thanks to a concept called list comprehension, which effectively allows you to shorten list manipulations down to a single line and save some performance while doing it. It works by embedding the loop into the list expression itself, like the following example:
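```python
# An illustrative reconstruction: keep only the even numbers,
# building the filtered list in a single line
evens = [x for x in range(20) if x % 2 == 0]
print(evens)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```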
We start with x to state what will be put into the list, then continue with a relatively normal for loop expression. It's important to note that while this does cut down the number of lines and is theoretically good for performance, it can also hurt code readability and can get rather involved. To find out more, it would be worth doing your own research.