@@ -159,13 +159,91 @@ to benchmark a full cell containing a block of code.
159
159
160
160
Profiling
161
161
---------
162
+ Profilers are applications which attach to the execution of the program, which in our case is done
163
+ by the CPython interpreter, and analyze the time taken by different portions of the code.
164
+ Profilers help to identify performance bottlenecks in the code by showing
162
165
166
+ - wall-time (the start-to-end time that the user observes),
167
+ - CPU and GPU time, and
168
+ - memory usage patterns
169
+
170
+ at **function/method/line of code** level granularity.
171
+
172
+ Deterministic profilers vs. sampling profilers
173
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
174
+
175
+ .. note::
176
+
177
+ *Deterministic profilers* are also called *tracing profilers*.
178
+
179
+ **Deterministic profilers** record every function call and event in the program,
180
+ logging the exact sequence and duration of events.
181
+
182
+ 👍 **Pros:**
183
+ - Provides detailed information on the program's execution.
184
+ - Deterministic: Captures exact call sequences and timings.
185
+ 👎 **Cons:**
186
+ - Higher overhead, slowing down the program.
187
+ - Can generate a large amount of data.
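
The idea of recording every call can be sketched with the standard-library hook ``sys.setprofile``. This is only a toy illustration under stated assumptions, not how cProfile is implemented: the functions ``step`` and ``walk`` and the ``call_counts`` counter are hypothetical names chosen for this example.

```python
import sys
from collections import Counter

call_counts = Counter()  # hypothetical store: function name -> number of calls


def tracer(frame, event, arg):
    # The interpreter invokes this hook on every profiling event;
    # "call" fires once for each Python-level function call.
    if event == "call":
        call_counts[frame.f_code.co_name] += 1


def step(position):
    return position + 1


def walk(n_steps):
    position = 0
    for _ in range(n_steps):
        position = step(position)
    return position


sys.setprofile(tracer)
try:
    result = walk(1000)
finally:
    # Always remove the hook, otherwise later code keeps paying the overhead.
    sys.setprofile(None)

print(call_counts["walk"], call_counts["step"])
```

Because the hook runs on every single call, the counts are exact, and the slowdown grows with the number of calls, which is the overhead listed among the cons above.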
188
+
189
+ **Sampling profilers** periodically sample the program's state (where it is
190
+ and how much memory is used), providing a statistical view of where time is
191
+ spent.
192
+
193
+ 👍 **Pros:**
194
+ - Lower overhead, as it doesn't track every event.
195
+ - Scales better with larger programs.
196
+
197
+ 👎 **Cons:**
198
+ - Less precise, potentially missing infrequent or short calls.
199
+ - Provides an approximation rather than exact timing.
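
A sampling approach can be sketched in a few lines, assuming CPython: a background thread wakes up at a fixed interval and notes which function the main thread is currently executing. The names ``sampler``, ``hot_loop`` and ``samples`` are hypothetical, and ``sys._current_frames`` is a CPython-specific helper; real tools such as py-spy work differently (they inspect the process from outside).

```python
import sys
import threading
import time
from collections import Counter

samples = Counter()  # function name -> number of times it was seen running


def sampler(thread_id, stop_event, interval):
    """Periodically record which function the target thread is executing."""
    while not stop_event.is_set():
        frame = sys._current_frames().get(thread_id)
        if frame is not None:
            samples[frame.f_code.co_name] += 1
        time.sleep(interval)


def hot_loop(duration):
    """Hypothetical workload: spin for roughly `duration` seconds."""
    deadline = time.perf_counter() + duration
    count = 0
    while time.perf_counter() < deadline:
        count += 1
    return count


stop_event = threading.Event()
thread = threading.Thread(
    target=sampler, args=(threading.get_ident(), stop_event, 0.005)
)
thread.start()
try:
    hot_loop(0.3)
finally:
    stop_event.set()
    thread.join()

# The counts are a statistical estimate and will differ from run to run.
print(samples.most_common(3))
```

The workload itself is never instrumented, which keeps the overhead low, but a function that runs for less than the sampling interval may never show up in ``samples``.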
200
+
201
+
202
+ .. discussion::
203
+
204
+ *Analogy*: Imagine we want to optimize the Stockholm Länstrafik (SL) metro system.
205
+ We wish to detect bottlenecks in the system to improve the service and for this we have
206
+ asked a few passengers to help us by tracking their journey.
207
+
208
+ - **Deterministic**:
209
+ We follow every train and passenger, recording every stop
210
+ and delay. When passengers enter and exit the train, we record the exact time
211
+ and location.
212
+ - **Sampling**:
213
+ Every 5 minutes the phone notifies the passenger to note
214
+ down their current location. We then use this information to estimate
215
+ the most crowded stations and trains.
216
+
217
+ In addition to the above distinctions, some profilers can also
218
+
219
+ .. callout:: Examples of some profilers
220
+ :class: dropdown
221
+
222
+ CPU profilers:
223
+
224
+ - `cProfile and profile <https://docs.python.org/3/library/profile.html>`__
225
+ - `line_profiler <https://kernprof.readthedocs.io/>`__
226
+ - `py-spy <https://github.com/benfred/py-spy>`__
227
+
228
+ Memory profilers:
229
+
230
+ - `tracemalloc <https://docs.python.org/3/library/tracemalloc.html>`__
231
+ - `memray <https://bloomberg.github.io/memray/index.html>`__
232
+
233
+ Both CPU and memory:
234
+
235
+ - `Scalene <https://github.com/plasma-umass/scalene>`__ (see optional course material on :ref:`scalene`)
236
+
237
+ In the following sections, we will use :ref:`cProfile` and :ref:`line-profiler` to profile a Python program.
238
+ cProfile is a deterministic (tracing) profiler built into the Python standard library
239
+ and gives timings at function-level granularity.
240
+ Line profiler is also deterministic, and it provides timings at line-of-code granularity for a few selected
241
+ functions.
242
+
243
+ .. _cProfile:
163
244
cProfile
164
245
^^^^^^^^
165
246
166
- For more complex code, one can use the `built-in python profilers
167
- <https://docs.python.org/3/library/profile.html>`_, ``cProfile`` or ``profile``.
168
-
169
247
As a demo, let us consider the following code which simulates a random walk in one dimension
170
248
(we can save it as ``walk.py`` or download from :download:`here <example/walk.py>`):
171
249
@@ -190,14 +268,14 @@ to a file with the ``-o`` flag and view it with `profile pstats module
190
268
<https://docs.python.org/3/library/profile.html#module-pstats>`__
191
269
or profile visualisation tools like
192
270
`Snakeviz <https://jiffyclub.github.io/snakeviz/>`__
193
- or `profile-viewer <https://pypi.org/project/profile-viewer/>`__.
271
+ or `tuna <https://pypi.org/project/tuna/>`__.
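
As a rough sketch of the save-then-inspect workflow from within Python, the snippet below writes profiling data to a file and loads it again with ``pstats``. The workload ``busy`` and the file name ``walk.prof`` are hypothetical placeholders; the same kind of file is what the ``-o`` flag produces on the command line.

```python
import cProfile
import os
import pstats
import tempfile


def busy(n):
    """Hypothetical workload standing in for the code being profiled."""
    return sum(i * i for i in range(n))


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "walk.prof")

    # Profile a single call and write the raw statistics to disk.
    profiler = cProfile.Profile()
    profiler.runcall(busy, 50_000)
    profiler.dump_stats(path)

    # Load the file again, shorten file paths, and sort by time spent
    # inside each function itself (excluding sub-calls).
    stats = pstats.Stats(path)
    stats.strip_dirs().sort_stats("tottime")
    total_calls = stats.total_calls
    stats.print_stats(3)  # show only the three most expensive entries
```

Sorting by ``"cumulative"`` instead of ``"tottime"`` includes time spent in sub-calls, which is often more useful for finding the top-level function to investigate first.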
194
272
195
273
.. note::
196
274
197
275
Similar functionality is available in interactive IPython or Jupyter sessions with the
198
276
magic command `%%prun <https://ipython.readthedocs.io/en/stable/interactive/magics.html>`__.
199
277
200
-
278
+ .. _line-profiler:
201
279
Line-profiler
202
280
^^^^^^^^^^^^^
203
281
@@ -274,11 +352,14 @@ line-by-line breakdown of where time is being spent. For this information, we ca
274
352
which is called thousands of times! Moving the module import to the top level saves
275
353
considerable time.
276
354
277
-
278
355
Performance optimization
279
356
------------------------
280
357
281
358
Once we have identified the bottlenecks, we need to make the corresponding code go faster.
359
+ The specific optimization can vary widely based on the computational load
360
+ (how big or small the data is, and how frequently a function is executed)
361
+ and the particular problem at hand. Nevertheless, we present some common methods which can be
362
+ handy to know.
282
363
283
364
284
365
Algorithm optimization