Commit 9f5bd14

Distinguish det. and sampling profilers, add scalene to optional
1 parent 2af09b2 commit 9f5bd14

File tree

4 files changed, +221 -7 lines

content/example/scalene_web.png

173 KB

content/index.rst

+2 -1
@@ -68,6 +68,7 @@ and distributed computing.
    pandas-extra
    GPU-computing
    parallel-computing_opt
+   optimization_opt


 .. toctree::
@@ -136,7 +137,7 @@ Several examples and formulations are inspired by other open source educational
 - `Python for Data Analysis <https://github.com/wesm/pydata-book/>`__
 - `GTC2017-numba <https://github.com/ContinuumIO/gtc2017-numba/>`__
 - `IPython Cookbook <https://ipython-books.github.io/>`__
-- `Scipy Lecture Notes <https://scipy-lectures.org/>`__
+- `Scientific Python Lectures <https://lectures.scientific-python.org/>`__ (*previously known as Scipy Lecture Notes*)
 - `Machine Learning and Data Science Notebooks <https://sebastianraschka.com/notebooks/ml-notebooks/>`__
 - `Elegant SciPy <https://github.com/elegant-scipy/notebooks/>`__
 - `A Comprehensive Guide to NumPy Data Types <https://axil.github.io/a-comprehensive-guide-to-numpy-data-types.html/>`__

content/optimization.rst

+87 -6
@@ -159,13 +159,91 @@ to benchmark a full cell containing a block of code.

 Profiling
 ---------
+Profilers are applications which attach to the execution of the program (which in our
+case is done by the CPython interpreter) and analyze the time taken by different
+portions of the code. Profilers help to identify performance bottlenecks in the code by showing

+- wall time (*i.e.* the start-to-end time that the user observes),
+- CPU and GPU time, and
+- memory usage patterns
+
+at **function/method/line of code** granularity.
+
+Deterministic profilers vs. sampling profilers
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. note::
+
+   *Deterministic profilers* are also called *tracing profilers*.
+
+**Deterministic profilers** record every function call and event in the program,
+logging the exact sequence and duration of events.
+
+👍 **Pros:**
+
+- Provides detailed information on the program's execution.
+- Deterministic: Captures exact call sequences and timings.
+
+👎 **Cons:**
+
+- Higher overhead, slowing down the program.
+- Can generate a large amount of data.
+
+**Sampling profilers** periodically sample the program's state (where it is
+and how much memory is used), providing a statistical view of where time is
+spent.
+
+👍 **Pros:**
+
+- Lower overhead, as it doesn't track every event.
+- Scales better with larger programs.
+
+👎 **Cons:**
+
+- Less precise, potentially missing infrequent or short calls.
+- Provides an approximation rather than exact timing.
+
+
+.. discussion::
+
+   *Analogy*: Imagine we want to optimize the Stockholm Länstrafik (SL) metro system.
+   We wish to detect bottlenecks in the system to improve the service, and for this we
+   have asked a few passengers to help us by tracking their journeys.
+
+   - **Deterministic**:
+     We follow every train and passenger, recording every stop
+     and delay. When passengers enter and exit the train, we record the exact time
+     and location.
+   - **Sampling**:
+     Every 5 minutes the phone notifies the passenger to note
+     down their current location. We then use this information to estimate
+     the most crowded stations and trains.
+
+In addition to the above distinctions, some profilers can also track memory usage
+alongside CPU time.
+
+.. callout:: Examples of some profilers
+   :class: dropdown
+
+   CPU profilers:
+
+   - `cProfile and profile <https://docs.python.org/3/library/profile.html>`__
+   - `line_profiler <https://kernprof.readthedocs.io/>`__
+   - `py-spy <https://github.com/benfred/py-spy>`__
+
+   Memory profilers:
+
+   - `tracemalloc <https://docs.python.org/3/library/tracemalloc.html>`__
+   - `memray <https://bloomberg.github.io/memray/index.html>`__
+
+   Both CPU and memory:
+
+   - `Scalene <https://github.com/plasma-umass/scalene>`__ (see optional course material on :ref:`scalene`)
+
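To make the memory profilers listed in the callout above a bit more concrete, here is a minimal sketch using ``tracemalloc`` from the standard library; the list comprehension being measured is a made-up example, not code from this lesson.

.. code-block:: python

   import tracemalloc

   tracemalloc.start()

   # A made-up, allocation-heavy piece of code to measure.
   data = [str(i) * 10 for i in range(100_000)]

   snapshot = tracemalloc.take_snapshot()

   # Show the five source lines responsible for the most allocated memory.
   for stat in snapshot.statistics("lineno")[:5]:
       print(stat)

   tracemalloc.stop()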
+In the following sections, we will use :ref:`cProfile` and :ref:`line-profiler` to profile a Python program.
+cProfile is a deterministic (tracing) profiler built into the Python standard library,
+and it gives timings at function-level granularity.
+Line profiler is also deterministic, and it provides timings at line-of-code granularity
+for a few selected functions.
+
+.. _cProfile:
+
 cProfile
 ^^^^^^^^

-For more complex code, one can use the `built-in python profilers
-<https://docs.python.org/3/library/profile.html>`_, ``cProfile`` or ``profile``.
-
 As a demo, let us consider the following code which simulates a random walk in one dimension
 (we can save it as ``walk.py`` or download from :download:`here <example/walk.py>`):
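To give a feel for the deterministic profiling introduced in this hunk, a self-contained sketch of driving ``cProfile`` from Python and printing the results with ``pstats`` could look as follows; the ``walk``/``step`` functions are simplified stand-ins, not the actual ``walk.py`` from the lesson.

.. code-block:: python

   import cProfile
   import pstats
   import random

   def step():
       # Stand-in for one random-walk step.
       return 1 if random.random() > 0.5 else -1

   def walk(n):
       # Accumulate n random steps from the origin.
       position = 0
       for _ in range(n):
           position += step()
       return position

   # Deterministic (tracing) profiling: every call to step() is recorded.
   profiler = cProfile.Profile()
   profiler.enable()
   walk(100_000)
   profiler.disable()

   # Report the five most expensive functions by cumulative time.
   pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)

On the command line, the equivalent would be along the lines of ``python -m cProfile -o walk.prof walk.py`` (the output filename is arbitrary), which is the ``-o`` workflow mentioned in the next hunk.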

@@ -190,14 +268,14 @@ to a file with the ``-o`` flag and view it with `profile pstats module
 <https://docs.python.org/3/library/profile.html#module-pstats>`__
 or profile visualisation tools like
 `Snakeviz <https://jiffyclub.github.io/snakeviz/>`__
-or `profile-viewer <https://pypi.org/project/profile-viewer/>`__.
+or `tuna <https://pypi.org/project/tuna/>`__.

 .. note::

    Similar functionality is available in interactive IPython or Jupyter sessions with the
    magic command `%%prun <https://ipython.readthedocs.io/en/stable/interactive/magics.html>`__.

-
+.. _line-profiler:
+
 Line-profiler
 ^^^^^^^^^^^^^
@@ -274,11 +352,14 @@ line-by-line breakdown of where time is being spent. For this information, we ca
 which is called thousands of times! Moving the module import to the top level saves
 considerable time.

-
 Performance optimization
 ------------------------

 Once we have identified the bottlenecks, we need to make the corresponding code go faster.
+The specific optimization can vary widely based on the computational load
+(how big or small the data is, and how frequently a function is executed)
+and the particular problem at hand. Nevertheless, we present some common methods which can be
+handy to know.


 Algorithm optimization
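The "moving the module import to the top level" fix referred to in this hunk can be illustrated with a small, hypothetical comparison (not the actual ``walk.py`` code):

.. code-block:: python

   import random
   import timeit

   # "Before": the import sits inside a function that is called thousands of
   # times, so the import machinery is consulted on every call (the module is
   # cached, but the lookup still costs time).
   def step_import_inside():
       import random
       return 1 if random.random() > 0.5 else -1

   # "After": the import is done once, at the top of the module.
   def step_import_at_top():
       return 1 if random.random() > 0.5 else -1

   # Compare the two variants; exact numbers depend on the machine.
   print(timeit.timeit(step_import_inside, number=100_000))
   print(timeit.timeit(step_import_at_top, number=100_000))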
