|
5 | 5 | <link href="https://learnbyexample.github.io/atom.xml" rel="self" type="application/atom+xml"/>
|
6 | 6 | <link href="https://learnbyexample.github.io"/>
|
7 | 7 | <generator uri="https://www.getzola.org/">Zola</generator>
|
8 |
| - <updated>2023-08-08T00:00:00+00:00</updated> |
| 8 | + <updated>2023-08-16T00:00:00+00:00</updated> |
9 | 9 | <id>https://learnbyexample.github.io/atom.xml</id>
|
| 10 | + <entry xml:lang="en"> |
| 11 | + <title>Python tip 32: positive lookarounds</title> |
| 12 | + <published>2023-08-16T00:00:00+00:00</published> |
| 13 | + <updated>2023-08-16T00:00:00+00:00</updated> |
| 14 | + <link rel="alternate" href="https://learnbyexample.github.io/tips/python-tip-32/" type="text/html"/> |
| 15 | + <id>https://learnbyexample.github.io/tips/python-tip-32/</id> |
| 16 | + <content type="html"><p>Lookarounds help to create custom anchors and add conditions within a regex definition. These assertions are also known as <strong>zero-width patterns</strong> because they add restrictions similar to anchors and the text they check is not part of the matched portion. Negative lookarounds were discussed in <a href="https://learnbyexample.github.io/tips/python-tip-29/">this post</a>. The syntax for positive lookarounds is shown below:</p> |
| 17 | +<ul> |
| 18 | +<li><code>(?=pat)</code> positive lookahead assertion</li> |
| 19 | +<li><code>(?&lt;=pat)</code> positive lookbehind assertion</li> |
| 20 | +</ul> |
| 21 | +<p>Here are some examples:</p> |
| 22 | +<pre data-lang="python" style="background-color:#f5f5f5;color:#1f1f1f;" class="language-python "><code class="language-python" data-lang="python"><span style="color:#72ab00;">&gt;&gt;&gt; </span><span>s </span><span style="color:#72ab00;">= </span><span style="color:#d07711;">&#39;42 apple-5, fig3; x-83, y-20: f12&#39; |
| 23 | +</span><span> |
| 24 | +</span><span style="color:#7f8989;"># extract digits only if they are followed by , |
| 25 | +</span><span style="color:#7f8989;"># note that end of string doesn&#39;t qualify as this is a positive assertion |
| 26 | +</span><span style="color:#72ab00;">&gt;&gt;&gt; </span><span>re.</span><span style="color:#5597d6;">findall</span><span>(</span><span style="color:#668f14;">r</span><span style="color:#d07711;">&#39;</span><span style="color:#aeb52b;">\d</span><span style="color:#72ab00;">+</span><span style="color:#7c8f4c;">(</span><span style="color:#aeb52b;">?=</span><span style="color:#7c8f4c;">,)</span><span style="color:#d07711;">&#39;</span><span>, s) |
| 27 | +</span><span>[</span><span style="color:#d07711;">&#39;5&#39;</span><span>, </span><span style="color:#d07711;">&#39;83&#39;</span><span>] |
| 28 | +</span><span> |
| 29 | +</span><span style="color:#7f8989;"># extract digits only if they are preceded by - and followed by ; or : |
| 30 | +</span><span style="color:#72ab00;">&gt;&gt;&gt; </span><span>re.</span><span style="color:#5597d6;">findall</span><span>(</span><span style="color:#668f14;">r</span><span style="color:#d07711;">&#39;</span><span style="color:#7c8f4c;">(</span><span style="color:#aeb52b;">?&lt;=</span><span style="color:#7c8f4c;">-)</span><span style="color:#aeb52b;">\d</span><span style="color:#72ab00;">+</span><span style="color:#7c8f4c;">(</span><span style="color:#aeb52b;">?=[:;]</span><span style="color:#7c8f4c;">)</span><span style="color:#d07711;">&#39;</span><span>, s) |
| 31 | +</span><span>[</span><span style="color:#d07711;">&#39;20&#39;</span><span>] |
| 32 | +</span><span> |
| 33 | +</span><span style="color:#7f8989;"># replace &#39;par&#39; as long as &#39;part&#39; occurs as a whole word later in the line |
| 34 | +</span><span style="color:#72ab00;">&gt;&gt;&gt; </span><span>re.</span><span style="color:#5597d6;">sub</span><span>(</span><span style="color:#668f14;">r</span><span style="color:#d07711;">&#39;</span><span style="color:#7c8f4c;">par(</span><span style="color:#aeb52b;">?=.</span><span style="color:#72ab00;">*\b</span><span style="color:#7c8f4c;">part</span><span style="color:#72ab00;">\b</span><span style="color:#7c8f4c;">)</span><span style="color:#d07711;">&#39;</span><span>, </span><span style="color:#d07711;">&#39;[</span><span style="text-decoration:underline;font-style:italic;color:#d2a8a1;">\g</span><span style="color:#d07711;">&lt;0&gt;]&#39;</span><span>, </span><span style="color:#d07711;">&#39;par spare part party&#39;</span><span>) |
| 35 | +</span><span style="color:#d07711;">&#39;[par] s[par]e part party&#39; |
| 36 | +</span></code></pre> |
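<p>As a minimal sketch of why these assertions are called zero-width, the standalone script below reuses the sample string from the examples above and compares a lookahead with a pattern that consumes the comma. The variable names are illustrative and not part of the original post.</p>

```python
import re

s = '42 apple-5, fig3; x-83, y-20: f12'

# lookahead: the comma is checked, but it is not part of the match
before_comma = re.findall(r'\d+(?=,)', s)

# no lookahead: the comma is consumed and shows up in the results
consumed = re.findall(r'\d+,', s)

# with substitution, the lookahead version leaves the comma untouched
replaced = re.sub(r'\d+(?=,)', 'N', s)

print(before_comma)  # ['5', '83']
print(consumed)      # ['5,', '83,']
print(replaced)      # 42 apple-N, fig3; x-N, y-20: f12
```

<p>Because the asserted text is never consumed, it stays available for later parts of the pattern and is preserved by <code>re.sub</code>.</p>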
| 37 | +<p>With lookbehind assertions (both positive and negative), the pattern used for the assertion cannot <em>imply</em> matching a variable length of text. A fixed-length quantifier such as <code>{3}</code> is allowed. Alternations of different lengths are not allowed, even if each individual alternative is of fixed length.</p> |
| 38 | +<pre data-lang="python" style="background-color:#f5f5f5;color:#1f1f1f;" class="language-python "><code class="language-python" data-lang="python"><span style="color:#72ab00;">&gt;&gt;&gt; </span><span>s </span><span style="color:#72ab00;">= </span><span style="color:#d07711;">&#39;pore42 tar3 dare7 care5&#39; |
| 39 | +</span><span> |
| 40 | +</span><span style="color:#7f8989;"># not allowed |
| 41 | +</span><span style="color:#72ab00;">&gt;&gt;&gt; </span><span>re.</span><span style="color:#5597d6;">findall</span><span>(</span><span style="color:#668f14;">r</span><span style="color:#d07711;">&#39;</span><span style="color:#7c8f4c;">(</span><span style="color:#aeb52b;">?&lt;=</span><span style="color:#7c8f4c;">tar</span><span style="color:#72ab00;">|</span><span style="color:#7c8f4c;">dare)</span><span style="color:#aeb52b;">\d</span><span style="color:#72ab00;">+</span><span style="color:#d07711;">&#39;</span><span>, s) |
| 42 | +</span><span>re.error: look</span><span style="color:#72ab00;">-</span><span>behind requires fixed</span><span style="color:#72ab00;">-</span><span>width pattern |
| 43 | +</span><span> |
| 44 | +</span><span style="color:#7f8989;"># workaround for r&#39;(?&lt;!tar|dare)\d+&#39; |
| 45 | +</span><span style="color:#72ab00;">&gt;&gt;&gt; </span><span>re.</span><span style="color:#5597d6;">findall</span><span>(</span><span style="color:#668f14;">r</span><span style="color:#d07711;">&#39;</span><span style="color:#7c8f4c;">(</span><span style="color:#aeb52b;">?&lt;!</span><span style="color:#7c8f4c;">tar)(</span><span style="color:#aeb52b;">?&lt;!</span><span style="color:#7c8f4c;">dare)</span><span style="color:#aeb52b;">\d</span><span style="color:#72ab00;">+</span><span style="color:#d07711;">&#39;</span><span>, s) |
| 46 | +</span><span>[</span><span style="color:#d07711;">&#39;42&#39;</span><span>, </span><span style="color:#d07711;">&#39;5&#39;</span><span>] |
| 47 | +</span><span> |
| 48 | +</span><span style="color:#7f8989;"># workaround for r&#39;(?&lt;=tar|dare)\d+&#39; |
| 49 | +</span><span style="color:#72ab00;">&gt;&gt;&gt; </span><span>re.</span><span style="color:#5597d6;">findall</span><span>(</span><span style="color:#668f14;">r</span><span style="color:#d07711;">&#39;</span><span style="color:#7c8f4c;">(?:(</span><span style="color:#aeb52b;">?&lt;=</span><span style="color:#7c8f4c;">tar)</span><span style="color:#72ab00;">|</span><span style="color:#7c8f4c;">(</span><span style="color:#aeb52b;">?&lt;=</span><span style="color:#7c8f4c;">dare))</span><span style="color:#aeb52b;">\d</span><span style="color:#72ab00;">+</span><span style="color:#d07711;">&#39;</span><span>, s) |
| 50 | +</span><span>[</span><span style="color:#d07711;">&#39;3&#39;</span><span>, </span><span style="color:#d07711;">&#39;7&#39;</span><span>] |
| 51 | +</span></code></pre> |
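<p>The two workarounds above can be generalized. The sketch below assumes a hypothetical helper named <code>lookbehind_any</code> (not part of the <code>re</code> module or the original post) that gives each word its own fixed-width lookbehind: the assertions are chained for the negative case and alternated for the positive case.</p>

```python
import re

def lookbehind_any(words, negative=False):
    """Hypothetical helper: build a lookbehind over words of differing lengths."""
    escaped = [re.escape(w) for w in words]
    if negative:
        # none of the words may precede: chain the assertions (logical AND)
        return ''.join(f'(?<!{w})' for w in escaped)
    # any one of the words may precede: alternate the assertions (logical OR)
    return '(?:' + '|'.join(f'(?<={w})' for w in escaped) + ')'

s = 'pore42 tar3 dare7 care5'

# a single lookbehind with alternatives of different lengths is rejected
try:
    re.compile(r'(?<=tar|dare)\d+')
    rejected = False
except re.error:
    rejected = True

pos = re.findall(lookbehind_any(['tar', 'dare']) + r'\d+', s)
neg = re.findall(lookbehind_any(['tar', 'dare'], negative=True) + r'\d+', s)

print(rejected)  # True
print(pos)       # ['3', '7']
print(neg)       # ['42', '5']
```

<p>Each generated assertion is fixed-width by itself, which is why the combined pattern compiles even though the words differ in length.</p>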
| 52 | +<p><img src="/images/info.svg" alt="info" /> The third-party <code>regex</code> module (<a href="https://pypi.org/project/regex/">https://pypi.org/project/regex/</a>) offers advanced features like variable-length lookbehinds, subexpression calls, etc.</p> |
| 53 | +<p><strong>Video demo</strong>:</p> |
| 54 | +<p align="center"><iframe width="560" height="315" loading="lazy" src="https://www.youtube.com/embed/Bu27WS-GExk" title="YouTube video player" frameborder="0" allow="accelerometer; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></p> |
| 55 | +<br> |
| 56 | +<p><img src="/images/info.svg" alt="info" /> See also my <a href="https://github.com/learnbyexample/100_page_python_intro">100 Page Python Intro</a> and <a href="https://github.com/learnbyexample/py_regular_expressions">Understanding Python re(gex)?</a> ebooks.</p> |
| 57 | +</content> |
| 58 | + </entry> |
10 | 59 | <entry xml:lang="en">
|
11 | 60 | <title>Vim tip 30: some general Vim settings</title>
|
12 | 61 | <published>2023-08-08T00:00:00+00:00</published>
|
|