|
71 | 71 | <li class="toctree-l4"><a class="reference internal" href="#utils.check_embeddings.is_valid_term"><code class="docutils literal notranslate"><span class="pre">is_valid_term()</span></code></a></li>
|
72 | 72 | <li class="toctree-l4"><a class="reference internal" href="#utils.check_embeddings.load_liwc_dict"><code class="docutils literal notranslate"><span class="pre">load_liwc_dict()</span></code></a></li>
|
73 | 73 | <li class="toctree-l4"><a class="reference internal" href="#utils.check_embeddings.read_in_lexicons"><code class="docutils literal notranslate"><span class="pre">read_in_lexicons()</span></code></a></li>
|
| 74 | +<li class="toctree-l4"><a class="reference internal" href="#utils.check_embeddings.sort_words"><code class="docutils literal notranslate"><span class="pre">sort_words()</span></code></a></li> |
74 | 75 | <li class="toctree-l4"><a class="reference internal" href="#utils.check_embeddings.str_to_vec"><code class="docutils literal notranslate"><span class="pre">str_to_vec()</span></code></a></li>
|
75 | 76 | </ul>
|
76 | 77 | </li>
|
|
275 | 276 |
|
276 | 277 | <dl class="py function">
|
277 | 278 | <dt class="sig sig-object py" id="utils.check_embeddings.is_valid_term">
|
278 |
| -<span class="sig-prename descclassname"><span class="pre">utils.check_embeddings.</span></span><span class="sig-name descname"><span class="pre">is_valid_term</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">dicTerm</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#utils.check_embeddings.is_valid_term" title="Link to this definition"></a></dt> |
| 279 | +<span class="sig-prename descclassname"><span class="pre">utils.check_embeddings.</span></span><span class="sig-name descname"><span class="pre">is_valid_term</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">dicTerm</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">→</span> <span class="sig-return-typehint"><span class="pre">bool</span></span></span><a class="headerlink" href="#utils.check_embeddings.is_valid_term" title="Link to this definition"></a></dt> |
279 | 280 | <dd><p>Check if a dictionary term is valid.</p>
|
280 |
| -<p>This function returns <cite>True</cite> if the term matches the regex pattern and <cite>False</cite> otherwise. |
281 |
| -The regex pattern matches:</p> |
| 281 | +<p>This function returns True if the term matches the validation regex and False otherwise. |
| 282 | +The pattern enforces the following rules:</p> |
282 | 283 | <ul class="simple">
|
283 |
| -<li><p>Alphanumeric characters (a-z, A-Z, 0-9)</p></li> |
284 |
| -<li><p>Valid symbols: <cite>-</cite>, <cite>‘</cite>, <cite>*</cite>, <cite>/</cite></p></li> |
285 |
| -<li><p>The <cite>*</cite> symbol can appear only once at the end of a word</p></li> |
286 |
| -<li><p>Emojis are valid only when they appear alone</p></li> |
287 |
| -<li><p>The <cite>/</cite> symbol can appear only once after alphanumeric characters</p></li> |
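| 284 | +<li><p>Alphanumeric characters (a-z, A-Z, 0-9)</p></li> |
| 285 | +<li><p>Valid symbols: <code class="docutils literal notranslate"><span class="pre">-</span></code>, <code class="docutils literal notranslate"><span class="pre">'</span></code>, <code class="docutils literal notranslate"><span class="pre">*</span></code>, <code class="docutils literal notranslate"><span class="pre">/</span></code></p></li> |
| 286 | +<li><p>The <code class="docutils literal notranslate"><span class="pre">*</span></code> symbol can appear only once, at the end of a word</p></li> |
| 287 | +<li><p>Eight emojis are valid, but only when they appear alone</p></li> |
| 288 | +<li><p>The <code class="docutils literal notranslate"><span class="pre">/</span></code> symbol can appear only once, after alphanumeric characters</p></li> |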
288 | 289 | <li><p>Spaces are allowed between valid words</p></li>
|
289 | 290 | </ul>
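The rules listed above can be approximated with a single regular expression. The sketch below is hypothetical: `is_valid_term_sketch`, `ALLOWED_EMOJIS`, and the exact pattern are assumptions rather than the module's actual regex. The two emojis are placeholders for the eight that the docstring mentions but does not list.

```python
import re

# Hypothetical placeholder: the docstring says eight emojis are accepted on
# their own but does not list them; these two are illustrative only.
ALLOWED_EMOJIS = {"\U0001F642", "\U0001F641"}

# One word (assumed shape): starts with an alphanumeric character, may contain
# hyphens and apostrophes, may contain one "/" after that, may end in one "*".
_WORD = r"[A-Za-z0-9][A-Za-z0-9'-]*(?:/[A-Za-z0-9'-]*)?\*?"

# A term is one or more words separated by single spaces.
_TERM = re.compile(rf"{_WORD}(?: {_WORD})*")


def is_valid_term_sketch(dic_term: str) -> bool:
    """Approximate the documented rules for a valid dictionary term."""
    # Emojis are valid only when they make up the whole term.
    if dic_term in ALLOWED_EMOJIS:
        return True
    return _TERM.fullmatch(dic_term) is not None


for term in ["happ*", "and/or", "can't", "thank you", "a*b", "a/b/c", "hello!"]:
    print(term, is_valid_term_sketch(term))
```

Using `fullmatch` instead of `match` prevents a valid prefix such as `a*` from accepting the invalid term `a*b`.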
|
290 | 291 | <dl class="field-list simple">
|
291 | 292 | <dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
292 |
| -<dd class="field-odd"><p><strong>dicTerm</strong> (<em>str</em>) – The dictionary term to validate.</p> |
| 293 | +<dd class="field-odd"><p><strong>dicTerm</strong> (<em>str</em>) – The dictionary term to validate.</p> |
293 | 294 | </dd>
|
294 | 295 | <dt class="field-even">Returns<span class="colon">:</span></dt>
|
295 |
| -<dd class="field-even"><p><cite>True</cite> if the term is valid, <cite>False</cite> otherwise.</p> |
| 296 | +<dd class="field-even"><p>True if the term is valid, False otherwise.</p> |
296 | 297 | </dd>
|
297 | 298 | <dt class="field-odd">Return type<span class="colon">:</span></dt>
|
298 | 299 | <dd class="field-odd"><p>bool</p>
|
|
308 | 309 | <p>This function reads the content of a LIWC dictionary file in the official format
|
309 | 310 | and converts it to a dictionary that maps each lexicon to a regular expression.
|
310 | 311 | We assume that dicText has two parts: the header, which maps numbers to “category names,”
|
311 |
| -and the body, which maps words in the lexicon to different category numbers, separated by a ‘%’ sign.</p> |
| 312 | +and the body, which maps each word in the lexicon to one or more category numbers. The header is enclosed between two lines containing only ‘%’, and the body follows the second one. |
| 313 | +Below is an example:</p> |
| 314 | +<pre class="literal-block"> |
| 315 | +% |
| 316 | +1 function |
| 317 | +2 pronoun |
| 318 | +3 ppron |
| 319 | +% |
| 320 | +again 1 2 |
| 321 | +against 1 2 3 |
| 322 | +</pre> |
| 323 | +<p>Note that the elements in each line are separated by a single space.</p> |
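Here is a minimal sketch of the parsing described above. It assumes single-token words, treats a trailing `*` as a prefix wildcard (the usual LIWC convention), and builds one alternation regex per category. `load_liwc_dict_sketch` and `_to_regex` are hypothetical names, and the regex shape returned by the real `load_liwc_dict` may differ.

```python
from __future__ import annotations

import re


def _to_regex(words: list[str]) -> str:
    """Join words into one alternation; a trailing '*' becomes a prefix wildcard."""
    parts = []
    for word in words:
        if word.endswith("*"):
            parts.append(re.escape(word[:-1]) + r"\w*")
        else:
            parts.append(re.escape(word))
    return r"\b(?:" + "|".join(parts) + r")\b"


def load_liwc_dict_sketch(dic_text: str) -> dict[str, str]:
    """Hypothetical parser for the two-part .dic layout described above."""
    lines = [line.strip() for line in dic_text.splitlines() if line.strip()]
    # The header sits between the first two lines that contain only "%".
    first = lines.index("%")
    second = lines.index("%", first + 1)

    # Header lines: "<number> <category name>".
    categories = {}
    for line in lines[first + 1:second]:
        number, name = line.split(maxsplit=1)
        categories[number] = name

    # Body lines: "<word> <number> <number> ..."; split() also tolerates tabs.
    words_by_category = {name: [] for name in categories.values()}
    for line in lines[second + 1:]:
        word, *numbers = line.split()
        for number in numbers:
            words_by_category[categories[number]].append(word)

    # Skip empty categories, whose alternation would match the empty string.
    return {
        name: _to_regex(words)
        for name, words in words_by_category.items()
        if words
    }


example = """%
1 function
2 pronoun
3 ppron
%
again 1 2
against 1 2 3
"""
print(load_liwc_dict_sketch(example))
```

Because `str.split()` with no argument splits on any run of whitespace, the sketch also accepts tab-separated .dic files.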
312 | 324 | <dl class="field-list simple">
|
313 | 325 | <dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
314 | 326 | <dd class="field-odd"><p><strong>dicText</strong> (<em>str</em>) – The content of a .dic file</p>
|
|
327 | 339 | <span class="sig-prename descclassname"><span class="pre">utils.check_embeddings.</span></span><span class="sig-name descname"><span class="pre">read_in_lexicons</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">directory</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">lexicons_dict</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#utils.check_embeddings.read_in_lexicons" title="Link to this definition"></a></dt>
|
328 | 340 | <dd></dd></dl>
|
329 | 341 |
|
| 342 | +<dl class="py function"> |
| 343 | +<dt class="sig sig-object py" id="utils.check_embeddings.sort_words"> |
| 344 | +<span class="sig-prename descclassname"><span class="pre">utils.check_embeddings.</span></span><span class="sig-name descname"><span class="pre">sort_words</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">lexicons</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">list</span></span></em><span class="sig-paren">)</span> <span class="sig-return"><span class="sig-return-icon">→</span> <span class="sig-return-typehint"><span class="pre">str</span></span></span><a class="headerlink" href="#utils.check_embeddings.sort_words" title="Link to this definition"></a></dt> |
| 345 | +<dd><p>Sorts the dictionary terms in a list.</p> |
| 346 | +<p>This function sorts the dictionary terms in a list by their length in descending order. |
| 347 | +Hyphenated words are placed first, followed by non-hyphenated words.</p> |
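One way to implement this ordering is sketched below. It assumes that a term counts as hyphenated when it contains `-` anywhere and that each group is ordered longest first. `sort_words_sketch` is a hypothetical name. It returns a list, whereas the documented return type is `str`, so the real function may differ.

```python
from __future__ import annotations


def sort_words_sketch(lexicons: list[str]) -> list[str]:
    """Hyphenated terms first, then the rest; each group longest first."""
    # Assumption: a term is "hyphenated" if it contains "-" anywhere.
    hyphenated = [w for w in lexicons if "-" in w]
    plain = [w for w in lexicons if "-" not in w]
    # sorted() is stable even with reverse=True, so equal-length terms
    # keep their input order.
    return sorted(hyphenated, key=len, reverse=True) + sorted(plain, key=len, reverse=True)


print(sort_words_sketch(["well", "well-being", "a", "self-esteem", "wellness"]))
```

A likely reason for longest-first ordering is that the terms are later joined into a regex alternation. Python's `re` tries alternatives from left to right, so if `well` were placed before `well-being`, the pattern would match only the first four characters of `well-being`.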
| 348 | +<dl class="field-list simple"> |
| 349 | +<dt class="field-odd">Parameters<span class="colon">:</span></dt> |
| 350 | +<dd class="field-odd"><p><strong>lexicons</strong> (<em>list</em>) – List of dictionary terms.</p> |
| 351 | +</dd> |
| 352 | +<dt class="field-even">Returns<span class="colon">:</span></dt> |
| 353 | +<dd class="field-even"><p>The sorted dictionary terms.</p> |
| 354 | +</dd> |
| 355 | +<dt class="field-odd">Return type<span class="colon">:</span></dt> |
| 356 | +<dd class="field-odd"><p>str</p> |
| 357 | +</dd> |
| 358 | +</dl> |
| 359 | +</dd></dl> |
| 360 | + |
330 | 361 | <dl class="py function">
|
331 | 362 | <dt class="sig sig-object py" id="utils.check_embeddings.str_to_vec">
|
332 | 363 | <span class="sig-prename descclassname"><span class="pre">utils.check_embeddings.</span></span><span class="sig-name descname"><span class="pre">str_to_vec</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">str_vec</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#utils.check_embeddings.str_to_vec" title="Link to this definition"></a></dt>
|
|