
Commit 44f71be

address #304
1 parent 7905240 commit 44f71be

16 files changed

+1714
-2088
lines changed


.gitignore

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -31,6 +31,7 @@ MANIFEST
3131
.DS_Store
3232

3333
# unwanted files
34+
*/filtered_dict.json
3435
src/team_comm_tools/features/lexicons/liwc_lexicons/*
3536
src/team_comm_tools/features/lexicons/liwc_lexicons_small_test/*
3637
src/team_comm_tools/features/lexicons/certainty.txt
25.5 KB
Binary file not shown.

docs/build/doctrees/examples.doctree

9.37 KB
Binary file not shown.

docs/build/doctrees/index.doctree

9.33 KB
Binary file not shown.

docs/build/html/_sources/examples.rst.txt

Lines changed: 59 additions & 0 deletions
@@ -277,3 +277,62 @@ Here are some additional design details of the FeatureBuilder that you may wish
277277
* The only caveat to this rule is if you happen to have a column that is named exactly the same as one of the conversation features that we generate. In that case, your column will be overwritten. Please refer to `<https://teamcommtools.seas.upenn.edu/HowItWorks>`_ for a list of all the features we generate, along with their column names.
278278

279279
* **When summarizing features from the utterance level to the conversation and speaker level, we only consider numeric features.** This is a simplifying assumption: although we do extract some non-numeric information (for example, a Dale-Chall label of whether an utterance is "Easy" to read, or a list of identified named entities), we cannot summarize it efficiently, so it is not considered.
280+
281+
Inspecting Generated Features
282+
++++++++++++++++++++++++++++++
283+
284+
Feature Information
285+
^^^^^^^^^^^^^^^^^^^^^
286+
Every FeatureBuilder object has an underlying property called the **feature_dict**, which lists information and references about the features included in the toolkit. Assuming that **jury_feature_builder** is the name of your FeatureBuilder, you can access the feature dictionary as follows:
287+
288+
.. code-block:: python
289+
290+
jury_feature_builder.feature_dict
291+
292+
The keys of this dictionary are the formal feature names, and each value is a dictionary of information about that feature or collection of features. A more readable version of this dictionary is also available on our `website <https://teamcommtools.seas.upenn.edu/HowItWorks>`_.
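As a minimal sketch of working with this dictionary, the snippet below groups feature names by their `level` entry. It assumes only the structure shown in the example output further down (each value is a dictionary that may contain a `level` key). The helper `group_features_by_level`, the stand-in dictionary, its feature names, and the `'Conversation'` level value are hypothetical placeholders for `jury_feature_builder.feature_dict`.

```python
from collections import defaultdict


def group_features_by_level(feature_dict):
    """Group formal feature names by each entry's 'level' value (e.g., 'Chat')."""
    groups = defaultdict(list)
    for name, info in feature_dict.items():
        # Entries without a 'level' key are collected under None instead of raising.
        groups[info.get("level")].append(name)
    return dict(groups)


# Hypothetical stand-in for jury_feature_builder.feature_dict; the names and the
# 'Conversation' level value are placeholders, trimmed to the keys used here.
example_feature_dict = {
    "Example Chat Feature": {
        "level": "Chat",
        "columns": ["positive_bert", "negative_bert", "neutral_bert"],
    },
    "Example Conversation Feature": {
        "level": "Conversation",
        "columns": ["example_conversation_column"],
    },
}

print(group_features_by_level(example_feature_dict))
```

In practice you would pass `jury_feature_builder.feature_dict` directly instead of the stand-in.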
293+
294+
**New in v.0.1.4**: To access a list of the formal feature names that a FeatureBuilder will generate, you can use the **feature_names** property:
295+
296+
.. code-block:: python
297+
298+
jury_feature_builder.feature_names # a list of formal feature names included in featurization (e.g., "Team Burstiness")
299+
300+
You can also use the **feature_names** property in tandem with the **feature_dict** to learn more about a specific feature; for example, the following code will show the dictionary entry for the first feature in **feature_names**:
301+
302+
.. code-block:: python
303+
304+
jury_feature_builder.feature_dict[jury_feature_builder.feature_names[0]]
305+
306+
Here is some example output (for the RoBERTa sentiment feature):
307+
308+
.. code-block:: text
309+
310+
{'columns': ['positive_bert', 'negative_bert', 'neutral_bert'],
311+
'file': './utils/check_embeddings.py',
312+
'level': 'Chat',
313+
'semantic_grouping': 'Emotion',
314+
'description': 'The extent to which a statement is positive, negative, or neutral, as assigned by Cardiffnlp/twitter-roberta-base-sentiment-latest. The total scores (Positive, Negative, Neutral) sum to 1.',
315+
'references': '(Hugging Face, 2023)',
316+
'wiki_link': 'https://conversational-featurizer.readthedocs.io/en/latest/features_conceptual/positivity_bert.html',
317+
'function': <function team_comm_tools.utils.calculate_chat_level_features.ChatLevelFeaturesCalculator.concat_bert_features(self) -> None>,
318+
'dependencies': [],
319+
'preprocess': [],
320+
'vect_data': False,
321+
'bert_sentiment_data': True}
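Because each entry lists its output `columns`, you can also invert the dictionary to trace an output column back to the feature that produced it. This is a minimal sketch assuming every entry carries a `columns` list like the one above; `column_to_feature` and the stand-in entries are hypothetical.

```python
def column_to_feature(feature_dict):
    """Build a reverse index from output column name to formal feature name."""
    index = {}
    for name, info in feature_dict.items():
        # Entries without a 'columns' key contribute nothing.
        for column in info.get("columns", []):
            index[column] = name
    return index


# Hypothetical stand-in entries; only the 'columns' key is used.
example_feature_dict = {
    "Example Sentiment Feature": {"columns": ["positive_bert", "negative_bert", "neutral_bert"]},
    "Example Other Feature": {"columns": ["example_column"]},
}

lookup = column_to_feature(example_feature_dict)
print(lookup["negative_bert"])  # Example Sentiment Feature
```

If two entries list the same column, the entry visited later overwrites the earlier mapping.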
322+
323+
Feature Column Names
324+
^^^^^^^^^^^^^^^^^^^^^
325+
326+
Once you call **.featurize()**, you can also obtain a convenient list of the feature columns generated by the toolkit:
327+
328+
.. code-block:: python
329+
330+
jury_feature_builder.chat_features # a list of the feature columns generated at the chat (utterance) level
331+
jury_feature_builder.conv_features_base # a list of the base (non-aggregated) feature columns at the conversation level
332+
jury_feature_builder.conv_features_all # a list of all feature columns at the conversation level, including aggregates
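Assuming **conv_features_all** is a superset of **conv_features_base** (as the comments above describe), the aggregated conversation-level columns can be isolated with an order-preserving set difference. The helper `aggregate_only_columns` and the column names below are hypothetical placeholders.

```python
def aggregate_only_columns(conv_features_all, conv_features_base):
    """Return conversation-level columns that are not base features, in original order."""
    base = set(conv_features_base)
    return [column for column in conv_features_all if column not in base]


# Hypothetical column names standing in for jury_feature_builder.conv_features_base
# and jury_feature_builder.conv_features_all.
conv_features_base = ["example_base_feature"]
conv_features_all = [
    "example_base_feature",
    "example_chat_feature_mean",
    "example_chat_feature_stdev",
]

print(aggregate_only_columns(conv_features_all, conv_features_base))
# ['example_chat_feature_mean', 'example_chat_feature_stdev']
```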
333+
334+
These lists are useful for selecting only the columns in the output dataframe that were generated by the FeatureBuilder; for example:
335+
336+
.. code-block:: python
337+
338+
jury_output_chat_level[jury_feature_builder.chat_features]
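To go the other way and recover the columns that came from your own input data, a minimal sketch is to partition the output column names against **chat_features**. `split_columns` and all column names below are hypothetical; with a pandas dataframe you would pass `list(jury_output_chat_level.columns)`. Note that, per the overwrite caveat above, an input column that shares a name with a generated feature will be classified as generated.

```python
def split_columns(output_columns, chat_features):
    """Partition output column names into (input columns, generated feature columns)."""
    generated = set(chat_features)
    input_columns = [c for c in output_columns if c not in generated]
    feature_columns = [c for c in output_columns if c in generated]
    return input_columns, feature_columns


# Hypothetical column names standing in for jury_output_chat_level.columns
# and jury_feature_builder.chat_features.
output_columns = ["conversation_num", "speaker_nickname", "message", "positive_bert", "negative_bert"]
chat_features = ["positive_bert", "negative_bert", "neutral_bert"]

input_columns, feature_columns = split_columns(output_columns, chat_features)
print(input_columns)    # ['conversation_num', 'speaker_nickname', 'message']
print(feature_columns)  # ['positive_bert', 'negative_bert']
```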

docs/build/html/_sources/index.rst.txt

Lines changed: 65 additions & 0 deletions
@@ -44,6 +44,8 @@ After you import the package and install dependencies, you can then use our tool
4444
Using the Package
4545
******************
4646

47+
Declaring a FeatureBuilder
48+
+++++++++++++++++++++++++++
4749
Once you import the tool, you will be able to declare a FeatureBuilder object, which is the heart of our tool. Here is some sample syntax:
4850

4951
.. code-block:: python
@@ -78,6 +80,69 @@ Once you import the tool, you will be able to declare a FeatureBuilder object, w
7880
# this line of code runs the FeatureBuilder on your data
7981
my_feature_builder.featurize()
8082
83+
Inspecting Generated Features
84+
++++++++++++++++++++++++++++++
85+
86+
Feature Information
87+
^^^^^^^^^^^^^^^^^^^^^
88+
Every FeatureBuilder object has an underlying property called the **feature_dict**, which lists information and references about the features included in the toolkit. Assuming that **my_feature_builder** is the name of your FeatureBuilder, you can access the feature dictionary as follows:
89+
90+
.. code-block:: python
91+
92+
my_feature_builder.feature_dict
93+
94+
The keys of this dictionary are the formal feature names, and each value is a dictionary of information about that feature or collection of features. A more readable version of this dictionary is also available on our `website <https://teamcommtools.seas.upenn.edu/HowItWorks>`_.
95+
96+
**New in v.0.1.4**: To access a list of the formal feature names that a FeatureBuilder will generate, you can use the **feature_names** property:
97+
98+
.. code-block:: python
99+
100+
my_feature_builder.feature_names # a list of formal feature names included in featurization (e.g., "Team Burstiness")
101+
102+
You can also use the **feature_names** property in tandem with the **feature_dict** to learn more about a specific feature; for example, the following code will show the dictionary entry for the first feature in **feature_names**:
103+
104+
.. code-block:: python
105+
106+
my_feature_builder.feature_dict[my_feature_builder.feature_names[0]]
107+
108+
Here is some example output (for the RoBERTa sentiment feature):
109+
110+
.. code-block:: text
111+
112+
{'columns': ['positive_bert', 'negative_bert', 'neutral_bert'],
113+
'file': './utils/check_embeddings.py',
114+
'level': 'Chat',
115+
'semantic_grouping': 'Emotion',
116+
'description': 'The extent to which a statement is positive, negative, or neutral, as assigned by Cardiffnlp/twitter-roberta-base-sentiment-latest. The total scores (Positive, Negative, Neutral) sum to 1.',
117+
'references': '(Hugging Face, 2023)',
118+
'wiki_link': 'https://conversational-featurizer.readthedocs.io/en/latest/features_conceptual/positivity_bert.html',
119+
'function': <function team_comm_tools.utils.calculate_chat_level_features.ChatLevelFeaturesCalculator.concat_bert_features(self) -> None>,
120+
'dependencies': [],
121+
'preprocess': [],
122+
'vect_data': False,
123+
'bert_sentiment_data': True}
124+
125+
Feature Column Names
126+
^^^^^^^^^^^^^^^^^^^^^
127+
128+
Once you call **.featurize()**, you can also obtain a convenient list of the feature columns generated by the toolkit:
129+
130+
.. code-block:: python
131+
132+
my_feature_builder.chat_features # a list of the feature columns generated at the chat (utterance) level
133+
my_feature_builder.conv_features_base # a list of the base (non-aggregated) feature columns at the conversation level
134+
my_feature_builder.conv_features_all # a list of all feature columns at the conversation level, including aggregates
135+
136+
These lists are useful for selecting only the columns in the output dataframe that were generated by the FeatureBuilder; for example, assuming the chat-level output has been loaded into a dataframe named **jury_output_chat_level**:
137+
138+
.. code-block:: python
139+
140+
jury_output_chat_level[my_feature_builder.chat_features]
141+
142+
143+
Table of Contents
144+
******************
145+
81146
Use the Table of Contents below to learn more about our tool. We recommend beginning with the "Introduction" section and then exploring other sections of the documentation as they become relevant to you. For a high-level overview of the requirements and parameters, read :ref:`basics`; then work through the :ref:`examples` for a detailed walkthrough and discussion of considerations.
82147

83148
.. toctree::

docs/build/html/examples.html

Lines changed: 52 additions & 0 deletions
@@ -66,6 +66,11 @@
6666
</ul>
6767
</li>
6868
<li class="toctree-l3"><a class="reference internal" href="#additional-featurebuilder-considerations">Additional FeatureBuilder Considerations</a></li>
69+
<li class="toctree-l3"><a class="reference internal" href="#inspecting-generated-features">Inspecting Generated Features</a><ul>
70+
<li class="toctree-l4"><a class="reference internal" href="#feature-information">Feature Information</a></li>
71+
<li class="toctree-l4"><a class="reference internal" href="#feature-column-names">Feature Column Names</a></li>
72+
</ul>
73+
</li>
6974
</ul>
7075
</li>
7176
</ul>
@@ -373,6 +378,53 @@ <h3>Additional FeatureBuilder Considerations<a class="headerlink" href="#additio
373378
</ul>
374379
</div></blockquote>
375380
</section>
381+
<section id="inspecting-generated-features">
382+
<h3>Inspecting Generated Features<a class="headerlink" href="#inspecting-generated-features" title="Link to this heading"></a></h3>
383+
<section id="feature-information">
384+
<h4>Feature Information<a class="headerlink" href="#feature-information" title="Link to this heading"></a></h4>
385+
<p>Every FeatureBuilder object has an underlying property called the <strong>feature_dict</strong>, which lists information and references about the features included in the toolkit. Assuming that <strong>jury_feature_builder</strong> is the name of your FeatureBuilder, you can access the feature dictionary as follows:</p>
386+
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">jury_feature_builder</span><span class="o">.</span><span class="n">feature_dict</span>
387+
</pre></div>
388+
</div>
389+
<p>The keys of this dictionary are the formal feature names, and the value is a JSON blob with information about the feature or collection of features. A more nicely-displayed version of this dictionary is also available on our <a class="reference external" href="https://teamcommtools.seas.upenn.edu/HowItWorks">website</a>.</p>
390+
<p><strong>New in v.0.1.4</strong>: To access a list of the formal feature names that a FeatureBuilder will generate, you can use the <strong>feature_names</strong> property:</p>
391+
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">jury_feature_builder</span><span class="o">.</span><span class="n">feature_names</span> <span class="c1"># a list of formal feature names included in featurization (e.g., &quot;Team Burstiness&quot;)</span>
392+
</pre></div>
393+
</div>
394+
<p>You can also use the <strong>feature_names</strong> property in tandem with the <strong>feature_dict</strong> to learn more about a specific feature; for example, the following code will show the dictionary entry for the first feature in <strong>feature_names</strong>:</p>
395+
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">jury_feature_builder</span><span class="o">.</span><span class="n">feature_dict</span><span class="p">[</span><span class="n">jury_feature_builder</span><span class="o">.</span><span class="n">feature_names</span><span class="p">[</span><span class="mi">0</span><span class="p">]]</span>
396+
</pre></div>
397+
</div>
398+
<p>Here is some example output (for the RoBERTa sentiment feature):</p>
399+
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>{&#39;columns&#39;: [&#39;positive_bert&#39;, &#39;negative_bert&#39;, &#39;neutral_bert&#39;],
400+
&#39;file&#39;: &#39;./utils/check_embeddings.py&#39;,
401+
&#39;level&#39;: &#39;Chat&#39;,
402+
&#39;semantic_grouping&#39;: &#39;Emotion&#39;,
403+
&#39;description&#39;: &#39;The extent to which a statement is positive, negative, or neutral, as assigned by Cardiffnlp/twitter-roberta-base-sentiment-latest. The total scores (Positive, Negative, Neutral) sum to 1.&#39;,
404+
&#39;references&#39;: &#39;(Hugging Face, 2023)&#39;,
405+
&#39;wiki_link&#39;: &#39;https://conversational-featurizer.readthedocs.io/en/latest/features_conceptual/positivity_bert.html&#39;,
406+
&#39;function&#39;: &lt;function team_comm_tools.utils.calculate_chat_level_features.ChatLevelFeaturesCalculator.concat_bert_features(self) -&gt; None&gt;,
407+
&#39;dependencies&#39;: [],
408+
&#39;preprocess&#39;: [],
409+
&#39;vect_data&#39;: False,
410+
&#39;bert_sentiment_data&#39;: True}
411+
</pre></div>
412+
</div>
413+
</section>
414+
<section id="feature-column-names">
415+
<h4>Feature Column Names<a class="headerlink" href="#feature-column-names" title="Link to this heading"></a></h4>
416+
<p>Once you call <strong>.featurize()</strong>, you can also obtain a convenient list of the feature columns generated by the toolkit:</p>
417+
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">jury_feature_builder</span><span class="o">.</span><span class="n">chat_features</span> <span class="c1"># a list of the feature columns generated at the chat (utterance) level</span>
418+
<span class="n">jury_feature_builder</span><span class="o">.</span><span class="n">conv_features_base</span> <span class="c1"># a list of the base (non-aggregated) feature columns at the conversation level</span>
419+
<span class="n">jury_feature_builder</span><span class="o">.</span><span class="n">conv_features_all</span> <span class="c1"># a list of all feature columns at the conversation level, including aggregates</span>
420+
</pre></div>
421+
</div>
422+
<p>These lists may be useful to you if you’d like to inspect which features in the output dataframe come from the FeatureBuilder; for example:</p>
423+
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">jury_output_chat_level</span><span class="p">[</span><span class="n">jury_feature_builder</span><span class="o">.</span><span class="n">chat_features</span><span class="p">]</span>
424+
</pre></div>
425+
</div>
426+
</section>
427+
</section>
376428
</section>
377429
</section>
378430
