Watts-Lab
diff --git a/.gitignore b/.gitignore
Lines changed: 1 addition & 0 deletions
diff --git a/docs/build/doctrees/environment.pickle b/docs/build/doctrees/environment.pickle
25.5 KB
diff --git a/docs/build/doctrees/examples.doctree b/docs/build/doctrees/examples.doctree
9.37 KB
diff --git a/docs/build/doctrees/index.doctree b/docs/build/doctrees/index.doctree
9.33 KB
diff --git a/docs/build/html/_sources/examples.rst.txt b/docs/build/html/_sources/examples.rst.txt
Lines changed: 59 additions & 0 deletions
diff --git a/docs/build/html/_sources/index.rst.txt b/docs/build/html/_sources/index.rst.txt
Lines changed: 65 additions & 0 deletions
diff --git a/docs/build/html/examples.html b/docs/build/html/examples.html
Lines changed: 52 additions & 0 deletions
@@ -31,6 +31,7 @@ MANIFEST
 .DS_Store
 
 # unwanted files
+*/filtered_dict.json
 src/team_comm_tools/features/lexicons/liwc_lexicons/*
 src/team_comm_tools/features/lexicons/liwc_lexicons_small_test/*
 src/team_comm_tools/features/lexicons/certainty.txt
 
@@ -277,3 +277,62 @@ Here are some additional design details of the FeatureBuilder that you may wish
 		* The only caveat to this rule is if you happen to have a column that is named exactly the same as one of the conversation features that we generate. In that case, your column will be overwritten. Please refer to `<https://teamcommtools.seas.upenn.edu/HowItWorks>`_ for a list of all the features we generate, along with their column names.
 
 	* **When summarizing features from the utterance level to the conversation and speaker level, we only consider numeric features.** This is perhaps a simplifying assumption more than anything else; although we do extract non-numeric information (for example, a Dale-Chall label of whether an utterance is "Easy" to read or not; a list of named entities identified), we cannot summarize these efficiently, so they are not considered.
+
+Inspecting Generated Features
+++++++++++++++++++++++++++++++
+
+Feature Information
+^^^^^^^^^^^^^^^^^^^^^
+Every FeatureBuilder object has an underlying property called the **feature_dict**, which lists information and references about the features included in the toolkit. Assuming that **jury_feature_builder** is the name of your FeatureBuilder, you can access the feature dictionary as follows:
+
+.. code-block:: python
+
+   jury_feature_builder.feature_dict
+
+The keys of this dictionary are the formal feature names, and each value is a dictionary with information about that feature or collection of features. A more readable version of this dictionary is also available on our `website <https://teamcommtools.seas.upenn.edu/HowItWorks>`_.
+
+**New in v.0.1.4**: To access a list of the formal feature names that a FeatureBuilder will generate, you can use the **feature_names** property: 
+
+.. code-block:: python
+   
+   jury_feature_builder.feature_names # a list of formal feature names included in featurization (e.g., "Team Burstiness")
+
+You can also use the **feature_names** property in tandem with the **feature_dict** to learn more about a specific feature; for example, the following code will show the dictionary entry for the first feature in **feature_names**:
+
+.. code-block:: python
+
+	jury_feature_builder.feature_dict[jury_feature_builder.feature_names[0]]
+
+Here is some example output (for the RoBERTa sentiment feature):
+
+.. code-block:: text
+	
+	{'columns': ['positive_bert', 'negative_bert', 'neutral_bert'],
+	 'file': './utils/check_embeddings.py',
+	 'level': 'Chat',
+	 'semantic_grouping': 'Emotion',
+	 'description': 'The extent to which a statement is positive, negative, or neutral, as assigned by Cardiffnlp/twitter-roberta-base-sentiment-latest. The total scores (Positive, Negative, Neutral) sum to 1.',
+	 'references': '(Hugging Face, 2023)',
+	 'wiki_link': 'https://conversational-featurizer.readthedocs.io/en/latest/features_conceptual/positivity_bert.html',
+	 'function': <function team_comm_tools.utils.calculate_chat_level_features.ChatLevelFeaturesCalculator.concat_bert_features(self) -> None>,
+	 'dependencies': [],
+	 'preprocess': [],
+	 'vect_data': False,
+	 'bert_sentiment_data': True}
+
+Feature Column Names
+^^^^^^^^^^^^^^^^^^^^^
+
+Once you call **.featurize()**, you can also obtain convenient lists of the feature columns generated by the toolkit:
+
+.. code-block:: python
+   
+   jury_feature_builder.chat_features # a list of the feature columns generated at the chat (utterance) level
+   jury_feature_builder.conv_features_base # a list of the base (non-aggregated) feature columns at the conversation level
+   jury_feature_builder.conv_features_all # a list of all feature columns at the conversation level, including aggregates
+
+These lists may be useful to you if you'd like to inspect which features in the output dataframe come from the FeatureBuilder; for example:
+
+.. code-block:: python
+
+	jury_output_chat_level[jury_feature_builder.chat_features]
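Reviewer note: the example output above shows that each **feature_dict** entry carries a `columns` list and a `level` string, so the two properties can be combined to map formal feature names to output columns. The following is a minimal sketch of that idea, not the toolkit's own code. It uses a small hand-written dictionary as a stand-in for `jury_feature_builder.feature_dict`; the key `"Positivity (BERT)"`, the column name `team_burstiness`, the level value `"Conversation"`, and the helper `columns_by_level` are all assumptions made for illustration.

```python
# Hypothetical stand-in for jury_feature_builder.feature_dict.
# Only the 'columns' and 'level' keys are taken from the example output;
# the feature-name keys and the second entry's values are assumptions.
feature_dict = {
    "Positivity (BERT)": {
        "columns": ["positive_bert", "negative_bert", "neutral_bert"],
        "level": "Chat",
    },
    "Team Burstiness": {
        "columns": ["team_burstiness"],  # assumed column name
        "level": "Conversation",         # assumed level value
    },
}

# Hypothetical stand-in for jury_feature_builder.feature_names.
feature_names = ["Positivity (BERT)", "Team Burstiness"]


def columns_by_level(feature_dict, feature_names):
    """Group the output columns of the named features by their 'level' entry."""
    grouped = {}
    for name in feature_names:
        entry = feature_dict.get(name)
        if entry is None:
            # Skip names that have no dictionary entry.
            continue
        grouped.setdefault(entry["level"], []).extend(entry["columns"])
    return grouped


columns = columns_by_level(feature_dict, feature_names)
print(columns)
```

With a real FeatureBuilder, the two stand-ins would presumably be replaced by the `feature_dict` and `feature_names` properties described above.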
@@ -44,6 +44,8 @@ After you import the package and install dependencies, you can then use our tool
 Using the Package
 ******************
 
+Declaring a FeatureBuilder
++++++++++++++++++++++++++++
 Once you import the tool, you will be able to declare a FeatureBuilder object, which is the heart of our tool. Here is some sample syntax:
 
 .. code-block:: python
@@ -78,6 +80,69 @@ Once you import the tool, you will be able to declare a FeatureBuilder object, w
    # this line of code runs the FeatureBuilder on your data
    my_feature_builder.featurize()
 
+Inspecting Generated Features
+++++++++++++++++++++++++++++++
+
+Feature Information
+^^^^^^^^^^^^^^^^^^^^^
+Every FeatureBuilder object has an underlying property called the **feature_dict**, which lists information and references about the features included in the toolkit. Assuming that **my_feature_builder** is the name of your FeatureBuilder, you can access the feature dictionary as follows:
+
+.. code-block:: python
+
+   my_feature_builder.feature_dict
+
+The keys of this dictionary are the formal feature names, and each value is a dictionary with information about that feature or collection of features. A more readable version of this dictionary is also available on our `website <https://teamcommtools.seas.upenn.edu/HowItWorks>`_.
+
+**New in v.0.1.4**: To access a list of the formal feature names that a FeatureBuilder will generate, you can use the **feature_names** property: 
+
+.. code-block:: python
+   
+   my_feature_builder.feature_names # a list of formal feature names included in featurization (e.g., "Team Burstiness")
+
+You can also use the **feature_names** property in tandem with the **feature_dict** to learn more about a specific feature; for example, the following code will show the dictionary entry for the first feature in **feature_names**:
+
+.. code-block:: python
+
+   my_feature_builder.feature_dict[my_feature_builder.feature_names[0]]
+
+Here is some example output (for the RoBERTa sentiment feature):
+
+.. code-block:: text
+   
+   {'columns': ['positive_bert', 'negative_bert', 'neutral_bert'],
+    'file': './utils/check_embeddings.py',
+    'level': 'Chat',
+    'semantic_grouping': 'Emotion',
+    'description': 'The extent to which a statement is positive, negative, or neutral, as assigned by Cardiffnlp/twitter-roberta-base-sentiment-latest. The total scores (Positive, Negative, Neutral) sum to 1.',
+    'references': '(Hugging Face, 2023)',
+    'wiki_link': 'https://conversational-featurizer.readthedocs.io/en/latest/features_conceptual/positivity_bert.html',
+    'function': <function team_comm_tools.utils.calculate_chat_level_features.ChatLevelFeaturesCalculator.concat_bert_features(self) -> None>,
+    'dependencies': [],
+    'preprocess': [],
+    'vect_data': False,
+    'bert_sentiment_data': True}
+
+Feature Column Names
+^^^^^^^^^^^^^^^^^^^^^
+
+Once you call **.featurize()**, you can also obtain convenient lists of the feature columns generated by the toolkit:
+
+.. code-block:: python
+   
+   my_feature_builder.chat_features # a list of the feature columns generated at the chat (utterance) level
+   my_feature_builder.conv_features_base # a list of the base (non-aggregated) feature columns at the conversation level
+   my_feature_builder.conv_features_all # a list of all feature columns at the conversation level, including aggregates
+
+These lists are useful for selecting only the columns of the output dataframe that were generated by the FeatureBuilder. For example, assuming **jury_output_chat_level** is the dataframe holding your chat-level output:
+
+.. code-block:: python
+
+   jury_output_chat_level[my_feature_builder.chat_features]
+
+
+Table of Contents
+******************
+
 Use the Table of Contents below to learn more about our tool. We recommend that you begin in the "Introduction" section, then explore other sections of the documentation as they become relevant to you. We recommend reading :ref:`basics` for a high-level overview of the requirements and parameters, and then reading through the :ref:`examples` for a detailed walkthrough and discussion of considerations.
 
 .. toctree::
 
@@ -66,6 +66,11 @@
 </ul>
 </li>
 <li class="toctree-l3"><a class="reference internal" href="#additional-featurebuilder-considerations">Additional FeatureBuilder Considerations</a></li>
+<li class="toctree-l3"><a class="reference internal" href="#inspecting-generated-features">Inspecting Generated Features</a><ul>
+<li class="toctree-l4"><a class="reference internal" href="#feature-information">Feature Information</a></li>
+<li class="toctree-l4"><a class="reference internal" href="#feature-column-names">Feature Column Names</a></li>
+</ul>
+</li>
 </ul>
 </li>
 </ul>
@@ -373,6 +378,53 @@ <h3>Additional FeatureBuilder Considerations<a class="headerlink" href="#additio
 </ul>
 </div></blockquote>
 </section>
+<section id="inspecting-generated-features">
+<h3>Inspecting Generated Features<a class="headerlink" href="#inspecting-generated-features" title="Link to this heading"></a></h3>
+<section id="feature-information">
+<h4>Feature Information<a class="headerlink" href="#feature-information" title="Link to this heading"></a></h4>
+<p>Every FeatureBuilder object has an underlying property called the <strong>feature_dict</strong>, which lists information and references about the features included in the toolkit. Assuming that <strong>jury_feature_builder</strong> is the name of your FeatureBuilder, you can access the feature dictionary as follows:</p>
+<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">jury_feature_builder</span><span class="o">.</span><span class="n">feature_dict</span>
+</pre></div>
+</div>
+<p>The keys of this dictionary are the formal feature names, and each value is a dictionary with information about that feature or collection of features. A more readable version of this dictionary is also available on our <a class="reference external" href="https://teamcommtools.seas.upenn.edu/HowItWorks">website</a>.</p>
+<p><strong>New in v.0.1.4</strong>: To access a list of the formal feature names that a FeatureBuilder will generate, you can use the <strong>feature_names</strong> property:</p>
+<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">jury_feature_builder</span><span class="o">.</span><span class="n">feature_names</span> <span class="c1"># a list of formal feature names included in featurization (e.g., &quot;Team Burstiness&quot;)</span>
+</pre></div>
+</div>
+<p>You can also use the <strong>feature_names</strong> property in tandem with the <strong>feature_dict</strong> to learn more about a specific feature; for example, the following code will show the dictionary entry for the first feature in <strong>feature_names</strong>:</p>
+<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">jury_feature_builder</span><span class="o">.</span><span class="n">feature_dict</span><span class="p">[</span><span class="n">jury_feature_builder</span><span class="o">.</span><span class="n">feature_names</span><span class="p">[</span><span class="mi">0</span><span class="p">]]</span>
+</pre></div>
+</div>
+<p>Here is some example output (for the RoBERTa sentiment feature):</p>
+<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>{&#39;columns&#39;: [&#39;positive_bert&#39;, &#39;negative_bert&#39;, &#39;neutral_bert&#39;],
+ &#39;file&#39;: &#39;./utils/check_embeddings.py&#39;,
+ &#39;level&#39;: &#39;Chat&#39;,
+ &#39;semantic_grouping&#39;: &#39;Emotion&#39;,
+ &#39;description&#39;: &#39;The extent to which a statement is positive, negative, or neutral, as assigned by Cardiffnlp/twitter-roberta-base-sentiment-latest. The total scores (Positive, Negative, Neutral) sum to 1.&#39;,
+ &#39;references&#39;: &#39;(Hugging Face, 2023)&#39;,
+ &#39;wiki_link&#39;: &#39;https://conversational-featurizer.readthedocs.io/en/latest/features_conceptual/positivity_bert.html&#39;,
+ &#39;function&#39;: &lt;function team_comm_tools.utils.calculate_chat_level_features.ChatLevelFeaturesCalculator.concat_bert_features(self) -&gt; None&gt;,
+ &#39;dependencies&#39;: [],
+ &#39;preprocess&#39;: [],
+ &#39;vect_data&#39;: False,
+ &#39;bert_sentiment_data&#39;: True}
+</pre></div>
+</div>
+</section>
+<section id="feature-column-names">
+<h4>Feature Column Names<a class="headerlink" href="#feature-column-names" title="Link to this heading"></a></h4>
+<p>Once you call <strong>.featurize()</strong>, you can also obtain convenient lists of the feature columns generated by the toolkit:</p>
+<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">jury_feature_builder</span><span class="o">.</span><span class="n">chat_features</span> <span class="c1"># a list of the feature columns generated at the chat (utterance) level</span>
+<span class="n">jury_feature_builder</span><span class="o">.</span><span class="n">conv_features_base</span> <span class="c1"># a list of the base (non-aggregated) feature columns at the conversation level</span>
+<span class="n">jury_feature_builder</span><span class="o">.</span><span class="n">conv_features_all</span> <span class="c1"># a list of all feature columns at the conversation level, including aggregates</span>
+</pre></div>
+</div>
+<p>These lists may be useful to you if you’d like to inspect which features in the output dataframe come from the FeatureBuilder; for example:</p>
+<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">jury_output_chat_level</span><span class="p">[</span><span class="n">jury_feature_builder</span><span class="o">.</span><span class="n">chat_features</span><span class="p">]</span>
+</pre></div>
+</div>
+</section>
+</section>
 </section>
 </section>
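Reviewer note: the "Feature Column Names" section describes using `chat_features` to see which output columns come from the FeatureBuilder. Indexing a dataframe with a list raises an error if any listed column is absent, so it may help to compare the two lists of names first. The sketch below works on plain lists of column names and is only an illustration under stated assumptions: the helper `split_columns` and the sample names `conversation_num`, `speaker_nickname`, `message`, and `num_words` are hypothetical and are not taken from the toolkit.

```python
def split_columns(all_columns, feature_columns):
    """Split all_columns into (generated, other), preserving their original order."""
    feature_set = set(feature_columns)
    generated = [c for c in all_columns if c in feature_set]
    other = [c for c in all_columns if c not in feature_set]
    return generated, other


# Hypothetical column names of a chat-level output dataframe.
output_columns = [
    "conversation_num", "speaker_nickname", "message",
    "positive_bert", "negative_bert", "neutral_bert",
]

# Hypothetical stand-in for my_feature_builder.chat_features.
chat_features = ["positive_bert", "negative_bert", "neutral_bert", "num_words"]

generated, other = split_columns(output_columns, chat_features)

# Feature columns that the output does not contain; selecting these
# directly from a dataframe would fail.
missing = [c for c in chat_features if c not in output_columns]

print(generated)
print(other)
print(missing)
```

With pandas, `list(df.columns)` would supply `all_columns`, and `df[generated]` would then select only the toolkit-generated columns that are actually present.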