Merge branch 'dev' of https://github.com/Watts-Lab/team_comm_tools into amy/package_v2
amytangzheng committed Nov 10, 2024
2 parents 3f31f07 + d15c105 commit 4f562cc
Showing 86 changed files with 5,336 additions and 1,012 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -41,6 +41,7 @@ src/team_comm_tools/ipython_notebooks/.ipynb_checkpoints/
tests/ipython_notebooks/.ipynb_checkpoints/
tests/data/vector_data/
tests/test.log
tests/helper.ipynb
tests/output/*
tests/vector_data/*
src/utils/__pycache__/
Binary file modified docs/build/doctrees/environment.pickle
Binary file modified docs/build/doctrees/examples.doctree
Binary file modified docs/build/doctrees/feature_builder.doctree
Binary file modified docs/build/doctrees/features/basic_features.doctree
Binary file modified docs/build/doctrees/features/burstiness.doctree
Binary file modified docs/build/doctrees/features/certainty.doctree
Binary file modified docs/build/doctrees/features/discursive_diversity.doctree
Binary file modified docs/build/doctrees/features/fflow.doctree
Binary file modified docs/build/doctrees/features/get_all_DD_features.doctree
Binary file modified docs/build/doctrees/features/get_user_network.doctree
Binary file modified docs/build/doctrees/features/hedge.doctree
Binary file modified docs/build/doctrees/features/index.doctree
Binary file modified docs/build/doctrees/features/info_exchange_zscore.doctree
Binary file modified docs/build/doctrees/features/information_diversity.doctree
Binary file modified docs/build/doctrees/features/lexical_features_v2.doctree
Binary file modified docs/build/doctrees/features/other_lexical_features.doctree
Binary file modified docs/build/doctrees/features/politeness_features.doctree
Binary file modified docs/build/doctrees/features/politeness_v2.doctree
Binary file modified docs/build/doctrees/features/politeness_v2_helper.doctree
Binary file modified docs/build/doctrees/features/question_num.doctree
Binary file modified docs/build/doctrees/features/readability.doctree
Binary file modified docs/build/doctrees/features/reddit_tags.doctree
Binary file modified docs/build/doctrees/features/temporal_features.doctree
Binary file modified docs/build/doctrees/features/textblob_sentiment_analysis.doctree
Binary file modified docs/build/doctrees/features/turn_taking_features.doctree
Binary file modified docs/build/doctrees/features/variance_in_DD.doctree
Binary file modified docs/build/doctrees/features/word_mimicry.doctree
Binary file modified docs/build/doctrees/features_conceptual/TEMPLATE.doctree
Binary file modified docs/build/doctrees/features_conceptual/index.doctree
Binary file modified docs/build/doctrees/features_conceptual/mimicry_bert.doctree
Binary file modified docs/build/doctrees/features_conceptual/moving_mimicry.doctree
Binary file modified docs/build/doctrees/features_conceptual/positivity_bert.doctree
Binary file modified docs/build/doctrees/features_conceptual/turn_taking_index.doctree
Binary file modified docs/build/doctrees/features_conceptual/word_ttr.doctree
Binary file modified docs/build/doctrees/index.doctree
Binary file modified docs/build/doctrees/intro.doctree
Binary file modified docs/build/doctrees/utils/assign_chunk_nums.doctree
Binary file modified docs/build/doctrees/utils/calculate_chat_level_features.doctree
Binary file modified docs/build/doctrees/utils/calculate_user_level_features.doctree
Binary file modified docs/build/doctrees/utils/check_embeddings.doctree
Binary file modified docs/build/doctrees/utils/gini_coefficient.doctree
Binary file modified docs/build/doctrees/utils/index.doctree
Binary file modified docs/build/doctrees/utils/preload_word_lists.doctree
Binary file modified docs/build/doctrees/utils/preprocess.doctree
Binary file modified docs/build/doctrees/utils/summarize_features.doctree
Binary file modified docs/build/doctrees/utils/zscore_chats_and_conversation.doctree
2 changes: 1 addition & 1 deletion docs/build/html/.buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 9a01a2cd3d4384710101b4a99edd7683
config: d7678f479036f3220c73480ec4f2c467
tags: 645f666f9bcd5a90fca523b33c5a78b7
@@ -13,10 +13,16 @@ Citation

Implementation Basics
**********************
To compute the feature, we count the number of shared content words (defined as anything that is not on the function word list) between the current and previous utterance in a conversation, then normalize it by the frequency of the word across all inputs in the dataset. This follows the original authors' method:
To compute the feature, we count the number of shared content words (defined as anything that is not on the function word list) between the current and previous utterance in a conversation, normalized by the frequency at which the word appears. This follows the original authors' method:

Content words are defined as any word that is not a function word. For each content word w in a given speaker’s turn, if w also occurs in the immediately preceding turn of the other, we count w as an accommodated content word. The raw count of accommodated content words is the total number of these accommodated content words over every turn in the conversation side. Because content words vary widely in frequency, we normalized our counts by the frequency of each word.

For completeness, we interpret "the frequency of each word" in two distinct ways:

1. **The frequency of each word across the entire dataset (`content_word_accommodation`)**: here, we normalize non-function words with respect to the language used across all conversations in the dataset. This version of accommodation is useful if the entire dataset consists of similar conversations, or conversations about the same topic. Normalizing against a larger dataset gives better estimates of which words carry meaningful content in a particular domain, so that those words can be weighted appropriately.

2. **The frequency of each word within a given conversation (`content_word_accommodation_per_conv`)**: here, we normalize non-function words with respect only to the language in a given conversation. This version of accommodation is useful if the dataset consists of very distinct conversations, for which it may not make sense to assume that the distribution of which words are "important" will hold across different domains.

The feature requires a reference list of function words, which are defined by the original authors as follows.

**Auxiliary and copular verbs**
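The two normalizations described in the docs hunk above can be illustrated with a small sketch. This is not the package's implementation: it assumes each shared content word contributes `1 / frequency` of that word in a reference corpus, counts each shared word once per turn, and uses a tiny hypothetical `FUNCTION_WORDS` set in place of the real reference list. Passing the whole dataset as `reference` approximates `content_word_accommodation`; passing only the current conversation approximates `content_word_accommodation_per_conv`.

```python
import re
from collections import Counter

# Hypothetical stand-in for the function-word reference list (the real list is much longer).
FUNCTION_WORDS = {"a", "an", "the", "and", "or", "is", "are", "it", "we", "to", "of", "should"}


def content_tokens(message: str) -> list[str]:
    """Lowercase, split into letter/apostrophe runs, and drop function words."""
    return [w for w in re.findall(r"[a-z']+", message.lower()) if w not in FUNCTION_WORDS]


def accommodation_scores(messages: list[str], reference: list[str]) -> list[float]:
    """Score each message by the content words it shares with the message before it.

    Each shared word adds 1 / (its token frequency in `reference`), so common words count less.
    """
    if not messages:
        return []
    # Token frequency of every content word in the reference corpus.
    freq = Counter(w for m in reference for w in content_tokens(m))
    scores = [0.0]  # the first message has no predecessor to accommodate
    for prev, curr in zip(messages, messages[1:]):
        shared = set(content_tokens(curr)) & set(content_tokens(prev))
        scores.append(float(sum(1 / freq[w] for w in shared if freq[w] > 0)))
    return scores
```

For example, with `conv_a = ["we should ship the feature", "ship it today", "today works"]` and a second conversation that also repeats "ship", normalizing against the full dataset down-weights "ship" relative to normalizing within `conv_a` alone, which is the trade-off the two variants describe.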
7 changes: 4 additions & 3 deletions docs/build/html/_static/searchtools.js
@@ -178,7 +178,7 @@ const Search = {

htmlToText: (htmlString, anchor) => {
const htmlElement = new DOMParser().parseFromString(htmlString, 'text/html');
for (const removalQuery of [".headerlinks", "script", "style"]) {
for (const removalQuery of [".headerlink", "script", "style"]) {
htmlElement.querySelectorAll(removalQuery).forEach((el) => { el.remove() });
}
if (anchor) {
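The selector fix in `htmlToText` matters because a CSS class selector matches a whole, whitespace-separated class token: Sphinx emits permalink anchors as `<a class="headerlink" ...>`, so `.headerlinks` never matched them and their text leaked into search previews. The sketch below mirrors that stripping step in Python with the standard-library `HTMLParser`; it is an illustration only, and the names `html_to_text` and `skip_classes` are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not start a "skip" region.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "track", "wbr"}


class _TextExtractor(HTMLParser):
    """Collect text, skipping <script>, <style>, and elements whose class list contains a skipped class."""

    def __init__(self, skip_classes):
        super().__init__()
        self.skip_classes = set(skip_classes)
        self.parts = []
        self._skip_tag = None  # tag name of the element currently being skipped
        self._skip_depth = 0   # nesting depth of that tag name inside the skipped element

    def handle_starttag(self, tag, attrs):
        if self._skip_tag is not None:
            if tag == self._skip_tag:
                self._skip_depth += 1
            return
        if tag in VOID_TAGS:
            return
        # Class matching is by whole token, like a CSS ".name" selector.
        classes = (dict(attrs).get("class") or "").split()
        if tag in ("script", "style") or self.skip_classes.intersection(classes):
            self._skip_tag, self._skip_depth = tag, 1

    def handle_endtag(self, tag):
        if self._skip_tag is not None and tag == self._skip_tag:
            self._skip_depth -= 1
            if self._skip_depth == 0:
                self._skip_tag = None

    def handle_data(self, data):
        if self._skip_tag is None:
            self.parts.append(data)


def html_to_text(html: str, skip_classes=("headerlink",)) -> str:
    parser = _TextExtractor(skip_classes)
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)
```

Calling `html_to_text(page, skip_classes=("headerlinks",))` reproduces the old behavior, where the permalink symbol survives in the extracted text.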
@@ -328,13 +328,14 @@ const Search = {
for (const [title, foundTitles] of Object.entries(allTitles)) {
if (title.toLowerCase().trim().includes(queryLower) && (queryLower.length >= title.length/2)) {
for (const [file, id] of foundTitles) {
let score = Math.round(100 * queryLower.length / title.length)
const score = Math.round(Scorer.title * queryLower.length / title.length);
const boost = titles[file] === title ? 1 : 0; // add a boost for document titles
normalResults.push([
docNames[file],
titles[file] !== title ? `${titles[file]} > ${title}` : title,
id !== null ? "#" + id : "",
null,
score,
score + boost,
filenames[file],
]);
}
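The second `searchtools.js` hunk replaces the hard-coded weight of 100 with `Scorer.title` and adds a +1 boost when the matched title is the page's own title. A Python sketch of that title-matching loop follows. It assumes `Scorer.title` is 15 (the value in Sphinx's bundled scorer as far as we know; themes can override it), assumes the query is lowercased and trimmed before matching, and uses hypothetical dictionaries in place of the search index.

```python
import math


def title_results(query, all_titles, doc_titles, title_weight=15):
    """Score title matches the way the updated loop does.

    all_titles maps a title to (file, anchor) pairs; doc_titles maps a file to its page title.
    title_weight stands in for Scorer.title (assumed default 15).
    """
    query_lower = query.lower().strip()
    results = []
    for title, found in all_titles.items():
        # Match only when the query is a substring covering at least half of the title.
        if query_lower in title.lower().strip() and len(query_lower) >= len(title) / 2:
            for file, anchor in found:
                # JavaScript's Math.round rounds halves up; Python's round() rounds halves to even.
                score = math.floor(title_weight * len(query_lower) / len(title) + 0.5)
                is_doc_title = doc_titles[file] == title
                boost = 1 if is_doc_title else 0  # favor page titles over section titles
                label = title if is_doc_title else f"{doc_titles[file]} > {title}"
                results.append((label, "#" + anchor if anchor is not None else "", score + boost))
    return results
```

Tying the weight to `Scorer.title` keeps title hits on the same scale as the other scorer weights, so a custom scorer can rebalance them consistently.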
2 changes: 1 addition & 1 deletion docs/build/html/feature_builder.html
@@ -97,7 +97,7 @@
<span id="feature-builder-module"></span><span id="feature-builder"></span><h1>feature_builder module<a class="headerlink" href="#module-feature_builder" title="Link to this heading"></a></h1>
<dl class="py class">
<dt class="sig sig-object py" id="feature_builder.FeatureBuilder">
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">feature_builder.</span></span><span class="sig-name descname"><span class="pre">FeatureBuilder</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="pre">input_df:</span> <span class="pre">~pandas.core.frame.DataFrame,</span> <span class="pre">vector_directory:</span> <span class="pre">./vector_data/,</span> <span class="pre">output_file_base='output',</span> <span class="pre">output_file_path_chat_level=None,</span> <span class="pre">output_file_path_user_level=None,</span> <span class="pre">output_file_path_conv_level=None,</span> <span class="pre">custom_features:</span> <span class="pre">list</span> <span class="pre">=</span> <span class="pre">[],</span> <span class="pre">analyze_first_pct:</span> <span class="pre">list</span> <span class="pre">=</span> <span class="pre">[1.0],</span> <span class="pre">turns:</span> <span class="pre">bool</span> <span class="pre">=</span> <span class="pre">False,</span> <span class="pre">conversation_id_col:</span> <span class="pre">str</span> <span class="pre">=</span> <span class="pre">'conversation_num',</span> <span class="pre">speaker_id_col:</span> <span class="pre">str</span> <span class="pre">=</span> <span class="pre">'speaker_nickname',</span> <span class="pre">message_col:</span> <span class="pre">str</span> <span class="pre">=</span> <span class="pre">'message',</span> <span class="pre">timestamp_col:</span> <span class="pre">str</span> <span class="pre">|</span> <span class="pre">tuple[str,</span> <span class="pre">str]</span> <span class="pre">=</span> <span class="pre">'timestamp',</span> <span class="pre">grouping_keys:</span> <span class="pre">list</span> <span class="pre">=</span> <span class="pre">[],</span> <span class="pre">cumulative_grouping=False,</span> <span class="pre">within_task=False,</span> <span class="pre">ner_training_df:</span> 
<span class="pre">~pandas.core.frame.DataFrame</span> <span class="pre">=</span> <span class="pre">None,</span> <span class="pre">ner_cutoff:</span> <span class="pre">int</span> <span class="pre">=</span> <span class="pre">0.9,</span> <span class="pre">regenerate_vectors:</span> <span class="pre">bool</span> <span class="pre">=</span> <span class="pre">False,</span> <span class="pre">compute_vectors_from_preprocessed:</span> <span class="pre">bool</span> <span class="pre">=</span> <span class="pre">False</span></em><span class="sig-paren">)</span><a class="headerlink" href="#feature_builder.FeatureBuilder" title="Link to this definition"></a></dt>
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">feature_builder.</span></span><span class="sig-name descname"><span class="pre">FeatureBuilder</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">input_df</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">DataFrame</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">vector_directory</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">'./vector_data/'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">output_file_base</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">'output'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">output_file_path_chat_level</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">output_file_path_user_level</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span 
class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">output_file_path_conv_level</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">custom_features</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">list</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">[]</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">analyze_first_pct</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">list</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">[1.0]</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">turns</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">conversation_id_col</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">'conversation_num'</span></span></em>, <em 
class="sig-param"><span class="n"><span class="pre">speaker_id_col</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">'speaker_nickname'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">message_col</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">'message'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">timestamp_col</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">str</span><span class="w"> </span><span class="p"><span class="pre">|</span></span><span class="w"> </span><span class="pre">tuple</span><span class="p"><span class="pre">[</span></span><span class="pre">str</span><span class="p"><span class="pre">,</span></span><span class="w"> </span><span class="pre">str</span><span class="p"><span class="pre">]</span></span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">'timestamp'</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">grouping_keys</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">list</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">[]</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">cumulative_grouping</span></span><span class="o"><span 
class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">within_task</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">ner_training_df</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">DataFrame</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">ner_cutoff</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">int</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">0.9</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">regenerate_vectors</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">compute_vectors_from_preprocessed</span></span><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="n"><span class="pre">bool</span></span><span class="w"> </span><span class="o"><span class="pre">=</span></span><span class="w"> </span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#feature_builder.FeatureBuilder" title="Link to this definition"></a></dt>
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">object</span></code></p>
<p>The FeatureBuilder is the main engine that reads in the user’s inputs and specifications and generates
conversational features. The FeatureBuilder separately calls the classes (the ChatLevelFeaturesCalculator,
16 changes: 2 additions & 14 deletions docs/build/html/features/readability.html
@@ -155,23 +155,11 @@
<dl class="py function">
<dt class="sig sig-object py" id="features.readability.count_syllables">
<span class="sig-prename descclassname"><span class="pre">features.readability.</span></span><span class="sig-name descname"><span class="pre">count_syllables</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">word</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#features.readability.count_syllables" title="Link to this definition"></a></dt>
<dd><p>Count the number of syllables in a word.</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd"><p><strong>word</strong> (<em>str</em>) – The input word.</p>
</dd>
<dt class="field-even">Returns<span class="colon">:</span></dt>
<dd class="field-even"><p>The number of syllables in the word.</p>
</dd>
<dt class="field-odd">Return type<span class="colon">:</span></dt>
<dd class="field-odd"><p>int</p>
</dd>
</dl>
</dd></dl>
<dd></dd></dl>

<dl class="py function">
<dt class="sig sig-object py" id="features.readability.dale_chall_helper">
<span class="sig-prename descclassname"><span class="pre">features.readability.</span></span><span class="sig-name descname"><span class="pre">dale_chall_helper</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">text</span></span></em>, <em class="sig-param"><span class="o"><span class="pre">**</span></span><span class="n"><span class="pre">easy_words</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#features.readability.dale_chall_helper" title="Link to this definition"></a></dt>
<span class="sig-prename descclassname"><span class="pre">features.readability.</span></span><span class="sig-name descname"><span class="pre">dale_chall_helper</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">text</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">easy_words</span></span></em><span class="sig-paren">)</span><a class="headerlink" href="#features.readability.dale_chall_helper" title="Link to this definition"></a></dt>
<dd><p>Calculate the Dale-Chall readability score of a text. The Dale-Chall score is defined as:</p>
<blockquote>
<div><p>0.1579 * ((difficult_words / words) * 100) + 0.0496 * (words / sentences)</p>
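The updated `dale_chall_helper` signature takes `easy_words` as an ordinary argument rather than `**easy_words`, so the word list is passed as a single collection. The sketch below applies the formula from the docstring under stated assumptions: sentences are split on runs of `.`, `!`, or `?`, words are runs of letters and apostrophes, and a word is "difficult" if its lowercase form is not in `easy_words`. The name `dale_chall_sketch` is hypothetical and this is not the package's tokenizer. Note that the commonly cited Dale-Chall formula adds 3.6365 when more than 5% of words are difficult; the docstring formula omits that adjustment, so the sketch does too.

```python
import re


def dale_chall_sketch(text: str, easy_words: set[str]) -> float:
    """Raw Dale-Chall score: 0.1579 * (% difficult words) + 0.0496 * (words per sentence)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0  # avoid division by zero on empty input
    difficult = sum(1 for w in words if w.lower() not in easy_words)
    return 0.1579 * (difficult / len(words) * 100) + 0.0496 * (len(words) / len(sentences))
```

For "The cat sat. The dog ran." with an easy-word set that lacks "ran", one of six words is difficult and there are two sentences, giving 0.1579 × 16.67 + 0.0496 × 3.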
