Commit dbca34f

Convokit 3.0 Mega Pull Request (#199)
* fix use of mutability in Coordination transformer.
* run black formatter
* fixed coordination with efficient implementation
* comments for changes
* metadata field deepcopy
* documentation and website update for V3.0
* get dataframe mutation fix
* fix get dataframe mutability
* modify 3.0 documentation
* revert get dataframe fixes
* pairer maximize pair mode fix
* backendMapper, config documentation
* goodbye to python3.7
* release date update
* remove all storage reference
* update release date

---------

Co-authored-by: Cristian Danescu-Niculescu-Mizil <[email protected]>
1 parent 7630163 commit dbca34f

62 files changed (+20667 -7521 lines changed)

README.md (+2 -2)

@@ -4,13 +4,13 @@
 <!-- ALL-CONTRIBUTORS-BADGE:END -->

 [![pypi](https://img.shields.io/pypi/v/convokit.svg)](https://pypi.org/pypi/convokit/)
-[![py_versions](https://img.shields.io/badge/python-3.7%2B-blue)](https://pypi.org/pypi/convokit/)
+[![py_versions](https://img.shields.io/badge/python-3.8%2B-blue)](https://pypi.org/pypi/convokit/)
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 [![license](https://img.shields.io/badge/license-MIT-green)](https://github.com/CornellNLP/ConvoKit/blob/master/LICENSE.md)
 [![Slack Community](https://img.shields.io/static/v1?logo=slack&style=flat&color=red&label=slack&message=community)](https://join.slack.com/t/convokit/shared_invite/zt-1axq34qrp-1hDXQrvSXClIbJOqw4S03Q)


-This toolkit contains tools to extract conversational features and analyze social phenomena in conversations, using a [single unified interface](https://convokit.cornell.edu/documentation/architecture.html) inspired by (and compatible with) scikit-learn. Several large [conversational datasets](https://github.com/CornellNLP/ConvoKit#datasets) are included together with scripts exemplifying the use of the toolkit on these datasets. The latest version is [2.5.3](https://github.com/CornellNLP/ConvoKit/releases/tag/v2.5.2) (released 16 Jan 2022); follow the [project on GitHub](https://github.com/CornellNLP/ConvoKit) to keep track of updates.
+This toolkit contains tools to extract conversational features and analyze social phenomena in conversations, using a [single unified interface](https://convokit.cornell.edu/documentation/architecture.html) inspired by (and compatible with) scikit-learn. Several large [conversational datasets](https://github.com/CornellNLP/ConvoKit#datasets) are included together with scripts exemplifying the use of the toolkit on these datasets. The latest version is [3.0.0](https://github.com/CornellNLP/ConvoKit/releases/tag/v3.0.0) (released July 17, 2023); follow the [project on GitHub](https://github.com/CornellNLP/ConvoKit) to keep track of updates.

 Read our [documentation](https://convokit.cornell.edu/documentation) or try ConvoKit in our [interactive tutorial](https://colab.research.google.com/github/CornellNLP/ConvoKit/blob/master/examples/Introduction_to_ConvoKit.ipynb).

convokit/convokitConfig.py (+5 -5)

@@ -4,13 +4,13 @@


 DEFAULT_CONFIG_CONTENTS = (
-    "# Default Storage Parameters\n"
+    "# Default Backend Parameters\n"
     "db_host: localhost:27017\n"
     "data_directory: ~/.convokit/saved-corpora\n"
-    "default_storage_mode: mem"
+    "default_backend: mem"
 )

-ENV_VARS = {"db_host": "CONVOKIT_DB_HOST", "default_storage_mode": "CONVOKIT_STORAGE_MODE"}
+ENV_VARS = {"db_host": "CONVOKIT_DB_HOST", "default_backend": "CONVOKIT_BACKEND"}


 class ConvoKitConfig:
@@ -52,5 +52,5 @@ def data_directory(self):
         return self.config_contents.get("data_directory", "~/.convokit/saved-corpora")

     @property
-    def default_storage_mode(self):
-        return self._get_config_from_env_or_file("default_storage_mode", "mem")
+    def default_backend(self):
+        return self._get_config_from_env_or_file("default_backend", "mem")
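
The renamed setting is read through `_get_config_from_env_or_file`, whose body is not part of this diff. As a rough illustration of the lookup order its name suggests (environment variable first, then the config file contents, then the default), here is a minimal sketch. The standalone function `get_setting` and its `environ` parameter are hypothetical and are not ConvoKit's implementation; only the `ENV_VARS` mapping is taken from the diff.

```python
import os

# Mapping copied from the new side of the diff above.
ENV_VARS = {"db_host": "CONVOKIT_DB_HOST", "default_backend": "CONVOKIT_BACKEND"}


def get_setting(name, config_contents, default, environ=None):
    """Hypothetical helper: env var wins, then the parsed config file, then the default."""
    environ = os.environ if environ is None else environ
    env_name = ENV_VARS.get(name)
    if env_name is not None and env_name in environ:
        return environ[env_name]
    return config_contents.get(name, default)


# Under this sketch, a config file that still uses the old key is ignored
# for the new setting, so the lookup falls back to the default.
old_file = {"default_storage_mode": "db"}
new_file = {"default_backend": "db"}

print(get_setting("default_backend", old_file, "mem", environ={}))  # mem
print(get_setting("default_backend", new_file, "mem", environ={}))  # db
print(get_setting("default_backend", new_file, "mem", environ={"CONVOKIT_BACKEND": "mem"}))  # mem
```

If the real lookup behaves like this sketch, existing config files and any `CONVOKIT_STORAGE_MODE` environment variable would need to be renamed to keep taking effect after the upgrade.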

convokit/coordination/coordination.py (+14 -2)

@@ -1,5 +1,6 @@
 from collections import defaultdict
 from typing import Callable, Tuple, List, Dict, Optional, Collection, Union
+import copy

 import pkg_resources

@@ -108,11 +109,22 @@ def transform(self, corpus: Corpus) -> Corpus:
             utterance_thresh_func=self.utterance_thresh_func,
         )

+        # Keep record of all score update for all (speakers, target) pairs to avoid redundant operations
+        todo = {}
+
         for (speaker, target), score in pair_scores.items():
             if self.coordination_attribute_name not in speaker.meta:
                 speaker.meta[self.coordination_attribute_name] = {}
-            speaker.meta[self.coordination_attribute_name][target.id] = score
-
+            key = (speaker, target.id)
+            todo.update({key: score})
+
+        for key, score in todo.items():
+            speaker = key[0]
+            target = key[1]
+            # For avoiding mutability for the sake of DB corpus
+            temp_dict = copy.deepcopy(speaker.meta[self.coordination_attribute_name])
+            temp_dict[target] = score
+            speaker.meta[self.coordination_attribute_name] = temp_dict
             assert isinstance(speaker, Speaker)

         return corpus
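
The code comment "For avoiding mutability for the sake of DB corpus" suggests that, with a database backend, reading a metadata value may return a fresh copy instead of a live reference, so an in-place update such as `speaker.meta[name][target.id] = score` could be silently lost. The toy class below is a hypothetical stand-in written to show that failure mode and why the copy, modify, reassign pattern in the diff avoids it; `CopyOnReadMeta` is not a ConvoKit class, and the actual backend behavior is an assumption here.

```python
import copy


class CopyOnReadMeta:
    """Hypothetical stand-in for a DB-backed metadata store: reads return copies."""

    def __init__(self):
        self._store = {}

    def __contains__(self, key):
        return key in self._store

    def __getitem__(self, key):
        # Simulates deserializing a fresh object from a database on every read.
        return copy.deepcopy(self._store[key])

    def __setitem__(self, key, value):
        # Simulates serializing the value into the database on every write.
        self._store[key] = copy.deepcopy(value)


meta = CopyOnReadMeta()
meta["coord"] = {}

# In-place mutation (the removed line): the change lands on a throwaway copy.
meta["coord"]["target_a"] = 0.5
lost = meta["coord"]

# Copy, modify, reassign (the added lines): the change goes through __setitem__.
temp_dict = copy.deepcopy(meta["coord"])
temp_dict["target_a"] = 0.5
meta["coord"] = temp_dict
kept = meta["coord"]

print(lost)  # {}
print(kept)  # {'target_a': 0.5}
```

With a plain in-memory dict both paths would give the same result, which is presumably why the original one-line update worked for the `mem` backend.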
