v.0.1.4 Release (#315)
* Bump path-to-regexp and express in /website (#298)

* Add Examples Notebook (#294)

* Urgent fix to remove LIWC lexicons from public repo (#279)

* delete small test lexicons

* move .pkl files to assets and remove from GH

* filesystem cleanup

* update certainty pickle location

* remove unpickling certainty

* remove lexicons from pyproject

* change lexical pkl path

* add error handling when lexicons are not found

* update warning message

* add legal caveat and update name of certainty pkl to be correct

* ensure lexicons are ignored

* Update Documentation (Complete Conceptual Documentation, Document Assumptions) (#289)

* new docs

* lexicons hotfix

* emilys doc edits

* update deprecated github actions to latest

* update last remaining text features

* update index

* update docs

* update index

* update docs

* update docs and the feature dictionary

* add basics.rst

* add new basics page

* update docs

---------

Co-authored-by: Xinlan Emily Hu <[email protected]>
Co-authored-by: Xinlan Emily Hu <[email protected]>

* update torch requirements to resolve compatibility issue on torch end (#290)

* Update Website (#291)

* website updates

* renaming tpm-website to website

* deploying via gh-pages

* changed from tpm-website to website

* deployed website

* copyright and team

* team headshots and footer

* edits to the pages

* website updates

* updated links

* updated homepage

* link updates

* mobile compatibility

* mobile adjustments

* navbar mobile updates

* whitespace edits

* homepage updates

* feature table

* website updates

* renaming tpm-website to website

* deploying via gh-pages

* changed from tpm-website to website

* edits to the pages

* website updates

* updated links

* updated homepage

* link updates

* mobile compatibility

* mobile adjustments

* navbar mobile updates

* homepage updates

* add table of features

* updated team page titles

* include flask in requirements.txt

* updates to table of features

* load pages from top

* fix to 404 issues

* moved build under website folder

* updates to package launch

* hyperlink ./setup.sh

* fix nav bar sizing and hamburger logo

* include preprint

* updates to "getting started"

* update team

---------

Co-authored-by: amytangzheng <[email protected]>

* update documentation for clarity and correct typos in positivity z-score and information exchange and liwc

* add demo notebook

* update notebook and add information to docs

* update documentation

---------

Co-authored-by: Shruti Agarwal <[email protected]>
Co-authored-by: amytangzheng <[email protected]>

* Bump path-to-regexp and express in /website

Bumps [path-to-regexp](https://github.com/pillarjs/path-to-regexp) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.

Updates `path-to-regexp` from 0.1.7 to 0.1.10
- [Release notes](https://github.com/pillarjs/path-to-regexp/releases)
- [Changelog](https://github.com/pillarjs/path-to-regexp/blob/master/History.md)
- [Commits](pillarjs/path-to-regexp@v0.1.7...v0.1.10)

Updates `express` from 4.19.2 to 4.21.0
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.0/History.md)
- [Commits](expressjs/express@4.19.2...4.21.0)

---
updated-dependencies:
- dependency-name: path-to-regexp
  dependency-type: indirect
- dependency-name: express
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Xinlan Emily Hu <[email protected]>
Co-authored-by: Shruti Agarwal <[email protected]>
Co-authored-by: amytangzheng <[email protected]>
Co-authored-by: Xinlan Emily Hu <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump nltk from 3.8.1 to 3.9 (#297)

* Bump nltk from 3.8.1 to 3.9

Bumps [nltk](https://github.com/nltk/nltk) from 3.8.1 to 3.9.
- [Changelog](https://github.com/nltk/nltk/blob/develop/ChangeLog)
- [Commits](nltk/nltk@3.8.1...3.9)

---
updated-dependencies:
- dependency-name: nltk
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

* Update pyproject.toml

* Update requirements.txt

* Update download_resources.py

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Xinlan Emily Hu <[email protected]>
Co-authored-by: Shruti Agarwal <[email protected]>
Co-authored-by: amytangzheng <[email protected]>
Co-authored-by: Xinlan Emily Hu <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump body-parser and express in /website (#296)

* Bump body-parser and express in /website

Bumps [body-parser](https://github.com/expressjs/body-parser) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.

Updates `body-parser` from 1.20.2 to 1.20.3
- [Release notes](https://github.com/expressjs/body-parser/releases)
- [Changelog](https://github.com/expressjs/body-parser/blob/master/HISTORY.md)
- [Commits](expressjs/body-parser@1.20.2...1.20.3)

Updates `express` from 4.19.2 to 4.21.0
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.0/History.md)
- [Commits](expressjs/express@4.19.2...4.21.0)

---
updated-dependencies:
- dependency-name: body-parser
  dependency-type: indirect
- dependency-name: express
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Xinlan Emily Hu <[email protected]>
Co-authored-by: Shruti Agarwal <[email protected]>
Co-authored-by: amytangzheng <[email protected]>
Co-authored-by: Xinlan Emily Hu <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Check embedding update (#295)

* update check embeddings with tqdm loading bar and BERT tokenization update

* (1) allow BERT sentiments to be generated from the messages with punctuation, rather than the preprocessed messages; (2) batch BERT sentiment generation to make it more efficient; (3) add loading bar for generation of chat-level features

---------

Co-authored-by: Shruti Agarwal <[email protected]>
Co-authored-by: amytangzheng <[email protected]>

* Update README.md to remove col = "message"

* Closes #302.

* Amy/website (#301)

* website updates

* renaming tpm-website to website

* deploying via gh-pages

* changed from tpm-website to website

* deployed website

* copyright and team

* team headshots and footer

* edits to the pages

* website updates

* updated links

* updated homepage

* link updates

* mobile compatibility

* mobile adjustments

* navbar mobile updates

* whitespace edits

* homepage updates

* feature table

* website updates

* renaming tpm-website to website

* deploying via gh-pages

* changed from tpm-website to website

* edits to the pages

* website updates

* updated links

* updated homepage

* link updates

* mobile compatibility

* mobile adjustments

* navbar mobile updates

* homepage updates

* add table of features

* updated team page titles

* include flask in requirements.txt

* updates to table of features

* load pages from top

* fix to 404 issues

* moved build under website folder

* updates to package launch

* hyperlink ./setup.sh

* fix nav bar sizing and hamburger logo

* include preprint

* updates to "getting started"

* update team

* gh actions and custom domain

* deploy to custom url

* deploy to custom url

* updates to cname

* changes to cname

* cname updates

* testing github actions

* updates to github-actions-website

* testing github actions

* updates to gh actions

* updates to github-actions

* update home for testing gh actions

* updates CNAME

* update testing email

* updates username/email

* updates to email in github-actions-website

* testing gh actions for feature_dict

* testing github-actions feature_dict

* updates to github-actions-feature_dict

* Update github-actions-feature_dict.yaml

* testing updates to feature_dict.py

* testing feature_dict updates

* testing updates to feature_dict.py

* testing feature_dict deployment

* Update github-actions-feature_dict.yaml

* testing feature_dict updates

* testing updates to feature_dict.py

* updates to feature_dict

* updates to github actions feature_dict

* testing auto updates to feature_dict

* Update feature_dict.py

* testing feature_dict auto updates

* testing feature_dict auto updates

* Update feature_dict.py

* testing feature_dict auto updates

* remove commented code in feature_dict.py

* Delete src/team_comm_tools/filtered_dict.json

delete test json file

* Update github-actions-website.yaml to deploy on update to dev

* put 'dev' in quotes

* Update github-actions-feature_dict.yaml to update upon dev

* re-add filtered dict

---------

Co-authored-by: Xinlan Emily Hu <[email protected]>
Co-authored-by: Xinlan Emily Hu <[email protected]>

* Update github-actions-website.yaml (#309)

* Update github-actions-feature_dict.yaml (#308)

* Package updates in Amy/website (#310)

* update packages for website

---------

Co-authored-by: amytangzheng <[email protected]>

* Update package-lock.json to local version

* Update package-lock.json

* Update package.json

* Update package-lock.json

* Fix "@babel/plugin-proposal-private-property-in-object" error (#311)

* Update package-lock.json

* Update package.json

* upgrade node packages

* update team page + try to remove some of the deprecated packages

* Revert "update team page + try to remove some of the deprecated packages"

This reverts commit d04037d.

* revert attempts to upgrade packages

* Denormalize liwc (#312)

* address #306

* fix hedges reference and update dictionary

* address #300 (#313)

* Address issues with making feature names more clear; have cleaner defaults for output params (#314)

* address #304

* address #286 and #299

* small fix to ensure filtered_dict does not generate in every run

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Shruti Agarwal <[email protected]>
Co-authored-by: amytangzheng <[email protected]>
Co-authored-by: amytangzheng <[email protected]>
5 people authored Oct 8, 2024
1 parent ea7dc19 commit 9b9ce16
Showing 96 changed files with 6,198 additions and 4,579 deletions.
60 changes: 60 additions & 0 deletions .github/workflows/github-actions-feature_dict.yaml
@@ -0,0 +1,60 @@
name: Deploy feature_dict to AWS Lambda
run-name: ${{ github.actor }} is deploying the feature dictionary to AWS

on:
  push:
    branches:
      - 'dev'
    paths:
      - 'src/team_comm_tools/feature_dict.py'

jobs:
  deploy:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set Up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          ./setup.sh
          pip install flask
          pip install awscli
      - name: Install package
        run: pip install .

      # Run the feature_dict.py file to generate filtered_dict.json
      - name: Run feature_dict.py
        run: |
          cd src
          cd team_comm_tools
          python feature_dict.py run
      - name: Package Lambda function
        run: |
          mkdir package
          pip install --target ./package flask
          cp src/team_comm_tools/feature_dict.py ./package # Copies feature_dict.py
          cp src/team_comm_tools/lambda_function.py ./package # Copies lambda_function.py
          cp src/team_comm_tools/filtered_dict.json ./package # Copies filtered_dict.json
          cd package
          zip -r ../function.zip . # Packages the Lambda function
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ secrets.AWS_REGION }}

      - name: Update Lambda function
        run: |
          aws lambda update-function-code --function-name ${{ secrets.LAMBDA_FUNCTION_NAME }} --zip-file fileb://function.zip
46 changes: 46 additions & 0 deletions .github/workflows/github-actions-website.yaml
@@ -0,0 +1,46 @@
name: Deploy Website on Commit
run-name: ${{ github.actor }} is deploying the website

on:
  push:
    branches:
      - 'dev'
    paths:
      - 'website/**' # Only trigger when changes occur in the website folder

jobs:
  deploy:

    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20.15.0'

      - name: Install dependencies
        run: npm ci
        working-directory: ./website # Navigate to the website folder

      - name: Build the project
        run: npm run build
        working-directory: ./website

      - name: Add CNAME file
        run: echo 'teamcommtools.seas.upenn.edu' > ./website/build/CNAME

      - name: Configure Git
        run: |
          git config --global user.email "[email protected]"
          git config --global user.name "team_comm_tools_admin"
        working-directory: ./website

      - name: Deploy
        run: |
          git remote set-url origin https://x-access-token:${{ secrets.GITHUB_TOKEN }}@github.com/${{ github.repository }}.git
          npm run deploy
        working-directory: ./website # Run deploy inside the website folder
1 change: 1 addition & 0 deletions .gitignore
@@ -31,6 +31,7 @@ MANIFEST
.DS_Store

# unwanted files
*/filtered_dict.json
src/team_comm_tools/features/lexicons/liwc_lexicons/*
src/team_comm_tools/features/lexicons/liwc_lexicons_small_test/*
src/team_comm_tools/features/lexicons/certainty.txt
4 changes: 2 additions & 2 deletions README.md
@@ -85,7 +85,7 @@ my_feature_builder = FeatureBuilder(
)

# this line of code runs the FeatureBuilder on your data
my_feature_builder.featurize(col="message")
my_feature_builder.featurize()
```

### Data Format
@@ -112,4 +112,4 @@ For more information, please refer to the [Introduction on our Read the Docs Pag
Please visit our website, [https://teamcommtools.seas.upenn.edu/](https://teamcommtools.seas.upenn.edu/), for general information about our project and research. For more detailed documentation on our features and examples, please visit our [Read the Docs Page](https://conversational-featurizer.readthedocs.io/en/latest/).

# Becoming a Contributor
If you would like to make pull requests to this open-sourced repository, please read our [GitHub Repo Getting Started Guide](/github_repo_getting_started.md). We welcome new feature contributions or improvements to our framework.
If you would like to make pull requests to this open-sourced repository, please read our [GitHub Repo Getting Started Guide](/github_repo_getting_started.md). We welcome new feature contributions or improvements to our framework.
Binary file modified docs/build/doctrees/environment.pickle
Binary file not shown.
Binary file modified docs/build/doctrees/examples.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/feature_builder.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/basic_features.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/burstiness.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/certainty.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/discursive_diversity.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/fflow.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/get_all_DD_features.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/get_user_network.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/hedge.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/index.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/info_exchange_zscore.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/information_diversity.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/lexical_features_v2.doctree
Binary file not shown.
Binary file not shown.
Binary file modified docs/build/doctrees/features/other_lexical_features.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/politeness_features.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/politeness_v2.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/politeness_v2_helper.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/question_num.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/readability.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/reddit_tags.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/temporal_features.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/textblob_sentiment_analysis.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/turn_taking_features.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features/variance_in_DD.doctree
Binary file not shown.
Binary file not shown.
Binary file modified docs/build/doctrees/features/word_mimicry.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features_conceptual/TEMPLATE.doctree
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file modified docs/build/doctrees/features_conceptual/index.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features_conceptual/mimicry_bert.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features_conceptual/moving_mimicry.doctree
Binary file not shown.
Binary file not shown.
Binary file modified docs/build/doctrees/features_conceptual/positivity_bert.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features_conceptual/turn_taking_index.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/features_conceptual/word_ttr.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/index.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/intro.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/utils/assign_chunk_nums.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/utils/calculate_chat_level_features.doctree
Binary file not shown.
Binary file not shown.
Binary file modified docs/build/doctrees/utils/calculate_user_level_features.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/utils/check_embeddings.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/utils/gini_coefficient.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/utils/index.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/utils/preload_word_lists.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/utils/preprocess.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/utils/summarize_features.doctree
Binary file not shown.
Binary file modified docs/build/doctrees/utils/zscore_chats_and_conversation.doctree
Binary file not shown.
2 changes: 1 addition & 1 deletion docs/build/html/.buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: d7678f479036f3220c73480ec4f2c467
config: 9a01a2cd3d4384710101b4a99edd7683
tags: 645f666f9bcd5a90fca523b33c5a78b7
100 changes: 92 additions & 8 deletions docs/build/html/_sources/examples.rst.txt
@@ -85,16 +85,17 @@ Now we are ready to call the FeatureBuilder on our data. All we need to do is de
timestamp_col = "timestamp",
grouping_keys = ["batch_num", "round_num"],
vector_directory = "./vector_data/",
output_file_path_chat_level = "./jury_output_chat_level.csv",
output_file_path_user_level = "./jury_output_user_level.csv",
output_file_path_conv_level = "./jury_output_conversation_level.csv",
output_file_base = "jury_output",
turns = True
)
jury_feature_builder.featurize(col="message")
jury_feature_builder.featurize()
Basic Input Columns
^^^^^^^^^^^^^^^^^^^^

Conversation Parameters
"""""""""""""""""""""""""

* The **input_df** parameter is where you pass in your dataframe. In this case, we want to run the FeatureBuilder on the juries data that we read in!

* The **speaker_id_col** refers to the name of the column containing a unique identifier for each speaker/participant in the conversation. In this dataset, that column is named "speaker_nickname."
@@ -105,6 +106,8 @@ Basic Input Columns

* If you do not pass anything in, "message" is the default value for this parameter.

* We assume that all messages are ordered chronologically.

* The **timestamp_col** refers to the name of the column containing the time at which each utterance was said. In this case, we have exactly one timestamp for each message, stored in "timestamp."

* If you do not pass anything in, "timestamp" is the default value for this parameter.
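Because the FeatureBuilder assumes that messages are ordered chronologically, sorting the data before passing it in is a cheap safeguard. The sketch below uses plain Python records instead of a DataFrame; the keys mirror the column names in this example ("batch_num", "timestamp", "speaker_nickname", "message"), the ISO 8601 timestamp format is an assumption, and both helpers (`sort_chronologically`, `is_chronological`) are hypothetical illustrations rather than part of the package.

```python
from datetime import datetime

# Hypothetical sample records; keys mirror the column names used in this example.
messages = [
    {"batch_num": 1, "timestamp": "2024-01-01T10:00:05", "speaker_nickname": "B", "message": "I agree."},
    {"batch_num": 0, "timestamp": "2024-01-01T09:00:10", "speaker_nickname": "A", "message": "Second."},
    {"batch_num": 0, "timestamp": "2024-01-01T09:00:00", "speaker_nickname": "C", "message": "First."},
]


def sort_chronologically(rows, conversation_key="batch_num", time_key="timestamp"):
    """Return rows ordered by conversation, then by parsed timestamp (hypothetical helper)."""
    return sorted(rows, key=lambda r: (r[conversation_key], datetime.fromisoformat(r[time_key])))


def is_chronological(rows, conversation_key="batch_num", time_key="timestamp"):
    """Check that timestamps never decrease within any single conversation."""
    last_seen = {}
    for row in rows:
        t = datetime.fromisoformat(row[time_key])
        conv = row[conversation_key]
        if conv in last_seen and t < last_seen[conv]:
            return False
        last_seen[conv] = t
    return True


ordered = sort_chronologically(messages)
print([r["message"] for r in ordered])  # ['First.', 'Second.', 'I agree.']
```

Sorting by conversation first keeps each conversation's messages contiguous, which is a natural shape for grouped, per-conversation features.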
@@ -125,21 +128,39 @@ Basic Input Columns
conversation_id_col = "batch_num"
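
Because the FeatureBuilder assumes that messages are already ordered chronologically, it can help to sort your data before passing it in. The following is a minimal, standard-library sketch, not part of the toolkit: it assumes each row is a dict containing the grouping columns used above ("batch_num" and "round_num") and an ISO 8601 string in "timestamp". The helper name ``sort_chronologically`` and the example rows are hypothetical.

.. code-block:: python

   from datetime import datetime


   def sort_chronologically(rows, grouping_keys, timestamp_col="timestamp"):
       """Hypothetical helper: order rows by conversation, then by time.

       Assumes each row is a dict and each timestamp is an ISO 8601 string.
       sorted() is stable, so messages with identical timestamps keep their input order.
       """
       def sort_key(row):
           conversation = tuple(row[key] for key in grouping_keys)
           return conversation + (datetime.fromisoformat(row[timestamp_col]),)

       return sorted(rows, key=sort_key)


   # Made-up example rows, deliberately out of order.
   rows = [
       {"batch_num": 1, "round_num": 0, "timestamp": "2020-01-01T10:05:00", "message": "b"},
       {"batch_num": 0, "round_num": 0, "timestamp": "2020-01-01T09:00:00", "message": "x"},
       {"batch_num": 1, "round_num": 0, "timestamp": "2020-01-01T10:00:00", "message": "a"},
   ]
   ordered = sort_chronologically(rows, ["batch_num", "round_num"])
   print([row["message"] for row in ordered])  # ['x', 'a', 'b']

If your data is already in a pandas DataFrame, ``input_df.sort_values(by=["batch_num", "round_num", "timestamp"])`` follows the same idea, provided the timestamp column sorts correctly (for example, after converting it with ``pandas.to_datetime``).
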
Vector Directory
""""""""""""""""""

* The **vector_directory** is the name of a directory in which we will store some pre-processed information. Some features require running inference from HuggingFace's `RoBERTa-based sentiment model <https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment>`_, and others require generating `SBERT vectors <https://sbert.net/>`_. These processes take time, and we cache the outputs so that subsequent runs of the FeatureBuilder on the same dataset will not take as much time. Therefore, we require you to pass in a location where you'd like us to save these outputs.

* By default, the directory is named "vector_data/."

* **Note that the vector directory does not need to exist already**; if it doesn't, we will create it for you.

* Inside the folder, we will store the RoBERTa outputs in a subfolder called "sentiment", and the SBERT vectors in a subfolder called "sentence." We will create both of these subfolders for you.

* The **turns** parameter, which we will discuss later, controls whether or not you'd like the FeatureBuilder to treat successive utterances by the same individual as a single "turn," or whether you'd like them to be treated separately. We will cache different versions of outputs based on this parameter; we use a subfolder called "chats" (when **turns=False**) or "turns" (when **turns=True**).
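
Caching like this follows a compute-once pattern: look for a saved result first, and only run the expensive model when nothing is found. A minimal sketch of that pattern follows. It is not the toolkit's implementation; the helper name ``load_or_compute``, the JSON file format, and the nesting of the "sentiment" and "chats" folders are assumptions for illustration.

.. code-block:: python

   import json
   import os


   def load_or_compute(cache_dir, name, compute):
       """Hypothetical compute-once cache: reuse cache_dir/<name>.json if it exists."""
       # Like the vector directory, the cache folder does not need to exist beforehand.
       os.makedirs(cache_dir, exist_ok=True)
       cache_path = os.path.join(cache_dir, name + ".json")
       if os.path.exists(cache_path):
           # Cache hit: skip the expensive computation entirely.
           with open(cache_path) as f:
               return json.load(f)
       # Cache miss: compute once, then save the result for later runs.
       result = compute()
       with open(cache_path, "w") as f:
           json.dump(result, f)
       return result

For example, calling ``load_or_compute(os.path.join("vector_data", "sentiment", "chats"), "jury", run_model)`` twice, with a hypothetical ``run_model`` function, would run the model only on the first call.
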

.. _output_file_details:

Output File Naming Details
""""""""""""""""""""""""""""

* There are three output files for each run of the FeatureBuilder, which mirror the three levels of analysis: utterance-, speaker-, and conversation-level. (Please see the section on `Generating Features: Utterance-, Speaker-, and Conversation-Level <intro#generating_features>`_ for more details.) Their file names are generated from the **output_file_base** parameter.

* **All of the outputs will be generated in a folder called "output."**

* Within the "output" folder, **we generate sub-folders such that the three files will be located in subfolders called "chat," "user," and "conv," respectively.**

* As with the **vector_directory** parameter, the "chat" subfolder is named "turn" instead when **turns=True**.

* You can also give each of the three output files its own name, rather than deriving all three from the same base, by setting **output_file_path_chat_level** (Utterance- or Chat-Level Features), **output_file_path_user_level** (Speaker- or User-Level Features), and **output_file_path_conv_level** (Conversation-Level Features). However, because outputs are organized in the specific locations described above, **we have specific requirements for inputting the output paths, and we will modify the path under the hood to match our file naming schema,** rather than saving the file directly to the specified location.

* We expect that you pass in a **path**, not just a filename. For example, the path needs to be "./my_file.csv", and not just "my_file.csv"; you will get an error if you pass in only a name without the "/".

* Regardless of your path location, we will automatically add the folder name "output" to the front of your file path.

* Within the "output" folder, **we will also generate the chat/user/conv sub-folders.**

* If you pass in a path that already contains the above automatically-generated elements (for example, "./output/chat/my_chat_features.csv"), we will skip these steps and directly save it in the relevant folder.

@@ -153,14 +174,18 @@ Basic Input Columns
output_file_path_chat_level = "./output/chat/jury_output_chat_level.csv"
* And these two ways of specifying an output path are equivalent, assuming that turns=True:

.. code-block:: python

   output_file_path_chat_level = "./jury_output_turn_level.csv"
   output_file_path_chat_level = "./output/turn/jury_output_turn_level.csv"

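
To make the path rules above concrete, here is a minimal sketch that reproduces the documented equivalences. It is a hypothetical helper, not the toolkit's code; in particular, placing "output/<subfolder>" inside the directory you pass in is an assumption that matches the examples above but may not match the toolkit's behavior for deeper paths.

.. code-block:: python

   import posixpath


   def normalize_output_path(path, level, turns=False):
       """Hypothetical sketch of the output naming rules; level is "chat", "user", or "conv"."""
       if "/" not in path:
           # A bare filename such as "my_file.csv" is rejected; "./my_file.csv" is expected.
           raise ValueError(f"expected a path such as './my_file.csv', got {path!r}")
       # The chat-level subfolder is called "turn" instead when turns=True.
       subfolder = {"chat": "turn" if turns else "chat", "user": "user", "conv": "conv"}[level]
       directory, filename = posixpath.split(path)
       # Skip the rewrite if the path already ends in output/<subfolder>/<file>.
       parent = posixpath.dirname(directory)
       if posixpath.basename(directory) == subfolder and posixpath.basename(parent) == "output":
           return path
       return posixpath.join(directory, "output", subfolder, filename)


   print(normalize_output_path("./jury_output_chat_level.csv", "chat"))
   # ./output/chat/jury_output_chat_level.csv

Passing an already-normalized path such as "./output/turn/jury_output_turn_level.csv" (with ``turns=True``) returns it unchanged, mirroring the "skip these steps" rule.
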
Turns
""""""

* The **turns** parameter controls whether we want to treat successive messages from the same person as a single turn. For example, in a text conversation, sometimes individuals will send many messages in rapid succession, as follows:

* **John**: Hey Michael
@@ -277,3 +302,62 @@ Here are some additional design details of the FeatureBuilder that you may wish
* The only caveat to this rule is that if one of your columns has exactly the same name as one of the conversation features that we generate, your column will be overwritten. Please refer to `<https://teamcommtools.seas.upenn.edu/HowItWorks>`_ for a list of all the features we generate, along with their column names.

* **When summarizing features from the utterance level to the conversation and speaker level, we only consider numeric features.** This is perhaps a simplifying assumption more than anything else; although we do extract non-numeric information (for example, a Dale-Chall label of whether an utterance is "Easy" to read or not; a list of identified named entities), we cannot summarize these efficiently, so they are not considered.
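
As an illustration of the numeric-only rule, the sketch below averages each numeric column per conversation and skips strings and lists. It is a simplified stand-in, not the toolkit's aggregation code: it computes only the mean, and the column names ("conversation", "num_words", "dale_chall_label", "named_entities") are hypothetical.

.. code-block:: python

   from collections import defaultdict
   from numbers import Number
   from statistics import fmean


   def summarize_numeric(rows, conversation_key):
       """Average numeric values per conversation; non-numeric values are skipped."""
       values = defaultdict(lambda: defaultdict(list))
       for row in rows:
           for column, value in row.items():
               # bool is a subclass of int, so exclude it explicitly.
               is_numeric = isinstance(value, Number) and not isinstance(value, bool)
               if column != conversation_key and is_numeric:
                   values[row[conversation_key]][column].append(value)
       return {
           conversation: {column: fmean(vals) for column, vals in columns.items()}
           for conversation, columns in values.items()
       }


   rows = [  # made-up utterance-level rows
       {"conversation": "A", "num_words": 4, "dale_chall_label": "easy", "named_entities": ["John"]},
       {"conversation": "A", "num_words": 6, "dale_chall_label": "difficult", "named_entities": []},
       {"conversation": "B", "num_words": 3, "dale_chall_label": "easy", "named_entities": []},
   ]
   print(summarize_numeric(rows, "conversation"))
   # {'A': {'num_words': 5.0}, 'B': {'num_words': 3.0}}
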

Inspecting Generated Features
++++++++++++++++++++++++++++++

Feature Information
^^^^^^^^^^^^^^^^^^^^^
Every FeatureBuilder object has an underlying property called the **feature_dict**, which lists information and references about the features included in the toolkit. Assuming that **jury_feature_builder** is the name of your FeatureBuilder, you can access the feature dictionary as follows:

.. code-block:: python

   jury_feature_builder.feature_dict

The keys of this dictionary are the formal feature names, and each value is a dictionary of information about that feature (or collection of features). A more readable version of this dictionary is also available on our `website <https://teamcommtools.seas.upenn.edu/HowItWorks>`_.

**New in v.0.1.4**: To access a list of the formal feature names that a FeatureBuilder will generate, you can use the **feature_names** property:

.. code-block:: python

   jury_feature_builder.feature_names  # a list of formal feature names included in featurization (e.g., "Team Burstiness")

You can also use the **feature_names** property in tandem with the **feature_dict** to learn more about a specific feature; for example, the following code will show the dictionary entry for the first feature in **feature_names**:

.. code-block:: python

   jury_feature_builder.feature_dict[jury_feature_builder.feature_names[0]]

Here is some example output (for the RoBERTa sentiment feature):

.. code-block:: text

   {'columns': ['positive_bert', 'negative_bert', 'neutral_bert'],
    'file': './utils/check_embeddings.py',
    'level': 'Chat',
    'semantic_grouping': 'Emotion',
    'description': 'The extent to which a statement is positive, negative, or neutral, as assigned by Cardiffnlp/twitter-roberta-base-sentiment-latest. The total scores (Positive, Negative, Neutral) sum to 1.',
    'references': '(Hugging Face, 2023)',
    'wiki_link': 'https://conversational-featurizer.readthedocs.io/en/latest/features_conceptual/positivity_bert.html',
    'function': <function team_comm_tools.utils.calculate_chat_level_features.ChatLevelFeaturesCalculator.concat_bert_features(self) -> None>,
    'dependencies': [],
    'preprocess': [],
    'vect_data': False,
    'bert_sentiment_data': True}

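Because each entry lists the output columns it produces under the "columns" key, the feature dictionary can also answer practical questions, such as which feature produced a given column, or which of your input columns share a name with a generated column (the overwrite caveat mentioned earlier). The sketch below uses a trimmed stand-in for **feature_dict** so that it runs on its own; the key "RoBERTa Sentiment" is a placeholder, not necessarily the formal name the toolkit uses, and both helper functions are hypothetical.

.. code-block:: python

   # Stand-in for jury_feature_builder.feature_dict, trimmed to two fields from the sample entry.
   # "RoBERTa Sentiment" is a placeholder key, not necessarily the toolkit's formal feature name.
   feature_dict = {
       "RoBERTa Sentiment": {
           "columns": ["positive_bert", "negative_bert", "neutral_bert"],
           "level": "Chat",
       },
   }


   def feature_for_column(feature_dict, column):
       """Return the formal name of the feature whose "columns" list includes column, else None."""
       for name, info in feature_dict.items():
           if column in info.get("columns", []):
               return name
       return None


   def colliding_columns(feature_dict, input_columns):
       """Return input columns that share a name with a generated feature column."""
       generated = {col for info in feature_dict.values() for col in info.get("columns", [])}
       return sorted(generated.intersection(input_columns))


   print(feature_for_column(feature_dict, "neutral_bert"))  # RoBERTa Sentiment
   print(colliding_columns(feature_dict, ["message", "positive_bert"]))  # ['positive_bert']

With a real FeatureBuilder, you would pass **jury_feature_builder.feature_dict** instead of the stand-in; this assumes every entry carries a "columns" list like the sample entry above.
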
Feature Column Names
^^^^^^^^^^^^^^^^^^^^^

Once you call **.featurize()**, you can also obtain a convenient list of the feature columns generated by the toolkit:

.. code-block:: python

   jury_feature_builder.chat_features  # a list of the feature columns generated at the chat (utterance) level
   jury_feature_builder.conv_features_base  # a list of the base (non-aggregated) feature columns at the conversation level
   jury_feature_builder.conv_features_all  # a list of all feature columns at the conversation level, including aggregates

These lists may be useful to you if you'd like to inspect which features in the output dataframe come from the FeatureBuilder; for example:

.. code-block:: python

   jury_output_chat_level[jury_feature_builder.chat_features]

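Assuming **conv_features_all** contains everything in **conv_features_base** plus the aggregated columns, as the descriptions above suggest, the difference between the two lists isolates the aggregates. A minimal sketch with a hypothetical helper and made-up column names:

.. code-block:: python

   def aggregate_columns(conv_features_all, conv_features_base):
       """Hypothetical helper: columns in conv_features_all that are not base features, in order."""
       base = set(conv_features_base)
       return [column for column in conv_features_all if column not in base]


   # Made-up column names, for illustration only.
   all_columns = ["num_messages", "mean_num_words", "max_num_words"]
   base_columns = ["num_messages"]
   print(aggregate_columns(all_columns, base_columns))  # ['mean_num_words', 'max_num_words']

With a real FeatureBuilder, ``aggregate_columns(jury_feature_builder.conv_features_all, jury_feature_builder.conv_features_base)`` would give a list you can use to index the conversation-level output dataframe, in the same way as **chat_features** above.
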