Commit

Merge branch 'main' of https://github.com/xehu/team-process-map into main
Xinlan Emily Hu committed Jan 26, 2024
2 parents ba2c2be + fe284db commit 35bb7ef
Showing 1 changed file with 3 additions and 15 deletions.
18 changes: 3 additions & 15 deletions feature_engine/README_featurizer.md
@@ -4,22 +4,10 @@ The featurizer takes text conversations and transforms them into the following r

- A table of `chat-level features`, which generates a unique conversation feature for each chat (aka utterance, or message);
- A table of `conversation-level features`, which generates aggregations of features within each chat at the conversation level.
- A table of `user-level features`, which generates aggregations of features for each user, or speaker, in a conversation.

To set up and run the featurizer from scratch, you should do the following.

# Run separate iPython Scripts
Some features are computationally inefficient to run every time, so the featurizer performs some processing upfront. Before getting started, you should separately run the following iPython notebooks.

## Run ONCE (regardless of number of datasets)
- `features/preprocessing/preprocess_lexicons.ipynb` --> generates `features/lexicons_dict.pkl`

## Run once _per dataset_ (generates dataset-specific pre-processing / embeddings)
The following needs to be run upon initializing the directory:
- `features/preprocessing/process_sent_vectors.ipynb` --> generates `embeddings/*`

The following does not have to be run upon initialization (the outputs are already saved); however, as new datasets are added, this script needs to be re-run for each new dataset.
- `features/preprocessing/positivity_bert_analysis.ipynb` --> generates `sentiment_bert/*`

# Run the main featurizer [Do this every time you want to refresh/generate new features.]

In the terminal, run `python3 featurize.py`.
- Declare a new FeatureBuilder object inside `featurize.py`.
- In the terminal, run `python3 featurize.py`.
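
The README text in this hunk names three preprocessing outputs (`features/lexicons_dict.pkl`, `embeddings/*`, and `sentiment_bert/*`). As a minimal sketch, a check like the following could confirm they exist before `python3 featurize.py` is run. The helper `missing_preprocessing_outputs` is hypothetical and not part of the repository, and the sketch assumes the paths are relative to the directory the featurizer is run from.

```python
from pathlib import Path

# Paths are taken from the README text. The check itself is a hypothetical
# helper, not something the repository provides.
REQUIRED_FILES = ["features/lexicons_dict.pkl"]
REQUIRED_DIRS = ["embeddings", "sentiment_bert"]


def missing_preprocessing_outputs(root="."):
    """Return the expected preprocessing outputs that are absent under root."""
    root = Path(root)
    missing = [f for f in REQUIRED_FILES if not (root / f).is_file()]
    for d in REQUIRED_DIRS:
        path = root / d
        # A directory counts as present only if it holds at least one entry.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(d + "/")
    return missing


if __name__ == "__main__":
    missing = missing_preprocessing_outputs()
    if missing:
        print("Missing preprocessing outputs:", ", ".join(missing))
    else:
        print("All preprocessing outputs found; run `python3 featurize.py`.")
```

An empty return value means every listed output was found. Anything in the list points to a notebook that may still need to be run.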
