updates to message_original error
amytangzheng committed Dec 19, 2024
1 parent 5b2b9f0 commit d8291db
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion src/team_comm_tools/utils/check_embeddings.py
```diff
@@ -72,9 +72,14 @@ def check_embeddings(chat_data: pd.DataFrame, vect_path: str, bert_path: str, or
     # check that message in vector data matches chat data
     preprocessed_chat = chat_data[message_col].astype(str).apply(preprocess_text)
 
-    # preprocess vector data
+    # preprocess vector data, remove _original if message_col contains to preprocess the text
+    if '_original' in message_col:
+        message_col = message_col.replace('_original', '')
+
+    print(message_col, message_col[:-9])
     preprocessed_vector = vector_df[message_col].astype(str).apply(preprocess_text)
+
 
     mismatches = chat_data[preprocessed_chat != preprocessed_vector]
     if len(mismatches) != 0:
         print("Messages in the vector data do not match the chat data. Regenerating...")
```
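The change maps a chat-data column such as `message_original` back to the plain column name before reading the vector data, then compares the preprocessed messages row by row. The following is a minimal sketch of that logic in plain Python rather than pandas. `resolve_vector_column`, `find_mismatches`, and the use of `str.strip` as a stand-in for the repository's `preprocess_text` are hypothetical illustrations, not part of team_comm_tools. The sketch uses `str.removesuffix` so that only a trailing `_original` is dropped, and it leaves out the debugging `print`.

```python
SUFFIX = "_original"  # suffix that the diff strips from message_col


def resolve_vector_column(message_col: str) -> str:
    """Map a chat-data column name to the column name expected in the vector file.

    Hypothetical helper: str.removesuffix drops only a trailing "_original",
    whereas str.replace would also rewrite "_original" elsewhere in the name.
    """
    return message_col.removesuffix(SUFFIX)


def find_mismatches(chat_messages, vector_messages, preprocess=str.strip):
    """Return the positions where the preprocessed messages differ.

    `preprocess` is a stand-in for the repository's preprocess_text;
    str.strip is used here only for illustration.
    """
    if len(chat_messages) != len(vector_messages):
        raise ValueError("chat data and vector data have different lengths")
    return [
        i
        for i, (chat, vec) in enumerate(zip(chat_messages, vector_messages))
        if preprocess(str(chat)) != preprocess(str(vec))
    ]


if __name__ == "__main__":
    print(resolve_vector_column("message_original"))  # message
    print(resolve_vector_column("message"))  # message
    print(find_mismatches(["hi ", "bye"], ["hi", "ciao"]))  # [1]
```

An empty list from `find_mismatches` corresponds to the case where the diff's `mismatches` frame is empty and no regeneration is triggered.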
