
[backend][app] Always send generic email result as best GPT on the transcript, other workflows yolo #2

Merged

3 commits merged into main on Sep 12, 2024

Conversation

petercsiba (Owner) commented Sep 11, 2024

PR Description (by Cal)

This PR introduces a new feature to always send a generic email result as the best GPT on the transcript. It includes changes to various backend files to support this feature, such as modifying email sending functions, updating workflow handling, and adding new utility functions. Additionally, some CSS styles have been updated.

Diagram of code changes

```mermaid
sequenceDiagram
    participant User
    participant API
    participant GPTClient
    participant EmailService
    participant AWS

    User->>API: GET /upload/voice
    API->>User: Returns presigned URL

    API->>GPTClient: Transcribe audio with prompt hint
    GPTClient->>API: Returns transcription

    API->>EmailService: Send networking result email
    EmailService->>API: Email sent confirmation

    API->>AWS: Check if running in AWS
    AWS-->>API: Returns status

    API->>GPTClient: Process generic prompt
    GPTClient->>API: Returns processed result

    API->>EmailService: Send generic result email
    EmailService->>API: Email sent confirmation
```

Key Issues

  • Possible Bug: In /backend/app/app.py, the variable `GPTo_MODEL` looks like a typo and should likely be `GPT_MODEL` or similar. If the variable is not defined elsewhere, this will cause a runtime error.
  • Performance Concern: The `poor_mans_token_counter` function in /backend/app/contacts_dump.py may not count tokens accurately, which could lead to incorrect processing of transcripts. Consider a more reliable token-counting method.
  • Possible Bug: In /backend/app/gsheets.py, a TODO comment flags a potential `AttributeError` that should be addressed to prevent runtime errors.

Files Changed

File: /backend/api/app.py Added a TODO comment to ensure compatibility with file uploads.
File: /backend/app/app.py Modified email sending functions and workflow handling to support the new feature. Added a new function `process_generic_prompt`.
File: /backend/app/contacts_dump.py Replaced `num_tokens_from_string` with `poor_mans_token_counter` for token counting.
File: /backend/app/emails.py Updated email body formatting and added a new function `send_generic_result`.
File: /backend/app/gsheets.py Added a TODO comment regarding a potential `AttributeError`.
File: /backend/database/email_log.py Added a return type hint to `get_email_reply_params_for_account_id`.
File: /backend/deploy_lambda_container.sh Added a TODO comment to move a command into the `/app` directory.
File: /backend/input/app_upload.py Added `prompt_hint` and `use_cache_hit` parameters to `transcribe_audio` calls.
File: /backend/input/call.py Added `prompt_hint` and `use_cache_hit` parameters to `transcribe_audio` calls.
File: /backend/input/common.py Added TODO comments regarding video format support and file size limitations.
File: /backend/input/email.py Added `prompt_hint` and `use_cache_hit` parameters to `transcribe_audio` calls.
File: /webapp/styles/app.css Added new CSS classes for margin, padding, and background color.

vercel bot commented Sep 11, 2024

The latest updates on your projects:

dumpsheet: ✅ Ready (preview updated Sep 12, 2024 4:50am UTC)

callstackai-action bot left a comment

No issues found.

callstackai-action bot left a comment

PR Review Summary

This pull request has been reviewed. Please check the comments and suggestions provided below.

callstackai-action bot commented on a `gpt_client.transcribe_audio` call:

```python
gpt_client.transcribe_audio(
    audio_filepath=audio_filepath,
    prompt_hint="voice memo",
    use_cache_hit=not is_running_in_aws(),
)
```

🐛 Bug
The `use_cache_hit` parameter in the `gpt_client.transcribe_audio` call is set to `not is_running_in_aws()`. This logic assumes that caching should be disabled only when running in AWS. If the environment-variable checks in `is_running_in_aws()` are not comprehensive, or if the environment changes, caching could be enabled or disabled unexpectedly. Consider setting `use_cache_hit` explicitly from a more reliable configuration or environment setting.

Suggested change:

```diff
- use_cache_hit=not is_running_in_aws(),
+ use_cache_hit=False
```
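For context, a minimal sketch of what an `is_running_in_aws()` helper typically looks like. This is an assumption about the helper's internals, not the repo's actual code; AWS Lambda does set `AWS_LAMBDA_FUNCTION_NAME` in its runtime environment.

```python
import os

def is_running_in_aws() -> bool:
    # Hypothetical implementation: AWS Lambda injects AWS_LAMBDA_FUNCTION_NAME
    # into the runtime environment, so its presence signals "running in AWS".
    return "AWS_LAMBDA_FUNCTION_NAME" in os.environ

# The reviewed call site then enables the transcription cache only when
# running outside AWS, i.e. during local development:
use_cache_hit = not is_running_in_aws()
```

The review's point is that this couples caching behavior to an environment heuristic rather than an explicit setting.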

petercsiba (Owner, Author) replied:

i guess fair

petercsiba (Owner, Author) replied:

the suggested change is quite bad though

callstackai-action bot commented on /backend/app/app.py:

```
add paragraphs if needed. Especially make sure you keep all mentioned facts and details.
"""
# CHEAPEST_MODEL leads to overly short answers.
transcription = gpt_client.run_prompt(transcript_prompt + data_entry.output_transcript, model=GPTo_MODEL)
```

🐛 Bug
The variable `GPTo_MODEL` used in the `gpt_client.run_prompt` call on line 377 appears to be a typo or an undefined variable; it should likely be `BEST_MODEL` or another defined model variable. If `GPTo_MODEL` is not defined elsewhere in the code, this will raise a runtime error.

Suggested change:

```diff
- transcription = gpt_client.run_prompt(transcript_prompt + data_entry.output_transcript, model=GPTo_MODEL)
+ transcription = gpt_client.run_prompt(transcript_prompt + data_entry.output_transcript, model=BEST_MODEL)
```

petercsiba (Owner, Author) replied:

Wrong, this is defined on line 62

callstackai-action bot commented on /backend/app/gsheets.py:

```
@@ -228,6 +228,7 @@ def add_form_datas_to_spreadsheet(self, form_datas: List[FormData]):
        sheet_cache = {}

        for form_data in form_datas:
            # TODO: AttributeError: 'str' object has no attribute 'value'
```

🐛 Bug
The line `form_name = form_data.form.form_name.value` assumes that `form_data.form.form_name` is an object with a `value` attribute. The TODO comment, however, indicates an `AttributeError` occurs because `form_data.form.form_name` is a `str`, which has no `value` attribute, leading to a runtime error. To fix this, access `form_data.form.form_name` directly as a string, without `.value`.

Suggested change:

```diff
- # TODO: AttributeError: 'str' object has no attribute 'value'
+ form_name = form_data.form.form_name
```
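The failure mode is easy to reproduce with stand-in types. The `Form` and `FormData` classes below are hypothetical simplifications of the real models, just enough to show the bug and its fix:

```python
from dataclasses import dataclass

@dataclass
class Form:
    form_name: str  # a plain str here, not an Enum member with a .value

@dataclass
class FormData:
    form: Form

form_data = FormData(form=Form(form_name="NETWORKING"))

# Buggy access pattern flagged by the TODO: assumes an Enum member.
try:
    form_name = form_data.form.form_name.value
except AttributeError as exc:
    print(exc)  # 'str' object has no attribute 'value'

# Fix: the attribute is already a string, so use it directly.
form_name = form_data.form.form_name
```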

petercsiba (Owner, Author) replied:

Correct, done in a follow up diff

callstackai-action bot commented on /backend/app/contacts_dump.py:

```
@@ -17,6 +16,13 @@
MAX_TRANSCRIPT_TOKEN_COUNT = 2500  # words


# https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them
def poor_mans_token_counter(text: str) -> int:
```

🐛 Bug
The calculation in `poor_mans_token_counter` may produce incorrect token counts due to integer division: the expression `int(by_character + by_words) // 2` converts the sum to an integer first, losing precision before the division. This can yield inaccurate counts, especially for short texts. Perform the division before converting to an integer.

Additionally, the formula could be simplified or its rationale documented: the function estimates tokens heuristically, and it is not obvious why it divides by 4 and multiplies by 3/4. Consider renaming `by_character` and `by_words` to more descriptive names such as `character_estimate` and `word_estimate` to improve readability.

Suggested change (inside `poor_mans_token_counter`):

```python
return int((by_character + by_words) / 2)
```
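The linked OpenAI help article gives the rules of thumb behind this heuristic: roughly 1 token per 4 characters of English text, and roughly 3/4 of a word per token (so tokens are about words times 4/3). The sketch below averages the two estimates and divides before truncating; the repo function's exact body is not shown in the diff, so this is an approximation:

```python
def poor_mans_token_counter(text: str) -> int:
    """Rough token estimate without a tokenizer, per OpenAI's heuristics:
    ~4 characters per token, and ~3/4 of a word per token."""
    character_estimate = len(text) / 4         # 1 token ~= 4 chars
    word_estimate = len(text.split()) * 4 / 3  # 1 token ~= 0.75 words
    # Average the two estimates; divide first, then truncate, so no
    # precision is lost ahead of the division.
    return int((character_estimate + word_estimate) / 2)
```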

petercsiba (Owner, Author) replied:

There is a link explaining this formula; it's an estimate.

petercsiba (Owner, Author) replied:

The comment is too long I didn't read it

Co-authored-by: callstackai[bot] <177242706+callstackai[bot]@users.noreply.github.com>
petercsiba merged commit 6d21c70 into main on Sep 12, 2024
3 checks passed
callstackai-action bot left a comment

No issues found.

petercsiba (Owner, Author) left a review:

Callstack Review
  • 1 bug discovered with `params.body_text`, but the proposed change was incorrect code
  • 1 reasonable suggestion on `is_running_in_aws`, but the suggested change was wrong
  • 1 comment was just plainly wrong


callstackai-action bot commented on /backend/app/emails.py:

```
@@ -179,7 +179,7 @@ def create_raw_email_with_attachments(params: EmailLog):
    <head></head>
    <body>
    """
+ params.body_text
+ + (params.body_text.replace("\n", "<br />") if params.body_text else "")
```

petercsiba (Owner, Author) replied:

I have accepted your suggestion to use the above instead of just `params.body_text`, BUT it is actually syntactically bad code leading to this production error (I have tested the version from before this change was accepted):

```
bad operand type for unary +: 'str'

Traceback (most recent call last):
  File "/home/dumpsheet/app/app.py", line 453, in lambda_handler
    second_lambda_handler_wrapper(data_entry)
  File "/home/dumpsheet/app/app.py", line 415, in second_lambda_handler_wrapper
    process_generic_prompt(gpt_client, data_entry)
  File "/home/dumpsheet/app/app.py", line 385, in process_generic_prompt
    send_generic_result(
  File "/home/dumpsheet/app/emails.py", line 536, in send_generic_result
    return send_email(params=email_params)
  File "/home/dumpsheet/app/emails.py", line 236, in send_email
    raw_email = create_raw_email_with_attachments(params)
  File "/home/dumpsheet/app/emails.py", line 182, in create_raw_email_with_attachments
    + + (params.body_text.replace("\n", "<br />") if params.body_text else "")
TypeError: bad operand type for unary +: 'str'
```
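Why the accepted suggestion blew up: the email body is built as a multi-line string concatenation, and the suggested replacement line began with its own `+`, producing `+ +` in the source. Python parses the second `+` as a unary plus applied to a `str`, which raises exactly the `TypeError` in the traceback. A minimal standalone reproduction (not the repo's code):

```python
body_text = "line one\nline two"

# What the accepted suggestion produced: a stray second "+", parsed as
# unary plus on the str expression, raising a TypeError at runtime.
try:
    html = ("<body>"
            + + (body_text.replace("\n", "<br />") if body_text else ""))
except TypeError as exc:
    print(exc)  # bad operand type for unary +: 'str'

# Correct version: a single binary "+" concatenation.
html = ("<body>"
        + (body_text.replace("\n", "<br />") if body_text else ""))
```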
