[backend][app] Always send generic email result as best GPT on the transcript, other workflows yolo #2
Conversation
No issues found.
PR Review Summary
This pull request has been reviewed. Please check the comments and suggestions provided.
```python
gpt_client.transcribe_audio(
    audio_filepath=audio_filepath,
    prompt_hint="voice memo",
    use_cache_hit=not is_running_in_aws(),
```
🐛 Bug
The `use_cache_hit` parameter in the `gpt_client.transcribe_audio` method is set to `not is_running_in_aws()`. This logic might be flawed because it assumes that caching should only be disabled when running in AWS. However, if the environment variable checks in `is_running_in_aws()` are not comprehensive or if the environment changes, this could lead to unexpected behavior where caching is incorrectly enabled or disabled. Consider explicitly setting `use_cache_hit` based on a more reliable configuration or environment setting.
Suggested change:
```diff
-    use_cache_hit=not is_running_in_aws(),
+    use_cache_hit=False
```
i guess fair
the suggested change is quite bad though
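For what it's worth, the direction the review comment points at (an explicit setting rather than inferring from the runtime) could be wired roughly like this. This is only a sketch: `ENABLE_GPT_CACHE` is a hypothetical variable name, and `gpt_client`, `audio_filepath`, and `is_running_in_aws` are taken from the PR.

```python
import os

# Hypothetical explicit flag; falls back to the current behaviour
# (cache enabled only when not running in AWS) when the flag is unset.
explicit_flag = os.environ.get("ENABLE_GPT_CACHE")
use_cache_hit = (explicit_flag == "1") if explicit_flag is not None else not is_running_in_aws()

gpt_client.transcribe_audio(
    audio_filepath=audio_filepath,
    prompt_hint="voice memo",
    use_cache_hit=use_cache_hit,
)
```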
```python
add paragraphs if needed. Especially make sure you keep all mentioned facts and details.
"""
# CHEAPEST_MODEL leads to overly short answers.
transcription = gpt_client.run_prompt(transcript_prompt + data_entry.output_transcript, model=GPTo_MODEL)
```
🐛 Bug
The variable `GPTo_MODEL` used in the `gpt_client.run_prompt` call on line 377 seems to be a typo or undefined variable. It should likely be `BEST_MODEL` or another defined model variable. This will lead to a runtime error if `GPTo_MODEL` is not defined elsewhere in the code.
Suggested change:
```diff
-    transcription = gpt_client.run_prompt(transcript_prompt + data_entry.output_transcript, model=GPTo_MODEL)
+    transcription = gpt_client.run_prompt(transcript_prompt + data_entry.output_transcript, model=BEST_MODEL)
```
Wrong, this is defined on line 62
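For readers without the file open: the author's point is that the constant lives at module level earlier in `backend/app/app.py`, so the call site is valid. A sketch of that shape, with an assumed value rather than the project's actual definition:

```python
# Assumed illustration only; the real definition sits around line 62 of
# backend/app/app.py and its value is not shown in this diff.
GPTo_MODEL = "gpt-4o"

# Much later in the same module, the module-level constant is still in scope:
transcription = gpt_client.run_prompt(transcript_prompt + data_entry.output_transcript, model=GPTo_MODEL)
```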
```diff
@@ -228,6 +228,7 @@ def add_form_datas_to_spreadsheet(self, form_datas: List[FormData]):
         sheet_cache = {}

         for form_data in form_datas:
+            # TODO: AttributeError: 'str' object has no attribute 'value'
```
🐛 Bug
The line `form_name = form_data.form.form_name.value` assumes that `form_data.form.form_name` is an object with a `value` attribute. However, the comment suggests that an `AttributeError` is occurring because `form_data.form.form_name` is a `str` object, which does not have a `value` attribute. This will lead to a runtime error. To fix this, ensure that `form_data.form.form_name` is correctly accessed as a string, without the `.value` attribute.
Suggested change:
```diff
-    # TODO: AttributeError: 'str' object has no attribute 'value'
+    form_name = form_data.form.form_name
```
Correct, done in a follow up diff
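For illustration, a defensive version of that access which tolerates both an enum-like object and a plain string could look like this; a sketch only, not the code from the follow-up diff:

```python
raw_form_name = form_data.form.form_name
# Works whether form_name is an Enum member (exposing .value) or already a str.
form_name = raw_form_name.value if hasattr(raw_form_name, "value") else raw_form_name
```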
```diff
@@ -17,6 +16,13 @@
 MAX_TRANSCRIPT_TOKEN_COUNT = 2500  # words


+# https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them
+def poor_mans_token_counter(text: str) -> int:
```
🐛 Bug
The calculation in `poor_mans_token_counter` might lead to incorrect token counts due to integer division. The expression `int(by_character + by_words) // 2` will first convert the sum to an integer, potentially losing precision before the division. This could lead to an inaccurate token count, especially for small values of `text`. To ensure more accurate results, the division should be performed before converting to an integer.
Additionally, the calculation could be simplified by using a more straightforward approach or by explaining the rationale behind the chosen formula. This would help in understanding why dividing by 4 and multiplying by 3/4 is used. The function uses a heuristic to estimate the number of tokens, but the logic could be more clearly documented or explained. Consider renaming the variables `by_character` and `by_words` to something more descriptive, such as `character_estimate` and `word_estimate`, to improve readability.
Suggested change:
```diff
-def poor_mans_token_counter(text: str) -> int:
+    return int((by_character + by_words) / 2)
```
There is a link explaining this formula
it's an estimate
The comment is too long I didn't read it
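For context, the linked OpenAI article's rule of thumb (roughly 4 characters per token), combined with the word-based factor the review describes and averaged with the division done before truncation, would look something like this. The actual function body is not shown in the hunk, so this is an assumed sketch:

```python
def poor_mans_token_counter(text: str) -> int:
    # Character-based estimate: roughly one token per 4 characters.
    by_character = len(text) / 4
    # Word-based estimate as described in the review (word count * 3/4).
    by_words = len(text.split()) * 3 / 4
    # Average the two estimates, dividing before converting to int so no
    # precision is dropped ahead of the division.
    return int((by_character + by_words) / 2)
```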
Co-authored-by: callstackai[bot] <177242706+callstackai[bot]@users.noreply.github.com>
No issues found.
Callstack Review
- 1 bug discovered with `params.body_text`, but the proposed change was incorrect code
- 1 reasonable suggestion on `is_running_in_aws`, but the suggested change was wrong
- 1 comment was just plainly wrong
```diff
@@ -179,7 +179,7 @@ def create_raw_email_with_attachments(params: EmailLog):
     <head></head>
     <body>
     """
-    + params.body_text
+    + + (params.body_text.replace("\n", "<br />") if params.body_text else "")
```
I have accepted your suggestion to use the above instead of just `params.body_text`,
BUT it is actually bad syntax, leading to this production error (I have tested the version from before this change was accepted):
```
bad operand type for unary +: 'str'

Traceback (most recent call last):
  File "/home/dumpsheet/app/app.py", line 453, in lambda_handler
    second_lambda_handler_wrapper(data_entry)
  File "/home/dumpsheet/app/app.py", line 415, in second_lambda_handler_wrapper
    process_generic_prompt(gpt_client, data_entry)
  File "/home/dumpsheet/app/app.py", line 385, in process_generic_prompt
    send_generic_result(
  File "/home/dumpsheet/app/emails.py", line 536, in send_generic_result
    return send_email(params=email_params)
  File "/home/dumpsheet/app/emails.py", line 236, in send_email
    raw_email = create_raw_email_with_attachments(params)
  File "/home/dumpsheet/app/emails.py", line 182, in create_raw_email_with_attachments
    + + (params.body_text.replace("\n", "<br />") if params.body_text else "")
TypeError: bad operand type for unary +: 'str'
```
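The failure is easy to reproduce in isolation: the accepted suggestion left a stray leading `+` on the line, which Python parses as a unary plus applied to a string. A minimal sketch, not the project code:

```python
body_text = "hello\nworld"

# Broken shape (as committed): the extra "+" becomes a unary plus on a str
# and raises TypeError: bad operand type for unary +: 'str'.
# html = "<body>" + + (body_text.replace("\n", "<br />") if body_text else "")

# Fixed shape: a single concatenation operator.
html = "<body>" + (body_text.replace("\n", "<br />") if body_text else "")
print(html)  # <body>hello<br />world
```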
✨ Description by Cal
PR Description
This PR introduces a new feature to always send a generic email result as the best GPT on the transcript. It includes changes to various backend files to support this feature, such as modifying email sending functions, updating workflow handling, and adding new utility functions. Additionally, some CSS styles have been updated.
Diagrams of code changes
Key Issues

- In `/backend/app/app.py`, the variable `GPTo_MODEL` seems to be a typo and should likely be `GPT_MODEL` or similar. This could lead to runtime errors if the variable is used elsewhere.
- The `poor_mans_token_counter` function in `/backend/app/contacts_dump.py` may not accurately count tokens, which could lead to incorrect processing of transcripts. Consider using a more reliable token counting method.
- In `/backend/app/gsheets.py`, there is a TODO comment indicating a potential `AttributeError` which should be addressed to prevent runtime errors.

Files Changed

| File | Changes |
| --- | --- |
| `/backend/api/app.py` | Added a TODO comment to ensure compatibility with file uploads. |
| `/backend/app/app.py` | Modified email sending functions and workflow handling to support the new feature. Added a new function `process_generic_prompt`. |
| `/backend/app/contacts_dump.py` | Replaced `num_tokens_from_string` with `poor_mans_token_counter` for token counting. |
| `/backend/app/emails.py` | Updated email body formatting and added a new function `send_generic_result`. |
| `/backend/app/gsheets.py` | Added a TODO comment regarding a potential `AttributeError`. |
| `/backend/database/email_log.py` | Added a return type hint to `get_email_reply_params_for_account_id`. |
| `/backend/deploy_lambda_container.sh` | Added a TODO comment to move a command into the `/app` directory. |
| `/backend/input/app_upload.py` | Added `prompt_hint` and `use_cache_hit` parameters to `transcribe_audio` calls. |
| `/backend/input/call.py` | Added `prompt_hint` and `use_cache_hit` parameters to `transcribe_audio` calls. |
| `/backend/input/common.py` | Added TODO comments regarding video format support and file size limitations. |
| `/backend/input/email.py` | Added `prompt_hint` and `use_cache_hit` parameters to `transcribe_audio` calls. |
| `/webapp/styles/app.css` | Added new CSS classes for margin, padding, and background color. |