
add gpt-4o support for num_tokens #2

Merged
merged 1 commit into from
Jun 24, 2024

Conversation

Contributor

@nicklamiller nicklamiller commented Jun 24, 2024

Without support in num_tokens for the gpt-4o model, running OpenAIChatCompletionRequest with model=gpt-4o results in a Failure:

requests = [OpenAIChatCompletionRequest("what's the answer to life?", model="gpt-4o")]
responses = list(chat_completion_batch(requests, label="prompt"))

responses[0].response
# Failure(_value=(OpenAIChatCompletionRequest(messages=[{'role': 'user', 'content': "what's the answer to life?"}], model='gpt-4o', temperature=0.8, top_p=1.0, n=1, stop=None, max_tokens=None, presence_penalty=0.0, frequency_penalty=0.0, logit_bias=None, metadata=None), NotImplementedError('num_tokens_from_messages() is not implemented for model gpt-4o. See https://github.com/openai/openai-python/blob/main/chatml.md for information on how messages are converted to tokens.')))

Section 6 of the OpenAI cookbook for counting tokens doesn't lay out token counting rules for gpt-4o like it does for the other models (I checked main for the newest version of the notebook). I tested a few messages and got the same results for both gpt-4 and gpt-4o, so it appears they count tokens the same, though it would be good to confirm this.

Testing token counts are the same between gpt-4 and gpt-4o

import openai

from oai_utils.openai_utils import num_tokens

all_example_messages = [
    [{"role": "user", "content": "hi"}],
    [{"role": "user", "content": "a longer message"}],
    [{"role": "user", "content": "a much longer message indeed"}],
]
for example_messages in all_example_messages:
    for model in ["gpt-4", "gpt-4o"]:
        print(model)
        print(f"{num_tokens(example_messages, model)} prompt tokens counted by num_tokens().")
        response = openai.ChatCompletion.create(
            model=model,
            messages=example_messages,
            temperature=0,
            max_tokens=1,  # we're only counting input tokens here, so let's not waste tokens on the output
        )
        print(f'{response["usage"]["prompt_tokens"]} prompt tokens counted by the OpenAI API.')
        print()
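For context, the cookbook's num_tokens_from_messages recipe charges a fixed per-message overhead plus the encoded length of each field, and the change here extends the same gpt-4 rules to gpt-4o. A minimal sketch of that dispatch logic, assuming the cookbook's overhead constants (3 tokens per message, 1 extra per name field, 3 for the reply primer); count_message_tokens is a hypothetical name, and the whitespace-split tokenizer is a stand-in for tiktoken, so the counts are illustrative only:

```python
def count_message_tokens(messages, model, encode=lambda s: s.split()):
    # Per the OpenAI cookbook, gpt-3.5-turbo / gpt-4 family models charge
    # 3 tokens of overhead per message and 1 extra token per "name" field.
    # The fix is to route gpt-4o through the same branch as gpt-4.
    if model in ("gpt-3.5-turbo", "gpt-4", "gpt-4o"):
        tokens_per_message, tokens_per_name = 3, 1
    else:
        raise NotImplementedError(
            f"num_tokens_from_messages() is not implemented for model {model}."
        )
    total = 0
    for message in messages:
        total += tokens_per_message
        for key, value in message.items():
            total += len(encode(value))
            if key == "name":
                total += tokens_per_name
    total += 3  # every reply is primed with <|start|>assistant<|message|>
    return total
```

Note that identical counts for gpt-4 and gpt-4o hold only if their encodings tokenize the content the same way, which is why confirming against the API's reported prompt_tokens (as in the script above) is worthwhile.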

@nicklamiller nicklamiller requested a review from ravwojdyla June 24, 2024 07:42
@nicklamiller nicklamiller force-pushed the nic-support-tokens-for-gpt-4o branch from 119beb5 to 9536999 on June 24, 2024 07:43
Contributor

@ravwojdyla ravwojdyla left a comment

lgtm! In general, maybe we should take a snapshot of the internal OpenAI tooling and dump it in here again, aka "snapshot"-style open source. Out of scope for this PR.

@nicklamiller nicklamiller changed the title add gpt-4o num_tokens support add gpt-4o support for num_tokens Jun 24, 2024
@nicklamiller nicklamiller merged commit 88182fb into main Jun 24, 2024
2 checks passed
@nicklamiller nicklamiller deleted the nic-support-tokens-for-gpt-4o branch June 24, 2024 08:07