[Feature] DSPy Audio/Video Support Tracking #7847
Comments
Yes, I'd like to help implement this.
Yes, I'd like to help implement this. I'd like to help with support for video input. I'm also looking into benchmarking video understanding performance, to evaluate how well DSPy can process video-based inputs.
After some review, the PR takes us closer to supporting audio, but there is still work to be done before audio input is fully supported.
[1] https://docs.litellm.ai/docs/completion/audio#audio-input-to-a-model
Hello all. For Azure OpenAI we have to add handling in the Image module, because Azure's audio models do not accept the MIME type that gets generated. A value like 'audio/x-wav' causes this error:
litellm.BadRequestError: AzureException BadRequestError - Invalid value: 'audio/x-wav'. Supported values are: 'wav' and 'mp3'.
I have no idea at which level we should handle this. The best option might be to check the provider type and add a provider-specific sub-pipeline that processes the data before final_list.append.
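One way to sidestep the error above, wherever the check ends up living, is to normalize the value before it is sent. A minimal sketch, assuming the only accepted values are the two named in the error message ('wav' and 'mp3'); `normalize_audio_format` and the mapping table are hypothetical names for illustration, not part of DSPy or LiteLLM:

```python
# Hypothetical helper for illustration; not an existing DSPy or LiteLLM API.
# Maps common audio MIME types to the bare format names listed in the
# Azure error message ('wav' and 'mp3').
_MIME_TO_FORMAT = {
    "audio/wav": "wav",
    "audio/x-wav": "wav",
    "audio/wave": "wav",
    "audio/vnd.wave": "wav",
    "audio/mpeg": "mp3",
    "audio/mp3": "mp3",
}


def normalize_audio_format(value: str) -> str:
    """Return 'wav' or 'mp3' for a MIME type or a bare format name."""
    key = value.strip().lower()
    if key in ("wav", "mp3"):
        # Already in the form the provider expects.
        return key
    try:
        return _MIME_TO_FORMAT[key]
    except KeyError:
        # Fail early with a clear message instead of a provider-side 400.
        raise ValueError(f"Unsupported audio format: {value!r}") from None
```

Calling this just before the request is built would turn 'audio/x-wav' into 'wav'. Whether it belongs in the adapter or in a provider-specific step is still the open question.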
What feature would you like to see?
We have received a number of requests for Audio and Video input support over the last few months (#2037, #7844, etc.)
I implemented DSPy.Image, and am looking for someone to help out and create similar or better implementations for audio and/or video inputs. It would not be shocking to me if some good prompting and few-shot support for audio would greatly help in some use cases, as would being able to script with audio in the same way that you can with text inputs.
For someone to implement this, there are a few required steps for the implementation I am imagining:
- A try_expand_audio_tags method that will search and expand messages with multimodal inputs
- Tests along the lines of tests/signatures/test_adapter_image.py to make sure it can work with a variety of signature types and input methods
I don't know enough about the audio input APIs to know what the speedbumps on this implementation are going to be.
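To make the tag-expansion step more concrete, here is a rough sketch of what a try_expand_audio_tags pass could look like. It is not the DSPy.Image implementation. The tag markers, the `fmt:base64` payload layout, and the `encode_audio` / `expand_audio_tags` helpers are all assumptions made up for this example, and the `input_audio` content-part shape is assumed to match the audio-input format described in the LiteLLM docs linked in the comments.

```python
import base64
import re

# Hypothetical tag markers, chosen for this sketch only.
AUDIO_START = "<AUDIO_START>"
AUDIO_END = "<AUDIO_END>"
_TAG_RE = re.compile(
    re.escape(AUDIO_START) + r"(.*?)" + re.escape(AUDIO_END), re.DOTALL
)


def encode_audio(raw: bytes, fmt: str = "wav") -> str:
    """Wrap raw audio bytes in a tagged string of the form 'fmt:base64'."""
    data = base64.b64encode(raw).decode("ascii")
    return f"{AUDIO_START}{fmt}:{data}{AUDIO_END}"


def expand_audio_tags(text: str) -> list[dict]:
    """Split a string into text parts and input_audio content parts."""
    parts = []
    pos = 0
    for match in _TAG_RE.finditer(text):
        before = text[pos:match.start()]
        if before.strip():
            parts.append({"type": "text", "text": before})
        # Base64 never contains ':', so the first ':' separates the format.
        fmt, _, data = match.group(1).partition(":")
        parts.append(
            {"type": "input_audio", "input_audio": {"data": data, "format": fmt}}
        )
        pos = match.end()
    tail = text[pos:]
    if tail.strip():
        parts.append({"type": "text", "text": tail})
    return parts


def try_expand_audio_tags(messages: list[dict]) -> list[dict]:
    """Expand tagged audio in string contents; leave other messages alone."""
    expanded = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, str) and AUDIO_START in content:
            # Build a new dict so the caller's message is not mutated.
            msg = {**msg, "content": expand_audio_tags(content)}
        expanded.append(msg)
    return expanded
```

A message whose content is `"Transcribe: " + encode_audio(wav_bytes)` would come out with a text part followed by an input_audio part, while messages without tags pass through untouched. A real implementation would still need to handle few-shot demos and the different signature types that the image tests cover.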
As a first step, I would choose either the OpenAI API or Gemini, and get it working for that provider with whatever hacky code is needed, then expand and abstract after that.
Feel free to @ me on Discord (username: ibmiller) if you need help.
Would you like to contribute?
Additional Context
No response