Questions Regarding the Audio Data in the Dataset #3
Hello @badhorselgy

Thank you for your interest in the BatVision dataset and for your questions. We truly appreciate your feedback and the detailed observations you've shared. Below are responses to your questions:

1. Audio Data
1.1 Initial blank sampling points
1.2 Audio length
1.3 JBL direct path

2. Sensors
Unfortunately, we no longer have access to the BatVision v1 robot, and the precise coordinate transformation parameters between the camera and microphone are unavailable. However, here are some details that may help:

Throughout our work, we assumed the cameras and microphones were co-located for simplicity. We hope these estimates provide a helpful starting point for your calculations. Please let us know if you need further clarification. We sincerely hope this information helps you move forward with your research.
Thank you for making the BatVision dataset and the audio-only baseline open source. We sincerely appreciate your contributions to both the acoustic and open-source communities. While using the dataset and the UNetSoundOnly baseline, however, we ran into a few points that confused us, and we would appreciate your help in clarifying them.
Taking the data file 2019.08.26/audio/raw_long/180336.309017_left.npy from BatVisionv1 as an example, along with its corresponding camera images and depth maps, what is the meaning of the approximately 980 initial sampling points in the audio data that seem to be nearly blank?

The paper mentions: "Designed for smaller spaces, audio recordings were cut at 72.5ms, including echoes from objects at a 12m distance." However, in the code, we only found a section that processes depth data, and the audio data input to the network appears to be around 0.1 seconds in length. How should the code be modified to correctly reproduce the audio input representation as described in the BatVision paper?
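To make the question concrete, here is a minimal sketch of one way such a cut could be done: locate the first non-silent sample, then keep a 72.5 ms window from there. It assumes a 44.1 kHz sampling rate and that the window starts at the detected onset; neither is confirmed in this thread or in the released code. The function names and the relative threshold are hypothetical.

```python
import numpy as np

SAMPLE_RATE_HZ = 44_100    # assumption: sampling rate is not stated in this thread
CUT_MS = 72.5              # cut length quoted from the paper
SPEED_OF_SOUND = 343.0     # m/s, approximate value in air at room temperature


def find_onset(audio, rel_threshold=0.05):
    """Return the index of the first sample above rel_threshold * peak magnitude."""
    envelope = np.abs(audio)
    peak = envelope.max()
    if peak == 0:
        return 0  # completely silent input: no onset to find
    # argmax on a boolean array gives the first True position
    return int(np.argmax(envelope > rel_threshold * peak))


def crop_window(audio, start, duration_ms=CUT_MS, sample_rate=SAMPLE_RATE_HZ):
    """Keep duration_ms of audio beginning at index start."""
    n_samples = int(round(duration_ms * 1e-3 * sample_rate))
    return audio[start:start + n_samples]


def max_echo_range_m(duration_ms=CUT_MS, c=SPEED_OF_SOUND):
    """Farthest reflector whose round-trip echo still fits in the window."""
    return c * duration_ms * 1e-3 / 2.0
```

Under these assumptions the window holds 3197 samples and covers reflectors up to roughly 12.4 m away, which is consistent with the 12 m figure in the paper. A recording loaded with `np.load(...)` could be passed through `find_onset` and then `crop_window`; whether this matches the authors' preprocessing is exactly what we would like to confirm.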
Additionally, does the provided audio data include the direct path signal emitted by the JBL Flip4 Bluetooth speaker? Since the JBL Flip4 Bluetooth speaker's sound units seem to be located at both ends of a cylindrical body, we are unable to determine whether the first path in the spectrum is due to a reflection from the corridor walls or if it originates from the direct path between the speaker's sound unit and the microphone.
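The ambiguity can be framed as a time-of-flight comparison. The sketch below converts between path length and sample index so that a candidate direct path can be compared with a candidate wall reflection. It assumes a 44.1 kHz sampling rate, a sound speed of 343 m/s, and, for the echo case, that speaker and microphone sit at the same point; the function names are hypothetical and the actual speaker-to-microphone distance is unknown to us.

```python
SPEED_OF_SOUND = 343.0   # m/s, approximate value in air at room temperature
SAMPLE_RATE_HZ = 44_100  # assumption: sampling rate is not stated in this thread


def direct_path_sample(speaker_mic_distance_m, sample_rate=SAMPLE_RATE_HZ,
                       c=SPEED_OF_SOUND):
    """Sample index at which sound travelling straight to the microphone arrives."""
    return int(round(speaker_mic_distance_m / c * sample_rate))


def echo_sample(reflector_distance_m, sample_rate=SAMPLE_RATE_HZ,
                c=SPEED_OF_SOUND):
    """Sample index of a round-trip echo, assuming co-located speaker and microphone."""
    return int(round(2.0 * reflector_distance_m / c * sample_rate))


def sample_to_path_length_m(sample_index, sample_rate=SAMPLE_RATE_HZ,
                            c=SPEED_OF_SOUND):
    """Total acoustic path length corresponding to a given sample offset."""
    return sample_index / sample_rate * c
```

These conversions are only meaningful if sample 0 of the recording coincides with the moment of emission, which is one of the things we are unsure about.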
Would you mind sharing the coordinate transformation parameters between the camera and the microphone?
Finally, thank you once again for making the dataset and baseline open source, which greatly aids future researchers. We look forward to your response.