The need for complete code #1

Open
JiacliUstc opened this issue Dec 6, 2024 · 4 comments

Comments

@JiacliUstc

Hello, my research focuses on DNN inference and latency prediction on mobile devices. I was fortunate to read your paper, "A Benchmark for ML Inference Latency on Mobile Devices". However, I cannot find the modified TFLite benchmark tool you mentioned, or the method for measuring stable inference latency, in this repository. Could you provide this part of the code?

@zhuojinl
Collaborator

Hi @JiacliUstc ,

Thank you for your interest in our work. You can find part of our changes to the TensorFlow source here.

Please note that our modifications were based on an older version of TFLite. Since TFLite is updated regularly, the measurements may differ in the latest version. Our modifications mainly include: (1) updating the TFLite benchmark GPU delegate to report operation-wise latency; and (2) repeating the dispatching of GPU kernels to acquire stable measurements. However, as far as I know, recent versions of TFLite have already introduced both GPU profiling and the benchmark option gpu_invoke_loop_times to improve the stability of measurements. Therefore, I would suggest downloading the latest pre-built native command-line binaries for the TFLite benchmark tools.
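For reference, a typical on-device invocation of the pre-built benchmark binary would look roughly like the sketch below. This is not from the original thread: the device paths and flag values are assumptions, and flag names (other than gpu_invoke_loop_times, mentioned above) should be checked against the version of benchmark_model you download.

```sh
# Push the pre-built benchmark binary and a model to the device
# (paths and filenames here are assumptions for illustration).
adb push benchmark_model /data/local/tmp/
adb push model.tflite /data/local/tmp/
adb shell chmod +x /data/local/tmp/benchmark_model

# Run on the GPU delegate with per-op profiling enabled;
# gpu_invoke_loop_times repeats GPU kernel dispatch per invocation
# to stabilize the latency measurements.
adb shell /data/local/tmp/benchmark_model \
  --graph=/data/local/tmp/model.tflite \
  --use_gpu=true \
  --enable_op_profiling=true \
  --gpu_invoke_loop_times=10 \
  --warmup_runs=10 \
  --num_runs=100
```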

@JiacliUstc
Author

JiacliUstc commented Dec 11, 2024 via email

@zhuojinl
Collaborator

Hi @JiacliUstc ,

Correct. Importing models from another framework can introduce additional operations, such as transposes, into your TFLite models. This is primarily due to differences in memory format (e.g., NHWC in TensorFlow vs. NCHW in PyTorch). I recommend implementing your models directly in TensorFlow rather than converting them from PyTorch; for example, you could use the TensorFlow implementations provided by imgclsmob.
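To illustrate the direct path, here is a minimal sketch that stays entirely within TensorFlow (using a stock tf.keras.applications model as a stand-in for an imgclsmob model), so the converter never has to insert layout transposes:

```python
import tensorflow as tf

# Build the model natively in TensorFlow/Keras, which uses the NHWC
# memory format end to end, so no NCHW->NHWC transposes are needed.
model = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3))

# Convert the Keras model directly to TFLite.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```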

@JiacliUstc
Author

Hello @zhuojinl,

I have now finished generating the model, but I am running into some problems with the TFLite benchmark_model tool. I noticed that you also found that the benchmark binary compiled from TensorFlow was inconsistent with the one provided by nn-meter. I tried building with --config=android_arm and --config=android_arm64 on TF 2.1, but I could not reproduce the results of the nn-meter benchmark binary. Did you solve this problem at the time?

On another note, I would like to ask about the kernel fusion rules. In your ICPE paper, you mention that kernel fusion depends on the backend, while kernel selection depends on the hardware. Does that mean the kernel fusion rules can be reused across different TF versions and different hardware? Looking forward to your reply!
