The need for complete code #1

Open
JiacliUstc opened this issue Dec 6, 2024 · 4 comments

Comments

@JiacliUstc

Hello, my research focuses on DNN inference and latency prediction on mobile devices. I was fortunate to read your paper, "A Benchmark for ML Inference Latency on Mobile Devices". However, I cannot find the modified TFLite benchmark tool you mentioned, or the method for measuring stable inference latency, in this repository. Could you provide this part of the code?

@zhuojinl
Collaborator

Hi @JiacliUstc ,

Thank you for your interest in our work. You can find part of our changes to the TensorFlow source here.

Please note that our modifications were based on an older version of TFLite. Since TFLite is updated regularly, the measurements may differ in the latest version. Our modifications mainly include: (1) updating the TFLite benchmark GPU delegate to report operation-wise latency; and (2) repeating the dispatching of GPU kernels to acquire stable measurements. However, as far as I know, recent versions of TFLite have already introduced both GPU profiling and the benchmark option gpu_invoke_loop_times to improve the stability of measurements. Therefore, I would suggest downloading the latest pre-built native command-line binaries for the TFLite benchmark tools.
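For reference, a typical on-device invocation of the pre-built benchmark binary would look roughly like the sketch below. This is not from the original thread: the device paths and flag values are assumptions, and flag names (other than gpu_invoke_loop_times, mentioned above) should be checked against the version of benchmark_model you download.

```sh
# Push the pre-built benchmark binary and a model to the device
# (paths and filenames here are assumptions for illustration).
adb push benchmark_model /data/local/tmp/
adb push model.tflite /data/local/tmp/
adb shell chmod +x /data/local/tmp/benchmark_model

# Run on the GPU delegate with per-op profiling enabled;
# gpu_invoke_loop_times repeats GPU kernel dispatch per invocation
# to stabilize the latency measurements.
adb shell /data/local/tmp/benchmark_model \
  --graph=/data/local/tmp/model.tflite \
  --use_gpu=true \
  --enable_op_profiling=true \
  --gpu_invoke_loop_times=10 \
  --warmup_runs=10 \
  --num_runs=100
```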

@JiacliUstc
Author

JiacliUstc commented Dec 11, 2024 via email

@zhuojinl
Collaborator

Hi @JiacliUstc ,

Correct. Importing models from another framework can introduce additional operations, such as transposes, into your TFLite models. This is primarily due to differences in memory format (e.g., NHWC in TensorFlow vs. NCHW in PyTorch). I recommend implementing your models directly in TensorFlow rather than converting them from PyTorch; for example, you could use the TensorFlow implementations provided by imgclsmob.
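To illustrate the direct path, here is a minimal sketch that stays entirely within TensorFlow (using a stock tf.keras.applications model as a stand-in for an imgclsmob model), so the converter never has to insert layout transposes:

```python
import tensorflow as tf

# Build the model natively in TensorFlow/Keras, which uses the NHWC
# memory format end to end, so no NCHW->NHWC transposes are needed.
model = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3))

# Convert the Keras model directly to TFLite.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```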

@JiacliUstc
Author

Hello @zhuojinl,

I have now finished generating the model, but I am running into some problems with the TFLite benchmark_model tool. I noticed that you also found that the benchmark binary compiled from TensorFlow was inconsistent with the one provided by nn-meter. I tried building with --config=android_arm and --config=android_arm64 on TF 2.1, but I could not reproduce the results of the nn-meter benchmark binary. Did you solve this problem at the time?

On another note, I would like to ask about the kernel fusion rules. In your ICPE paper, you mention that kernel fusion depends on the backend, while kernel selection depends on the hardware. Does that mean the kernel fusion rules can be reused across different TF versions and different hardware? Looking forward to your reply!
