VenuSQ loses connection to the phone after a while #26

Open
jones139 opened this issue Sep 20, 2023 · 54 comments · Fixed by #67

@jones139
Member

A user has reported that his VenuSQ loses the connection to the phone after a while (1-2 hours), and the only way to restore it is to re-boot the watch. He sees Error -101, which means BLE queue full, then it goes to Error -104, which means no BLE connection. The watch is shown as disconnected in the Garmin Connect App.

Fundamentally this sounds like a bug in the watch firmware, but OSD is causing it to occur.

@jones139
Member Author

I have tried it on my VenuSQ, and Samsung A20 phone and have seen the same issue - it crashed twice overnight, so it is not just a problem with the user's watch hardware - needs fixing ASAP!

@jones139 jones139 added the bug label Sep 20, 2023
@jones139 jones139 added this to the V1.4.x milestone Sep 20, 2023
@jones139
Member Author

I made a test version where it checks to see if a communications request is in progress before it sends another one. This ran ok for about 6 hours this afternoon, so it could be that too many comms requests are causing the issue.
The problem is that the test version just gave up if there was a comms request in progress, so we lost every other 5 second batch of data, which is no use.

I have made V4.1.1g for testing which will re-try after one second if there is a request in progress - this is working better - we receive data every 6 or 7 seconds, so we will still drop the occasional batch of data, but most will get through.
Will test this overnight to see if it fixes the crashing issue, then we will need to look at why it is taking so long to send a 5 second batch of data.
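
The guard-and-retry behaviour described above can be sketched as a small model. This is Python rather than the watch app's Monkey C, and it is only an illustration: `Sender`, `on_data_ready`, `on_tick`, `on_response` and `RETRY_DELAY_S` are hypothetical names, and a plain list stands in for the real transport.

```python
RETRY_DELAY_S = 1.0  # assumed value, taken from "re-try after one second" above


class Sender:
    """Allows at most one outstanding request; a blocked batch is retried later."""

    def __init__(self):
        self.in_progress = False
        self.sent = []         # stands in for the real transport
        self.dropped = []      # batches overwritten before they could be sent
        self._pending = None   # (retry_time, batch) or None

    def _send(self, batch):
        self.in_progress = True
        self.sent.append(batch)

    def on_data_ready(self, batch, now):
        """Send immediately if idle; otherwise park the batch for a retry."""
        if self.in_progress:
            if self._pending is not None:
                # Only one batch can wait, so the older one is lost.
                self.dropped.append(self._pending[1])
            self._pending = (now + RETRY_DELAY_S, batch)
            return False
        self._send(batch)
        return True

    def on_response(self):
        """Called when the outstanding request completes (or fails)."""
        self.in_progress = False

    def on_tick(self, now):
        """Retry the parked batch once its retry time has arrived."""
        if self._pending is None or now < self._pending[0]:
            return
        _, batch = self._pending
        if self.in_progress:
            # Still busy: push the retry back instead of sending concurrently.
            self._pending = (now + RETRY_DELAY_S, batch)
        else:
            self._pending = None
            self._send(batch)
```

Because only one batch can wait at a time, a slow response still costs data, which matches the "drop the occasional batch" behaviour reported for this test version.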

@jones139
Member Author

V4.1.1g was not successful - it stopped several times overnight (app stopped by itself).
I now have a V4.1.1k which has better logging as well as a re-try if there is a request in progress when accelerometer data is received, but my VenuSQ is crashing and re-booting itself after a few minutes of running. It is running ok on the emulator and on my Forerunner 245.
If I was the only one having a problem, I'd be saying that my VenuSQ has a memory fault and needs to be scrapped....

@jones139
Member Author

I'm still not really sure what is happening here, but it appears that we are seeing slow data transfer between the watch and the phone, which leads to multiple coincident data transfer requests. For some reason this is crashing the watch and locking the BLE connection between the watch and the phone.

V4.1.1 introduces a check whether a data transfer request is in progress or not when new data is ready to be sent. If it is, the new request is not sent, but is re-tried after half a second. I think this will avoid the BLE lock-up issue, but does result in some lost data. To address this we now have a Low Data mode where we do not send 3d accelerometer data, just the vector magnitude. The current OSD algorithms will still work in this mode, but it means the data sent to the data sharing system is not as useful as we do not have the 3d data.
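
As an illustration of what Low Data mode sends, the sketch below reduces each 3D accelerometer sample to its vector magnitude. It is written in Python, not the watch's Monkey C, and the function names and the sample values are hypothetical.

```python
import math


def vector_magnitude(x, y, z):
    """Length of the acceleration vector (same units as the inputs)."""
    return math.sqrt(x * x + y * y + z * z)


def to_low_data(samples):
    """Replace each (x, y, z) sample with a single magnitude value."""
    return [vector_magnitude(x, y, z) for (x, y, z) in samples]


# Three values per sample become one, so the payload is roughly a third of the size.
print(to_low_data([(0, 0, 1000), (3, 4, 12)]))  # → [1000.0, 13.0]
```

The direction of the movement is discarded, which is why the data is less useful to the data sharing system even though the magnitude-based algorithms keep working.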

V4.1.1n is currently available to try here: https://github.com/OpenSeizureDetector/Garmin_SD/tree/V1.4.x/build if anyone would like to try it.

@jones139
Member Author

Well for me V4.1.1n ran a lot longer than the previous versions when using 'Low Data' mode, but still crashed after several hours and needed a watch re-boot.

Another user has reported a similar issue with a VenuSQ, so it is sounding more and more like a Garmin firmware issue than a hardware fault as the watch hardware from Garmin has always been very reliable.

@jones139
Member Author

Software version numbers that are experiencing the problem:
Garmin Connect: 4.69.1, VenuSQ firmware: 4.90
Garmin Connect: 4.71, VenuSQ firmware: 4.90

@jones139
Member Author

@jones139 jones139 self-assigned this Sep 28, 2023
@44616E

44616E commented Sep 28, 2023

Can confirm, seeing the same issue.

@jones139
Member Author

> Can confirm, seeing the same issue.

Thanks. Do you know when it started (I'm trying to work out which watch firmware version introduced it)?

@jones139
Member Author

Another user has reported that the newer VenuSq 2 is working OK with Garmin Connect V4.71.

@44616E

44616E commented Sep 29, 2023

> > Can confirm, seeing the same issue.
>
> Thanks. Do you know when it started (I'm trying to work out which watch firmware version introduced it)?

During the last week. Not sure when the watch may have updated, possibly when we restarted the phone on Friday last week or Saturday.

@jones139
Member Author

***** DO NOT UPGRADE YOUR VENUSQ *******

  • A new firmware has been released by Garmin for the VenuSQ.
  • I have updated mine and the watch settings->System->SW_Versions page now shows the software as Version 5.00.
  • I ran GarminSD V4.1.1n and it ran for a few minutes, then the whole watch re-booted.
  • There is no CIQ_LOG.TXT file that you might expect for a system crash, and the GarminSD.TXT log file does not show anything unusual - just reporting successful data transfer to the watch.

So the new firmware does not fix the issue - it appears to make it worse.

@pmithrandir
Collaborator

Hi,

I have 2 suggestions to resolve that issue.

First, I believe the new 1.5 and 2.0 versions drastically lower the risk of having more than one active request at the same time.

Also, because we have a specific error code, we could call the cancelAllRequests() method (declared as Void) to clear the queue when that issue happens.

Would it be possible for someone to test the latest version and confirm whether the problem is still there, and which error code is associated with it?

Pierre

@jones139
Member Author

jones139 commented Mar 31, 2024 via email

@jones139
Member Author

jones139 commented Apr 1, 2024

Ran V2.0.1 on the original VenuSQ. After 5 hours the Android App went into FAULT because it was not receiving data. Garmin Connect showed that the watch was not connected. Watch face showing errors, but I can't remember exactly which ones - alas we have too much 'onTick' logging in the watch app logs so I have lost the watch log from the time - will need to build a version with less regular logging.

It is interesting that Garmin Connect showed the watch as not being connected. I am sure that when we first saw this issue Garmin Connect showed it as connected, but it would not sync. At least now it shows as not connected. The only way I could get it to reconnect was to reboot the watch.

I am on watch SW Version 5.0.0 and ANT/BLE Version 3.0.1 after an update a few days ago.

@pmithrandir
Collaborator

Hi,

I added a cancel request to the timeout mechanism in branch 26. Be careful if you build it: it is based on typechecking V2, so it will need the newer version of the SDK.

@jones139
Member Author

jones139 commented Apr 1, 2024

Excellent, thank you @pmithrandir. I'll build that version now and try it tonight.

@pmithrandir pmithrandir linked a pull request Apr 1, 2024 that will close this issue
@jones139
Member Author

jones139 commented Apr 2, 2024

Alas, even with v2.0.3 with the cancel requests feature, it stopped after about 5 hours again.

I suspect a memory leak. Maybe make a version that reports available memory as it runs?

I'll use the VenuSQ today without any apps running to check it stays connected.

@pmithrandir
Collaborator

pmithrandir commented Apr 2, 2024 via email

@jones139
Member Author

jones139 commented Apr 2, 2024 via email

@jones139
Member Author

jones139 commented Apr 2, 2024 via email

@pmithrandir
Collaborator

I think another test would be to disable getStatusSd.
I would not be surprised if removing half the requests helped...

My expectation would be that either the makeWebRequest requests are not closed quickly enough, or they are closed only when the callback is finalized.

In both cases, we should be able to mitigate the issue.

Another interesting piece of information would be the time it takes to receive the data.

@jones139
Member Author

jones139 commented Apr 2, 2024

That's a good idea - it might double the time between failures. I'll run the no-communications test overnight to be sure, then will try that tomorrow.

Thanks!

@jones139
Member Author

jones139 commented Apr 3, 2024

The version with no network communications at all ran for 18.5 hours without disconnecting from Garmin Connect, so I am pretty sure that it is the makeWebRequest calls that are causing the problems. I will do a test today with a V2.0.4Y which re-enables the sending of accelerometer and settings data, but does not query the phone for the seizure detector status.
If the issue is the number of calls to makeWebRequest, this will fail after about 10 hours (I'll have to calculate this properly later - it failed after 5-6 hours previously and we are going to do just over half as many calls as before).
Will let you know what happens (started 0800)....
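
The estimate above is a simple scaling: if the failure were triggered by a fixed number of makeWebRequest calls, the time to failure would be inversely proportional to the call rate. A minimal sketch of that arithmetic in Python follows; the function name is hypothetical, and the 0.55 fraction is only an assumed reading of "just over half as many calls".

```python
def scaled_time_to_fail(previous_hours, call_fraction):
    """Expected run time if failure depends only on the number of calls made.

    call_fraction is the new call rate divided by the old call rate.
    """
    if not 0 < call_fraction <= 1:
        raise ValueError("call_fraction must be in (0, 1]")
    return previous_hours / call_fraction


# Previous runs failed after 5-6 hours; assume 55% of the original call rate.
for hours in (5.0, 6.0):
    print(round(scaled_time_to_fail(hours, 0.55), 1))  # → 9.1, then 10.9
```

A result far from this range would suggest that the failure does not depend only on the number of calls.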

@jones139
Member Author

jones139 commented Apr 3, 2024

.....well, that was surprisingly unsuccessful. The watch disconnected after about 3 hours 15 min. I am trying the test again now, starting at 1600.

@jones139 jones139 closed this as completed Apr 3, 2024
@jones139 jones139 reopened this Apr 3, 2024
@jones139
Member Author

jones139 commented Apr 4, 2024

2nd test lasted 7:25, which is more like I was expecting. Had better do a third test...started 0825.

@pmithrandir
Collaborator

pmithrandir commented Apr 4, 2024 via email

@jones139
Member Author

jones139 commented Apr 4, 2024

I haven't measured the latency of the phone web server. I think it is quite slow, but comfortably less than 5 seconds.

The OpenSeizureDatabase has a testRunner algorithm that uses a real device to process the data. We could modify that to record the time it takes?

The fact that I have had two very different test results on the VenuSQ makes me think it is not just a simple count of makeWebRequest calls. But I don't know what is causing it yet!

@jones139
Member Author

jones139 commented Apr 4, 2024

Ok, test results for V2.0.4Y (all comms running except requesting seizure detector status from phone), running on the original VenuSQ with SW Version 5.0.0 and ANT/BLE Version 3.0.1:

(I don't know why this doesn't appear as a nice table!)

| No. | Time to Fail (hh:mm) | Notes |
| --- | ---- | ---- |
| 1 | 03:15 | Surprisingly quick to fail - I wonder if it is to do with what I had been doing with the watch before I started the test? |
| 2 | 07:25 | |
| 3 | 07:25 | Well, consistent with test 2 - and this was run straight after a re-boot. |

So if we add a user selectable option to switch off the seizure detector status feedback, we can get a fair bit extra run time from the VenuSQ without having to re-boot the watch.
@lundstrj, how does 7h compare to your experience with the VenuSQ?

@lundstrj
Contributor

lundstrj commented Apr 4, 2024

7h is longer than what I was getting. My setup could occasionally perform for almost a night but would mostly fail much sooner. I'd estimate that my setup would fail as early as 15-30 minutes even though it usually would run for at least a few hours.

Have we had any word from our friends at Garmin regarding if/when they might make the API we are using more stable again?

(Since you got the BangleJS widget working and the VenuSQ2 got almost instantly ruined by Garmin, I have completely abandoned even trying a Garmin device. I'm quite cross with them about the whole thing)

@jones139
Member Author

jones139 commented Apr 4, 2024 via email

@pmithrandir
Collaborator

> Ok, test results for V2.0.4Y (all comms running except requesting seizure detector status from phone), running on the original VenuSQ with SW Version 5.0.0 and ANT/BLE Version 3.0.1:
>
> (I don't know why this doesn't appear as a nice table!)
>
> | No. | Time to Fail (hh:mm) | Notes |
> | --- | ---- | ---- |
> | 1 | 03:15 | Surprisingly quick to fail - I wonder if it is to do with what I had been doing with the watch before I started the test? |
> | 2 | 07:25 | |
> | 3 | 07:25 | Well, consistent with test 2 - and this was run straight after a re-boot. |
>
> So if we add a user selectable option to switch off the seizure detector status feedback, we can get a fair bit extra run time from the VenuSQ without having to re-boot the watch. @lundstrj, how does 7h compare to your experience with the VenuSQ?

From my perspective, I would say that the number of hours is irrelevant.

It looks like one factor (request time, request closure, timeout not managed, etc.) is generating a situation where the bug occurs.

It looks like we are talking about probabilities here: the longer it runs, the more likely it is to fail.

I think our best option would be to add logs to identify the last requests made, their timestamps, maybe the request duration, etc. Anything that can give us context and help us figure out what is happening here.
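
One way to picture the logging suggested above is a small bounded log of the most recent requests, recording when each started, how long it took and what it returned. The sketch below is Python, not Monkey C, and every name in it (`RequestRecord`, `RequestLog`, the `kind` labels) is hypothetical; it makes no claim about how the watch app stores its logs.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestRecord:
    kind: str                       # e.g. "accel", "settings", "status"
    started_at: float               # seconds, from any monotonic clock
    finished_at: Optional[float] = None
    response_code: Optional[int] = None

    @property
    def duration(self):
        if self.finished_at is None:
            return None
        return self.finished_at - self.started_at


class RequestLog:
    """Keeps only the last `capacity` requests, so it cannot grow without bound."""

    def __init__(self, capacity=20):
        self._records = deque(maxlen=capacity)

    def start(self, kind, now):
        record = RequestRecord(kind, now)
        self._records.append(record)
        return record

    def finish(self, record, now, response_code):
        record.finished_at = now
        record.response_code = response_code

    def unfinished(self):
        """Requests that never called back: the interesting ones after a hang."""
        return [r for r in self._records if r.finished_at is None]

    def dump(self):
        lines = []
        for r in self._records:
            if r.duration is None:
                state = "pending"
            else:
                state = f"{r.duration:.2f}s code={r.response_code}"
            lines.append(f"{r.started_at:.2f} {r.kind}: {state}")
        return lines
```

Dumping this log only when an error occurs would also keep the log file from filling up with routine entries.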

@jones139
Member Author

jones139 commented Apr 4, 2024

@pmithrandir, I think you are right, but I was looking for a quick and easy way of making an improvement, because I'd rather concentrate on PineTime reliability :).

@pmithrandir
Collaborator

Hi,

I think I have improved the mechanism a bit.

It was possible to get into the following sequence:

  • settings or status request still ongoing from the previous loop (request 1)
  • accel data request ongoing (request 2)
  • timeout reached at 5 sec (because the condition was excluding 4 sec)
  • releasing the lock
  • launching the new query (request 3)
  • cancelling the queries too late!

The new code checks whether we have reached the timeout, starts by cancelling the requests, writes the error, and only then releases the lock.

        // Timed out: cancel outstanding requests first, log the error, then release the lock.
        if ((waitingTime as Time.Duration).compare(TIMEOUT) >= 0) {
          Comm.cancelAllRequests();
          var tagStr = "SDComms.onTick()";
          writeLog(tagStr, "Sending accelData failed");
          mAccelHandler.mStatusStr = Ui.loadResource(Rez.Strings.Error_abbrev).toString() + ": " + Ui.loadResource(Rez.Strings.Error_request_in_progress).toString();
          mDataRequestInProgress = false;  // release the lock last
        }

If someone can try branch 26 on a watch, that would be great. I only have a VA4, which does not show the issue.
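
Why the order of "cancel" and "release the lock" matters can be shown with a toy model. The sketch below is Python, not the Monkey C above, and all names are hypothetical. It also makes a simplifying assumption that is not taken from the real code: a request waiting behind the lock fires the instant the lock is released.

```python
class FakeComms:
    """Toy stand-in for the comms layer: tracks in-flight requests and a lock."""

    def __init__(self, in_flight, waiting=None):
        self.in_flight = list(in_flight)
        self.waiting = waiting          # one request queued behind the lock
        self.locked = True
        self.peak = len(self.in_flight)  # most requests in flight at once
        self.events = []

    def cancel_all(self):
        self.events.append("cancel")
        self.in_flight.clear()

    def release_lock(self):
        self.events.append("release")
        self.locked = False
        if self.waiting is not None:
            # Assumption: the waiting request starts as soon as the lock is free.
            self.in_flight.append(self.waiting)
            self.waiting = None
            self.locked = True
            self.peak = max(self.peak, len(self.in_flight))


def on_timeout_release_first(comms):
    """Old order: the new request starts before the stale ones are cancelled."""
    comms.release_lock()
    comms.cancel_all()


def on_timeout_cancel_first(comms):
    """New order: cancel, log, and only then release the lock."""
    comms.cancel_all()
    comms.events.append("log")
    comms.release_lock()


old = FakeComms(["status", "accel"], waiting="accel-next")
on_timeout_release_first(old)
new = FakeComms(["status", "accel"], waiting="accel-next")
on_timeout_cancel_first(new)
print(old.peak, old.in_flight)  # → 3 []
print(new.peak, new.in_flight)  # → 2 ['accel-next']
```

In this model the old order briefly has three requests in flight and then cancels the fresh one along with the stale ones, while the new order never exceeds two and leaves the fresh request running.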

@jones139
Member Author

jones139 commented Apr 5, 2024 via email

@jones139
Member Author

jones139 commented Apr 6, 2024

Unfortunately it failed after 4.5 hours...

How about combining this change with an option to switch off the seizure detector status requests so we can squeeze out as much run time (on average) as we can, and call that the best we can do until Garmin fixes the issue?

The only other idea I have is to switch to the messaging API, which is a significant change on the Android side... and we do not know that it will help.

@pmithrandir
Collaborator

Hi,

Could you share the logs, please? Just in case something comes to mind.

Pierre

@jones139
Member Author

jones139 commented Apr 6, 2024

Log files below - It looks like we had one failure, from which it recovered at 02:25:03, then it failed completely at 02:28:43.
I am not sure what the 'onTick' messages are telling us though - I think that is a regular check rather than a direct result of a makeWebRequest call?

Garmin_SD.TXT
Garmin_SD_BAK.txt

@pmithrandir
Collaborator

Did you try compiling with a higher target SDK version?
Just in case it is only the backward compatibility that is broken?

@jones139
Member Author

jones139 commented Apr 6, 2024

I compiled with SDK 6.4.2, which is from January, so more recent than the fault developed. I have not tried the version released on 01 April (V7.1.0) though.

@pmithrandir
Collaborator

I was talking about the target SDK in manifest.xml.

If you set it to 3.3, for example, which should be the exact version on the VenuSQ, how does that work?

@jones139
Member Author

jones139 commented Apr 6, 2024 via email

@jones139
Member Author

jones139 commented Apr 6, 2024

I just found a forum post that pointed to an old (beta test) version of the VenuSQ firmware (https://www8.garmin.com/support/download_details.jsp?id=16030)
If the idea of setting the minimum SDK to the actual watch SDK version does not work, I might try that on my watch. It is a pity

@jones139
Member Author

jones139 commented Apr 7, 2024

It only lasted about 2.5 hours last night, even when compiled for the specific SDK version.
So I will try the old firmware above tonight.

@jones139
Member Author

jones139 commented Apr 7, 2024

Attempt at Downgrading VenuSQ Firmware

Before doing anything, the software versions were:

  • SW Version 5.00
  • DFU edb833
  • GPS Version 6.01
  • ANT/BLE Version 3.01
  • TSC Version 2.50
  • Sensor Version: 25.60
  • WHR Version: 5.04.00
  • Connect IQ: System 5
  • BMX 5.4.0
  • Secondary BMX 12.0.2
  • API Level 3.3.6

Downloaded the firmware based on this forum post that pointed to an old (beta test) version of the VenuSQ firmware (https://www8.garmin.com/support/download_details.jsp?id=16030).

The firmware file needed 6.4MB of spare space, but there was only 3.6 MB available. I searched through the filesystem (using du -s) to find the largest files. The only one with enough used space to be worth deleting was GARMIN/EXTDATA, so I removed one file, D844800.FNT, which was over 3MB. This gave enough space to copy GUPDATE-410.GCD into the GARMIN folder. The copy took a long time and got stuck at 1MB progress in the Linux file manager app that I used for it. Both du -s and ls -l appear to show the copy as complete though. The file transfer dialog disappeared after 5 minutes, suggesting it had completed.
Renamed GUPDATE-4.10.GCD to GUPDATE.GCD from the command line.
Ejected the GARMIN device from the file manager.
The watch prompted saying software updates are available and I told it to install them....

Watch did the install and said 'Restarting....' for a while, then watch rebooted and showed a cog with a progress bar to show it is updating.
Paired with phone using Garmin Connect (the latest version from play store, I have not downgraded that)

After this, the software versions shown are:

  • SW Version 4.10 (was 5.00)
  • DFU: Not Listed (was edb833)
  • GPS Version 6.01 (unchanged)
  • ANT/BLE Version 3.01 (unchanged)
  • TSC Version 2.50 (unchanged)
  • Sensor Version: 25.60 (unchanged)
  • WHR Version: 5.04.00 (unchanged)
  • Connect IQ: System 5 (this is a surprise - maybe this is related to the version of Garmin Connect on the phone?)
  • BMX 5.4.0 (unchanged)
  • Secondary BMX 12.0.2 (9.0.0)
  • API Level 3.3.3 (was 3.3.6)

GarminSD was still installed, so selected that as a favourite app.

Started GarminSD and it crashed with an IQ! error. This will be because we had compiled that version for a later SDK version, although I don't understand the log file error CIQ_LOG_YML.txt.
The error in the backup CIQ log file CIQ_LOG_.BAK.txt says something about not finding the 'Hydration' symbol, which is consistent with us running on old firmware with an older SDK, even though we are not using the hydration function....

Changed the minimum SDK version back to 2.4.0 in manifest.xml and re-compiled GarminSD.
After that V2.0.5 started ok and connects to the OSD android app. Started running at 20:11

I received an email notification on the watch while OSD was running, so went into the Garmin Connect settings for the watch to set it to only notify calls and texts, switch off move prompts, and set 24 hours. Garmin Connect said the change would take effect next time the watch syncs.

But when I went to the devices list, the watch was not connected, and the watch was completely dead - no display and no response to button presses - long press of top or bottom button did not have any effect. Did a very long press of both buttons (about 30 secs, which should do a reset), and after that a long (couple of seconds) press of the top button booted the watch.

Restarted GarminSD at 20:33

Saw an ERR: COMMS warning on the watch at 20:34, but noticed that Garmin Connect on the phone says it is sending a software update - will have to try to avoid this being installed!

The system worked for most of the night, but I did get a FAULT warning after about 8 hours. The watch app had shut down with no indications of a crash. Re-starting the watch app got it working again, so I might have accidentally pushed the buttons to shut it down when I was asleep.
It has then run for another 16 hours without problem.

So I think that downgrading the firmware can resolve the issue.

@pmithrandir
Collaborator

Good luck to you!!

My son has gone on vacation with the watch, so I will not really be able to help on this for a couple of weeks.

@jones139
Member Author

jones139 commented Apr 7, 2024

> Good luck to you!!
>
> My son has gone on vacation with the watch, so I will not really be able to help on this for a couple of weeks.

Thank you for all the effort you have put in to this over the last few weeks - the code is in a much better state now!
I have the advantage that the VenuSQ is not our 'production' device so I can play with it without worrying about breaking it!

@seaside1

seaside1 commented Apr 8, 2024

I see a similar issue on the Vivoactive3. It sometimes just stops updating. BLE gets stale.
Just restarting the app does not work, but if I toggle Bluetooth off and on again on the phone it starts working again (without an app restart). I have noticed the same on

  • Nexus 6P
  • Samsung S9
  • Samsung S23 Ultra
  • OnePlus 7 Pro

It happens approximately every 5-8 hours, but not always; sometimes it works for 16+ hours.

@jones139
Member Author

jones139 commented Apr 8, 2024

@seaside1 that is very odd - it sounds like a different issue to this one, because for this one the only solution has been to re-boot the watch - changing things on the phone does not cure the VenuSQ....and I don't think I have seen it on my Vivoactive 3 when we have been testing out battery life. When I get chance I'll set the VenuSQ running for as long as possible to see if I can reproduce this issue.

@jones139
Member Author

jones139 commented Apr 8, 2024

It looks like downgrading the VenuSQ firmware does resolve this issue - see long comment above.

@pmithrandir
Collaborator

That's good news, sort of...

It would be nice to identify exactly which firmware introduced the issue, to give us a better chance of getting Garmin to work on it.

But honestly, I have big doubts that they will do anything at all (old model, and it does not seem to be a priority for them).

But maybe if we get the exact version we could get the list of changes made, and see if it gives us a hint about a local workaround.

@jones139
Member Author

jones139 commented Apr 9, 2024

I have published some installation instructions for users here: https://openseizuredetector.github.io/pages-user/garmin_venusq_downgrade.html.

I also added a comment on the Garmin bug tracker pointing them to which version works - if they show any sign of helping I might do some more to narrow it down for them, but I have had no response so far!

@brja

brja commented Oct 25, 2024

This pre-release helped with recent connection issues we had on our Venu Sq 2. Maybe it helps here too, and it is worth a try: https://www.openseizuredetector.org.uk/?p=2202

@jones139
Member Author

jones139 commented Oct 25, 2024 via email


6 participants