Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simultanuous use of BLE for heartrate and app connection crashes bluetooth stack #69

Open
JaapvanEkris opened this issue Jan 7, 2025 · 84 comments · Fixed by #73 or #56
Open

Simultanuous use of BLE for heartrate and app connection crashes bluetooth stack #69

JaapvanEkris opened this issue Jan 7, 2025 · 84 comments · Fixed by #73 or #56
Labels
bug Something isn't working

Comments

@JaapvanEkris
Copy link
Owner

JaapvanEkris commented Jan 7, 2025

With the newer verions of Raspberry Pi OS seem to introduce a problem: the Bluetooth stack seems to be slightly modified, and this blocks simultanuous use of Bluetooth for recieving Heartrate and sending metrics to recieving apps. Without HRM via Bluetooth it works well. But as soon as the HRM via Bluetooth is enabled, it crashes.

Several have observed this (see also laberning#174) and upgrading the bluetooth npm packages doesn't seem to help. It is an issue on both Bullseye and Bookworm, although I had a working version on Bullseye at one moment (but can't get it to reproduce after an install).

I upgraded from @ababdonware to @soprocent npm libs, and looked at https://github.com/noble/noble#bleno-compatibility but I can't get it to work. Lars has put a hack in the HeartRateManager.js to get around it, but it seems to be failing.
I have stored my progress in the https://github.com/JaapvanEkris/openrowingmonitor/tree/V1Beta-Architecture_Revision-EXTREMELY_EXPERIMENTAL branch, but after banging my head against walls for a day, I think I need someone else to have a look to help fix this. Anyone (@Abasz)?

@JaapvanEkris JaapvanEkris changed the title Simultanuous use of BLE for heartrate and app connection Simultanuous use of BLE for heartrate and app connection crashes bluetooth stack Jan 7, 2025
@Abasz
Copy link
Collaborator

Abasz commented Jan 7, 2025

This is difficult. Lars's hack was that he spun up another process with anther BLE stack and that worked for the receiving end.

Can you let me know the exact version of bookworm that does not work? I will try to reproduce the issue and find a fix.

@JaapvanEkris
Copy link
Owner Author

This is difficult. Lars's hack was that he spun up another process with anther BLE stack and that worked for the receiving end.

Yeah. I added the environment variable in the fork command that tells Noble that Bleno is also active, and now it works well on Bullseye 64 Bit Lite (latest release).

Can you let me know the exact version of bookworm that does not work? I will try to reproduce the issue and find a fix.

I did a clean install of Bookworm 64 Bit Lite on a Raspberry Pi 4, 8Gb (same machine I used for the Bullseye test), but EXR won't even connect decently. The process seems to be locked in some waiting loop: the bluetooth buttons are not responsive as soon as BLE HRM is active.

@JaapvanEkris
Copy link
Owner Author

Can you let me know the exact version of bookworm that does not work? I will try to reproduce the issue and find a fix.

I did an endurance test, and Bullseye seems to unregister the rowingdata service after 45 minutes. Tomorrow I'll try a fresh install on Bullseye and retrofit the relevant changes to the @abandonware version and see what happens.

@Abasz
Copy link
Collaborator

Abasz commented Jan 8, 2025

So I should use latest bullseye and the extremely_experimental branch right?

@JaapvanEkris
Copy link
Owner Author

So I should use latest bullseye and the extremely_experimental branch right?

That should result in a stack that works for about 45 minutes. When you install the extremely_experimental branch on Bookwork, it crashes immediatly.

@Abasz
Copy link
Collaborator

Abasz commented Jan 8, 2025

Do you use 32 or 64 bit version?

@JaapvanEkris
Copy link
Owner Author

JaapvanEkris commented Jan 8, 2025

Do you use 32 or 64 bit version?

64 bit.

An update from my side. I installed main on Bullseye, and it worked flawlessly for over 100 minutes with BLE HRM active, but no HRM connected. When I connect a HRM, it disconnects within 10 minutes with the following error:

Jan  8 14:14:54 roeimachine npm[3711]: RowerDataCharacteristic - central unsubscribed
Jan  8 14:14:54 roeimachine npm[3711]: ble central disconnected: f8:e4:e3:f9:89:12
Jan  8 14:14:54 roeimachine npm[3741]: heart rate peripheral disconnected, searching new one

Next session for over half an hour, I get the follwong:

Jan  8 15:40:22 roeimachine npm[4426]: RowerDataCharacteristic - central unsubscribed
Jan  8 15:40:22 roeimachine npm[4426]: ble central disconnected: f8:e4:e3:f9:89:12
Jan  8 15:40:22 roeimachine npm[4457]: heart rate peripheral disconnected, searching new one

(still on a freshly installed main branch on Bullseye)

@JaapvanEkris
Copy link
Owner Author

Another observation, still on Bullseye and main: when I switch to PM5 mode, it is shown in the GUI, but EXR doesn't see it even with BLE HRM off. Weird.

@Abasz
Copy link
Collaborator

Abasz commented Jan 8, 2025

I have also frequent connection drop on the HR side. It is irrespective whether any other device is connected or not... This is very difficult to debug... I am now not even sure if this is really with the bleno or noble actually... This seems more like the hci layer not being able to separate the two.

The HR is connected starts sending data and suddenly gets disconnected:

> ACL Data RX: Handle 64 flags 0x02 dlen 9                                             #80 [hci0] 28.360575
      ATT: Handle Value Notification (0x1b) len 4
        Handle: 0x0030
          Data: 063d
> HCI Event: Disconnect Complete (0x05) plen 4                                         #81 [hci0] 30.341227
        Status: Success (0x00)
        Handle: 64
        Reason: Remote User Terminated Connection (0x13)
@ MGMT Event: Device Disconnected (0x000c) plen 8                                 {0x0002} [hci0] 30.341331
        LE Address: D8:6D:3F:82:E2:09 (Static)
        Reason: Connection terminated by remote host (0x03)
@ MGMT Event: Device Disconnected (0x000c) plen 8                                 {0x0001} [hci0] 30.341331
        LE Address: D8:6D:3F:82:E2:09 (Static)
        Reason: Connection terminated by remote host (0x03)

Bluetooth log says that the remote host terminted the connections but that does not seem right. In a few seconds HR reconnects and the this goes on and on.

@Abasz
Copy link
Collaborator

Abasz commented Jan 8, 2025

Might related to this: bluez/bluez#914 Which in case it is true, it will be difficult to deal with the upstream issue...

But also can you please try this: https://github.com/noble/noble?tab=readme-ov-file#bleno-compatibility for me it does not work either but may be you have a different setup :)

@Abasz
Copy link
Collaborator

Abasz commented Jan 8, 2025

Ok, I think I have some improvement.

I tried to use the HR broadcast of my Garmin but that does not work for some reason as it disconnects contionusly in every scenario (even if its standalone). I switch to my polar HRM

Now I am able to run things. I created a branch on my fork "ble-fix" but basically this is what has been done:

package.json:

removed the hci socket package and updated both bleno and noble.

In hte HR peripheral I switch to the default noble import and remove Lars's hack.

And most importantly I run the npm start with NOBLE_MULTI_ROLE=1 environment variable as per the docs of noble.

@JaapvanEkris
Copy link
Owner Author

JaapvanEkris commented Jan 8, 2025

Ok, I think I have some improvement.

I tried to use the HR broadcast of my Garmin but that does not work for some reason as it disconnects contionusly in every scenario (even if its standalone). I switch to my polar HRM

Now I am able to run things. I created a branch on my fork "ble-fix" but basically this is what has been done:

package.json:

removed the hci socket package and updated both bleno and noble.

In hte HR peripheral I switch to the default noble import and remove Lars's hack.

And most importantly I run the npm start with NOBLE_MULTI_ROLE=1 environment variable as per the docs of noble.

Oke, small trick here (see https://github.com/JaapvanEkris/openrowingmonitor/blob/V1Beta-Architecture_Revision-EXTREMELY_EXPERIMENTAL/app/peripherals/ble/HrmPeripheral.js) it spawns off the HRM peripheral in a seperate proces, and I added it as an environment variable there. I checked via logging process.env.NOBLE_MULTI_ROLE in the child process, and that way it is recieved. I don't know if environment variables are inherited by child processes.

@Abasz
Copy link
Collaborator

Abasz commented Jan 8, 2025

Creating a separate process is fine I suppose. I am not sure how much this brings to the table as noble and bleno are already async anyway.

on the new library you are using, are you sure @stoprocent/node-bluetooth-hci-socket needed as an explicit dependency?

I am asking as I was looking at the docs and the code currently does not seem to use the UART version hence it is unnecessary. This is the same on my fork where I removed it so bleno and noble can be updated to their latest version for abandonware.

But it makes sense to update a the bleno and noble to a something that is being actually more actively maintained and has more downloads. But still I would try without node-bluetooth-hci-socket package.

EDIT:

Also I would remove Lars's workaround as I think that is no longer necessary.

@JaapvanEkris
Copy link
Owner Author

Creating a separate process is fine I suppose. I am not sure how much this brings to the table as noble and bleno are already async anyway.

It PROBABLY (not sure here) is to create as much distance between the bleno and noble instances. But I don't think it indeed brings many advantages, but it adds complexity.

on the new library you are using, are you sure @stoprocent/node-bluetooth-hci-socket needed as an explicit dependency?

Not at all. My approach was to just move forward with newer packages and hope that the issues we had a year ago would be resolved by updates, as they originated upstream. If it works without, lets remove it. As I see it, @stoprocent forked the repo and started fixing stuff. It is 1-on-1 compatible with @abandonware's packages, so it was an easy replacement.

As an aside, if you look carefully at the package list, you can see that I'm clearing out several packages as well. Snowpack was also included, which caused over 20 security issues, but is only needed for specific development environments. eslint is frustrating during checking but helpful, but also needs a major cleanup. Babel is in there for some obscure reason I haven't figured out yet. So I'll do some cleaning in the next year.

I am asking as I was looking at the docs and the code currently does not seem to use the UART version hence it is unnecessary. This is the same on my fork where I removed it so bleno and noble can be updated to their latest version for abandonware.

Yeah. I'm not sure how the hack actually worked, or why it was implemented in the first place. The NOBLE_MULTI_ROLE environment variable was introduced 7 years ago and is part of all documentation, even the original one (@abandonware is already a fork of an abandoned project...). The code for the hrm is about 4 years old. So why Lars impelemented it this way is still a mystery for me.

But it makes sense to update a the bleno and noble to a something that is being actually more actively maintained and has more downloads. But still I would try without node-bluetooth-hci-socket package.

Completely agree. But I rather do it after fixing the bug and have some decent due dilligence, as the @stoprocent doesn't fix the bug anyway. So your approach to hook onto EXPERIMENTAL is a good one.

Also I would remove Lars's workaround as I think that is no longer necessary.

Completely agree. It does weird stuff, and you get exposed to all kind of undocumented changes. In this case, the HeartRateManager calls 'const noble = new Noble(new NobleBindings())' which must get a {} as argument with newer versions of noble. But it is completely undocumented as nobody is expected to use internal parameters like this.

@Abasz
Copy link
Collaborator

Abasz commented Jan 8, 2025

Completely agree. It does weird stuff, and you get exposed to all kind of undocumented changes. In this case, the HeartRateManager calls 'const noble = new Noble(new NobleBindings())' which must get a {} as argument with newer versions of noble. But it is completely undocumented as nobody is expected to use internal parameters like this.

Actually it is documented under "Advanced usage". Just variable names are different. New version requires a params object. That is not really relevant for ORM. Hence I am confident this is removable.

Actually I should have done the cleaning of these things year ago when did the rework of the peripherals, but at the time I did not look deep into the inners of ble and the code worked. This is the issue with legacy code noone dares touching it :)

Now I went through high level the API of bleno and noble and I am confident that they should be able to work simultaneously without real issues.

The only part I don't get is why my Garmin watch get disconnected very frequently while a normal HRM strap works. That is a mystery to me....

Note to self: Could it be that it does not have a battery characteristic and the current OEM code does not tolerate this? But there are no errors nothing....

@Abasz
Copy link
Collaborator

Abasz commented Jan 8, 2025

Also, the latest abandonware supports async await that simplifies the noble code. I have not converted the code because first it should work, but this is also something that would make the code simpler. Once this bug is cleared I am happy to do the refactoring.

@JaapvanEkris
Copy link
Owner Author

JaapvanEkris commented Jan 9, 2025

Actually I should have done the cleaning of these things year ago when did the rework of the peripherals, but at the time I did not look deep into the inners of ble and the code worked. This is the issue with legacy code noone dares touching it :)

You can't fix everything in one go. And indeed, legacy code is scary business. The issue is, @laberning 's fix has been implemented while the NOBLE_MULTI_ROLE environment variable already existed for years. The issue then becomes "Did he copy an outdated workaround thinking it was the only way to do it, or did he deliberatly do this because he knew the NOBLE_MULTI_ROLE wouldn't work for ORM?"

Now I went through high level the API of bleno and noble and I am confident that they should be able to work simultaneously without real issues.

True, but I still get

Jan 09 10:47:04 roeimachine npm[18784]: RowerDataCharacteristic - central unsubscribed
Jan 09 10:47:04 roeimachine npm[18784]: ble central disconnected: 77:13:76:6c:6c:03
Jan 09 10:47:04 roeimachine npm[18816]: heart rate peripheral disconnected, searching new one

I agree with your feeling that something deeper in the OS may be at play here: seeing both HRM and FTMS disconnect at the same time suggest that the underlying stack has a hickup or quick reset. But the logs show nothing of that kind. The thing is that the disconnects sometimes come at minutes into the row, and sometimes it takes ages. (over an hour...) Could it be some kind of collission thing below the surface that isn't handled well (i.e. peripheral both wanting to send and recieve at the same time?).

I added

  // The environmental variable below is needed to keep BleNo and NoBle apart, as they both want access to the same hardware but for different roles
  // See https://github.com/stoprocent/noble?tab=readme-ov-file#bleno-compatibility-linux-specific
  const env = { env: { NOBLE_MULTI_ROLE: 1 } }

  const bleHrmProcess = child_process.fork('./app/peripherals/ble/hrm/HrmService.js', env)

To app/peripherals/ble/HrmPeripheral.js but that didn't help either.

An option is to force this by adding process.env[' NOBLE_MULTI_ROLE'] = 1 before every invoke. But it doesn't do justice to setting an environment variable.

I also tried switching to stoprocent 's version, and it delivers some error messages. So it definitely is different. Not sure this is an improvement. It might be that abandonware's version just fails silently and stoprocent is vocal about it.

@JaapvanEkris
Copy link
Owner Author

JaapvanEkris commented Jan 9, 2025

Also, the latest abandonware supports async await that simplifies the noble code. I have not converted the code because first it should work, but this is also something that would make the code simpler. Once this bug is cleared I am happy to do the refactoring.

Could our dance around the lack of async in earlier BleNo/NoBle code introduce this bug?

@Abasz
Copy link
Collaborator

Abasz commented Jan 9, 2025

Could our dance around the lack of async in earlier BleNo/NoBle code introduce this?

based on the info in the docs, I dont think it should. But it is possible that some order of method invokations via the callbacks are cuasing this. but I dont see this too realistic.

For logging I use btmon, as it shows the HCI calls on the disconnect.

Anyway, I will give it another go, today evening and refactor to async, clean up the code etc. I will stay with no child process fork, I start the process like this: sudo NOBLE_MULTI_ROLE=1 npm start. I think this is important so the whole process (including bleno and noble) inherits this env var. Though bleno does not use this in code as far as I can see, only noble does: https://github.com/abandonware/noble/blob/49a889e4ffbf33aabbdbcc54a2d487b2aa784f3e/lib/hci-socket/gatt.js#L96

This is how I run anyways when I was testing, but to be fair, I have not run it for hours, only for 5-10 minutes as testing and not a real excersize.

The issue is, @laberning 's fix has been implemented while the NOBLE_MULTI_ROLE environment variable already existed for years.

The workaround especially the part where the new Noble is created essentially just creates the default of everything actually, with the small exception that those additional events are added. I believe it does the same thing as the env flag: i.e. it removes the warnings that are raised when there is a message meant for the peripheral rather than the central.

If you look at the code https://github.com/abandonware/noble/blob/49a889e4ffbf33aabbdbcc54a2d487b2aa784f3e/lib/hci-socket/bindings.js#L359 it shows that the patch is almost exactly the same apart from the else block that raises a warning. basically Lars switched this off. So I believe the patch is actually a no-op for our case.

What can be done (and I will try this out) is adding the DEBUG=* environmental variable at start, that should enable further messages on the noble and bleno level.

@JaapvanEkris
Copy link
Owner Author

This is how I run anyways when I was testing, but to be fair, I have not run it for hours, only for 5-10 minutes as testing and not a real excersize.

Yeah, I'll run those. Set ORM to simulation (the C2 2K sample can be set in a loop :), which is set to a Full Marathon), and let my HR belt report and connect to EXR. Just listening to the paddling stop :)

What can be done (and I will try this out) is adding the DEBUG=* environmental variable at start, that should enable further messages on the noble and bleno level.

I'll do the crazy thing then. I'm going to install laberning's main (over 3 years old) on Bullseye, see if the issue is present there as well. Lars has pinned bleno and noble to very specific versions. If this fails as well, it excludes any unmanaged update of the core packages and dependencies.

@Abasz
Copy link
Collaborator

Abasz commented Jan 9, 2025

'll do the crazy thing then. I'm going to install laberning's main (over 3 years old) on Bullseye, see if the issue is present there as well. Lars has pinned bleno and noble to very specific versions. If this fails as well, it excludes any unmanaged update of the core packages and dependencies.

Yes that is a good idea!

@antaron1973
Copy link

antaron1973 commented Jan 9, 2025

'll do the crazy thing then. I'm going to install laberning's main (over 3 years old) on Bullseye, see if the issue is present there as well. Lars has pinned bleno and noble to very specific versions. If this fails as well, it excludes any unmanaged update of the core packages and dependencies.

Yes that is a good idea!

Following your conversation here, as I have in general the BLE connection problem with a PI Zero 2W. I currently work with a Raspberry Pi OS Lite 32 Bit Bullseye and the laberning version. This version is the only combination that works for me. I‘m using EXR and a Wahoo Tickr Fit HRM.

@stoprocent
Copy link

Can I help? I'm not aware of the specifics of the issue but if you can help me reproduce something I can look into this.

@stoprocent
Copy link

Ok I have looked a bot through the code and the issue and I understand you would like to run both Noble and Bleno concurrently. That should not be a problem I will have a look tonight. I think it's only about sharing the HCI socket instance and queueing requests properly from both libs. I'm already doing a lot of other changes to Bleno now to support multi-connection and other features and this should not be crazy complex. I have rewritten entire native C++ binding for node HCI so I know where there could be an issue here.

@JaapvanEkris
Copy link
Owner Author

JaapvanEkris commented Jan 9, 2025

Following your conversation here, as I have in general the BLE connection problem with a PI Zero 2W. I currently work with a Raspberry Pi OS Lite 32 Bit Bullseye and the laberning version. This version is the only combination that works for me. I‘m using EXR and a Wahoo Tickr Fit HRM.

Interesting, I have just tried this version on a Raspberry Pi 4 8Gb (using Lars install script, and his package list, installing @abandonware/bleno at 0.5.1-4, @abandonware/bluetooth-hci-socket at 0.5.3-7 and @abandonware/noble at 1.9.2-15 on Nodev16):

  • 64 Bit: same issue: as long as BLE HRM is deactivated via the config, it all works. If BLE HRM is active in the config it will acquire heartrate data and display it locally, but ORM will not even advertise its presence via BLE. In /var/logmessages I see reporting that FTMS is actived, but from there on it lacks any further error reporting from the bluetooth stack. So I guess it fails silently or is stuck in some deadlock.
  • 32 bit: same as 64 bit. As soon as BLE HRM is active, the FTMS service isn't advertised. As soon as I disable BLE HRM in the config, FTMS is advertised again and works flawlessly.

@JaapvanEkris
Copy link
Owner Author

I will have a look tonight.

Thank you, that is extremely kind (as is the help from @Abasz and @antaron1973 in resolving this).

@Abasz
Copy link
Collaborator

Abasz commented Jan 9, 2025

@JaapvanEkris Some update on whether starting a new process for HRM and Lars's comment Starts the central manager in a forked thread since noble does not like to run in the same thread as bleno

I tried to remove the forking and I get a HCI error (command disallowed). Its possible that this was the reason for Lars starting a new process.

I think this error is due because the BLE central and peripheral is trying to access the HCI socket at the same time and one command interfere with another and does not work. For testing purposes I added a 1 sec delay in starting the central (i.e. HRM) and that fixes. I will try things without the forked process. I will test if things are within the same thread would make things better or worth.

I have made significantly changed the HRM setup (moved to async, added better listener clean up etc.) I will do testing.

think it's only about sharing the HCI socket instance and queueing requests properly from both libs.

Yes this seems to be the issue here.

@Abasz
Copy link
Collaborator

Abasz commented Jan 9, 2025

Made some changes: https://github.com/Abasz/openrowingmonitor/tree/ble-fix.

This seems to be working for me (polar H9 strap) but I did not try a marathon :)

I what I had to do is, I changed the order of peripheral setup, first central role with noble (HRM) then add a big delay 10 sec before setting up of the peripheral (FTMS). Delay can most certainly be reduced but I wanted to make sure there are no conflicting message on the startup.

Note, this still spins up a new process, but I've tested the version where I dont and it works the same.

EDIT:

my take on the whole thing is that noble and bleno are sending conflicting messages to the HCI socket like @stoprocent suggested. This version still uses abandonware but I tested stoprocent/bleno and noble and it has the same issue.

@Abasz
Copy link
Collaborator

Abasz commented Jan 20, 2025

I dug into the cpp addon code and run a lot of tests and I think this is what blocks:

https://github.com/stoprocent/node-bluetooth-hci-socket/blob/93d429ba3d572d3f0a24132ad75652b3488cd786/src/BluetoothHciSocket.cpp#L44

I suppose it is blocking becuase the HCI socket is busy (though this is not how thing should work when reading the bytes in the socket...). I will not be able to find a solution for this with my current understanding of this driver.

The abandonware version has similar polling code (though there is no while loop, and there is a call to an Async scope, but based on my testing its not the while or the sync jsCallback call that blocks, but specifically the read method which is the same in the abandonware version). Also we should note that with previous kernels and previous version of abandonware this was working. I checked the version of abandonware 0.5.3-7 that was the version in Lars's code and I dont see any obvious change there not to mention that latest abandonware worked for a while.

The part I cannot wrap my head around is the fact that everything is working when HRM reconnect (noble) happens before EXR connection (bleno)...

I wonder whether the HCI socket supports interrupt based polling (assuming this is the actual problem).

@JaapvanEkris JaapvanEkris linked a pull request Jan 21, 2025 that will close this issue
@Abasz
Copy link
Collaborator

Abasz commented Jan 21, 2025

cdd2dce

I've done tests on: Linux rowingmonitor 6.6.62+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.62-1+rpt1 (2024-11-25) aarch64 GNU/Linux bookworm

@stoprocent
Copy link

I am following up on what you guys are trying to do but I don't understand the patch you are trying to do. Can you explain simply?

@Abasz
Copy link
Collaborator

Abasz commented Jan 21, 2025

I am following up on what you guys are trying to do but I don't understand the patch you are trying to do. Can you explain simply?

Basically what we are trying to achieve is that we broadcast BLE data from our app (such as FTMS BLE Profile) as a peripheral to centrals (mobile phone, smart watches etc), while we also receive data (e.g. from a heart rate strap) as central.

Our patch is just a throttling the bleno and noble initial setup and pray :) So I would not say this is an actual fix.

There are two types of issues with two types of solutions that we were trying to think of

  1. The original idea was that noble runs in a forked process so two threads communicate to the HCI socket
    However when the forked processed is terminated because e.g. the user switches of the connection to the heart rate monitor off, as noble tears down it self (as part of the termination), it resets/closes the HCI socket which also kills the bleno connection stopping the data broadcast (for now we have choose this one as this protects the data broadcasted from our app as peripheral).

  2. The second attempt was to move the noble process into the main process (i.e. do not fork).
    However, if not forked, under certain conditions (there is a central, e.g. a mobile phone already connected to the bleno peripheral) the connection attempt to the external peripheral (Hear rate monitor) stalls the bleno communication and no data is broadcasted.

I tracked this down and this is the blocking line in node: https://github.com/stoprocent/noble/blob/ecb43339054cd3ebc2f5ff3da2600eb67c8e3fe9/lib/hci-socket/hci.js#L578

And I think this is what blocks in the cpp driver:
https://github.com/stoprocent/node-bluetooth-hci-socket/blob/93d429ba3d572d3f0a24132ad75652b3488cd786/src/BluetoothHciSocket.cpp#L44

Let me know if you have any questions

@JaapvanEkris
Copy link
Owner Author

I am following up on what you guys are trying to do but I don't understand the patch you are trying to do. Can you explain simply?

@stoprocent any progress on your side on this (were you able to find ot what we are going?)?

@Abasz
Copy link
Collaborator

Abasz commented Jan 28, 2025

I did some testing while looking for an alternative BLE library and I was able to achieve some success with this:

https://github.com/Emill/node-ble-host/

It was able to support central and peripheral role at the same time, including the handling of turning on and off each functionality. Actually I had a better and more stable connection on the central side (HRM) as it was able to connect to my FR255 flawlessly with which both abandonware and stoprocent noble library struggled.

I checked the HCI driver of this library and it seems that it communicates directly to the Bluetooth controller via kernel (so it does not need bluetooth service or hciconfig hci0 up). I suppose this is the reason that it does not work together with the other noble and bleno libraries. But I dont fully understand the driver as it has a lot of similarities with stoprocent and abandonware. It uses libuv polling for changes (similar to abandonware) in the contrary to stoprocent where the polling is seemingly offloaded to a new thread.

Also as BUT, it seems that it has not been updated for two years (there is one fork that made updates but mostly aesthetic). So if issue arises later on I expect no updates (or not quickly anyway).

The API is completely different than the current, so moving to this library would require a complete refactor of the BLE part. The API is not bad but different very much (its fully callback based but I can turn it into async/await with little effort fort more streamlined usage).

I currently lean towards stoprocent's library assuming that he is able to make the necessary changes to allow seamless simultaneous access. Considering support as well as requirement for limited changes.

@JaapvanEkris
Copy link
Owner Author

JaapvanEkris commented Jan 29, 2025

I did some testing while looking for an alternative BLE library and I was able to achieve some success with this:

https://github.com/Emill/node-ble-host/

It was able to support central and peripheral role at the same time, including the handling of turning on and off each functionality. Actually I had a better and more stable connection on the central side (HRM) as it was able to connect to my FR255 flawlessly with which both abandonware and stoprocent noble library struggled.

That is promissing. I notice that my new Garmin HRM belt is struggling somehow. It doesn't work well via Bluetooth (ANT+ is much more reliable).

Also as BUT, it seems that it has not been updated for two years (there is one fork that made updates but mostly aesthetic). So if issue arises later on I expect no updates (or not quickly anyway).

Well, Abandonware hasn't seen an update in 2 years either. When I look at the download statistics and open issues, it doesn't look like there are fundamental things going on (and if it ain't broken, why modify). If they interact on a fairly low (kernel) level, I guess API's aren't that dynamic. So it might be a bit more stable hook than higher level packages that move a bit faster.

The API is completely different than the current, so moving to this library would require a complete refactor of the BLE part. The API is not bad but different very much (its fully callback based but I can turn it into async/await with little effort fort more streamlined usage).

That sucks, especially since that kind of development isn't my strong point. I'll help where I can, and having a good example will help me a lot, but this stuff is a bit closer to I/O management than I' used to.

I currently lean towards stoprocent's library assuming that he is able to make the necessary changes to allow seamless simultaneous access. Considering support as well as requirement for limited changes.

Let's give @stoprocent until the end of next week to see if he can come up with a solution (or at least a prognosis of a resolution time). By then he had at least a month to see what the possibilities are for Bleno/Noble to resolve this. If he can't give us a solution or estimate, I am inclined to assume it is a dead end for us, despite being a a very promising and potentially better maintained package. I think ORM is better served with making a decision one way or the other, otherwise this issue keeps dragging on, potentially blocking other development as well. While such a major change remains undecided, it will block any functional development in the BLE area (for example, adding Force Curves to the PM5 interface, receiving workouts, etc.).

@Abasz
Copy link
Collaborator

Abasz commented Jan 29, 2025

That sucks, especially since that kind of development isn't my strong point. I'll help where I can, and having a good example will help me a lot, but this stuff is a bit closer to I/O management than I' used to.

That is fine, I was going to suggest I take on the task of redoing it anyway :) I would start with moving FTMS (and HRM) and then share so it can be tested. Its the testing that takes the longest actually. I would do the changes on V1Beta-Architecture_Revision-EXPERIMENTAL_BLE-Fix branch. Actually since there is a good separation between the controller logic (Peripheral Manager) and the service implementation the need changes are well I isolated.

If they interact on a fairly low (kernel) level, I guess API's aren't that dynamic

I fully agree.

I think ORM is better served with making a decision one way or the other, otherwise this issue keeps dragging on, potentially blocking other development as well. While such a major change remains undecided, it will block any functional development in the BLE area (for example, adding Force Curves to the PM5 interface, receiving workouts, etc.).

I fully agree, and also everything is source control so rolling back is not that issue.

EDIT:

The only question that remains is what should be done to the merging of the Architecture revision into main. As I said above, BLE rework does not interfere because of the controller abstraction, but clearly there is an error, and I think the current BLE fix provides a usable solution (yes one cannot change peripherals on the fly but needs to start with the correct settings) So I would actually merge that into main too with a big README update.

@JaapvanEkris
Copy link
Owner Author

That sucks, especially since that kind of development isn't my strong point. I'll help where I can, and having a good example will help me a lot, but this stuff is a bit closer to I/O management than I' used to.

That is fine, I was going to suggest I take on the task of redoing it anyway :) I would start with moving FTMS (and HRM) and then share so it can be tested. Its the testing that takes the longest actually. I would do the changes on V1Beta-Architecture_Revision-EXPERIMENTAL_BLE-Fix branch. Actually since there is a good separation between the controller logic (Peripheral Manager) and the service implementation the need changes are well I isolated.

I can help out here. I have tested a lot of BLE interaction with EXR and with a dedicated test-RPi.

The only question that remains is what should be done to the merging of the Architecture revision into main. As I said above, BLE rework does not interfere because of the controller abstraction, but clearly there is an error, and I think the current BLE fix provides a usable solution (yes one cannot change peripherals on the fly but needs to start with the correct settings) So I would actually merge that into main too with a big README update.

I agree with that one. Could you start a new PR with the V1Beta-Architecture_Revision as a base? The merge of V1Beta-Architecture_Revision-EXPERIMENTAL into the V1Beta-Architecture_Revision had a weird knock-on effect that the V1Beta-Architecture_Revision-EXPERIMENTAL_BLE-Fix branch now has merge conflicts. Not problematic, but the package-lock.json has 331 conflics, which is quite a hassle to resolve.

@Abasz
Copy link
Collaborator

Abasz commented Jan 29, 2025

I agree with that one. Could you start a new PR with the V1Beta-Architecture_Revision as a base? The merge of V1Beta-Architecture_Revision-EXPERIMENTAL into the V1Beta-Architecture_Revision had a weird knock-on effect that the V1Beta-Architecture_Revision-EXPERIMENTAL_BLE-Fix branch now has merge conflicts. Not problematic, but the package-lock.json has 331 conflics, which is quite a hassle to resolve.

What I would do with the lock.json is that I would merge with confilcts then create a commit where I would regenerate it (delete lock.json and the whole node folder and hit npm install.

@JaapvanEkris
Copy link
Owner Author

JaapvanEkris commented Feb 3, 2025

I've merged the branch succesfully, including some small tweaks. I've tested it today and Bluetooth FTMS has proven to be stable. I'll try the PM5 emulator tomorrow.

My HRM belt isn't that stable itself, so testing might prove difficult, but I'll do some tests on wednesday until sunday to see if it behaves well. If that is succesfull, I'll squash-merge the branches into main. Then at least we have a working version by the weekend.

From there on, we are in the position to look for a long term solution and implement it, be it stoprocent's library or another.

There were other tasks on my personal todo-list I also wanted to release (a GUI extension for setting targets and cleaning up the total package mess that Babel and eslint are creating), but we'll bump them to the next version as well. It will probably line up with the final version of the bluetooth implementation, making a nice interim-release. These things aren't perfect, but they will do for now. Rome wasn't build in a day either. If you look at the colossal changes we made under the hood, it is surprising we still got a working app.

@86bjorn
Copy link

86bjorn commented Feb 8, 2025

Is it the same issue? After a clean raspian x64 & rowingmonitor install can be found by the apps like kinomap and exr. After a reboot of the system the bluetooth is gone? Also the bluetooth service is dead
I cant find what im doing wrong.

`
● bluetooth.service - Bluetooth service
Loaded: loaded (/lib/systemd/system/bluetooth.service; disabled; preset: e>
Active: active (running) since Sat 2025-02-08 14:30:28 CET; 33s ago
Docs: man:bluetoothd(8)
Main PID: 801 (bluetoothd)
Status: "Running"
Tasks: 1 (limit: 1568)
CPU: 212ms
CGroup: /system.slice/bluetooth.service
└─801 /usr/libexec/bluetooth/bluetoothd

Feb 08 14:30:28 rowingmonitor bluetoothd[801]: Starting SDP server
Feb 08 14:30:28 rowingmonitor bluetoothd[801]: profiles/audio/vcp.c:vcp_init() >
Feb 08 14:30:28 rowingmonitor bluetoothd[801]: src/plugin.c:plugin_init() Faile>
Feb 08 14:30:28 rowingmonitor bluetoothd[801]: profiles/audio/mcp.c:mcp_init() >
Feb 08 14:30:28 rowingmonitor bluetoothd[801]: src/plugin.c:plugin_init() Faile>
Feb 08 14:30:28 rowingmonitor bluetoothd[801]: profiles/audio/bap.c:bap_init() >
Feb 08 14:30:28 rowingmonitor bluetoothd[801]: src/plugin.c:plugin_init() Faile>
Feb 08 14:30:28 rowingmonitor bluetoothd[801]: Bluetooth management interface 1>
Feb 08 14:30:28 rowingmonitor bluetoothd[801]: profiles/sap/server.c:sap_server>
Feb 08 14:30:28 rowingmonitor bluetoothd[801]: sap-server: Operation not permit>
lines 1-21/21 (END)
`

`openrowingmonitor.service - Open Rowing Monitor
Loaded: loaded (/lib/systemd/system/openrowingmonitor.service; enabled; preset: enabled)
Active: active (running) since Sat 2025-02-08 14:27:19 CET; 4min 49s ago
Main PID: 682 (npm start)
Tasks: 39 (limit: 1568)
CPU: 26.180s
CGroup: /system.slice/openrowingmonitor.service
├─682 "npm start"
├─694 sh -c "node app/server.js"
├─695 node app/server.js
└─709 /usr/bin/node ./app/gpio/GpioTimerService.js

Feb 08 14:27:22 rowingmonitor npm[682]: > [email protected] start
Feb 08 14:27:22 rowingmonitor npm[682]: > node app/server.js
Feb 08 14:27:25 rowingmonitor npm[695]: ==== Open Rowing Monitor 0.9.0 ====
Feb 08 14:27:25 rowingmonitor npm[695]: bluetooth profile: FTMS Rower
Feb 08 14:27:25 rowingmonitor npm[695]: Starting a just row session, no time or distance target set
Feb 08 14:27:25 rowingmonitor npm[695]: heart rate profile: Off
Feb 08 14:27:25 rowingmonitor npm[695]: ant plus profile: Off
Feb 08 14:27:25 rowingmonitor npm[695]: webserver running on port 80
Feb 08 14:27:26 rowingmonitor npm[709]: Gpio-service: pin number 17, polling interval 5 us, triggered on Up flank, mi>
Feb 08 14:27:55 rowingmonitor npm[695]: websocket client connected
`

@JaapvanEkris
Copy link
Owner Author

Is it the same issue? After a clean raspian x64 & rowingmonitor install can be found by the apps like kinomap and exr. After a reboot of the system the bluetooth is gone? Also the bluetooth service is dead I cant find what im doing wrong.

Could be. Which branch did you install? Looking at the logs you installed the 'main' branch. Is that correct?

I planned to merge the updated (fixed) branch into main this evening.

@86bjorn
Copy link

86bjorn commented Feb 8, 2025

Could be. Which branch did you install? Looking at the logs you installed the 'main' branch. Is that correct?

I planned to merge the updated (fixed) branch into main this evening.

yes its the main branch.
Okay i will try a new install tomorrow then. I will keep you updated if the merge solves it

JaapvanEkris added a commit that referenced this issue Feb 8, 2025
# New functionality in 0.9.5

- Added **FIT-File support**: you an now automatically generate a FIT-file after a rowing session, which alows for a more detailed reporting than the tcx-format, and is commonly accepted by most platforms
- **Introduction of the session manager**, which provides support for intervals, splits, rest intervals and spontanuous pauses in the session and also adds these to the FIT, tcx and RowingData recordings. Please note, setting predetermined intervals and splits in a user friendly way (via PM5 and webinterface) is still a ToDo that is intended for 1.0.0.
- **Improvement of Magnetic rower support**: the new session manager makes sure that the session is nicely stopped, even when the flywheel has stopped quite abruptly before pause timeouts have time to kick in. This is the case on some magnetic rowers which have an extreme high drag, resulting in very short spin down times of their flywheel.

# Bugfixes and robustness improvements in 0.9.5

- **Improvement of the architecture**: we cleaned up the old architecture and went to a more message bus structure where clients are responsible for listening to the datatransmissions they are interested in. See [the architecture description](Architecture.md) for a deep-dive of the implementation. Key benefit is that this is more maintainable as it allows serving data more easily to totally different clients (webGUI, recorders and BLE/ANT+) with totally different needs, making future enhancements easier.
- **Improvement of Bluetooth stability**: we moved away from abandonware's BLE NoBle/BleNo implementation and moved to stoprocent's implementation, as that package is better maintained and works better with newer installs of BlueZ, which should fix some issues on Raspberry Pi Bookworm. Unfortunatly, none of the NoBle/BleNo descendents are immune to some specific BlueZ issues (see known issues).
- **Performance improvement of the TS estimator**, further reducing CPU load, which significantly improves accuracy of the measurements and metrics as the Linux kernel has an easier job keeping the time accurate.
- **Removed a lot of memory leaks**, although only being problematic in large simulations (i.e. over 3000K), we want to keep our code to behave nice
- **Improved robustness of the stroke detection algorithm**
- **Validation of the engine against a PM5 for over 3000KM**, where the deviation is a maximum of 0.1%

# Known issues in 0.9.5

- **Bluetooth Heartrate can't be switched dynamically**: due to some underlying changes in the OS, BLE heartrate monitors can't be activated through the GUI without crashing the BLE metrics broadcast (see [the description of issue 69](#69)). As this is an issue in the OS, and all previous versions of OpenRowingMonitor are also affected by this issue. Version 0.9.5 has a workaround implemented that mitigates this at startup. So configuring the use of a BLE heartrate monitor in the config file should work.
@JaapvanEkris JaapvanEkris reopened this Feb 8, 2025
@JaapvanEkris
Copy link
Owner Author

JaapvanEkris commented Feb 8, 2025

The workaround is now in the main branch. I've tested it over 250K in a live environment, both with FTMS and PM5 Mode, and it works. I wasn't able to test it with my HRM belt, as itself has an unstable BLE signal.

As stoprocent hasn't responded in over a week, I'm assuming no fix will come from his branch of NoBle/Bleno. So for the long term, I guess we have to move to another package?

@86bjorn
Copy link

86bjorn commented Feb 9, 2025

The reboot problem seem to be fixed. nice job!

@Abasz
Copy link
Collaborator

Abasz commented Feb 9, 2025

The workaround is now in the main branch. I've tested it over 250K in a live environment, both with FTMS and PM5 Mode, and it works. I wasn't able to test it with my HRM belt, as itself has an unstable BLE signal.

As stoprocent hasn't responded in over a week, I'm assuming no fix will come from his branch of NoBle/Bleno. So for the long term, I guess we have to move to another package?

I will create a new branch for the new ble. I will start with the full ftms rowing implementation. To set up a example and all that and we will see how it goes. I would say end of Feb fir this. Once that works we can start with the other ble profiles (ftms is probably the most used anyway)

@JaapvanEkris
Copy link
Owner Author

I will create a new branch for the new ble. I will start with the full ftms rowing implementation. To set up a example and all that and we will see how it goes. I would say end of Feb fir this. Once that works we can start with the other ble profiles (ftms is probably the most used anyway)

Sounds like a great plan to me.

@Abasz
Copy link
Collaborator

Abasz commented Feb 19, 2025

The webbluetooth library seems to be having an update that would resolve the issue I had in the past. So I will do another attempt with using this in combination with bleno. This would be much easier as no re implementing the whole peripheral part would be required. This does not support peripheral so some reliance would remain, but would provide a quick fix for now.

thegecko/webbluetooth#289 (comment)

I mean, we may still move to the other BLE library (that handles central and peripheral at the same time as one library) as a long term plan

@JaapvanEkris
Copy link
Owner Author

The webbluetooth library seems to be having an update that would resolve the issue I had in the past. So I will do another attempt with using this in combination with bleno. This would be much easier as no re implementing the whole peripheral part would be required. This does not support peripheral so some reliance would remain, but would provide a quick fix for now.

We could, but I have the impression that the current workaround is sufficient for us to buy some decent time to get a good fix in place. I already made a 0.9.6 branch, and I plan to release some features pretty soon in that branch. But I am not in a rush here to release it, as the 0.9.5 versions seems to be working well.

If webbluetooth would provide a good solution, that would be fine. But if there are better (more widely used) packages, I rather take the time to do it right in one go.

@Abasz
Copy link
Collaborator

Abasz commented Feb 24, 2025

I agree with you in terms of a more robust solution. I want to push the webbluetooth trough (even though it may not be the final solution) as testing to see what works and what does not. If it fails than the choice seems to be simple :) if it works we have a choice :)

If it works this is very easy to implement, it has a very good API. Its a shame it only supports central role :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
5 participants