
Regression with MQTT clients: cross-user QoS 1 delivery #7694

@persinac

Description


Observed behavior

Problem Description

When two different MQTT users (mapped via mTLS certificate CN) attempt pub/sub communication:

  1. Publisher sends QoS 1 message and receives PUBACK (message stored in JetStream)
  2. Subscriber's JetStream consumer receives the message (consumer sequence increments)
  3. However, the subscriber's MQTT client never receives the message
  4. Messages accumulate as outstanding acks and get redelivered indefinitely

The internal $MQTT.sub.* delivery subject binding appears broken for cross-user scenarios.

Initial thought: a wildcard translation bug

Fun fact: it's not a wildcard translation issue.

Initial investigation suggested MQTT # wildcards weren't being translated to NATS .>. This was incorrect. Further testing revealed:

  1. Consumer filters ARE correct: both .> and exact-match consumers are created
  2. Messages ARE delivered to consumers: the consumer sequence increments
  3. But acknowledgments never arrive: the ack floor stays at 0
  4. The bug appears to be in the delivery subject binding: messages go to $MQTT.sub.xxx, but the client isn't receiving on that subject

Diagnostics

Consumer State Shows Delivery Without Acknowledgment

$ nats consumer info '$MQTT_msgs' '111SUQEl_UO4iyguny1Jz4clB1xwhjy'

Configuration:
    Delivery Subject: $MQTT.sub.UO4iyguny1Jz4clB1xwhgO
      Filter Subject: $MQTT.msgs.bridge.alex-garage-1.down.>   # <-- Filter IS correct!

State:
   Last Delivered Message: Consumer sequence: 9 Stream sequence: 289769
     Acknowledgment floor: Consumer sequence: 0 Stream sequence: 0   # <-- NOTHING acknowledged!
         Outstanding Acks: 3 out of maximum 1024                      # <-- Messages stuck
     Redelivered Messages: 3                                          # <-- Being redelivered
     Unprocessed Messages: 0

This proves:

  1. The consumer filter is correct (includes .>)
  2. Messages ARE being delivered (consumer sequence = 9)
  3. But the client NEVER acknowledges (ack floor = 0)
  4. Messages accumulate and get redelivered indefinitely
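The reasoning above can be restated as a small check over a consumer snapshot. This is a minimal sketch: `ConsumerState` and `diagnose` are hypothetical names whose fields mirror the lines printed by `nats consumer info`. They are not types from nats.go or the server.

```go
package main

import "fmt"

// ConsumerState is a hypothetical struct holding values read from
// `nats consumer info`; it is not a nats.go or server type.
type ConsumerState struct {
	DeliveredConsumerSeq uint64 // "Last Delivered Message: Consumer sequence"
	AckFloorConsumerSeq  uint64 // "Acknowledgment floor: Consumer sequence"
	OutstandingAcks      int    // "Outstanding Acks"
	Redelivered          int    // "Redelivered Messages"
}

// diagnose applies the reasoning from this report to a consumer snapshot.
func diagnose(s ConsumerState) string {
	switch {
	case s.DeliveredConsumerSeq == 0:
		return "nothing delivered yet"
	case s.AckFloorConsumerSeq == 0 && s.OutstandingAcks > 0 && s.Redelivered > 0:
		return "delivered but never acknowledged; redelivering"
	case s.AckFloorConsumerSeq == 0 && s.OutstandingAcks > 0:
		return "delivered but never acknowledged"
	case s.OutstandingAcks > 0:
		return "acknowledging, with some acks pending"
	default:
		return "all deliveries acknowledged"
	}
}

func main() {
	// Values copied from the consumer info output in this report.
	observed := ConsumerState{DeliveredConsumerSeq: 9, AckFloorConsumerSeq: 0, OutstandingAcks: 3, Redelivered: 3}
	fmt.Println(diagnose(observed)) // delivered but never acknowledged; redelivering
}
```

A healthy consumer for the same subscription would show the ack floor tracking the delivered sequence with no outstanding acks.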

Consumer Lifecycle is Correct

Testing confirmed that consumers are properly cleaned up on disconnect and recreated on connect:

# Before reconnect:
0EuYuueO_5epcT7g45uz49HkuBdAjBh  (Delivery: $MQTT.sub.5epcT7g45uz49HkuBdAj8M)

# After reconnect - old consumer deleted, new one created:
0EuYuueO_UO4iyguny1Jz4clB1xwds4  (Delivery: $MQTT.sub.UO4iyguny1Jz4clB1xwdoU)
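To repeat this lifecycle check, one option is to capture the consumer names on `$MQTT_msgs` (for example from the nats CLI's consumer listing) before and after a reconnect and diff the two sets. The sketch below assumes the names have already been collected into slices. `diffConsumers` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// diffConsumers returns the consumer names that disappeared (deleted) and
// appeared (created) between two listings.
func diffConsumers(before, after []string) (deleted, created []string) {
	inBefore := make(map[string]bool, len(before))
	for _, name := range before {
		inBefore[name] = true
	}
	inAfter := make(map[string]bool, len(after))
	for _, name := range after {
		inAfter[name] = true
		if !inBefore[name] {
			created = append(created, name)
		}
	}
	for _, name := range before {
		if !inAfter[name] {
			deleted = append(deleted, name)
		}
	}
	sort.Strings(deleted)
	sort.Strings(created)
	return deleted, created
}

func main() {
	// Consumer names taken from the before/after listing in this report.
	before := []string{"0EuYuueO_5epcT7g45uz49HkuBdAjBh"}
	after := []string{"0EuYuueO_UO4iyguny1Jz4clB1xwds4"}
	deleted, created := diffConsumers(before, after)
	fmt.Println("deleted:", deleted) // deleted: [0EuYuueO_5epcT7g45uz49HkuBdAjBh]
	fmt.Println("created:", created) // created: [0EuYuueO_UO4iyguny1Jz4clB1xwds4]
}
```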

MQTT Wildcard Translation is Correct

NATS correctly creates two consumers for # wildcard subscriptions:

| Consumer | Filter Subject | Purpose |
|---|---|---|
| ...hjy | $MQTT.msgs.bridge.alex-garage-1.down.> | Matches subtopics |
| ...hyI | $MQTT.msgs.bridge.alex-garage-1.down | Matches exact topic |

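This is correct behavior: MQTT # matches zero or more levels (so bridge/alex-garage-1/down/# also matches bridge/alex-garage-1/down itself), whereas NATS .> matches one or more tokens, so the exact-match consumer is needed to cover the parent topic.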
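The two-filter translation and the NATS matching rules behind it can be sketched as follows. This is a simplified illustration, not the server's code. `natsFilters` and `subjectMatches` are hypothetical helpers. The translation only handles '/' to '.', '+' to '*' and '#' to '>', and it ignores the server's special cases (empty levels, a leading '/', '.' inside topic names). The `$MQTT.msgs.` prefix is taken from the consumer filter shown earlier.

```go
package main

import (
	"fmt"
	"strings"
)

// msgsPrefix is the stream subject prefix seen in the consumer filter above.
const msgsPrefix = "$MQTT.msgs."

// natsFilters maps an MQTT topic filter to JetStream filter subjects.
// Simplified: only '/'->'.', '+'->'*' and '#'->'>' are handled.
func natsFilters(mqttFilter string) []string {
	subj := strings.NewReplacer("/", ".", "+", "*").Replace(mqttFilter)
	if subj == "#" {
		return []string{msgsPrefix + ">"}
	}
	if strings.HasSuffix(subj, ".#") {
		parent := strings.TrimSuffix(subj, ".#")
		// MQTT '#' also matches the parent level, but NATS '>' needs at least
		// one more token, so the parent gets its own exact-match filter.
		return []string{msgsPrefix + parent + ".>", msgsPrefix + parent}
	}
	return []string{msgsPrefix + subj}
}

// subjectMatches applies NATS wildcard rules: '*' matches exactly one token,
// a trailing '>' matches one or more remaining tokens.
func subjectMatches(filter, subject string) bool {
	f := strings.Split(filter, ".")
	s := strings.Split(subject, ".")
	for i, tok := range f {
		if tok == ">" {
			return len(s) > i // '>' must consume at least one token
		}
		if i >= len(s) || (tok != "*" && tok != s[i]) {
			return false
		}
	}
	return len(f) == len(s)
}

func main() {
	filters := natsFilters("bridge/alex-garage-1/down/#")
	fmt.Println(filters)
	topics := []string{"bridge/alex-garage-1/down/keys/response", "bridge/alex-garage-1/down"}
	for _, topic := range topics {
		subject := msgsPrefix + strings.ReplaceAll(topic, "/", ".")
		for _, f := range filters {
			fmt.Println(f, "matches", subject, "=", subjectMatches(f, subject))
		}
	}
}
```

Each published topic matches exactly one of the two filters, which is consistent with the filter setup reported above not being the cause.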

NATS Debug Logs

With the -DV flags (debug and trace logging) enabled:

# Device subscribes
[TRC] "[email protected]" - <<- [SUBSCRIBE [bridge/alex-garage-1/down/# QoS=1] pi=53284]
[TRC] "[email protected]" - ->> [SUBACK pi=53284]

# Service publishes
[TRC] "[email protected]" - <<- [PUBLISH bridge/alex-garage-1/down/keys/response QoS=1 size=152 pi=4]
[TRC] "[email protected]" - ->> [PUBACK pi=4]

# NOTE: No message forwarded to device!
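A quick way to check a longer trace for the missing delivery is to count PUBLISH frames sent by the server. This sketch assumes that in the trace output "<<-" marks frames received from a client and "->>" marks frames sent to it, as the SUBACK/PUBACK lines above suggest. It also assumes a delivered message would appear as a "->> [PUBLISH ...]" line. The client names "device" and "service" are placeholders, and `outboundPublishes` is a hypothetical helper.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// outboundPublishes counts trace lines in which the server sends an MQTT
// PUBLISH frame to a client (assumed "->>" direction marker).
func outboundPublishes(trace string) int {
	count := 0
	sc := bufio.NewScanner(strings.NewReader(trace))
	for sc.Scan() {
		if strings.Contains(sc.Text(), "->> [PUBLISH ") {
			count++
		}
	}
	return count
}

func main() {
	// Trace excerpt from this report, with placeholder client names.
	trace := `[TRC] "device" - <<- [SUBSCRIBE [bridge/alex-garage-1/down/# QoS=1] pi=53284]
[TRC] "device" - ->> [SUBACK pi=53284]
[TRC] "service" - <<- [PUBLISH bridge/alex-garage-1/down/keys/response QoS=1 size=152 pi=4]
[TRC] "service" - ->> [PUBACK pi=4]`
	fmt.Println(outboundPublishes(trace)) // 0
}
```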

Theory

I'm not deeply familiar with NATS JetStream user/subject bindings; however, something seems off with how NATS 2.12 binds the JetStream consumer's delivery subject to the MQTT client's session in cross-user scenarios.

  1. Device subscribes to bridge/alex-garage-1/down/#
  2. NATS creates JetStream consumer with delivery subject $MQTT.sub.xxx
  3. In same-user scenarios, the internal subscription on $MQTT.sub.xxx is properly connected
  4. In cross-user scenarios, the binding is broken - messages are delivered to $MQTT.sub.xxx but the MQTT session isn't receiving from that subject
  5. Messages accumulate as outstanding acks since the client can't acknowledge what it never received

Additional Observations

  1. Cross-user delivery fails: Messages between different MQTT users (mapped via mTLS certs) are not delivered
  2. Same-user delivery works: When publisher and subscriber use the same certificate, delivery works perfectly
  3. Messages are stored: The $MQTT_msgs stream receives and stores the messages correctly
  4. Consumer filters are correct: The .> wildcard is properly added to filter subjects
  5. Consumer lifecycle is correct: Consumers are properly created/deleted on connect/disconnect
  6. Delivery attempts happen: Consumer sequence increments, showing NATS tries to deliver
  7. Acknowledgments never arrive: the ack floor stays at 0 and messages are redelivered indefinitely

Workaround

No workaround is known other than downgrading: NATS 2.11.11 restores correct behavior.

Expected behavior

Test Scenario

Subscriber (Device):

  • Connects via MQTT with mTLS (certificate CN: [email protected])
  • Subscribes to: bridge/alex-garage-1/down/# with QoS 1

Publisher (Service):

  • Connects via MQTT with mTLS (certificate CN: [email protected])
  • Publishes to: bridge/alex-garage-1/down/keys/response with QoS 1

Result:

  • Publisher receives PUBACK (message stored in JetStream)
  • Subscriber receives the message
  • Consumer shows messages delivered and acknowledged

Server and client version

Environment

  • NATS Server Version: 2.12.3-alpine
  • Previous Working Version: 2.11.x (issue appeared after upgrade)
  • Protocol: MQTT over TLS (port 8883)
  • Authentication: mTLS with verify_and_map: true
  • JetStream: Enabled
  • Accounts: Using multi-account setup (SYS + APP accounts)

Host environment

Running in a Docker container on an EC2 instance with Amazon Linux 2023.

Steps to reproduce

Reproduction Steps

1. NATS Configuration

# nats.conf
server_name: nats-mqtt

port: 4222
http: 8222

jetstream: {
  store_dir: "/data/jetstream"
}

include "/includes/users.inc"

mqtt {
  port: 1883
}

mqtt {
  host: 0.0.0.0
  port: 8883
  tls {
    cert_file: "/etc/nats/certs/server-cert.pem"
    key_file:  "/etc/nats/certs/server-key.pem"
    ca_file:   "/etc/nats/certs/root-ca.pem"
    verify_and_map: true
  }
}

2. Users Configuration (users.inc)

accounts {
  SYS {
    users = [
      { user: "admin", password: "...", permissions: { publish: [">"], subscribe: [">"] } }
    ]
  }

  APP {
    jetstream { max_file: 25Gb }
    users = [
      { user: "[email protected]", permissions: {"publish": [">"], "subscribe": [">", "$MQTT.sub.>"]}, allowed_connection_types: ["MQTT"] },
      { user: "[email protected]", permissions: {"publish": [">"], "subscribe": [">", "$MQTT.sub.>"]}, allowed_connection_types: ["MQTT"] }
    ]
  }
}

system_account: SYS

3. Test Scenario

Subscriber (Device):

  • Connects via MQTT with mTLS (certificate CN: [email protected])
  • Subscribes to: bridge/alex-garage-1/down/# with QoS 1

Publisher (Service):

  • Connects via MQTT with mTLS (certificate CN: [email protected])
  • Publishes to: bridge/alex-garage-1/down/keys/response with QoS 1

Result:

  • Publisher receives PUBACK (message stored in JetStream)
  • Subscriber NEVER receives the message
  • Consumer shows messages delivered but never acknowledged

4. Same-User Test (Works)

When both publisher and subscriber use the same certificate/user, message delivery works correctly. This rules out permission issues and confirms the bug is specific to cross-user scenarios.

Metadata

Labels

defect: Suspected defect such as a bug or regression