[BUG] NAT ACK-Routing after Re-INVITEs - Inconsistent nat_uac_test behavior #3826

@MrM0bi

OpenSIPS version you are running

version: opensips 3.6.3 (x86_64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, HP_MALLOC, F_PARALLEL_MALLOC, DBG_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll, sigio_rt, select.
git revision: d5222226a
main.c compiled on  with gcc 12

Describe the bug
We recently upgraded our OpenSIPS servers from 2.2 to 3.6. Without any change to the logic of the OpenSIPS config, the routing of ACK packets after Re-INVITEs seems to have changed for clients that are behind NAT and do not use a STUN server.

First, here is the general setup and an example flow, including the SIP content of the 4 ACK packets:

We have 3 OpenSIPS instances between the Provider and the End-User:

Provider <--> OpenSIPS "Carrier" <--> OpenSIPS "Core" <--> OpenSIPS "Consumer" <--> End-User

Every instance has its own jobs, but this issue only concerns the communication between the "Core", the "Consumer" and the End-User.

Here is the flow of an example call showing the problem:

[Image: SIP flow of the example call]

First ACK from Core to Consumer:

Src: CoreIP:5060, Dst: ConsumerIP:5060
Request-Line: ACK sip:CalleeNum@ConsumerIP;rdlg=f41.4b0b0d81 SIP/2.0
Via: SIP/2.0/UDP CoreIP:5060;branch=z9hG4bK4788.46908034.2
Call-ID: Core2_ExwGCwAcIRksagAuN1VgFj0oM3UBLTQzRhI2HlEDBzk0DWMGLTYGdQo9IDNFBTYmK3QbBwoNYwUfFCdqdiM-
From: "+39CallerNum" <sip:0039CallerNum@domain.it>;tag=cf479e78.Acc134.B2b1271.DbjgqPYNnasBcMet5LT7sF086
To: <sip:CalleeNum@domain.it>;tag=31aebc9bd42dbb07
CSeq: 1 ACK
Max-Forwards: 30
Content-Length: 0
P-hint: rr-enforced
P-Correlation-ID: f1cf3d607f02a5b2-Acc134-B2b1271
ITL-RR: AcCaPl0134.AcCpNo0206.AcCiSt01271.DbjgqPYNnasBcMet5LT7sF086.DiAgOuT00.Outgoing.ModB2B

First ACK being proxied to UAC:

Src: ConsumerIP:5060, Dst: 195.254.X.Y:32820
Request-Line: ACK sip:CalleeNum@195.254.X.Y:32820;transport=udp SIP/2.0
Via: SIP/2.0/UDP ConsumerIP:5060;branch=z9hG4bK681a.570b28d.2
Call-ID: Core2_ExwGCwAcIRksagAuN1VgFj0oM3UBLTQzRhI2HlEDBzk0DWMGLTYGdQo9IDNFBTYmK3QbBwoNYwUfFCdqdiM-
From: "+39CallerNum" <sip:0039CallerNum@domain.it>;tag=cf479e78.Acc134.B2b1271.DbjgqPYNnasBcMet5LT7sF086
To: <sip:CalleeNum@domain.it>;tag=31aebc9bd42dbb07
CSeq: 1 ACK
Max-Forwards: 30
Content-Length: 0
P-hint: rr-enforced
ITL-RR: AcCaPl0134.AcCpNo0206.AcCiSt01271.DbjgqPYNnasBcMet5LT7sF086.DiAgOuT00.Outgoing.ModB2B

Second ACK from Core to Consumer after Re-INVITE:

Src: CoreIP:5060, Dst: ConsumerIP:5060
Request-Line: ACK sip:CalleeNum@ConsumerIP;rdlg=f41.4b0b0d81 SIP/2.0
Via: SIP/2.0/UDP CoreIP:5060;branch=z9hG4bK1788.4e62f6e1.2
Call-ID: Core2_ExwGCwAcIRksagAuN1VgFj0oM3UBLTQzRhI2HlEDBzk0DWMGLTYGdQo9IDNFBTYmK3QbBwoNYwUfFCdqdiM-
From: "+39CallerNum" <sip:0039CallerNum@domain.it>;tag=cf479e78.Acc134.B2b1271.DbjgqPYNnasBcMet5LT7sF086
To: <sip:CalleeNum@domain.it>;tag=31aebc9bd42dbb07
CSeq: 2 ACK
Max-Forwards: 30
Content-Length: 0
P-hint: rr-enforced
P-Correlation-ID: f1cf3d607f02a5b2-Acc134-B2b1271
ITL-RR: AcCaPl0134.AcCpNo0206.AcCiSt01271.DbjgqPYNnasBcMet5LT7sF086.DiAgOuT00.Outgoing.ModB2B

Second ACK being proxied to UAC (to the private IP instead of the public one):

Src: ConsumerIP:5060, Dst: 192.168.178.2:5060
Request-Line: ACK sip:CalleeNum@192.168.178.2:5060;transport=udp SIP/2.0
Via: SIP/2.0/UDP ConsumerIP:5060;branch=z9hG4bK381a.85652b24.2
Call-ID: Core2_ExwGCwAcIRksagAuN1VgFj0oM3UBLTQzRhI2HlEDBzk0DWMGLTYGdQo9IDNFBTYmK3QbBwoNYwUfFCdqdiM-
From: "+39CallerNum" <sip:0039CallerNum@domain.it>;tag=cf479e78.Acc134.B2b1271.DbjgqPYNnasBcMet5LT7sF086
To: <sip:CalleeNum@domain.it>;tag=31aebc9bd42dbb07
CSeq: 2 ACK
Max-Forwards: 30
Content-Length: 0
P-hint: rr-enforced
ITL-RR: AcCaPl0134.AcCpNo0206.AcCiSt01271.DbjgqPYNnasBcMet5LT7sF086.DiAgOuT00.Outgoing.ModB2B

In all Contact headers of the responses from the UAC (195.254.X.Y), the Contact IP was 192.168.178.2, so in theory this is where new requests (like the ACK) should be sent. Evidently the UAC does not know its public IP and advertises 192.168.178.2, which we cannot reach. Upon arrival of the first ACK, fix_nated_contact therefore rewrites the address and port of the RURI to the public ones that the packet actually came from.
But why doesn't OpenSIPS do the same for the second ACK? Everything should be the same: the two ACK packets that the "Consumer" receives from the "Core" are basically identical.
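
To make the Contact-related checks concrete, here is a rough Python sketch (not OpenSIPS code) of how I understand the three Contact tests, applied to the values seen in the UAC's replies. The function names are made up for illustration, only the RFC 1918 ranges are treated as private (an assumption), and `203.0.113.10` is a hypothetical stand-in for the redacted public IP 195.254.X.Y.

```python
import ipaddress

# RFC 1918 private ranges (assumption: only these count as "private" here).
RFC1918_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(ip):
    """Return True if the address lies in one of the RFC 1918 ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in RFC1918_NETS)


def contact_nat_checks(contact_ip, contact_port, src_ip, src_port):
    """Illustrative versions of the three Contact-based tests (hypothetical helper)."""
    return {
        "private-contact": is_rfc1918(contact_ip),
        "diff-ip-src-contact": ipaddress.ip_address(contact_ip) != ipaddress.ip_address(src_ip),
        "diff-port-src-contact": contact_port != src_port,
    }


# Contact advertised by the UAC vs. where its packets actually came from.
# 203.0.113.10 is a placeholder for the redacted public IP.
print(contact_nat_checks("192.168.178.2", 5060, "203.0.113.10", 32820))
```

With these inputs all three checks come out true, which is consistent with the Contact of the UAC's replies being flagged as NATed.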

Here is my assumption on where the problem lies (on the "Consumer"):

It seems that the matching behavior of the nat_uac_test function has changed or is buggy. Or maybe our config is wrong.
In OpenSIPS 2.2, in the following part of the config, nat_uac_test would match and thus call fix_nated_contact() on every incoming subsequent ACK request. With that version the End-User never experienced any problems.

// Consumer OpenSIPS 2.2:
route{
    ...
    if (has_totag()) {
        if (topology_hiding_match()) {
            if (nat_uac_test("127")) {
                xlog("L_INFO", "Topology hidden, Contact fixed - LF_BASE");
                fix_nated_contact();
            }
        ...

In OpenSIPS 3.6 nat_uac_test no longer takes a bit mask. Given the comma-separated flags equivalent to the former "127", the function does not match on the second ACK request and thus does not fix the RURI, which leads to ACK routing problems.
In the snippet below, "Topology hidden, Got ACK - LF_BASE" is printed, so we know that has_totag and topology_hiding_match both match. "Topology hidden, Contact fixed - LF_BASE" is not printed, so we know fix_nated_contact is not executed (and we see the second ACK being sent to the wrong IP).

// Consumer OpenSIPS 3.6:
route{
    ...
    if (has_totag()){
        if (topology_hiding_match()){
            if (nat_uac_test("private-contact, diff-ip-src-via, private-via, private-sdp, diff-port-src-via, diff-ip-src-contact, diff-port-src-contact")){
                xlog("L_INFO", "Topology hidden, Contact fixed - LF_BASE");
                fix_nated_contact();
            }

            if (is_method("ACK")){
                xlog("L_INFO", "Topology hidden, Got ACK - LF_BASE");
            }
        ...

In our setup this leads to ACK routing problems with some clients that are behind NAT and do not use a STUN server, and it causes dropped calls after a certain amount of time.

Is nat_uac_test really buggy, or are we handling something wrong?
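
As a sanity check on the migration from "127" to named flags, here is a small Python sketch that converts between the two forms. The bit-to-name table reflects my reading of the nathelper documentation and should be treated as an assumption; the helper names are made up. Whitespace around flag names is stripped here purely for convenience, which says nothing about how OpenSIPS itself parses the string.

```python
from typing import List

# Assumed mapping between the old nat_uac_test bit values and the new flag names.
NAT_TEST_FLAGS = {
    1: "private-contact",
    2: "diff-ip-src-via",
    4: "private-via",
    8: "private-sdp",
    16: "diff-port-src-via",
    32: "diff-ip-src-contact",
    64: "diff-port-src-contact",
}


def bitmask_to_flags(mask: int) -> List[str]:
    """List the flag names whose bit is set in the old-style mask."""
    return [name for bit, name in NAT_TEST_FLAGS.items() if mask & bit]


def flags_to_bitmask(flags: str) -> int:
    """Combine a comma-separated flag string back into the old-style mask."""
    by_name = {name: bit for bit, name in NAT_TEST_FLAGS.items()}
    mask = 0
    for flag in flags.split(","):
        mask |= by_name[flag.strip()]  # raises KeyError on an unknown name
    return mask


new_style = ("private-contact, diff-ip-src-via, private-via, private-sdp, "
             "diff-port-src-via, diff-ip-src-contact, diff-port-src-contact")
print(flags_to_bitmask(new_style))  # 127
print(bitmask_to_flags(127))
```

Under that mapping, the flag list used in the 3.6 config covers the same seven tests as the old "127", so the difference in behavior does not appear to come from a missing flag.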

To Reproduce

  1. Configure OpenSIPS 3.6 similar to the config shown below
  2. Register a user with OpenSIPS from behind a NAT, so that it sends its private IP in Contact headers
  3. Start a call to the User
  4. Trigger a Re-INVITE from the Caller side
  5. Check the Routing of the ACK packet

Expected behavior
The second ACK packet should match the same nat_uac_test and trigger fix_nated_contact to fix the IP and port in the RURI of the proxied ACK packet.

Relevant System Logs
Unfortunately I did not save the logs from this example. Here are the relevant config parts where fix_nated_contact is called, in addition to the snippets above:

route{
    ...

    if(loose_route()){
        if (nat_uac_test("private-contact, diff-ip-src-contact, diff-port-src-contact")){
            xlog("L_INFO", "Loose routed, Contact fixed - LF_BASE");
            fix_nated_contact();
        } else {
            xlog("L_INFO", "Loose routed - LF_BASE");
        }

    ...

    if (is_method("INVITE") && ! has_totag()){
        ...
        if (nat_uac_test("private-contact, diff-ip-src-via, private-via, private-sdp, diff-port-src-via, diff-ip-src-contact, diff-port-src-contact")){
            xlog("L_INFO", "Invite, Fixing initial INVITE contact - LF_BASE");
            fix_nated_contact();
        } else {
            xlog("L_INFO", "Starting dialog, no NAT - LF_BASE");
        }
        ...
    }

    ...

    if (has_totag()){
        if (topology_hiding_match()){
            if (nat_uac_test("private-contact, diff-ip-src-via, private-via, private-sdp, diff-port-src-via, diff-ip-src-contact, diff-port-src-contact")){
                xlog("L_INFO", "Topology hidden, Contact fixed - LF_BASE");
                fix_nated_contact();
            }

    ...


onreply_route[RR_STANDARD]{

    if (! is_method("REGISTER") && nat_uac_test("private-contact, diff-ip-src-contact, diff-port-src-contact")){
        xlog("L_INFO", "Reply, Contact fixed - LF_REPLY");
        fix_nated_contact();
    }else{
        xlog("L_INFO", "Reply - LF_REPLY");
    }

}

OS/environment information

  • Operating System: Debian 12
  • OpenSIPS installation: apt Packages
  • other relevant information: We are running this Debian 12 as a VM on a VMware hypervisor
