Description
OpenSIPS version you are running
version: opensips 3.6.3 (x86_64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, HP_MALLOC, F_PARALLEL_MALLOC, DBG_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll, sigio_rt, select.
git revision: d5222226a
main.c compiled on with gcc 12
Describe the bug
We recently updated our OpenSIPS servers from 2.2 to 3.6. Without any change to the logic of the OpenSIPS config, the routing behavior of ACK packets after re-INVITEs seems to have changed for clients that are behind NAT and do not use a STUN server.
First off, here is the general setup and an example flow with the SIP headers of the 4 ACK packets:
We have 3 OpenSIPS instances between the Provider and the End-User:
Provider <--> OpenSIPS "Carrier" <--> OpenSIPS "Core" <--> OpenSIPS "Consumer" <--> End-User
Every instance has its jobs, but in this issue we are only looking at the communication between the "Core", "Consumer" and the End-User.
Here is the Flow of an example Call having the Problem:
First ACK from Core to Consumer:
Src: CoreIP:5060, Dst: ConsumerIP:5060
Request-Line: ACK sip:CalleeNum@ConsumerIP;rdlg=f41.4b0b0d81 SIP/2.0
Via: SIP/2.0/UDP CoreIP:5060;branch=z9hG4bK4788.46908034.2
Call-ID: Core2_ExwGCwAcIRksagAuN1VgFj0oM3UBLTQzRhI2HlEDBzk0DWMGLTYGdQo9IDNFBTYmK3QbBwoNYwUfFCdqdiM-
From: "+39CallerNum" <sip:0039CallerNum@domain.it>;tag=cf479e78.Acc134.B2b1271.DbjgqPYNnasBcMet5LT7sF086
To: <sip:CalleeNum@domain.it>;tag=31aebc9bd42dbb07
CSeq: 1 ACK
Max-Forwards: 30
Content-Length: 0
P-hint: rr-enforced
P-Correlation-ID: f1cf3d607f02a5b2-Acc134-B2b1271
ITL-RR: AcCaPl0134.AcCpNo0206.AcCiSt01271.DbjgqPYNnasBcMet5LT7sF086.DiAgOuT00.Outgoing.ModB2B
First ACK being proxied to UAC:
Src: ConsumerIP:5060, Dst: 195.254.X.Y:32820
Request-Line: ACK sip:CalleeNum@195.254.X.Y:32820;transport=udp SIP/2.0
Via: SIP/2.0/UDP ConsumerIP:5060;branch=z9hG4bK681a.570b28d.2
Call-ID: Core2_ExwGCwAcIRksagAuN1VgFj0oM3UBLTQzRhI2HlEDBzk0DWMGLTYGdQo9IDNFBTYmK3QbBwoNYwUfFCdqdiM-
From: "+39CallerNum" <sip:0039CallerNum@domain.it>;tag=cf479e78.Acc134.B2b1271.DbjgqPYNnasBcMet5LT7sF086
To: <sip:CalleeNum@domain.it>;tag=31aebc9bd42dbb07
CSeq: 1 ACK
Max-Forwards: 30
Content-Length: 0
P-hint: rr-enforced
ITL-RR: AcCaPl0134.AcCpNo0206.AcCiSt01271.DbjgqPYNnasBcMet5LT7sF086.DiAgOuT00.Outgoing.ModB2B
Second ACK from Core to Consumer after Re-INVITE:
Src: CoreIP:5060, Dst: ConsumerIP:5060
Request-Line: ACK sip:CalleeNum@ConsumerIP;rdlg=f41.4b0b0d81 SIP/2.0
Via: SIP/2.0/UDP CoreIP:5060;branch=z9hG4bK1788.4e62f6e1.2
Call-ID: Core2_ExwGCwAcIRksagAuN1VgFj0oM3UBLTQzRhI2HlEDBzk0DWMGLTYGdQo9IDNFBTYmK3QbBwoNYwUfFCdqdiM-
From: "+39CallerNum" <sip:0039CallerNum@domain.it>;tag=cf479e78.Acc134.B2b1271.DbjgqPYNnasBcMet5LT7sF086
To: <sip:CalleeNum@domain.it>;tag=31aebc9bd42dbb07
CSeq: 2 ACK
Max-Forwards: 30
Content-Length: 0
P-hint: rr-enforced
P-Correlation-ID: f1cf3d607f02a5b2-Acc134-B2b1271
ITL-RR: AcCaPl0134.AcCpNo0206.AcCiSt01271.DbjgqPYNnasBcMet5LT7sF086.DiAgOuT00.Outgoing.ModB2B
Second ACK being proxied to UAC (to the private IP instead of the public one):
Src: ConsumerIP:5060, Dst: 192.168.178.2:5060
Request-Line: ACK sip:CalleeNum@192.168.178.2:5060;transport=udp SIP/2.0
Via: SIP/2.0/UDP ConsumerIP:5060;branch=z9hG4bK381a.85652b24.2
Call-ID: Core2_ExwGCwAcIRksagAuN1VgFj0oM3UBLTQzRhI2HlEDBzk0DWMGLTYGdQo9IDNFBTYmK3QbBwoNYwUfFCdqdiM-
From: "+39CallerNum" <sip:0039CallerNum@domain.it>;tag=cf479e78.Acc134.B2b1271.DbjgqPYNnasBcMet5LT7sF086
To: <sip:CalleeNum@domain.it>;tag=31aebc9bd42dbb07
CSeq: 2 ACK
Max-Forwards: 30
Content-Length: 0
P-hint: rr-enforced
ITL-RR: AcCaPl0134.AcCpNo0206.AcCiSt01271.DbjgqPYNnasBcMet5LT7sF086.DiAgOuT00.Outgoing.ModB2B
We can see that in all Contact headers in the responses from the UAC (195.254.X.Y), the Contact IP was 192.168.178.2. In theory, this is where new requests (like the ACK) should be sent. Obviously, in this case the UAC does not know its public IP and sets 192.168.178.2 (which we cannot reach), so upon arrival of the first ACK, fix_nated_contact fixes the address and port of the RURI and sets them to the public ones, where the packet actually came from.
But why won't OpenSIPS do the same for the second ACK? Everything should be the same: the two ACK packets the "Consumer" receives from the "Core" are basically identical.
Here is my assumption on where the problem lies (on the "Consumer"):
It seems that the matching behavior of the nat_uac_test function has changed or is buggy, or maybe our config is wrong.
In OpenSIPS 2.2, in the following part of the config, nat_uac_test would match and thus call fix_nated_contact() on every incoming subsequent ACK request. With this version, the end user never experienced any problems.
# Consumer OpenSIPS 2.2:
route{
    ...
    if (has_totag()) {
        if (topology_hiding_match()) {
            if (nat_uac_test("127")) {
                xlog("L_INFO", "Topology hidden, Contact fixed - LF_BASE");
                fix_nated_contact();
            }
            ...
In OpenSIPS 3.6, nat_uac_test no longer uses the bit mask. We pass the comma-separated flags that correspond to the former "127", but on the second ACK request the function does not match and thus does not fix the RURI, which leads to ACK routing problems.
Looking at the snippet below: since "Topology hidden, Got ACK - LF_BASE" is printed, we know that both has_totag and topology_hiding_match match. "Topology hidden, Contact fixed - LF_BASE" is not printed, so we know that fix_nated_contact is not executed (and we see the second ACK being sent to the wrong IP).
# Consumer OpenSIPS 3.6:
route{
    ...
    if (has_totag()){
        if (topology_hiding_match()){
            if (nat_uac_test("private-contact, diff-ip-src-via, private-via, private-sdp, diff-port-src-via, diff-ip-src-contact, diff-port-src-contact")){
                xlog("L_INFO", "Topology hidden, Contact fixed - LF_BASE");
                fix_nated_contact();
            }
            if (is_method("ACK")){
                xlog("L_INFO", "Topology hidden, Got ACK - LF_BASE");
            }
            ...
In our setup, this leads to ACK routing problems with some clients that are behind NAT and do not use a STUN server, and it causes dropped calls after a certain amount of time.
Is nat_uac_test really bugged, or are we handling something wrong?
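One way to narrow this down would be to run nat_uac_test once per flag on the in-dialog request and log which individual tests match, together with the source address, RURI and Contact that the Consumer sees. The following is only a diagnostic sketch, not a fix: the route name NAT_FLAG_DEBUG and the log texts are made up for illustration, while the flag names are the ones already used in the config above. The old bit values in the comments are what the former mask 127 (1+2+4+8+16+32+64) was translated from.

```
# Hypothetical helper route (the name is an assumption). It could be called
# as route(NAT_FLAG_DEBUG); right after topology_hiding_match() succeeds.
route[NAT_FLAG_DEBUG] {
    # What this instance sees for the request currently being processed
    xlog("L_INFO", "NAT debug: $rm cseq=$cs src_ip=$si src_port=$sp ruri=$ru contact=$ct\n");

    if (nat_uac_test("private-contact")) {          # old bit 1
        xlog("L_INFO", "NAT debug: private-contact matched\n");
    }
    if (nat_uac_test("diff-ip-src-via")) {          # old bit 2
        xlog("L_INFO", "NAT debug: diff-ip-src-via matched\n");
    }
    if (nat_uac_test("private-via")) {              # old bit 4
        xlog("L_INFO", "NAT debug: private-via matched\n");
    }
    if (nat_uac_test("private-sdp")) {              # old bit 8
        xlog("L_INFO", "NAT debug: private-sdp matched\n");
    }
    if (nat_uac_test("diff-port-src-via")) {        # old bit 16
        xlog("L_INFO", "NAT debug: diff-port-src-via matched\n");
    }
    if (nat_uac_test("diff-ip-src-contact")) {      # old bit 32
        xlog("L_INFO", "NAT debug: diff-ip-src-contact matched\n");
    }
    if (nat_uac_test("diff-port-src-contact")) {    # old bit 64
        xlog("L_INFO", "NAT debug: diff-port-src-contact matched\n");
    }
}
```

Comparing this output for the first and the second ACK would show whether any of the seven tests behaves differently between the two. Note that both ACKs arrive from CoreIP:5060 with a Via of CoreIP:5060, and the ACKs shown above carry no Contact header, so it is worth checking which tests can match on them at all.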
To Reproduce
- Configure OpenSIPS 3.6 similarly to the config shown below
- Register a user to OpenSIPS from behind a NAT, so that it sends its private IP in the Contact headers
- Start a call to the User
- Trigger a Re-INVITE from the Caller side
- Check the Routing of the ACK packet
Expected behavior
The second ACK packet should match the same nat_uac_test check as the first one and trigger fix_nated_contact to fix the IP and port in the RURI of the proxied ACK packet.
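To check this on a test call, it may also help to log what the Consumer sees on the replies to the re-INVITE, since that is where the private Contact of the UAC arrives. The following is a logging-only sketch under that assumption: the route name REPLY_NAT_DEBUG is hypothetical, and it would be called as route(REPLY_NAT_DEBUG); at the top of the existing reply route, before the NAT check.

```
# Hypothetical helper route (the name is an assumption), logging only.
route[REPLY_NAT_DEBUG] {
    # $rs = reply status, $rm = method taken from the CSeq header,
    # $si/$sp = address and port the reply was actually received from,
    # $ct = Contact header as sent by the UAC
    xlog("L_INFO", "Reply debug: $rs for $rm cseq=$cs call-id=$ci src_ip=$si src_port=$sp contact=$ct\n");
}
```

With the Call-ID and CSeq in the log line, the reply to the re-INVITE can be matched to the ACK that follows it.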
Relevant System Logs
Unfortunately, I did not save the logs from this example. Here are the relevant config parts where fix_nated_contact is called, in addition to the snippets above:
route{
    ...
    if(loose_route()){
        if (nat_uac_test("private-contact, diff-ip-src-contact, diff-port-src-contact")){
            xlog("L_INFO", "Loose routed, Contact fixed - LF_BASE");
            fix_nated_contact();
        } else {
            xlog("L_INFO", "Loose routed - LF_BASE");
        }
        ...
    if (is_method("INVITE") && ! has_totag()){
        ...
        if (nat_uac_test("private-contact, diff-ip-src-via, private-via, private-sdp, diff-port-src-via, diff-ip-src-contact, diff-port-src-contact")){
            xlog("L_INFO", "Invite, Fixing initial INVITE contact - LF_BASE");
            fix_nated_contact();
        } else {
            xlog("L_INFO", "Starting dialog, no NAT - LF_BASE");
        }
        ...
    }
    ...
    if (has_totag()){
        if (topology_hiding_match()){
            if (nat_uac_test("private-contact, diff-ip-src-via, private-via, private-sdp, diff-port-src-via, diff-ip-src-contact, diff-port-src-contact")){
                xlog("L_INFO", "Topology hidden, Contact fixed - LF_BASE");
                fix_nated_contact();
            }
            ...
onreply_route[RR_STANDARD]{
    if (! is_method("REGISTER") && nat_uac_test("private-contact, diff-ip-src-contact, diff-port-src-contact")){
        xlog("L_INFO", "Reply, Contact fixed - LF_REPLY");
        fix_nated_contact();
    }else{
        xlog("L_INFO", "Reply - LF_REPLY");
    }
}
OS/environment information
- Operating System: Debian 12
- OpenSIPS installation: apt Packages
- other relevant information: Debian 12 is running as a VM on a VMware hypervisor
Additional context