Stupid Mistakes – New UPF and IMS

Our team recently shipped a new UPF which is a huge improvement on our old UPF, and I drew the short straw of doing all the interop testing for the IMS.

Initially I thought there was an issue with IP routing, as I never saw the SIP REGISTER from the UE, even though I could see the IMS APN coming up.

I could access the internet from the UE IPs just fine, but that’s going to public IPs, whereas the P-CSCF is in private address space, and hosted on the same box as the UPF.

I spent hours on this as my lab servers do routing on a stick, and I thought some hardware offload somewhere was trying to fast path my packets and send them back to the server without going via the router.

Then I dug a little deeper and found I could see the 3 way handshake between the UE and the P-CSCF, but no SIP packets.

Successful 3 way handshake between the UE and the P-CSCF on TCP 5060

This was confusing, clearly we had at least intermittent two way comms – the 3 way TCP handshake confirmed that, but then why were packets not getting across?

We have an XCAP server hosted on our P-CSCF instances, so I tried hitting that from the phone in case there was something weird about routing to the network segment that hosts the P-CSCF. I could hit the XCAP server just fine, so now I was certain the UE IP pool could route to the P-CSCF, the TCP 3 way handshake was working, and payload could be pushed.

Clearly we can route to the P-CSCF as that’s where this XCAP server is hosted

Then I dug into what happened after the 3 way handshake, and I found a TCP payload containing the start of the SIP REGISTER.

Hmm, we have a SIP Fragment here at least…

I traced it all the way through and lo, it’s hitting the P-CSCF:

And the fragment is received on the P-CSCF

Okay, but then what happens, because it’s only a fragment, not the complete re-assembled packet, so what’s going on?

Well, the P-CSCF sends a TCP ACK back to the UE.

And the TCP fragment containing the first part of the REGISTER gets an ACK back from the P-CSCF

The ACK gets forwarded to the UPF:

And that TCP ACK makes it to the UPF

And then… Nothing? The UPF never encaps the TCP ACK back into GTP-U and never sends it on to the base station.

Eventually the UE re-sends the payload with the start of the REGISTER, but it does not get the ACK from the P-CSCF.

Retransmitted TCP segment containing the REGISTER from the UE

So naughty UPF right? Not forwarding that ACK for some reason?

I started digging, maybe the ACK was getting routed weirdly and landing on the UPF without going through the router?

Well not quite…

When I started digging into the QER rules being installed, I noticed the MBR we had on the IMS APN in the HSS was tiny.

The UPF can only gate traffic heading to the UE, so it was gating the ACK: the bandwidth allowed by the QER had already been used up, so the ACK never made it back.
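To make the failure mode a bit more concrete, here is a rough sketch of how a tiny MBR can swallow a downlink ACK. It is a toy token-bucket policer, not how our UPF actually enforces a QER. The class name, the 1 kbps rate, the 100 byte bucket and the packet sizes and timings are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class MbrPolicer:
    """Toy token-bucket policer standing in for a QER's downlink MBR."""
    rate_bps: int            # hypothetical MBR, in bits per second
    burst_bytes: int         # hypothetical bucket depth, in bytes
    tokens: float = 0.0      # bytes currently available
    last_time: float = 0.0   # timestamp of the previous packet, in seconds

    def __post_init__(self) -> None:
        # Start with a full bucket so the first packet can get through.
        self.tokens = float(self.burst_bytes)

    def allow(self, now: float, size_bytes: int) -> bool:
        """Return True if the packet is forwarded, False if it is dropped."""
        elapsed = now - self.last_time
        self.last_time = now
        # Refill at rate_bps / 8 bytes per second, capped at the bucket depth.
        refill = elapsed * self.rate_bps / 8
        self.tokens = min(float(self.burst_bytes), self.tokens + refill)
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False


def simulate(policer, packets):
    """Run (time, name, size) downlink packets through the policer."""
    return [(name, policer.allow(t, size)) for t, name, size in packets]


# Made-up downlink packets heading towards the UE: (seconds, label, bytes).
packets = [
    (0.00, "SYN-ACK", 60),
    (0.05, "ACK", 52),
    (1.00, "ACK (retry)", 52),
]

for name, passed in simulate(MbrPolicer(rate_bps=1000, burst_bytes=100), packets):
    print(f"{name}: {'forwarded' if passed else 'dropped'}")
```

With these made-up numbers the SYN-ACK fits in the bucket, the ACK that follows 50 ms later does not, and a packet of the same size a second later gets through once the bucket has refilled. That is roughly the shape of what I saw: the handshake completes, then the ACK quietly disappears.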

Time wasted – About 4 hours, but I will not make this mistake again!
