Alexander Clouter edited this page Jun 12, 2024 · 23 revisions

Issues and Fixes 2

Because RFC5080 was not enough.

This Wiki is an "ad hoc" place-holder for issues that people find with RADIUS implementations.

Please request access via the IETF RADEXT WG mailing list.

State Machine Issues

  • The NAS repeatedly sends accounting start packets when the RADIUS server isn't available. The Start is declared failed after retransmission interval * retransmission count. Every 15 minutes the NAS then tries to resend the Start packet instead of switching to interims.

  • Multiple implementations set the Event-Timestamp to be the time that the packet was sent, instead of the time the event happened. This makes no sense. This means that (for example) if the NAS is retransmitting accounting start packets for 30 minutes, then the start of the session is when the packet is finally received, and not when the session started.

  • The Event-Timestamp should be when the event happened. The Acct-Delay-Time is the time between the event and the packet being sent.

Once we have Event-Timestamp, the Acct-Delay-Time becomes significantly less useful. See the accounting page for details.
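As a rough illustration of the Event-Timestamp / Acct-Delay-Time relationship described above, here is a minimal Python sketch. The function name `accounting_times` and the example timestamps are hypothetical and not taken from any implementation; the point is only that Event-Timestamp is fixed when the event happens, while Acct-Delay-Time is recomputed on every (re)transmission.

```python
import time


def accounting_times(event_time, now=None):
    """Return (Event-Timestamp, Acct-Delay-Time) for a packet sent at `now`.

    Hypothetical helper: `event_time` is when the event (e.g. session start)
    actually happened, in seconds since the epoch.
    """
    if now is None:
        now = int(time.time())
    # Event-Timestamp never changes, no matter how often we retransmit.
    event_timestamp = int(event_time)
    # Acct-Delay-Time is the gap between the event and this transmission.
    acct_delay_time = max(0, int(now) - event_timestamp)
    return event_timestamp, acct_delay_time


session_start = 1_700_000_000
first_try = accounting_times(session_start, now=session_start)
after_30_min = accounting_times(session_start, now=session_start + 1800)
```

In this sketch a Start that is still being retransmitted 30 minutes later carries the original Event-Timestamp and an Acct-Delay-Time of 1800, so the server can still recover the real session start.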

  • Acct-Delay-Time contains garbage values. It starts off at zero, and then seems to cycle through some fixed numbers, going up to millions of seconds. If you can't put a sane value into the field, then don't send it.

  • Acct-Input-Octets/Acct-Output-Octets have garbage values. Doing some math, the average packet size works out to 2.7MB.

  • When a Disconnect-Request is sent to the NAS, the re-authentication occurs several minutes after the packet has been received. The re-authentication is driven by DHCP traffic, so this is perhaps not terrible. What is not expected is that the accounting start packet is then delayed by several minutes after the Access-Accept.

  • The NAS uses a global timer to send Interim-Updates. This means that it sends updates for every subscriber at the same time instead of staggering them: tens of thousands of interims arrive at once, and then nothing for 15 minutes. This is effectively a DoS attack. The timers should start when the session starts, and the packets should be sent with jitter as per RFC 5080.

  • A different User-Name value is used for Access-Requests and Accounting-Requests, so it is not possible to tie them together. Reading RFC 2866 etc. shows that this behavior isn't forbidden, likely because no one thought an implementation would be this bizarre.

  • The NAS-Identifier contains a copy of the NAS-Port-Id, which means that there is no real NAS-Identifier sent in Access-Requests.

  • NAS-Port-Id values in Access-Requests and Accounting-Requests are different.

  • User-Name in CoA packets is truncated at 43 characters.

  • Sending Class or User-Name in Access-Accepts results in authentication failing.

  • As a result, Class and User-Name cannot be sent in the Access-Accept, and so are never echoed in Accounting-Requests.

  • A RADIUS dynauth server returned a NAK when Message-Authenticator was present in a dynauth request. The attribute is not required, but these kinds of small variations make it hard to create dynauth messages in multivendor systems.

  • Incorrect use of zero-length (empty) attributes. For example, clients that support RFC 7268 EAP-Key-Name and RFC 4372 Chargeable-User-Identity must use a single all-zero bits octet to request action from the server. Some clients use a zero-length value instead. This is forbidden by the respective RFCs and by the RFC 2865 string type definition.

  • When should the NAS send the accounting start? RFC 2866 doesn't define any way to know when a session starts. It should probably be as soon as the Access-Accept is received.
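For the global Interim-Update timer issue listed above, a minimal sketch of per-session timers with jitter might look like the following. The function name `interim_schedule` and the 10% jitter fraction are assumptions made for illustration, not values taken from RFC 5080 or from any NAS.

```python
import random


def interim_schedule(session_start, interval, count, jitter_fraction=0.1, rng=None):
    """Return `count` Interim-Update send times for one session.

    Hypothetical helper: the timer is anchored to the session's own start
    time, and each interval is stretched or shrunk by a random jitter.
    """
    rng = rng or random.Random()
    times = []
    t = session_start
    for _ in range(count):
        # Jitter is a random fraction of the interval, positive or negative.
        jitter = rng.uniform(-jitter_fraction, jitter_fraction) * interval
        t = t + interval + jitter
        times.append(t)
    return times


# Two sessions that start at different times get unrelated schedules.
schedule_a = interim_schedule(1000.0, 900, 3, rng=random.Random(1))
schedule_b = interim_schedule(1437.0, 900, 3, rng=random.Random(2))
```

Because each schedule depends on the session start time and on its own random jitter, the updates for many sessions spread out over the interval instead of arriving in one burst.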

Reject delays

Rejects should be delayed, so that badly behaving clients can't hammer servers:

  • supplicant is configured with a "perfectly valid" client certificate and proper config

  • CA revokes client certificate

  • supplicant has a known-good config, attempts to authenticate, server checks CRL, rejects

  • supplicant has a known-good config, attempts to authenticate, server checks CRL, rejects

  • supplicant has a known-good config, attempts to authenticate, server checks CRL, rejects

  • supplicant has a known-good config, attempts to authenticate, server checks CRL, rejects

  • ...

So here it's not about retransmits at all. It's one end saying "I know I'm right" with the other saying "I know you're wrong" and then they keep yelling that at line speed.
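One way a server could delay rejects is to hold each Access-Reject in a queue and release it only after a fixed delay. The sketch below is illustrative only: the class `RejectDelayQueue` and the one-second default are assumptions, and time is passed in explicitly so the behaviour is easy to follow.

```python
import heapq


class RejectDelayQueue:
    """Hold Access-Rejects and release them only after a fixed delay (sketch)."""

    def __init__(self, delay_seconds=1.0):
        self.delay = delay_seconds
        self._heap = []   # entries are (release_time, sequence, reject)
        self._seq = 0     # tie-breaker so rejects are never compared directly

    def hold(self, reject, now):
        """Queue a reject that was generated at time `now`."""
        heapq.heappush(self._heap, (now + self.delay, self._seq, reject))
        self._seq += 1

    def due(self, now):
        """Return the rejects whose delay has expired, oldest first."""
        released = []
        while self._heap and self._heap[0][0] <= now:
            released.append(heapq.heappop(self._heap)[2])
        return released


queue = RejectDelayQueue(delay_seconds=1.0)
queue.hold("reject-for-request-1", now=0.0)
queue.hold("reject-for-request-2", now=0.4)
```

A supplicant that retries the instant it sees a reject is then limited to roughly one attempt per delay period, rather than looping at line speed.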

'Dead' Servers

A single request/response timing out should not result in the upstream being marked completely dead:

"be aware that the marks the RADIUS server as dead for the Dead Time duration if a user does not respond to an MFA challenge."

Status-Server (RFC 5997, section 4.3) is well suited to detecting upstream 'aliveness'. Otherwise, use a method where the upstream is marked dead only if all requests time out during a given time frame.
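A minimal sketch of the "all requests time out during a time frame" approach follows. The class `UpstreamHealth`, the 30-second window and the three-request minimum are assumptions chosen for illustration. Any response, including a reply to a Status-Server probe, counts as proof of life and clears the timeout history.

```python
class UpstreamHealth:
    """Mark an upstream dead only after several timeouts and no responses (sketch)."""

    def __init__(self, window=30.0, min_requests=3):
        self.window = window              # time frame in seconds
        self.min_requests = min_requests  # timeouts needed before marking dead
        self.timeouts = []                # timeout times since the last response

    def on_response(self, now):
        """Any response (including to Status-Server) proves the upstream is alive."""
        self.timeouts.clear()

    def on_timeout(self, now):
        """Record that a request sent to the upstream timed out."""
        self.timeouts.append(now)

    def is_dead(self, now):
        """Dead only if enough requests timed out within the window, with no response."""
        self.timeouts = [t for t in self.timeouts if now - t <= self.window]
        return len(self.timeouts) >= self.min_requests


health = UpstreamHealth(window=30.0, min_requests=3)
health.on_timeout(0.0)            # e.g. a user ignoring an MFA challenge
single_timeout_dead = health.is_dead(1.0)
health.on_timeout(5.0)
health.on_response(6.0)           # another request was answered
health.on_timeout(7.0)
health.on_timeout(8.0)
health.on_timeout(9.0)
all_timeouts_dead = health.is_dead(10.0)
```

With this approach a single unanswered request, such as a user who never responds to an MFA challenge, does not take the upstream out of service.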

Session probe packets

CoA variant for checking or retrieving session information

Diameter allows the state of an active session to be retrieved from the NAS. It would be useful to have similar capability for RADIUS.

This could be a simple CoA-Request with a special Service-Type (similar to Authenticate-Only), where the session doesn't terminate and no new session attributes are applied, but where the CoA-ACK (assuming the session exists) returns attributes describing the session.

There could be different levels of implementation. At a basic level a CoA-ACK comes back with no attributes, and the full implementation would contain session attributes.

A non-standardised workaround is to include all session attributes in Accounting-Requests. When this is used for timely reconciliation, however, interim-updates need to be sent at a greater frequency than would otherwise be required.

Use cases

  • Session and BSS/OSS reconciliation, i.e. make sure the subscriber is receiving the service they should.
  • Stale session detection. Where only a single session per set of credentials is allowed, this allows accounting sessions to be forcefully closed in the accounting database if the CoA-Request is sent to active sessions and receives a NAK. Disconnect-Request could be used for this, but we don't always want to disconnect the existing session, especially as this can lead to rapid session flip/flop in the case of a genuine instance of credential sharing.

Bulk Accounting-Request transfer

Similar to DHCPv4 lease transfer.

We could either piggyback on top of CoA, define a new packet type, or simply send an Accounting-On/Off to the NAS. On receipt of one of these trigger packets, the NAS would send an interim-update for all active sessions.

We could potentially define filtering attributes.

Use cases

  • Migrating RADIUS servers or databases
  • Conveying network equipment between ISPs
  • Recovering rapidly after service disruption where interims aren't used

Protocol Design

There are some "inventive" uses of RADIUS. One example is using RADIUS as a transport for device authentication. Earlier messages in that thread describe the design in a little more detail.

As best can be understood reading between the lines:

  • each device has network access, and is configured:
    • with a static password (same for all devices)
    • to do EAP-TTLS (or PEAP with MS-CHAPv2)
    • with ??? for the RADIUS shared secret
    • to do RADIUS to ??? for "device authentication"

The claim is that RADIUS is nothing more than "an extremely awkward transport mechanism for EAP-xTLS, with user = "anonymous" and password = some widely-known dummy value at the RADIUS level so there's no security there to begin with," and that RADIUS is being used "purely as a transport mechanism for something else."

Such use-cases are firmly outside of the scope of RADIUS. For one, what does it mean when an unauthenticated device does RADIUS, and gets an Access-Accept? Is there some other network element which snoops that traffic, and then gives the device larger permissions / access based on the Access-Accept?

It's difficult to understand what the actual design is, because the description is vague and confused. Reading between the lines, it appears (perhaps as a guess) that RADIUS is being used as a home-grown alternative to PANA, perhaps because PANA was never really deployed. For example, there is an OpenPANA on GitHub, but there have been no code changes in 12+ years.

In any case, systems MUST NOT use RADIUS with fixed user credentials that are shared across multiple machines. Systems MUST NOT use RADIUS as a transport mechanism for other protocols. Random systems on the network MUST NOT be configured to be RADIUS clients.

Offsite Links

https://radiatorsoftware.com/wp-content/uploads/rs-wifi-roaming-security-and-privacy-2023-05-23.pdf

https://datatracker.ietf.org/meeting/118/materials/slides-118-madinas-hackathon-openroaming-update-00
