Feature: Add assertion to test XPath filters against an allow-list for axes and functions #33

Merged
merged 10 commits into from
Jun 2, 2025
4 changes: 4 additions & 0 deletions src/Assert/Assert.php
@@ -9,20 +9,23 @@
/**
* @package simplesamlphp/xml-common
*
* @method static void validAllowedXPathFilter(mixed $value, array $allowed_axes, array $allowed_functions, string $message = '', string $exception = '')
* @method static void validHexBinary(mixed $value, string $message = '', string $exception = '')
* @method static void validNMToken(mixed $value, string $message = '', string $exception = '')
* @method static void validNMTokens(mixed $value, string $message = '', string $exception = '')
* @method static void validDuration(mixed $value, string $message = '', string $exception = '')
* @method static void validDateTime(mixed $value, string $message = '', string $exception = '')
* @method static void validNCName(mixed $value, string $message = '', string $exception = '')
* @method static void validQName(mixed $value, string $message = '', string $exception = '')
* @method static void nullOrValidAllowedXPathFilter(mixed $value, array $allowed_axes, array $allowed_functions, string $message = '', string $exception = '')
* @method static void nullOrValidHexBinary(mixed $value, string $message = '', string $exception = '')
* @method static void nullOrValidNMToken(mixed $value, string $message = '', string $exception = '')
* @method static void nullOrValidNMTokens(mixed $value, string $message = '', string $exception = '')
* @method static void nullOrValidDuration(mixed $value, string $message = '', string $exception = '')
* @method static void nullOrValidDateTime(mixed $value, string $message = '', string $exception = '')
* @method static void nullOrValidNCName(mixed $value, string $message = '', string $exception = '')
* @method static void nullOrValidQName(mixed $value, string $message = '', string $exception = '')
* @method static void allValidAllowedXPathFilter(mixed $value, array $allowed_axes, array $allowed_functions, string $message = '', string $exception = '')
* @method static void allValidHexBinary(mixed $value, string $message = '', string $exception = '')
* @method static void allValidNMToken(mixed $value, string $message = '', string $exception = '')
* @method static void allValidNMTokens(mixed $value, string $message = '', string $exception = '')
@@ -38,4 +41,5 @@ class Assert extends BaseAssert
use HexBinTrait;
use NamesTrait;
use TokensTrait;
use XPathFilterTrait;
}
79 changes: 79 additions & 0 deletions src/Assert/XPathFilterTrait.php
@@ -0,0 +1,79 @@
<?php

declare(strict_types=1);

namespace SimpleSAML\XML\Assert;

use InvalidArgumentException;
use SimpleSAML\Assert\Assert as BaseAssert;
use SimpleSAML\XML\Constants as C;
use SimpleSAML\XML\Exception\RuntimeException;
use SimpleSAML\XML\Utils\XPathFilter;

use function sprintf;

/**
* @package simplesamlphp/xml-common
*/
trait XPathFilterTrait
{
/***********************************************************************************
* NOTE: Custom assertions may be added below this line. *
* They SHOULD be marked as `private` to ensure the call is forced *
* through __callStatic(). *
* Assertions marked `public` are called directly and will *
* not handle any custom exception passed to it. *
***********************************************************************************/

/**
* Check an XPath expression for allowed axes and functions.
* The goal is to prevent DoS attacks by limiting the complexity of the XPath expression: only
* a select subset of functions and axes is allowed.
* The check uses a list of allowed functions and axes, and throws an exception when an unknown function
* or axis is found in the $xpathExpression.
*
* Limitations:
* - The implementation is based on regular expressions and does not employ an XPath 1.0 parser. It may not
* evaluate all possible valid XPath expressions correctly, causing either false positives for valid
* expressions or false negatives for invalid expressions.
* - The check may still allow expressions that are not safe, i.e. expressions that consist of only
* functions and axes that are deemed "safe", but that are still slow to evaluate. The time it takes to
* evaluate an XPath expression depends on the complexity of both the XPath expression and the XML document.
* This check, however, does not take the XML document into account, nor is it aware of the internals of the
* XPath processor that will evaluate the expression.
* - The check was written with the XPath 1.0 syntax in mind, but should work equally well for XPath 2.0 and 3.0.
*
* @param string $xpathExpression
* @param array<string> $allowedAxes
* @param array<string> $allowedFunctions
* @param string $message
*/
public static function validAllowedXPathFilter(
string $xpathExpression,
array $allowedAxes = C::DEFAULT_ALLOWED_AXES,
array $allowedFunctions = C::DEFAULT_ALLOWED_FUNCTIONS,
string $message = '',
): void {
BaseAssert::allString($allowedAxes);
BaseAssert::allString($allowedFunctions);
BaseAssert::maxLength(
$xpathExpression,
C::XPATH_FILTER_MAX_LENGTH,
sprintf('XPath Filter exceeds the limit of %d characters.', C::XPATH_FILTER_MAX_LENGTH),
);

try {
// First remove the contents of any string literals in $xpathExpression to prevent false positives
$xpathWithoutStringLiterals = XPathFilter::removeStringContents($xpathExpression);

// Then check that the xpath expression only contains allowed functions and axes, throws when it doesn't
XPathFilter::filterXPathFunction($xpathWithoutStringLiterals, $allowedFunctions);
XPathFilter::filterXPathAxis($xpathWithoutStringLiterals, $allowedAxes);
} catch (RuntimeException $e) {
throw new InvalidArgumentException(sprintf(
$message ?: $e->getMessage(),
$xpathExpression,
));
}
}
}
61 changes: 49 additions & 12 deletions src/Constants.php
@@ -37,18 +37,55 @@ class Constants
*/
public const XPATH10_URI = 'http://www.w3.org/TR/1999/REC-xpath-19991116';

/**
* The namespace for the XML Path Language 2.0
*/
public const XPATH20_URI = 'http://www.w3.org/TR/2010/REC-xpath20-20101214/';
/** @var array<string> */
public const DEFAULT_ALLOWED_AXES = [
'ancestor',
'ancestor-or-self',
'attribute',
'child',
'descendant',
'descendant-or-self',
'following',
'following-sibling',
// 'namespace', // By default, we do not allow using the namespace axis
'parent',
'preceding',
'preceding-sibling',
'self',
];

/**
* The namespace for the XML Path Language 3.0
*/
public const XPATH30_URI = 'https://www.w3.org/TR/2014/REC-xpath-30-20140408/';
/** @var array<string> */
public const DEFAULT_ALLOWED_FUNCTIONS = [
// 'boolean',
// 'ceiling',
// 'concat',
// 'contains',
// 'count',
// 'false',
// 'floor',
// 'id',
// 'lang',
// 'last',
// 'local-name',
// 'name',
// 'namespace-uri',
// 'normalize-space',
'not',
// 'number',
// 'position',
// 'round',
// 'starts-with',
// 'string',
// 'string-length',
// 'substring',
// 'substring-after',
// 'substring-before',
// 'sum',
// 'text',
// 'translate',
// 'true',
];

/**
* The namespace for the XML Path Language 3.1
*/
public const XPATH31_URI = 'https://www.w3.org/TR/2017/REC-xpath-31-20170321/';
/** @var int */
public const XPATH_FILTER_MAX_LENGTH = 100;
}
163 changes: 163 additions & 0 deletions src/Utils/XPathFilter.php
@@ -0,0 +1,163 @@
<?php

declare(strict_types=1);

namespace SimpleSAML\XML\Utils;

use SimpleSAML\XML\Exception\RuntimeException;

use function in_array;
use function preg_match_all;
use function preg_replace;

/**
* XPathFilter helper functions for the XML library.
*
* @package simplesamlphp/xml-common
*/
class XPathFilter
{
/**
* Remove the content from all single or double-quoted strings in $input, leaving only quotes.
*
* @param string $input
* @return string
* @throws \SimpleSAML\XML\Exception\RuntimeException
*/
public static function removeStringContents(string $input): string
{
/**
* This regex should not be vulnerable to ReDoS, because it uses possessive quantifiers
* (i.e. *+ and ++ instead of * and + respectively) that prevent backtracking.
*
* @see https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS
*
* '/(["\'])(?:(?!\1).)*+\1/'
* (["\']) # Match a single or double quote and capture it in group 1
* (?: # Start a non-capturing group
* (?! # Negative lookahead
* \1 # Match the same quote as in group 1
* ) # End of negative lookahead
* . # Match any character (that is not the captured quote, because of the negative lookahead)
* )*+ # Repeat the non-capturing group zero or more times, possessively
* \1 # Match the same quote as in group 1
*/
$res = preg_replace(
'/(["\'])(?:(?!\\1).)*+\\1/',
"\\1\\1", // Replace the content with two of the quotes that were matched
$input,
);

if (null === $res) {
throw new RuntimeException("Error in preg_replace");
}

return $res;
}


/**
* Check if the $xpathExpression uses an XPath function that is not in the list of allowed functions
*
* @param string $xpathExpression the expression to check. Should be a valid xpath expression
* @param string[] $allowedFunctions array of string with a list of allowed function names
* @throws \SimpleSAML\XML\Exception\RuntimeException
*/
public static function filterXPathFunction(string $xpathExpression, array $allowedFunctions): void
{
/**
* Look for the function specifier '(' and for a function name before it,
* ignoring whitespace between the function name and the '('.
* All functions must match a string on a list of allowed function names.
*/
$matches = [];
$res = preg_match_all(
/**
* Function names are lower-case alpha (i.e. [a-z]) and can contain one or more hyphens,
* but cannot start or end with a hyphen. To match this, we start with matching one or more
* lower-case alpha characters, followed by zero or more atomic groups that start with a hyphen
* and then match one or more lower-case alpha characters. This ensures that the function name
* cannot start or end with a hyphen, but can contain one or more hyphens.
* More than one consecutive hyphen does not match.
*
* Use possessive quantifiers (i.e. *+ and ++ instead of * and + respectively) to prevent backtracking
* and thus prevent a ReDoS.
*
* '/([a-z]++(?>-[a-z]++)*+)\s*+\(/'
* ( # Start a capturing group
* [a-z]++ # Match one or more lower-case alpha characters, possessively
* (?> # Start an atomic group (non-capturing)
* - # Match a hyphen
* [a-z]++ # Match one or more lower-case alpha characters, possessively
* )*+ # Repeat the atomic group zero or more times, possessively
* ) # End of the capturing group
* \s*+ # Match zero or more whitespace characters, possessively
* \( # Match an opening parenthesis
*/

'/([a-z]++(?>-[a-z]++)*+)\\s*+\\(/',
$xpathExpression,
$matches,
);

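// Fail closed: if the regex engine reports an error, do not treat the expression as valid
if ($res === false) {
throw new RuntimeException("Error in preg_match_all");
}

// Check that all the function names we found are in the list of allowed function names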
foreach ($matches[1] as $match) {
if (!in_array($match, $allowedFunctions)) {
throw new RuntimeException("Invalid function: '" . $match . "'");
}
}
}


/**
* Check if the $xpathExpression uses an XPath axis that is not in the list of allowed axes
*
* @param string $xpathExpression the expression to check. Should be a valid xpath expression
* @param string[] $allowedAxes array of string with a list of allowed axes names
* @throws \SimpleSAML\XML\Exception\RuntimeException
*/
public static function filterXPathAxis(string $xpathExpression, array $allowedAxes): void
{
/**
* Look for the axis specifier '::' and for an axis name before it,
* ignoring whitespace between the axis name and the '::'.
* All axes must match a string on a list of allowed axis names.
*/
$matches = [];
$res = preg_match_all(
/**
* We use the same rules for matching Axis names as we do for function names.
* The only difference is that we match the '::' instead of the '('
* so everything that was said about the regular expression for function names
* applies here as well.
*
* Use possessive quantifiers (i.e. *+ and ++ instead of * and + respectively) to prevent backtracking
* and thus prevent a ReDoS.
*
* '/([a-z]++(?>-[a-z]++)*+)\s*+::/'
* ( # Start a capturing group
* [a-z]++ # Match one or more lower-case alpha characters, possessively
* (?> # Start an atomic group (non-capturing)
* - # Match a hyphen
* [a-z]++ # Match one or more lower-case alpha characters, possessively
* )*+ # Repeat the atomic group zero or more times, possessively
* ) # End of the capturing group
* \s*+ # Match zero or more whitespace characters, possessively
* :: # Match the axis specifier '::'
*/

'/([a-z]++(?>-[a-z]++)*+)\\s*+::/',
$xpathExpression,
$matches,
);

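// Fail closed: if the regex engine reports an error, do not treat the expression as valid
if ($res === false) {
throw new RuntimeException("Error in preg_match_all");
}

// Check that all the axis names we found are in the list of allowed axis names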
foreach ($matches[1] as $match) {
if (!in_array($match, $allowedAxes)) {
throw new RuntimeException("Invalid axis: '" . $match . "'");
}
}
}
}
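
The three steps in XPathFilter (empty the string literals, check function names, check axis names) can be tried out in isolation with a small stand-alone sketch. The function name `findDisallowed` and its return convention (a message string, or `null` when nothing is rejected) are hypothetical and chosen only for illustration; the regular expressions are copied from the diff above, and the library's own classes are deliberately not used so the snippet runs on its own.

```php
<?php

// Hypothetical stand-alone helper that mirrors the checks in XPathFilter.
// It is an illustration only and not part of simplesamlphp/xml-common.
function findDisallowed(string $xpath, array $allowedAxes, array $allowedFunctions): ?string
{
    // Step 1: empty all string literals so their contents cannot trigger a match.
    $stripped = preg_replace('/(["\'])(?:(?!\\1).)*+\\1/', '\\1\\1', $xpath);
    if ($stripped === null) {
        throw new RuntimeException('Error in preg_replace');
    }

    // Step 2: every name directly followed by '(' is treated as a function call.
    if (preg_match_all('/([a-z]++(?>-[a-z]++)*+)\\s*+\\(/', $stripped, $functions) === false) {
        throw new RuntimeException('Error in preg_match_all');
    }
    foreach ($functions[1] as $name) {
        if (!in_array($name, $allowedFunctions, true)) {
            return "Invalid function: '" . $name . "'";
        }
    }

    // Step 3: every name directly followed by '::' is treated as an axis.
    if (preg_match_all('/([a-z]++(?>-[a-z]++)*+)\\s*+::/', $stripped, $axes) === false) {
        throw new RuntimeException('Error in preg_match_all');
    }
    foreach ($axes[1] as $name) {
        if (!in_array($name, $allowedAxes, true)) {
            return "Invalid axis: '" . $name . "'";
        }
    }

    return null;
}

// 'count(' appears only inside a string literal, so it is removed in step 1.
var_dump(findDisallowed("//a[not(@b='count(')]", ['child'], ['not']));

// A real call to count() is rejected when only 'not' is allowed.
var_dump(findDisallowed('//a[count(b) > 1]', ['child'], ['not']));

// The namespace axis is rejected when it is not on the list of allowed axes.
var_dump(findDisallowed('namespace::*', ['child'], ['not']));
```

Because the function check matches any lower-case name followed by `(`, node tests such as `text()` are also treated as functions by this approach and have to be on the allow-list to pass.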