
Commit 00b035c

Make more difficult sanitize of the expression string before eval
1 parent 4b2d89c commit 00b035c


5 files changed (+91 -23 lines changed)


ANNOUNCE.rst (+21 -2)

@@ -4,7 +4,10 @@ Announcing NumExpr 2.8.5
 
 Hi everyone,
 
-**Under development.**
+In 2.8.5 we have added a new function, `validate` which checks an expression `ex`
+for validity, for usage where the program is parsing a user input. There are also
+consequences for this sort of usage, since `eval(ex)` is called, and as such we
+do some string sanitization as described below.
 
 Project documentation is available at:
 
@@ -13,7 +16,23 @@ http://numexpr.readthedocs.io/
 Changes from 2.8.4 to 2.8.5
 ---------------------------
 
-**Under development.**
+* A `validate` function has been added. This function checks the inputs, returning
+  `None` on success or raising an exception on invalid inputs. This function was
+  added as numerous projects seem to be using NumExpr for parsing user inputs.
+  `re_evaluate` may be called directly following `validate`.
+* As an addendum to the use of NumExpr for parsing user inputs, is that NumExpr
+  calls `eval` on the inputs. A regular expression is now applied to help sanitize
+  the input expression string, forbidding '__', ':', and ';'. Attribute access
+  is also banned except for '.r' for real and '.i' for imag.
+* Thanks to timbrist for a fix to behavior of NumExpr with integers to negative
+  powers. NumExpr was pre-checking integer powers for negative values, which
+  was both inefficient and causing parsing errors in some situations. Now NumExpr
+  will simply return 0 as a result for such cases. While NumExpr generally tries
+  to follow NumPy behavior, performance is also critical.
+* Thanks to peadar for some fixes to how NumExpr launches threads for embedded
+  applications.
+* Thanks to de11n for making parsing of the `site.cfg` for MKL consistent among
+  all shared platforms.
 
 
 What's Numexpr?

RELEASE_NOTES.rst (+18 -1)

@@ -5,7 +5,24 @@ Release notes for NumExpr 2.8 series
 Changes from 2.8.4 to 2.8.5
 ---------------------------
 
-**Under development.**
+* A `validate` function has been added. This function checks the inputs, returning
+  `None` on success or raising an exception on invalid inputs. This function was
+  added as numerous projects seem to be using NumExpr for parsing user inputs.
+  `re_evaluate` may be called directly following `validate`.
+* As an addendum to the use of NumExpr for parsing user inputs, is that NumExpr
+  calls `eval` on the inputs. A regular expression is now applied to help sanitize
+  the input expression string, forbidding '__', ':', and ';'. Attribute access
+  is also banned except for '.r' for real and '.i' for imag.
+* Thanks to timbrist for a fix to behavior of NumExpr with integers to negative
+  powers. NumExpr was pre-checking integer powers for negative values, which
+  was both inefficient and causing parsing errors in some situations. Now NumExpr
+  will simply return 0 as a result for such cases. While NumExpr generally tries
+  to follow NumPy behavior, performance is also critical.
+* Thanks to peadar for some fixes to how NumExpr launches threads for embedded
+  applications.
+* Thanks to de11n for making parsing of the `site.cfg` for MKL consistent among
+  all shared platforms.
+
 
 Changes from 2.8.3 to 2.8.4
 ---------------------------

doc/user_guide.rst (+17 -10)

@@ -1,7 +1,7 @@
-NumExpr 2.0 User Guide
+NumExpr 2.8 User Guide
 ======================
 
-The :code:`numexpr` package supplies routines for the fast evaluation of
+The NumExpr package supplies routines for the fast evaluation of
 array expressions elementwise by using a vector-based virtual
 machine.
 
@@ -11,23 +11,33 @@ Using it is simple::
     >>> import numexpr as ne
     >>> a = np.arange(10)
     >>> b = np.arange(0, 20, 2)
-    >>> c = ne.evaluate("2*a+3*b")
+    >>> c = ne.evaluate('2*a + 3*b')
     >>> c
     array([ 0, 8, 16, 24, 32, 40, 48, 56, 64, 72])
 
 
+It is also possible to use NumExpr to validate an expression::
+
+    >>> ne.validate('2*a + 3*b')
+
+which returns `None` on success or raises an exception on invalid inputs.
+
+and it can also re_evaluate an expression::
+
+    >>> b = np.arange(0, 40, 4)
+    >>> ne.re_evaluate()
+
 Building
 --------
 
-*NumExpr* requires Python_ 2.6 or greater, and NumPy_ 1.7 or greater. It is
+*NumExpr* requires Python_ 3.7 or greater, and NumPy_ 1.13 or greater. It is
 built in the standard Python way:
 
 .. code-block:: bash
 
-    $ python setup.py build
-    $ python setup.py install
+    $ pip install .
 
-You must have a C-compiler (i.e. MSVC on Windows and GCC on Linux) installed.
+You must have a C-compiler (i.e. MSVC Build tools on Windows and GCC on Linux) installed.
 
 Then change to a directory that is not the repository directory (e.g. `/tmp`) and
 test :code:`numexpr` with:
@@ -268,9 +278,6 @@ General routines
 * :code:`detect_number_of_cores()`: Detects the number of cores on a system.
 
 
-
-
-
 Intel's VML specific support routines
 -------------------------------------
 

numexpr/necompiler.py (+19 -8)

@@ -260,15 +260,17 @@ def __init__(self, astnode):
     def __str__(self):
         return 'Immediate(%d)' % (self.node.value,)
 
-_forbidden_re = re.compile('[\;[\:]|__')
+
+_forbidden_re = re.compile('[\;[\:]|__|\.[abcdefghjklmnopqstuvwxyzA-Z_]')
 def stringToExpression(s, types, context):
     """Given a string, convert it to a tree of ExpressionNode's.
     """
     # sanitize the string for obvious attack vectors that NumExpr cannot
     # parse into its homebrew AST. This is to protect the call to `eval` below.
-    # We forbid `;`, `:`. `[` and `__`
-    # We would like to forbid `.` but it is both a reference and decimal point.
-    if _forbidden_re.search(s) is not None:
+    # We forbid `;`, `:`. `[` and `__`, and attribute access via '.'.
+    # We cannot ban `.real` or `.imag` however...
+    no_whitespace = re.sub(r'\s+', '', s)
+    if _forbidden_re.search(no_whitespace) is not None:
         raise ValueError(f'Expression {s} has forbidden control characters.')
 
     old_ctx = expressions._context.get_current_context()
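The new sanitizer strips all whitespace and then searches for `;`, `:`, `[`, a double underscore, or a `.` followed by any ASCII letter or underscore except `i` and `r`. The standalone sketch below copies that pattern (as a raw string) into a hypothetical `is_forbidden` helper so the check can be exercised without NumExpr installed; it models the check in `stringToExpression` and is not NumExpr's own code.

```python
import re

# Same pattern as the new `_forbidden_re` in the hunk above, written as a raw string.
# The letter class after `\.` omits only `i` and `r`, so `.imag` and `.real` survive.
FORBIDDEN_RE = re.compile(r'[;[:]|__|\.[abcdefghjklmnopqstuvwxyzA-Z_]')


def is_forbidden(expr: str) -> bool:
    """Hypothetical helper mirroring the check in stringToExpression."""
    # Whitespace is removed first, so 'os. cpu_count()' cannot slip past the `\.` rule.
    no_whitespace = re.sub(r'\s+', '', expr)
    return FORBIDDEN_RE.search(no_whitespace) is not None


for expr in ['2*a + 3*b', 'a*2.', '2.+a', 'x.real', 'x.imag',
             'import os;', 'os.cpucount()', 'os. cpu_count()',
             '__import__("os")', 'a[0]', '1.e5']:
    print(f'{expr!r:22} forbidden={is_forbidden(expr)}')
```

Because the rule keys on the first character after the dot, this regex alone also rejects literals such as `1.e5` or `1.j` (`e` and `j` are in the banned class), while accepting any attribute whose name starts with `r` or `i`, not only `.real` and `.imag`.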
@@ -766,7 +768,6 @@ def getArguments(names, local_dict=None, global_dict=None, _frame_depth: int=2):
 _names_cache = CacheDict(256)
 _numexpr_cache = CacheDict(256)
 _numexpr_last = {}
-_numexpr_sanity = set()
 evaluate_lock = threading.Lock()
 
 # MAYBE: decorate this function to add attributes instead of having the
@@ -828,6 +829,13 @@ def validate(ex: str,
     _frame_depth: int
         The calling frame depth. Unless you are a NumExpr developer you should
         not set this value.
+
+    Note
+    ----
+    Both `validate` and by extension `evaluate` call `eval(ex)`, which is
+    potentially dangerous on unsanitized inputs. As such, NumExpr does some
+    sanitization, banning the character ':;[', the dunder '__', and attribute
+    access to all but '.r' for real and '.i' for imag access to complex numbers.
     """
     global _numexpr_last
 
@@ -857,8 +865,6 @@ def validate(ex: str,
         kwargs = {'out': out, 'order': order, 'casting': casting,
                   'ex_uses_vml': ex_uses_vml}
         _numexpr_last = dict(ex=compiled_ex, argnames=names, kwargs=kwargs)
-        # with evaluate_lock:
-        #     return compiled_ex(*arguments, **kwargs)
     except Exception as e:
         return e
     return None
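In the `validate` hunk, a successful call stores the compiled expression and its argument names in `_numexpr_last`, and the `except Exception as e:` branch returns the exception object instead of re-raising it. The sketch below models that validate-then-replay split in plain Python. It assumes Python's built-in `compile`/`eval` as a stand-in for NumExpr's compiler and virtual machine, and `validate_sketch`, `re_evaluate_sketch` and `_last` are hypothetical names, not NumExpr APIs.

```python
import re
from typing import Any, Dict, Optional

# Pattern copied from the new `_forbidden_re` in necompiler.py (as a raw string).
_FORBIDDEN_RE = re.compile(r'[;[:]|__|\.[abcdefghjklmnopqstuvwxyzA-Z_]')

# Hypothetical cache playing the role of `_numexpr_last`.
_last: Dict[str, Any] = {}


def validate_sketch(ex: str, local_dict: Dict[str, Any]) -> Optional[Exception]:
    """Sanitize and compile `ex`; return None on success, else the exception object."""
    global _last
    try:
        if _FORBIDDEN_RE.search(re.sub(r'\s+', '', ex)) is not None:
            raise ValueError(f'Expression {ex} has forbidden control characters.')
        code = compile(ex, '<expr>', 'eval')   # stand-in for NumExpr's compiler
        names = code.co_names                  # variable names the expression reads
        for name in names:
            if name not in local_dict:
                raise KeyError(name)
        _last = dict(ex=code, argnames=names)
    except Exception as e:                     # mirrors `except Exception as e: return e`
        return e
    return None


def re_evaluate_sketch(local_dict: Dict[str, Any]) -> Any:
    """Replay the last validated expression against fresh values, without re-parsing."""
    if not _last:
        raise RuntimeError('validate_sketch() has not succeeded yet')
    args = {name: local_dict[name] for name in _last['argnames']}
    # Empty __builtins__ limits name lookup to `args` (not a sandbox by itself).
    return eval(_last['ex'], {'__builtins__': {}}, args)


env = {'a': 2, 'b': 3}
print(validate_sketch('2*a + 3*b', env))   # None
print(re_evaluate_sketch(env))             # 13
env['b'] = 5
print(re_evaluate_sketch(env))             # 19
```

Since the failure path returns the exception rather than raising it, a caller of this sketch has to test the result explicitly (`if err is not None: ...`) before replaying.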
@@ -918,7 +924,12 @@ def evaluate(ex: str,
         The calling frame depth. Unless you are a NumExpr developer you should
         not set this value.
 
-
+    Note
+    ----
+    Both `validate` and by extension `evaluate` call `eval(ex)`, which is
+    potentially dangerous on unsanitized inputs. As such, NumExpr does some
+    sanitization, banning the character ':;[', the dunder '__', and attribute
+    access to all but '.r' for real and '.i' for imag access to complex numbers.
     """
     # We could avoid code duplication if we called validate and then re_evaluate
     # here, but they we have difficulties with the `sys.getframe(2)` call in

numexpr/tests/test_numexpr.py (+16 -2)

@@ -536,13 +536,27 @@ def test_forbidden_tokens(self):
 
         # Forbid semicolon
         try:
-            evaluate('import os; os.cpu_count()')
+            evaluate('import os;')
         except ValueError:
             pass
         else:
             self.fail()
 
-        # I struggle to come up with cases for our ban on `'` and `"`
+        # Attribute access
+        try:
+            evaluate('os.cpucount()')
+        except ValueError:
+            pass
+        else:
+            self.fail()
+
+        # But decimal point must pass
+        a = 3.0
+        evaluate('a*2.')
+        evaluate('2.+a')
+
+
+
 
 
 