-
Notifications
You must be signed in to change notification settings - Fork 7.8k
Improve JSON serializable object recursion protection #10020
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
bb5e561
to
c9ff86d
Compare
Just a note that there might be similar issue with FFI and var_export that I'd of course look at too. Basically all XFAIL tests marked here will get resolved. |
Zend/zend_types.h
Outdated
@@ -681,6 +681,7 @@ static zend_always_inline uint32_t zval_gc_info(uint32_t gc_type_info) { | |||
#define GC_IMMUTABLE (1<<6) /* can't be changed in place */ | |||
#define GC_PERSISTENT (1<<7) /* allocated using malloc */ | |||
#define GC_PERSISTENT_LOCAL (1<<8) /* persistent, but thread-local */ | |||
#define GC_DEBUG_PROTECTED (1<<9) /* whether the protection is for debug functions */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes conflict with IS_OBJ_FREE_CALLED
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry I completely missed that it's reused by obj flags. I just had a quick try to do
diff --git a/Zend/zend_types.h b/Zend/zend_types.h
index 432c79f35a..9c4cf21ec1 100644
--- a/Zend/zend_types.h
+++ b/Zend/zend_types.h
@@ -636,9 +636,9 @@ static zend_always_inline zend_uchar zval_get_type(const zval* pz) {
#define GC_TYPE_MASK 0x0000000f
#define GC_FLAGS_MASK 0x000003f0
-#define GC_INFO_MASK 0xfffffc00
+#define GC_INFO_MASK 0xfffff800
#define GC_FLAGS_SHIFT 0
-#define GC_INFO_SHIFT 10
+#define GC_INFO_SHIFT 11
static zend_always_inline zend_uchar zval_gc_type(uint32_t gc_type_info) {
return (gc_type_info & GC_TYPE_MASK);
@@ -677,11 +677,11 @@ static zend_always_inline uint32_t zval_gc_info(uint32_t gc_type_info) {
/* zval_gc_flags(zval.value->gc.u.type_info) (common flags) */
#define GC_NOT_COLLECTABLE (1<<4)
-#define GC_PROTECTED (1<<5) /* used for recursion detection */
-#define GC_IMMUTABLE (1<<6) /* can't be changed in place */
-#define GC_PERSISTENT (1<<7) /* allocated using malloc */
-#define GC_PERSISTENT_LOCAL (1<<8) /* persistent, but thread-local */
-#define GC_DEBUG_PROTECTED (1<<9) /* whether the protection is for debug functions */
+#define GC_PROTECTED (1<<5) /* used for recursion detection */
+#define GC_IMMUTABLE (1<<6) /* can't be changed in place */
+#define GC_PERSISTENT (1<<7) /* allocated using malloc */
+#define GC_PERSISTENT_LOCAL (1<<8) /* persistent, but thread-local */
+#define GC_DEBUG_PROTECTED (1<<10) /* whether the protection is for debug functions */
#define GC_NULL (IS_NULL | (GC_NOT_COLLECTABLE << GC_FLAGS_SHIFT))
#define GC_STRING (IS_STRING | (GC_NOT_COLLECTABLE << GC_FLAGS_SHIFT))
It's likely missing some things as it's breaking something in GC because it shows leaks so I would need to investigate / debug it more but before I do that and spend more time on this, I wanted to check if you would be fine in general with extending common flags to 11 or you think there's a better solution for this? Basically I just don't want to waste time on something that you think is not the right solution.
@@ -2,6 +2,8 @@ | |||
Bug #77638 (var_export'ing certain class instances segfaults) | |||
--EXTENSIONS-- | |||
ffi | |||
--XFAIL-- | |||
var_export now expect consistent properties |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Introduction of segfaults in existing tests is unacceptable of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, this is just a draft so I would of course fix all the issues before it's ready. I mainly wanted to check if the approach looks fine to you before I finalize it.
c9ff86d
to
72ff655
Compare
I wouldn't change GC related mask without a very big reason. As I understood, the main reason of this PR is allowing nested calls of Can this be fixed by remembering and reseting GC_PROTECTED flags at start of all serializers and restoring original value at the end? |
@dstogov It's actually more about optimizing I'm not sure how remembering GC_PROTECTED could work. If we took just a simple case of calling Just for my interest, is the new flag really that problematic? From what I saw it just reduces the GC address counter to 19 bits which should still be sufficient. Or is there anything else that I missed? |
Can you please show, what particular case is optimized. If we have recursion <?php
class SerializingTest implements JsonSerializable
{
public function jsonSerialize(): mixed
{
return (object)['a' => (object)['a' => (object)['a' => $this->a]]];
}
}
$o = new SerializingTest();
for ($i = 0; $i < 1000000; $i++) {
$s = json_encode($o);
}
var_dump($s);
Reducing limits of GC is definitely bad and should be done with extreme care. You actually, didn't change all the GC related limits (like GC_MAX_UNCOMPRESSED). Reducing this single bit from GC address would limit number of "uncomressed" possible GC roots to 256K. So the GC would get significant slowdown, once a php app would create graph with more than 256K object. For some apps this may become a problem. |
@dstogov This https://github.com/php/php-src/pull/10020/files#diff-7d42a64925fbe4e298b5fcf928cb3994e4e021774503190b850be7d01d181be9R556-R557 . It's basically to prevent building of properties table when serializing JsonSerializable object. I assume that your example doesn't show any differences because there are no properties defined in it. If you add bunch, you might see some difference potentially. Essentially this was first reported here: https://bugs.php.net/bug.php?id=81524 . Then Nikita tried to fix it in #7589 but that resulted in significant slow down so it was closed. |
OK. Now I finally understood the root of the first problem. <?php
class SerializingTest implements JsonSerializable
{
public $a = 1;
public function jsonSerialize(): mixed
{
return (object)['a' => (object)['a' => (object)['a' => $this->a]]];
}
}
for ($i = 0; $i < 1000000; $i++) {
$s = json_encode(new SerializingTest());
}
var_dump($s); Unfortunately, I don't have a good idea about a general fix. I'll try to think a bit more. |
@dstogov I have been thinking about an alternative solution for this and got an idea how to possibly resolve it. I think we could keep a single flag for the GC protection but the semantic of that would slightly change if it is applied on object. It would just mean that it was protected by any of the sources (var_dump, json_encode...) but if it is set as protected it would require getting the HashTable (using Z_OBJPROP_P) and then it would do another GC protection check for the specific source. We could use currently unused flags in Line 348 in 6a6e91f
graph TD;
CheckProtection --> IsObjGCProtected{Is object GC protected?}
IsObjGCProtected -->|no| ProtectObj{Protect object using GC flag}
IsObjGCProtected -->|yes| GetHT{Get prop HashTable from object}
ProtectObj --> ExecutePC{Execute protected code}
GetHT --> IsHTProtected{Is HashTable protected with source flag?}
IsHTProtected -->|no| ProtectHT{Protect object using GC flag}
IsHTProtected -->|yes| Recursion{End because Recursion}
ProtectHT --> ExecutePC
ExecutePC --> IsHTPresent{Is HashTable present?}
IsHTPresent -->|yes| UnprotectHT{Unprotect HashTable}
IsHTPresent -->|no| UnprotectObj{Unprotect object}
UnprotectObj --> End
UnprotectHT --> End
For majority of use case that HashTable branch would not need to be used as that would be used only for Recursion or nested interactions so those cases are much less likely. Does it make sense to you and if so, do you see any issues with that? |
I didn't understand your proposal in all details yet, but looking into the block-scheme I think that the behavior of var_dump() and others is going to be clanged. Now the "recursive objects" are going to be processed twice, before the recursion is detected. Right? |
As I see, one of the main problems is the space required for the additional "protection" bit(s). I think it may be possible to reuse hight bits in Z_PROPERTY_GUARD(). When |
Yeah the initial idea was to do it twice. We used to have that behavior in JSON before and it is not such a big issue - it just slightly changes output but it is not such a big problem. But of course it is better not to do that so your proposal is much better! I will give it a try hopefully next or following month as this is kind lower priority thing for me but would like to get it sorted before the first beta. I will ping you then when I have got a working PR. Thanks! |
Closing as this solution won't be used in any case. |
This PR introduces a new way of recursion protection in JSON, var_dump and friends. It fixes issue in master for __debugInfo and also improves perf for jsonSerializable in some cases. More info can be found in phpGH-10020. Closes phpGH-11812
This PR changes the JSON serialized object protection logic to use in most cases protection on object which should be more performent and reduce the memory usage. It's a done in the way so it is still possible to support nested calls with debug functions and var_export. It means
json_encode
inside__debugInfo
on the same objectvar_dump
,debug_zval_dump
andprint_r
insidejsonSerialize
on the same object.var_export
inside__debugInfo
orjsonSerialize
on the same object.See various test cases that implements those combinations (should be clearer from it).
To support 1st case, the two level protection for JSON serializable object is introduced. Primarily it tries to set protection on an object and only if it's already protected, it falls back to protecting the properties hash table. In most cases it will just use protection on the object and only if called inside
__debugInfo
, it will use the properties hash table protection. It results in a slight output change if there is a recursion as it catches the recursion on the next level (see diff below forext/json/tests/bug61978.phpt
). I think this should be acceptable and after thinking about it, such output makes maybe even more sense because it actually shows that there's some kind of recursion rather than just anull
value but this might be subjective.To support 2nd case, a special flag on object has been introduced. The reason for that is that it might not be reliable to do protection on the result of debug info when the result is temporary. If I understand the code correctly, it might happen on the reference case. It would also leave more edge cases when all 3 approaches are combined (see test case
json_encode_recursion_04.phpt
) which couldn't fully work as fallback cover just 2 levels.To support 3rd case, the protection needs to be again done on the returned properties table. For
var_export
it is mostly the properties table so it should be mostly reliable. The exception is the SplFixedArray that got changed in #9757 . I think we could still leave this optimization for debug functions but forvar_export
it might be better to fallback to the old behavior. The thing is that the change for protection invar_export
introduced in #9448 broke usingvar_export
in__debugInfo
which previously worked and it is in some way a regression. I haven't implemented the changes for SplFixedArray yet as I would like to first hear other opinions about it.@dstogov @TysonAndre @nikic what do you think about this? Does it make sense to you or do you think it should be done differently?