Add basic introduction to opcache and map_ptr #134

Open, wants to merge 2 commits into master
Binary file added Book/php7/zend_engine/images/map_ptr.png
104 changes: 103 additions & 1 deletion Book/php7/zend_engine/zend_opcache.rst

Zend Opcache
============

While PHP is an interpreted language, the interpreter does not execute PHP source code directly. Instead, the source code
is first parsed into an abstract syntax tree (AST), which is then compiled to opcodes (and used to generate classes and
other data structures). The opcodes are then passed to the Zend virtual machine (Zend VM), which actually interprets them.

As you can see, quite a few steps are involved in running a single PHP file. Without an additional extension, PHP
performs all of these steps for every loaded file in every request. Since PHP files usually don't change between
requests, the parsing and compilation steps yield the same result every time. Opcache is an extension that caches the
opcodes and data structures such as classes between requests to improve performance. The AST is not used after the
compilation step (with the exception of constant expressions) and thus does not need to be cached.

Normally, all the memory allocated while processing a request is freed after the request has been processed. To keep the
compiled opcodes for future requests, opcache stores them in a shared memory segment (SHM). This segment can be accessed
not only by the same process after the request has finished, but also by other processes handling other requests. This
means that if you are using php-fpm with many child processes, the opcodes for each PHP file are stored only once in
SHM.
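
The following sketch shows the basic property opcache relies on. Memory mapped as shared before ``fork()`` stays visible
to both the parent and the child process. It assumes a POSIX system with anonymous shared ``mmap``. The
``cached_script`` struct and both helper functions are hypothetical stand-ins. Opcache has its own shared memory
backends, and this sketch does not reproduce them.

```c
/* Sketch: a write to MAP_SHARED memory made by a child process is seen by
 * the parent. Only illustrates the SHM idea, not opcache's allocator. */
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON /* older BSD/macOS spelling */
#endif

/* Hypothetical stand-in for a compiled script cached in SHM. */
typedef struct {
    int compiled_opcodes; /* pretend result of compiling a file */
} cached_script;

/* Maps one cached_script into anonymous shared memory, or returns NULL. */
static cached_script *shm_alloc_script(void)
{
    void *mem = mmap(NULL, sizeof(cached_script), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return mem == MAP_FAILED ? NULL : (cached_script *) mem;
}

/* A child "compiles" the script once; the parent then reads the cached
 * result without compiling it again. Returns the value the parent sees,
 * or -1 on error. */
static int compile_in_child_read_in_parent(void)
{
    cached_script *script = shm_alloc_script();
    if (script == NULL) {
        return -1;
    }
    script->compiled_opcodes = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(script, sizeof(cached_script));
        return -1;
    }
    if (pid == 0) {
        script->compiled_opcodes = 42; /* written by the child only */
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    int seen = script->compiled_opcodes; /* parent sees the child's write */
    munmap(script, sizeof(cached_script));
    return seen;
}
```

With private (non-shared) memory, the parent would still see ``0`` after the child exits. Sharing the mapping is what
lets each script be stored only once.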

This comes with a significant restriction: the opcodes and other data structures in SHM must not be mutated by any
request (at least not unless these changes are supposed to affect all other processes as well). If any changes to SHM
are made, a locking mechanism is used to avoid data races.

The classes and OParrays (arrays of opcodes, i.e. functions) in SHM are mostly immutable. There are currently the
following exceptions:

* ``zend_persistent_script.dynamic_members`` stores information about the state of the cached script
* ``zend_class_entry.inheritance_cache`` is extended if a new combination of parent class and interfaces is encountered
* ``zend_op.handler`` can be replaced at runtime by the tracing JIT to start executing JITted code
* ``zend_string.gc.refcount`` for persistent strings can be replaced with a map pointer (map_ptr) offset to cache the lookup of classes by name

map_ptr
-------

As mentioned in the introduction, SHM is shared between all child processes. Thus, no changes can be made to the data
structures living in SHM unless 1. the changes are supposed to be reflected in all child processes, and 2. the SHM
segment has been locked before any changes are written, to avoid data races.

Sometimes these data structures contain a field that should differ per process or per request. The ``map_ptr``
mechanism can be used to achieve this. In short, a unique offset is assigned during compilation instead of storing a
pointer to per-process data directly. A direct pointer would not work, because the field is shared by all processes
while each process keeps its data at a different address. At runtime, each process allocates local memory with enough
space for all the entries referenced by these offsets. Each offset then corresponds to an item in this local memory
segment, so the shared structure never needs to know the per-process location of the element.
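
The idea can be sketched with a plain array per process. The shared structure stores only a byte offset, and each
process resolves it against its own local map. This is a simplified model: the names ``local_map``, ``shared_class``
and ``local_map_slot`` are hypothetical. The sketch also leaves out how offsets are told apart from real pointers.

```c
/* Simplified model: SHM stores only a byte offset; every process owns a
 * private "local map" that the offset indexes into. Hypothetical names. */
#include <assert.h>
#include <stddef.h>

#define LOCAL_MAP_SLOTS 4

/* Per-process memory: each process has its own copy at its own address. */
typedef struct {
    void *slots[LOCAL_MAP_SLOTS];
} local_map;

/* Data that would live in SHM: identical for every process. */
typedef struct {
    size_t mutable_data_offset; /* byte offset into the local map */
} shared_class;

/* Resolves an offset against one process's local map. */
static void **local_map_slot(local_map *map, size_t offset)
{
    assert(offset % sizeof(void *) == 0);
    assert(offset / sizeof(void *) < LOCAL_MAP_SLOTS);
    return (void **) ((char *) map->slots + offset);
}
```

Two simulated processes, ``proc_a`` and ``proc_b``, can resolve the same ``ce.mutable_data_offset`` to different
per-process data. ``*local_map_slot(&proc_a, off)`` and ``*local_map_slot(&proc_b, off)`` each point to their own
object, while ``ce`` itself is never modified.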

The map ptr can also store real pointers. This is useful when the data structure doesn't live in SHM and thus doesn't
need to store its value in a separate map. The mechanism relies on alignment to differentiate between offsets and real
pointers. The offsets are 1, 9, 17, etc. (assuming a 64-bit pointer size), so their lowest bit is always set. Real
pointers, on the other hand, point to data that is at least 2-byte aligned (unless padding was removed). This comes
both from the alignment of the pointed-to type and from allocator guarantees: all Zend memory manager allocations are
4 or 8 byte aligned, and the system allocator gives similar guarantees. For example, pointers to 8-byte aligned data
end in ``0x0`` or ``0x8``. The macro ``ZEND_MAP_PTR_IS_OFFSET`` is used internally to check whether the lowest bit
(``0b1``) is set. If it is, the stored value is an offset. Otherwise it's a direct pointer.

Consequently, a map ptr must never hold a pointer without such an alignment guarantee, for example a pointer into the
middle of a string or to a single ``bool`` member of a struct. Such a pointer may have its lowest bit set and would then
be misinterpreted as an offset.
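
The lowest-bit check can be sketched as follows. It assumes that offsets become odd because the base of the local map
is biased by one byte, which yields the 1, 9, 17, ... sequence on 64-bit systems. ``local_map``, ``biased_base``,
``slot_offset``, ``map_ptr_is_offset`` and ``two_flags`` are hypothetical names. ``map_ptr_is_offset`` only mirrors the
idea of ``ZEND_MAP_PTR_IS_OFFSET`` and is not its definition.

```c
/* Sketch: telling odd offsets apart from aligned pointers by the lowest
 * bit. Assumes a one-byte bias of the local map base. Hypothetical names. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAP_SLOTS 8

static void *local_map[MAP_SLOTS]; /* this process's local map */

/* Base biased by -1 (computed as an integer) so that slot i has the
 * offset i * sizeof(void *) + 1, which is always odd. */
static uintptr_t biased_base(void)
{
    return (uintptr_t) local_map - 1;
}

/* Offset that refers to slot i of the local map. */
static uintptr_t slot_offset(size_t i)
{
    return (uintptr_t) &local_map[i] - biased_base();
}

/* Lowest bit set => the value is an offset, otherwise a real pointer. */
static bool map_ptr_is_offset(uintptr_t value)
{
    return (value & 1u) != 0;
}

/* Turns an offset back into the address of its slot. */
static void **offset_to_slot(uintptr_t offset)
{
    return (void **) (biased_base() + offset);
}

/* The pitfall: &flags->b is only 1-byte aligned, so its address is odd. */
struct two_flags {
    bool a;
    bool b;
};
```

Both an aligned pointer and a slot offset are classified correctly. A pointer to ``two_flags.b`` is not: it is reported
as an offset, which is exactly the misinterpretation described above.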

Here is an example from php-src.

.. image:: ./images/map_ptr.png
   :align: center

Classes can contain data that needs to be evaluated at runtime, which is stored in ``mutable_data``. This
``mutable_data`` contains fields like ``default_properties_table``, ``constants_table`` and more. ``ZEND_MAP_PTR_DEF``
can be used to declare a field that holds either an offset into local memory or a direct pointer.

::

    struct _zend_class_entry {
        // ...
        ZEND_MAP_PTR_DEF(zend_class_mutable_data*, mutable_data);
        // ...
    };
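
A declaration macro of this kind can be sketched with token pasting. The field gets a mangled name, so code is nudged
towards the accessor macros instead of touching the field directly. ``MY_MAP_PTR``, ``MY_MAP_PTR_DEF``,
``my_mutable_data`` and ``my_class_entry`` are hypothetical simplified versions. The actual php-src definitions may
differ in detail.

```c
/* Hypothetical, simplified declaration macros: the field name is mangled
 * with a "__ptr" suffix so it is normally accessed through macros. */
#include <assert.h>
#include <stddef.h>

#define MY_MAP_PTR(name) name##__ptr
#define MY_MAP_PTR_DEF(type, name) type MY_MAP_PTR(name)

typedef struct {
    int default_properties_count;
} my_mutable_data;

typedef struct {
    const char *name;
    /* Holds either an offset into the local map or a direct pointer. */
    MY_MAP_PTR_DEF(my_mutable_data *, mutable_data);
} my_class_entry;
```

Here ``ce.MY_MAP_PTR(mutable_data)`` expands to ``ce.mutable_data__ptr``. The stored value still has to be
interpreted as either an offset or a pointer by the get and set macros.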

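``ZEND_MAP_PTR_INIT`` assigns a value directly to the map ptr field, without checking whether it currently holds an
offset. This must only be done while the structure is not yet in SHM. Code should usually be agnostic about whether an
offset or a real pointer is stored. The common pattern is therefore to initialize the map ptr to ``NULL`` with this
macro and to assign the actual value later with ``ZEND_MAP_PTR_SET``. When the structure is known never to live in
SHM, ``ZEND_MAP_PTR_INIT`` can also store a non-``NULL`` pointer directly as a performance optimization.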

::

    ZEND_MAP_PTR_INIT(ce->mutable_data, NULL);

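``ZEND_MAP_PTR_NEW`` generates a new offset and stores it in the given field. Like ``ZEND_MAP_PTR_INIT``, this must
only be done while the structure is not yet in SHM. More specifically, it is only safe to use during the compilation of
user scripts. After a compilation, opcache updates ``ZCSG(map_ptr_last)`` so that all processes become aware of the new
offsets, which requires an exclusive write lock on SHM. It is also safe to use during the startup phase, but not in the
``MINIT()`` of modules loaded with ``dl()``. Offsets are never freed or reused, and every new offset increases the
memory usage of all processes. Offsets should therefore be allocated sparingly and never for non-persistent structures.
This step can be skipped if the data structure is not intended to be stored in SHM and thus doesn't need any pointer
indirection.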

::

    ZEND_MAP_PTR_NEW(ce->mutable_data);
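
Allocating offsets can be sketched as an append-only counter over a growable, process-local table. The sketch assumes
the same one-byte bias as before, so slot ``i`` gets the offset ``i * sizeof(void *) + 1``. ``offset_table``,
``map_ptr_new_offset`` and ``offset_table_slot`` are hypothetical. The sketch only illustrates why offsets are never
reused and is not php-src's allocator.

```c
/* Sketch of handing out odd map ptr offsets from a growable local table.
 * Offsets are only appended, never freed or reused. Hypothetical names. */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    void **slots;    /* this process's local map */
    size_t used;     /* number of offsets handed out so far */
    size_t capacity; /* number of slots currently allocated */
} offset_table;

/* Returns a new odd offset whose slot starts out as NULL, or 0 if the
 * table could not grow. */
static uintptr_t map_ptr_new_offset(offset_table *t)
{
    if (t->used == t->capacity) {
        size_t new_cap = t->capacity ? t->capacity * 2 : 4;
        void **grown = realloc(t->slots, new_cap * sizeof(void *));
        if (grown == NULL) {
            return 0;
        }
        t->slots = grown;
        t->capacity = new_cap;
    }
    t->slots[t->used] = NULL;
    /* slot index i maps to offset i * sizeof(void *) + 1 */
    return (uintptr_t) (t->used++ * sizeof(void *) + 1);
}

/* Resolves an offset to its slot; the address is only valid until the
 * table grows again, but the offset itself stays valid forever. */
static void **offset_table_slot(offset_table *t, uintptr_t offset)
{
    assert((offset & 1u) != 0);
    return &t->slots[(offset - 1) / sizeof(void *)];
}
```

Growing the table moves the slots to a new address, yet previously handed-out offsets still resolve to the same
entries. The offset, not the address, is the stable identifier.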

To assign a pointer to the map ptr, use the ``ZEND_MAP_PTR_SET`` macro. It automatically stores the pointer in the
local map (through ``ZEND_MAP_PTR_SET_IMM``) or in the map ptr itself (through ``ZEND_MAP_PTR_INIT``). Which one it uses
depends on whether the map ptr contains an offset or a real pointer (or ``NULL``). ``ZEND_MAP_PTR_SET_IMM`` should
usually not be called directly, as it assumes the underlying value is an offset, which is not always a safe assumption.

::

    ZEND_MAP_PTR_SET(ce->mutable_data, mutable_data);

``ZEND_MAP_PTR_GET`` can be used to retrieve the pointer regardless of whether it's stored as an indirect or a direct
pointer. Once again, there's ``ZEND_MAP_PTR_GET_IMM``, which assumes an offset and usually shouldn't be called directly.

::

    zend_class_mutable_data *mutable_data = ZEND_MAP_PTR_GET(ce->mutable_data);
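
The set and get dispatch can be sketched on a field that holds either an odd offset or an aligned pointer. The sketch
assumes a one-byte biased base for the local map, as in the offset scheme described in this section. ``map_ptr``,
``local_slots``, ``map_ptr_slot_imm``, ``map_ptr_set`` and ``map_ptr_get`` are hypothetical. They only mimic the roles
of ``ZEND_MAP_PTR_SET``, ``ZEND_MAP_PTR_SET_IMM``, ``ZEND_MAP_PTR_GET`` and ``ZEND_MAP_PTR_GET_IMM`` and are not their
definitions.

```c
/* Sketch of pointer-or-offset set/get dispatch. Assumes a local map whose
 * base is biased by one byte so offsets are odd. Hypothetical names. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_SLOTS 4

static void *local_slots[MAP_SLOTS]; /* this process's local map */

/* Slot i has the odd offset i * sizeof(void *) + 1. */
static uintptr_t map_base(void) { return (uintptr_t) local_slots - 1; }

/* A map ptr field: either an odd offset or an (aligned) real pointer. */
typedef struct {
    uintptr_t value;
} map_ptr;

static int map_ptr_is_offset(const map_ptr *mp)
{
    return (mp->value & 1u) != 0;
}

/* Like the *_IMM variants: only valid if mp holds an offset. */
static void **map_ptr_slot_imm(const map_ptr *mp)
{
    assert(map_ptr_is_offset(mp));
    return (void **) (map_base() + mp->value);
}

/* Stores val in the local map if mp holds an offset, otherwise in mp
 * itself. val must be at least 2-byte aligned (or NULL). */
static void map_ptr_set(map_ptr *mp, void *val)
{
    if (map_ptr_is_offset(mp)) {
        *map_ptr_slot_imm(mp) = val;
    } else {
        mp->value = (uintptr_t) val;
    }
}

/* Reads the pointer regardless of how it is stored. */
static void *map_ptr_get(const map_ptr *mp)
{
    return map_ptr_is_offset(mp) ? *map_ptr_slot_imm(mp) : (void *) mp->value;
}
```

The same calls work for a field that was initialized to ``NULL`` and for one that holds an offset. In the second case
the field itself is never written, which is what keeps a structure in SHM unchanged.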

To summarize, with the map ptr mechanism the relative offset to ``mutable_data`` is the same for each process. The local
map itself lives at a different address in each process. The associated item will be retrieved by adding the given offset
to the base of the local map.