KPHP runtime is fundamentally single-threaded. Script memory allocators are implemented without concurrent access in mind. It makes a reason to let a single KPHP worker occupy one CPU core.
Even embedded curl is compiled from scratch with special options to avoid DNS resolving in a separate thread.
KPHP runtime uses a preallocated fixed-size buffer. All memory pieces are aligned to 8 bytes.
For the small pieces (< 16KB), fixed-size slab lists are used. For the huge pieces (>= 16KB), the red-black binary tree is used. When a memory piece is freed, it is attached to the corresponding structure, so on the next allocation, it can be reused.
After each request, the allocator state, all lists, and the tree are reset.
The main disadvantage of such an approach is memory fragmentation on long requests. In order to improve this situation, the allocator may perform an implicit defragmentation process, that stops the world.
All types of the type system are implemented in C++.
Types that take roots from PHP (like strings and arrays) are implemented to behave exactly the same.
string in PHP is a binary-safe set of bytes. Like in PHP, strlen(“Привет”) (a Russian word as UTF-8) is 12, not 6: to deal with encodings, use mbstring module. PHP has no char type: single chars are also strings. Concatenations commonly allocate new strings, though KPHP tries to perform compile-time optimizations to avoid extra PHP strings creation using raw buffers and preallocation, overall it works identically.
array in PHP can be either array-vector or array-map (hashtable), that's why array<T> in KPHP can store both integer and string buckets. Vectors are optimized, especially vectors of primitives. KPHP handles refcounters and implements copy-on-write. All nuances with numeric strings seem to be covered.
mixed is a 16-byte structure with enum type and union storage, reinterpret_casted depending on a runtime type. Almost all runtime operations with mixed are expressed as switch-case.
callable is an implicit codegenerated lambda class with __invoke() method, but some callables are really inlined as C++ lambdas in-place.
PHP classes are codegenerated as C++ structures, and class_instance<T> is a wrapper above structures, similar to shared_ptr, that provides reference semantics. Class inheritance is a real C++ inheritance, but class methods are separate functions that accept $this as the first parameter.
Types that are KPHP-custom are polyfilled with PHP and behave the same if used correctly in PHP. For instance, tuples are expressed as std::tuple, and shapes are “tagged tuples”, consider shape.h for implementation.
Instances, tuples and shapes can be converted to an array with to_array_debug(), can be serialized, etc. To support this, the compilation stage detects such classes and codegenerates visitors to C++ structures.
string and array variables use a reference counter, but unlike classes, they are copy on write.
All available functions are listed in functions.txt. It's not the best format of extern declarations, has to be reconsidered in the future. Looks like this:
function bcmod ($lhs ::: string, $rhs ::: string) ::: string;
C++ implementations are also in the runtime/ folder with f$ prefix:
string f$bcmod(const string &lhs, const string &rhs) { /* ... */ }
Some PHP functions have only one implementation, and some have many C++ overloads. For example, the abs() function should work like in PHP: if passed int, return int; if passed string, return int or float; if float, return float; and so on. Of course, the easiest approach is to declare
function abs ($v ::: mixed) ::: mixed;
But this means that abs(-2) would be inferred mixed suddenly. Not to spoil types, it's declared as
function abs ($v ::: any) ::: ^1 | int;
And implemented in C++ for many overloads, to match announced inferring:
mixed f$abs(const mixed &v);
int64_t f$abs(int64_t v);
double f$abs(double v);
int64_t f$abs(const Optional<int64_t> &v);
int64_t f$abs(const Optional<bool> &v);
double f$abs(const Optional<double> &v);
Many trivial functions like is_bool(), count(), etc. have lots of overloads not to spoilt types.
Functions for arrays often require C++ template functions. Take
function array_first_value ($a ::: array) ::: ^1[*];
Since $a can be an array of any — it doesn't matter — and returns its element type, we implement it as
template<class T>
T f$array_first_value(const array<T> &a) { /* ... */ }
The return type of some C++ implementations depends on PHP call. Let's look at
function instance_deserialize($serialized ::: string, $to_type ::: string) ::: instance<^2>;
If PHP code calls instance_deserialize($s, A::class), its return type is A. To manage this, attribute cpp_template_call is listed in @kphp-extern-func-info declaration — then codegeneration outputs not a simple call, but a template call with the inferred return type. And in C++ we implement this as
template<class ResultClass>
ResultClass f$instance_deserialize(const string &serialized, const string&) { /* ... */ }
Some PHP operators are codegenerated as C++ operators, and some as C++ functions. For example, $a*$b
will remain the same in C++ (operator * is overloaded for string, mixed, etc.), but $a/$b
will look like divide($a,$b)
, as C++ division works differently for integers.
Implementation of OpenSSL, curl, and some others uses external libraries, but all f$ “PHP wrappers” must exist, that proxy PHP types to native code and back.
In order to make standard PHP functions behave the same, studying PHP source code is necessary. Some corner cases were implemented concerning Zend implementations, adapted to KPHP types and specifics.
In order to improve efficiency, KPHP preallocates all constant arrays and strings.
If string or array is a compile-time constant, KPHP puts it into the data section as is and generates initialization code which is evaluated on server startup only once, putting the result into heap memory.
In both cases, these variables have the special reference counter value which can't be modified, but any modification of the variable triggers copy on write process.
Every KPHP worker has a network server running inside, which uses reactor pattern for event handling, built upon epoll multiplexing:
All reactor iterations are performed in epoll_work() function.
Until an HTTP request is finished, the worker doesn't start handling another request. KPHP server supports keep-alive header for incoming HTTP connections, but one worker can keep only one connection at a time, so it has to close the HTTP connection before accepting a new one.
Sending an RPC query flows like this:
rpc_tl_query_one()
from PHP code, it serializes a query to raw TL bytesReceiving an RPC response flows like this:
rpc_tl_query_result_one()
, it returns the parsed TL (gets bytes if ready or waits for all this)If you plan to use RPC calls, a compiled tl schema (.tlo file) must be specified with --tl-schema/-T
option for kphp. It codegenerates all C++ classes representing TL types and functions, as well as all storing/fetching implementations (converting TL bytes to PHP/TL types and back). For untyped TL it is done automatically (as it is undetectable, what functions can be called), for typed TL only reachable corresponding PHP classes are linked to C++ analogs.
Async programming implementation is quite complicated. Functions that are forked and all accessible from them down through the call graph can be resumable — it means, they can stop and then start from the same point, having restored all state.
A function stops when it starts waiting for a network response or for another resumable function.
A function resumes when a network response is ready or that resumable function has finished.
Let's examine the following PHP snippet:
function selectUser(int $id) {
$query = ['_' => 'user.get', 'id' => $id];
$response = rpcCall($query);
return $response['result'] ?? false;
}
wait(fork(selectUser(1)));
The selectUser() function is resumable (because it is accessible from fork() and calls a resumable rpcCall()). It is codegenerated not as a function — but as a class, where all local variables become members of this class. This is how local values are stored in a stopped state.
class c$selectUser final : public Resumable {
private:
int64_t v$id;
array< mixed > v$query;
mixed v$response;
public:
using ReturnT = mixed;
c$selectUser(int64_t v$id) noexcept : v$id(v$id), v$query(), v$response() { }
bool run() noexcept final {
RESUMABLE_BEGIN
//12: $query = ['_' => 'user.get', 'id' => $id];
v$query = ({ /* array creation */ });
//13: $response = rpcCall($query);
v$response = f$rpcCall(v$query);
TRY_WAIT(resumable_label$u3, v$response, mixed);
//14: return $response['result'] ?? false;
RETURN (null_coalesce< mixed >(v$response, v$const_string$us8, 471988L, false));
RESUMABLE_END
}
};
mixed f$selectUser(int64_t v$id) noexcept {
return start_resumable < c$selectUser::ReturnT >(new c$selectUser(v$id));
}
int64_t f$fork$selectUser(int64_t v$id) noexcept {
return fork_resumable(new c$selectUser(v$id));
}
As we see, the function body is run() method surrounded by RESUMABLE_BEGIN / RESUMABLE_END
. We also see some strange TRY_WAIT
and RETURN
— let's look into them.
They all are macros described in resumable.h. Let's expand the run() function focusing on macros:
bool run() noexcept final {
if (pos__ != nullptr) // pos__ is a member of Resumable class
goto pos__;
do {
/* real function body: */
v$query = ({ /* array creation */ });
v$response = f$rpcCall(v$query);
// this is TRY_WAIT:
pos__ = resumable_label$u3;
return false;
resumable_label$u3:
v$response = input_->load<mixed>(); // input_ is a member of Resumable
// this is RETURN
output_->save <mixed> (/* return value */); // output_ is a member of Resumable
return true;
} while (0);
return false;
}
Some highlights on what's going on:
The scheduler, which orchestrates all forks and detects what can/should be called next, goes out of the frame of current documentation. Its sources are present in resumable.cpp.