diff --git a/rfcs/20190814-kernel-and-op-registration.md b/rfcs/20190814-kernel-and-op-registration.md new file mode 100644 index 000000000..155d39b03 --- /dev/null +++ b/rfcs/20190814-kernel-and-op-registration.md @@ -0,0 +1,295 @@ +# Kernel and Op Implementation and Registration API + +| Status | Accepted | +:-------------- |:---------------------------------------------------- | +| **Author(s)** | James Ring (sjr@google.com). | +| **Sponsor** | Günhan Gülsoy (gunan@google.com) | +| **Updated** | 2020-06-02 | + +## Objective + +Tensorflow (TF) currently provides a C++ API for implementing kernels and ops. +The Voltron project aims to create a modular/plugin-based TF implementation with +API and ABI surfaces. Plugins will be able to create and register custom kernel +and op implementations. + +In order to provide a stable ABI, the Voltron team has chosen to provide C APIs +to plugin authors. This document introduces the C API for op and kernel +registration. For authors who wish to continue using C++ to interface with +TensorFlow, an ABI-stable C++ header-only API is provided. + +## Motivation + +Presently, there is no ABI-stable API for extending TensorFlow with new kernels +and ops. There is no guarantee that a plugin written with one compiler will work +with a version of TensorFlow built with another, even on the same operating +system and architecture. This makes it difficult to distribute plugins without +also distributing the source code and requiring end-users to build the plugin +alongside TensorFlow. + +An ABI-stable API for extending TensorFlow will simplify the distribution of +plugins and allow plugin authors to distribute binary artifacts without +necessarily publishing plugin source code. + +## User Benefit + +Plugin authors will be able to publish plugins that users can use more easily. +In turn, the TensorFlow community will benefit from an increase in the number of +variety of available plugins. + +## Design Overview + +In general, the kernel and op registration C APIs aim to permit the +implementation of any kernel or op that is currently possible with the C++ API. +Where possible, existing C++ function implementations are reused from within a C +wrapper. The purpose of the wrapper is simply to provide ABI stability. + +Since plugins will be dynamically loaded (e.g. via `dlopen` on POSIX), the API +avoids relying on static initialization. + +The intention is that existing kernels should be able to be ported to the new +APIs with a minimum of reimplementation effort. This precludes a from-scratch +re-imagining of TensorFlow APIs. + +The following diagram describes the components built with the proposed C and C++ +APIs. + + +----------------+ <--+ + | | | + | Plugin | | + | | | + +----------------+ | + | | | + | C++ header API | | Plugin + | | | my_plugin.so + +--> +----------------+ | + | | | | + | | C API headers | | + | | | | + | +----------------+ <--+ + | | | + | | C API impl | + Core | | | + Tensorflow | +----------------+ + libtf.so | | | + | | Core C++ APIs | + | | | + +--> +----------------+ + +In this example, there are two object files: `my_plugin.so` and +`libtensorflow.so`. `my_plugin.so` is implemented in terms of the C++ +header-only API, which is in turn implemented in terms of the C API headers. The +C API implementation is provided by TensorFlow at runtime when it loads the +plugin's shared object. + +This design addresses changes that are required to the existing C API that are +required to support op and kernel plugins. It also introduces the C++ +header-only API, which currently does not exist. + +## Ops + +This section introduces changes to the C API that are required to support ops. +An alpha version of this API is already checked in at `tensorflow/c/ops.h`. + +### Registration + +In the C++ API, ops are registered at static initialization time using the +`REGISTER_OP` macro. For example: + +```c++ +REGISTER_OP("Bitcast") + .Input("input: T") + .Output("output: type") + .Attr("T: {bfloat16, ...}") + .Attr("type: {bfloat16, ...}") + .SetShapeFn([](InferenceContext* ctx) { ... }) + .Doc("A bitcast operator"); +``` + +The equivalent C API will be a series of functions that operate on +`TF_OpDefinitionBuilder *`, a pointer to an opaque struct (i.e. a struct whose +content is not made known to the user). The functions include, but are not +limited to: + +* `TF_OpDefinitionBuilder* TF_NewOpDefinitionBuilder(const char* op_name)`: + constructs and returns a new op registration builder for an op with the given + name + +* `void TF_OpDefinitionBuilderAddAttr(TF_OpDefinitionBuilder* builder, const + char* attr)`: adds the given attribute to the builder (equivalent to `Attr` + above) + +* `void TF_OpDefinitionBuilderAddInput(TF_OpDefinitionBuilder* builder, const + char* input)`: adds the given input to the builder (equivalent to `Input` + above) + +Additional functions are provided for setting other properties of the operation +(e.g. `TF_OpDefinitionBuilderSetIsCommutative`). + +Registration is then actually performed using the `TF_RegisterOpDefinition` +function. This function populates a `TF_Status` indicating whether registration +was successful and frees the resources associated with the op definition +builder. + +The C equivalent of the bitcast op registration example above is shown below: + +```c++ + +#include "tensorflow/c/ops.h" + +void InferBitcastShape(TF_ShapeInferenceContext* ctx, // see the section below on + TF_Status* status); // shape inference + +void InitPlugin() { + TF_OpDefinitionBuilder* b = TF_NewOpDefinitionBuilder("Bitcast"); + TF_OpDefinitionBuilderAddInput(b, "input: T"); + TF_OpDefinitionBuilderAddOutput(b, "output: type"); + TF_OpDefinitionBuilderAddAttr(b, "T: {bfloat16, ...}"); + TF_OpDefinitionBuilderAddAttr(b, "type: {bfloat16, ...}"); + TF_OpDefinitionBuilderSetShapeInferenceFunction(b, &InferBitcastShape); + + TF_Status* status = TF_NewStatus(); + TF_RegisterOpDefinition(b, status); + if (TF_GetCode(status) != TF_OK) { /* handle errors */ } +} + +``` + +### Shape Inference + +A significant feature of certain ops is their ability to infer their output +shapes. TensorFlow will invoke the registered shape inference function (if one +is provided) when it needs to know the op's output shape. The registration +function declaration is shown below: + + +```c++ +void TF_OpDefinitionBuilderSetShapeInferenceFunction( + TF_OpDefinitionBuilder* builder, + void (*shape_inference_func)(TF_ShapeInferenceContext* ctx, TF_Status* status)); +``` + +A series of functions prefixed with `TF_ShapeInferenceContext` is provided for +the following purposes: + +* Examining operator input shapes (`TF_ShapeInferenceContextGetInput`) + +* Creating and deleting shape and dimension handles (`TF_{New,Delete}ShapeHandle`, `TF_{New,Delete}DimensionHandle`) + +* Manipulating shape and dimension handles (`TF_ShapeInferenceContextWithRank`, `TF_ShapeInferenceContextDim`) + +In general, C analogues to the C++ methods in `tensorflow::shape_inference` +(see `tensorflow/core/framework/shape_inference.h`) will be provided. + +## Kernels + +This section introduces changes to the C API that are required to support +kernels. An alpha version of this API is already checked in at +`tensorflow/c/kernels.h`. + +### Registration + +Kernel registration with the C++ API is accomplished with the +`REGISTER_KERNEL_BUILDER` macro. This macro expands to code that relies on +static initialization to register the provided kernel with the global kernel +registry. See below for an example of registering a kernel with the C++ API: + +```c++ + +#include "tensorflow/core/framework/op_kernel.h" + +class BitcastOp : public OpKernel { + explicit BitcastOp(OpKernelConstruction* context) : OpKernel(context) { … } + void Compute(OpKernelContext* context) override { … } +}; + +REGISTER_KERNEL_BUILDER(Name("Bitcast").Device(DEVICE_CPU), BitcastOp) +``` + +The equivalent C API provides a series of functions that operate on +`TF_KernelBuilder`, an opaque struct obtained with the `TF_NewKernelBuilder` call. +The kernel builder is registered with TensorFlow using the +`TF_RegisterKernelBuilder` function. See below for an example of registering +the bitcast kernel using the C API: + +```c++ +#include "tensorflow/c/kernels.h" + +typedef struct bitcast_kernel { … } bitcast_kernel; + +// Bitcast_Create, Bitcast_Compute and Bitcast_Delete actually implement the +// kernel. See the section below for discussion on kernel implementation. +static void* Bitcast_Create(TF_OpKernelConstruction* context) { + bitcast_kernel* k = (bitcast_kernel*) calloc(1, sizeof(bitcast_kernel)); + /* initialize the fields of k as needed */ + return (void*) k; +} + +static void* Bitcast_Compute(void* k, TF_OpKernelContext* context) { + bitcast_kernel* kernel = (bitcast_kernel*) k; // this is the pointer returned by + // Bitcast_Create + /* compute the result */ + TF_SetOutput(context, ...); +} + +static void Bitcast_Delete(void *k) { free(k); } + +void InitPlugin() { + TF_KernelBuilder* builder = TF_NewKernelBuilder(/*op_name*/"Bitcast", DEVICE_CPU, + &Bitcast_Create, &Bitcast_Compute, &Bitcast_Delete); + TF_Status* status = TF_NewStatus(); + TF_RegisterKernelBuilder(/*kernel_name*/"Bitcast", builder, status); + if (TF_GetCode(status) != TF_OK) { /* handle errors */ } + TF_DeleteStatus(status); +} +``` + +The registration function prototypes are provided below. Kernel authors must +provide a compute function. Creation and deletion functions are optional, but +if a creation function is provided that causes memory allocation, a deletion +function that frees the memory should also be provided, otherwise a leak will +occur. + +```c++ +TF_KernelBuilder* TF_NewKernelBuilder( + const char* op_name, const char* device_name, + void* (*create_func)(TF_OpKernelConstruction*), + void (*compute_func)(void*, TF_OpKernelContext*), + void (*delete_func)(void*)); + +void TF_RegisterKernelBuilder(const char* name, TF_KernelBuilder* builder, + TF_Status* status); +``` + +### Implementation + +The main classes for C++ kernel implementations are `OpKernelCreation` +(provided by TensorFlow to the kernel constructor) and `OpKernelContext` +(provided to the kernel's `Compute` method). The analogues in the C API are +`TF_OpKernelCreation` and `TF_OpKernelContext`. The aim of the C API is to +provide functions for working with these structs that match, as closely as +possible, the C++ API. + +### Inputs and Outputs + +Kernels must be able to retrieve their inputs and provide outputs. In the C++ +API, the tensorflow::OpKernelContext::GetInput and SetOutput family of +functions provide this functionality. The equivalent C calls will be +`TF_GetInput` and `TF_SetInput`. These functions operate on `TF_Tensor`, which +is already part of the existing TensorFlow C API. + +String tensors will be supported in an ABI-stable way. This will require +changes to their binary representation described in the [tstring design +document](https://github.com/tensorflow/community/blob/master/rfcs/20190411-string-unification.md). + +## C++ Header-Only API + +As described above, the main motivation for providing a C API is ABI stability. +However, some programmers may find the C API less convenient than the +non-ABI-stable C++ API. To address this concern, we plan to provide a +header-only C++ API that is implemented in terms of the ABI-stable C API. This +API will contain classes such as `Tensor`, `OpKernelContext`, and +`OpKernelConstruction`, whose names will be familiar to existing C++ API users. +Ideally, this API will be as close as possible to the existing non-ABI-stable +Tensorflow C++ API, so that kernels and ops currently implemented in C++ may be +ported to the ABI-stable C++ with as little implementation churn as possible. diff --git a/rfcs/20190814-kernel-and-op-registration/device_api_overview.png b/rfcs/20190814-kernel-and-op-registration/device_api_overview.png new file mode 100644 index 000000000..8803ead79 Binary files /dev/null and b/rfcs/20190814-kernel-and-op-registration/device_api_overview.png differ