Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up serialization and deserialization #55

Merged
merged 30 commits into from
Jan 31, 2024

Conversation

andersfugmann
Copy link
Contributor

@andersfugmann andersfugmann commented Dec 31, 2023

Benchmarking against ocaml-protoc showed a huge number of allocations for both serialization and deserialization, which prompted some work on speeding up serialization and deserialization:

  • A quick rewrite to use exceptions instead of the error monad showed a x2 speedup for deserialization
  • Serializing directly to the output buffer yields yet another speed improvement for serialization (~x2 speedup)
  • Deserialization reads directly from the input buffer to avoid allocations and makes good use of static evaluation. This improves deserialization speed by 2-4x, depending on input. Two strategies for parsing input:
    • If input is ordered (its recommended to send fields in sorted order), construction is O(1).
    • If input is un-ordered a map is constructed with has O(n log n), and uses ref cells, which hurts the GC (adds roots)
  • Improving code generation for extensions and constructors (make), which now uses default values for optional fields rather than using pattern matching to unpack values (potentially reduce GC strain and improve inlining opportunities)
    Other changes includes:
    • A bug-fix for extensions, where setting an extension did not remove the previous values in some cases.
    • Improve emitted code by stripping whitespace at end of line and ensure all files ends with an empty line

For simple structures, ocaml-protoc-plugin is still slower than ocaml-protoc, but for more complex structures speed is comparable.

I'll note that ocaml-protoc has chosen to be non-compliant, and uses c stubs for faster encoding and decoding. Some observations on ocaml-protoc:

  • Does not allow repeated enums
  • Sends fields in reverse order (which is allowed but not recommended)
  • Always sends fields, even when fields have default values
  • Only support packed repeated fields
  • No proto2 support
  • Does not support extensions

@andersfugmann andersfugmann requested a review from a team as a code owner December 31, 2023 08:13
@andersfugmann andersfugmann force-pushed the andersfugmann/no_result branch from c316318 to 67b8fcf Compare December 31, 2023 08:58
@andersfugmann andersfugmann marked this pull request as draft January 1, 2024 22:33
@andersfugmann andersfugmann force-pushed the andersfugmann/no_result branch from 770f31d to 754c8b7 Compare January 1, 2024 23:39
@andersfugmann
Copy link
Contributor Author

I don't really know how the changes can be reviewed, as they are massive! I think the tests are pretty complete and tests a lot of things, as well as the benchmarks which also verifies results.

@andersfugmann andersfugmann marked this pull request as ready for review January 7, 2024 21:08
@andersfugmann andersfugmann changed the title Speed up deserialization by using exceptions for error propergation Speed up serialization and deserialization Jan 7, 2024
…ther than the result monad.

 * User facing API remains unchanged
 * This change speeds up deserialization by a factor of ~2
… serialization and is now within a factor of x2 of ocaml-protoc in terms of serialization and deserialization speed
…stinction between proto2 and proto3 in the spec
…s. Also loop unroll varint encoding and add tests to verify correct encoding for varints.
…iting legth delimited fields, as well as loop unroll some functions.
… use seq under the assumption that List.of_seq is well optimized.
@andersfugmann andersfugmann force-pushed the andersfugmann/no_result branch from e0fefbe to 17a6f0f Compare January 7, 2024 23:05
@andersfugmann
Copy link
Contributor Author

@AndreasDahl Any suggestions on how we can get this PR merged? I understand that its huge, but I believe the test coverage quite high.
A suggestion would be to focus on coding style and comment if the libraries needed for running the benchmark (make bench) should be added to the opam file.

… flamba

  * Assume orderd fields when reading on fast-path and revert to a slow path if/when encountering unordered fields.
  * Sort fields to follow standard implementations
  * Change constructor to take extensions last to speedup handling of extensions
  * Delete locate options.ml as we require ocaml >=4.08
@andersfugmann
Copy link
Contributor Author

With the latest optimizations, ocaml-protoc-plugin is now comparable with protoc:

Numbers below shows relation to protoc. A value above one means that Ocaml-protoc-plugin is slower and below means faster.
The benchmark is run with flambda settings: -O3 -unbox-closures -remove-unused-arguments -rounds 3 -inline 100.00 -inline-max-depth 3 -inline-max-unroll 3

name protoc plugin ratio
string_list.M/Decode 45673.3403 ns/run 53547.6444 ns/run 1.172
string_list.M/Encode 77058.1932 ns/run 46928.1161 ns/run .608
float_list.M/Decode 11195.2996 ns/run 14448.8059 ns/run 1.290
float_list.M/Encode 19415.1014 ns/run 8767.3420 ns/run .451
int64_list.M/Decode 16906.9672 ns/run 14329.7493 ns/run .847
int64_list.M/Encode 13556.7060 ns/run 11329.0867 ns/run .835
enum.M/Decode 31.9749 ns/run 72.6818 ns/run 2.273
enum.M/Encode 68.3581 ns/run 59.7913 ns/run .874
string.M/Decode 50.1519 ns/run 91.4369 ns/run 1.823
string.M/Encode 77.2197 ns/run 86.7556 ns/run 1.123
float.M/Decode 28.6674 ns/run 72.1196 ns/run 2.515
float.M/Encode 66.3969 ns/run 62.3008 ns/run .938
int64.M/Decode 30.8383 ns/run 72.8756 ns/run 2.363
int64.M/Encode 66.0895 ns/run 58.9611 ns/run .892
bench.M/Decode 157661.3070 ns/run 151011.1603 ns/run .957
bench.M/Encode 217168.8299 ns/run 123210.2917 ns/run .567

Benchmark still shows x2 slower decoding on simple messages (messages with only one field).

…and simplify

read and write of varints. Using flambda, this has proven to be the fastest implementation.
@andersfugmann
Copy link
Contributor Author

@AndreasDahl Repinging on this again. I'll stop adding commits to this for now, and I'll look into some of the other issues. However, I'd like to get this merged soon, as I intend to base future work on top of this. Could I ask for a quick review? I don't know if you have some tests that you can run on top of this code (if its still being used actively), but I don't think an in-depth review is sensible at this point with all the changes made.

@andersfugmann
Copy link
Contributor Author

andersfugmann commented Jan 30, 2024

@hongy20 (as representing the @issue/platypus team who are marked as CODEOWNERS) , @AndreasDahl do you have suggestions on how to proceed on this?

I propose that we either.

  1. We agree that as a general rule all code should be reviewed, and you will prioritizing reviewing.
  2. We don't really care about having code on master reviewed. Remove the CODEOWERS and I will just push to master at will
  3. Transfer the repository to me and you can create / maintain a local fork.

In case I don't get a timely response, I will start merging to main and prepare a new release, but its really not an ideal solution. I would much rather prefer 3. if Issuu has no interest in maintaining these repositories. I would propose the same for ocaml-zmq repository which also have a stale PR. ocaml-zmq repository should be transferred to the ocaml community though.

@andersfugmann
Copy link
Contributor Author

Note that I really do not prefer solution 2 and think we should avoid if possible.

Copy link
Contributor

@AndreasDahl AndreasDahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the late review. I'll start the discussion internally on how we want to maintain this going forward.

@andersfugmann andersfugmann merged commit 509e5da into master Jan 31, 2024
4 checks passed
@andersfugmann andersfugmann deleted the andersfugmann/no_result branch January 31, 2024 14:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants