- Complex encoding: **0.18x** (5.6x slower than C)
309
+
- Complex decoding: **0.33x** (3.0x slower than C)
310
+
- 🎯 **Target:** ~1.0x for simple decoding, ~0.7x for complex decoding (still needs Priority 2-4 optimizations)
311
+
302
312
**Architecture:**
303
313
- ✅ Hybrid encoding strategy (fast path for PyDict, `items()` for other mappings)
304
314
- ✅ Direct buffer writing with `doc.to_writer()` for nested documents
305
315
- ✅ Efficient `_id` field ordering at top level
306
316
- ✅ Direct byte reading for common types (single-pass bytes → Python dict)
307
317
- ✅ Fallback to Rust `bson` library for less common types
318
+
- ✅ **Comprehensive type caching** (all BSON types cached on first use)
308
319
- ✅ 100% test pass rate (60 tests: 58 passing + 2 skipped for optional numpy dependency)
309
320
310
321
**Performance Analysis:**
311
322
312
-
The Rust extension is currently slower than the C extension for both encoding and decoding. The main bottleneck is **Python FFI overhead** - creating Python objects from Rust incurs significant performance cost.
323
+
The Rust extension was initially slower than the C extension due to **Python FFI overhead** - specifically, repeated type imports on every BSON conversion. With comprehensive type caching now implemented, performance improved by ~24% (0.21x → 0.26x). However, significant overhead remains from:
324
+
- Python object creation for every BSON value (even with cached types)
325
+
- PyO3 FFI overhead when calling Python constructors
326
+
- Lack of fast paths for common types (C extension uses direct C API calls)
327
+
328
+
The type caching helped but wasn't the silver bullet we hoped for. The C extension's performance advantage comes from using low-level C API calls (`PyLong_FromLong`, `PyUnicode_FromStringAndSize`, etc.) instead of calling Python constructors through FFI.
313
329
314
-
**Recommendation:** C extension remains the default and recommended choice. The Rust extension demonstrates feasibility and correctness but is not yet performance-competitive for production use.
330
+
**Recommendation:** C extension remains the default and recommended choice. The Rust extension demonstrates feasibility and correctness, with type caching providing modest improvements. Further optimizations (Priority 2-4) are needed to approach performance parity.
315
331
316
332
### Path to Performance Parity
317
333
318
334
Analysis of the C extension reveals several optimization opportunities to achieve near-parity performance:
319
335
320
-
#### Priority 1: Type Caching (HIGH IMPACT)
336
+
#### Priority 1: Type Caching (HIGH IMPACT) ✅ **IMPLEMENTED**
321
337
322
-
**Problem:** The Rust implementation calls `py.import()` on every BSON type conversion:
323
-
```rust
// Called millions of times during decoding!
let int64_module = py.import("bson.int64")?;
let int64_class = int64_module.getattr("Int64")?;
```
338
+
**Status:** ✅ **COMPLETE** - Comprehensive type caching has been implemented.
328
339
329
-
**Solution:** Cache Python type objects in module state (as the C extension does):
340
+
**Implementation:** All BSON types are now cached using lazy initialization:
**Actual Impact:** ~1.24x faster overall (0.21x → 0.26x average ratio)
382
+
**Actual Effort:** ~6 hours
383
+
384
+
**Analysis:** Type caching provided modest improvements (~24%) but not the expected 2-3x speedup. The remaining bottleneck is Python object creation overhead through PyO3 FFI. The C extension's advantage comes from using direct C API calls (`PyLong_FromLong`, etc.) instead of calling Python constructors. Priority 2 (Fast Paths) is now critical to achieve further gains.
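
The lazy-initialization pattern behind type caching can be illustrated with a small standalone sketch. It does not use PyO3: `TypeHandle`, `slow_lookup`, `int64_type`, and `lookup_count` are hypothetical names invented for illustration, and `slow_lookup` merely stands in for the `py.import()` plus `getattr()` pair that the uncached code ran on every conversion. The standard library's `OnceLock` plays the role of the cache here; in real PyO3 code a GIL-aware cell such as `pyo3::sync::GILOnceCell` would likely be the closer fit, but that is an assumption and not a description of what this commit does.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for a resolved Python type object
/// (for example `bson.int64.Int64`). Not a PyO3 type.
#[derive(Debug)]
pub struct TypeHandle {
    pub module: &'static str,
    pub class: &'static str,
}

/// Counts how often the expensive lookup actually runs.
static LOOKUPS: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for the costly import + attribute lookup.
/// In the uncached version this would run on every conversion.
fn slow_lookup(module: &'static str, class: &'static str) -> TypeHandle {
    LOOKUPS.fetch_add(1, Ordering::SeqCst);
    TypeHandle { module, class }
}

/// Process-wide cache slot, filled on first use.
static INT64_TYPE: OnceLock<TypeHandle> = OnceLock::new();

/// Returns the cached handle, performing the slow lookup at most once.
pub fn int64_type() -> &'static TypeHandle {
    INT64_TYPE.get_or_init(|| slow_lookup("bson.int64", "Int64"))
}

/// Number of slow lookups performed so far.
pub fn lookup_count() -> usize {
    LOOKUPS.load(Ordering::SeqCst)
}
```

Calling `int64_type()` 1,000 times in a loop leaves `lookup_count()` at 1. That mirrors why caching removes the repeated-import cost but leaves the per-value object-creation cost untouched.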
341
385
342
386
#### Priority 2: Fast Paths for Common Types (MEDIUM IMPACT)