There's about a 15% speedup by tweaking the data classes.
CPython's __slots__ is a bit of a hack that makes instance attributes faster to look up and makes instances more compact. It is not supported by the Python 3.7 dataclass decorator (see https://www.python.org/dev/peps/pep-0557/#support-for-automatically-setting-slots ), but it can be added manually, which is the recommended workaround. Doing that improves your Python benchmark over the original:
```python
from dataclasses import dataclass

# 36077.7910000003 μs
@dataclass
class Vertex:
    x: float
    y: float
    z: float
```
to
```python
# 32817.94299999996 μs
@dataclass
class Vertex:
    __slots__ = ("x", "y", "z")
    x: float
    y: float
    z: float
```
Then, for a reason I don't understand, the default __init__ adds small but measurable overhead compared to a manual one.
```python
# 30137.319999997915 μs
@dataclass
class Vertex:
    __slots__ = ("x", "y", "z")
    x: float
    y: float
    z: float

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
```
However, a manual __init__ seems wrong given the goal of the dataclass decorator.
If I add a __slots__ = ("normal", "v1", "v2", "v3") to the Triangle class, the timing drops further, to 28504 μs.
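For reference, here is roughly what the fully slotted pair looks like; the Triangle field names come from the __slots__ tuple above, but treat this as a sketch rather than the benchmark's exact class:

```python
from dataclasses import dataclass

@dataclass
class Vertex:
    __slots__ = ("x", "y", "z")
    x: float
    y: float
    z: float

@dataclass
class Triangle:
    __slots__ = ("normal", "v1", "v2", "v3")
    normal: Vertex
    v1: Vertex
    v2: Vertex
    v3: Vertex
```

With __slots__ on both classes, instances carry no per-object __dict__ at all, which is where both the memory and lookup savings come from.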
There are a couple of microoptimizations which improved things by a couple of percent, but not enough to warrant consideration in this benchmark.
ctypes alternative
One way to get better performance is to use the ctypes module from the standard library. The following takes about 124 μs:
```python
import struct
import timeit
import ctypes

class Vertex(ctypes.Structure):
    _pack_ = 4
    _fields_ = [("x", ctypes.c_float),
                ("y", ctypes.c_float),
                ("z", ctypes.c_float)]

class Triangle(ctypes.Structure):
    _pack_ = 2
    _fields_ = [("normal", Vertex),
                ("v1", Vertex),
                ("v2", Vertex),
                ("v3", Vertex),
                ("_ignore", ctypes.c_short)]

def parse(path: str):
    with open(path, 'rb') as stl:
        stl.seek(80)  # skip header
        trianglecount = struct.unpack('I', stl.read(4))[0]
        buffer_size = 50 * trianglecount
        s = stl.read(buffer_size)
        assert len(s) == buffer_size, (len(s), buffer_size)
        return (Triangle * trianglecount).from_buffer_copy(s)

def benchmark():
    triangles = parse('nist.stl')
    ## print("blah", sum(triangle.normal.x + triangle.v1.y + triangle.v2.z + triangle.v3.x
    ##                   for triangle in triangles))

time = min(timeit.Timer(benchmark).repeat(number=1, repeat=500)) * 1e6
print(str(time) + " μs")
```
It's a bit of a cheat, as there isn't any object instantiation. If I uncomment the test code, the benchmark time goes to 6881 μs. If I compromise and instead create the Triangle instances up front but the Vertex instances on demand, using return list((Triangle*trianglecount).from_buffer_copy(s)), then the parse time goes to 1560 μs and the benchmark+test time only slightly increases, to 7000 μs.
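A minimal sketch of that compromise, using a hand-built two-triangle buffer instead of nist.stl (the Vertex/Triangle layouts are the ones defined above):

```python
import ctypes
import struct

class Vertex(ctypes.Structure):
    _pack_ = 4
    _fields_ = [("x", ctypes.c_float),
                ("y", ctypes.c_float),
                ("z", ctypes.c_float)]

class Triangle(ctypes.Structure):
    _pack_ = 2
    _fields_ = [("normal", Vertex),
                ("v1", Vertex),
                ("v2", Vertex),
                ("v3", Vertex),
                ("_ignore", ctypes.c_short)]

# Two 50-byte records: 12 little-endian floats plus the 2-byte
# "attribute byte count", exactly how binary STL stores a triangle.
raw = struct.pack('<12fh', *range(12), 0) + struct.pack('<12fh', *range(12, 24), 0)

# list() materializes the Triangle wrappers eagerly; the Vertex
# wrappers are still created on demand at attribute-access time.
triangles = list((Triangle * 2).from_buffer_copy(raw))
```

Note that ctypes.sizeof(Triangle) is exactly 50, matching the on-disk record size, which is why no per-field unpacking is needed at all.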
NumPy alternative
If you're willing to give up the attribute access API, another option is to use NumPy, which brings the timing down to 80 μs. With structured types I can reference triangles[10].v1.y as triangles[10]["v1"]["y"]. However, I don't think this is acceptable for what you are looking for.
```python
import numpy as np
import struct
import timeit

point = [("x", np.float32), ("y", np.float32), ("z", np.float32)]
triangle_fields = np.dtype([
    ("normal", point),
    ("v1", point),
    ("v2", point),
    ("v3", point),
    ("ignore", "S2")  # 2-byte "attribute byte count" padding
])

def parse(path: str):
    with open(path, 'rb') as stl:
        stl.seek(80)  # skip header
        trianglecount = struct.unpack('I', stl.read(4))[0]
        s = stl.read(50 * trianglecount)
        assert len(s) == 50 * trianglecount, (len(s), 50 * trianglecount)
        return np.frombuffer(s, triangle_fields, count=trianglecount)

def benchmark():
    triangles = parse('nist.stl')
    ## print("blah", sum(triangle["normal"]["x"] + triangle["v1"]["y"] + triangle["v2"]["z"] + triangle["v3"]["x"]
    ##                   for triangle in triangles))

time = min(timeit.Timer(benchmark).repeat(number=1, repeat=500)) * 1e6
print(str(time) + " μs")
```
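As a quick illustration of the bracketed access (again with a hand-built two-triangle buffer rather than nist.stl), field lookups also vectorize across the whole array, which a per-object loop can't do:

```python
import struct
import numpy as np

point = [("x", np.float32), ("y", np.float32), ("z", np.float32)]
triangle_fields = np.dtype([
    ("normal", point), ("v1", point), ("v2", point), ("v3", point),
    ("ignore", "S2"),
])

# Two 50-byte STL records: 12 little-endian floats + 2 padding bytes each.
raw = struct.pack('<12fh', *range(12), 0) + struct.pack('<12fh', *range(12, 24), 0)
triangles = np.frombuffer(raw, triangle_fields, count=2)

y = triangles[0]["v1"]["y"]    # the equivalent of triangles[0].v1.y
xs = triangles["normal"]["x"]  # one vectorized column: every normal.x at once
```

The itemsize of triangle_fields is 50 bytes, so np.frombuffer maps the raw STL records directly onto the array with no copying per field.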