
trt.IStreamReader (as implemented e.g. in polygraphy) requires higher peak CPU memory and more time than naive python implementation. #4327

Open
michaelfeil opened this issue Jan 16, 2025 · 6 comments
Assignees
Labels
Investigating Issue needs further investigation Module:Runtime triaged Issue has been triaged by maintainers

Comments

@michaelfeil

michaelfeil commented Jan 16, 2025

Description

I am trying to optimize the loading of a ~14.2GB tensorrt-llm engine into 16GB of VRAM on a node with 16GB of CPU RAM. As the rest of my program takes around ~1GB of CPU RAM, there is little headroom unless the CudaEngine is streamed from disk to CUDA memory.

Upon trying out trt.IStreamReader, the class does not live up to its promise:

  • it is slower than reading the file in Python.
  • it requires ~15GB of CPU RAM overhead, versus ~1GB with a naive implementation

Environment

TensorRT Version:

NVIDIA GPU: H100

/baseten/engine-builder/tei_trt# nvidia-smi
Wed Jan 15 23:59:04 2025
NVIDIA-SMI 550.90.07    Driver Version: 550.90.07    CUDA Version: 12.4

Operating System: Ubuntu 22.04

Python Version (if applicable): 3.10.2

PyTorch Version (if applicable): 2.5.1

Baremetal or Container (if so, version):

Relevant Files

Llama-7B engine created with TensorRT-LLM 0.16.0

Steps To Reproduce

import time
import tensorrt as trt

from pathlib import Path

def FileReaderVanilla(filepath):
    if not Path(filepath).exists():
        raise ValueError(f"File at {filepath} does not exist!")
    with open(filepath, "rb") as f:
        return f.read()
class FileReaderV1(trt.IStreamReader):
    """
    Class that supplies data to TensorRT from a stream. This may help reduce memory usage during deserialization.
    Moves engine file directly to CUDA memory, without loading it into CPU memory first.
    https://github.com/NVIDIA/TensorRT/blob/97ff24489d0ea979c418c7a0847dfc14c8483846/tools/Polygraphy/polygraphy/backend/trt/file_reader.py#L28
    Args:
        filepath (str):
                The path to the serialized file.

    ```python
    # roughly equivalent to:
    if not self.serialize_path.exists():
        raise ValueError(
            f"missing engine at serialize_path={self.serialize_path}"
        )
    with open(self.serialize_path, "rb") as f:
        yield f.read() # stream equivalent
    ```
    """
    def __init__(self, filepath):
        # Must explicitly initialize parent for any trampoline class! Will mysteriously segfault without this.
        trt.IStreamReader.__init__(self)  # type: ignore

        self.filepath = filepath

        if not Path(self.filepath).exists():
            raise ValueError(f"File at {self.filepath} does not exist!")
        self.file = open(self.filepath, "rb")
        
    def read(self, size: int) -> bytes:
        print(f"Reading {size} bytes")
        return self.file.read(size)

    def free(self):
        if self.file:
            self.file.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.free()

class FileReaderV2(trt.IStreamReaderV2):
    """
    Class that supplies data to TensorRT from a stream, without loading the whole file into memory.
    Moves engine file directly to CUDA memory, without first allocating it all in CPU memory.

    Args:
        file (Path):
            The path to the serialized engine file.
    """
    def __init__(self, file_path):
        trt.IStreamReaderV2.__init__(self)
        self.bytes = Path(file_path).read_bytes()
        self.len = len(self.bytes)
        self.index = 0

    def read(self, size, cudaStreamPtr):
        
        assert self.index + size <= self.len
        data = self.bytes[self.index:self.index + size]
        self.index += size
        print(f"Reading {size} bytes, actual size: {len(data)}")
        return data

    def seek(self, offset, where):
        print(f" seek position: {offset} {where}")
        if where == trt.SeekPosition.SET:
            self.index = offset
        elif where == trt.SeekPosition.CUR:
            self.index += offset
        elif where == trt.SeekPosition.END:
            self.index = self.len - offset
        else:
            raise ValueError(f"Invalid seek position: {where}")

def init_runtime(reader):
    runtime = trt.Runtime(trt.Logger(trt.Logger.INFO))
    engine = runtime.deserialize_cuda_engine(reader)
    assert engine is not None
    return runtime, engine

def debug_max_memory_usage_filereaderv2():
    _ = init_runtime(FileReaderV2("/app/engines/rank0.engine"))
    time.sleep(1)

def debug_max_memory_usage_filereaderv1():
    _ = init_runtime(FileReaderV1("/app/engines/rank0.engine"))
    time.sleep(1)

def debug_max_memory_usage_filereader_vanilla():
    _ = init_runtime(FileReaderVanilla("/app/engines/rank0.engine"))
    time.sleep(1)

if __name__ == "__main__":
    # /usr/bin/time -v poetry run python ./tests/test_runtime_filereader.py
    debug_max_memory_usage_filereaderv2()

Vanilla results

8.4s + peak memory 15524688kB

/usr/bin/time -v poetry run python --vanilla
debug_max_memory_usage_filereader_vanilla()

  warnings.warn(
[TensorRT-LLM] TensorRT-LLM version: 0.16.0
        Command being timed: "poetry run python ./tests/test_runtime_filereader.py"
        User time (seconds): 8.40
        System time (seconds): 17.13
        Percent of CPU this job got: 109%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:23.25
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 15524688
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 6318
        Minor (reclaiming a frame) page faults: 3824756
        Voluntary context switches: 53551
        Involuntary context switches: 537
        Swaps: 0
        File system inputs: 0
        File system outputs: 24
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

IStreamReaderV1 loading:

  • User time (seconds): 10.27 (worse)
  • Maximum resident set size (kbytes): 29217388 (almost double)
/usr/bin/time -v poetry run python --stream
debug_max_memory_usage_filereaderv1()
[TensorRT-LLM] TensorRT-LLM version: 0.16.0
        Command being timed: "poetry run python ./tests/test_runtime_filereader.py"
        User time (seconds): 10.27
        System time (seconds): 22.72
        Percent of CPU this job got: 111%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:29.65
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 29217388
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 6284
        Minor (reclaiming a frame) page faults: 7312826
        Voluntary context switches: 54294
        Involuntary context switches: 538
        Swaps: 0
        File system inputs: 0
        File system outputs: 24
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

Analysis

The memory duplication is likely caused by the Python-to-C++ boundary, which copies the bytes buffer. If the API read the data in smaller chunks, the overhead would not be as bad.

The .read(size) API is called twice with the StreamV1 class, requesting the initial 32 bytes and then the rest.

# successful read that needs 29217388kB
reading 32 bytes from /app/engines/rank0.engine
reading 14244750076 bytes from /app/engines/rank0.engine 
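The copy cost can be illustrated in pure Python: handing data across an interface as a bytes object materializes a second full-size buffer, while a memoryview shares the original storage. This is only an analogy for what presumably happens at the pybind11 boundary, not the actual TensorRT internals:

```python
import sys

MiB = 1024 * 1024
payload = bytes(8 * MiB)  # stand-in for a serialized engine blob

# A bytes-style handoff forces a copy: the receiver owns a second 8 MiB buffer.
copied = bytearray(payload)  # explicit copy with its own storage
assert sys.getsizeof(copied) >= 8 * MiB

# A memoryview shares the original buffer: tens of bytes of overhead,
# no duplication of the 8 MiB payload.
view = memoryview(payload)
assert sys.getsizeof(view) < 1024
assert view.obj is payload
print("copy size:", sys.getsizeof(copied), "view size:", sys.getsizeof(view))
```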

A pdb breakpoint delivers no additional info:

builder/tei_trt/tests/test_runtime_filereader.py(7)init_runtime()
      6     runtime = trt.Runtime(trt.Logger(trt.Logger.INFO))
----> 7     engine = runtime.deserialize_cuda_engine(reader)
      8     assert engine is not None
> /workspace/model-performance/michaelfeil/baseten/engine-builder/tei_trt/trt_tei_runtime/trt_model.py(137)read()
    136         ipdb.set_trace()
--> 137         print(f"reading {size} bytes from {self.filepath}")
    138         return self.file.read(size)

Analysis IStreamReaderV2

IStreamReaderV2 also reads almost the entire file in one call. This actually fails:

 seek position: 0 SeekPosition.SET
 seek position: 0 SeekPosition.SET
Reading 32 bytes, actual size: 32
Reading 48 bytes, actual size: 48
 seek position: 80 SeekPosition.SET
Reading 6586564 bytes, actual size: 6586564
 seek position: 6586648 SeekPosition.SET
Reading 13975421440 bytes, actual size: 13975421440
Segmentation fault (core dumped)

Desired behavior:

Either:

  • accept that fewer bytes may be returned, moving parts of the engine plan to the GPU incrementally. I could then cap the return size in Python at e.g. 1GB and the C++ side would "need to make it work".
  • the C++ side exposes an API for setting a maximum byte size. Python could set this optional value to control how much the C++ side requests at once.
  • the pybind interface / garbage collection on the Python side seems to be unclean, so memory is duplicated (an in-memory copy) instead of the buffer being passed through (as with the vanilla bytes interface).
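The first option can be sketched in plain Python, decoupled from TensorRT: a reader that never returns more than `max_chunk` bytes per call, so the consumer loops until satisfied and peak host memory stays bounded by the cap instead of the full engine size. The `ChunkCappedReader` class and `max_chunk` parameter are hypothetical; trt.IStreamReader does not currently tolerate short reads like this:

```python
import os
import tempfile

class ChunkCappedReader:
    """Hypothetical reader that caps every read() at max_chunk bytes.

    The caller must loop on short reads; in exchange, peak host memory
    is bounded by max_chunk rather than the full file size.
    """

    def __init__(self, filepath, max_chunk=1 * 1024**3):
        self._file = open(filepath, "rb")
        self._max_chunk = max_chunk

    def read(self, size: int) -> bytes:
        # Serve at most max_chunk bytes; the remainder is delivered
        # by follow-up calls.
        return self._file.read(min(size, self._max_chunk))

    def close(self):
        self._file.close()

# Usage: a consumer that tolerates short reads loops until done.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 1000)
    path = f.name

reader = ChunkCappedReader(path, max_chunk=256)
chunks = []
remaining = 1000
while remaining:
    chunk = reader.read(remaining)
    if not chunk:
        break
    chunks.append(chunk)
    remaining -= len(chunk)
reader.close()
os.unlink(path)
data = b"".join(chunks)
print(len(data))  # 1000 bytes, delivered in 256-byte-capped chunks
```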

Commands or scripts:

Have you tried the latest release?: YES

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt): polygraphy / tensorrt_llm

@michaelfeil michaelfeil changed the title trt.IStreamReader poor performance of TensorRT 10.7 trt.IStreamReader usage from polygraphy of TensorRT 10.7 requires higher memory and time than naive implementation. Jan 16, 2025
@michaelfeil michaelfeil changed the title trt.IStreamReader usage from polygraphy of TensorRT 10.7 requires higher memory and time than naive implementation. trt.IStreamReader usage from polygraphy requires higher peak memory and more time than naive python implementation. Jan 16, 2025
@michaelfeil michaelfeil changed the title trt.IStreamReader usage from polygraphy requires higher peak memory and more time than naive python implementation. trt.IStreamReader usage from polygraphy requires higher peak CPU memory and more time than naive python implementation. Jan 16, 2025
@lix19937

polygraphy is just an inference prototyping and debugging toolkit, not built for the purpose of pursuing performance. Here is where it wraps trt.IStreamReader: https://github.com/NVIDIA/TensorRT/blob/release/10.7/tools/Polygraphy/polygraphy/backend/trt/file_reader.py.

@michaelfeil
Author

@lix19937 The above implementation is an exact copy of https://github.com/NVIDIA/TensorRT/blob/release/10.7/tools/Polygraphy/polygraphy/backend/trt/file_reader.py (currently the only OSS implementation of trt.IStreamReader). The issue happens with both the linked code and the code in this issue.

@michaelfeil michaelfeil changed the title trt.IStreamReader usage from polygraphy requires higher peak CPU memory and more time than naive python implementation. trt.IStreamReader (as implemented e.g. in polygraphy) requires higher peak CPU memory and more time than naive python implementation. Jan 17, 2025
@pranavm-nvidia
Collaborator

There's an IStreamReaderV2 which was meant to solve some of these problems: https://docs.nvidia.com/deeplearning/tensorrt/latest/inference-library/python-api-docs.html#deserializing-a-plan

@michaelfeil
Author

michaelfeil commented Feb 3, 2025

@pranavm-nvidia As you can see above, IStreamReaderV2 has the same issue. Citing the V2 section above (Llama-8B-FP16 engine):

Reading 32 bytes, actual size: 32
Reading 48 bytes, actual size: 48
 seek position: 80 SeekPosition.SET
Reading 6586564 bytes, actual size: 6586564
 seek position: 6586648 SeekPosition.SET
Reading 13975421440 bytes, actual size: 13975421440 # causes OOM on a 16GB machine

@pranavm-nvidia
Collaborator

Huh yeah, that is strange. Looking at the implementation, it does indeed request the entire engine in one go. I assume this API was only intended to be used with GPU Direct Storage so that you bypass host memory entirely.

@jhalakpatel do you know?

@michaelfeil
Author

class FileReaderV2(trt.IStreamReaderV2): # memory efficient impl
    """
    Class that supplies data to TensorRT from a stream, without loading the whole file into memory.
    Moves engine file directly to CUDA memory, without first allocating it all in CPU memory.
    Args:
         file (Path):
             The path to the serialized engine file.
    """
    def __init__(self, file_path):
        trt.IStreamReaderV2.__init__(self)
        # Open the file in binary mode without reading it all into memory.
        self._file = open(file_path, "rb")
        # Determine the file length (used for boundary checking and seeking).
        self._file.seek(0, 2)  # Move to the end of the file.
        self._length = self._file.tell()
        print("reading size of file", self._length)
        self._file.seek(0, 0)  # Return to the start of the file.

    def read(self, size, cudaStreamPtr):
        """
        Reads a chunk of the engine file from disk.

        Args:
            size (int): The number of bytes to read.
            cudaStreamPtr: A pointer to a CUDA stream (not used in this implementation).

        Returns:
            bytes: The next chunk of data from the file.

        Raises:
            ValueError: If the requested read would exceed the file's length.
        """
        current_pos = self._file.tell()
        if current_pos + size > self._length:
            raise ValueError(
                f"Attempt to read beyond end-of-file (current pos: {current_pos}, "
                f"requested size: {size}, total length: {self._length})"
            )
        data = self._file.read(size)
        print(f"Reading {size} bytes, actual size: {len(data)}")
        return data

    def seek(self, offset, where):
        """
        Repositions the file pointer based on the offset and seek position.

        Args:
            offset (int): The offset to seek.
            where (trt.SeekPosition): The reference position which can be:
                - trt.SeekPosition.SET: Begin at the start of the file.
                - trt.SeekPosition.CUR: Relative to the current file position.
                - trt.SeekPosition.END: Relative to the end of the file.

        Raises:
            ValueError: If an invalid seek position is provided.
        """
        print(f"seek position: {offset} {where}")
        if where == trt.SeekPosition.SET:
            self._file.seek(offset, 0)
        elif where == trt.SeekPosition.CUR:
            self._file.seek(offset, 1)
        elif where == trt.SeekPosition.END:
            # For a positive offset, move backward from the file end.
            self._file.seek(-offset, 2)
        else:
            raise ValueError(f"Invalid seek position: {where}")

The same engine is fine if I load it via the bytes API.

e.g. leading to:

reading size of file 1001586228
seek position: 0 SeekPosition.SET
seek position: 0 SeekPosition.SET
Reading 32 bytes, actual size: 32
Reading 48 bytes, actual size: 48
seek position: 80 SeekPosition.SET
Reading 5121756 bytes, actual size: 5121756
seek position: 5121840 SeekPosition.SET
Reading 724107264 bytes, actual size: 724107264
[michaelfeil-dev-pod-h100-0:2106123:0:2106123] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x72b730000000)
==== backtrace (tid:2106123) ====
 0 0x0000000000042520 __sigaction()  ???:0
 1 0x00000000001a6cf9 __nss_database_lookup()  ???:0
 2 0x00000000000a67ae tensorrt::PyStreamReaderV2::read()  :0
 3 0x0000000001124b59 getLogger()  ???:0
 4 0x00000000011285b6 getLogger()  ???:0
 5 0x0000000001128dbb getLogger()  ???:0
 6 0x00000000011614f2 createInferRuntime_INTERNAL()  ???:0
 7 0x00000000011619ab createInferRuntime_INTERNAL()  ???:0
 8 0x00000000000b367f pybind11::cpp_function::initialize<pybind11::cpp_function::initialize<nvinfer1::ICudaEngine*, nvinfer1::IRuntime, nvinfer1::v_1_0::IStreamReaderV2&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, char const*, pybind11::call_guard<pybind11::gil_scoped_release>, pybind11::keep_alive<0ul, 1ul> >(nvinfer1::ICudaEngine* (nvinfer1::IRuntime::*)(nvinfer1::v_1_0::IStreamReaderV2&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&, char const* const&, pybind11::call_guard<pybind11::gil_scoped_release> const&, pybind11::keep_alive<0ul, 1ul> const&)::{lambda(nvinfer1::IRuntime*, nvinfer1::v_1_0::IStreamReaderV2&)#1}, nvinfer1::ICudaEngine*, nvinfer1::IRuntime*, nvinfer1::v_1_0::IStreamReaderV2&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, char const*, pybind11::call_guard<pybind11::gil_scoped_release>, pybind11::keep_alive<0ul, 1ul> >(pybind11::cpp_function::initialize<nvinfer1::ICudaEngine*, nvinfer1::IRuntime, nvinfer1::v_1_0::IStreamReaderV2&, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg, char const*, pybind11::call_guard<pybind11::gil_scoped_release>, pybind11::keep_alive<0ul, 1ul> >(nvinfer1::ICudaEngine* (nvinfer1::IRuntime::*)(nvinfer1::v_1_0::IStreamReaderV2&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&, char const* const&, pybind11::call_guard<pybind11::gil_scoped_release> const&, pybind11::keep_alive<0ul, 1ul> const&)::{lambda(nvinfer1::IRuntime*, nvinfer1::v_1_0::IStreamReaderV2&)#1}&&, nvinfer1::ICudaEngine* (*)(nvinfer1::IRuntime*, nvinfer1::v_1_0::IStreamReaderV2&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&, char const* const&, pybind11::call_guard<pybind11::gil_scoped_release> const&, pybind11::keep_alive<0ul, 1ul> const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN()  :0
 9 0x000000000004d0ce pybind11::cpp_function::dispatcher()  :0
10 0x0000000000110553 cfunction_call()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Objects/methodobject.c:543
11 0x00000000000c6e8c _PyObject_MakeTpCall()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Objects/call.c:215
12 0x00000000000c6e8c _PyObject_MakeTpCall()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Objects/call.c:216
13 0x00000000000c9ca2 _PyObject_VectorcallTstate()  /tmp/python-build.20250201000403.746361/Python-3.10.12/./Include/cpython/abstract.h:112
14 0x00000000000c9ca2 _PyObject_VectorcallTstate()  /tmp/python-build.20250201000403.746361/Python-3.10.12/./Include/cpython/abstract.h:99
15 0x00000000000c9ca2 method_vectorcall()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Objects/classobject.c:53
16 0x00000000001b79fd _PyObject_VectorcallTstate()  /tmp/python-build.20250201000403.746361/Python-3.10.12/./Include/cpython/abstract.h:114
17 0x0000000000070f23 call_function()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Python/ceval.c:5890
18 0x0000000000070f23 _PyEval_EvalFrameDefault()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Python/ceval.c:4181
19 0x00000000001b9b94 _PyEval_EvalFrame()  /tmp/python-build.20250201000403.746361/Python-3.10.12/./Include/internal/pycore_ceval.h:46
20 0x00000000001b9b94 _PyEval_Vector()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Python/ceval.c:5067
21 0x00000000001b79fd _PyObject_VectorcallTstate()  /tmp/python-build.20250201000403.746361/Python-3.10.12/./Include/cpython/abstract.h:114
22 0x000000000006dd27 call_function()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Python/ceval.c:5890
23 0x000000000006dd27 _PyEval_EvalFrameDefault()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Python/ceval.c:4198
24 0x00000000001b9b94 _PyEval_EvalFrame()  /tmp/python-build.20250201000403.746361/Python-3.10.12/./Include/internal/pycore_ceval.h:46
25 0x00000000001b9b94 _PyEval_Vector()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Python/ceval.c:5067
26 0x00000000000c9c68 _PyObject_VectorcallTstate()  /tmp/python-build.20250201000403.746361/Python-3.10.12/./Include/cpython/abstract.h:114
27 0x00000000000c9c68 method_vectorcall()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Objects/classobject.c:53
28 0x00000000001b79fd _PyObject_VectorcallTstate()  /tmp/python-build.20250201000403.746361/Python-3.10.12/./Include/cpython/abstract.h:114
29 0x0000000000070f23 call_function()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Python/ceval.c:5890
30 0x0000000000070f23 _PyEval_EvalFrameDefault()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Python/ceval.c:4181
31 0x00000000001b9b94 _PyEval_EvalFrame()  /tmp/python-build.20250201000403.746361/Python-3.10.12/./Include/internal/pycore_ceval.h:46
32 0x00000000001b9b94 _PyEval_Vector()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Python/ceval.c:5067
33 0x00000000000c700b _PyObject_FastCallDictTstate()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Objects/call.c:142
34 0x00000000000c7340 _PyObject_Call_Prepend()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Objects/call.c:431
35 0x000000000013506e slot_tp_init()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Objects/typeobject.c:7734
36 0x000000000013506e slot_tp_init()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Objects/typeobject.c:7739
37 0x000000000012cc1e type_call()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Objects/typeobject.c:1135
38 0x00000000000c6e8c _PyObject_MakeTpCall()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Objects/call.c:215
39 0x00000000000c6e8c _PyObject_MakeTpCall()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Objects/call.c:216
40 0x000000000006df2c call_function()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Python/ceval.c:5890
41 0x000000000006df2c _PyEval_EvalFrameDefault()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Python/ceval.c:4213
42 0x00000000001b9b94 _PyEval_EvalFrame()  /tmp/python-build.20250201000403.746361/Python-3.10.12/./Include/internal/pycore_ceval.h:46
43 0x00000000001b9b94 _PyEval_Vector()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Python/ceval.c:5067
44 0x00000000000c9c68 _PyObject_VectorcallTstate()  /tmp/python-build.20250201000403.746361/Python-3.10.12/./Include/cpython/abstract.h:114
45 0x00000000000c9c68 method_vectorcall()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Objects/classobject.c:53
46 0x00000000001b79fd _PyObject_VectorcallTstate()  /tmp/python-build.20250201000403.746361/Python-3.10.12/./Include/cpython/abstract.h:114
47 0x0000000000070f23 call_function()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Python/ceval.c:5890
48 0x0000000000070f23 _PyEval_EvalFrameDefault()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Python/ceval.c:4181
49 0x00000000001b9b94 _PyEval_EvalFrame()  /tmp/python-build.20250201000403.746361/Python-3.10.12/./Include/internal/pycore_ceval.h:46
50 0x00000000001b9b94 _PyEval_Vector()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Python/ceval.c:5067
51 0x00000000000c700b _PyObject_FastCallDictTstate()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Objects/call.c:142
52 0x00000000000c7340 _PyObject_Call_Prepend()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Objects/call.c:431
53 0x000000000013506e slot_tp_init()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Objects/typeobject.c:7734
54 0x000000000013506e slot_tp_init()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Objects/typeobject.c:7739
55 0x000000000012cc1e type_call()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Objects/typeobject.c:1135
56 0x00000000000c6e8c _PyObject_MakeTpCall()  /tmp/python-build.20250201000403.746361/Python-3.10.12/Objects/call.c:215
=================================
[michaelfeil-dev-pod-h100-0:2106123] *** Process received signal ***
[michaelfeil-dev-pod-h100-0:2106123] Signal: Segmentation fault (11)
[michaelfeil-dev-pod-h100-0:2106123] Signal code:  (-6)
[michaelfeil-dev-pod-h100-0:2106123] Failing at address: 0x20230b
[michaelfeil-dev-pod-h100-0:2106123] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x72c2b5c1a520]
[michaelfeil-dev-pod-h100-0:2106123] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x1a6cf9)[0x72c2b5d7ecf9]
[michaelfeil-dev-pod-h100-0:2106123] [ 2] /workspace/model-performance/michaelfeil/baseten/bei/.venv/lib/python3.10/site-packages/tensorrt_bindings/tensorrt.so(+0xa67ae)[0x72c07bca67ae]
[michaelfeil-dev-pod-h100-0:2106123] [ 3] /workspace/model-performance/michaelfeil/baseten/bei/.venv/lib/python3.10/site-packages/tensorrt_libs/libnvinfer.so.10(+0x1124b59)[0x72c12a924b59]
[michaelfeil-dev-pod-h100-0:2106123] [ 4] /workspace/model-performance/michaelfeil/baseten/bei/.venv/lib/python3.10/site-packages/tensorrt_libs/libnvinfer.so.10(+0x11285b6)[0x72c12a9285b6]
[michaelfeil-dev-pod-h100-0:2106123] [ 5] /workspace/model-performance/michaelfeil/baseten/bei/.venv/lib/python3.10/site-packages/tensorrt_libs/libnvinfer.so.10(+0x1128dbb)[0x72c12a928dbb]
[michaelfeil-dev-pod-h100-0:2106123] [ 6] /workspace/model-performance/michaelfeil/baseten/bei/.venv/lib/python3.10/site-packages/tensorrt_libs/libnvinfer.so.10(+0x11614f2)[0x72c12a9614f2]
[michaelfeil-dev-pod-h100-0:2106123] [ 7] /workspace/model-performance/michaelfeil/baseten/bei/.venv/lib/python3.10/site-packages/tensorrt_libs/libnvinfer.so.10(+0x11619ab)[0x72c12a9619ab]
[michaelfeil-dev-pod-h100-0:2106123] [ 8] /workspace/model-performance/michaelfeil/baseten/bei/.venv/lib/python3.10/site-packages/tensorrt_bindings/tensorrt.so(+0xb367f)[0x72c07bcb367f]
[michaelfeil-dev-pod-h100-0:2106123] [ 9] /workspace/model-performance/michaelfeil/baseten/bei/.venv/lib/python3.10/site-packages/tensorrt_bindings/tensorrt.so(+0x4d0ce)[0x72c07bc4d0ce]
[michaelfeil-dev-pod-h100-0:2106123] [10] /workspace/model-performance/michaelfeil/.asdf/installs/python/3.10.12/lib/libpython3.10.so.1.0(+0x110553)[0x72c2b5f10553]
[michaelfeil-dev-pod-h100-0:2106123] [11] /workspace/model-performance/michaelfeil/.asdf/installs/python/3.10.12/lib/libpython3.10.so.1.0(_PyObject_MakeTpCall+0x8c)[0x72c2b5ec6e8c]
[michaelfeil-dev-pod-h100-0:2106123] [12] /workspace/model-performance/michaelfeil/.asdf/installs/python/3.10.12/lib/libpython3.10.so.1.0(+0xc9ca2)[0x72c2b5ec9ca2]
[michaelfeil-dev-pod-h100-0:2106123] [13] /workspace/model-performance/michaelfeil/.asdf/installs/python/3.10.12/lib/libpython3.10.so.1.0(+0x1b79fd)[0x72c2b5fb79fd]
[michaelfeil-dev-pod-h100-0:2106123] [14] /workspace/model-performance/michaelfeil/.asdf/installs/python/3.10.12/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x5fd3)[0x72c2b5e70f23]
[michaelfeil-dev-pod-h100-0:2106123] [15] /workspace/model-performance/michaelfeil/.asdf/installs/python/3.10.12/lib/libpython3.10.so.1.0(+0x1b9b94)[0x72c2b5fb9b94]
[michaelfeil-dev-pod-h100-0:2106123] [16] /workspace/model-performance/michaelfeil/.asdf/installs/python/3.10.12/lib/libpython3.10.so.1.0(+0x1b79fd)[0x72c2b5fb79fd]
[michaelfeil-dev-pod-h100-0:2106123] [17] /workspace/model-performance/michaelfeil/.asdf/installs/python/3.10.12/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x2dd7)[0x72c2b5e6dd27]
[michaelfeil-dev-pod-h100-0:2106123] [18] /workspace/model-performance/michaelfeil/.asdf/installs/python/3.10.12/lib/libpython3.10.so.1.0(+0x1b9b94)[0x72c2b5fb9b94]
[michaelfeil-dev-pod-h100-0:2106123] [19] /workspace/model-performance/michaelfeil/.asdf/installs/python/3.10.12/lib/libpython3.10.so.1.0(+0xc9c68)[0x72c2b5ec9c68]
[michaelfeil-dev-pod-h100-0:2106123] [20] /workspace/model-performance/michaelfeil/.asdf/installs/python/3.10.12/lib/libpython3.10.so.1.0(+0x1b79fd)[0x72c2b5fb79fd]
[michaelfeil-dev-pod-h100-0:2106123] [21] /workspace/model-performance/michaelfeil/.asdf/installs/python/3.10.12/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x5fd3)[0x72c2b5e70f23]
[michaelfeil-dev-pod-h100-0:2106123] [22] /workspace/model-performance/michaelfeil/.asdf/installs/python/3.10.12/lib/libpython3.10.so.1.0(+0x1b9b94)[0x72c2b5fb9b94]
[michaelfeil-dev-pod-h100-0:2106123] [23] /workspace/model-performance/michaelfeil/.asdf/installs/python/3.10.12/lib/libpython3.10.so.1.0(_PyObject_FastCallDictTstate+0x6b)[0x72c2b5ec700b]
[michaelfeil-dev-pod-h100-0:2106123] [24] /workspace/model-performance/michaelfeil/.asdf/installs/python/3.10.12/lib/libpython3.10.so.1.0(_PyObject_Call_Prepend+0x100)[0x72c2b5ec7340]
[michaelfeil-dev-pod-h100-0:2106123] [25] /workspace/model-performance/michaelfeil/.asdf/installs/python/3.10.12/lib/libpython3.10.so.1.0(+0x13506e)[0x72c2b5f3506e]
[michaelfeil-dev-pod-h100-0:2106123] [26] /workspace/model-performance/michaelfeil/.asdf/installs/python/3.10.12/lib/libpython3.10.so.1.0(+0x12cc1e)[0x72c2b5f2cc1e]
[michaelfeil-dev-pod-h100-0:2106123] [27] /workspace/model-performance/michaelfeil/.asdf/installs/python/3.10.12/lib/libpython3.10.so.1.0(_PyObject_MakeTpCall+0x8c)[0x72c2b5ec6e8c]
[michaelfeil-dev-pod-h100-0:2106123] [28] /workspace/model-performance/michaelfeil/.asdf/installs/python/3.10.12/lib/libpython3.10.so.1.0(_PyEval_EvalFrameDefault+0x2fdc)[0x72c2b5e6df2c]
[michaelfeil-dev-pod-h100-0:2106123] [29] /workspace/model-performance/michaelfeil/.asdf/installs/python/3.10.12/lib/libpython3.10.so.1.0(+0x1b9b94)[0x72c2b5fb9b94]
[michaelfeil-dev-pod-h100-0:2106123] *** End of error message ***

@brnguyen2 brnguyen2 added Investigating Issue needs further investigation Module:Runtime triaged Issue has been triaged by maintainers labels Feb 10, 2025