Conversation
Pull request overview
Provides adaptation scripts and monkey patches for running/validating unstructured capabilities (hi_res/YOLOX, etc.) on Ascend NPU, covering model loading, inference, and dependency stubbing.
Changes:
- Add NPU YOLOX inference adaptation and an unstructured_inference monkey patch (model loading, operator replacement, inference rewrite).
- Add a benchmark script and a launch script (environment variables, LD_PRELOAD, dependency mocks).
- Add an OCR-side adaptation module (function injection via a fake pytesseract interface) and an NPU fusion_result.json.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| runtime/ops/mapper/unstructured_npu/run.sh | Ascend NPU launch script: sets jemalloc/environment variables and runs the benchmark |
| runtime/ops/mapper/unstructured_npu/benchmark_npu.py | Benchmark entry point: deep-mocks dependencies, initializes the NPU, invokes the unstructured partition logic, and writes results to disk |
| runtime/ops/mapper/unstructured_npu/npu_adapter.py | Core adaptation: requests interception, LayoutElements replacement, YOLOX forward/decode/post-processing rewrite, and model loading |
| runtime/ops/mapper/unstructured_npu/ocr_npu_adapter.py | OCR-side isolated process (CPU PaddleOCR) plus a fake pytesseract module with injected functions |
| runtime/ops/mapper/unstructured_npu/fusion_result.json | Run artifact/debug info: records graph-fusion statistics |
```sh
export OMP_NUM_THREADS=1

# 6. Python path (includes the current directory and YOLOX)
export PYTHONPATH=$(pwd):$(pwd)/YOLOX-main:$PYTHONPATH
```
Under `set -u`, appending `$PYTHONPATH` directly fails when the variable is undefined and aborts the script. Use `${PYTHONPATH:-}` as a fallback, or run `PYTHONPATH=${PYTHONPATH:-}` before appending.

Suggested change:

```diff
-export PYTHONPATH=$(pwd):$(pwd)/YOLOX-main:$PYTHONPATH
+export PYTHONPATH="$(pwd):$(pwd)/YOLOX-main:${PYTHONPATH:-}"
```
```sh
# 3. Set LD_PRELOAD (overwrite-style assignment to prevent duplicates)
# Note: jemalloc must come first; libgomp second to fix the TLS issue
export LD_PRELOAD="$JEMALLOC:$GOMP"
```

The script only checks that jemalloc exists, but it depends just as strongly on `libgomp.so.1` being preloaded; if that file is missing, the dynamic loader errors out or behaves unpredictably. Validate that `$GOMP` also exists before setting `LD_PRELOAD`, and emit a clear error message.
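A guard along those lines, sketched under the assumption that `JEMALLOC` and `GOMP` hold the two library paths as in run.sh (`check_preload_lib` is a hypothetical helper name, not from the PR):

```shell
# Sketch: validate each library before adding it to LD_PRELOAD.
check_preload_lib() {
    if [ ! -f "$1" ]; then
        echo "ERROR: required preload library not found: $1" >&2
        return 1
    fi
    return 0
}

# Example: /bin/sh exists on any POSIX system; the second path does not.
check_preload_lib /bin/sh && echo "ok: /bin/sh"
check_preload_lib /nonexistent/libgomp.so.1 || echo "missing library detected"

# run.sh would then do, before exporting:
#   check_preload_lib "$JEMALLOC" || exit 1
#   check_preload_lib "$GOMP"     || exit 1
#   export LD_PRELOAD="$JEMALLOC:$GOMP"
```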
```python
if model_name in _NPU_MODEL_CACHE:
    return _NPU_MODEL_CACHE[model_name]

if os.path.exists("./yolox_l.pt"):
    model_path = "./yolox_l.pt"
else:
    model_path = "/mnt/nvme0n1/pjj-data/data/models/yolox_l.pt"

print(f"[NPU Adapter] Loading local model: {model_path}")

from unstructured_inference.models.yolox import UnstructuredYoloXModel
model = UnstructuredYoloXModel()
model.model_path = model_path
```
`npu_get_model` falls back to the hard-coded absolute local path `/mnt/nvme0n1/.../yolox_l.pt`, so on any other machine/container the model is guaranteed not to be found and loading fails outright. Prefer reading the model path from configuration or an environment variable, or route everything through the HuggingFace/cache directory, and explain in the error message how to configure it.
Suggested replacement:

```python
if model_name in _NPU_MODEL_CACHE:
    return _NPU_MODEL_CACHE[model_name]

# Resolve model path in a portable, configurable way.
# Priority:
#   1. NPU_YOLOX_MODEL_PATH (environment variable)
#   2. ./yolox_l.pt (current working directory)
#   3. ~/.cache/unstructured_npu/yolox_l.pt (user cache directory)
env_model_path = os.environ.get("NPU_YOLOX_MODEL_PATH")
cache_dir = os.path.join(os.path.expanduser("~"), ".cache", "unstructured_npu")
cache_model_path = os.path.join(cache_dir, "yolox_l.pt")

candidate_paths = []
if env_model_path:
    candidate_paths.append(env_model_path)
candidate_paths.append("./yolox_l.pt")
candidate_paths.append(cache_model_path)

model_path = None
for _path in candidate_paths:
    if _path and os.path.exists(_path):
        model_path = _path
        break

if model_path is None:
    raise FileNotFoundError(
        "[NPU Adapter] YOLOX model file not found.\n"
        "Searched locations:\n"
        f"  - NPU_YOLOX_MODEL_PATH={env_model_path!r}\n"
        "  - ./yolox_l.pt\n"
        f"  - {cache_model_path}\n\n"
        "Please either:\n"
        "  1. Set environment variable NPU_YOLOX_MODEL_PATH to the full path of yolox_l.pt, or\n"
        "  2. Place yolox_l.pt in the current working directory, or\n"
        "  3. Place yolox_l.pt under the cache directory shown above."
    )

print(f"[NPU Adapter] Loading local model: {model_path}")

from unstructured_inference.models.yolox import UnstructuredYoloXModel
model = UnstructuredYoloXModel()
model.model_path = model_path
```
```python
strides = []

for (hsize, wsize), stride in zip(self.hw, self.strides):
    yv, xv = torch.meshgrid([torch.arange(hsize), torch.arange(wsize)])
```
In newer PyTorch versions, `torch.meshgrid` should be given an explicit `indexing` argument (e.g. `indexing="ij"`); otherwise it emits a warning, and the default behavior may change in a future release. Add the `indexing` argument here for compatibility.

Suggested change:

```diff
-yv, xv = torch.meshgrid([torch.arange(hsize), torch.arange(wsize)])
+yv, xv = torch.meshgrid(torch.arange(hsize), torch.arange(wsize), indexing="ij")
```
```python
if os.path.exists("npu_adapter.py"):
    try:
        import npu_adapter
        logger.info("应用 YOLOX NPU 补丁...")  # "Applying YOLOX NPU patches..."
        npu_adapter.apply_patches()
    except Exception as e:
        logger.error(f"NPU 适配器加载失败: {e}")  # "NPU adapter failed to load"
        import traceback
        traceback.print_exc()
        sys.exit(1)
```
ocr_npu_adapter.py provides `apply_ocr_patch()` (which replaces pytesseract/unstructured_pytesseract), but benchmark_npu.py currently never calls that patch before importing unstructured, so the pytesseract-based OCR path still follows the original logic. If this PR is meant to cover OCR adaptation, call the patch explicitly in stage 4 (before importing unstructured) and provide a switch so it can be enabled on demand.
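One way to wire that in, sketched with a hypothetical `NPU_OCR_PATCH` environment switch (`apply_ocr_patch` is the function ocr_npu_adapter.py is said to provide; everything else here is illustrative):

```python
import os

def maybe_apply_ocr_patch():
    """Apply the pytesseract replacement only when explicitly enabled."""
    if os.environ.get("NPU_OCR_PATCH", "").strip().lower() not in ("1", "true", "yes"):
        return False
    # Imported lazily so the adapter is only loaded when the switch is on.
    import ocr_npu_adapter
    ocr_npu_adapter.apply_ocr_patch()  # replaces pytesseract/unstructured_pytesseract
    return True

# Stage 4 would then call this *before* importing unstructured:
#   maybe_apply_ocr_patch()
#   from unstructured.partition.auto import partition
```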
```python
try:
    elements, mode_desc = _extract_elements(file_path)
    logger.info(f"模式: {mode_desc}")  # "Mode: ..."
except Exception as e:
    logger.error(f"处理崩溃: {e}")  # "Processing crashed"
    import traceback
    traceback.print_exc()
    return

duration = time.time() - start_time

if not elements:
    logger.error("未提取到元素。")  # "No elements extracted."
    return
```
On failure, most paths in the script just do `logger.error(...); return`, so the process still exits with code 0 (e.g. `run_benchmark` returns after catching an exception, and `__main__` does not `sys.exit(1)` when the test file is missing). The parent run.sh/CI therefore cannot detect failures. In failure scenarios, call `sys.exit(1)` explicitly or re-raise the exception so the process returns a non-zero exit code.
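A sketch of that propagation; the function names mirror the PR, but the bodies are stand-ins for illustration:

```python
import sys

def run_benchmark(file_path):
    """Return a process exit status instead of swallowing failures."""
    try:
        elements = _extract_elements(file_path)  # stand-in for the real call
    except Exception as e:
        print(f"processing crashed: {e}", file=sys.stderr)
        return 1  # non-zero: crash
    if not elements:
        print("no elements extracted", file=sys.stderr)
        return 1  # non-zero: empty result
    return 0

def _extract_elements(file_path):
    # Stand-in extractor: fail when the test file is missing.
    if file_path is None:
        raise FileNotFoundError("test file not found")
    return ["element"]

# __main__ would then do: sys.exit(run_benchmark(path))
# so run.sh/CI sees a non-zero exit code on failure.
```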
```python
import importlib.util
import importlib.machinery
```

`importlib.util` is imported but never referenced in this file. Remove the unused import to avoid lint/static-analysis noise.

Suggested change:

```diff
-import importlib.util
 import importlib.machinery
```
```python
def safe_cat(tensors, dim=1):
    try:
        res = torch.cat(tensors, dim=dim)
        torch.npu.synchronize()
        return res
    except Exception:
        cpu_tensors = [t.cpu() for t in tensors]
        if not cpu_tensors: return torch.tensor([], device=tensors[0].device)
        return torch.cat(cpu_tensors, dim=dim).to(tensors[0].device)
```
When `tensors` is empty, `safe_cat`'s exception fallback accesses `tensors[0].device` and raises IndexError (the `torch.cat([])` in the try block also raises). Handle empty input at the top of the function, and allow the device to be passed in explicitly or inferred.
Suggested replacement:

```python
def safe_cat(tensors, dim=1, device=None):
    # Handle empty input first to avoid torch.cat([]) and tensors[0] access errors
    if not tensors:
        if device is not None:
            return torch.tensor([], device=device)
        return torch.tensor([])
    # For non-empty input, infer the device from the first tensor if not given
    if device is None:
        device = tensors[0].device
    try:
        res = torch.cat(tensors, dim=dim)
        torch.npu.synchronize()
        return res
    except Exception:
        cpu_tensors = [t.cpu() for t in tensors]
        return torch.cat(cpu_tensors, dim=dim).to(device)
```
```python
pass

return _orig_request(self, method, url, *args, **kwargs)

requests.Session.request = mocked_request
```
This globally monkey-patches `requests.Session.request` (taking effect immediately at module import), which affects every requests caller in the process and causes unpredictable network behavior that is hard to debug. Make the interception more local (e.g. use a custom Session/Adapter only in the code path that downloads the YOLOX weights, or gate it behind an environment-variable switch).
Suggested replacement:

```python
# Fall back to the original URL on exception
pass

return _orig_request(self, method, url, *args, **kwargs)

# Gate the global monkey patch of requests.Session.request behind an environment
# variable. It is only enabled when NPU_ADAPTER_PATCH_REQUESTS=1 / true / yes.
_enable_global_requests_patch = os.environ.get("NPU_ADAPTER_PATCH_REQUESTS", "").strip().lower() in (
    "1",
    "true",
    "yes",
)
if _enable_global_requests_patch:
    requests.Session.request = mocked_request
```
```python
import types
import torch
import torch_npu
import numpy as np
import requests
from torchvision.ops import nms
from requests.exceptions import ConnectionError
```
`types` and `ConnectionError` are imported but never referenced in this file, adding static-analysis noise. Remove the unused imports, or add the code paths that actually use them.
Suggested replacement:

```python
import torch
import torch_npu
import numpy as np
import requests
from torchvision.ops import nms
```
Adaptation of unstructured-io operators on NPU