Description
Hi,
I am trying to run Taylor Softmax.
(0)
When I run python3 setup.py install, I get:
root@7c09a3f30c39:/home/keras/notebook/nvme_raid/aveysov/pytorch-loss# python3 setup.py install
running install
running bdist_egg
running egg_info
creating pytorch_loss.egg-info
writing pytorch_loss.egg-info/PKG-INFO
writing dependency_links to pytorch_loss.egg-info/dependency_links.txt
writing top-level names to pytorch_loss.egg-info/top_level.txt
writing manifest file 'pytorch_loss.egg-info/SOURCES.txt'
reading manifest file 'pytorch_loss.egg-info/SOURCES.txt'
writing manifest file 'pytorch_loss.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build
creating build/lib.linux-x86_64-3.7
creating build/lib.linux-x86_64-3.7/pytorch_loss
copying pytorch_loss/swish.py -> build/lib.linux-x86_64-3.7/pytorch_loss
copying pytorch_loss/frelu.py -> build/lib.linux-x86_64-3.7/pytorch_loss
copying pytorch_loss/generalized_iou_loss.py -> build/lib.linux-x86_64-3.7/pytorch_loss
copying pytorch_loss/pc_softmax.py -> build/lib.linux-x86_64-3.7/pytorch_loss
copying pytorch_loss/focal_loss_old.py -> build/lib.linux-x86_64-3.7/pytorch_loss
copying pytorch_loss/focal_loss.py -> build/lib.linux-x86_64-3.7/pytorch_loss
copying pytorch_loss/one_hot.py -> build/lib.linux-x86_64-3.7/pytorch_loss
copying pytorch_loss/soft_dice_loss.py -> build/lib.linux-x86_64-3.7/pytorch_loss
copying pytorch_loss/amsoftmax.py -> build/lib.linux-x86_64-3.7/pytorch_loss
copying pytorch_loss/taylor_softmax.py -> build/lib.linux-x86_64-3.7/pytorch_loss
copying pytorch_loss/triplet_loss.py -> build/lib.linux-x86_64-3.7/pytorch_loss
copying pytorch_loss/__init__.py -> build/lib.linux-x86_64-3.7/pytorch_loss
copying pytorch_loss/label_smooth.py -> build/lib.linux-x86_64-3.7/pytorch_loss
copying pytorch_loss/hswish.py -> build/lib.linux-x86_64-3.7/pytorch_loss
copying pytorch_loss/ema.py -> build/lib.linux-x86_64-3.7/pytorch_loss
copying pytorch_loss/test.py -> build/lib.linux-x86_64-3.7/pytorch_loss
copying pytorch_loss/dice_loss.py -> build/lib.linux-x86_64-3.7/pytorch_loss
copying pytorch_loss/large_margin_softmax.py -> build/lib.linux-x86_64-3.7/pytorch_loss
copying pytorch_loss/lovasz_softmax.py -> build/lib.linux-x86_64-3.7/pytorch_loss
copying pytorch_loss/mish.py -> build/lib.linux-x86_64-3.7/pytorch_loss
copying pytorch_loss/conv_ops.py -> build/lib.linux-x86_64-3.7/pytorch_loss
copying pytorch_loss/ohem_loss.py -> build/lib.linux-x86_64-3.7/pytorch_loss
copying pytorch_loss/affinity_loss.py -> build/lib.linux-x86_64-3.7/pytorch_loss
copying pytorch_loss/dual_focal_loss.py -> build/lib.linux-x86_64-3.7/pytorch_loss
running build_ext
building 'focal_cpp' extension
creating /home/keras/notebook/nvme_raid/aveysov/pytorch-loss/build/temp.linux-x86_64-3.7
creating /home/keras/notebook/nvme_raid/aveysov/pytorch-loss/build/temp.linux-x86_64-3.7/csrc
Emitting ninja build file /home/keras/notebook/nvme_raid/aveysov/pytorch-loss/build/temp.linux-x86_64-3.7/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] /usr/local/cuda/bin/nvcc -I/opt/conda/lib/python3.7/site-packages/torch/include -I/opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.7/site-packages/torch/include/TH -I/opt/conda/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.7m -c -c /home/keras/notebook/nvme_raid/aveysov/pytorch-loss/csrc/focal_kernel.cu -o /home/keras/notebook/nvme_raid/aveysov/pytorch-loss/build/temp.linux-x86_64-3.7/csrc/focal_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=focal_cpp -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14
FAILED: /home/keras/notebook/nvme_raid/aveysov/pytorch-loss/build/temp.linux-x86_64-3.7/csrc/focal_kernel.o
/usr/local/cuda/bin/nvcc -I/opt/conda/lib/python3.7/site-packages/torch/include -I/opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.7/site-packages/torch/include/TH -I/opt/conda/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/opt/conda/include/python3.7m -c -c /home/keras/notebook/nvme_raid/aveysov/pytorch-loss/csrc/focal_kernel.cu -o /home/keras/notebook/nvme_raid/aveysov/pytorch-loss/build/temp.linux-x86_64-3.7/csrc/focal_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=focal_cpp -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 -std=c++14
In file included from /usr/local/cuda/include/crt/math_functions.h:8958:0,
from /usr/local/cuda/include/crt/common_functions.h:295,
from /usr/local/cuda/include/cuda_runtime.h:115,
from <command-line>:0:
/usr/include/c++/7/cmath:45:15: fatal error: math.h: No such file or directory
compilation terminated.
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1672, in _run_ninja_build
env=env)
File "/opt/conda/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
I run the python3 setup.py install command in my Dockerized research environment, which is derived from the official PyTorch GPU images:
ARG BASE_IMAGE=pytorch/pytorch:1.9.0-cuda11.1-cudnn8-devel
FROM $BASE_IMAGE
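From what I could find, this kind of "math.h: No such file or directory" failure can mean that the host compiler nvcc invokes cannot locate the libc development headers inside the container. A hypothetical Dockerfile addition (package names are my guess and untested in this exact image) might look like:

```dockerfile
# Untested guess: (re)install the C library headers and build tools so that
# gcc's <cmath> can resolve its #include_next <math.h> when nvcc invokes
# gcc as the host compiler.
RUN apt-get update && \
    apt-get install -y --no-install-recommends build-essential libc6-dev && \
    rm -rf /var/lib/apt/lists/*
```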
I remember facing similar problems in the past; back then I added something like this to compile some CUDA kernels, but later removed these lines (it was a while ago!):
RUN apt-get install gcc-5 g++-5 g++-5-multilib gfortran-5 -y && \
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 60 --slave /usr/bin/g++ g++ /usr/bin/g++-5 --slave /usr/bin/gfortran gfortran /usr/bin/gfortran-5 && \
update-alternatives --query gcc
RUN gcc --version
Could you maybe elaborate a bit here? I am not very familiar with how the C++ ecosystem works.
(1)
As far as I can see, there is a standard autograd implementation and a custom CUDA implementation.
Since I am not very proficient with C++ and CUDA, may I ask what the reasoning was behind adding a custom CUDA kernel? Was the autograd version too slow, or too memory-intensive?
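For context, my understanding of the Taylor softmax idea (a minimal sketch in plain Python of an n-th order Taylor expansion of exp; not the repo's actual code) is:

```python
import math

def taylor_softmax(logits, n=2):
    # f(x) = sum_{k=0}^{n} x**k / k!  -- the truncated Taylor series of exp(x).
    # For even n, f(x) > 0 for all real x, so it can replace exp() in softmax.
    f = [sum(x ** k / math.factorial(k) for k in range(n + 1)) for x in logits]
    total = sum(f)
    # Normalize so the outputs form a probability distribution.
    return [v / total for v in f]

probs = taylor_softmax([1.0, 2.0, 3.0])
# -> [0.15625, 0.3125, 0.53125]  (f values 2.5, 5.0, 8.5 over a total of 16)
```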
Many thanks for your advice and code!