
run_summarization.py isn't computing on GPU (but allocate memory) #18

Open
pltrdy opened this issue Jun 1, 2017 · 15 comments · May be fixed by #38
Comments

@pltrdy

pltrdy commented Jun 1, 2017

Strange fact. I've been trying to train summarization with default parameters and the default dataset.

It turns out that the GPU is detected (2017-06-01 17:57:05.548650: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)), and some GPU memory is allocated for the process, but nothing is running on it (load: 0%). It sometimes peaks at around 27% for a very short time.

The GPU is a single GTX 1080 with 8 GB of RAM (4617 MiB allocated by the process).

Any idea?

@pedrobalage

@pltrdy, try changing the following line
https://github.com/abisee/pointer-generator/blob/master/run_summarization.py#L125
to
default_device = tf.device('/gpu:0')
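For context, the line in question hardcodes the default device to the CPU, and the suggested fix swaps it for the GPU. A minimal sketch of the idea (the helper name pick_device is mine, added for illustration; the actual repo code assigns tf.device(...) directly):

```python
# Sketch of the suggested edit in run_summarization.py (TF 1.x style).
# The repo originally pins the default device to '/cpu:0'; the fix
# replaces it with '/gpu:0'. `pick_device` is a hypothetical helper
# used here only to make the two alternatives explicit.

def pick_device(use_gpu: bool) -> str:
    """Return a TF device string: '/gpu:0' when a GPU should be used."""
    return '/gpu:0' if use_gpu else '/cpu:0'

# In the real script this would correspond to replacing:
#   default_device = tf.device('/cpu:0')
# with:
#   default_device = tf.device(pick_device(use_gpu=True))

print(pick_device(True))   # '/gpu:0'
print(pick_device(False))  # '/cpu:0'
```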

@pltrdy
Author

pltrdy commented Jun 1, 2017

Thanks @pedrobalage, I actually removed this block entirely. I don't really understand why the device is hardcoded. Shouldn't it be left at the default (so the user can set/unset CUDA_VISIBLE_DEVICES), or at least be a parameter?
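If the hardcoded device block is removed, GPU visibility can instead be controlled from the environment, as suggested above. A minimal sketch; note that CUDA_VISIBLE_DEVICES must be set before TensorFlow is first imported for it to take effect:

```python
import os

# Restrict the process to GPU 0. An empty string ("") would hide all
# GPUs and force CPU execution. This must happen before
# `import tensorflow`, since TF enumerates devices at import time.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# import tensorflow as tf  # imported only after the variable is set

print(os.environ["CUDA_VISIBLE_DEVICES"])  # '0'
```

Equivalently, from the shell: `CUDA_VISIBLE_DEVICES=0 python run_summarization.py ...`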

@pedrobalage

@pltrdy, I didn't understand this decision either. This was the only way it worked for me. I also set CUDA_VISIBLE_DEVICES.

@pltrdy
Author

pltrdy commented Jun 1, 2017

@pedrobalage thanks for sharing it, I hadn't even noticed this line.

@abisee I forked, switched to Python 3 syntax, and removed this line. My Python 3 implementation won't be compatible because of bytes vs. str differences, so I'm not sure a PR would be welcome for that.
https://github.com/pltrdy/pointer-generator

@ioana-blue

Thanks for pointing this out, @pltrdy. Removing that line allows the code to run on the GPU and it's going 20x faster. So you pretty much made my life go by 20 times faster 👍

@tianjianjiang

Another related discussion can be found in #3

@pltrdy
Author

pltrdy commented Jun 1, 2017

@ioana-blue In fact I just complained; then @pedrobalage helped us ;)

@tianjianjiang thanks for referencing it. Interesting to notice that it's the same GPU. I can't figure out how this can work on a K40 (used by the author) but not on a 1080, since there is enough RAM. Maybe the CUDA/TF version has some effect here. No idea.

@tianjianjiang

@pltrdy You're welcome, and I can also confirm that changing it from CPU to GPU can be worse on a K80 and a GT 650M.

@tianjianjiang

BTW, there's a recent post on this topic: https://datascience.stackexchange.com/questions/19220/choosing-between-cpu-and-gpu-for-training-a-neural-network
My GT 650M and K80 experience also generally supports not using a slow GPU with small RAM, and being careful with GPUs under virtualization (e.g. AWS EC2 and Google Cloud).

@abisee
Owner

abisee commented Jun 6, 2017

Hi everyone,

I'm afraid this will be a rather unsatisfactory answer, but the reason that we set the default device to CPU is that our code is derived from the TextSum code, which does the same thing (see this line). Similarly, we have a few

with tf.device("/gpu:0"):

lines in model.py that are derived from similar lines in this file.

Personally, I'm not sure what the rationale for doing this is (I think I tried removing these lines and didn't see a difference). But that's the code we ran for our experiments, so we left it like that for reproducibility, in case it matters.

Edit: When we ran on our K40, the GPU-Util was non-zero so the GPU was being used.

@ioana-blue

I'm new to everything related to this work, but learning. From what I read in the docs and from looking at the code, this is my understanding.

https://www.tensorflow.org/tutorials/using_gpu

explains that with tf.device tries to pin certain computations to a specific device, instead of the default device chosen by TF (which prioritizes GPUs over CPUs, as far as I understand). If the specified device cannot be allocated, you get an exception, UNLESS you use allow_soft_placement=True, which our code does: https://github.com/abisee/pointer-generator/blob/master/util.py#L26

That's why you can see varying behavior depending on where you run the code, what's running concurrently, etc.

Hope this explains some of the mystery.
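The placement rule described above can be modeled as a toy function. This is purely an illustration of the semantics, not TensorFlow's actual implementation; the function and its names are mine:

```python
# Toy model of the device-placement semantics described above:
# a requested device that isn't available raises an error, unless
# soft placement is enabled, in which case execution falls back to
# an available device. Illustration only, not TF internals.

def place(requested, available, allow_soft_placement=False):
    """Return the device an op would run on, or raise if unplaceable."""
    if requested in available:
        return requested
    if allow_soft_placement:
        return available[0]  # fall back to some available device
    raise RuntimeError("Cannot assign a device for operation: " + requested)

# With soft placement, a missing GPU silently falls back to the CPU,
# which would explain a pinned-to-GPU graph quietly running on CPU:
print(place('/gpu:0', ['/cpu:0'], allow_soft_placement=True))  # '/cpu:0'
```

This is consistent with why behavior differs across machines: on a box where '/gpu:0' resolves, the pinning works; elsewhere, soft placement hides the fallback.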

@Spandan-Madan

I'm not entirely sure whether I managed to make it run on the GPU. @ioana-blue @abisee @pltrdy @pedrobalage @tianjianjiang How long is a training step taking you, approximately? For me every training step takes about 1.5 seconds; is that the speed you're getting on the GPU or the CPU?

@pltrdy
Author

pltrdy commented Jul 10, 2017

@Spandan-Madan I'm at around 1.8 sec/step with default parameters on an NVIDIA 1080.
GPU usage might be sub-optimal, since the load isn't always at 100%.
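For anyone comparing sec/step numbers, a generic way to measure it, independent of this repo (the dummy lambda below stands in for the real training call, e.g. a sess.run of the train op):

```python
import time

def time_steps(step_fn, n_steps):
    """Run step_fn n_steps times and return mean seconds per step."""
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    return (time.perf_counter() - start) / n_steps

# Dummy workload standing in for one training step:
mean = time_steps(lambda: sum(range(1000)), n_steps=50)
print("%.6f sec/step" % mean)
```

Averaging over many steps (and skipping the first few, which include graph setup and data-pipeline warm-up) gives a fairer CPU-vs-GPU comparison than eyeballing single steps.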

@eduOS

eduOS commented Jul 16, 2017

It's better to switch to the GPU, since the default setting makes training extremely slow.

falcondai added a commit to falcondai/pointer-generator that referenced this issue Jul 26, 2017
@falcondai falcondai linked a pull request Jul 26, 2017 that will close this issue
@abisee
Owner

abisee commented Aug 16, 2017

Hello again,

Though I'm sure I tried changing default_device several months ago and found no difference, I tried making the change again this week and it did make things faster. So we've now committed that change to the repo.
