beginner_source/ddp_series_theory.rst translation #896
Changes from all commits: 90aec72, 1f42fa2, d525f21, a6d27e1, ba40331, f3edc12
@@ -1,70 +1,69 @@
`Introduction <ddp_series_intro.html>`__ \|\| **What is DDP** \|\|
`Single-Node Multi-GPU Training <ddp_series_multigpu.html>`__ \|\|
`Fault Tolerance <ddp_series_fault_tolerance.html>`__ \|\|
`Multi-Node training <../intermediate/ddp_series_multinode.html>`__ \|\|
`minGPT Training <../intermediate/ddp_series_minGPT.html>`__
`소개 <ddp_series_intro.html>`__ \|\| **분산 데이터 병렬 처리 (DDP) 란 무엇인가?** \|\|
`단일 노드 다중-GPU 학습 <ddp_series_multigpu.html>`__ \|\|
`결함 내성 <ddp_series_fault_tolerance.html>`__ \|\|
`다중 노드 학습 <../intermediate/ddp_series_multinode.html>`__ \|\|
`minGPT 학습 <../intermediate/ddp_series_minGPT.html>`__

What is Distributed Data Parallel (DDP)
분산 데이터 병렬 처리 (DDP) 란 무엇인가?
=======================================

Authors: `Suraj Subramanian <https://github.com/suraj813>`__
저자: `Suraj Subramanian <https://github.com/suraj813>`__
번역: `박지은 <https://github.com/rumjie>`__
.. grid:: 2

   .. grid-item-card:: :octicon:`mortar-board;1em;` What you will learn
   .. grid-item-card:: :octicon:`mortar-board;1em;` 이 장에서 배우는 것

      * How DDP works under the hood
      * What is ``DistributedSampler``
      * How gradients are synchronized across GPUs
      * DDP 의 내부 작동 원리
      * ``DistributedSampler`` 란 무엇인가?
      * GPU 간 변화도가 동기화되는 방법

   .. grid-item-card:: :octicon:`list-unordered;1em;` Prerequisites
   .. grid-item-card:: :octicon:`list-unordered;1em;` 필요 사항

      * Familiarity with `basic non-distributed training <https://tutorials.pytorch.kr/beginner/basics/quickstart_tutorial.html>`__ in PyTorch
      * 파이토치 `비분산 학습 <https://tutorials.pytorch.kr/beginner/basics/quickstart_tutorial.html>`__ 에 익숙할 것
Follow along with the video below or on `youtube <https://www.youtube.com/watch/Cvdhwx-OBBo>`__.
아래의 영상이나 `유튜브 <https://www.youtube.com/watch/Cvdhwx-OBBo>`__ 를 따라 진행하세요.
.. raw:: html

   <div style="margin-top:10px; margin-bottom:10px;">
     <iframe width="560" height="315" src="https://www.youtube.com/embed/Cvdhwx-OBBo" frameborder="0" allow="accelerometer; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
   </div>
This tutorial is a gentle introduction to PyTorch `DistributedDataParallel <https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html>`__ (DDP)
which enables data parallel training in PyTorch. Data parallelism is a way to
process multiple data batches across multiple devices simultaneously
to achieve better performance. In PyTorch, the `DistributedSampler <https://pytorch.org/docs/stable/data.html#torch.utils.data.distributed.DistributedSampler>`__
ensures each device gets a non-overlapping input batch. The model is replicated on all the devices;
each replica calculates gradients and simultaneously synchronizes with the others using the `ring all-reduce
algorithm <https://tech.preferred.jp/en/blog/technologies-behind-distributed-deep-learning-allreduce/>`__.
이 튜토리얼은 파이토치에서 분산 데이터 병렬 학습을 가능하게 하는 `분산 데이터 병렬 <https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html>`__ (DDP)
에 대해 소개합니다. 데이터 병렬 처리란 더 높은 성능을 달성하기 위해
여러 개의 디바이스에서 여러 데이터 배치들을 동시에 처리하는 방법입니다.
파이토치에서, `분산 샘플러 <https://pytorch.org/docs/stable/data.html#torch.utils.data.distributed.DistributedSampler>`__ 는
각 디바이스가 서로 겹치지 않는 입력 배치를 받는 것을 보장합니다.
모델은 모든 디바이스에 복제되며, 각 사본은 변화도를 계산하는 동시에 `Ring-All-Reduce
알고리즘 <https://tech.preferred.jp/en/blog/technologies-behind-distributed-deep-learning-allreduce/>`__ 을 사용해 다른 사본과 동기화됩니다.
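To make the pieces described above concrete, here is a minimal, hedged sketch of how a DDP training script typically wires together a process group, a ``DistributedSampler``, and a DDP-wrapped model. It is not part of the tutorial file in this diff; the ``gloo`` backend, the toy dataset, and the ``worker`` function are illustrative assumptions.

.. code-block:: python

    # Minimal DDP sketch (illustrative, not from this PR): one process per rank,
    # DistributedSampler hands each rank a non-overlapping shard of the data,
    # and DDP synchronizes gradients during backward().
    import os

    import torch
    import torch.distributed as dist
    import torch.multiprocessing as mp
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils.data import DataLoader, TensorDataset
    from torch.utils.data.distributed import DistributedSampler

    def worker(rank: int, world_size: int):
        # Rendezvous: every process joins the same group ("gloo" also works on CPU).
        os.environ.setdefault("MASTER_ADDR", "localhost")
        os.environ.setdefault("MASTER_PORT", "29500")
        dist.init_process_group("gloo", rank=rank, world_size=world_size)

        # Toy dataset and model, purely for illustration.
        dataset = TensorDataset(torch.randn(64, 10), torch.randn(64, 1))
        sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
        loader = DataLoader(dataset, batch_size=8, sampler=sampler)

        model = DDP(torch.nn.Linear(10, 1))  # replicated once, not every step
        optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
        loss_fn = torch.nn.MSELoss()

        for epoch in range(2):
            sampler.set_epoch(epoch)  # keep shuffling consistent across ranks
            for x, y in loader:
                optimizer.zero_grad()
                loss_fn(model(x), y).backward()  # gradients are all-reduced here
                optimizer.step()

        dist.destroy_process_group()

    if __name__ == "__main__":
        world_size = 2  # e.g. two CPU processes; on GPUs, one process per GPU
        mp.spawn(worker, args=(world_size,), nprocs=world_size)

During ``backward()``, DDP's autograd hooks all-reduce the gradients across ranks, so every replica finishes each step with identical parameters.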
This `illustrative tutorial <https://tutorials.pytorch.kr/intermediate/dist_tuto.html#>`__ provides a more in-depth python view of the mechanics of DDP.
`예시 튜토리얼 <https://tutorials.pytorch.kr/intermediate/dist_tuto.html#>`__ 에서 DDP 메커니즘에 대해 파이썬 관점에서 심도 있는 설명을 볼 수 있습니다.
Why you should prefer DDP over ``DataParallel`` (DP)
데이터 병렬 ``DataParallel`` (DP) 보다 DDP가 나은 이유
----------------------------------------------------
`DataParallel <https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html>`__
is an older approach to data parallelism. DP is trivially simple (with just one extra line of code) but it is much less performant.
DDP improves upon the architecture in a few ways:

+---------------------------------------+------------------------------+
| ``DataParallel``                      | ``DistributedDataParallel``  |
+=======================================+==============================+
| More overhead; model is replicated    | Model is replicated only     |
| and destroyed at each forward pass    | once                         |
+---------------------------------------+------------------------------+
| Only supports single-node parallelism | Supports scaling to multiple |
|                                       | machines                     |
+---------------------------------------+------------------------------+
| Slower; uses multithreading on a      | Faster (no GIL contention)   |
| single process and runs into Global   | because it uses              |
| Interpreter Lock (GIL) contention     | multiprocessing              |
+---------------------------------------+------------------------------+
Further Reading
`DP <https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html>`__ 는 데이터 병렬 처리의 이전 접근 방식입니다.
DP 는 간단하지만, (한 줄만 추가하면 됨) 성능은 훨씬 떨어집니다. DDP는 아래와 같은 방식으로 아키텍처를 개선합니다.
.. list-table::
   :header-rows: 1

   * - ``DataParallel``
     - ``DistributedDataParallel``
   * - 작업 부하가 큼, 전파될 때마다 모델이 복제 및 삭제됨
     - 모델이 한 번만 복제됨
   * - 단일 노드 병렬 처리만 가능
     - 여러 머신으로 확장 가능
   * - 느림, 단일 프로세스에서 멀티 스레딩을 사용하기 때문에 Global Interpreter Lock (GIL) 충돌이 발생
     - 빠름, 멀티 프로세싱을 사용하기 때문에 GIL 충돌 없음
Review comment: I see the ASCII table was changed to the list-table format...
Reply: I may have written the original incorrectly, but when I built the docs and checked the page, the table rendered broken, so I changed the format!
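As a rough illustration of the comparison above (a sketch, not code from this tutorial; the ``nn.Linear`` module is an assumed placeholder), the single extra line that ``DataParallel`` needs versus the per-process wrapping that DDP expects looks roughly like this:

.. code-block:: python

    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    model = nn.Linear(10, 1)  # placeholder model for illustration

    # DataParallel: a single process; inputs are scattered and the model is
    # replicated on every forward pass, with threads contending on the GIL.
    dp_model = nn.DataParallel(model)

    # DistributedDataParallel: one process per GPU; the model is replicated
    # once, and each process would wrap it roughly like this (inside a worker
    # that has already called init_process_group, as in the earlier sketch):
    # ddp_model = DDP(model.to(rank), device_ids=[rank])

Because each DDP replica lives in its own process, gradient synchronization avoids GIL contention entirely.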
읽을거리
---------------

- `Multi-GPU training with DDP <ddp_series_multigpu.html>`__ (next tutorial in this series)
- `Multi-GPU training with DDP <ddp_series_multigpu.html>`__ (이 시리즈의 다음 튜토리얼)
- `DDP
  API <https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html>`__
- `DDP Internal
|
Uh oh!
There was an error while loading. Please reload this page.