
Commit cc5d445

Merge branch 'pytorch-2' into model/qr_code_pytorch2
2 parents: 6de35ae + b9a314e

2 files changed: +19 −19 lines


Diff for: README.md

+19 −19
@@ -1,6 +1,6 @@
 # ADI MAX78000/MAX78002 Model Training and Synthesis
 
-May 6, 2024
+May 20, 2024
 
 **Note: The pytorch-2 branch is in development. Please see [KNOWN_ISSUES](KNOWN_ISSUES.txt).**
 
@@ -11,9 +11,9 @@ ADI’s MAX78000/MAX78002 project is comprised of five repositories:
 2. The software development kit (MSDK), which contains drivers and example programs ready to run on the evaluation kits (EVkit and Feather):
    [Analog Devices MSDK](https://github.com/analogdevicesinc/msdk)
 3. The training repository, which is used for deep learning *model development and training*:
-   [ai8x-training](https://github.com/analogdevicesinc/ai8x-training) **(described in this document)**
+   [ai8x-training](https://github.com/analogdevicesinc/ai8x-training/tree/pytorch-2) **(described in this document)**
 4. The synthesis repository, which is used to *convert a trained model into C code* using the “izer” tool:
-   [ai8x-synthesis](https://github.com/analogdevicesinc/ai8x-synthesis) **(described in this document)**
+   [ai8x-synthesis](https://github.com/analogdevicesinc/ai8x-synthesis/tree/pytorch-2) **(described in this document)**
 5. The reference design repository, which contains host applications and sample applications for reference designs such as [MAXREFDES178 (Cube Camera)](https://www.analog.com/en/design-center/reference-designs/maxrefdes178.html):
    [refdes](https://github.com/analogdevicesinc/MAX78xxx-RefDes)
 *Note: Examples for EVkits and Feather boards are part of the MSDK*
@@ -75,15 +75,15 @@ Limited support and advice for using other hardware and software combinations is
 
 **The only officially supported platforms for model training** are Ubuntu Linux 20.04 LTS and 22.04 LTS on amd64/x86_64, either the desktop or the [server version](https://ubuntu.com/download/server).
 
-*Note that hardware acceleration using CUDA is <u>not available</u> in PyTorch for Raspberry Pi 4 and other <u>aarch64/arm64</u> devices, even those running Ubuntu Linux 20.04/22.04. See also [Development on Raspberry Pi 4 and 400](https://github.com/analogdevicesinc/ai8x-synthesis/blob/develop/docs/RaspberryPi.md) (unsupported).*
+*Note that hardware acceleration using CUDA is <u>not available</u> in PyTorch for Raspberry Pi 4 and other <u>aarch64/arm64</u> devices, even those running Ubuntu Linux 20.04/22.04. See also [Development on Raspberry Pi 4 and 400](https://github.com/analogdevicesinc/ai8x-synthesis/blob/pytorch-2/docs/RaspberryPi.md) (unsupported).*
 
 This document also provides instructions for installing on RedHat Enterprise Linux / CentOS 8 with limited support.
 
 ##### Windows
 
-On Windows 10 version 21H2 or newer, and Windows 11, after installing the Windows Subsystem for Linux (WSL2), Ubuntu Linux 20.04 or 22.04 can be used inside Windows with full CUDA acceleration, please see *[Windows Subsystem for Linux](https://github.com/analogdevicesinc/ai8x-synthesis/blob/develop/docs/WSL2.md).* For the remainder of this document, follow the steps for Ubuntu Linux.
+On Windows 10 version 21H2 or newer, and Windows 11, after installing the Windows Subsystem for Linux (WSL2), Ubuntu Linux 20.04 or 22.04 can be used inside Windows with full CUDA acceleration, please see *[Windows Subsystem for Linux](https://github.com/analogdevicesinc/ai8x-synthesis/blob/pytorch-2/docs/WSL2.md).* For the remainder of this document, follow the steps for Ubuntu Linux.
 
-If WSL2 is not available, it is also possible (but not recommended due to inherent compatibility issues and slightly degraded performance) to run this software natively on Windows. Please see *[Native Windows Installation](https://github.com/analogdevicesinc/ai8x-synthesis/blob/develop/docs/Windows.md)*.
+If WSL2 is not available, it is also possible (but not recommended due to inherent compatibility issues and slightly degraded performance) to run this software natively on Windows. Please see *[Native Windows Installation](https://github.com/analogdevicesinc/ai8x-synthesis/blob/pytorch-2/docs/Windows.md)*.
 
 ##### macOS
 

@@ -317,8 +317,8 @@ Change to the project root and run the following commands. Use your GitHub crede
 
 ```shell
 $ cd <your/project>
-$ git clone --recursive https://github.com/analogdevicesinc/ai8x-training.git
-$ git clone --recursive https://github.com/analogdevicesinc/ai8x-synthesis.git
+$ git clone --recursive -b pytorch-2 https://github.com/analogdevicesinc/ai8x-training.git
+$ git clone --recursive -b pytorch-2 https://github.com/analogdevicesinc/ai8x-synthesis.git
 ```
 
 #### Creating the Virtual Environment
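The effect of the `-b pytorch-2` flag added in the hunk above can be sketched against a throwaway local repository (a stand-in for the real GitHub remotes; this sketch does not touch the network):

```shell
# Build a throwaway "upstream" repo with a pytorch-2 branch (stand-in for GitHub)
tmp=$(mktemp -d)
git init -q -b main "$tmp/upstream"
git -C "$tmp/upstream" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git -C "$tmp/upstream" branch pytorch-2

# "git clone -b <branch>" checks out that branch instead of the default
git clone -q -b pytorch-2 "$tmp/upstream" "$tmp/clone"
git -C "$tmp/clone" branch --show-current    # pytorch-2

# Switching later is a plain checkout
git -C "$tmp/clone" checkout -q main
```

(`--recursive` is omitted here only because the throwaway repository has no submodules; keep it when cloning the real repositories.)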
@@ -329,7 +329,7 @@ To create the virtual environment and install basic wheels:
 $ cd ai8x-training
 ```
 
-The default branch is “develop” which is updated most frequently. If you want to use the “main” branch instead, switch to “main” using `git checkout main`.
+Using the instructions above checks out the `pytorch-2` branch, which supports PyTorch 2.3. For PyTorch 1.8 support, use the `develop` or `main` branches. To switch, use `git checkout`, for example `git checkout main`.
 
 If using pyenv, set the local directory to use Python 3.11.8.
 
@@ -395,11 +395,11 @@ For all other systems, including macOS:
 
 ##### Repository Branches
 
-By default, the `develop` branch is checked out. This branch is the most frequently updated branch and it contains the latest improvements to the project. To switch to the main branch that is updated less frequently, but may be more stable, use the command `git checkout main`.
+When following these instructions, the `pytorch-2` branch is checked out. For PyTorch 1.8 support, use either the `develop` branch (the most frequently updated branch, which contains the latest improvements to the project) or the `main` branch (updated less frequently, but possibly more stable). To change branches, use the command `git checkout`, for example `git checkout main`.
 
 ###### TensorFlow / Keras
 
-Support for TensorFlow / Keras is currently in the `develop-tf` branch.
+Support for TensorFlow / Keras is deprecated.
 
 #### Updating to the Latest Version
 
@@ -587,7 +587,7 @@ The MSDK is also available as a [git repository](https://github.com/analogdevice
 $ pacman -S --needed base filesystem msys2-runtime make
 ```
 
-5. Install packages for OpenOCD. OpenOCD binaries are available in the “openocd” sub-folder of the ai8x-synthesis repository. However, some additional dependencies are required on most systems. See [openocd/README.md](https://github.com/analogdevicesinc/ai8x-synthesis/blob/develop/openocd/README.md) for a list of packages to install, then return here to continue.
+5. Install packages for OpenOCD. OpenOCD binaries are available in the “openocd” sub-folder of the ai8x-synthesis repository. However, some additional dependencies are required on most systems. See [openocd/README.md](https://github.com/analogdevicesinc/ai8x-synthesis/blob/pytorch-2/openocd/README.md) for a list of packages to install, then return here to continue.
 
 6. Add the location of the toolchain binaries to the system path.
 
@@ -1075,12 +1075,12 @@ The MAX78000 hardware does not support arbitrary network parameters. Specificall
 * The *final* streaming layer must use padding.
 * Layers that use 1×1 kernels without padding are automatically replaced with equivalent layers that use 3×3 kernels with padding.
 
-* The weight memory supports up to 768 * 64 3×3 Q7 kernels (see [Number Format](#number-format)), for a total of [432 KiB of kernel memory](docs/AHBAddresses.md).
+* The weight memory supports up to 768 * 64 3×3 Q7 kernels (see [Number Format](#number-format)), for a total of [432 KiB of kernel memory](https://github.com/analogdevicesinc/ai8x-synthesis/blob/pytorch-2/docs/AHBAddresses.md).
   When using 1-, 2- or 4-bit weights, the capacity increases accordingly.
   When using more than 64 input or output channels, weight memory is shared, and effective capacity decreases proportionally (for example, 128 input channels require twice as much space as 64 input channels, and a layer with <u>both</u> 128 input and 128 output channels requires <u>four</u> times as much space as a layer with only 64 input channels and 64 output channels).
   Weights must be arranged according to specific rules detailed in [Layers and Weight Memory](#layers-and-weight-memory).
 
-* There are 16 instances of 32 KiB data memory ([for a total of 512 KiB](docs/AHBAddresses.md)). When not using streaming mode, any data channel (input, intermediate, or output) must completely fit into one memory instance. This limits the first-layer input to 32,768 pixels per channel in the CHW format (181×181 when width = height). However, when using more than one input channel, the HWC format may be preferred, and all layer outputs are in HWC format as well. In those cases, it is required that four channels fit into a single memory instance — or 8192 pixels per channel (approximately 90×90 when width = height).
+* There are 16 instances of 32 KiB data memory ([for a total of 512 KiB](https://github.com/analogdevicesinc/ai8x-synthesis/blob/pytorch-2/docs/AHBAddresses.md)). When not using streaming mode, any data channel (input, intermediate, or output) must completely fit into one memory instance. This limits the first-layer input to 32,768 pixels per channel in the CHW format (181×181 when width = height). However, when using more than one input channel, the HWC format may be preferred, and all layer outputs are in HWC format as well. In those cases, it is required that four channels fit into a single memory instance — or 8192 pixels per channel (approximately 90×90 when width = height).
   Note that the first layer commonly creates a wide expansion (i.e., a large number of output channels) that needs to fit into data memory, so the input size limit is mostly theoretical. In many cases, [Data Folding](#data-folding) (distributing the input data across multiple channels) can effectively increase both the input dimensions as well as improve model performance.
 
 * The hardware supports 1D and 2D convolution layers, 2D transposed convolution layers (upsampling), element-wise addition, subtraction, binary OR, binary XOR as well as fully connected layers (`Linear`), which are implemented using 1×1 convolutions on 1×1 data:
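The MAX78000 capacity figures in the hunk above can be cross-checked with a little shell arithmetic. The only assumption is that a 3×3 Q7 kernel occupies 9 bytes (one byte per weight):

```shell
# Kernel memory: 768 * 64 kernels of 9 bytes each
echo "$(( 768 * 64 * 9 / 1024 )) KiB kernel memory"     # 432 KiB kernel memory
# Data memory: 16 instances of 32 KiB
echo "$(( 16 * 32 )) KiB data memory"                   # 512 KiB data memory
# CHW: one channel fills a whole 32 KiB instance; HWC: four channels share one
awk 'BEGIN { p = 32 * 1024;     printf "CHW: %d px/channel (%dx%d)\n", p, sqrt(p), sqrt(p) }'
awk 'BEGIN { p = 32 * 1024 / 4; printf "HWC: %d px/channel (%dx%d)\n", p, sqrt(p), sqrt(p) }'
```

The truncated square roots reproduce the 181×181 (CHW) and roughly 90×90 (HWC) limits quoted in the diff.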
@@ -1171,12 +1171,12 @@ The MAX78002 hardware does not support arbitrary network parameters. Specificall
 * Layers that use 1×1 kernels without padding are automatically replaced with equivalent layers that use 3×3 kernels with padding.
 * Streaming layers must use convolution (i.e., the `Conv1d`, `Conv2d`, or `ConvTranspose2d` [operators](#operation)).
 
-* The weight memory of processors 0, 16, 32, and 48 supports up to 5,120 3×3 Q7 kernels (see [Number Format](#number-format)), all other processors support up to 4,096 3×3 Q7 kernels, for a total of [2,340 KiB of kernel memory](docs/AHBAddresses.md).
+* The weight memory of processors 0, 16, 32, and 48 supports up to 5,120 3×3 Q7 kernels (see [Number Format](#number-format)), all other processors support up to 4,096 3×3 Q7 kernels, for a total of [2,340 KiB of kernel memory](https://github.com/analogdevicesinc/ai8x-synthesis/blob/pytorch-2/docs/AHBAddresses.md).
   When using 1-, 2- or 4-bit weights, the capacity increases accordingly. The hardware supports two different flavors of 1-bit weights, either 0/–1 or +1/–1.
   When using more than 64 input or output channels, weight memory is shared, and effective capacity decreases.
   Weights must be arranged according to specific rules detailed in [Layers and Weight Memory](#layers-and-weight-memory).
 
-* The total of [1,280 KiB of data memory](docs/AHBAddresses.md) is split into 16 sections of 80 KiB each. When not using streaming mode, any data channel (input, intermediate, or output) must completely fit into one memory instance. This limits the first-layer input to 81,920 pixels per channel in CHW format (286×286 when height = width). However, when using more than one input channel, the HWC format may be preferred, and all layer outputs are in HWC format as well. In those cases, it is required that four channels fit into a single memory section — or 20,480 pixels per channel (143×143 when height = width).
+* The total of [1,280 KiB of data memory](https://github.com/analogdevicesinc/ai8x-synthesis/blob/pytorch-2/docs/AHBAddresses.md) is split into 16 sections of 80 KiB each. When not using streaming mode, any data channel (input, intermediate, or output) must completely fit into one memory instance. This limits the first-layer input to 81,920 pixels per channel in CHW format (286×286 when height = width). However, when using more than one input channel, the HWC format may be preferred, and all layer outputs are in HWC format as well. In those cases, it is required that four channels fit into a single memory section — or 20,480 pixels per channel (143×143 when height = width).
   Note that the first layer commonly creates a wide expansion (i.e., a large number of output channels) that needs to fit into data memory, so the input size limit is mostly theoretical. In many cases, [Data Folding](#data-folding) (distributing the input data across multiple channels) can effectively increase both the input dimensions as well as improve model performance.
 
 * The hardware supports 1D and 2D convolution layers, 2D transposed convolution layers (upsampling), element-wise addition, subtraction, binary OR, binary XOR as well as fully connected layers (`Linear`), which are implemented using 1×1 convolutions on 1×1 data:
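The MAX78002 figures check out the same way, again assuming 9 bytes per 3×3 Q7 kernel and 64 processors in total (4 large-memory plus 60 regular, consistent with the processor numbering in the hunk above):

```shell
# Kernel memory: 4 processors hold 5,120 kernels, the other 60 hold 4,096 (9 bytes each)
echo "$(( (4 * 5120 + 60 * 4096) * 9 / 1024 )) KiB kernel memory"   # 2340 KiB kernel memory
# Data memory: 16 sections of 80 KiB
echo "$(( 16 * 80 )) KiB data memory"                               # 1280 KiB data memory
# CHW: one channel per 80 KiB section; HWC: four channels per section
awk 'BEGIN { p = 80 * 1024;     printf "CHW: %d px/channel (%dx%d)\n", p, sqrt(p), sqrt(p) }'
awk 'BEGIN { p = 80 * 1024 / 4; printf "HWC: %d px/channel (%dx%d)\n", p, sqrt(p), sqrt(p) }'
```

The truncated square roots reproduce the 286×286 (CHW) and 143×143 (HWC) limits quoted in the diff.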
@@ -2083,7 +2083,7 @@ The only model architecture implemented in this repository is the sequential mod
 
 <img src="docs/NAS_Sequential_Model.png" alt="nas_model" style="zoom:50%;"/>
 
-All required elastic search strategies are implemented in this [model file](https://github.com/analogdevicesinc/ai8x-training/blob/develop/models/ai85nasnet-sequential.py).
+All required elastic search strategies are implemented in this [model file](https://github.com/analogdevicesinc/ai8x-training/blob/pytorch-2/models/ai85nasnet-sequential.py).
 
 A new model architecture can be implemented by implementing the `OnceForAllModel` interface. The new model class must implement the following:
 
@@ -3268,9 +3268,9 @@ See the [benchmarking guide](https://github.com/analogdevicesinc/MaximAI_Documen
 
 Additional information about the evaluation kits, and the software development kit (MSDK) is available on the web at <https://github.com/analogdevicesinc/MaximAI_Documentation>.
 
-[AHB Addresses for MAX78000 and MAX78002](docs/AHBAddresses.md)
+[AHB Addresses for MAX78000 and MAX78002](https://github.com/analogdevicesinc/ai8x-synthesis/blob/pytorch-2/docs/AHBAddresses.md)
 
-[Facial Recognition System](https://github.com/analogdevicesinc/ai8x-training/blob/develop/docs/FacialRecognitionSystem.md)
+[Facial Recognition System](https://github.com/analogdevicesinc/ai8x-training/blob/pytorch-2/docs/FacialRecognitionSystem.md)
 
 
 ---

Diff for: README.pdf

Binary file not shown (5.49 KB).
