Commit

Merge remote-tracking branch 'origin/main' into docs_section_update

jainapurva committed Jan 22, 2025
2 parents cf9694e + 5d1444b commit 227fe3c
Showing 4 changed files with 289 additions and 244 deletions.
6 changes: 3 additions & 3 deletions docs/source/api_ref_sparsity.rst
@@ -12,7 +12,7 @@ torchao.sparsity

WandaSparsifier
PerChannelNormObserver
apply_sparse_semi_structured
apply_fake_sparsity


sparsify_
semi_sparse_weight
int8_dynamic_activation_int8_semi_sparse_weight
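For context on how the listed APIs fit together, here is a minimal usage sketch (not part of the diff). It assumes `sparsify_` and `semi_sparse_weight` are importable from `torchao.sparsity` as listed above; the toy model, dtypes, and shapes are made up for illustration, and exact signatures may vary between torchao versions.

```python
import torch
from torchao.sparsity import sparsify_, semi_sparse_weight

# Toy model; semi-structured (2:4) sparse kernels expect fp16/bf16 on CUDA.
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).half().cuda()

# Swap each linear's dense weight for a 2:4 semi-structured sparse weight
# in place. This assumes the weights were already pruned to a 2:4 pattern
# (e.g. by a frontend masking step); otherwise accuracy will suffer.
sparsify_(model, semi_sparse_weight())

x = torch.randn(32, 1024, dtype=torch.float16, device="cuda")
out = model(x)  # now dispatches to the sparse matmul kernels
```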
6 changes: 3 additions & 3 deletions docs/source/sparsity.rst
@@ -48,7 +48,7 @@ Our workflow is designed to consist of two parts that answer each question independently

The handoff point between these two pieces is sparse weights stored in a dense format, with 0 in place of the missing elements. This is a natural handoff point because sparse matrix multiplication and dense matrix multiplication with this tensor will be numerically equivalent. This lets us present a clear contract to the user for our backend, for a given sparsity pattern:

-**\ *If you can get your dense matrix into a [2:4 sparse format], we can speed up matrix multiplication up to [1.7x] with no numerical loss.*\ **
+If you can get your dense matrix into a **2:4 sparse format**, we can speed up matrix multiplication up to **1.7x** with no numerical loss.

This also allows users with existing sparse weights in a dense format to take advantage of our fast sparse kernels. We anticipate that many users will come up with their own custom frontend masking solution or use another third-party solution, as this is an active area of research.
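As an illustration of that contract (not part of the diff), the sketch below builds a weight that already follows a 2:4 pattern, converts it with PyTorch's `to_sparse_semi_structured`, and checks that sparse and dense matmul agree. The shapes, tolerances, and the assumption of a CUDA device with fp16 semi-structured support are choices made for the example.

```python
import torch
from torch.sparse import to_sparse_semi_structured

# A weight whose rows already follow the 2:4 pattern: in every group of
# four elements, at most two are nonzero. (Shapes chosen for illustration.)
dense_weight = torch.tensor([0, 0, 1, 1], dtype=torch.float16).tile((128, 32)).cuda()
x = torch.rand(128, 128, dtype=torch.float16, device="cuda")

dense_out = torch.mm(dense_weight, x)

# Handoff point: the same values, now stored in the 2:4 sparse format.
sparse_weight = to_sparse_semi_structured(dense_weight)
sparse_out = torch.mm(sparse_weight, x)

# The contract: sparse and dense matmul are numerically equivalent.
assert torch.allclose(dense_out, sparse_out, rtol=1e-3, atol=1e-3)
```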

@@ -102,9 +102,9 @@ Context

This section provides some context on neural network pruning/sparsity as well as definitions for some common pruning/sparsity terms. In academia / industry, **pruning** and **sparsity** are often used interchangeably to refer to the same thing. This can be confusing, especially since sparsity is an overloaded term that can refer to many other things, such as sparse tensor representations.

-Note that this section focuses on **pruning**\ , instead of **sparse training**. The distinction being that in **pruning** we start with a pretrained dense model, while during **sparse training** we train a sparse model from scratch.
+Note that this section focuses on **pruning**, instead of **sparse training**. The distinction being that in **pruning** we start with a pretrained dense model, while during **sparse training** we train a sparse model from scratch.

-**In order to avoid confusion, we generally try to use sparsity to refer to tensors. Note that a sparse tensor can refer to a dense tensor with many zero values, or a tensor stored using a sparse representation. We describe the flow as *pruning* and the resultant model as a *pruned* model.**
+In order to avoid confusion, we generally try to use sparsity to refer to tensors. Note that a sparse tensor can refer to a dense tensor with many zero values, or a tensor stored using a sparse representation. We describe the flow as **pruning** and the resultant model as a **pruned** model.
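To make the two senses of "sparse tensor" concrete, a small illustrative sketch (not part of the diff): the same values stored densely with explicit zeros, and stored in a true sparse (COO) representation that keeps only the nonzero entries and their indices.

```python
import torch

# "Sparse" in the loose sense: a dense tensor that happens to contain many zeros.
dense = torch.tensor([[0.0, 2.0, 0.0, 0.0],
                      [0.0, 0.0, 0.0, 3.0]])

# The same values in a true sparse representation (COO), which stores only
# the nonzero values and their coordinates.
sparse = dense.to_sparse().coalesce()
print(sparse.indices())  # tensor([[0, 1], [1, 3]]) -- row/col of each nonzero
print(sparse.values())   # tensor([2., 3.])
```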

Roughly, the flow for achieving a more performant pruned model looks like this:

