-
Notifications
You must be signed in to change notification settings - Fork 325
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Audio: ARIA: Add HiFi5 implementation #8692
Conversation
src/audio/aria/aria_hifi5.c
Outdated
/** | ||
* \brief Aria gain index mapping table | ||
*/ | ||
const uint8_t INDEX_TAB[] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is being cast to a uint32_t in the code and should simplify the L32 loads
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from the xtensa simulator I didn't see any improvement, the purpose I use the uint8_t is to save memory. But it is okay to change it to a uint32_t table if we don't mind the memory usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting, the L32 of a uint32_t is 1 op, but to load a byte is a L32 and a shift/mask.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lgirdwood @andrula-song , there is load byte instruction, so no need use load32 for this table, save memory is always good:
Load 8-bit Unsigned L8UI
Xtensa Instruction Set Architecture (ISA) Reference Manual 449
Instruction Word (RRI8)
Required Configuration Option
Core Architecture (See Section 4.2 on page 51)
Assembler Syntax
L8UI at, as, 0..255
Description
L8UI is an 8-bit unsigned load from memory. It forms a virtual address by adding the
contents of address register as and an 8-bit zero-extended constant value encoded in
the instruction word. Therefore, the offset ranges from 0 to 255. Eight bits (one byte) are
read from the physical address. This data is then zero-extended and written to address
register at.
If the Region Translation Option (page 185) or the MMU Option (page 199)is enabled,
the virtual address is translated to the physical address. If not, the physical address is
identical to the virtual address. If the translation or memory reference encounters an
error (for example, protection violation or non-existent memory), the processor raises
one of several exceptions (see Section 4.4.2.5 on page 105)
b3151bc
to
33735a2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@singalsu needs your review.
Add HiFi5 implementation of aria algorithm functions, compared with HiFi3 version, can reduce about 10% cycles. Signed-off-by: Andrula Song <[email protected]>
33735a2
to
eb5e4b1
Compare
2, 3, 4, 5, | ||
6, 7, 8, 9, | ||
0, 1, 2, 3 | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like modulo 10 for 0 to 23 index. Likely not possible but if gains gains table size ARIA_MAX_GAIN_STATES could be 16 or other 2^N value this could be bit-and instead of modulo. But it's out-of-scope of this PR to change functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like modulo 10 for 0 to 23 index. Likely not possible while keeping the algorithm the same but if gains gains table size ARIA_MAX_GAIN_STATES could be 16 or other 2^N value this could be bit-and instead of modulo. But it's out-of-scope of this PR to change functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, I got the patch and compared to aria_hifi3.c.
2, 3, 4, 5, | ||
6, 7, 8, 9, | ||
0, 1, 2, 3 | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like modulo 10 for 0 to 23 index. Likely not possible while keeping the algorithm the same but if gains gains table size ARIA_MAX_GAIN_STATES could be 16 or other 2^N value this could be bit-and instead of modulo. But it's out-of-scope of this PR to change functionality.
# if XCHAL_HAVE_HIFI3 || XCHAL_HAVE_HIFI4 | ||
# if XCHAL_HAVE_HIFI5 | ||
# define ARIA_HIFI5 | ||
# elif XCHAL_HAVE_HIFI3 || XCHAL_HAVE_HIFI4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please switch to the new, de-duplicated approach from #8787
Add HiFi5 implementation of aria algorithm functions, compared with HiFi3 version, can reduce about 10% cycles.