Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Audio: ARIA: Add HiFi5 implementation #8692

Merged
merged 1 commit into from
Jan 24, 2024

Conversation

andrula-song
Copy link
Contributor

Add HiFi5 implementation of aria algorithm functions, compared with HiFi3 version, can reduce about 10% cycles.

@andrula-song
Copy link
Contributor Author

here is the xtensa simulator result:
Screenshot from 2024-01-04 10-28-01
compared with hifi3 version, the HiFi5 version can save:
about 5% for aria_algo_get_data_odd_channel;
about 10% for aria_algo_get_data_even_channel;
about 15% for aria_algo_calc_gain.

/**
* \brief Aria gain index mapping table
*/
const uint8_t INDEX_TAB[] = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is being cast to a uint32_t in the code and should simplify the L32 loads

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from the xtensa simulator I didn't see any improvement, the purpose I use the uint8_t is to save memory. But it is okay to change it to a uint32_t table if we don't mind the memory usage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting, the L32 of a uint32_t is 1 op, but to load a byte is a L32 and a shift/mask.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lgirdwood @andrula-song , there is load byte instruction, so no need use load32 for this table, save memory is always good:

Load 8-bit Unsigned L8UI
Xtensa Instruction Set Architecture (ISA) Reference Manual 449
Instruction Word (RRI8)
Required Configuration Option
Core Architecture (See Section 4.2 on page 51)
Assembler Syntax
L8UI at, as, 0..255
Description
L8UI is an 8-bit unsigned load from memory. It forms a virtual address by adding the
contents of address register as and an 8-bit zero-extended constant value encoded in
the instruction word. Therefore, the offset ranges from 0 to 255. Eight bits (one byte) are
read from the physical address. This data is then zero-extended and written to address
register at.
If the Region Translation Option (page 185) or the MMU Option (page 199)is enabled,
the virtual address is translated to the physical address. If not, the physical address is
identical to the virtual address. If the translation or memory reference encounters an
error (for example, protection violation or non-existent memory), the processor raises
one of several exceptions (see Section 4.4.2.5 on page 105)

Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@singalsu needs your review.

Add HiFi5 implementation of aria algorithm functions, compared with
HiFi3 version, can reduce about 10% cycles.

Signed-off-by: Andrula Song <[email protected]>
2, 3, 4, 5,
6, 7, 8, 9,
0, 1, 2, 3
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like modulo 10 for 0 to 23 index. Likely not possible but if gains gains table size ARIA_MAX_GAIN_STATES could be 16 or other 2^N value this could be bit-and instead of modulo. But it's out-of-scope of this PR to change functionality.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like modulo 10 for 0 to 23 index. Likely not possible while keeping the algorithm the same but if gains gains table size ARIA_MAX_GAIN_STATES could be 16 or other 2^N value this could be bit-and instead of modulo. But it's out-of-scope of this PR to change functionality.

Copy link
Collaborator

@singalsu singalsu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, I got the patch and compared to aria_hifi3.c.

2, 3, 4, 5,
6, 7, 8, 9,
0, 1, 2, 3
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like modulo 10 for 0 to 23 index. Likely not possible while keeping the algorithm the same but if gains gains table size ARIA_MAX_GAIN_STATES could be 16 or other 2^N value this could be bit-and instead of modulo. But it's out-of-scope of this PR to change functionality.

@andrula-song andrula-song marked this pull request as ready for review January 23, 2024 01:27
@marc-hb marc-hb removed their request for review January 24, 2024 00:24
@lgirdwood lgirdwood merged commit 10f0e38 into thesofproject:main Jan 24, 2024
40 of 44 checks passed
# if XCHAL_HAVE_HIFI3 || XCHAL_HAVE_HIFI4
# if XCHAL_HAVE_HIFI5
# define ARIA_HIFI5
# elif XCHAL_HAVE_HIFI3 || XCHAL_HAVE_HIFI4
Copy link
Collaborator

@marc-hb marc-hb Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please switch to the new, de-duplicated approach from #8787

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants