Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combo RAM and Kconfig reduction #2849

Draft
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

petejohanson
Copy link
Contributor

Keeping this draft for now, but this is a collection of improvements to remove a few combo Kconfig flags that are instead auto-calculated at build time, and reduce the RAM usage on more constrained devices.

A Le Chiffre STM32 Build on main:

[198/198] Linking C executable zephyr/zmk.elf
Memory region         Used Size  Region Size  %age Used
           FLASH:       42636 B       128 KB     32.53%
             RAM:        8120 B        15 KB     52.86%
     RetainedMem:          0 GB          1 B      0.00%
        IDT_LIST:          0 GB         2 KB      0.00%

Same keymap with this PR:

❯ west build -d build/lc_stm32_tap_dance
[7/7] Linking C executable zephyr/zmk.elf
Memory region         Used Size  Region Size  %age Used
           FLASH:       42732 B       128 KB     32.60%
             RAM:        7736 B        15 KB     50.36%
     RetainedMem:          0 GB          1 B      0.00%
        IDT_LIST:          0 GB         2 KB      0.00%

For a savings of 384 bytes.

Needs some more cleanup, and there's room for some performance improvements for the access to the new bit fields, but this should be ready for early testing.

PR check-list

  • Branch has a clean commit history
  • Additional tests are included, if changing behaviors/core code that is testable.
  • Proper Copyright + License headers added to applicable files (Generally, we stick to "The ZMK Contributors" for copyrights to help avoid churn when files get edited)
  • Pre-commit used to check formatting of files, commit messages, etc.
  • Includes any necessary documentation changes.

Determine the max keys per combo automatically from the devicetree,
so we remove the ZMK_COMBO_MAX_KEYS_PER_COMBO Kconfig symbol.
Reference combos by index, not 32-bit pointers, and store bitfields
instead of arrays in several places, to bring down our flash/RAM usage.
Use bit field to track candidate combos, to avoid needing an explicit
`ZMK_COMBO_MAX_COMBOS_PER_KEY` setting.
app/src/combo.c Outdated
#define COMBO_CONFIGS_WITH_MATCHING_POSITIONS_LEN(positions, _ignore) \
DT_INST_FOREACH_CHILD_VARGS(0, COMBO_INST, positions)
static const struct combo_cfg combos[] = {
LISTIFY(10, COMBO_CONFIGS_WITH_MATCHING_POSITIONS_LEN, (), 0)};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should 10 be MAX_COMBO_KEYS instead? I have 30% confidence that I understand what this is doing, but it sounds like it is iterating over possible key_positions lengths.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This deserved a comment: MAX_COMBO_KEYS is not a pre-processor constant, but a C union trick, so can't be used as a parameter to LISTIFY, so we basically pick an arbitrary high value that won't tax the pre-processor too much. I picked 10, since it's unlikely for someone to combo with more fingers than they have.

Copy link
Contributor

@caksoylar caksoylar Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it doesn't hurt to set this to a high number, so maybe we go 'til 255 (devicetree limit, IIRC)? Unless you think preprocessor can slow down the build by a significant amount.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it doesn't hurt to set this to a high number, so maybe we go 'til 255 (devicetree limit, IIRC)? Unless you think preprocessor can slow down the build by a significant amount.

It might, and honestly, this is iterating by key-positions length, so I'm having a hard time imagining we need to support a combo triggered by 256 different key positions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, just thought it was perhaps a number with lesser-magic involved than 10, it's fine to keep that too (perhaps with a comment and noting the limitation in docs).

int
default 0
help
Deprecated: Storage for combos is now determined automatically
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these should be deprecated. Rather I think we should take the max of the automatic determination and whatever this value is set to, for future Studio use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we have future studio use, we'll likely "pre-allocate" things by setting placeholders up in the devicetree, like we do with reserved layers, so our auto-calculation will still be usable there as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Closer inspection, seems doable for MAX_COMBOS_PER_KEY, but less so for vice-versa: what if DT has only 2 and 3 key combos, and then someone wants to set up a 5 key one in Studio?

return 0;
}

static bool combo_active_on_layer(const struct combo_cfg *combo, uint8_t layer) {
if (combo->layers[0] == -1) {
// -1 in the first layer position is global layer scope
if (!combo->layer_mask) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the layer mask might be nicer as an inverted mask, i.e. 0s as true and 1s as false.

app/src/combo.c Outdated
Comment on lines 189 to 203
for (int i = 0; i < ARRAY_SIZE(combos); i++) {
if (sys_bitfield_test_bit((mem_addr_t)&candidates, i)) {
const struct combo_cfg *candidate = &combos[i];
bool found = false;
for (int kp = 0; kp < candidate->key_position_len; kp++) {
if (candidate->key_positions[kp] == position) {
matches++;
found = true;
break;
}
}

if (!found) {
sys_bitfield_clear_bit((mem_addr_t)&candidates, i);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we be able to just & the candidates and the combo lookup bitfields with one another?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, with some tweaks for counting remaining bits.

@Nick-Munnich
Copy link
Contributor

Joel had an interesting comment here that might also apply to this PR.

@petejohanson
Copy link
Contributor Author

Joel had an interesting comment here that might also apply to this PR.

Moved to uint32_t type for the bit fields to avoid alignment issues.

@@ -105,17 +105,17 @@ static const struct combo_cfg combos[] = {
#define COMBO_CHILDREN_COUNT (0 DT_INST_FOREACH_CHILD(0, COMBO_ONE))

// We need at least 4 bytes to avoid alignment issues
#define BYTES_FOR_COMBOS_MASK MAX(4, DIV_ROUND_UP(COMBO_CHILDREN_COUNT, 8))
#define BYTES_FOR_COMBOS_MASK MAX(4, DIV_ROUND_UP(COMBO_CHILDREN_COUNT, 32))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this to be a minimum of 4 now that everything is uint32_t?

Comment on lines +181 to +189
static uint8_t number_of_set_bits(uint32_t field) {
uint8_t count = 0;
while (field) {
field &= (field - 1);
count++;
}
// clear unmatched candidates
for (int i = matches; i < CONFIG_ZMK_COMBO_MAX_COMBOS_PER_KEY; i++) {
candidates[i].combo = NULL;

return count;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we actually need to calculate the total number of set bits. We only care if it's 0, 1, or more.

Something like

    if (value == 0) {
        return 0; 
    }
    if ((value & (value - 1)) == 0) {
        return 1;
    }
    return 2;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants