@JSCU-CNI
Contributor

This PR adds keyword indices for certain fields in the analysis collection. This massively improves load time when accessing an individual analysis result in CAPE on large MongoDB instances.

CAPE uses one large OR query when loading an analysis result, which is why we opted for a separate index for every field on which the aggregation is run. The query also looks for exact matches, so the fields are indexed as keyword instead of text.
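As a rough illustration of the reasoning above, the sketch below builds an exact-match `$or` filter with one clause per field, plus one single-field index spec per field so that each clause can be served by its own index. The helper names and the field `other.file_ref` are hypothetical; only `procmemory.file_ref` appears in the diff shown in this PR, and this is not CAPE's actual query code.

```python
# Minimal sketch, not CAPE's implementation. Helper names are hypothetical.

def build_or_query(fields, value):
    """Build an exact-match $or filter with one clause per field."""
    return {"$or": [{field: value} for field in fields]}


def index_specs(fields):
    """Return one single-field ascending index spec per field.

    Each spec uses pymongo's list-of-(key, direction) form, so it could be
    passed to Collection.create_index().
    """
    return [[(field, 1)] for field in fields]


# "other.file_ref" is a placeholder, not a field taken from the PR.
FIELDS = ["procmemory.file_ref", "other.file_ref"]

query = build_or_query(FIELDS, "abc123")
specs = index_specs(FIELDS)
```

With a real pymongo collection, `query` would be passed to `find()` and each entry of `specs` to `create_index()`.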

    "procmemory.file_ref",
]
for item in items:
    mongo_create_index(
Contributor

This will take a very long time for a lot of CAPE installations - terabyte scale is common. Probably better to do this in the utils somewhere and emit a warning if the indices aren't found.

Contributor Author

Where exactly do you suggest we should put this?

Contributor

@nbargnesi May 14, 2024

I'd suggest a new module in utils, mongodb_indices or something similarly named.

In the startup module you can check the indices are there and emit a warning if they're missing.

Note, too, the difference between doing it in startup and doing it in utils. Putting it in utils means we can add indices out-of-band while CAPE continues to run. That's a great way of making things faster incrementally without bringing CAPE down by touching startup.
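A minimal sketch of that split, assuming a pymongo-style collection object: the startup path only inspects `index_information()` and logs a warning, while a separate utility creates whatever is missing. The function names and the `EXPECTED_FIELDS` list are hypothetical (only `procmemory.file_ref` comes from the diff above), and nothing here is taken from CAPE's code.

```python
import logging

log = logging.getLogger(__name__)

# Hypothetical list; only "procmemory.file_ref" is visible in this PR's diff.
EXPECTED_FIELDS = ["procmemory.file_ref"]


def missing_indices(collection, fields=EXPECTED_FIELDS):
    """Return the fields that are not the leading key of any existing index."""
    indexed = set()
    for info in collection.index_information().values():
        keys = info.get("key", [])
        if keys:
            indexed.add(keys[0][0])  # keys look like [("field", 1), ...]
    return [field for field in fields if field not in indexed]


def warn_if_missing(collection, fields=EXPECTED_FIELDS):
    """Startup-side check: warn only, never build indices here."""
    missing = missing_indices(collection, fields)
    if missing:
        log.warning("MongoDB indices missing for: %s", ", ".join(missing))
    return missing


def create_missing(collection, fields=EXPECTED_FIELDS):
    """Utility-side step: create the missing indices, run out-of-band."""
    created = []
    for field in missing_indices(collection, fields):
        collection.create_index(field)
        created.append(field)
    return created
```

Keeping the check read-only means startup time does not depend on collection size; the potentially slow index build only happens when someone runs the utility.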

@doomedraven
Collaborator

this is obsolete now, i have added indexes for md5 and sha256 + rewrote the searching
