Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Byte matching #96

Merged
merged 35 commits into from
Feb 18, 2025
Merged

Byte matching #96

merged 35 commits into from
Feb 18, 2025

Conversation

diana-qing
Copy link
Contributor

This PR implements support for matching protocol fields on raw bytes. In a filter, bytes are enclosed in pipes (|) and represented as hex bytes.

.DS_Store Outdated
Copy link
Collaborator

@thearossman thearossman Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add all .DS_Store files to .gitignore?

Cargo.toml Outdated
@@ -9,7 +9,7 @@ members = [
"examples/protocols",
"examples/basic",
"examples/basic_file",
"examples/log_ssh",
"examples/log_ssh", "examples/byte_match",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - can you add this to a newline for consistent formatting?

Copy link
Collaborator

@thearossman thearossman Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually -- I think you can take out the example. It's important for functionality testing this feature, and you should save it in a branch in your repo, but I don't think we need to actually check it into the core repository.

One nice alternative would be to modify one of the other programs to add a subscription that looks for NULL bytes so that there is an example of the syntax somewhere. You could modify log_ssh to add an additional subscription matching on content or add something to filter_stats maybe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! I modified one of the existing subscriptions in log_ssh to use byte matching to only log SSH handshakes whose protocol version field is equal to 2.0.

@@ -837,6 +838,7 @@ impl fmt::Display for Value {
Value::Ipv4(net) => write!(f, "{}", net),
Value::Ipv6(net) => write!(f, "{}", net),
Value::Text(val) => write!(f, "{}", val),
Value::Byte(val) => write!(f, "{:?}", val),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you format this to print in a consistent format with the input? With the || and in hex.

For hex, might need an iter().for_each to format each byte individually in the Vec -- I don't remember if there's an easier way.

let bytes_vec = bytes_as_str
.replace("|", "")
.split_whitespace()
.map(|s| u8::from_str_radix(s, 16).expect("Not a valid hex byte"))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you print in this error message the input that caused it? There could be a ton of subscriptions, so you should signal to the user which pattern caused the failure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified to include more info on what input caused the failure. As an example, an input like 2G should lead to a failure since it's not a valid hex byte. The failure message looks like:
Screenshot 2025-02-17 at 6 35 18 PM

Copy link
Collaborator

@thearossman thearossman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit the documentation for Field types in filtergen/lib.rs (will show up here) to add the raw byte vector type.

.gitignore Outdated
@@ -6,3 +6,4 @@ Cargo.lock
*.csv
*.json
*.jsonl
.DS_Store
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@diana-qing I think this needs to be *.DS_Store so that it also catches files in subdirectories

@thearossman
Copy link
Collaborator

one .gitignore comment, but this LGTM to merge otherwise. your functionality testing looks good. I ran the SSH example on the tap (without logging) and saw no performance red flags.

@thearossman thearossman merged commit fac8644 into stanford-esrg:main Feb 18, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants