Describe the bug
The domain rule in the YARA ruleset matches unintended strings that are not actual domains. This leads to false positives when scanning files that contain generic words, filenames, or localhost-like addresses.
To Reproduce
Steps to reproduce the behavior:
Run a YARA scan with the domain rule enabled.
Scan a file that contains common words, filenames, or IP addresses.
Observe that many non-domain strings are detected.
Example false positives:
test-123
file.txt
localhost
random_text
All these strings are incorrectly flagged as domains.
Expected behavior
The domain rule should only match valid domains, such as example.com, sub.example.net, or test-site.org. It should not match:
Plain text words
Filenames like file.txt
Localhost or internal references
Additional context
The issue is caused by the overly broad regex pattern:
$domain_regex = /([\w.-]+)/ wide ascii
This matches any run of one or more letters, digits, underscores, dots, or hyphens. Nothing in the pattern requires a dot or a TLD, so almost every token in a file matches, leading to many false positives.
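To make the over-matching concrete, here is a minimal sketch that runs the same pattern through Python's `re` module. Python is only a stand-in for YARA's regex engine here (an assumption: the two agree on this simple syntax, and the `wide ascii` modifiers are not modeled). The helper name `broad_matches` is hypothetical.

```python
import re

# The pattern from the rule, without YARA's `wide ascii` modifiers.
BROAD = re.compile(r"([\w.-]+)")

def broad_matches(text):
    """Return every substring the broad pattern would flag."""
    return BROAD.findall(text)

# The false positives reported in this issue.
samples = ["test-123", "file.txt", "localhost", "random_text"]
for sample in samples:
    # Each sample is matched in full, although none is a domain.
    print(sample, "->", broad_matches(sample))
```

Every sample comes back as a single full-length match, which is consistent with the behavior described above.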
Suggested Fix: Update the regex to a stricter pattern that ensures a valid TLD is present:
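$domain_regex = /([a-zA-Z0-9-]+\.[a-zA-Z]{2,6})/ wide ascii
This requires a literal dot followed by a 2 to 6 letter TLD. The dot must be escaped as \. because an unescaped . matches any character. With this pattern, test-123, localhost, and random_text no longer match. Filenames such as file.txt still fit this shape, so excluding them would additionally need a TLD allowlist or a file-extension blocklist.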
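The sketch below checks the suggested pattern against the examples from this issue, again using Python's `re` as a stand-in for YARA's regex engine (an assumption; `wide ascii` is not modeled). It writes the dot as `\.`, since an unescaped `.` matches any character. The names `strict_matches`, `domain_like`, and `KNOWN_TLDS` are hypothetical, and the three-entry allowlist is illustrative only.

```python
import re

# Suggested pattern with the dot escaped so it matches a literal "." only.
STRICT = re.compile(r"([a-zA-Z0-9-]+\.[a-zA-Z]{2,6})")

# Same pattern with an unescaped dot, for comparison.
UNESCAPED = re.compile(r"([a-zA-Z0-9-]+.[a-zA-Z]{2,6})")

# Hypothetical allowlist; a real rule would need a much fuller TLD list.
KNOWN_TLDS = {"com", "net", "org"}

def strict_matches(text):
    """Return substrings shaped like label.tld."""
    return STRICT.findall(text)

def domain_like(text):
    """Keep only strict matches whose suffix is in the allowlist."""
    return [
        m for m in strict_matches(text)
        if m.rsplit(".", 1)[1].lower() in KNOWN_TLDS
    ]

for sample in ["test-123", "localhost", "random_text", "file.txt",
               "example.com", "test-site.org"]:
    print(sample, "->", strict_matches(sample), domain_like(sample))

# With the dot unescaped, "localhost" is still flagged.
print(UNESCAPED.findall("localhost"))
```

In this sketch, the escaped pattern rejects test-123, localhost, and random_text but still accepts file.txt; only the allowlist step filters that out. Multi-label names such as sub.example.net are matched only in part by this pattern, so a production rule would likely need to allow repeated labels as well.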