
Commit 726fe25

Merge pull request #2509 from kamil-certat/misp_feed_mapping
ENH: Caching, advanced mapping and separating events for MISP Feed output bot
2 parents 2485871 + 388609a commit 726fe25


7 files changed (+1036, -108 lines)


CHANGELOG.md

Lines changed: 11 additions & 1 deletion
@@ -24,6 +24,8 @@ Please refer to the [NEWS](NEWS.md) for a list of changes which have an affect o
- `intelmq.lib.datatypes`: Remove unneeded Dict39 alias (PR#2639 by Nakul Rajpal, fixes #2635)
- `intelmq.lib.mixins.http`: Only set HTTP header 'Authorization' if username or password are set and are not both empty string as they are by default in the Manager (fixes #2590, PR#2634 by Sebastian Wagner).
- `intelmq.lib.message.Message.from_dict`: Do not modify the dict parameter by adding the `__type` field and raise an error when type is not determinable (PR#2545 by Sebastian Wagner).
- `intelmq.lib.mixins.cache.CacheMixin` was extended to support temporarily storing messages in a cache queue
  (PR#2509 by Kamil Mankowski).

### Development

@@ -219,7 +221,15 @@ This is short list of the most important known issues. The full list can be retr
- Treat value `false` for parameter `filter_regex` as false (PR#2499 by Sebastian Wagner).

#### Outputs
- `intelmq.bots.outputs.misp.output_feed`:
  - Handle failures if the current event wasn't saved or the saved one is incorrect (PR by Kamil Mankowski).
  - Allow saving messages in bulk instead of refreshing the feed immediately (PR#2509 by Kamil Mankowski).
  - Regenerate only modified events (PR#2509 by Kamil Mankowski).
  - Add `attribute_mapping` parameter to allow selecting a subset of event attributes as well as additional attribute parameters (PR#2509 by Kamil Mankowski).
  - Add `grouping_key` parameter to allow keeping IntelMQ events in separate MISP Events based on a given field (PR#2509 by Kamil Mankowski).
  - Add `tagging` parameter to allow adding tags to MISP events (PR#2509 by Kamil Mankowski).
  - Add `additional_info` parameter to extend the default description of MISP Events (PR#2509 by Kamil Mankowski).
  - Add `flat_events` parameter to allow skipping the creation of objects in MISP Events (PR#2509 by Kamil Mankowski).
- `intelmq.bots.outputs.smtp_batch.output`: Documentation on multiple recipients added (PR#2501 by Edvard Rejthar).

### Documentation

docs/dev/bot-development.md

Lines changed: 34 additions & 1 deletion
@@ -197,14 +197,47 @@ The `CacheMixin` provides methods to cache values for bots in a Redis database.
- `redis_cache_ttl: int = 15`
- `redis_cache_password: Optional[str] = None`

and provides the methods to cache key-value pairs:

- `cache_exists`
- `cache_get`
- `cache_set`
- `cache_flush`
- `cache_get_redis_instance`

and the following methods to cache objects in a queue:

- `cache_lpush`
- `cache_rpop`
- `cache_llen`

Caching key-value pairs and queue caching are two separate mechanisms. The functions in the
first list are designed for arbitrary values, while the latter are primarily meant for temporarily
storing messages, but can also handle other data types. Entries stored with one mechanism are not
visible to the other. For example, adding a key-value pair with `cache_set` does not change the value
returned by `cache_llen`, and an element added with `cache_lpush` cannot be found with `cache_exists`.

When using queue-based caching, you have to serialize objects to a format accepted by Redis/Valkey,
the underlying storage. For example, to store a message in a queue using the bot ID as the key, you can
use code like:

```python
self.cache_lpush(self.bot_id, self.receive_message().to_json(jsondict_as_string=True))
```

and to retrieve a message from the cache:

```python
import json

from intelmq.lib.message import MessageFactory

# inside a bot method, e.g. process(); pop from the same key used for cache_lpush
data = self.cache_rpop(self.bot_id)
if data is None:
    return  # handle empty cache
message = json.loads(data)
# to use it as a Message object
message_obj = MessageFactory.from_dict(
    message, harmonization=self.harmonization, default_type="Event"
)
```

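To illustrate the ordering these queue methods give you, here is a small standalone sketch. It assumes, as the method names suggest, that `cache_lpush`, `cache_rpop` and `cache_llen` map to the Redis `LPUSH`, `RPOP` and `LLEN` commands on a list. Pushing on the left and popping on the right then yields first-in, first-out order. The `FakeQueueCache` class is hypothetical. It only mimics those semantics in memory with a `deque` and is not part of IntelMQ.

```python
import json
from collections import deque
from typing import Dict, Optional


class FakeQueueCache:
    """Hypothetical in-memory stand-in for LPUSH/RPOP/LLEN semantics."""

    def __init__(self) -> None:
        self._lists: Dict[str, deque] = {}

    def cache_lpush(self, key: str, value: str) -> int:
        # LPUSH inserts at the head of the list and returns the new length
        queue = self._lists.setdefault(key, deque())
        queue.appendleft(value)
        return len(queue)

    def cache_rpop(self, key: str) -> Optional[str]:
        # RPOP removes from the tail; Redis returns nil (None) for an empty list
        queue = self._lists.get(key)
        if not queue:
            return None
        return queue.pop()

    def cache_llen(self, key: str) -> int:
        return len(self._lists.get(key, ()))


cache = FakeQueueCache()
for ip in ["192.0.2.1", "192.0.2.2", "192.0.2.3"]:
    # messages are serialized to strings before storing, as Redis requires
    cache.cache_lpush("my-bot", json.dumps({"source.ip": ip}))

print(cache.cache_llen("my-bot"))  # 3
first = json.loads(cache.cache_rpop("my-bot"))
print(first["source.ip"])  # 192.0.2.1
```

The oldest message comes out first. This is why an lpush/rpop pair suits the "store now, process later" use case described above.
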
### Pipeline Interactions

We can call three methods related to the pipeline:

docs/user/bots.md

Lines changed: 144 additions & 0 deletions
@@ -4736,6 +4736,12 @@ Create a directory layout in the MISP Feed format.
The PyMISP library >= 2.4.119.1 is required, see
[REQUIREMENTS.txt](https://github.com/certtools/intelmq/blob/master/intelmq/bots/outputs/misp/REQUIREMENTS.txt).

Note: please test the produced feed before using it in production. This bot allows extensive
customisation of the MISP feed, including creating multiple events and tags, but it can
be tricky to configure properly. Misconfiguration can prevent the bot from starting or have bad
consequences for your MISP instance (e.g. spamming it with events). Use the `intelmqctl check` command
to validate your configuration against common mistakes.

**Module:** `intelmq.bots.outputs.misp.output_feed`

**Parameters:**
@@ -4760,6 +4766,144 @@ The PyMISP library >= 2.4.119.1 is required, see
(optional, string) The output bot creates one event per interval; all data in this time frame is part of this event. Defaults to "1 hour".
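
The per-interval behaviour can be pictured as a simple time comparison. This is a hedged sketch, not the bot's code. `parse_interval` and `needs_new_event` are hypothetical helpers, and the parser only understands simple "<number> <unit>" strings like the ones used in this documentation (e.g. "1 hour" or "1 day").

```python
from datetime import datetime, timedelta

# Hypothetical unit table; the real bot may accept other formats.
_UNITS = {
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
}


def parse_interval(text: str) -> timedelta:
    """Parse strings like '1 hour' or '2 days' into a timedelta."""
    amount, unit = text.split()
    unit = unit.rstrip("s")  # accept plural forms such as 'days'
    return int(amount) * _UNITS[unit]


def needs_new_event(event_start: datetime, now: datetime, interval: str) -> bool:
    """A new MISP event starts once the current event's time frame has passed."""
    return now >= event_start + parse_interval(interval)


start = datetime(2024, 7, 9, 14, 0)
print(needs_new_event(start, datetime(2024, 7, 9, 14, 30), "1 hour"))  # False
print(needs_new_event(start, datetime(2024, 7, 9, 15, 0), "1 hour"))   # True
```
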

**`bulk_save_count`**

(optional, int) If set to a non-zero value, the bot does not refresh the MISP feed immediately, but caches
incoming messages until the given number of them has been collected. Use it if your bot processes a high number of messages
and constantly saving to disk is a problem. Reloading or restarting the bot, as well as generating
a new MISP event based on `interval_event`, triggers regenerating the MISP feed regardless of the cache size.

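The bulk saving described above boils down to a counter check. The sketch below is an illustration under stated assumptions, not the bot's implementation. `BulkSaver` and its `flush` callback are hypothetical names, and a plain list stands in for the Redis-backed cache queue.

```python
from typing import Callable, List


class BulkSaver:
    """Hypothetical illustration of bulk_save_count semantics."""

    def __init__(self, bulk_save_count: int, flush: Callable[[List[dict]], None]) -> None:
        self.bulk_save_count = bulk_save_count
        self.flush = flush
        self.pending: List[dict] = []

    def add(self, message: dict) -> None:
        if not self.bulk_save_count:
            # 0 disables caching: regenerate the feed for every message
            self.flush([message])
            return
        self.pending.append(message)
        if len(self.pending) >= self.bulk_save_count:
            self.save_now()

    def save_now(self) -> None:
        # also what a reload/restart or a new interval event would trigger
        if self.pending:
            self.flush(self.pending)
            self.pending = []


batches: List[List[dict]] = []
saver = BulkSaver(bulk_save_count=3, flush=batches.append)
for n in range(7):
    saver.add({"n": n})
print(len(batches), len(saver.pending))  # 2 1
saver.save_now()  # e.g. on shutdown
print([len(b) for b in batches])  # [3, 3, 1]
```
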
**`attribute_mapping`**

(optional, dict) If set, allows selecting which IntelMQ event fields are mapped to MISP attributes
as well as setting attribute parameters (e.g. a comment). The expected format is a *dictionary of dictionaries*:
each first-level key is an IntelMQ field that will be directly translated to a MISP attribute; the nested
dictionary holds additional parameters PyMISP can take when creating an attribute. Their values can be
names of other IntelMQ fields (then the value of that field is used) or static values. If no parameters are
needed, leave an empty dict.

For available attribute parameters, refer to the
[PyMISP documentation](https://pymisp.readthedocs.io/en/latest/_modules/pymisp/mispevent.html#MISPObjectAttribute)
for the `MISPObjectAttribute`.

For example:

```yaml
attribute_mapping:
  source.ip: {}
  feed.name:
    comment: event_description.text
  destination.ip:
    to_ids: False
```

would create a MISP object with three attributes `source.ip`, `feed.name` and `destination.ip`
and set their values as in the IntelMQ event. In addition, the `feed.name` attribute would have a comment
taken from the `event_description.text` field of the IntelMQ event, and `destination.ip` would be marked
as not usable for IDS. You can use the `type` key to overwrite the attribute type.

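One way to read these rules: for each mapped field, copy the event's value, then resolve each parameter either to another field's value or keep it as a literal. The sketch below applies that reading to a plain dict standing in for an IntelMQ event. `resolve_attributes` is a hypothetical helper. Skipping fields that are absent from the event is an assumption of this sketch, not documented bot behaviour.

```python
from typing import Any, Dict, List


def resolve_attributes(event: Dict[str, Any],
                       mapping: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn an attribute_mapping plus one event into attribute parameter dicts."""
    attributes = []
    for field, params in mapping.items():
        if field not in event:
            continue  # assumption: fields missing from the event are skipped
        attribute = {"field": field, "value": event[field]}
        for param, spec in (params or {}).items():
            # a string naming another event field is replaced by that field's value;
            # anything else (booleans, other strings) is used as a static value
            if isinstance(spec, str) and spec in event:
                attribute[param] = event[spec]
            else:
                attribute[param] = spec
        attributes.append(attribute)
    return attributes


event = {
    "source.ip": "192.0.2.1",
    "destination.ip": "198.51.100.7",
    "feed.name": "example-feed",
    "event_description.text": "Example description",
}
mapping = {
    "source.ip": {},
    "feed.name": {"comment": "event_description.text"},
    "destination.ip": {"to_ids": False},
}
for attr in resolve_attributes(event, mapping):
    print(attr)
```

Note the ambiguity this reading implies: a static string that happens to equal an IntelMQ field name is treated as a field reference.
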
**`grouping_key`**

(optional, string): If set to a field name of the IntelMQ event, the bot works on several MISP Events in parallel
instead of saving all incoming messages to a single one. Each unique value of the field
gets its own MISP Event. This is useful if your feed provides data about multiple entities you would
like to group, for example IPs of C2 servers from different botnets. For a given value, the bot
keeps using the same MISP Event as long as `interval_event` allows it.

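Conceptually, `grouping_key` partitions incoming messages by the value of one field, and each partition feeds its own MISP Event. The sketch below shows only that partitioning step, using plain dicts. `group_messages` is hypothetical. Collecting messages without the field under `None` is an assumption made for illustration, not documented behaviour.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional


def group_messages(messages: List[Dict[str, Any]],
                   grouping_key: str) -> Dict[Optional[Any], List[Dict[str, Any]]]:
    """Partition messages by the value of grouping_key (one MISP Event per value)."""
    groups: Dict[Optional[Any], List[Dict[str, Any]]] = defaultdict(list)
    for message in messages:
        # assumption: messages lacking the field end up in a shared None bucket
        groups[message.get(grouping_key)].append(message)
    return dict(groups)


messages = [
    {"malware.name": "botnet-1", "source.ip": "192.0.2.1"},
    {"malware.name": "botnet-2", "source.ip": "192.0.2.2"},
    {"malware.name": "botnet-1", "source.ip": "192.0.2.3"},
]
groups = group_messages(messages, "malware.name")
print({name: len(items) for name, items in groups.items()})
# {'botnet-1': 2, 'botnet-2': 1}
```
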
**`additional_info`**

(optional, string): If set, the generated MISP Event will use it in the `info` field of the event,
in addition to the standard IntelMQ description with the time frame (you cannot remove the latter, as the bot
depends on the datetimes saved there). If you use `grouping_key`, you may want to use the `{key}`
placeholder, which is then replaced with the value of the grouping key.

For example, the following configuration can be used to create a MISP Feed with IPs of C2 servers
of different botnets, with each botnet in a separate MISP Event with an appropriate description.
Each MISP Event will contain objects with the `source.ip` field only, and the events' info will look
like *C2 Servers for botnet-1. IntelMQ event 2024-07-09T14:51:10.825123 - 2024-07-10T14:51:10.825123*

```yaml
grouping_key: malware.name
additional_info: C2 Servers for {key}.
attribute_mapping:
  source.ip: {}
```

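The `{key}` substitution can be reproduced with Python's `str.format`. This sketch assumes that the standard "IntelMQ event <start> - <end>" part is appended after the custom text, which matches the example info string above. `build_info` is a hypothetical helper, and the bot's exact formatting may differ.

```python
from datetime import datetime


def build_info(additional_info: str, key: str, start: datetime, end: datetime) -> str:
    """Combine the custom prefix (with {key} filled in) and the time-frame description."""
    prefix = additional_info.format(key=key) if additional_info else ""
    time_frame = f"IntelMQ event {start.isoformat()} - {end.isoformat()}"
    # rstrip avoids a double space for prefixes such as "{key} - "
    return f"{prefix.rstrip()} {time_frame}".strip()


info = build_info(
    "C2 Servers for {key}.",
    "botnet-1",
    datetime(2024, 7, 9, 14, 51, 10, 825123),
    datetime(2024, 7, 10, 14, 51, 10, 825123),
)
print(info)
# C2 Servers for botnet-1. IntelMQ event 2024-07-09T14:51:10.825123 - 2024-07-10T14:51:10.825123
```
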
**`tagging`**

(optional, dict): Allows setting MISP tags on MISP events. The structure is a *dict of lists of dicts*.
The keys select which MISP events you want to tag. If you want to tag all of them, use `__all__`.
If you use `grouping_key` and want to add additional tags to some events, use the expected values
of the grouping field as keys. The *list of dicts* defines MISP tags as parameters to create `MISPTag`
objects from. Each dictionary has to have at least `name`. For all available parameters refer to the
[PyMISP documentation](https://pymisp.readthedocs.io/en/latest/_modules/pymisp/abstract.html#MISPTag)
for `MISPTag`.

Note: setting `name` is enough for MISP to match the correct tag from the global collection. You may
see it lacking the colour in the MISP Feed view, but it will be retrieved after importing to your
instance.

Example 1 - set two tags for every MISP event:

```yaml
tagging:
  __all__:
    - name: tlp:red
    - name: source:intelmq
```

Example 2 - create separate events based on `malware.name` and set an additional family tag:

```yaml
grouping_key: malware.name
tagging:
  __all__:
    - name: tlp:red
  njrat:
    - name: njrat
```

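Reading the rules above, the tags for a given MISP Event are the `__all__` entries plus any entries listed under that event's grouping value. The sketch expresses this with plain dicts. `tags_for_event` is hypothetical, and placing group-specific tags after the `__all__` ones is an ordering assumption of this sketch.

```python
from typing import Dict, List, Optional


def tags_for_event(tagging: Dict[str, List[Dict[str, str]]],
                   group_value: Optional[str] = None) -> List[Dict[str, str]]:
    """Collect the MISPTag parameter dicts that apply to one MISP Event."""
    tags = list(tagging.get("__all__", []))
    if group_value is not None:
        tags.extend(tagging.get(group_value, []))
    for tag in tags:
        # every tag definition needs at least a name
        if "name" not in tag:
            raise ValueError(f"tag definition without 'name': {tag!r}")
    return tags


tagging = {
    "__all__": [{"name": "tlp:red"}],
    "njrat": [{"name": "njrat"}],
}
print([t["name"] for t in tags_for_event(tagging, "njrat")])   # ['tlp:red', 'njrat']
print([t["name"] for t in tags_for_event(tagging, "emotet")])  # ['tlp:red']
```
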
**`flat_events`**

(optional, bool): If enabled, instead of creating an object for every incoming IntelMQ message, the bot adds
attributes directly to the MISP event. Useful if you want to export just a list of data, e.g.
C2 domains, without having to group some attributes together. When using flat events, you
have to define a custom mapping to ensure the correct attribute types. Defaults to `False`.

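The difference between the two modes is where attributes end up in the MISP event JSON. With flat events they go into the event's `Attribute` list. Otherwise they go into one entry of its `Object` list per IntelMQ message. The sketch below builds both shapes with plain dicts. It is a simplification of the MISP event format, and the object name `intelmq_event` is a hypothetical placeholder, not necessarily the name the bot uses.

```python
from typing import Any, Dict, List


def build_event(attribute_sets: List[List[Dict[str, Any]]], flat: bool) -> Dict[str, Any]:
    """Place per-message attribute lists into a simplified MISP event structure."""
    event: Dict[str, Any] = {"Attribute": [], "Object": []}
    for attributes in attribute_sets:  # one list of attributes per IntelMQ message
        if flat:
            for attribute in attributes:
                # without an object to give context, each attribute needs an explicit type
                if "type" not in attribute:
                    raise ValueError(f"flat attribute needs a 'type': {attribute!r}")
            event["Attribute"].extend(attributes)
        else:
            event["Object"].append({"name": "intelmq_event", "Attribute": attributes})
    return {"Event": event}


messages = [
    [{"type": "domain", "value": "c2.example.com", "to_ids": True}],
    [{"type": "domain", "value": "c2.example.net", "to_ids": True}],
]
flat = build_event(messages, flat=True)
nested = build_event(messages, flat=False)
print(len(flat["Event"]["Attribute"]), len(flat["Event"]["Object"]))      # 2 0
print(len(nested["Event"]["Attribute"]), len(nested["Event"]["Object"]))  # 0 2
```
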
**Example**

For example, if you have a source that sends C2 domains for multiple malware families,
you can use the following bot configuration:

```yaml
parameters:
  destination_queues: {}
  # you have to configure your webserver to expose this path for MISP
  output_dir: "/var/lib/intelmq/bots/your_feed/"
  misp_org_name: My Organisation
  misp_org_uuid: Your-Org-UUID
  interval_event: 1 day
  grouping_key: "malware.name"
  bulk_save_count: 100
  additional_info: "{key} - "
  flat_events: true
  attribute_mapping:
    source.fqdn:
      comment: malware.name
      type: domain
      category: "Network activity"
      to_ids: true
  tagging:
    __all__:
      - name: tlp:amber
```

As a result, you will get a MISP feed that creates one event per malware family every day. Each event
will contain just the C2 domains, with the IDS flag set and the malware name as comment. In addition, all
events will be tagged with `tlp:amber` and have the malware name in their info, together with
the information about the time period. The MISP Feed will be saved to disk after accumulating 100 C2
domains or on reload/restart.

**Usage in MISP**

Configure the destination directory of this feed as a feed in MISP, either as a local location or served via a web server.
