* add guide to create cohort via CSV
* remove reference to record_collect_fonts
* add time_event to Javascript SDK
* add clarification about not set values for UTM tracking and marketing attribution
* update pie chart support for comparisons
* add requirement to send events to merge IDs for Simplified API
* add deduplication mechanism to dev docs
* add future timestamp correction warning
* spelling
Event deduplication allows a project to send the same exact event while only recording that event once.
-Deduplication only occurs when a subset of the event data is exactly identical.
+Mixpanel provides an event deduplication mechanism to ensure that duplicate events do not skew your analytics. Deduplication is essential when events may be sent multiple times due to network retries, client-side batching, or integration with multiple data sources.
+
+<br />
+
+## How Deduplication Works
+
+Mixpanel deduplicates events using a combination of four key event properties:
+
+- Event Name (`event`)
+- Distinct ID (`distinct_id`)
+- Timestamp (`time`)
+- Insert ID (`$insert_id`)
+
+If all four of these properties are identical across two or more events, Mixpanel considers them duplicates and will only show the most recent version of that event in your reports. This applies regardless of whether the events are sent via SDKs, APIs, or other integrations.
+
+The `$insert_id` should be a randomly generated, unique value for each event to ensure proper deduplication. If `$insert_id` values are reused, events may be unintentionally deduplicated.
+
+Only the four key event properties listed above are used for deduplication. Additional event properties are not considered for the deduplication mechanism. For example, if two events share the same Event Name, Distinct ID, Timestamp, and Insert ID, but have different `$city` values, they are still considered duplicate events.
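As an illustration of the `$insert_id` guidance above, here is a minimal Python sketch that stamps each event with a random insert ID. The `build_event` helper and the `{"event": ..., "properties": {...}}` payload layout are assumptions made for this example, not part of any Mixpanel SDK.

```python
import time
import uuid


def build_event(event_name, distinct_id, properties=None):
    """Build an event payload with a fresh, random $insert_id (hypothetical helper)."""
    props = dict(properties or {})
    props["distinct_id"] = distinct_id
    props["time"] = int(time.time())  # seconds since the UTC epoch
    # uuid4 is random, so two separately built events will not share an insert ID.
    props["$insert_id"] = uuid.uuid4().hex
    return {"event": event_name, "properties": props}


# Build the payload once and resend the same object on retry, so that the
# retry carries the same four deduplication fields as the first attempt.
signup = build_event("Signed Up", "user-1", {"$city": "Paris"})
retry = signup
```

The design point is that the insert ID is generated when the event is created, not when it is sent, so a network retry repeats the ID while a genuinely new event gets a new one.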
+
+### Deduplication Example
+
+Deduplication occurs when a subset of the event data (event name, distinct_id, timestamp, $insert_id) is identical. Other event properties are not considered.
"2-2": "The value of `distinct_id` will be treated as a string, and used to uniquely identify a user associated with your event. If you provide a distinct_id property with your events, you can track a given user through funnels and distinguish unique users for retention analyses. You should always send the same distinct_id when an event is triggered by the same user.",
@@ -27,32 +49,56 @@ Deduplication only occurs when a subset of the event data is exactly identical.
 "5-2": "A unique UUID tied to exactly one occurrence of an event."
 },
 "cols": 3,
-"rows": 6
+"rows": 6,
+"align": [
+"left",
+"left",
+"left"
+]
 }
 [/block]

-In other words, each event containing an $insert_id is checked for duplication after being minimized to the following shape:
+
+In other words, each event containing an `$insert_id` is checked for duplication after being minimized to the following shape:
If this simplified object is an exact match to any other simplified event it is marked as a duplicate. Ingested events that have been marked as a duplicate will be deleted within 24 hours.
+If this minimized event object is an exact match to any other minimized event object, it is marked as a duplicate. Ingested events that have been marked as duplicates will be deduplicated.

-If an event is sent to the Ingestion API without an `$insert_id` one will be generated for it. However, it will not qualify for the deduplication process.
+If an event is sent to the Ingestion API without an `$insert_id`, one will be generated for it. However, it will not qualify for the deduplication process.

-[block:callout]
-{
-"type": "warning",
-"title": "Deduplication does not rewrite data",
-"body": "Using $insert_id is only used to prevent duplicate event data. It cannot be used to update, replace, or delete existing events."
-}
-[/block]
+## Deduplication Mechanisms
+
+Mixpanel uses two main deduplication processes:
+
+### Query-Time Deduplication
+
+- When: Happens immediately when you query data in the Mixpanel UI.
+- How: If multiple events share the same event_name, distinct_id, timestamp, and $insert_id, only the most recent version of the event is shown in reports (based on the API ingestion time). This ensures that duplicate events do not affect your analytics in real time.
+- Scope: This deduplication is visible in the Mixpanel UI and reports, but not in raw data exports. Raw event exports contain all events as they were ingested, without any deduplication.
+
+### Compaction-Time Deduplication
+
+- When: Runs periodically in the backend, typically after a few hours and again after about 20 days, once data ingestion for a day is complete.
+- How: During compaction, Mixpanel scans for events with the same event name, distinct_id, and $insert_id (timestamp does not need to match exactly, just the same calendar day). The older event is deleted, and only the latest remains in storage.
+- Scope: This process helps reduce storage of duplicate events and may affect event counts if duplicates were present with different timestamps.
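The difference between the two matching rules can be sketched in Python. This is only an illustration of the description above, not Mixpanel's implementation. It assumes flat event dictionaries, timestamps in seconds, and that "same calendar day" is evaluated in UTC (the text does not say which time zone applies). Both function names are hypothetical.

```python
from datetime import datetime, timezone


def query_time_key(event):
    """Query-time match: all four fields, including the exact timestamp."""
    return (event["event"], event["distinct_id"], event["time"], event["$insert_id"])


def compaction_key(event):
    """Compaction-time match: the timestamp only has to fall on the same day.

    Assumption: the calendar day is computed in UTC.
    """
    day = datetime.fromtimestamp(event["time"], tz=timezone.utc).date()
    return (event["event"], event["distinct_id"], day, event["$insert_id"])


first = {"event": "Purchase", "distinct_id": "user-1", "time": 1700000000, "$insert_id": "abc"}
later = dict(first, time=first["time"] + 60)  # same insert ID, one minute later
```

Under these assumptions `first` and `later` are distinct at query time but collapse into one event at compaction, which is the case where event counts can change after the fact.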
+
+<br />
+
+## Important Notes
+
+**Raw Event Export** - Deduplication is not applied to raw data exports. If you export events via the API, you may see duplicates. It is recommended to apply the same deduplication logic (event name, distinct_id, timestamp, $insert_id) to your exported data.
+
+**Insert ID Best Practice** - Always generate a unique $insert_id for each event. Reusing $insert_id (e.g., setting it to the user’s distinct_id) can cause unintended deduplication and data loss.
+
+**Deduplication Timing** - Query-time deduplication is immediate. Compaction-time deduplication timing is not guaranteed and may take hours to days to complete.
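For the raw export note above, one way to apply the same four-field logic to exported data is sketched below in Python. The record layout (`event` plus a `properties` dictionary) and the `deduplicate_export` name are assumptions for illustration. The sketch also assumes that later rows in the export were ingested later, so the last row seen for a key is treated as the most recent version.

```python
def deduplicate_export(rows):
    """Keep one row per (event, distinct_id, time, $insert_id) key.

    Assumption: rows arrive in ingestion order, so a later row replaces
    an earlier one with the same key.
    """
    latest = {}
    for row in rows:
        props = row["properties"]
        key = (
            row["event"],
            props.get("distinct_id"),
            props.get("time"),
            props.get("$insert_id"),
        )
        latest[key] = row  # overwrite keeps the most recently seen version
    return list(latest.values())


rows = [
    {"event": "Purchase", "properties": {"distinct_id": "u1", "time": 1700000000, "$insert_id": "abc", "$city": "Paris"}},
    {"event": "Purchase", "properties": {"distinct_id": "u1", "time": 1700000000, "$insert_id": "abc", "$city": "Lyon"}},
    {"event": "Purchase", "properties": {"distinct_id": "u1", "time": 1700000000, "$insert_id": "def", "$city": "Paris"}},
]
unique = deduplicate_export(rows)
```

Note that `$city` differs between the first two rows but plays no part in the key, matching the rule that other properties are not considered.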
openapi/src/ingestion.openapi.yaml (2 additions & 2 deletions)
@@ -101,7 +101,7 @@ paths:
 time:
 type: integer
 title: time
-description: The time at which the event occurred, in seconds or milliseconds since UTC epoch.
+description: The time at which the event occurred, in seconds or milliseconds since UTC epoch. If the time value is set in the future, it will be overwritten with the current time at ingestion.
 distinct_id:
 type: string
 title: distinct_id
@@ -163,7 +163,7 @@ paths:
 time:
 type: integer
 title: time
-description: The time at which the event occurred, in seconds or milliseconds since UTC epoch.
+description: The time at which the event occurred, in seconds or milliseconds since UTC epoch. If the time value is set in the future, it will be overwritten with the current time at ingestion.
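The future-timestamp rule in the updated description can be pictured with a short Python sketch. It only models the documented behavior and is not the Ingestion API's code. The `ingested_time` helper is hypothetical and assumes both values are integer seconds since the UTC epoch.

```python
import time


def ingested_time(event_time, now=None):
    """Return the timestamp an event would carry after ingestion.

    Assumption: both values are integer seconds since the UTC epoch.
    A timestamp later than `now` is replaced by `now`; anything else is kept.
    """
    now = int(time.time()) if now is None else now
    return now if event_time > now else event_time
```

A client could run a check like this before sending to catch clock skew early, since a corrected timestamp no longer matches the value the client originally sent.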
pages/docs/data-structure/user-profiles.mdx (2 additions & 0 deletions)
@@ -159,9 +159,11 @@ See here for more on how to [import](https://docs.mixpanel.com/docs/tracking-met
 Historical properties can be used anywhere that regular profile properties can be used.

 For eg, when you apply breakdown by historical plan-type property, the property value will be picked based on the time of the event, instead of the current property value.
+
 

 When you hover over a historical property, the context menu that pops up will show that the property was sourced from a history table, as well as the name of the source. This means that the value of the property used in charts can vary over time.
pages/docs/features/attribution.mdx (6 additions & 2 deletions)
@@ -10,7 +10,7 @@ import { Callout } from 'nextra/components'

 Attribution helps teams attribute conversion credit to the touchpoints in a user journey, whether it's to the first or last touch (single-touch attribution models) or to multiple touchpoints using a multi-touch attribution model like U-shape or Linear.

-Let’s consider an example user journey:
+Consider the following example user journey:
 1. A user sees an ad for a product on Facebook
 2. The user clicks on the ad and is taken to the product page on the company's website
 3. The user adds the product to their cart and begins the checkout process
@@ -64,7 +64,7 @@ If you use a Mixpanel js-sdk, we’ve updated our sdk to track utm parameters mo
 - **Attributed by property:** This is the property on a touchpoint event that we use for the attribution model. The canonical example is utm_source
 - **Lookback window:** The time window where a user's events with this attribution property are counted towards the calculation. The window ends when the conversion metric happens.

-## Frequently Asked Questions
+## FAQ

 ### How does Mixpanel compute attribution under the hood?

@@ -146,3 +146,7 @@ NOTE: You can apply a filter on an attribution property only after an attributio
 - Step 1: Turn on Attribution analysis by going to the breakdown section and choosing `Attributed by..` and property `XYZ`
 - Step 2 (a): Once attribution model has been applied, go to the filter section and choose the computed property `Attributed by XXX`. You can apply an attribution filter only on the property used in the attribution breakdown
 - Step 2 (b): Once attribution model has been applied, click on the chart bar and filter/exclude the segments as needed
+
+### What does the "(not set)" attribution segment mean?
+
+You may see a "(not set)" segment in your report when using the Attribution feature. This occurs when the attribution property is missing from all events being evaluated for the user.
pages/docs/session-replay/implement-session-replay/session-replay-web.mdx (0 additions & 1 deletion)
@@ -100,7 +100,6 @@ mixpanel.init(
 | --- | --- | --- |
 |`record_block_class`| CSS class name or regular expression for elements which will be replaced with an empty element of the same dimensions, blocking all contents. |`new RegExp('^(mp-block\|fs-exclude\|amp-block\|rr-block\|ph-no-capture)$')` <br/> (common industry block classes) |
 |`record_block_selector`| CSS selector for elements which will be replaced with an empty element of the same dimensions, blocking all contents. |`"img, video"`|
-|`record_collect_fonts`| When true, Mixpanel will collect and store the fonts on your site to use in playback. |`false`|
 |`record_idle_timeout_ms`| Duration of inactivity in milliseconds before ending a contiguous replay. A new replay collection will start when active again. |`1800000`<br/>(30 minutes) |
 |`record_mask_text_class`| CSS class name or regular expression for elements that will have their text contents masked. |`new RegExp('^(mp-mask\|fs-mask\|amp-mask\|rr-mask\|ph-mask)$')` <br/> (common industry mask classes) |
 |`record_mask_text_selector`| CSS selector for elements that will have their text contents masked. |`"*"`|
pages/docs/tracking-best-practices/traffic-attribution.mdx (7 additions & 1 deletion)
@@ -23,6 +23,12 @@ Mixpanel's Javascript library will also track initial_utm_parameters as a profil

 UTM parameters are by default persisted across events as [Super Properties](/docs/tracking-methods/sdks/javascript#setting-super-properties). To opt in to the recommended modern behavior most compatible with our [attribution models](/docs/features/attribution), use the SDK initialization option `{stop_utm_persistence: true}` to disable UTM param persistence (refer to our [Release Notes](https://github.com/mixpanel/mixpanel-js/releases/tag/v2.52.0) in GitHub).

+#### Organic Traffic
+
+If a user arrives at your landing page organically, no UTM tags will be parsed because the URL does not contain them. As a result, the UTM property will be absent from the events and will appear as "(not set)" when used as a breakdown in a report. You can interpret a "(not set)" value for any UTM property as indicating organic or direct traffic.
+
+Learn more about falsy values [here](/docs/data-structure/property-reference/data-type#undefined-and-null).
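To make the organic-traffic case concrete, the Python sketch below mimics how UTM tags present in a landing URL become properties, and how an absent property surfaces as "(not set)" in a breakdown. It is a simplified model, not the JavaScript SDK's code. The `UTM_KEYS` list and both function names are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlparse

# Assumed set of UTM tags for this example.
UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content")


def utm_properties(landing_url):
    """Return only the UTM tags that actually appear in the URL's query string."""
    query = parse_qs(urlparse(landing_url).query)
    return {key: query[key][0] for key in UTM_KEYS if key in query}


def breakdown_label(properties, key):
    """Label a report breakdown would show: the value, or "(not set)" if absent."""
    return properties.get(key, "(not set)")


paid = utm_properties("https://example.com/?utm_source=facebook&utm_medium=cpc")
organic = utm_properties("https://example.com/pricing")
```

Here `paid` carries `utm_source` and `utm_medium`, while `organic` is an empty dictionary, so every UTM breakdown for it falls into the "(not set)" segment.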
+
 ### Initial Referrer and Initial Referring Domain Properties

 Mixpanel's Javascript library will track Initial Referrer and Initial Referring Domain and append them as a property to any event that a user completes. These properties are stored in the Mixpanel cookie the first time a user comes to your site and will not change on future site visits as long as the cookie is not cleared.
@@ -33,7 +39,7 @@ Having this information allows you to build reports to see how users from differ

 #### $direct

-An initial referrer is equal to $direct when a user first lands on a site without being referred by another website. The user may have typed the website address directly, clicked a bookmark, clicked a link from an email, or might have security settings in their browser that prevent referrer data from being passed.
+An initial referrer is equal to `$direct` when a user first lands on a site without being referred by another website. The user may have typed the website address directly, clicked a bookmark, clicked a link from an email, or might have security settings in their browser that prevent referrer data from being passed.
pages/docs/tracking-methods/id-management/identifying-users-simplified.mdx (5 additions & 4 deletions)
@@ -54,11 +54,12 @@ If an event contains a `$user_id`, the value of the `$user_id` will be set as th

 ## Client-side Identity Management

-If using our Web/Mobile SDKs or a CDP like Segment or Rudderstack, there are only 2 steps:
-1. Call `.identify(<user_id>)` when a user signs up or logs in. Pass in the user's known identifier (eg: their ID from your database).
-2. Call `.reset()` when a user logs out.
+If using our Web/Mobile SDKs or a CDP like Segment or Rudderstack, there are only 3 steps to identity management:
+1. Call `.identify(<user_id>)` when a user signs up or logs in, passing in the user's known identifier (eg: their ID from your database).
+2. Send at least one event after the `.identify()` call. This is necessary to get the `$user_id` and `$device_id` to merge. Learn more about [the merge mechanism above](/docs/tracking-methods/id-management/identifying-users-simplified#mechanism).
+3. Call `.reset()` when a user logs out.

-- Any events prior to calling `.identify` are considered anonymous events. Mixpanel's SDKs will generate a `$device_id` to associate these events to the same anonymous user. By calling `.identify(<user_id>)` when a user signs up or logs in, you're telling Mixpanel that `$device_id` belongs to a known user with ID `user_id`.
+- Any events prior to calling `.identify()` are considered anonymous events. Mixpanel's SDKs will generate a `$device_id` to associate these events to the same anonymous user. By calling `.identify(<user_id>)` when a user signs up or logs in, you're telling Mixpanel that `$device_id` belongs to a known user with ID `user_id`.

 - Under the hood, Mixpanel will stitch the event streams of those users together. This works even if a user has multiple anonymous sessions (eg: on desktop and mobile). As long as you always call `.identify` when the user logs in, all of that activity will be stitched together.