Memgraph in mission critical workloads #1267

Draft: wants to merge 64 commits into base: main
5e5b886
init
katarinasupe Mar 13, 2025
55ff98c
Merge branch 'main' into memgraph-3-2
katarinasupe Mar 28, 2025
17384d2
Merge branch 'main' into memgraph-3-2
katarinasupe Mar 28, 2025
46cfb1f
Add hardware sizing
Josipmrden Apr 15, 2025
ee9a7bb
Add vm.max_map_count explanation
Josipmrden Apr 15, 2025
de342fb
Add deployment options
Josipmrden Apr 15, 2025
0b741e8
Add flag set suggestions
Josipmrden Apr 15, 2025
e8353d3
Indentation
Josipmrden Apr 15, 2025
b63517e
Remove unnecessary comment
Josipmrden Apr 15, 2025
cf434bb
Add enterprise, queries and import sections
Josipmrden Apr 15, 2025
bfc44eb
Finish general suggestions guide
Josipmrden Apr 15, 2025
d3c71fe
Make under construction notes
Josipmrden Apr 15, 2025
1a5097f
Add todo
Josipmrden Apr 15, 2025
5de8cac
Update property sizes
Josipmrden Apr 16, 2025
d63d9d1
Add backup considerations
Josipmrden Apr 16, 2025
fd35d83
Added overview page
Josipmrden Apr 16, 2025
0042907
Add hardware sizing
Josipmrden Apr 15, 2025
6de5e65
Add vm.max_map_count explanation
Josipmrden Apr 15, 2025
60f1f25
Add deployment options
Josipmrden Apr 15, 2025
01f350e
Add flag set suggestions
Josipmrden Apr 15, 2025
74715a8
Indentation
Josipmrden Apr 15, 2025
a97f41a
Remove unnecessary comment
Josipmrden Apr 15, 2025
e8753c1
Add enterprise, queries and import sections
Josipmrden Apr 15, 2025
bcf7a0c
Finish general suggestions guide
Josipmrden Apr 15, 2025
89ce823
Make under construction notes
Josipmrden Apr 15, 2025
6c5e47f
Add todo
Josipmrden Apr 15, 2025
84f4d0c
Update property sizes
Josipmrden Apr 16, 2025
51f07a4
Add backup considerations
Josipmrden Apr 16, 2025
2d481aa
Added overview page
Josipmrden Apr 16, 2025
0126e49
Merge branch 'memgraph-in-production' of github.com:memgraph/document…
Josipmrden Apr 16, 2025
81d7788
Main
Josipmrden Apr 16, 2025
72f6649
Main
Josipmrden Apr 16, 2025
c27b99d
Newline
Josipmrden Apr 16, 2025
6d1c3c3
Merge branch 'main' into memgraph-in-production
Josipmrden Apr 16, 2025
c58d7c5
Add section for query timeout
Josipmrden Apr 17, 2025
6742f61
Set up lab features
Josipmrden Apr 17, 2025
5cedea6
Sentence case
Josipmrden Apr 17, 2025
2a3aa4e
Remove empty construction pages
Josipmrden Apr 17, 2025
313e61b
Add callout
Josipmrden Apr 17, 2025
1b5c5f3
Update GraphRAG use case
Josipmrden Apr 17, 2025
05e8b71
Finish memgraph in production for graphrag
Josipmrden Apr 24, 2025
2c7e98f
Merge branch 'main' into memgraph-in-production
Josipmrden Apr 24, 2025
52485ac
Update initial page
Josipmrden Apr 24, 2025
7f2fcb2
Add graphrag link
Josipmrden Apr 24, 2025
44518e5
Merge branch 'memgraph-in-production' into evaluating-memgraph
Josipmrden Apr 24, 2025
f3b0c5c
Add page for evaluating memgraph -> mgbench
Josipmrden Apr 24, 2025
53e1526
Address PR comments
Josipmrden Apr 25, 2025
4a4c77a
Merge branch 'memgraph-in-production' into evaluating-memgraph
Josipmrden Apr 25, 2025
f2f4ff7
Add title for evaluating memgraph
Josipmrden Apr 25, 2025
5a828ad
Address PR comments
Josipmrden Apr 25, 2025
ae10ec9
Add contents before the deep-dive
Josipmrden Apr 25, 2025
b07865d
Edit titles
Josipmrden Apr 25, 2025
be53774
Add guidelines for updating the page
Josipmrden Apr 25, 2025
e9ce4ce
Finish guide for high throughput workloads
Josipmrden Apr 26, 2025
7c5fc5a
Omit constructed guide
Josipmrden Apr 26, 2025
bdc9514
Add initial mission critical page
Josipmrden Apr 26, 2025
c43bcd0
Omit guide being constructed
Josipmrden Apr 26, 2025
f6150f7
Dash
Josipmrden Apr 28, 2025
99a1ab2
Merge branch 'main' into memgraph-in-high-throughput-workloads
Josipmrden Apr 28, 2025
9d92156
Merge branch 'main' into memgraph-in-high-throughput-workloads
matea16 Apr 28, 2025
0ef77f4
Address PR commnets
Josipmrden Apr 28, 2025
fc5fbbc
Push memgraph up
Josipmrden Apr 28, 2025
17d25a8
Fix link for benchmarking Memgraph
Josipmrden Apr 28, 2025
4f0d144
Merge branch 'memgraph-in-high-throughput-workloads' into memgraph-in…
Josipmrden Apr 28, 2025
2 changes: 1 addition & 1 deletion pages/_meta.ts
@@ -13,8 +13,8 @@ export default {
"database-management": "Database management",
"deployment": "Deployment",
"clustering": "Clustering",
"memgraph-in-production": "Memgraph in production",
"data-streams": "Data streams",
"help-center": "Help center",
"release-notes": "Release notes",
"memgraph-in-production": "Memgraph in production"
}
10 changes: 7 additions & 3 deletions pages/memgraph-in-production.mdx
@@ -44,14 +44,18 @@ Here are the currently available guides to help you deploy Memgraph effectively:
### [General suggestions](/memgraph-in-production/general-suggestions)
A foundational guide covering universal best practices for any production deployment - recommended reading before anything else.

### [Memgraph in high-throughput workloads](/memgraph-in-production/memgraph-in-high-throughput-workloads)
Scale your write throughput while keeping up with fast-changing, high-velocity graph data.

### [Memgraph in mission-critical workloads](/memgraph-in-production/memgraph-in-mission-critical-workloads)
Suggestions on how to bring your Memgraph to production in mission-critical and high-availability workloads.

### [Memgraph in GraphRAG use cases](/memgraph-in-production/memgraph-in-graphrag)
Learn how to optimize Memgraph for Retrieval-Augmented Generation (RAG) systems using graph data.

## 🚧 Guides in construction
- Memgraph in transactional workloads
- Memgraph in analytical workloads
- Memgraph in mission critical workloads
- Memgraph in high throughput workloads
- Memgraph in supply chain use cases
- Memgraph in cyber security use cases
- Memgraph in fraud detection use cases
@@ -70,7 +74,7 @@ smooth transition to production.
These guides focus on areas like performance benchmarking, testing, and operational readiness—offering additional tools
and frameworks that can help you get the most out of your Memgraph deployment.

### [📊 Evaluating Memgraph](/memgraph-in-production/evaluating-memgraph)
### [📊 Benchmarking Memgraph](/memgraph-in-production/benchmarking-memgraph)
Learn how to properly **test Memgraph for performance and scalability**.
This guide walks you through performance and stress testing scenarios, benchmarking with real-world data,
and identifying key metrics that can help validate Memgraph’s fit for your application needs.
2 changes: 2 additions & 0 deletions pages/memgraph-in-production/_meta.ts
@@ -1,5 +1,7 @@
export default {
"general-suggestions": "General suggestions",
"memgraph-in-graphrag": "Memgraph in GraphRAG use cases",
"memgraph-in-high-throughput-workloads": "Memgraph in high-throughput workloads",
"memgraph-in-mission-critical-workloads": "Memgraph in mission-critical workloads",
"benchmarking-memgraph": "Benchmarking Memgraph",
}
2 changes: 1 addition & 1 deletion pages/memgraph-in-production/general-suggestions.mdx
@@ -44,7 +44,7 @@ based on your specific use case.
8. [Backup considerations](#backup-considerations) <br />
Learn about how to preserve your data in Memgraph to prevent any data loss.

9. [Importing mechanisms](#importing-mechanisms) <br />
9. [Importing mechanisms](#importing-mechanisms) <br />
Discover the best methods for importing your dataset into Memgraph, including Cypher queries, bulk loading, and integrations with other data sources.

10. [Enterprise features you might require](#enterprise-features-you-might-require) <br />
@@ -0,0 +1,236 @@
---
title: Memgraph in high-throughput workloads
description: Suggestions on how to bring your Memgraph to production in high-throughput workloads.
---

import { Callout } from 'nextra/components'
import { CommunityLinks } from '/components/social-card/CommunityLinks'

# Memgraph in high-throughput workloads

<Callout type="info">
👉 **Start here first**
Before diving into this guide, we recommend starting with the [**General suggestions**](/memgraph-in-production/general-suggestions)
page. It provides **foundational, use-case-agnostic advice** for deploying Memgraph in production.

This guide builds on that foundation, offering **additional recommendations tailored to specific workloads**.
In cases where guidance overlaps, consider the information here as **complementary or overriding**, depending
on the unique needs of your use case.
</Callout>

## Is this guide for you?

This guide is for you if you're working with **high-throughput graph workloads** where performance, consistency,
and scale are critical.
You’ll benefit from this content if:

- ⚡ You’re handling **more than a thousand writes per second**, and your graph data is constantly changing at high velocity.
- 🔍 You want your **read performance to remain consistent**, even as new data is continuously ingested.
- 🔁 You’re dealing with **high volumes of concurrent reads and writes**, and need a database that can handle both without performance degradation.
- 🌊 Your data is flowing in from **real-time streaming systems** like **Kafka**, and you need a database that can keep up.

If this sounds like your use case, this guide will walk you through how to configure and scale Memgraph for **reliable, high-throughput performance** in production.

## Why choose Memgraph for high-throughput use cases?

When your workload involves thousands of writes per second and concurrent access to ever-changing graph data,
Memgraph provides the performance and architecture needed to keep up—without compromise.

Here's why Memgraph is a great fit for high-throughput use cases:

- **In-memory storage engine**: Memgraph operates entirely in-memory, eliminating the need to write to disk on every transaction.
This allows it to **scale write throughput far beyond traditional disk-based databases**. Unlike systems that rely on LRU
or OS-level caching, where **cache invalidation can degrade read performance during heavy writes**, Memgraph offers
**predictable read latency** even under constant data changes.

While many graph databases **max out around 1,000 writes per second**, Memgraph can handle **up to 50x more**
(see image below), making it ideal for **high-velocity, write-intensive workloads**.

![](/pages/memgraph-in-production/benchmarking-memgraph/realistic-workload.png)

- **Non-blocking reads and writes with MVCC**: Built on multi-version concurrency control (MVCC),
Memgraph ensures that **writes don’t block reads** and **reads don’t block writes**, allowing each to scale independently.

- **Fine-grained locking**: Locking happens at the node and relationship level, enabling **highly concurrent writes**
and minimizing contention across threads.

- **Lock-free skiplist storage**: Memgraph uses **lock-free, concurrent skip list structures** for storing nodes,
relationships, and indices, leading to faster data access and minimal coordination overhead between threads.

- **Snapshot isolation by default**: Unlike many databases that rely on **read-committed** isolation
(which can return inconsistent data), Memgraph provides **snapshot isolation**, ensuring data accuracy and
consistency in real-time queries.

- **Inter-query parallelization**: Each read and write query is handled on its own CPU core, meaning Memgraph can
**scale vertically on a single machine** as you add cores.

- **Horizontal read scaling with high availability**: Memgraph supports [replication](/clustering/replication) and
[high availability](/clustering/high-availability), allowing you to distribute **read traffic across multiple replicas**.
These replicas can also power **secondary workloads** like GraphRAG, analytics, or ML pipelines, **without affecting
the performance of the main write-intensive instance**.

## What is covered?

The suggestions for high-throughput workloads **complement** several key sections in the
[general suggestions guide](/memgraph-in-production/general-suggestions). These sections offer important context and
additional best practices tailored for performance, stability, and scalability in high-throughput systems:

- [Choosing the right Memgraph flag set](#choosing-the-right-memgraph-flag-set) <br />
Memgraph offers specific flags to optimize streaming graph updates.

- [Choosing the right Memgraph storage mode](#choosing-the-right-memgraph-storage-mode) <br />
Guidance on selecting the optimal **storage mode** for high-throughput use cases, depending on whether your focus is
analytical speed or transactional safety.

- [Importing mechanisms](#importing-mechanisms) <br />
Learn how to avoid write-write conflicts when writing from multiple threads, and how to connect your streaming sources to Memgraph.

- [Enterprise features you might require](#enterprise-features-you-might-require) <br />
Understand which **enterprise features**, such as high availability and dynamic graph algorithms, will keep your real-time use case running smoothly.

- [Queries that best suit your workload](#queries-that-best-suit-your-workload) <br />
Learn how to optimize the update queries arriving at the database.

## Choosing the right Memgraph flag set

When streaming data from systems like Kafka, the incoming payload is often **standardized**,
meaning that even when a node or relationship is updated, **some property values might remain unchanged**.

By default, Memgraph sets the flag `--storage-delta-on-identical-property-update=true`, which **creates a delta record for every property**
set on a node or relationship during an update, even if the new value is identical to the existing one.
This introduces unnecessary write overhead when most values are unchanged.

To optimize for **higher throughput** in scenarios where most incoming updates do not change all property values,
it’s recommended to set:

```bash
--storage-delta-on-identical-property-update=false
```

With this setting, Memgraph will **only create delta records for properties that have actually changed**,
reducing internal write operations and improving overall system throughput—especially important in high-velocity
streaming use cases.
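
As a rough mental model of what this flag changes, the sketch below compares how many per-property write records a standardized payload would produce under each setting. It is a conceptual illustration in Python, not Memgraph's internal implementation, and the function name `properties_needing_delta` is hypothetical.

```python
def properties_needing_delta(existing, incoming, delta_on_identical):
    """Return the property names an update would record a delta for.

    Conceptual model only: `existing` and `incoming` are plain dicts that
    stand in for a node's stored properties and the streamed payload.
    """
    if delta_on_identical:
        # Flag set to true: every property in the payload is recorded.
        return sorted(incoming)
    # Flag set to false: only properties whose value actually differs.
    return sorted(
        key for key, value in incoming.items()
        if key not in existing or existing[key] != value
    )


stored = {"id": 42, "status": "active", "score": 0.7}
payload = {"id": 42, "status": "active", "score": 0.9}

print(properties_needing_delta(stored, payload, delta_on_identical=True))
print(properties_needing_delta(stored, payload, delta_on_identical=False))
```

In this example only `score` differs, so the second call reports a single property while the first reports all three.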

## Choosing the right Memgraph storage mode

High-throughput scenarios in Memgraph can run effectively on both `IN_MEMORY_TRANSACTIONAL` and `IN_MEMORY_ANALYTICAL`
storage modes, depending on your specific workload requirements.

If your workload meets the following conditions:

- You are **updating properties** on existing nodes and relationships,
- You are **appending** new nodes and relationships to the graph,
- You are **not performing deletes**,

then it may be worth considering switching to `IN_MEMORY_ANALYTICAL` mode.
This mode allows **writes to be multithreaded**, so write throughput can **scale with the available CPU cores**
by parallelizing ingestion across them.

However, keep in mind:

- If you require **replication**, **high availability**, or **ACID guarantees**, you must use `IN_MEMORY_TRANSACTIONAL` mode.
- `IN_MEMORY_ANALYTICAL` is optimized for **bulk ingestion** and **read-only analytics**, but it
**does not support transactional rollback**, as it doesn’t create delta objects during writes.
Additionally, **WALs (write-ahead logs) are not generated** in this mode, meaning recovery relies solely on **snapshot creation**.

Learn more about [storage modes](/fundamentals/storage-memory-usage#storage-modes) in our documentation.
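
The conditions above can be condensed into a small decision helper. This is only a sketch in Python: the function and its parameters are hypothetical, it treats any use of deletes as a reason to stay transactional (a conservative reading of the list above), and the returned string is the `STORAGE MODE` Cypher command you would then run against Memgraph.

```python
def storage_mode_query(needs_replication_or_ha, needs_acid, performs_deletes):
    """Pick a storage mode from the workload traits listed above."""
    must_be_transactional = needs_replication_or_ha or needs_acid
    if must_be_transactional or performs_deletes:
        mode = "IN_MEMORY_TRANSACTIONAL"
    else:
        # Append/update-only workload without replication or ACID needs.
        mode = "IN_MEMORY_ANALYTICAL"
    return f"STORAGE MODE {mode};"


print(storage_mode_query(False, False, False))
print(storage_mode_query(True, False, False))
```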

## Importing mechanisms

Memgraph natively supports **Apache Kafka** and **Apache Pulsar** consumers to directly ingest
[data streams](/data-streams). However, depending on the system architecture and flexibility
requirements, some users prefer building **custom Kafka consumers** that programmatically push data into Memgraph.

### Handling write-write conflicts
If you are ingesting data from **multiple topics** or multiple producers, it's important to
**handle potential write-write conflicts**—especially when running in `IN_MEMORY_TRANSACTIONAL` mode.
You can learn more about these scenarios in our
[conflicting transactions documentation](/help-center/errors/transactions#conflicting-transactions).

To safely manage these conflicts:

- Use a driver that supports **managed transactions** with automatic retries. For example, **Neo4j drivers**
offer an `execute_write()` method that **automatically retries transient errors** (errors that can be safely retried without client-side intervention).
- Wrap your **write operations inside retryable transactions** if there are **multiple concurrent writers**.
- If you **only have a single writer**, you generally don’t need to worry about transient errors or retries.
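
If your driver does not offer managed transactions, the retry behavior can be approximated on the client. The sketch below is a minimal illustration in Python using only the standard library. `TransientError` and `run_with_retries` are hypothetical names; in real code you would catch whatever retryable exception your driver raises for conflicting transactions.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a driver's retryable conflict error."""


def run_with_retries(work, max_attempts=5, base_delay=0.01):
    """Call `work()` until it succeeds or `max_attempts` is exhausted.

    Waits with exponential backoff plus jitter between attempts so that
    conflicting writers are less likely to collide again.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return work()
        except TransientError:
            if attempt == max_attempts:
                raise
            delay = base_delay * (2 ** (attempt - 1))
            time.sleep(delay + random.uniform(0, delay))


# Simulated write that conflicts twice before succeeding.
calls = {"count": 0}


def flaky_write():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("conflicting transaction")
    return "committed"


result = run_with_retries(flaky_write)
print(result, "after", calls["count"], "attempts")
```

The jitter is a design choice rather than a requirement: it spreads out retries from writers that failed at the same moment.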

### Idempotency concept
In addition, when working with streams, **unexpected errors may require replaying the stream**.
To ensure **data consistency even after multiple replays**, it's crucial to make your ingestion logic **idempotent**.
We strongly recommend using **`MERGE` clauses** instead of `CREATE` when inserting data into Memgraph.
`MERGE` ensures that nodes and relationships are **created only if they don't already exist**, preventing
duplicates and keeping your graph clean no matter how many times the same event is replayed.
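
To see why this matters, the sketch below models a graph as plain Python containers and replays the same event batch twice. It is a simplified illustration, not how Memgraph stores data: `apply_create` and `apply_merge` are hypothetical helpers that mimic the semantics of `CREATE` and of `MERGE` on a key, and the `id` property is an assumed unique identifier in the payload.

```python
def apply_create(nodes, event):
    """Mimic CREATE: always add a new node."""
    nodes.append(dict(event))


def apply_merge(nodes_by_id, event):
    """Mimic MERGE on `id` followed by SET n += event."""
    nodes_by_id.setdefault(event["id"], {}).update(event)


events = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]

created, merged = [], {}
for _ in range(2):  # process the stream, then replay it once
    for event in events:
        apply_create(created, event)
        apply_merge(merged, event)

print(len(created))  # 4: the replay duplicated every node
print(len(merged))   # 2: the replay left the graph unchanged
```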

## Enterprise features you might require

For production-grade high-throughput deployments, you may need advanced capabilities to ensure
**availability**, **data management**, and **tenant isolation**. Memgraph offers several enterprise features
designed to support these needs:

- **Replication, high availability, and automatic failover**
If you require your system to be **available at all times**, Memgraph supports
[clustering and high availability](/clustering/high-availability), allowing you to minimize downtime and recover
automatically from failures.

- **Node and relationship TTL (time-to-live)**
In high-ingestion environments, you may need to automatically **clean up stale data** after a certain retention period.
Memgraph supports [time-to-live (TTL)](/querying/time-to-live) mechanisms for both **nodes** and **relationships**,
ensuring your graph remains manageable and efficient over time.

- **Multi-tenancy**
Some workloads require a **separate graph database per customer** to ensure strict data isolation and security.
Memgraph supports [multi-tenancy](/database-management/multi-tenancy), enabling you to manage multiple independent
graphs within a single Memgraph instance.

## Queries that best suit your workload

When ingesting data from streams, it's best to keep your Cypher queries **simple, idempotent, and efficient**. A typical ingestion query should look like:

```cypher
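// Assumes each event carries a unique `id`; matching on it makes every event update its own node.
MERGE (n:Label {id: $row.id}) SET n += $row;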
```

This approach ensures **idempotency** (safe reprocessing of the same events) and **minimizes query execution time** by keeping the transaction lightweight.
Adding **complex business logic** or **customization** to ingestion queries will **increase query latency**, so **profile your queries** with Memgraph’s [`PROFILE` clause](/querying/clauses/profile) to understand and optimize their performance.

---

### Dynamic labels and edge types

Memgraph also supports:

- [Dynamic node label creation](/querying/clauses/create#14-creating-node-labels-dynamically)
- [Dynamic relationship type creation](/querying/clauses/create#23-creating-relationship-types-dynamically)

However, **dynamic creation is only supported with `CREATE` operations**, and **matching or merging dynamically created
labels and types is not supported**.

If your payload contains **dynamic labels or edge types** and you still need **idempotency**, you have two options:

- **Programmatically construct your Cypher query strings** based on the payload to ensure correct label/type usage before sending the query to Memgraph.
- **Optionally use the [`merge`](/advanced-algorithms/available-algorithms/merge) procedure from MAGE.**

<Callout type="warning">
Note: While MAGE procedures are **written in C++ and highly optimized**, they still introduce **slightly more overhead**
compared to **pure Cypher**, as they are executed as external modules. We recommend favoring pure Cypher when
possible for the **highest performance**.
</Callout>
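
For the first option, a minimal sketch of building the query string on the client could look like the following. It is written in Python with hypothetical names (`build_merge_query`, the `id` key property, the `$props` parameter). Since the label is interpolated into the query text rather than passed as a parameter, it is validated against a strict pattern first to avoid injecting arbitrary Cypher.

```python
import re

# Conservative pattern for labels and property names taken from a payload.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_merge_query(label, key="id"):
    """Return an idempotent MERGE query for a label taken from the payload."""
    for name in (label, key):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    # Property values still travel as the $props query parameter.
    return f"MERGE (n:{label} {{{key}: $props.{key}}}) SET n += $props;"


payload = {"label": "Sensor", "props": {"id": 7, "reading": 21.5}}
query = build_merge_query(payload["label"])
print(query)
```

The resulting string would be sent together with `payload["props"]` bound to `$props`, so only the label and key name are ever part of the query text.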

### Using `convert.str2object` for parsing nested properties

When working with streamed data, sometimes your incoming payload contains **serialized JSON strings** that need to be
transformed into property maps inside your graph.
Memgraph provides the [`convert.str2object` function](/querying/functions#conversion-functions) to easily handle this scenario.

Example usage:

```cypher
WITH convert.str2object('{"name":"Alice", "age":30}') AS props
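// Match on a key from the parsed map (here `name`, assumed unique) so each payload maps to its own node.
MERGE (n:Person {name: props.name})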
SET n += props;
```

This function **parses a JSON-formatted string into a Cypher map**, making it very useful for flexible ingestion pipelines
where the payload structure might vary slightly or be semi-structured.

<CommunityLinks/>