docs/how/restore-indices.md
Lines changed: 32 additions & 0 deletions
@@ -11,6 +11,38 @@ By default, restoring the indices from the local database will not remove any ex
the search and graph indices that no longer exist in the local database, potentially leading to inconsistencies
between the search and graph indices and the local database.
## Configuration
Upgrade jobs take their job-specific configuration as command-line arguments to the job itself rather than as environment variables. Select the RestoreIndices job with the upgrade ID parameter `-u RestoreIndices`, then pass each additional parameter with `-a`, for example `-a batchSize=1000`.
The following configurations are available:
### Time-Based Filtering
* `lePitEpochMs`: Restore records created at or before this timestamp (in epoch milliseconds)
* `gePitEpochMs`: Restore records created at or after this timestamp (in epoch milliseconds)
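
Both timestamps are epoch values in milliseconds, which are easy to get wrong by a factor of 1000 or by a time-zone offset. The following is a minimal sketch of computing a window and formatting it as `-a` arguments. The helper names `to_epoch_ms` and `time_window_args` are hypothetical and not part of DataHub, and the dates are example values only.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(dt: datetime) -> int:
    """Convert a timezone-aware datetime to integer epoch milliseconds."""
    if dt.tzinfo is None:
        # Naive datetimes would be ambiguous about the time zone.
        raise ValueError("use a timezone-aware datetime")
    # Integer division of timedeltas avoids float rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)


def time_window_args(start: datetime, end: datetime) -> list:
    """Format a [start, end] window as -a arguments (hypothetical helper)."""
    ge, le = to_epoch_ms(start), to_epoch_ms(end)
    if ge > le:
        raise ValueError("start must not be after end")
    return ["-a", f"gePitEpochMs={ge}", "-a", f"lePitEpochMs={le}"]


# Example window: all of January 2024 (UTC).
window = time_window_args(
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 2, 1, tzinfo=timezone.utc),
)
print(" ".join(window))
# -a gePitEpochMs=1704067200000 -a lePitEpochMs=1706745600000
```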
### Pagination and Performance Options
* `urnBasedPagination`: Enable key-based pagination instead of offset-based pagination. Recommended for large datasets as it's typically more efficient.
* `startingOffset`: When using the default offset-based pagination, start from this offset
* `lastUrn`: Resume from a specific URN when using URN-based pagination
* `lastAspect`: Used with `lastUrn` to resume from a specific aspect, which prevents reprocessing
* `numThreads`: Number of concurrent threads used for restoration; applies only to the default offset-based pagination
* `batchSize`: Number of rows in each batch as the job pages through rows
* `batchDelayMs`: Delay, in milliseconds, between batches, to avoid overloading backend systems
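
To illustrate the difference between the two pagination modes, here is a small in-memory sketch. It is not DataHub's implementation: the rows, the simplified URNs, and the function names are made up for the example, and a real job pages through the SQL database rather than a Python list. The point is only that offset-based paging skips a count of rows, while key-based paging continues after the last `(urn, aspect)` pair seen, which is what `lastUrn` and `lastAspect` express.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, str]  # (urn, aspect), illustrative only

# Made-up rows, kept in (urn, aspect) order as a keyed scan would return them.
ROWS: List[Row] = sorted([
    ("urn:li:dataset:a", "ownership"),
    ("urn:li:dataset:a", "status"),
    ("urn:li:dataset:b", "ownership"),
    ("urn:li:dataset:b", "status"),
    ("urn:li:dataset:c", "status"),
])


def offset_page(rows: List[Row], offset: int, batch_size: int) -> List[Row]:
    """Offset-based paging: skip `offset` rows, then take one batch."""
    return rows[offset:offset + batch_size]


def keyset_page(rows: List[Row], last: Optional[Row], batch_size: int) -> List[Row]:
    """Key-based paging: take the batch strictly after the last key seen."""
    remaining = rows if last is None else [r for r in rows if r > last]
    return remaining[:batch_size]


def restore_all(rows: List[Row], batch_size: int, last: Optional[Row] = None) -> List[Row]:
    """Page through all rows by key, optionally resuming after `last`."""
    processed: List[Row] = []
    while True:
        page = keyset_page(rows, last, batch_size)
        if not page:
            break
        processed.extend(page)
        last = page[-1]  # the next page starts after this key
    return processed


# Resuming after (lastUrn, lastAspect) skips everything already processed.
resumed = restore_all(ROWS, batch_size=2, last=("urn:li:dataset:b", "ownership"))
print(resumed)
```

In a database, the key-based form can typically seek directly to the resume point through an index, whereas a large offset forces the skipped rows to be scanned first; the list comprehension here only mimics the semantics, not the performance.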
### Content Filtering
* `aspectNames`: Comma-separated list of aspects to restore (e.g., "ownership,status")
* `urnLike`: SQL LIKE pattern to filter URNs (e.g., "urn:li:dataset%")
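
Before running a filtered restore, it can help to preview which rows a filter would select. The sketch below approximates SQL LIKE matching in Python (`%` matches any run of characters, `_` matches exactly one) and splits a comma-separated aspect list. It is an approximation under stated assumptions: real LIKE behavior such as case sensitivity and escape characters depends on the database, and the function names and sample rows are hypothetical.

```python
import re
from typing import List, Tuple


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a basic SQL LIKE pattern into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any run of characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def select_rows(rows: List[Tuple[str, str]], aspect_names: str, urn_like: str):
    """Keep (urn, aspect) rows that match both filters."""
    aspects = {name.strip() for name in aspect_names.split(",") if name.strip()}
    urn_re = like_to_regex(urn_like)
    return [
        (urn, aspect)
        for urn, aspect in rows
        if aspect in aspects and urn_re.fullmatch(urn)
    ]


# Made-up sample rows for illustration.
sample = [
    ("urn:li:dataset:(urn:li:dataPlatform:hive,SampleTable,PROD)", "ownership"),
    ("urn:li:dataset:(urn:li:dataPlatform:hive,SampleTable,PROD)", "schemaMetadata"),
    ("urn:li:corpuser:jdoe", "status"),
]

selected = select_rows(sample, "ownership,status", "urn:li:dataset%")
print(selected)
```

With these sample rows, only the dataset's `ownership` row is selected: the `schemaMetadata` row fails the aspect filter and the `corpuser` row fails the URN pattern.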
### Nuclear Option
* `clean`: Wipes the current indices by deleting all of their documents, which guarantees a state consistent with SQL. This is generally not recommended unless there is significant data corruption on the instance.
### Helm
For Kubernetes deployments, these parameters are available in the Helm charts under the `datahubUpgrade.restoreIndices.args` path, which sets them up as args to the pod command.
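
However the parameters are supplied, they reach the job as a flat list of `-u` and `-a` command-line args. The following sketch shows that shape by assembling the list from a mapping. The helper `restore_indices_args` is hypothetical, the parameter values are examples only, rendering booleans as lower-case `true`/`false` is an assumption, and the code does not reproduce the Helm chart's values schema.

```python
from typing import List, Mapping


def restore_indices_args(params: Mapping[str, object]) -> List[str]:
    """Build the arg list for the RestoreIndices job (hypothetical helper)."""
    args = ["-u", "RestoreIndices"]  # upgrade ID selects the job
    for name, value in params.items():
        if isinstance(value, bool):
            # Assumption: booleans are passed as lower-case true/false.
            value = str(value).lower()
        args += ["-a", f"{name}={value}"]  # one -a per parameter
    return args


args = restore_indices_args({
    "batchSize": 1000,
    "batchDelayMs": 100,
    "urnBasedPagination": True,
})
print(" ".join(args))
# -u RestoreIndices -a batchSize=1000 -a batchDelayMs=100 -a urnBasedPagination=true
```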
## Quickstart
If you're using the quickstart images, you can use the `datahub` cli to restore the indices.