This repository was archived by the owner on Oct 24, 2024. It is now read-only.

Option to reset tries in ping() #40

Open
wants to merge 49 commits into master

Conversation

alecgibson

The `ping()` method can be useful for setting up recurring jobs if we
deliberately avoid acking the job. For example:

 1. Submit job
 2. Pull job from queue
 3. Process job
 4. `ping()`
 5. Go to Step 2

A real-world example of this might be notifying for recurring
appointments, or setting up long-running, cross-process, periodic jobs.

The main advantage this has over using `ack()` and `add()` is that it
effectively requeues a job in a single, atomic commit. If we tried the
above with `ack()` and `add()`:

 1. Submit job
 2. Pull job from queue
 3. Process job
 4. `ack()`
 5. `add()`
 6. Go to Step 2

In this version, the process could crash or quit between Steps 4 & 5,
and our recurring job would be lost.

We could also try inverting Steps 4 & 5, but then we get the opposite
issue: if the process crashes or quits, then we might accidentally
duplicate our recurring job. It also prevents us from setting up any
unique indexes on our `payload`.

Using `ping()` perfectly solves this problem: there's only ever one
version of the job, and it's never dropped (because it's never acked).
If the process crashes before we `ping()`, we'll retry it, as with any
other normal job.

The one issue with this approach is that `tries` will steadily increase,
and - if you have `maxRetries` set up - the job will eventually be moved
to the dead queue, which isn't what we want.

This change adds an option to the `ping()` method: `resetTries`, which
will reset `tries` to zero, so that the job is treated like a "new" job
when it's pinged, and is only moved to the dead queue if it's genuinely
retried.
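For illustration, a minimal sketch of what such a recurring-job loop might look like with the promisified API from the later commits (the queue name, polling interval and exact `ping()` signature here are assumptions, not this library's documented usage):

```js
const mongoDbQueue = require('mongodb-queue')

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

async function runRecurringJob (db) {
  const queue = mongoDbQueue(db, 'recurring-jobs')
  while (true) {
    const job = await queue.get()                    // 2. Pull job from queue
    if (!job) {
      await sleep(1000)                              // nothing visible yet; poll again
      continue
    }
    // 3. Process job (application-specific work goes here)
    await queue.ping(job.ack, { resetTries: true })  // 4. Requeue in one atomic update; never ack()
  }                                                  // 5. Go to Step 2
}
```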

`package-lock.json` shouldn't be committed for libraries, since it's
actively ignored by consuming apps.
Option to reset tries in `ping()`
The `mongodb` Node.js driver deprecated use of `returnOriginal` in
favour of `returnDocument` in [v3.6][1].

This non-breaking change allows consumers to opt in to using the newer
`returnDocument` by setting an option on construction:

```js
var queue = mongoDbQueue(db, 'queue', { returnDocument : true })
```

[1]: mongodb/node-mongodb-native#2808
👽 Add option to use `returnDocument`
This change bumps the `mongodb` dependency to v4, and:

 - removes the `returnDocument` option: the deprecated `returnOriginal` is
   no longer supported in v4, so the opt-in is no longer needed
 - updates `add()` to check `insertedIds`, since `ops` was removed in v4
💥 Require `mongodb` v4
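As a rough illustration of the `insertedIds` change mentioned above (assuming `collection` is a connected driver `Collection` and `docs` an array of documents; this is not the library's actual `add()` code):

```js
// mongodb v4: insertMany() no longer returns `ops`; the generated _ids are on
// `insertedIds`, keyed by the array index of each inserted document.
async function addMany (collection, docs) {
  const result = await collection.insertMany(docs)
  return Object.values(result.insertedIds).map((id) => id.toHexString())
}
```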
@nataliataylor

Why not just have a new method `resetTries()` with an option `{delayed: x}`?

dawidreedsy and others added 16 commits November 24, 2022 10:58
[`mongodb@5`][1] drops support for callbacks, which breaks this library,
since it is written entirely with callbacks.

This is a **BREAKING** change which drops callback support from this
library as well, and fully embraces promises through `async`/`await`
syntax in both the library code and test code.

This allows us to support both `mongodb@4` and `mongodb@5`.

[1]: https://github.com/mongodb/node-mongodb-native/releases/tag/v5.0.0
💥 Promisify and add `mongodb@5` support
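Roughly, the shift looks like this (method names follow the upstream README; exact signatures in this fork may differ):

```js
// Before (callback style):
// queue.add('Hello, World!', function (err, id) { /* ... */ })

// After (promises / async-await):
async function example (queue) {
  const id = await queue.add('Hello, World!')
  const job = await queue.get()
  await queue.ack(job.ack)
  return id
}
```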
This is a breaking change that moves this library to TypeScript. It's
breaking because the import style has slightly changed due to the
transpiler. Other than that, everything else is the same.

We also:

 - drop our `-reedsy` version suffix, since we've now adapted this fork
   so heavily that it's unlikely we'll merge from upstream again
 - add an `.npmignore` file to avoid publishing tests and TypeScript
   files
👷‍♀️ Bump GitHub Actions versions
`npm prepublish` was deprecated. This change updates the release to use
`npm prepare` instead.
Run the build against different versions of the `mongodb` driver
This change adds support for [`mongodb@6`][1]. The main change that
affects this library is that `findOneAndUpdate()` now returns the
document itself by default, and metadata must be explicitly requested
with `includeResultMetadata: true`.

[1]: https://github.com/mongodb/node-mongodb-native/releases/tag/v6.0.0
⬆️ Add support for `mongodb@6`
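For context, a small illustration of the driver behaviour described above (assuming `collection`, `filter` and `update` are already defined; this is not this library's code):

```js
async function example (collection, filter, update) {
  // mongodb@6: the updated document is returned directly by default...
  const doc = await collection.findOneAndUpdate(filter, update, { returnDocument: 'after' })

  // ...and the old { value, ok, ... } wrapper must now be requested explicitly:
  const result = await collection.findOneAndUpdate(filter, update, {
    returnDocument: 'after',
    includeResultMetadata: true,
  })
  return { doc, sameDoc: result.value }
}
```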
MongoDB 4.4 will be [end-of-lifed in February][1]. This change runs the
build against MongoDB 5.0 in readiness for upgrading our Staging and
Production replica sets.

[1]: https://www.mongodb.com/support-policy/lifecycles
At the moment, when we pass a payload like this:
```ts
{someProp: undefined}
```

it is stored in MongoDB, but the driver automatically converts the `undefined`
value to `null`. This is error prone: we might enqueue a job with
`callbackUrl: undefined`, which the queue saves to the database as
`callbackUrl: null`; when the worker then pulls the message and validates the
payload against a type like `t.partial({callbackUrl: t.string})`, the
validation fails.
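A minimal illustration of the coercion (assuming `collection` is a driver collection; `ignoreUndefined` is the standard driver option for opting out of this behaviour):

```js
// By default the Node.js driver serializes `undefined` values as `null`,
// unless `ignoreUndefined: true` is set on the client, db, collection or
// individual operation.
async function demo (collection) {
  await collection.insertOne({ callbackUrl: undefined })
  const doc = await collection.findOne({})
  console.log(doc.callbackUrl) // null, so a "string or absent" validator fails
}
```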
dawidreedsy and others added 20 commits January 2, 2024 12:28
At the moment, we have two GitHub Actions workflows:

  - `test.yml`: runs build and test, then tags when bumping the version
    in `main`
  - `publish.yml`: releases the package when a new tag is published

The issue with this setup is that the built-in `GITHUB_TOKEN`
[will not trigger another workflow][1], so we had to add a separate
PAT with write permissions to our repos, which was a bit of a
security concern.

In order to avoid the need for this extra token, with its associated
risks and administrative overheads (like rotating), this change
combines our workflows into a single workflow.

We rename `tag.sh` to `release.sh`, which is now also in charge of
publishing (since it knows when we've pushed a new tag).

[1]: https://docs.github.com/en/actions/using-workflows/triggering-a-workflow#triggering-a-workflow-from-a-workflow
Default exports have [bad interoperability][1] between CommonJS and
ES Module imports, so the safest thing to do is just remove them.

This is a **BREAKING** change which removes our export default
statements, and adds a linter rule to prevent us from adding them.

[1]: evanw/esbuild#1719 (comment)
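A small sketch of the interop problem (the package and export names here are hypothetical, purely to illustrate the issue):

```js
// A TypeScript `export default` compiles to `exports.default = ...` in
// CommonJS, so a plain require() hands back a wrapper object:
const lib = require('some-package')   // { default: [Function] }
const createQueue = lib.default       // awkward `.default` access

// With named exports, CommonJS and ES Module consumers see the same shape:
// const { createQueue } = require('some-package')
// import { createQueue } from 'some-package'
```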
At the moment, when calling `get()`, we:

 - query on `deleted` and `visible`
 - sort on `_id`

We have indexes for both of these things, but separately, which results
in an inefficient query: MongoDB has to check both indexes, and this can
end in a complete index scan, which isn't particularly great.

We sort by `_id` presumably to get the oldest job (since `_id` is an
`ObjectId` whose sort order correlates to creation time).

However, we probably actually want the job that was visible first.

This change updates the sort to use `visible`, which also means that the
query and sort can use the same index.
⚡️ Sort `get()` by `visible`
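A rough sketch of the shape of the `get()` query after this change (`now`, `newAck` and `nextVisibleAt` are placeholders; this is not the library's exact code):

```js
// Filter and sort both use `visible`, so a single compound index such as
// { deleted: 1, visible: 1 } can satisfy the whole operation.
async function getNextJob (collection, now, newAck, nextVisibleAt) {
  return collection.findOneAndUpdate(
    { deleted: { $exists: false }, visible: { $lte: now } },
    { $inc: { tries: 1 }, $set: { ack: newAck, visible: nextVisibleAt } },
    { sort: { visible: 1 }, returnDocument: 'after' }
  )
}
```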
At the moment, the `ack` is a completely random hex string.

This change updates it to use an `ObjectId`, which:

 - encodes information about the time the job was pulled (for the first
   time)
 - ...and can therefore be used to sort jobs in the order they were
   originally pulled from the queue
 - allows us to simplify the dependencies and code a little
🗃 Change `ack` to (string) `ObjectId`
This is a non-breaking change that forms part of the migration path to
improving our performance in #15

When acking a job, we now unset the `visible` property, which makes
`visible` and `deleted` mutually exclusive properties, so that the
presence of one implies the absence of the other.

We can do this, because the only time we query on `visible` is when we
*also* query for `deleted: {$exists: false}`. Similarly, the only times
we query for `deleted: {$exists: true}` are times when we don't query
`visible`.
🗃 Remove `visible` when acking jobs
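Schematically, the ack update described above now looks something like this (a sketch with placeholder arguments; not the actual implementation):

```js
// Mark the job deleted and remove `visible` in the same atomic update, so the
// two fields are mutually exclusive from then on.
async function ackJob (collection, ack, now) {
  return collection.findOneAndUpdate(
    { ack: ack, visible: { $gt: now }, deleted: { $exists: false } },
    { $set: { deleted: now }, $unset: { visible: 1 } },
    { returnDocument: 'after' }
  )
}
```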
This is a **BREAKING** change that will change the `deleted` field from
a `string` to a `Date`, which will let us set up a TTL index on it.

Although this change is technically breaking, it should be deployable
with no impact on Production, since we only ever query `deleted` using
the `$exists` operator.

In order to leverage the TTL index, we'll need to:

 1. Deploy this change
 2. Migrate all existing `deleted` fields to `Date`:
    ```js
    db.collection.updateMany(
      {deleted: {$type: "string"}},
      [{$set: {deleted: {$toDate: "$deleted"}}}]
    )
    ```
 3. Change the existing `deleted` index to TTL:
    ```js
    db.runCommand({
      collMod: '...',
      index: {
        name: 'deleted_1',
        expireAfterSeconds: 2592000, // ~ 1 month
      }
    })
    ```
💥 Change `deleted` from `string` to `Date`
This is a **BREAKING CHANGE** that aims to improve database performance
by changing the queries and indexes used to perform operations. There
will be no changes visible to consumers of the JavaScript API.

This breaking change will:

 - require consumers to run a migration query before upgrading
 - add new indexes
 - allow dropping of an old index

More details on migration are at the bottom of this commit message.

Motivation
----------
This library seems to have been written with the assumption that its
collections are small. However, we have hundreds of thousands of jobs in
various queues on Production, which causes some slow queries because of
the design choices made in the schema of this library.

In particular, we aim to address two issues:

 1. `{deleted: {$exists: boolean}}` calls are inefficient, but used by
    practically every query in this library
 2. counting in-flight jobs has awful performance when there are many
    available jobs

The net result of the changes described below is that no further
filtering is needed beyond the index on any of the issued queries.

`$exists`
---------
MongoDB's `$exists` operator has [tricky index performance][1]. In the
best cases, `$exists: true` can use the index, but only if it's sparse,
and `$exists: false` can **never** just use the index: it always needs
to fetch documents.

In order to avoid the constant use of `$exists` in this library, we rely
on a logical paradigm shift: we `$unset` `visible` when acking the job,
so that `visible` and `deleted` are mutually exclusive fields. Therefore
we can:

 - add a sparse index for both of these fields
 - query on the field we care about, and **know** that it implies the
   absence of the other field, allowing the removal of `$exists`
   assertions

It should be noted that in local testing, I observed
[strange behaviour][2] when trying to use this partial index: we have to
use `ack: {$gt: ''}` instead of `ack: {$exists: true}` to get MongoDB to
leverage this index for some reason.

`inFlight()`
------------
The existing `inFlight()` query has particularly bad performance in
cases where there are many (hundreds of thousands) of jobs available to
pick up.

This is because the existing query uses the `deleted_1_visible_1` index,
but even after filtering by `deleted` and `visible` with the index, the
database will need to fetch every single job that could be picked up,
and check for `ack`, which is very slow.

We improve the performance here by:

 - the removal of the `$exists` query (see above)
 - the addition of a partial index that only contains unacked jobs that
   have been retrieved at some point by `get()`. We can then filter
   these by the current time to find in-flight jobs
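The kind of index definitions this implies might look roughly like the following (names and exact definitions are assumptions, not the library's actual indexes):

```js
// Sparse indexes: only documents that actually have the field are indexed, so
// a query on `visible` alone implies `deleted` is absent, and vice versa.
async function ensureIndexes (collection) {
  await collection.createIndex({ visible: 1 }, { sparse: true })
  await collection.createIndex({ deleted: 1 }, { sparse: true })

  // Partial index for in-flight jobs: only jobs that have an `ack` (i.e. have
  // been picked up at some point) are indexed; queried with `ack: {$gt: ''}`.
  await collection.createIndex(
    { ack: 1, visible: 1 },
    { partialFilterExpression: { ack: { $exists: true } } }
  )
}
```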

Migration path
--------------
This performance improvement is built upon a shift in the assumptions
made about underlying job structure: namely, that `deleted` and
`visible` are now mutually exclusive properties (which was not true
before).

 1. Bump patch version to [`7.1.1`][3]: this will start removing the
    `visible` property from acked jobs in a non-breaking way
 2. Deploy the patch to Production
 3. Update any existing documents to match this new schema:

    ```js
    db.collection.updateMany(
      {deleted: {$exists: true}},
      {$unset: {visible: 1}},
    )
    ```

 4. Bump major version to `8.0.0` and deploy
 5. Drop old index `deleted_1_visible_1`, which is no longer used

[1]: https://www.mongodb.com/docs/manual/reference/operator/query/exists/#use-a-sparse-index-to-improve--exists-performance
[2]: https://www.mongodb.com/community/forums/t/partial-index-is-not-used-during-search/290507/2
[3]: #16
Now that `deleted` is a `Date`, we can add a TTL index to it. This
change adds an optional `expireAfterSeconds` option, which is passed
through to the `deleted` index options if set to a `number`.
✨ Allow optional TTL on `deleted` field
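Presumably opted into at construction, along the lines of the earlier examples (the exact option placement is an assumption):

```js
// Acked (deleted) jobs are removed automatically roughly 30 days later.
var queue = mongoDbQueue(db, 'queue', { expireAfterSeconds: 30 * 24 * 60 * 60 })
```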
At the moment, when calling:

```js
queue.ping({resetTries: true})
```

The tries are reset as if the job hasn't been picked up, but the `ack`
is left as-is.

This isn't necessarily problematic in the operation of the queue, but it
does mean that the pinged job will still show up as [in-flight][1].

This change adds an optional `resetAck` flag, which will also unset the
`ack`, and means that the job can be marked as not in-flight, as if it
has never been picked up.

[1]: https://github.com/reedsy/mongodb-queue/blob/6133fc9367f4fce719e36d8866841d531e956b6b/mongodb-queue.ts#L262
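So a fully "reset" ping would presumably look something like this (option names from the commit messages; the exact signature is an assumption):

```js
// Reset both the retry counter and the ack, so the pinged job looks as if it
// has never been picked up and no longer counts as in-flight.
await queue.ping(job.ack, { resetTries: true, resetAck: true })
```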