This repository was archived by the owner on Oct 24, 2024. It is now read-only.

Option to reset tries in ping() #40

Open
wants to merge 49 commits into master

Conversation

alecgibson

The `ping()` method can be useful for setting up recurring jobs if we
deliberately avoid acking the job. For example:

 1. Submit job
 2. Pull job from queue
 3. Process job
 4. `ping()`
 5. Go to Step 2

A real-world example of this might be notifying for recurring
appointments, or setting up long-running, cross-process, periodic jobs.

The main advantage this has over using `ack()` and `add()` is that it
effectively requeues a job in a single, atomic commit. If we tried the
above with `ack()` and `add()`:

 1. Submit job
 2. Pull job from queue
 3. Process job
 4. `ack()`
 5. `add()`
 6. Go to Step 2

In this version, the process could crash or quit between Steps 4 & 5,
and our recurring job would be lost.

We could also try inverting Steps 4 & 5, but then we get the opposite
issue: if the process crashes or quits, then we might accidentally
duplicate our recurring job. It also prevents us from setting up any
unique indexes on our `payload`.

Using `ping()` perfectly solves this problem: there's only ever one
version of the job, and it's never dropped (because it's never acked).
If the process crashes before we `ping()`, we'll retry it, as with any
other normal job.

The one issue with this approach is that `tries` will steadily increase,
and - if you have `maxRetries` set up - the job will eventually be moved
to the dead queue, which isn't what we want.

This change adds an option to the `ping()` method: `resetTries`, which
will reset `tries` to zero, so that the job is treated like a "new" job
when it's pinged, and is only moved to the dead queue if it's genuinely
retried.
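For illustration, a minimal sketch of what such a recurring-job loop might look like with the promisified API from the later commits (the queue name, polling interval and exact `ping()` signature here are assumptions, not this library's documented usage):

```js
const mongoDbQueue = require('mongodb-queue')

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

async function runRecurringJob (db) {
  const queue = mongoDbQueue(db, 'recurring-jobs')
  while (true) {
    const job = await queue.get()                    // 2. Pull job from queue
    if (!job) {
      await sleep(1000)                              // nothing visible yet; poll again
      continue
    }
    // 3. Process job (application-specific work goes here)
    await queue.ping(job.ack, { resetTries: true })  // 4. Requeue in one atomic update; never ack()
  }                                                  // 5. Go to Step 2
}
```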

`package-lock.json` shouldn't be committed for libraries, since it's
actively ignored by consuming apps.
Option to reset tries in `ping()`
The `mongodb` Node.js driver deprecated use of `returnOriginal` in
favour of `returnDocument` in [v3.6][1].

This non-breaking change allows consumers to opt in to using the newer
`returnDocument` by setting an option on construction:

```js
var queue = mongoDbQueue(db, 'queue', { returnDocument : true })
```

[1]: mongodb/node-mongodb-native#2808
👽 Add option to use `returnDocument`
This change bumps the `mongodb` dependency to v4, and:

 - removes the `returnDocument` option: the deprecated `returnOriginal` is
   no longer supported in v4, so the opt-in is no longer needed
 - updates `add()` to check `insertedIds`, since `ops` was removed in v4
💥 Require `mongodb` v4
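As a rough illustration of the `insertedIds` change mentioned above (assuming `collection` is a connected driver `Collection` and `docs` an array of documents; this is not the library's actual `add()` code):

```js
// mongodb v4: insertMany() no longer returns `ops`; the generated _ids are on
// `insertedIds`, keyed by the array index of each inserted document.
async function addMany (collection, docs) {
  const result = await collection.insertMany(docs)
  return Object.values(result.insertedIds).map((id) => id.toHexString())
}
```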
@nataliataylor

Why not just have a new method `resetTries()` with an option `{delayed: x}`?

dawidreedsy and others added 16 commits November 24, 2022 10:58
[`mongodb@5`][1] drops support for callbacks, which breaks this library,
since it is written entirely with callbacks.

This is a **BREAKING** change which drops callback support from this
library as well, and fully embraces promises through `async`/`await`
syntax in both the library code and test code.

This allows us to support both `mongodb@4` and `mongodb@5`.

[1]: https://github.com/mongodb/node-mongodb-native/releases/tag/v5.0.0
💥 Promisify and add `mongodb@5` support
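Roughly, the shift looks like this (method names follow the upstream README; exact signatures in this fork may differ):

```js
// Before (callback style):
// queue.add('Hello, World!', function (err, id) { /* ... */ })

// After (promises / async-await):
async function example (queue) {
  const id = await queue.add('Hello, World!')
  const job = await queue.get()
  await queue.ack(job.ack)
  return id
}
```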
This is a breaking change that moves this library to TypeScript. It's
breaking because the import style has slightly changed due to the
transpiler. Other than that, everything else is the same.

We also:

 - drop our `-reedsy` version suffix, since we've now adapted this fork
   so heavily that it's unlikely we'll merge from upstream again
 - add an `.npmignore` file to avoid publishing tests and TypeScript
   files
👷‍♀️ Bump GitHub Actions versions
`npm prepublish` was deprecated. This change updates the release to use
`npm prepare` instead.
Run the build against different versions of the `mongodb` driver
This change adds support for [`mongodb@6`][1]. The main change that
affects this library is that `findOneAndUpdate()` now returns the
document itself by default, and metadata must be explicitly requested
with `includeResultMetadata: true`.

[1]: https://github.com/mongodb/node-mongodb-native/releases/tag/v6.0.0
⬆️ Add support for `mongodb@6`
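For context, a small illustration of the driver behaviour described above (assuming `collection`, `filter` and `update` are already defined; this is not this library's code):

```js
async function example (collection, filter, update) {
  // mongodb@6: the updated document is returned directly by default...
  const doc = await collection.findOneAndUpdate(filter, update, { returnDocument: 'after' })

  // ...and the old { value, ok, ... } wrapper must now be requested explicitly:
  const result = await collection.findOneAndUpdate(filter, update, {
    returnDocument: 'after',
    includeResultMetadata: true,
  })
  return { doc, sameDoc: result.value }
}
```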
MongoDB 4.4 will be [end-of-lifed in February][1]. This change runs the
build against MongoDB 5.0 in readiness for upgrading our Staging and
Production replica sets.

[1]: https://www.mongodb.com/support-policy/lifecycles
At the moment, when we pass a payload like this:
```ts
{someProp: undefined}
```

it is stored in MongoDB, but the driver automatically converts the `undefined`
value to `null`. This is error prone: we might enqueue a job with
`callbackUrl: undefined`, which the queue saves to the database as
`callbackUrl: null`; when the worker then pulls the message and validates the
payload against a type like `t.partial({callbackUrl: t.string})`, the
validation fails.
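A minimal illustration of the coercion (assuming `collection` is a driver collection; `ignoreUndefined` is the standard driver option for opting out of this behaviour):

```js
// By default the Node.js driver serializes `undefined` values as `null`,
// unless `ignoreUndefined: true` is set on the client, db, collection or
// individual operation.
async function demo (collection) {
  await collection.insertOne({ callbackUrl: undefined })
  const doc = await collection.findOne({})
  console.log(doc.callbackUrl) // null, so a "string or absent" validator fails
}
```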
dawidreedsy and others added 20 commits January 2, 2024 12:28
At the moment, we have two GitHub Actions workflows:

  - `test.yml`: runs build and test, then tags when bumping the version
    in `main`
  - `publish.yml`: releases the package when a new tag is published

The issue with this setup is that the built-in `GITHUB_TOKEN`
[will not trigger another workflow][1], so we had to add a separate
PAT with write permissions to our repos, which was a bit of a
security concern.

In order to avoid the need for this extra token, with its associated
risks and administrative overheads (like rotating), this change
combines our workflows into a single workflow.

We rename `tag.sh` to `release.sh`, which is now also in charge of
publishing (since it knows when we've pushed a new tag).

[1]: https://docs.github.com/en/actions/using-workflows/triggering-a-workflow#triggering-a-workflow-from-a-workflow
Default exports have [bad interoperability][1] between CommonJS and
ES Module imports, so the safest thing to do is just remove them.

This is a **BREAKING** change which removes our export default
statements, and adds a linter rule to prevent us from adding them.

[1]: evanw/esbuild#1719 (comment)
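A small sketch of the interop problem (the package and export names here are hypothetical, purely to illustrate the issue):

```js
// A TypeScript `export default` compiles to `exports.default = ...` in
// CommonJS, so a plain require() hands back a wrapper object:
const lib = require('some-package')   // { default: [Function] }
const createQueue = lib.default       // awkward `.default` access

// With named exports, CommonJS and ES Module consumers see the same shape:
// const { createQueue } = require('some-package')
// import { createQueue } from 'some-package'
```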
At the moment, when calling `get()`, we:

 - query on `deleted` and `visible`
 - sort on `_id`

We have indexes for both of these things, but separately, which results
in an inefficient query: MongoDB has to check both indexes, and this can
end in a complete index scan, which isn't particularly great.

We sort by `_id` presumably to get the oldest job (since `_id` is an
`ObjectId` whose sort order correlates to creation time).

However, we probably actually want the job that was visible first.

This change updates the sort to use `visible`, which also means that the
query and sort can use the same index.
⚡️ Sort `get()` by `visible`
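A rough sketch of the shape of the `get()` query after this change (`now`, `newAck` and `nextVisibleAt` are placeholders; this is not the library's exact code):

```js
// Filter and sort both use `visible`, so a single compound index such as
// { deleted: 1, visible: 1 } can satisfy the whole operation.
async function getNextJob (collection, now, newAck, nextVisibleAt) {
  return collection.findOneAndUpdate(
    { deleted: { $exists: false }, visible: { $lte: now } },
    { $inc: { tries: 1 }, $set: { ack: newAck, visible: nextVisibleAt } },
    { sort: { visible: 1 }, returnDocument: 'after' }
  )
}
```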
At the moment, the `ack` is a completely random hex string.

This change updates it to use an `ObjectId`, which:

 - encodes information about the time the job was pulled (for the first
   time)
 - ...and can therefore be used to sort jobs in the order they were
   originally pulled from the queue
 - allows us to simplify the dependencies and code a little
🗃 Change `ack` to (string) `ObjectId`
This is a non-breaking change that forms part of the migration path to
improving our performance in #15

When acking a job, we now unset the `visible` property, which makes
`visible` and `deleted` mutually exclusive properties, so that the
presence of one implies the absence of the other.

We can do this, because the only time we query on `visible` is when we
*also* query for `deleted: {$exists: false}`. Similarly, the only times
we query for `deleted: {$exists: true}` are times when we don't query
`visible`.
🗃 Remove `visible` when acking jobs
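Schematically, the ack update described above now looks something like this (a sketch with placeholder arguments; not the actual implementation):

```js
// Mark the job deleted and remove `visible` in the same atomic update, so the
// two fields are mutually exclusive from then on.
async function ackJob (collection, ack, now) {
  return collection.findOneAndUpdate(
    { ack: ack, visible: { $gt: now }, deleted: { $exists: false } },
    { $set: { deleted: now }, $unset: { visible: 1 } },
    { returnDocument: 'after' }
  )
}
```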
This is a **BREAKING** change that will change the `deleted` field from
a `string` to a `Date`, which will let us set up a TTL index on it.

Although this change is technically breaking, it should be deployable
with no impact on Production, since we only ever query `deleted` using
the `$exists` operator.

In order to leverage the TTL index, we'll need to:

 1. Deploy this change
 2. Migrate all existing `deleted` fields to `Date`:
    ```js
    db.collection.updateMany(
      {deleted: {$type: "string"}},
      [{$set: {deleted: {$toDate: "$deleted"}}}]
    )
    ```
 3. Change the existing `deleted` index to TTL:
    ```js
    db.runCommand({
      collMod: '...',
      index: {
        name: 'deleted_1',
        expireAfterSeconds: 2592000, // ~ 1 month
      }
    })
    ```
💥 Change `deleted` from `string` to `Date`
This is a **BREAKING CHANGE** that aims to improve database performance
by changing the queries and indexes used to perform operations. There
will be no changes visible to consumers of the JavaScript API.

This breaking change will:

 - require consumers to run a migration query before upgrading
 - add new indexes
 - allow dropping of an old index

More details on migration are at the bottom of this commit message.

Motivation
----------
This library seems to have been written with the assumption that its
collections are small. However, we have hundreds of thousands of jobs in
various queues on Production, which causes some slow queries because of
the design choices made in the schema of this library.

In particular, we aim to address two issues:

 1. `{deleted: {$exists: boolean}}` calls are inefficient, but used by
    practically every query in this library
 2. counting in-flight jobs has awful performance when there are many
    available jobs

The net result of the changes described below is that no further
filtering is needed beyond the index on any of the issued queries.

`$exists`
---------
MongoDB's `$exists` operator has [tricky index performance][1]. In the
best cases, `$exists: true` can use the index, but only if it's sparse,
and `$exists: false` can **never** just use the index: it always needs
to fetch documents.

In order to avoid the constant use of `$exists` in this library, we rely
on a logical paradigm shift: we `$unset` `visible` when acking the job,
so that `visible` and `deleted` are mutually exclusive fields. Therefore
we can:

 - add a sparse index for both of these fields
 - query on the field we care about, and **know** that it implies the
   absence of the other field, allowing the removal of `$exists`
   assertions

It should be noted that in local testing, I observed
[strange behaviour][2] when trying to use this partial index: we have to
use `ack: {$gt: ''}` instead of `ack: {$exists: true}` to get MongoDB to
leverage this index for some reason.

`inFlight()`
------------
The existing `inFlight()` query has particularly bad performance in
cases where there are many (hundreds of thousands) of jobs available to
pick up.

This is because the existing query uses the `deleted_1_visible_1` index,
but even after filtering by `deleted` and `visible` with the index, the
database will need to fetch every single job that could be picked up,
and check for `ack`, which is very slow.

We improve the performance here by:

 - the removal of the `$exists` query (see above)
 - the addition of a partial index that only contains unacked jobs that
   have been retrieved at some point by `get()`. We can then filter
   these by the current time to find in-flight jobs
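The kind of index definitions this implies might look roughly like the following (names and exact definitions are assumptions, not the library's actual indexes):

```js
// Sparse indexes: only documents that actually have the field are indexed, so
// a query on `visible` alone implies `deleted` is absent, and vice versa.
async function ensureIndexes (collection) {
  await collection.createIndex({ visible: 1 }, { sparse: true })
  await collection.createIndex({ deleted: 1 }, { sparse: true })

  // Partial index for in-flight jobs: only jobs that have an `ack` (i.e. have
  // been picked up at some point) are indexed; queried with `ack: {$gt: ''}`.
  await collection.createIndex(
    { ack: 1, visible: 1 },
    { partialFilterExpression: { ack: { $exists: true } } }
  )
}
```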

Migration path
--------------
This performance improvement is built upon a shift in the assumptions
made about underlying job structure: namely, that `deleted` and
`visible` are now mutually exclusive properties (which was not true
before).

 1. Bump patch version to [`7.1.1`][3]: this will start removing the
    `visible` property from acked jobs in a non-breaking way
 2. Deploy the patch to Production
 3. Update any existing documents to match this new schema:

    ```js
    db.collection.updateMany(
      {deleted: {$exists: true}},
      {$unset: {visible: 1}},
    )
    ```

 4. Bump major version to `8.0.0` and deploy
 5. Drop old index `deleted_1_visible_1`, which is no longer used

[1]: https://www.mongodb.com/docs/manual/reference/operator/query/exists/#use-a-sparse-index-to-improve--exists-performance
[2]: https://www.mongodb.com/community/forums/t/partial-index-is-not-used-during-search/290507/2
[3]: #16
Now that `deleted` is a `Date`, we can add a TTL index to it. This
change adds an optional `expireAfterSeconds` option, which is passed
through to the `deleted` index options if set to a `number`.
✨ Allow optional TTL on `deleted` field
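Presumably opted into at construction, along the lines of the earlier examples (the exact option placement is an assumption):

```js
// Acked (deleted) jobs are removed automatically roughly 30 days later.
var queue = mongoDbQueue(db, 'queue', { expireAfterSeconds: 30 * 24 * 60 * 60 })
```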
At the moment, when calling:

```js
queue.ping({resetTries: true})
```

The tries are reset as if the job hasn't been picked up, but the `ack`
is left as-is.

This isn't necessarily problematic in the operation of the queue, but it
does mean that the pinged job will still show up as [in-flight][1].

This change adds an optional `resetAck` flag, which will also unset the
`ack`, and means that the job can be marked as not in-flight, as if it
has never been picked up.

[1]: https://github.com/reedsy/mongodb-queue/blob/6133fc9367f4fce719e36d8866841d531e956b6b/mongodb-queue.ts#L262
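So a fully "reset" ping would presumably look something like this (option names from the commit messages; the exact signature is an assumption):

```js
// Reset both the retry counter and the ack, so the pinged job looks as if it
// has never been picked up and no longer counts as in-flight.
await queue.ping(job.ack, { resetTries: true, resetAck: true })
```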