-
Notifications
You must be signed in to change notification settings - Fork 93
Option to reset tries in ping()
#40
base: master
Are you sure you want to change the base?
Conversation
`package-lock.json` shouldn't be committed for libraries, since it's actively ignored by consuming apps.
The `ping()` method can be useful for setting up recurring jobs if we deliberately avoid acking the job. For example: 1. Submit job 2. Pull job from queue 3. Process job 4. `ping()` 5. Go to Step 2 A real-world example of this might be notifying for recurring appointments, or setting up long-running, cross-process, periodic jobs. The main advantage this has over using `ack()` and `add()` is that it effectively requeues a job in a single, atomic commit. If we tried the above with `ack()` and `add()`: 1. Submit job 2. Pull job from queue 3. Process job 4. `ack()` 5. `add()` 6. Go to Step 2 In this version, the process could crash or quit between Steps 4 & 5, and our recurring job would be lost. We could also try inverting Steps 4 & 5, but then we get the opposite issue: if the process crashes or quits, then we might accidentally duplicate our recurring job. It also prevents us from setting up any unique indexes on our `payload`. Using `ping()` perfectly solves this problem: there's only ever one version of the job, and it's never dropped (because it's never acked). If the process crashes before we `ping()`, we'll retry it, as with any other normal job. The one issue with this approach is that `tries` will steadily increase, and - if you have `maxRetries` set up - the job will eventually be moved to the dead queue, which isn't what we want. This change adds an option to the `ping()` method: `resetTries`, which will reset `tries` to zero, so that the job is treated like a "new" job when it's pinged, and is only moved to the dead queue if it's genuinely retried.
Option to reset tries in `ping()`
⬆️ Bump `mongodb`
The `mongodb` Node.js driver deprecated use of `returnOriginal` in favour of `returnDocument` in [v3.6][1]. This non-breaking change allows consumers to opt in to using the newer `returnDocument` by setting an option on construction ```js var queue = mongoDbQueue(db, 'queue', { returnDocument : true }) ``` [1]: mongodb/node-mongodb-native#2808
👽 Add option to use `returnDocument`
This change bumps the `mongodb` dependency to v4, and: - removes the `returnDocument` option: the deprecated version is no longer supported for v4, and we drop its support here - update `add()` to check `insertedIds`, since `ops` was removed in v4
💥 Require `mongodb` v4
Why don't just have a new method resetTries with an option {delayed: x}? |
⚡ Add deleted index
[`mongodb@5`][1] drops support for callbacks, which breaks this library, which is all written with callbacks. This is a **BREAKING** change which drops callback support from this library as well, and fully embraces promises through `async`/`await` syntax in both the library code and test code. This allows us to support both `mongodb@4` and `mongodb@5`. [1]: https://github.com/mongodb/node-mongodb-native/releases/tag/v5.0.0
💥 Promisify and add `mongodb@5` support
This is a breaking change that moves this library to TypeScript. It's breaking because the import style has slightly changed due to the transpiler. Other than that, everything else is the same. We also: - drop our `-reedsy` version suffix, since we've now adapted this fork so heavily that it's unlikely we'll merge from upstream again - add an `.npmignore` file to avoid publishing tests and TypeScript files
👷♀️ Bump GitHub Actions versions
💥 Move to Typescript
`npm prepublish` was deprecated. This change updates the release to use `npm prepare` instead
🚨 Add linting
Run the build against different versions of the `mongodb` driver
This change adds support for [`mongodb@6`][1]. The main change that affects this library is that `findOneAndUpdate()` now returns the document itself by default, and metadata must be explicitly requested with `includeResultMetadata: true`. [1]: https://github.com/mongodb/node-mongodb-native/releases/tag/v6.0.0
⬆️ Add support for `mongodb@6`
MongoDB 4.4 will be [end-of-lifed in February][1]. This change runs the build against MongoDB 5.0 in readiness for upgrading our Staging and Production replica sets. [1]: https://www.mongodb.com/support-policy/lifecycles
At the moment when we pass payload like this: ```ts {someProp: undefined} ``` It stores it mongoDb Db, but it auto convert it to null value, however it might be error prone as we may enqueue job that has `callbackUrl: undefined`, then the mongo queue saves in db, which converts it to `callbackUrl: null`, then when getting the message from queue worker validate the job payload and has type `t.partial({callbackUrl: t.string}})`. The validation will fail.
🐛 Remove undefined props
At the moment, we have two Github Action workflows: - `test.yml`: runs build and test, then tags when bumping the version in `main` - `publish.yml`: releases the package when a new tag is published The issue with this setup is that the built-in `GITHUB_TOKEN` [will not trigger another workflow][1], so we had to add a separate PAT with write permissions to our repos, which was a bit of a security concern. In order to avoid the need for this extra token, with its associated risks and administrative overheads (like rotating), this change combines our workflows into a single workflow. We tweak the `tag.sh` to `release.sh`, and it's now also in charge of publishing (since it knows when we've pushed a new tag). [1]: https://docs.github.com/en/actions/using-workflows/triggering-a-workflow#triggering-a-workflow-from-a-workflow
Default exports have [bad interoperability][1] between Common JS and ES Module imports, so the safest thing to do is just remove them. This is a **BREAKING** change which removes our export default statements, and adds a linter rule to prevent us from adding them. [1]: evanw/esbuild#1719 (comment)
💥 Remove default exports
At the moment, when calling `get()`, we: - query on `deleted` and `visible` - sort on `_id` We have indexes for both of these things, but separately, which results in an inefficient query, since MongoDB will have to check both indexes and can result in a complete index scan, which isn't particularly great. We sort by `_id` presumably to get the oldest job (since `_id` is an `ObjectId` whose sort order correlates to creation time). However, we probably actually want the job that was visible first. This change updates to sort to use `visible`, which also means that the query and sort can use the same index.
⚡️ Sort `get()` by `visible`
At the moment, the `ack` is a completely random hex string. This change updates it to use an `ObjectId`, which: - encodes information about the time the job was pulled (for the first time) - ...and can therefore be used to sort jobs in the order they were originally pulled from the queue - allows us to simplify the dependencies and code a little
🗃 Change `ack` to (string) `ObjectId`
This is a non-breaking change that forms part of the migration path to improving our performance in #15 When acking a job, we now unset the `visible` property, which makes `visible` and `deleted` mutually exclusive properties, so that the presence of one implies the absence of the other. We can do this, because the only time we query on `visible` is when we *also* query for `deleted: {$exists: false}`. Similarly, the only times we query for `deleted: {$exists: true}` are times when we don't query `visible`.
🗃 Remove `visible` when acking jobs
This is a **BREAKING** change that will change the `deleted` field from a `string` to a `Date`, which will let us set up a TTL index on it. Although this change is technically breaking, it should be deployable with no impact on Production, since we only ever query `deleted` using the `$exists` operator. In order to leverage the TTL index, we'll need to: 1. Deploy this change 2. Migrate all existing `deleted` fields to `Date`: ```js db.collection.update( {deleted: {$type: "string"}}, [{$set: {deleted: {$toDate: "$deleted"}}}] ) ``` 3. Change the existing `deleted` index to TTL: ```js db.runCommand({ collMod: '...', index: { name: 'deleted_1', expireAfterSeconds: 2592000, // ~ 1 month } }) ```
💥 Change `deleted` from `string` to `Date`
This is a **BREAKING CHANGE** that aims to improve database performance by changing the queries and indexes used to perform operations. There will be no changes visible to consumers of the JavaScript API. The break will: - require consumers to run a migration query before upgrading - add new indexes - allow dropping of an old index More details on migration are at the bottom of this commit message. Motivation ---------- This library seems to have been written with the assumption that its collections are small. However, we have hundreds of thousands of jobs in various queues on Production, which causes some slow queries because of the design choices made in the schema of this library. In particular, we aim to address two issues: 1. `{deleted: {$exists: boolean}}` calls are inefficient, but used by practically every query in this library 2. counting inflight jobs has awful performance when there are many available jobs The result is that no further filtering is needed beyond the index on any of the issued queries. `$exists` --------- MongoDB's `$exists` operator has [tricky index performance][1]. In the best cases, `$exists: true` can use the index, but only if it's sparse, and `$exists: false` can **never** just use the index: it always needs to fetch documents. In order to avoid the constant use of `$exists` in this library, we rely on a logical paradigm shift: we `$unset` `visible` when acking the job, so that `visible` and `deleted` are mutually exclusive fields. Therefore we can: - add a sparse index for both of these fields - query on the field we care about, and **know** that it implies the absence of the other field, allowing the removal of `$exists` assertions It should be noted that in local testing, I observed [strange behaviour][2] when trying to use this partial index: we have to use `ack: {$gt: ''}` instead of `ack: {$exists: true}` to get MongoDB to leverage this index for some reason. `inFlight()` ------------ The existing `inFlight()` query has particularly bad performance in cases where there are many (hundreds of thousands) of jobs available to pick up. This is because the existing query uses the `deleted_1_visible_1` index, but even after filtering by `deleted` and `visible` with the index, the database will need to fetch every single job that could be picked up, and check for `ack`, which is very slow. We improve the performance here by: - the removal of the `$exists` query (see above) - the addition of a partial index that only contains unacked jobs that have been retrieved at some point by `get()`. We can then filter these by the current time to find in-flight jobs Migration path -------------- This performance improvement is built upon a shift in the assumptions made about underlying job structure: namely, that `deleted` and `visible` are now mutually exclusive properties (which was not true before). 1. Bump patch version to [`7.1.1`][3]: this will start removing the `visible` property from acked jobs in a non-breaking way 2. Deploy the patch to Production 3. Update any existing documents to match this new schema: ```js db.collection.updateMany( {deleted: {$exists: true}}, {$unset: {visible: 1}}, ) ``` 4. Bump major version to `8.0.0` and deploy 5. Drop old index `delted_1_visible_1`, which is no longer used [1]: https://www.mongodb.com/docs/manual/reference/operator/query/exists/#use-a-sparse-index-to-improve--exists-performance [2]: https://www.mongodb.com/community/forums/t/partial-index-is-not-used-during-search/290507/2 [3]: #16
💥 Improve database performance
Now that `deleted` is a `Date`, we can add a TTL index to it. This change adds an optional `expireAfterSeconds` option, which is passed through to the `deleted` index options if set to a `number`.
✨ Allow optional TTL on `deleted` field
At the moment, when calling: ```js queue.ping({resetTries: true}) ``` The tries are reset as if the job hasn't been picked up, but the `ack` is left as-is. This isn't necessarily problematic in the operation of the queue, but it does mean that the pinged job will still show up as [in-flight][1]. This change adds an optional `resetAck` flag, which will also unset the `ack`, and means that the job can be marked as not in-flight, as if it has never been picked up. [1]: https://github.com/reedsy/mongodb-queue/blob/6133fc9367f4fce719e36d8866841d531e956b6b/mongodb-queue.ts#L262
dc285af
to
5830ed9
Compare
The
ping()
method can be useful for setting up recurring jobs if wedeliberately avoid acking the job. For example:
ping()
A real-world example of this might be notifying for recurring
appointments, or setting up long-running, cross-process, periodic jobs.
The main advantage this has over using
ack()
andadd()
is that iteffectively requeues a job in a single, atomic commit. If we tried the
above with
ack()
andadd()
:ack()
add()
In this version, the process could crash or quit between Steps 4 & 5,
and our recurring job would be lost.
We could also try inverting Steps 4 & 5, but then we get the opposite
issue: if the process crashes or quits, then we might accidentally
duplicate our recurring job. It also prevents us from setting up any
unique indexes on our
payload
.Using
ping()
perfectly solves this problem: there's only ever oneversion of the job, and it's never dropped (because it's never acked).
If the process crashes before we
ping()
, we'll retry it, as with anyother normal job.
The one issue with this approach is that
tries
will steadily increase,and - if you have
maxRetries
set up - the job will eventually be movedto the dead queue, which isn't what we want.
This change adds an option to the
ping()
method:resetTries
, whichwill reset
tries
to zero, so that the job is treated like a "new" jobwhen it's pinged, and is only moved to the dead queue if it's genuinely
retried.