Skip to content

storage: preserve original error on function fail#629

Merged
Gerold103 merged 1 commit intotarantool:masterfrom
kamenkremen:gh-614
Feb 18, 2026
Merged

storage: preserve original error on function fail#629
Gerold103 merged 1 commit intotarantool:masterfrom
kamenkremen:gh-614

Conversation

@kamenkremen
Copy link
Contributor

@kamenkremen kamenkremen commented Dec 18, 2025

For older Tarantool versions vshard uses func.call to execute functions.

If such a function raises an error during an uncommitted transaction, 'Transaction is active...' error is going to be raised, which will mask the original error.

This patch fixes that bug by checking the function call result. If the function returns an error and a transaction is still active, vshard now explicitly rolls back the transaction.

Fixes #614

NO_DOC=bugfix

Copy link
Collaborator

@Serpentian Serpentian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done! I'm glad, you've figured out the reason 🎉

P.S. And please, don't forget to rebase)

@Serpentian Serpentian assigned kamenkremen and unassigned Serpentian Dec 22, 2025
@kamenkremen kamenkremen force-pushed the gh-614 branch 5 times, most recently from 949d8a9 to 80bd895 Compare December 26, 2025 23:26
Copy link
Collaborator

@Serpentian Serpentian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there! Well done

Copy link
Collaborator

@Serpentian Serpentian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nits are left

Copy link
Collaborator

@Serpentian Serpentian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the patches! Well done

@kamenkremen
Copy link
Contributor Author

I ran some performance tests, and here are the results.

Setup

Hardware

Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz

Tarantool

Tarantool 2.11.0-0-g247a9a4183
Target: Linux-x86_64-Release
Build options: cmake . -DCMAKE_INSTALL_PREFIX=/usr/local -DENABLE_BACKTRACE=TRUE
Compiler: GNU-8.3.1
C_FLAGS: -fexceptions -funwind-tables -fasynchronous-unwind-tables -fno-common -fopenmp -msse2 -Wformat -Wformat-security -Werror=format-security -fstack-protector-strong -fPIC -fmacro-prefix-map=./tarantool=. -std=c11 -Wall -Wextra -Wno-gnu-alignof-expression -fno-gnu89-inline -Wno-cast-function-type
CXX_FLAGS: -fexceptions -funwind-tables -fasynchronous-unwind-tables -fno-common -fopenmp -msse2 -Wformat -Wformat-security -Werror=format-security -fstack-protector-strong -fPIC -fmacro-prefix-map=./tarantool=. -std=c++11 -Wall -Wextra -Wno-invalid-offsetof -Wno-gnu-alignof-expression -Wno-cast-function-type

2 shards, 2 replicas in one shard, 9 routers, 9 load generators, which are created as follows:

tarantool generate_load.lua \
            --bucket_count 30000 \
            --op_type replace \
            --warmup \
            --uri localhost:3305,localhost:3306,localhost:3307,localhost:3308,localhost:3309,localhost:3310,localhost:3311,localhost:3312,localhost:3313 \
            --fibers 50 \
            --ops 1000000 \

Mean of 20 iterations

The perf test is from https://github.com/tarantool/vshard-ee/pull/51, with registered replace function in schema.

  • Before patch: 193721.8000 (100%)
  • After only patch that preserves original error on function fail: 193125.2000 (99.7%)
  • After whole patch: 175869.9000 (90.7%)

@Serpentian
Copy link
Collaborator

Let's do only that variant:

After only patch that preserves original error on function fail: 193125.2000 (99.7%)

The perf degradation is way too huge in the second variant. We'll leave #630 unresolved for now, since nobody complained about that yet.

@Serpentian Serpentian assigned kamenkremen and unassigned Gerold103 Feb 2, 2026
@kamenkremen kamenkremen force-pushed the gh-614 branch 2 times, most recently from ee34ee2 to 4c6abe1 Compare February 3, 2026 04:07
@kamenkremen kamenkremen assigned Serpentian and unassigned kamenkremen Feb 3, 2026
@Serpentian Serpentian requested a review from mrForza February 4, 2026 13:30
Copy link
Contributor

@mrForza mrForza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the clean patch! 👍 Left some minor comments below

@mrForza mrForza assigned kamenkremen and unassigned mrForza Feb 5, 2026
@Serpentian Serpentian removed their assignment Feb 6, 2026
@Serpentian
Copy link
Collaborator

Serpentian commented Feb 6, 2026

I can confirm, that @kamenkremen commit doesn't cause major perf degradation, so I remove do not merge lablel, we`re safe now.

However, @Gerold103, can we go even further and drop that piece of shit from net.box, which we don't need at all:

    local results = {...}
    for i = 1, select('#', ...) do
        if type(results[i]) == 'cdata' then
            results[i] = msgpack.decode(msgpack.encode(results[i]))
        end
    end
    return unpack(results)

This breaks backward compatibility of vshard.storage.call, but who cares, when it costs us 10% of performance. Here're my measurements. This perf bump will affect all requests, which return cdata (e.g. get, replace, insert). The only one not affected - select (it returns table). All tests (even cartridge's ones) works: #641

5 workers (load generators), 9 routers, 2 shards, 2 replicas in one shard. replaces.

Before the patches:
Average per iteration: 186312.8500 (100%)

After Ivan`s commit:
Average per iteration: 185511.4500 (99.6%)

After two commits (Ivan's + Mine: #641):
Average per iteration: 204105.0500 (109.6%)

Edit

Actually, we must fix it in core: tarantool/tarantool#11624 (comment). The inconsistency between versions is much more serious: the return type on router becomes cdata instead of table. Id rather return cdata everywhere and that's it. When we'll fix it in core, we can use the net.box.self.call` from there, but on all versions before it we should reimplement net.box.self.call, as it's done in #641, in order to avoid that perf degradation.

This way we have the following backward compatibility breakage: when cdata is returned on old vshard versions, vshard automatically did table from it, from now on cdata itself is returned. IMHO, it's ok.

@Serpentian Serpentian removed the do not merge Not ready to be merged label Feb 6, 2026
In Tarantool versions prior to 3.0.0-beta1-18, `net_box.self.call()`
cannot invoke C stored or Lua persistent functions.
Therefore, if such a function is registered in `box.func`, vshard
executes it via `func.call` instead.

If such a function raises an error during an uncommitted transaction,
'Transaction is active...' error is going to be raised, which will mask
the original error.

This patch fixes that bug by checking the function call result. If the
function returns an error and a transaction is still active, vshard now
explicitly rolls back the transaction.

Fixes tarantool#614

NO_DOC=bugfix
@Gerold103
Copy link
Collaborator

I think it is fine to just drop netbox.self from vshard entirely. I can't imagine what can really break besides for people who were calling vshard.storage.call() locally on storage and were checking type() of results.

@Serpentian Serpentian assigned Gerold103 and unassigned Serpentian Feb 11, 2026
Copy link
Collaborator

@Gerold103 Gerold103 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this was a tricky topic in terms of perf and tiny details of behaviour.

Copy link
Contributor

@mrForza mrForza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the patch!

@Gerold103 Gerold103 merged commit 4406aee into tarantool:master Feb 18, 2026
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

vshard.storage.call masks the original error thrown from a function in some cases

4 participants