Skip to content

Correct matching against description and memo#2185

Merged
code-gnucash-org merged 3 commits intoGnucash:stablefrom
jpschewe:bug799745
Mar 1, 2026
Merged

Correct matching against description and memo#2185
code-gnucash-org merged 3 commits intoGnucash:stablefrom
jpschewe:bug799745

Conversation

@jpschewe
Copy link
Contributor

When the description or the memo is of length zero or one, doing a fuzzy match causes the the check to be against zero characters and this always comes back true. Because of this, skip the match when the string is too short.

Also enhance the debugging to make it easier to track why some transactions aren't considered possible matches.

This is a fix for https://bugs.gnucash.org/show_bug.cgi?id=799745

Copy link
Member

@jralls jralls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please separate uncommenting the DEBUG statements and fixing the bug into separate commits.

And since you created a companion bug (no need to do that), please title the bug fix commit with the bug number and title$mdash;"[Bug 799745] - Import matcher doesn't handle zero length memo and description properly" so that the release note script categorizes it correctly.

Comment on lines +711 to +712
else if (strlen(match_memo) > 1 && (strncasecmp(memo, match_memo,
strlen(match_memo) / 2) == 0))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strlen and strncasecmp need to be null-protected. I'd also go for a bit more than a more restrictive match if one of the strings is substantially longer than the other, maybe test 1/2 the length of whichever is longer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying for the minimal set of changes to fix the issue. The change to the length of the match would be a larger change in behavior. I can make the larger change if you want though.

As far as the companion bug, I was just following the developer guidelines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes are in place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as the companion bug, I was just following the developer guidelines.

Where did you find that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was reading https://wiki.gnucash.org/wiki/Git#Pull_Requests and saw the requirement for bug reports under patches and assumed it was for all changes. I've had other projects refuse PRs for not opening a bug when I see such language.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strlen and strncasecmp need to be null-protected. I'd also go for a bit more than a more restrictive match if one of the strings is substantially longer than the other, maybe test 1/2 the length of whichever is longer.

@jralls I've updated the PR do use this algorithm. Let me know what you think.

/* If it does, abort the process for this transaction, since
it is already in the system. */
DEBUG("%s", "Transaction with same online ID exists, destroying current transaction");
DEBUG("Transaction with same online ID exists, destroying current transaction: %s", qof_print_date(xaccTransGetDate(trans)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a useful change, but will leak qof_print_date. Options: g_free later

char* date = qof_print_date(...);
DEBUG("...%s",date);
g_free(date);

Alternatively just show the pointer DEBUG("...%p", trans);.

Or even better, also debug-log the online_id from within gnc_import_exists_online_id:

modified   gnucash/import-export/import-backend.cpp
@@ -1052,6 +1052,10 @@ gboolean gnc_import_exists_online_id (Transaction *trans, GHashTable* acct_id_ha
     }
 
     auto online_id_exists = g_hash_table_contains (online_id_hash, source_online_id);
+
+    if (online_id_exists)
+        DEBUG ("transaction with online id %s already exists", source_online_id);
+
     g_free (source_online_id);
     return online_id_exists;
 }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THank you, I had forgotten about freeing the pointers. 2 questions:

  1. any thoughts on using shared pointers to avoid issues with free?
  2. is it all right to have the fix as a separate commit or do you want me to rebase the changes into the previous commits?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::shared_ptr carries a fair amount of overhead for instance-counting so you only want to use it if you're actually sharing ownership of the pointer. std::unique_ptr has less overhead and is intended for automatic freeing of pointers where ownership isn't shared, so that's what you'd use here. But both of them use operator delete by default so if you use them on pointers that aren't allocated with operator new you must provide a custom deleter; that can be a simple lambda that calls free. g_free is just a null-protected wrapper around free.
I don't think that using std::unique_ptr here would get you anything. You can't be sure when using it inline like

DEBUG("Transaction with same online ID exists, destroying current transaction: %s", (std::unique_ptr<char*, (void(*)(char*)>(qof_print_date(xaccTransGetDate(trans)[](char* p){ g_free(p); })));

that the temporary will survive long enough for the printf buried in the DEBUG macro to use it before it's freed so you would still need to create a local variable. Plus you're creating a lot of extra work for yourself and the compiler. The problem for most of us is noticing that we need to free the rv from a function like qof_print_date and therefore can't use it as a temporary.

You should fix the first commit. You can force-push your branch and the PR will update automatically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the explanation about shared and unique pointers. I don't see why there would be any issue with the compiler executing the free before the printf gets the pointer. I expect that would be a bug in the compiler.

That all said, I'm using g_free in my updated commits.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a useful change, but will leak qof_print_date. Options: g_free later

@christopherlam I've updated the PR with the freeing of the memory. I've also added the debug statement that you suggested in gnc_import_exists_online_id.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why there would be any issue with the compiler executing the free before the printf gets the pointer. I expect that would be a bug in the compiler.

I got distracted by the deleter lambda and forgot to get() the char* from the unique_ptr in my example. Here's the corrected one:

DEBUG("Transaction with same online ID exists, destroying current transaction: %s", (std::unique_ptr<char*, (void(*)(char*)>(qof_print_date(xaccTransGetDate(trans)[](char* p){ g_free(p); }))))->get());

It's complicated and I think implementation dependent because the actual temporary is the char* from the get, though that might qualify as creating a reference that extends the lifetime of the unique_ptr. As it happens both gcc and clang call the deleter after the printf, see compiler explorer.

Enhance the debugging to make it easier to track why some
transactions aren't considered possible matches.

Uncomment debug statements so that users dont' need to build their own
binary to debug.
Check that strings aren't null before using them.

Factor out access to strings to avoid extra calls to the xacc* methods.
…cription properly

When doing a fuzzy check on strings, check half the length of the
longest string and only check if both strings are longer than 1
character.
@code-gnucash-org code-gnucash-org merged commit 8538dcf into Gnucash:stable Mar 1, 2026
4 checks passed
@jralls
Copy link
Member

jralls commented Mar 1, 2026

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants