Reporting blanks #779

jkmaxwell · 2023-12-03T05:14:07Z

I believe there is an error happening but I cannot confirm it and am therefore considering it a possible bug.

Summary:
I am getting empty notifications. I believe I should not be, given my parameters.

urlwatch.yaml

Relevant sections

display:
  empty-diff: false
  error: true
  new: true
  unchanged: false

Reporting via email and pushover

urls.yaml

name: "Dahlias Maks Georgia"
navigate: "https://lobaughsdahlias.com/product/maks-georgia-2020-introduction/"
browser: chromium
wait_until: load
filter:
  - css:
      selector: div.column.purchase-options-container.span-6
  - html2text
  - strip
diff_filter:
  - grep: -v 'Out of stock'
diff_tool: "diff --ignore-all-space --unified"

(i already tried with url instead of playwright, but i'm trying to outrun the dynamic content and it seems impossible)

Logs

test-diff-filter

xxx@xxx ~ % urlwatch --test-diff-filter 2
=== Filtered diff between state 0 and state 1 ===

=== Filtered diff between state 1 and state 2 ===

=== Filtered diff between state 2 and state 3 ===

=== Filtered diff between state 3 and state 4 ===

=== Filtered diff between state 4 and state 5 ===

=== Filtered diff between state 5 and state 6 ===

test-filter

xxx@xxx ~ % urlwatch --test-filter 2     
Maks Georgia
$16.00
Out of stock
		SKU: 107
	Category: Dahlia

--verbose

xxx@xxx ~ % urlwatch --verbose           
2023-12-02 21:18:05,236 cli INFO: turning on verbose logging mode
2023-12-02 21:18:05,243 minidb DEBUG: PRAGMA table_info(CacheEntry)
2023-12-02 21:18:05,243 main INFO: Using /Users/xxx/Library/Application Support/urlwatch/urls.yaml as URLs file
2023-12-02 21:18:05,243 main INFO: Using /Users/xxx/Library/Application Support/urlwatch/hooks.py for hooks
2023-12-02 21:18:05,243 main INFO: Using /Users/xxx/Library/Caches/urlwatch/cache.db as cache database
2023-12-02 21:18:05,246 main INFO: Found 3 jobs
2023-12-02 21:18:05,246 worker DEBUG: Processing 3 jobs (out of 3)
2023-12-02 21:18:05,247 handler INFO: Processing: <url url='https://www.mayberryblooms.com/product/mayberry-s-pink-champagne-limit-one-per-household/291?cs=true&cst=custom' name='Dahlias Mayberry' filter=[{'css': {'selector': 'div.product-inventory__wrapper'}}] max_tries=5>
2023-12-02 21:18:05,247 handler INFO: Processing: <browser navigate='https://lobaughsdahlias.com/product/maks-georgia-2020-introduction/' wait_until='load' browser='chromium' name='Dahlias Maks Georgia' filter=[{'css': {'selector': 'div.column.purchase-options-container.span-6'}}, 'html2text', 'strip'] diff_tool='diff --ignore-all-space --unified' diff_filter=[{'grep': "-v 'Out of stock'"}]>
2023-12-02 21:18:05,247 minidb DEBUG: SELECT data, timestamp, tries, etag FROM CacheEntry WHERE guid = ? ORDER BY timestamp DESC, tries DESC LIMIT ? ['67733641008777777d1b0700d290e41ff938b9f4', 1]
2023-12-02 21:18:05,247 handler INFO: Processing: <browser navigate='https://lobaughsdahlias.com/product/newaukum-honey/' wait_until='load' browser='chromium' name='Dahlias Honey' filter=[{'css': {'selector': 'div.column.purchase-options-container.span-6'}}, 'html2text', 'strip'] diff_tool='diff --ignore-all-space --unified' diff_filter=[{'grep': "-v 'Out of stock'"}]>
2023-12-02 21:18:05,249 minidb DEBUG: SELECT data, timestamp, tries, etag FROM CacheEntry WHERE guid = ? ORDER BY timestamp DESC, tries DESC LIMIT ? ['56cc3f3526d3800b442c5dc93dfd8779f1356c2a', 1]
2023-12-02 21:18:05,251 minidb DEBUG: SELECT data, timestamp, tries, etag FROM CacheEntry WHERE guid = ? ORDER BY timestamp DESC, tries DESC LIMIT ? ['5cf8173e2609fe5b1bc182edae8db9031eb440fb', 1]
2023-12-02 21:18:05,253 connectionpool DEBUG: Starting new HTTPS connection (1): www.mayberryblooms.com:443
2023-12-02 21:18:05,291 selector_events DEBUG: Using selector: KqueueSelector
2023-12-02 21:18:05,291 selector_events DEBUG: Using selector: KqueueSelector
2023-12-02 21:18:05,559 connectionpool DEBUG: https://www.mayberryblooms.com:443 "GET /product/mayberry-s-pink-champagne-limit-one-per-household/291?cs=true&cst=custom HTTP/1.1" 200 None
2023-12-02 21:18:05,572 filters INFO: Applying filter 'css', subfilter {'selector': 'div.product-inventory__wrapper'} to https://www.mayberryblooms.com/product/mayberry-s-pink-champagne-limit-one-per-household/291?cs=true&cst=custom
2023-12-02 21:18:05,574 worker DEBUG: Job finished: <url url='https://www.mayberryblooms.com/product/mayberry-s-pink-champagne-limit-one-per-household/291?cs=true&cst=custom' method='GET' name='Dahlias Mayberry' filter=[{'css': {'selector': 'div.product-inventory__wrapper'}}] max_tries=5>
2023-12-02 21:18:05,574 worker DEBUG: Using max_tries of 5 for <url url='https://www.mayberryblooms.com/product/mayberry-s-pink-champagne-limit-one-per-household/291?cs=true&cst=custom' method='GET' name='Dahlias Mayberry' filter=[{'css': {'selector': 'div.product-inventory__wrapper'}}] max_tries=5>
2023-12-02 21:18:07,182 filters INFO: Applying filter 'css', subfilter {'selector': 'div.column.purchase-options-container.span-6'} to https://lobaughsdahlias.com/product/newaukum-honey/
2023-12-02 21:18:07,185 filters INFO: Applying filter 'html2text', subfilter {} to https://lobaughsdahlias.com/product/newaukum-honey/
2023-12-02 21:18:07,185 filters INFO: Applying filter 'strip', subfilter {} to https://lobaughsdahlias.com/product/newaukum-honey/
2023-12-02 21:18:07,185 worker DEBUG: Job finished: <browser navigate='https://lobaughsdahlias.com/product/newaukum-honey/' wait_until='load' browser='chromium' name='Dahlias Honey' filter=[{'css': {'selector': 'div.column.purchase-options-container.span-6'}}, 'html2text', 'strip'] diff_tool='diff --ignore-all-space --unified' diff_filter=[{'grep': "-v 'Out of stock'"}]>
2023-12-02 21:18:07,185 worker DEBUG: Using max_tries of 0 for <browser navigate='https://lobaughsdahlias.com/product/newaukum-honey/' wait_until='load' browser='chromium' name='Dahlias Honey' filter=[{'css': {'selector': 'div.column.purchase-options-container.span-6'}}, 'html2text', 'strip'] diff_tool='diff --ignore-all-space --unified' diff_filter=[{'grep': "-v 'Out of stock'"}]>
2023-12-02 21:18:07,374 filters INFO: Applying filter 'css', subfilter {'selector': 'div.column.purchase-options-container.span-6'} to https://lobaughsdahlias.com/product/maks-georgia-2020-introduction/
2023-12-02 21:18:07,377 filters INFO: Applying filter 'html2text', subfilter {} to https://lobaughsdahlias.com/product/maks-georgia-2020-introduction/
2023-12-02 21:18:07,377 filters INFO: Applying filter 'strip', subfilter {} to https://lobaughsdahlias.com/product/maks-georgia-2020-introduction/
2023-12-02 21:18:07,377 worker DEBUG: Job finished: <browser navigate='https://lobaughsdahlias.com/product/maks-georgia-2020-introduction/' wait_until='load' browser='chromium' name='Dahlias Maks Georgia' filter=[{'css': {'selector': 'div.column.purchase-options-container.span-6'}}, 'html2text', 'strip'] diff_tool='diff --ignore-all-space --unified' diff_filter=[{'grep': "-v 'Out of stock'"}]>
2023-12-02 21:18:07,377 worker DEBUG: Using max_tries of 0 for <browser navigate='https://lobaughsdahlias.com/product/maks-georgia-2020-introduction/' wait_until='load' browser='chromium' name='Dahlias Maks Georgia' filter=[{'css': {'selector': 'div.column.purchase-options-container.span-6'}}, 'html2text', 'strip'] diff_tool='diff --ignore-all-space --unified' diff_filter=[{'grep': "-v 'Out of stock'"}]>
2023-12-02 21:18:07,377 reporters INFO: Submitting with stdout (<class 'urlwatch.reporters.StdoutReporter'>)
2023-12-02 21:18:07,377 reporters INFO: Submitting with email (<class 'urlwatch.reporters.EMailReporter'>)
2023-12-02 21:18:07,378 reporters DEBUG: Not sending e-mail (no changes)
2023-12-02 21:18:07,378 reporters INFO: Submitting with ifttt (<class 'urlwatch.reporters.IFTTTReport'>)
2023-12-02 21:18:07,378 reporters INFO: Submitting with pushover (<class 'urlwatch.reporters.PushoverReport'>)
2023-12-02 21:18:07,378 reporters DEBUG: Not sending pushover (no changes)

Problem

However, I continually get positive hits and these generate reporter actions. Why is it reporting them as new?

===========================================================================
01. NEW: Dahlias Maks Georgia
02. NEW: Dahlias Honey
===========================================================================

---------------------------------------------------------------------------
NEW: Dahlias Maks Georgia ( https://lobaughsdahlias.com/product/maks-georgia-2020-introduction/ )
---------------------------------------------------------------------------


---------------------------------------------------------------------------
NEW: Dahlias Honey ( https://lobaughsdahlias.com/product/newaukum-honey/ )
---------------------------------------------------------------------------


-- 
urlwatch 2.28, Copyright 2008-2023 Thomas Perl
Website: https://thp.io/2008/urlwatch/
Support urlwatch development: https://github.com/sponsors/thp
watched 3 URLs in 2 seconds

Thank you!

The text was updated successfully, but these errors were encountered:

jkmaxwell · 2023-12-03T17:16:37Z

Maybe it is a cache problem? There are two things confusing me:

why are the reports calling these new every time when they are not new?
why is the report reporting on blank information when my criteria should be preventing that?

Thanks.

thp · 2023-12-03T20:10:55Z

urlwatch stores its history in directories related to cache (see e.g. #416). If you have a program on your machine which regularly cleans cache directories, this currently will clean out the history.

"NEW" in that report means that there is no previous entry for it. Otherwise the report would mention "CHANGED".

However, it's weird that --test-diff-filter does show history entries being available.

After you see the "NEW" report, does --test-diff-filter still work or does it complain that there are no historic entries?

jkmaxwell · 2023-12-03T23:17:08Z

Thanks @thp ! I don't have anything running that i know of that is cleaning the cache (no cleanmymac). --test-diff-filter works every time, which is why this is so weird. It has never complained about any lack of historic entries. I don't understand why it both registers it as new every time and reports historic entries.

FWIW it seems to be only with these https://lobaughsdahlias.com/ ecommerce urls. I've been using urlwatch successfully on a few other urls for a month or so and it works as expected

gaia · 2023-12-04T22:26:31Z

@jkmaxwell how did you install urlwatch? i've been having sporadic alarms about NEW items which are not new (and there is no cache autoclean, especially bc if there was all jobs would alert when one job is new. in this case only a subset of the jobs alert as new).

I've installed urlwatch using rye. at first i thought it was urlwatch being 2.28, but i've downgraded to 2.25 and the issue persists. so it must be rye's fault, as 2.25 works fine on other machines.

thp · 2023-12-09T07:28:17Z

Between 2.27 and 2.28, the backend for "navigate" jobs changed from Pyppeteer to Playwright.

In the logs you posted above, it says:

2023-12-02 21:18:07,377 reporters INFO: Submitting with stdout (<class 'urlwatch.reporters.StdoutReporter'>)
2023-12-02 21:18:07,377 reporters INFO: Submitting with email (<class 'urlwatch.reporters.EMailReporter'>)
2023-12-02 21:18:07,378 reporters DEBUG: Not sending e-mail (no changes)
2023-12-02 21:18:07,378 reporters INFO: Submitting with ifttt (<class 'urlwatch.reporters.IFTTTReport'>)
2023-12-02 21:18:07,378 reporters INFO: Submitting with pushover (<class 'urlwatch.reporters.PushoverReport'>)
2023-12-02 21:18:07,378 reporters DEBUG: Not sending pushover (no changes)

It seems like you have multiple reporters configured (stdout + ifttt appear in addition to e-mail and pushover?).

Also, according to the logs, this specific run didn't cause any e-mail or pushover notification to be sent?

Can you post (with private information replaced with "XXX"):

The full urlwatch.yaml config file
The full urls.yaml (you might have already posted it above?)

gaia · 2023-12-10T00:18:34Z

@thp the issue is with rye.

since installing it with pipx it's been working fine.

I am new to rye though I've been using urlwatch for years and find it extremely helpful. rye looks very promising, so maybe you could take a look over there and give Armin a hand (if you find the time)?

thp · 2023-12-10T10:30:17Z

Not sure if rye can cause run-time issues like this, maybe some direct or indirect dependencies of urlwatch are installed in the wrong version?

@jkmaxwell did you use rye to install or is your issue unrelated to rye?

bb6 · 2024-01-13T21:09:28Z

Encountering the same issue, where the reporter is frequently reporting blanks.

Started several days ago when it appears that urlwatch was somehow updated, it was updated from 2.28 to 2.28_4. (2.28_4 was installed through Homebrew while other dependencies were updated).

Similar to above, the test-diff-filter shows the past several months of update history, and test-filter shows the current update. Inspecting the urlwatch cache shows that the history is being stored without issues. The job's cache guid has also not changed and is consistent. But every few hours the reporter will fire with NEW (not CHANGES), and the report will show there are no additions nor changes (it is blank/identical to above).

Inspection with --verbose looks similar, where the filter is applied, reports are made with stdout and telegram, and the debug states (no changes).
2024-01-13 15:14:58,353 reporters INFO: Submitting with stdout (<class 'urlwatch.reporters.StdoutReporter'>)
2024-01-13 15:14:58,353 reporters INFO: Submitting with telegram (<class 'urlwatch.reporters.TelegramReporter'>)
2024-01-13 15:14:58,353 reporters DEBUG: Not calling telegram API (no changes)

Uninstalling and reinstalling urlwatch hasn't seemed to have made any difference, as the blank reports are still surfacing. rye is not installed on the machine.

jkmaxwell · 2024-01-18T06:28:35Z

installed via brew here as well, not rye

bb6 · 2024-01-19T04:10:55Z

So far, have run into a single ERROR report - aside from the blank "NEW" reports. The error seems to be identical to the ERROR that @gaia encountered in the rye issues thread, except the use of brew in pathing instead of rye.

Traceback (most recent call last):
  File "/usr/local/Cellar/urlwatch/2.28_4/libexec/lib/python3.12/site-packages/urlwatch/handler.py", line 106, in process
    self.load()
  File "/usr/local/Cellar/urlwatch/2.28_4/libexec/lib/python3.12/site-packages/urlwatch/handler.py", line 85, in load
    self.old_data, self.timestamp, self.tries, self.etag = self.cache_storage.load(self.job, guid)
                                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/urlwatch/2.28_4/libexec/lib/python3.12/site-packages/urlwatch/storage.py", line 571, in load
    for data, timestamp, tries, etag in CacheEntry.query(self.db,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: not enough values to unpack (expected 4, got 0)

The "NEW" blank reports only occur on a subset of the jobs.

After receiving the "NEW" blank reports for up to a week, they seemingly only occur when the computer is actively in use. If the computer is asleep/urlwatch is still running with a scheduled cron job, it appears the "NEW" blank reports will not occur.

Edit: March 4, 2024- This error is still occurring. For example, my reporter had fired approximately 40 times with "CHANGES" from July 2023 to January 2024 (around 8 months). From January to March, the reporter has fired approximately 585 times containing blank "NEW" reports, and only 5 of these reports have actually contained changes.

mhalano · 2024-03-04T23:42:48Z

So far, have run into a single ERROR report - aside from the blank "NEW" reports. The error seems to be identical to the ERROR that @gaia encountered in the rye issues thread, except the use of brew in pathing instead of rye.
Traceback (most recent call last):
  File "/usr/local/Cellar/urlwatch/2.28_4/libexec/lib/python3.12/site-packages/urlwatch/handler.py", line 106, in process
    self.load()
  File "/usr/local/Cellar/urlwatch/2.28_4/libexec/lib/python3.12/site-packages/urlwatch/handler.py", line 85, in load
    self.old_data, self.timestamp, self.tries, self.etag = self.cache_storage.load(self.job, guid)
                                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/Cellar/urlwatch/2.28_4/libexec/lib/python3.12/site-packages/urlwatch/storage.py", line 571, in load
    for data, timestamp, tries, etag in CacheEntry.query(self.db,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: not enough values to unpack (expected 4, got 0)
The "NEW" blank reports only occur on a subset of the jobs.

After receiving the "NEW" blank reports for up to a week, they seemingly only occur when the computer is actively in use. If the computer is asleep/urlwatch is still running with a scheduled cron job, it appears the "NEW" blank reports will not occur.

Edit: March 4, 2024- This error is still occurring. For example, my reporter had fired approximately 40 times with "CHANGES" from July 2023 to January 2024 (around 8 months). From January to March, the reporter has fired approximately 585 times containing blank "NEW" reports, and only 5 of these reports have actually contained changes.

Giving my sad face emoji because this also happens to me. Or I get this Python error or a NEW entry appears.

Jamstah · 2024-03-05T12:03:38Z

@mhalano can you help me with a recreate? If you can help me trigger the issue I can try and work out why its happening. Are you also installing with rye?

mhalano · 2024-03-05T12:10:27Z

No. I generated a wheel file from source and installed with pip. I think the problem is related to Python version 3.12. Version 3.11 didn't have any problems. I can install using pip or Ubuntu package manager. The question is I'm using the development version, that uses Python version 3.12. But sure, I will help you.

Jamstah · 2024-03-05T12:12:31Z

Can you share the commands you used to build and install the wheel, and the exact python version?

Jamstah · 2024-03-05T12:21:21Z

A pip list --format freeze would be useful too.

mhalano · 2024-03-05T12:30:14Z

I built from master but the most recent commit was e7a4d004425bdb275201ec5af45b24ac8c8d1946.
The commands I used:

python3 setup.py bdist_wheel
pip3 install --force-reinstall --user dist/urlwatch-2.28-py3-none-any.whl

My Python version is 3.12.2

And pip list --format freeze, as requested:

alabaster==0.7.12
ansible==9.3.0
ansible-core==2.16.4
anyio==4.2.0
apparmor==4.0.0a4
appdirs==1.4.4
argcomplete==3.1.4
attrs==23.2.0
Babel==2.10.3
bcrypt==3.2.2
beautifulsoup4==4.12.3
blessed==1.20.0
blinker==1.7.0
BlinkStick==1.2.0
Brlapi==0.8.5
Brotli==1.1.0
certifi==2024.2.2
cffi==1.16.0
chardet==5.2.0
charset-normalizer==3.3.2
click==8.1.6
cloud-init==24.1
cloudscraper==1.2.71
colorama==0.4.6
command-not-found==0.3
configobj==5.0.8
construct==2.10.68
construct-legacy==2.5.3
cryptography==42.0.5
cssselect==1.2.0
cupshelpers==1.0
dbus-python==1.3.2
decorator==5.1.1
defer==1.0.6
defusedxml==0.7.1
Deprecated==1.2.14
devscripts==2.23.7
diceware==0.10
distlib==0.3.8
distro==1.9.0
distro-info==1.7
docker==5.0.3
docutils==0.20.1
dput==1.39
duplicity==2.1.4
enlighten==1.12.4
evdev==1.7.0
exceptiongroup==1.2.0
fasteners==0.18
filelock==3.13.1
freetype-py==2.4.0
future==0.18.2
gpg==1.18.0
h11==0.14.0
html5lib==1.1
httpcore==1.0.2
httpie==3.2.2
httplib2==0.20.4
httpx==0.26.0
idna==3.6
imagesize==1.4.1
img2pdf==0.5.1
importlib-metadata==4.12.0
iotop==0.6
jaraco.classes==3.3.1
jeepney==0.8.0
Jinja2==3.1.2
jmespath==1.0.1
jq==1.6.0
jsonpatch==1.32
jsonpointer==2.0
jsonschema==4.10.3
keyring==24.3.0
language-selector==0.1
launchpadlib==1.11.0
lazr.restfulclient==0.14.6
lazr.uri==1.0.6
LibAppArmor==4.0.0a4
libevdev==0.5
libvirt-python==10.0.0
lockfile==0.12.2
louis==3.28.0
lxml==5.1.0
Mako==1.3.2.dev0
Markdown==3.5.2
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
meson==1.3.2
minidb==2.0.7
monotonic==1.6
more-itertools==10.2.0
multidict==6.0.4
mutagen==1.46.0
nala==0.15.1
netifaces==0.11.0
nsz==4.6.1
ntlm-auth==1.5.0
oauthlib==3.2.2
olefile==0.46
packaging==23.2
paramiko==2.12.0
pbr==5.11.1
pefile==2023.2.7
pexpect==4.9.0
pikepdf==8.7.1
pillow==10.2.0
pip==24.0
platformdirs==4.2.0
prefixed==0.7.0
protobuf==4.21.12
psutil==5.9.8
ptyprocess==0.7.0
pycairo==1.25.1
pycparser==2.21
pycryptodome==3.20.0
pycryptodomex==3.20.0
pycups==2.0.1
Pygments==2.17.2
PyGObject==3.47.0.dev0
PyJWT==2.7.0
pykerberos==1.1.14
PyNaCl==1.5.0
pyparsing==3.1.1
pyrsistent==0.20.0
pyserial==3.5
PySocks==1.7.1
pyte==0.8.0
python-apt==2.7.6
python-dateutil==2.8.2
python-debian==0.1.49+ubuntu2
python-magic==0.4.27
python-xlib==0.33
pytz==2024.1
pyudev==0.24.0
pyusb==1.2.1.post2
pywinrm==0.4.3
pywinusb==0.4.2
pyxdg==0.28
PyYAML==6.0.1
reportlab==4.1.0
requests==2.31.0
requests-ntlm==1.1.0
requests-toolbelt==1.0.0
resolvelib==1.0.1
rich==13.7.0
rlPyCairo==0.3.0
rofication==1.2.2
roman==3.3
screen-resolution-extra==0.0.0
SecretStorage==3.3.3
setproctitle==1.3.3
setuptools==68.1.2
six==1.16.0
smartypants==2.0.0
sniffio==1.3.0
snowballstemmer==2.2.0
socksio==1.0.0
soupsieve==2.5
Sphinx==7.2.6
sphinx-rtd-theme==2.0.0
sphinxcontrib-jquery==4.1
ssh-import-id==5.11
stevedore==5.1.0
systemd-python==235
terminator==2.1.3
toml==0.10.2
tomli==2.0.1
typer==0.9.0
typing_extensions==4.9.0
typogrify==2.0.7
ubuntu-drivers-common==0.0.0
ubuntu-pro-client==8001
ufw==0.36.2
unattended-upgrades==0.1
unidiff==0.7.3
urllib3==2.2.1
urlwatch==2.28
usb-creator==0.3.16
virtualenv==20.25.0+ds
virtualenv-clone==0.5.7
virtualenvwrapper==4.8.4
wadllib==1.3.6
wcwidth==0.2.5
webencodings==0.5.1
websocket-client==1.7.0
websockets==10.4
wheel==0.42.0
wrapt==1.15.0
xdg==5
xkit==0.0.0
xmltodict==0.13.0
yt-dlp==2023.12.30
zipp==1.0.0
zstandard==0.22.0

Jamstah · 2024-03-05T13:48:34Z

Thanks. I managed to make it fail and have been having a play.

I expect this is a race in the cache. If I switch MAX_WORKERS in worker.py to 1, I don't get the errors any more. It would be good to hear if the same applies for you, but I'm assuming there's a DB lock somewhere that's causing this.

Jamstah · 2024-03-05T14:56:49Z

@thp would be interested in your thought on this one. I'm not sure where to start looking, especially as minidb seems to be using locks on both read and write already. I might potentially try moving the job_state load call out of the parallelism.

I'm fairly sure its the minidb cache, because I also tries switching to the old file based caching mechanism, and that didn't exhibit the same problems. That one does also have a different relationship with history though.

Jamstah · 2024-03-05T15:03:53Z

Yes, moving the load to not attempt to be parallel fixes the bug for me: https://github.com/thp/urlwatch/compare/master...Jamstah:threading?expand=1

Its a workaround though, minidb should be doing the locking already, unless RLock is doing something odd with the threading around python vs system threads.

My local system python is currently 3.12.1 and it doesn't exhibit the symptoms. To show the bug I ran in a container with 3.12.2. I'll update my system to 3.12.2 and see if that's the underlying change that exposes the bug too.

Jamstah · 2024-03-05T15:11:35Z

Yes, I upgraded my system to python 3.12.2, and the bug appeared, so its definitely a change in 3.12.2 that exposes the bug.

Don't see anything obvious in the changelog though: https://docs.python.org/3/whatsnew/changelog.html#python-3-12-2-final

Jamstah · 2024-03-06T10:45:36Z

Added a test that runs static command jobs 100 times which flags up the error reliably on python 3.12.2, so we can add that along with whatever solution we decide to go with: https://github.com/thp/urlwatch/compare/master...Jamstah:threading?expand=1

No PR yet because I'm not happy with the solution.

bb6 · 2024-03-06T13:03:39Z

I expect this is a race in the cache. If I switch MAX_WORKERS in worker.py to 1, I don't get the errors any more. It would be good to hear if the same applies for you, but I'm assuming there's a DB lock somewhere that's causing this.

24 hours after switching MAX_WORKERS in worker.py from 10 to 1, there has not been a "blank NEW" report, nor have there been any error messages.

mhalano · 2024-03-06T19:14:02Z

I expect this is a race in the cache. If I switch MAX_WORKERS in worker.py to 1, I don't get the errors any more. It would be good to hear if the same applies for you, but I'm assuming there's a DB lock somewhere that's causing this.

24 hours after switching MAX_WORKERS in worker.py from 10 to 1, there has not been a "blank NEW" report, nor have there been any error messages.

This approach couldn't affect negatively the execution speed? I think 10 workers working in parallel is better than serialize everything to just one single worker, but this is a well-educated guess. I don't know if this is how urlwatch works.

Jamstah · 2024-03-06T21:58:37Z

Yes, that will slow down execution speed. That test was really to check if it was a race condition, that isn't how I would solve the problem.

One option is here, which just moves loading the cache to be serialized (which it already was effectively because the database is serialized), then checks the urls in parallel: https://github.com/thp/urlwatch/compare/master...Jamstah:threading?expand=1

I'd like to wait for @thp to take a look though, because he wrote both urlwatch and the database layer, so may have some more thoughts.

thp · 2024-03-12T04:06:15Z

Yes, loading the cache doesn't need to be parallel, and loading it upfront seems like a great idea.

Jamstah · 2024-03-12T10:43:17Z

Thanks, have opened the PR.

thp · 2024-04-24T18:56:04Z

Also, @Jamstah thanks for the test case for Python 3.12, I think this fixes it in minidb: thp/minidb@433ae34

thp · 2024-04-24T19:06:39Z

minidb 2.0.8 should contain the fix.

Threading was the cause of the issue in thp#779, so adding a test that will exercise multi-threading.

Threading was the cause of the issue in #779, so adding a test that will exercise multi-threading. Co-authored-by: Thomas Perl <[email protected]>

This was referenced Mar 4, 2024

Randomly "not enough values to unpack" Python errors #799

Closed

Cache inconsistency creating new items from nowhere #800

Closed

Jamstah mentioned this issue Mar 12, 2024

Add concurrency test #806

Merged

thp added a commit that referenced this issue Apr 24, 2024

Upgrade minidb to 2.0.8 (fixes #779)

325844b

thp linked a pull request Apr 24, 2024 that will close this issue

Upgrade minidb to 2.0.8 (fixes #779) #816

Merged

thp closed this as completed in #816 Apr 24, 2024

thp added a commit that referenced this issue Apr 24, 2024

Upgrade minidb to 2.0.8 (fixes #779) (#816)

17d02c4

Jamstah added a commit to Jamstah/urlwatch that referenced this issue May 5, 2024

Add tests that exercise threading

937201a

Threading was the cause of the issue in thp#779, so adding a test that will exercise multi-threading.

Jamstah added a commit to Jamstah/urlwatch that referenced this issue May 5, 2024

Add tests that exercise threading

4c467b0

Threading was the cause of the issue in thp#779, so adding a test that will exercise multi-threading.

thp added a commit that referenced this issue May 18, 2024

Add tests that exercise threading (#806)

8af9a1b

Threading was the cause of the issue in #779, so adding a test that will exercise multi-threading. Co-authored-by: Thomas Perl <[email protected]>

bb6 mentioned this issue Sep 19, 2024

Reports flipping between adding and removing all data #829

Closed

RunningDroid mentioned this issue Dec 30, 2024

urlwatch: update to 2.29 void-linux/void-packages#53768

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reporting blanks #779

Reporting blanks #779

jkmaxwell commented Dec 3, 2023 •

edited

Loading

jkmaxwell commented Dec 3, 2023

thp commented Dec 3, 2023

jkmaxwell commented Dec 3, 2023 •

edited

Loading

gaia commented Dec 4, 2023

thp commented Dec 9, 2023

gaia commented Dec 10, 2023

thp commented Dec 10, 2023

bb6 commented Jan 13, 2024 •

edited

Loading

jkmaxwell commented Jan 18, 2024 •

edited

Loading

bb6 commented Jan 19, 2024 •

edited

Loading

mhalano commented Mar 4, 2024

Jamstah commented Mar 5, 2024

mhalano commented Mar 5, 2024

Jamstah commented Mar 5, 2024 •

edited

Loading

Jamstah commented Mar 5, 2024

mhalano commented Mar 5, 2024

Jamstah commented Mar 5, 2024

Jamstah commented Mar 5, 2024

Jamstah commented Mar 5, 2024

Jamstah commented Mar 5, 2024 •

edited

Loading

Jamstah commented Mar 6, 2024 •

edited

Loading

bb6 commented Mar 6, 2024 •

edited

Loading

mhalano commented Mar 6, 2024

Jamstah commented Mar 6, 2024

thp commented Mar 12, 2024

Jamstah commented Mar 12, 2024

thp commented Apr 24, 2024

thp commented Apr 24, 2024

Reporting blanks #779

Reporting blanks #779

Comments

jkmaxwell commented Dec 3, 2023 • edited Loading

urlwatch.yaml

urls.yaml

Logs

test-diff-filter

test-filter

--verbose

Problem

jkmaxwell commented Dec 3, 2023

thp commented Dec 3, 2023

jkmaxwell commented Dec 3, 2023 • edited Loading

gaia commented Dec 4, 2023

thp commented Dec 9, 2023

gaia commented Dec 10, 2023

thp commented Dec 10, 2023

bb6 commented Jan 13, 2024 • edited Loading

jkmaxwell commented Jan 18, 2024 • edited Loading

bb6 commented Jan 19, 2024 • edited Loading

mhalano commented Mar 4, 2024

Jamstah commented Mar 5, 2024

mhalano commented Mar 5, 2024

Jamstah commented Mar 5, 2024 • edited Loading

Jamstah commented Mar 5, 2024

mhalano commented Mar 5, 2024

Jamstah commented Mar 5, 2024

Jamstah commented Mar 5, 2024

Jamstah commented Mar 5, 2024

Jamstah commented Mar 5, 2024 • edited Loading

Jamstah commented Mar 6, 2024 • edited Loading

bb6 commented Mar 6, 2024 • edited Loading

mhalano commented Mar 6, 2024

Jamstah commented Mar 6, 2024

thp commented Mar 12, 2024

Jamstah commented Mar 12, 2024

thp commented Apr 24, 2024

thp commented Apr 24, 2024

jkmaxwell commented Dec 3, 2023 •

edited

Loading

jkmaxwell commented Dec 3, 2023 •

edited

Loading

bb6 commented Jan 13, 2024 •

edited

Loading

jkmaxwell commented Jan 18, 2024 •

edited

Loading

bb6 commented Jan 19, 2024 •

edited

Loading

Jamstah commented Mar 5, 2024 •

edited

Loading

Jamstah commented Mar 5, 2024 •

edited

Loading

Jamstah commented Mar 6, 2024 •

edited

Loading

bb6 commented Mar 6, 2024 •

edited

Loading