Added oEmbed support to the Web plugin to improve title fetching by lodriguez · Pull Request #1613 · progval/Limnoria

lodriguez · 2025-02-01T16:11:35Z

Two methods are supported and can be configured:

Registry lookup via oembed.com (useOembedRegistry)
HTML discovery of endpoints (useOembedDiscovery)

Registry is tried first if enabled, then discovery if enabled,
finally falling back to HTML parsing.
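The lookup order described above can be sketched roughly as follows. This is only an illustration, not the code from this PR: `first_title`, `build_strategies`, and the three lookup callables are hypothetical names, and swallowing exceptions so that the next fallback still runs is an assumption about the desired behaviour.

```python
from typing import Callable, List, Optional

TitleLookup = Callable[[str], Optional[str]]


def build_strategies(use_registry: bool, use_discovery: bool,
                     registry_lookup: TitleLookup,
                     discovery_lookup: TitleLookup,
                     html_title: TitleLookup) -> List[TitleLookup]:
    """Order the lookups: registry, then discovery, then HTML parsing."""
    strategies = []
    if use_registry:
        strategies.append(registry_lookup)
    if use_discovery:
        strategies.append(discovery_lookup)
    # HTML parsing is always the last resort.
    strategies.append(html_title)
    return strategies


def first_title(url: str, strategies: List[TitleLookup]) -> Optional[str]:
    """Return the first non-empty title, trying each strategy in order."""
    for strategy in strategies:
        try:
            title = strategy(url)
        except Exception:
            # Assumption: a failing lookup should not block the fallbacks.
            continue
        if title:
            return title
    return None
```

With both options disabled, only the HTML parser is consulted, which matches the current behaviour of the title command.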

I think it would be nicer to add the oEmbed calls to getTitle().
I was trying to avoid using url_workaround(), since it would break support for Reddit: for example, old.reddit.com isn't in the registry. However, following redirects could actually be beneficial, especially for handling v.reddit.com.
For discovery, it makes sense to place this logic close to the title parser, as that would save an extra request. That said, I wanted to open the pull request first to gather feedback before making further adjustments.

progval

Nice, thanks.

The design looks fine, but I have a few nitpicks below.

progval · 2025-02-03T18:04:24Z

plugins/Web/config.py

+conf.registerGlobalValue(Web, 'useOembedRegistry',
+    registry.Boolean(False, _("""Determines whether the bot will use the 
+    oembed.com providers registry.""")))
+
+conf.registerGlobalValue(Web, 'useOembedDiscovery',
+    registry.Boolean(False, _("""Determines whether the bot will use HTML
+    discovery to find oEmbed endpoints.""")))


Name the config variables plugins.Web.oembed.registry and plugins.Web.oembed.discovery.

Could you also make them channel-specific, but not op-settable?
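A possible shape for the renamed options, as a sketch only. It assumes the usual preamble of a Limnoria plugin's config.py and that `conf.registerChannelValue` accepts an `opSettable` keyword; the help strings are copied from the diff above.

```python
import supybot.conf as conf
import supybot.registry as registry
from supybot.i18n import PluginInternationalization

_ = PluginInternationalization('Web')
Web = conf.registerPlugin('Web')

# Group the two options under plugins.Web.oembed.
conf.registerGroup(Web, 'oembed')

# Channel-specific, but (assumption) not changeable by channel ops.
conf.registerChannelValue(Web.oembed, 'registry',
    registry.Boolean(False, _("""Determines whether the bot will use the
    oembed.com providers registry.""")),
    opSettable=False)

conf.registerChannelValue(Web.oembed, 'discovery',
    registry.Boolean(False, _("""Determines whether the bot will use HTML
    discovery to find oEmbed endpoints.""")),
    opSettable=False)
```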

progval · 2025-02-03T18:07:16Z

plugins/Web/plugin.py

+                response = utils.web.getUrl(url, timeout=timeout)
+                text = response.decode('utf8', errors='replace')
+                match = re.search(
+                    r'<link[^>]+?type="application/json\+oembed"[^>]+?href="([^"]+)"',


That's insufficient: it does not cover different attribute orders or single quotes.
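One way to handle attribute order and quoting is to let an HTML parser extract the attributes instead of a regex. The sketch below uses the standard library's `html.parser`; `OEmbedLinkFinder` and `find_oembed_endpoint` are hypothetical names, and matching on the `type` attribute alone mirrors the regex in the diff rather than anything this PR settled on.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class OEmbedLinkFinder(HTMLParser):
    """Remember the href of the first <link type="application/json+oembed">."""

    def __init__(self):
        super().__init__()
        self.endpoint = None

    def handle_starttag(self, tag, attrs):
        if tag != 'link' or self.endpoint is not None:
            return
        # HTMLParser lowercases attribute names and strips either quote style,
        # so attribute order and quoting no longer matter here.
        attributes = {name: (value or '') for name, value in attrs}
        link_type = attributes.get('type', '').strip().lower()
        if link_type == 'application/json+oembed' and attributes.get('href'):
            self.endpoint = attributes['href']


def find_oembed_endpoint(html_text: str, base_url: str) -> Optional[str]:
    """Return the absolute oEmbed endpoint advertised by the page, if any."""
    finder = OEmbedLinkFinder()
    finder.feed(html_text)
    if finder.endpoint is None:
        return None
    # Resolve relative hrefs against the URL of the page itself.
    return urljoin(base_url, finder.endpoint)
```

The parser also decodes character references such as `&amp;` inside attribute values, which the regex would pass through unchanged.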

progval · 2025-02-03T18:08:03Z

plugins/Web/plugin.py

+                    re.IGNORECASE)
+                if match:
+                    endpoint = match.group(1)
+                    endpoint = endpoint.split('?')[0]


progval · 2025-02-03T18:09:24Z

plugins/Web/plugin.py

+            oembed_endpoint = self._getOEmbedEndpoint(url)
+            if not oembed_endpoint:
+                return None
+            oembed_url = f"{oembed_endpoint}?format=json&url={url}"


don't you need to escape the URL? (use urllib.parse.urlunparse)
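A minimal sketch of building the request URL with the target percent-encoded. It uses `urllib.parse.urlencode` as one way to do the escaping, and assumes the endpoint carries no query string of its own (the diff above strips it); `build_oembed_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_oembed_url(endpoint: str, url: str) -> str:
    """Append format and url parameters, percent-encoding their values."""
    # urlencode escapes '&', '?', '=' and similar characters inside `url`,
    # so they cannot be mistaken for parameters of the oEmbed request.
    query = urlencode({'format': 'json', 'url': url})
    return f"{endpoint}?{query}"
```

Without the escaping, a target such as `https://example.com/watch?v=1&t=2` would leak `t=2` into the oEmbed request as a separate parameter.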

progval · 2025-02-03T18:10:44Z

plugins/Web/test.py

+        def testtitleOembedRegistry(self):
+            try:
+                conf.supybot.plugins.Web.useOembedRegistry.setValue(True)
+                self.assertResponse(
+                    'title https://www.flickr.com/photos/bees/2362225867/',
+                    'Bacon Lollys')
+            finally:
+                conf.supybot.plugins.Web.useOembedRegistry.setValue(False)
+
+        def testtitleOembedDiscovery(self):
+            try:
+                conf.supybot.plugins.Web.useOembedDiscovery.setValue(True)
+                self.assertResponse(
+                    'title https://flickr.com/photos/bees/2362225867/',
+                    'Bacon Lollys')
+            finally:
+                conf.supybot.plugins.Web.useOembedDiscovery.setValue(False)
+
+        def testtitleOembedError(self):
+            try:
+                conf.supybot.plugins.Web.useOembedDiscovery.setValue(True)
+                self.assertError('title https://nonexistent.example.com/post/123')
+            finally:
+                conf.supybot.plugins.Web.useOembedDiscovery.setValue(False)


Suggested change

-        def testtitleOembedRegistry(self):
-            try:
-                conf.supybot.plugins.Web.useOembedRegistry.setValue(True)
-                self.assertResponse(
-                    'title https://www.flickr.com/photos/bees/2362225867/',
-                    'Bacon Lollys')
-            finally:
-                conf.supybot.plugins.Web.useOembedRegistry.setValue(False)
-
-        def testtitleOembedDiscovery(self):
-            try:
-                conf.supybot.plugins.Web.useOembedDiscovery.setValue(True)
-                self.assertResponse(
-                    'title https://flickr.com/photos/bees/2362225867/',
-                    'Bacon Lollys')
-            finally:
-                conf.supybot.plugins.Web.useOembedDiscovery.setValue(False)
-
-        def testtitleOembedError(self):
-            try:
-                conf.supybot.plugins.Web.useOembedDiscovery.setValue(True)
-                self.assertError('title https://nonexistent.example.com/post/123')
-            finally:
-                conf.supybot.plugins.Web.useOembedDiscovery.setValue(False)
+        def testtitleOembedRegistry(self):
+            with conf.supybot.plugins.Web.useOembedRegistry.context(True):
+                self.assertResponse(
+                    'title https://www.flickr.com/photos/bees/2362225867/',
+                    'Bacon Lollys')
+
+        def testtitleOembedDiscovery(self):
+            with conf.supybot.plugins.Web.useOembedDiscovery.context(True):
+                self.assertResponse(
+                    'title https://flickr.com/photos/bees/2362225867/',
+                    'Bacon Lollys')
+
+        def testtitleOembedError(self):
+            with conf.supybot.plugins.Web.useOembedDiscovery.context(True):
+                self.assertError('title https://nonexistent.example.com/post/123')

use the new style, it's shorter

lodriguez added 6 commits February 1, 2025 14:38

use oEmbed to check for title before parsing the page

ecd42ad

refactor oEmbed, only download json when needed

e7f79b5

add oEmbed discovery

eadac11

add config options useOembedRegistry and useOembedDiscovery

427845a

add oEmbed too title function

c1ceb77

add tests (result from flickr is different to the html-title-tag)

1a92dcd

progval reviewed Feb 3, 2025

lodriguez wants to merge 6 commits into progval:master from lodriguez:oEmbed
