Filter using external script #215

Open
pacien opened this issue Dec 8, 2024 · 2 comments

Comments

pacien commented Dec 8, 2024

Thanks for making this software.

Is your feature request related to a problem? Please describe.

Some feeds I'm subscribed to have some noise in them
(sponsored articles or articles from categories I'm not interested in).
I'd like to filter out those articles, and rewrite some of them.

Describe the solution you'd like

This can be solved by piping entries through an external filter script.
Such a script could be called with each individual entry (as JSON, for example),
allowing the external program to rewrite entries or drop them entirely.
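
As a rough illustration of what such a filter could look like, here is a minimal sketch in Python. It assumes a protocol that goeland does not define: one entry arrives as a JSON object on stdin, the (possibly rewritten) entry is printed to stdout, and printing nothing means the entry is dropped. The `title` field and the "sponsored" rule are placeholders, not goeland's actual schema.

```python
import json


def filter_entry(entry):
    """Return the (possibly rewritten) entry, or None to drop it.

    The "title" field name is an assumption made for illustration.
    """
    title = entry.get("title", "")
    if "sponsored" in title.lower():
        return None  # drop sponsored articles
    rewritten = dict(entry)  # copy so the input dict is left untouched
    rewritten["title"] = title.strip()  # example rewrite: tidy the title
    return rewritten


def run(stdin, stdout):
    """Hypothetical protocol: one JSON entry in, zero or one JSON entry out."""
    result = filter_entry(json.load(stdin))
    if result is not None:
        json.dump(result, stdout)
        stdout.write("\n")
```

To use this as a standalone script, one would call `run(sys.stdin, sys.stdout)` at the bottom of the file.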

Describe alternatives you've considered

Using email filters is hard because most of the information has already been
lost, and rewriting is even harder at this stage.

Proxying the source RSS/Atom feed to filter and rewrite it beforehand is
possible but quite cumbersome (it requires setting up a local web server just
for that). It also means running the script even for articles that goeland
already knows it has seen.

Additional context

Some examples of my use-cases:

  • omitting YouTube Shorts (using info from yt-dlp),
  • omitting articles by categories (filtering by URL),
  • fixing broken formatting from some sources.
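
Two of these use cases can be sketched as plain functions over a single entry. This is only an illustration: the `link` and `title` field names, the blocked path prefixes, and the whitespace repair are all assumptions, not part of goeland. The YouTube Shorts case is left out because it needs an external lookup through yt-dlp.

```python
import re
from urllib.parse import urlparse

# Hypothetical rule set: URL path prefixes for unwanted categories.
BLOCKED_PATH_PREFIXES = ("/sponsored/", "/sports/")


def is_blocked(link):
    """True if the entry URL falls under an unwanted category."""
    path = urlparse(link).path
    return path.startswith(BLOCKED_PATH_PREFIXES)


def fix_formatting(text):
    """Collapse runs of whitespace, as one example of a formatting repair."""
    return re.sub(r"\s+", " ", text).strip()


def apply_rules(entry):
    """Return the cleaned entry, or None if it should be omitted."""
    if is_blocked(entry.get("link", "")):
        return None
    cleaned = dict(entry)  # leave the caller's dict untouched
    cleaned["title"] = fix_formatting(entry.get("title", ""))
    return cleaned
```

Because the rules live in an external program, each user can keep such lists next to their own configuration without goeland needing to know about them.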
Author

pacien commented Dec 8, 2024

This would also cover #17 for sources whose content cannot be retrieved in a generic way.

Owner

slurdge commented Dec 23, 2024

Hello,

Sorry for the delay, IRL stuff is coming in right now.
I think it's a great idea. Having a simple way to pipe to external tools would solve many use cases.
However:

> Using email filters is hard because most of the information has already been lost, and rewriting is even harder at this stage.

I don't see how you would pipe at another stage. Either you get the entries from the feed, in which case you have them in parsed form, which is more or less what the filters have available, or you have to go one step upstream and work with the feed itself (but then it's unparsed).

Also, this poses the problem of the format. My initial guess would be to go with a JSON format, with XML or another format possibly selectable through a command-line flag, to provide some flexibility to outside programs.

From a security POV, you would have untrusted text/data going through programs other than goeland, but I think this is acceptable, even more so if the entries have been cleaned up (as is the case right now).
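
To make the piping idea concrete, here is a minimal sketch of the calling side, written in Python for brevity rather than in goeland's own language. Everything in it is an assumption: the `run_filter` helper, the JSON-on-stdin protocol, the "empty output means drop" convention, and the timeout value. The command is passed as an argument list, so no shell interprets the untrusted entry text.

```python
import json
import subprocess
import sys


def run_filter(command, entry, timeout=10.0):
    """Pipe one entry through an external command (hypothetical protocol).

    Returns the rewritten entry, or None if the command printed nothing,
    which is taken here to mean "drop this entry".
    """
    proc = subprocess.run(
        command,                  # argv list, so no shell is involved
        input=json.dumps(entry),  # the entry is sent as JSON on stdin
        capture_output=True,
        text=True,
        timeout=timeout,          # a stuck filter raises TimeoutExpired
        check=True,               # a failing filter raises CalledProcessError
    )
    output = proc.stdout.strip()
    return json.loads(output) if output else None


# Stand-in filter for demonstration: upper-cases the (assumed) "title" field.
UPPERCASE_FILTER = [
    sys.executable,
    "-c",
    "import json, sys; e = json.load(sys.stdin); "
    "e['title'] = e['title'].upper(); print(json.dumps(e))",
]
```

Calling `run_filter(UPPERCASE_FILTER, {"title": "hi"})` would return the entry with its title upper-cased. A filter that exits with an error surfaces as an exception, so the caller can decide whether to keep the original entry or skip it.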

I'm pretty busy right now, but I'll start fleshing out some implementation.
