Tracking downloads with plausible and ERDDAP #172

JessyBarrette · 2024-02-15T16:32:03Z

#166 is close to running as intended and only needs some fine-tuning and cleanup. One remaining issue is tracking data downloads via plausible.

We can easily deploy scripts via the init.d folder within an erddap container, and I think this would be the ideal way to implement the plausible download tracker.

@fostermh How is your log scraper for tracking downloads set up right now? Can we include the dependencies within a Dockerfile and add the script via an executable file?

See example of the Dockerfile: here

and the list of executables to run on container start here: https://github.com/HakaiInstitute/hakai-datasets/tree/caprover-deploy/init.d

Any file within init.d that is set as executable or matches *.sh will be executed when a container is started.
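To make the selection rule above concrete, here is a minimal sketch of which files would be picked up. It only mirrors the rule as described in this comment; the function name `startup_scripts` and the alphabetical ordering are assumptions, not the container's actual entrypoint logic.

```python
from fnmatch import fnmatch
from pathlib import Path


def startup_scripts(init_dir):
    """Return names of files that are executable or match *.sh.

    Assumption: files are visited in alphabetical order; the real
    container entrypoint may order them differently.
    """
    selected = []
    for path in sorted(Path(init_dir).iterdir()):
        if not path.is_file():
            continue
        # Any execute bit (user, group or other) counts as "set as executable".
        has_exec_bit = bool(path.stat().st_mode & 0o111)
        if has_exec_bit or fnmatch(path.name, "*.sh"):
            selected.append(path.name)
    return selected
```

Running this against a local copy of the init.d folder is a quick way to check that a new tracker script would actually be picked up before deploying.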


JessyBarrette · 2024-02-15T20:19:20Z

After talking with @fostermh:

Since the method @fostermh currently uses relies on the Apache log on top of the container, we will attempt to log usage with a second container running in parallel that accesses the nginx log of the caprover erddap app.

The initial method used Apache; Matt will review whether it is compatible with NGINX, which caprover uses.

JessyBarrette · 2024-02-15T20:20:19Z

Consequently, we will set this issue aside while developing #166.

fostermh · 2024-02-15T20:44:42Z

Expanding on the details Jessy has already posted.

We will attempt to use telegraf running in a docker container to parse the nginx logs from caprover. This mirrors the current production setup in which erddap is running behind a proxy (apache) and telegraf is used to scrape the apache logs and report back on erddap usage.

TODO:

- export nginx logs from the caprover nginx container so other containers can access them. See the following for details:
  - [Question] Can nginx logs be persistent for protection with fail2ban? caprover/caprover#880
  - https://caprover.com/docs/nginx-customization.html

  In short, update the nginx config, via the caprover setup page, to send access logs to /nginx-shared/
- mount nginx logs in the telegraf container (/captain/data/nginx-shared/) and adjust log scraper settings as needed
- set up plausible page and add key to telegraf container config
- set up sentry project and add key to telegraf container config (for cron job like monitoring)
- adjust sentry cron timeout so we don't get spammed and hate matt
- profit.
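As a rough illustration of the log-scraping step in the list above, the sketch below parses one access-log line and flags ERDDAP data requests. It assumes the log uses nginx's default `combined` format, which the caprover config may not; the `DOWNLOAD_PREFIXES` values and function names are hypothetical, and this is Python rather than the telegraf configuration actually used in cwatch-telegraf.

```python
import re

# Assumption: nginx's default "combined" log format.
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) \S+ (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

# Hypothetical: which ERDDAP paths count as a data download.
DOWNLOAD_PREFIXES = ("/erddap/tabledap/", "/erddap/griddap/")


def parse_line(line):
    """Return the named fields of one log line, or None if it does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


def is_download(entry):
    """True for successful GET requests under one of the download prefixes."""
    return (
        entry["method"] == "GET"
        and entry["status"] == "200"
        and entry["path"].startswith(DOWNLOAD_PREFIXES)
    )
```

Lines that do not match the assumed format come back as `None`, so a format mismatch shows up as dropped entries rather than wrong counts.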

https://github.com/cioos-siooc/cwatch-telegraf

steviewanders · 2024-02-15T20:48:20Z

👌🏼


-- Steve Vandervalk Hakai Institute


JessyBarrette · 2024-02-22T14:17:49Z

Just to add to this thread: it looks like ERDDAP itself also suggests using the Tomcat/Apache/(NGINX?) logs to track statistics and usage.
https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#tomcatLogs

@fostermh not sure if you started developing this after reading the ERDDAP docs but this is reassuring to see :)

fostermh · 2024-02-22T16:41:09Z

I had not read the docs, but yes, nice to see that we independently arrived at the same solution. Hopefully we are on the correct track. :-)

steviewanders · 2024-04-05T19:04:21Z

@fostermh Thanks, I'll give this a shot on the development instance next week.

steviewanders · 2024-04-09T17:31:50Z

Other options I've used in the past to parse NGINX logs out of the box.
These need to be ruled out before a custom setup is attempted.

steviewanders · 2024-04-12T23:42:38Z

[x] https://goaccess.io/

https://github.com/HakaiInstitute/erddap-goaccess

There is an example report.html in there that we will need to check against the requirements.

fostermh · 2024-04-15T21:06:40Z

telegraf is scraping logs for the development version of erddap

telegraf container has been set up https://captain.erddap.hakai.app/#/apps/details/telegraf
pushing to plausible https://plausible.server.hakai.app/development.erddap.hakai.app
and is checking in with sentry https://hakai-institute.sentry.io/crons/hakai-telegraf-erddap/hakai-erddap-telegraf-checkin/
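For readers unfamiliar with the "pushing to plausible" step, here is a hedged sketch of what a server-side event looks like against Plausible's Events API (`POST /api/event` with `name`, `url` and `domain` in a JSON body). The helper name `build_plausible_event` is hypothetical, and this is not taken from the cwatch-telegraf code, which may do it differently. The function only builds the request and does not send it.

```python
import json
import urllib.request


def build_plausible_event(plausible_base, domain, page_url, user_agent,
                          client_ip, name="pageview"):
    """Build (but do not send) a request for Plausible's Events API.

    Assumption: the original client's User-Agent and IP, taken from the
    access log, are forwarded so Plausible can attribute the visit.
    """
    payload = {"name": name, "url": page_url, "domain": domain}
    return urllib.request.Request(
        url=plausible_base.rstrip("/") + "/api/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "User-Agent": user_agent,
            "X-Forwarded-For": client_ip,
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the event; keeping the build step separate makes it easy to inspect the payload first.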

steviewanders · 2024-04-15T21:07:50Z

Amazing "scrapping".

At the next ERDDAP 2.0 meeting let's compare and contrast and figure out what's next or if this is oh so done.

fostermh · 2024-04-15T21:10:26Z

Haha, spelling corrected. Yes, it would be good to contrast, sounds good.

JessyBarrette · 2024-04-16T13:57:47Z

I just had a quick look at plausible, and for some reason the URLs it refers to have a section doubled:

https://development.erddap.hakai.appdevelopment.erddap.hakai.app/erddap/tabledap/HakaiBamfieldBoL5min.htmlTable

It must be something related to the plausible setup.

fostermh · 2024-04-16T18:18:01Z

Sorted. The host_url environment variable must contain the protocol, which I had forgotten to include in the telegraf setup. Thanks for noticing.
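A small guard like the following could catch this kind of misconfiguration at startup. It is a sketch only: the variable name `host_url` comes from this thread, but the function `validate_host_url` and where it would run are assumptions, not part of the actual telegraf setup.

```python
from urllib.parse import urlsplit


def validate_host_url(host_url):
    """Return host_url without a trailing slash, or raise if the protocol is missing."""
    parts = urlsplit(host_url)
    # Without "https://", urlsplit puts the whole value in .path and
    # leaves both .scheme and .netloc empty.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(
            f"host_url must include the protocol, e.g. https://example.org (got {host_url!r})"
        )
    return host_url.rstrip("/")
```

For example, `validate_host_url("development.erddap.hakai.app")` raises a `ValueError`, while the same value prefixed with `https://` passes.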

JessyBarrette · 2024-04-17T13:36:38Z

I just successfully ran the CDE harvester on development.erddap.hakai.app/erddap. You should now
have logged a number of different CSV queries to the erddap. :)

I think we should have all we need now

steviewanders · 2024-04-17T17:04:22Z

Nice! Let's meet at the end of this week and quickly review these two analytics setups.

steviewanders · 2024-04-22T23:45:54Z

@steviewanders needs to add this to an analytics section of the resulting documentation and diagram, but basically:
- https://github.com/allinurl/goaccess works to see all requests NGINX serves, which allows us to sort between monitoring bots, our own API requests (CDE et al.), and normal users of the HTML pages
- The specific setup for the above is here: https://github.com/HakaiInstitute/erddap-goaccess
- It relies upon an NGINX access.log being persisted outside the ERDDAP container to the EC2 file system, which could benefit from a simple backup in case of $something_bad
- Need to add a weekly cron job to generate a report and dump it and the log to S3
- Plausible can be, and has been, added to the HTML template for ERDDAP, so it runs as a client-side JavaScript tracker just like GA

Combined, these two address our two usage questions around ERDDAP.
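To illustrate the three-way split described above (monitoring bots, API requests, normal HTML users), here is a toy classifier. It is not how goaccess works internally; the marker lists `BOT_MARKERS` and `DATA_SUFFIXES` are hypothetical placeholders that would need tuning against the real access.log.

```python
from collections import Counter

# Hypothetical markers; adjust to the user agents and file types actually seen.
BOT_MARKERS = ("bot", "monitor", "uptime")
DATA_SUFFIXES = (".csv", ".json", ".nc")


def classify(path, user_agent):
    """Label one request as 'monitoring', 'api' or 'html'."""
    agent = user_agent.lower()
    if any(marker in agent for marker in BOT_MARKERS):
        return "monitoring"
    # Ignore the query string when looking at the requested file type.
    resource = path.split("?", 1)[0].lower()
    if resource.endswith(DATA_SUFFIXES):
        return "api"
    return "html"


def summarize(requests):
    """Count requests per label; requests is an iterable of (path, user_agent)."""
    return Counter(classify(path, agent) for path, agent in requests)
```

The monitoring check runs first on purpose, so a bot that requests a data file is still counted as monitoring rather than as a download.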

JessyBarrette assigned fostermh, JessyBarrette and steviewanders Feb 15, 2024

steviewanders unassigned JessyBarrette Feb 6, 2025
