what about ignoring traffic from bots/crawlers #12

Open
bnomei opened this issue Mar 21, 2023 · 14 comments

Comments

@bnomei

bnomei commented Mar 21, 2023

No description provided.

@arnoson
Owner

arnoson commented Mar 21, 2023

Most bots should already be ignored: the plugin uses both Matomo's DeviceDetector and Jaybizzle's CrawlerDetect.
I run it on my portfolio site for testing and nevertheless get some obvious bots. Do you have any idea how to improve this?

@arnoson
Owner

arnoson commented Mar 21, 2023

There is an undocumented arnoson.kirby-stats.debug option. When enabled, it logs the user agent and path, which gives more information on where the bot detection is failing.

@bnomei
Author

bnomei commented Mar 22, 2023

For my pageview counter plugin I used a tracking pixel below the first render fold. Not sure if you want to go that way.

@arnoson
Owner

arnoson commented Mar 22, 2023

Great solution, I will look into this (: I still like the simplicity of just using routes, though, and at least on my portfolio website I have a lot of sub-pages that don't scroll at all. I have to think about how I could handle this.

@grommasdietz
Contributor

Similar to the pixel below the first render fold, there is another technique to filter bots by looking for user interaction, e.g. by using a png or svg on body:hover: https://herman.bearblog.dev/how-bear-does-analytics-with-css/

(It still needs an additional style element on each page, though.)

@arnoson
Owner

arnoson commented Dec 27, 2023

Looks great @grommasdietz! I definitely think it needs some sort of client-side JS/CSS logic for bot filtering. Maybe the first step would be to create a tracking endpoint in this plugin to test these methods.

One thing I just realized, though, is that uBlock Origin blocks the tracking endpoint of the bearblog website. I'm not sure if this is because it is included in a block list or because of some rule based on the naming of the endpoint (including hit, ref, ...).

@arnoson
Owner

arnoson commented Dec 27, 2023

Just checked: it is because the hit endpoint of bearblog is blocked by https://easylist.to/

bnomei added a commit to bnomei/kirby3-pageviewcounter that referenced this issue Dec 27, 2023
@arnoson
Owner

arnoson commented Dec 28, 2023

I'm currently experimenting with an API endpoint for tracking, and it seems a CSS-only approach doesn't work. This is because I don't want to hash or save any IP data and instead use the referrer to determine whether something is a visit or just a view (internal navigation inside the website). With the current route hook approach I can read the referrer, but when using an endpoint I would have to send any information I need myself. Right now I'm thinking about something like this as a start:

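// Skip reloads so that refreshing a page is not counted again.
// Note: performance.navigation is deprecated; type 1 means "reload".
const isReload = performance.navigation.type === 1
if (!isReload) {
  const data = new FormData()
  data.append('path', location.pathname)
  data.append('referrer', document.referrer)
  // sendBeacon queues an asynchronous POST request.
  navigator.sendBeacon('/stats/handle', data)
}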

Additionally, we could trigger the endpoint only if a certain event happens or after a timeout of, say, 5 seconds. GoatCounter's count.js might be a helpful resource.
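The "only after an event or a timeout" idea could be sketched roughly as follows. This is an illustration, not the plugin's code: trackAfterInteraction, its options, the event list and the send callback are all hypothetical names, and the 5000 ms default simply mirrors the 5 seconds mentioned above.

```javascript
// Hypothetical helper (not part of the plugin): call `send` exactly once,
// either on the first user interaction or after `timeoutMs`, whichever
// happens first.
function trackAfterInteraction(target, send, options = {}) {
  // Assumed defaults; the event list is only an example.
  const events = options.events ?? ['scroll', 'mousemove', 'keydown', 'touchstart']
  const timeoutMs = options.timeoutMs ?? 5000
  let sent = false

  const sendOnce = () => {
    if (sent) return
    sent = true
    clearTimeout(timer)
    send()
  }

  const timer = setTimeout(sendOnce, timeoutMs)

  // `once: true` lets each listener remove itself after it has fired.
  events.forEach((name) => target.addEventListener(name, sendOnce, { once: true }))

  // Returned so that callers can cancel a pending hit.
  return () => {
    sent = true
    clearTimeout(timer)
  }
}
```

In a browser this might be called as trackAfterInteraction(document, () => navigator.sendBeacon('/stats/handle', data)), with data built as in the snippet above.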

@grommasdietz
Contributor

grommasdietz commented Dec 28, 2023

I’m definitely not up to date on best practices for this topic and don’t have the insights you have. While I prefer a way of handling statistics without additional images, CSS or JS, shouldn’t it still be possible to trigger any PHP logic by returning the image from a simple route?

'routes' => [
  [
    'pattern' => 'statistics/(:all).svg',
    'action'  => function ($all) {
      $path = $all == '' ? option('home', 'home') : $all;
      $page = page($path);

      if (!$page) {
        return page('error');
      }

      // Handle necessary plugin logic

      $content = '<svg xmlns="http://www.w3.org/2000/svg" width="1" height="1"></svg>';

      return new Response($content, 'image/svg+xml');
    },
  ],
],

The Kirby snippet could look like this:

<style>
  body:hover {
    border-image: url("/statistics<?= Url::short($page->url()) ?>.svg");
  }
</style>

@arnoson
Owner

arnoson commented Dec 29, 2023

The problem with this is that we lose the referrer and therefore can't distinguish between a view and a visit. Most analytics tools I know of use the hashed IP address to do this instead. We could send the referrer with PHP:

<style>
  body:hover {
    border-image: url("/statistics<?= Url::short($page->url()) ?>/<?= $_SERVER['HTTP_REFERER'] ?>.svg");
  }
</style>

but this won't work with caching. So I guess it is either sending the referrer with JS or using another method to distinguish views from visits. But maybe I'm missing something.
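For illustration, the referrer-based distinction between a visit and a view could look roughly like this when done in a script. This is only a sketch under assumptions: classifyHit is a hypothetical name, and treating an empty or unparsable referrer as a visit is a guess, not necessarily what the plugin does.

```javascript
// Hypothetical classifier (an assumption, not the plugin's logic):
// a hit is a "view" when the referrer points to our own host (internal
// navigation) and a "visit" otherwise.
function classifyHit(referrer, ownHost) {
  // Empty referrer: direct entry, bookmark, or referrer stripped by policy.
  if (!referrer) return 'visit'
  try {
    return new URL(referrer).host === ownHost ? 'view' : 'visit'
  } catch {
    // Unparsable referrer: treat it as external.
    return 'visit'
  }
}
```

In the browser this could be called as classifyHit(document.referrer, location.host). Because the value is read at runtime, it is not affected by the HTML cache.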

@grommasdietz
Contributor

Ah okay! So even when adding a random hash on each page load to the border image to avoid caching, the html still gets cached and the image/navigation won’t be recognised?

Just hopped on to the discussion after finding out about the technique used by bearblog. I’m sure you’ll find a good way to improve the plugin logic.

Thanks for your work, looking into the ideas and explaining your considerations!

@arnoson
Owner

arnoson commented Dec 29, 2023

Yes, I meant the Kirby HTML cache. If it is enabled, the referrer part <?= $_SERVER['HTTP_REFERER'] ?> in my version of your svg example will be cached along with the HTML, so a stale referrer will be sent to the route. So yeah, maybe a super simple script is the best option. This would also allow adding some additional logic to filter bots in the future. Thanks for your input and interest in this plugin :) It motivates me to continue the development now that other people want to use it too!

@arnoson
Owner

arnoson commented Dec 29, 2023

I released a new version that provides an endpoint (/kirby-stats/hit) and a simple script to call it after user interaction.

Edit: composer/packagist didn't pick up v0.0.7, so I had to release v0.0.8, which is basically the same.

@grommasdietz
Contributor

grommasdietz commented Dec 30, 2023

Looks promising. Just to make sure I get it: we have to call the script’s function each time a page is loaded, e.g. after an ajax request, right? I think the removeEventListeners function needs a slight correction:

  const removeEventListeners = () =>
    events.forEach((e) => document.removeEventListener(e, sendStats, eventOptions))

  1. The function used addEventListener instead of removeEventListener, probably a mistake?

  2. Not sure if it is necessary, but it is safer to pass the eventOptions on removal as well:

    It's worth noting that some browser releases have been inconsistent on this, and unless you have specific reasons otherwise, it's probably wise to use the same values used for the call to addEventListener() when calling removeEventListener().

    MDN web docs

Created a pull request!
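As a rough illustration of the second point, keeping the event names and options in one place makes it hard for the add and remove calls to drift apart. This is a sketch and not the code from the pull request: createInteractionListeners, the event list and the option values are assumptions.

```javascript
// Hypothetical sketch (not the plugin's actual code): share one `events`
// array and one `eventOptions` object so that removeEventListener() gets
// exactly the same arguments as addEventListener().
function createInteractionListeners(target, handler) {
  const events = ['scroll', 'mousemove', 'keydown'] // assumed event list
  const eventOptions = { capture: true } // assumed options

  const add = () =>
    events.forEach((e) => target.addEventListener(e, handler, eventOptions))

  const remove = () =>
    events.forEach((e) => target.removeEventListener(e, handler, eventOptions))

  return { add, remove }
}
```

In the browser, target would be document and handler would be the function that sends the stats.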
