[APM] Idea: Alternative transaction navigation for RUM #26544

@roncohen

The RUM agent does not know about the abstract page patterns that the website it is installed on uses (/blog/:blogID). It only knows the concrete page path: /blog/10-tips-when-youre-building-your-own-airplane. Sorting in Elasticsearch #26443 will greatly reduce the problems we have with high cardinality caused by concrete page names.

However, there are still cases where a whole section of a website should have a high impact but does not show up at the top of the list, because each page is counted only once or a few times when the path names contain variables or similar. For example, you might have a group of very slow pages under /feed/:userID. Because the path contains a user ID, the transaction names will be /feed/42, /feed/43, etc. Each user only looks at their own feed a few times, so every feed page is counted separately and never adds up to something significant compared to, for example, /blog/10-tips-when-youre-building-your-own-airplane, which is a page many people load. Another example is something like /my/search?q=every-search-is-a-snowflake.

The effect is that when the user logs in, they will not see pages with a high average response time that have been loaded only once or very rarely, because those pages drown in the sea of pages whose names have no IDs or parameters.

Details on why we can't get better transaction names

We rely on an API call that web developers installing the agent must make on each page load to set the transaction name. We hoped that setting the default transaction name to "unknown" would make it obvious that developers need to make a conscious decision and an effort to figure out the abstract path names and use them in the API call. Setting the default transaction name to the concrete path name would look correct in the UI at first glance, so developers would just move on, thinking the agent was installed correctly (we saw this in Opbeat).

However, developers don't necessarily have a single place where the URL structure is defined that they can just pull in and pass on to the RUM agent. Additionally, lots of users only have sporadic access to the "master" template of their website. They might be using an external consultancy to develop it etc.

So in the end, because it's the only convenient option, web developers just resort to setting the concrete page name as the transaction name.


Instead of trying to come up with better transaction names automatically or asking users to write complicated custom code to fix it, I suggest we change the navigation to be a path-hierarchy-based navigation for RUM. This is similar to the "Content" navigation in Google Analytics. You see the top-level paths first, with stats for every page that has that URL prefix:

1:
image
(numbers and order here are totally made up)

The user then clicks "https://www.elastic.co/guide/en" and sees its subpages with the stats for each:

2:
image
(again, numbers and order here are totally made up)
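As a rough illustration of the drill-down in the two mock-ups, the sketch below rolls concrete page URLs up to the next path segment below a chosen prefix and computes count, average and sum per group. It is plain Python over made-up sample data; `next_level`, `roll_up`, the example.com URLs and the durations are all hypothetical and not part of the RUM agent or the APM UI.

```python
from collections import defaultdict


def next_level(url, prefix):
    """Truncate url to one path segment below prefix (the full URL if there is none)."""
    start = len(prefix.rstrip("/")) + 1  # first character after "prefix/"
    cut = url.find("/", start)
    return url[:cut] if cut > 0 else url


def roll_up(samples, prefix):
    """Group (url, duration_us) samples under prefix by their next path level."""
    base = prefix.rstrip("/")
    groups = defaultdict(list)
    for url, duration_us in samples:
        if url == base or url.startswith(base + "/"):
            groups[next_level(url, base)].append(duration_us)
    rows = [
        {"path": path, "count": len(d), "avg": sum(d) / len(d), "sum": sum(d)}
        for path, d in groups.items()
    ]
    # Sort by summed duration (avg * count) as a stand-in for "impact".
    return sorted(rows, key=lambda r: r["sum"], reverse=True)


# Made-up samples: each /feed/:userID page is loaded once, the blog post three times.
samples = [
    ("https://example.com/feed/42", 900_000),
    ("https://example.com/feed/43", 1_100_000),
    ("https://example.com/feed/44", 1_000_000),
    ("https://example.com/blog/10-tips", 200_000),
    ("https://example.com/blog/10-tips", 220_000),
    ("https://example.com/blog/10-tips", 180_000),
]

for row in roll_up(samples, "https://example.com"):
    print(row)
```

With this sample data the three single-hit feed pages collapse into one /feed row that outranks the blog post, which is the effect the hierarchy navigation is meant to have. Calling `roll_up(samples, "https://example.com/feed")` would then list the individual feed pages.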

This should fix the problem of single/rare page URLs not being counted or seen anywhere. It would also mean we can probably use the page address as the default transaction name in the RUM agent instead of asking users to call apm.setInitialPageLoadName(name). If this works as intended, it will make setting up RUM much, much easier.

When using the hierarchy-based navigation, we could also consider adding the option to go from the list to the transaction group details for a specific path prefix instead of the full URL. In other words, give the user the choice between going a level deeper and going to a page showing all the transactions that match the prefix:

image
(again, a totally faked screenshot)

Path hierarchical querying

query for (1)

  • the number 29 is the length of the top-level filter including its trailing slash: https://www.elastic.co/guide/
  • sum is actually avg * count, so we can use that for impact directly

GET apm-6.4.0-transaction-2018.11.23-reindex/_search?size=0
{
  "query": {
    "match": {"context.page.url.hierarchical": "https://www.elastic.co/guide"}
  },
  "aggs": {
    "txs": {
      "terms": {
        "script": {
          "source": "def d = doc['context.page.url'].value; if (d.length() > 29) { def c = d.indexOf('/', 29); if (c>0) { return d.substring(0,c);}} return d",
          "lang": "painless"
        },
        "order": {
          "duration_sum": "desc"
        }
      },
      "aggs" : {
        "duration_avg" : { "avg": { "field" : "transaction.duration.us" } },
        "duration_sum" : { "sum": { "field" : "transaction.duration.us" } },
        "duration_p99" : { "percentiles": { "field" : "transaction.duration.us", "percents" : [99] } }
      }
    }
  }
}

query for (2)

GET apm-6.4.0-transaction-2018.11.23-reindex/_search?size=0
{
  "query": {
    "match": {"context.page.url.hierarchical": "https://www.elastic.co/guide/en"}
  },
  "aggs": {
    "txs": {
      "terms": {
        "script": {
          "source": "def d = doc['context.page.url'].value; if (d.length() > 32) { def c = d.indexOf('/', 32); if (c>0) { return d.substring(0,c);}} return d",
          "lang": "painless"
        },
        "order": {
          "duration_sum": "desc"
        }
      },
      "aggs" : {
        "duration_avg" : { "avg": { "field" : "transaction.duration.us" } },
        "duration_sum" : { "sum": { "field" : "transaction.duration.us" } },
        "duration_p99" : { "percentiles": { "field" : "transaction.duration.us", "percents" : [99] } }
      }
    }
  }
}
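The hard-coded offsets 29 and 32 in the two scripts are the length of the current prefix plus its trailing slash. A minimal Python port of the Painless logic, for illustration only (the name `bucket_key` and the ".../some/page" URL are made up here):

```python
def bucket_key(url, prefix):
    """Mirror the Painless terms script: cut url at the first '/' after 'prefix/'."""
    offset = len(prefix.rstrip("/")) + 1  # 29 for ".../guide", 32 for ".../guide/en"
    if len(url) > offset:
        cut = url.find("/", offset)
        if cut > 0:
            return url[:cut]
    return url  # no deeper level: the URL is its own bucket


url = "https://www.elastic.co/guide/en/some/page"
print(bucket_key(url, "https://www.elastic.co/guide"))
print(bucket_key(url, "https://www.elastic.co/guide/en"))
```

Deriving the offset from the prefix, rather than typing it in, is what a UI would have to do when it builds the query for whichever level the user clicked on.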

These queries rely on a path_hierarchy analyzer for context.page.url:

PUT apm-6.4.0-transaction-2018.11.23-reindex
{
  "settings": {
    "analysis": {
      "filter": {
        "url_stop": { 
          "type": "stop"
        }
      },
      "analyzer": {
        "page_hierarchy_analyzer": {
          "tokenizer": "path_hierarchy"
        }
      }
    }
  }
}

PUT apm-6.4.0-transaction-2018.11.23-reindex/doc/_mapping
{
  "properties": {
    "context.page.url": {
      "type": "keyword", 
      "fields": {
        "hierarchical": {
          "type": "text",
          "analyzer": "page_hierarchy_analyzer",
          "search_analyzer": "keyword"
        }
      }
    }
  }
}
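To see why a match on the hierarchical sub-field behaves like a prefix filter: the path_hierarchy tokenizer indexes the prefixes of the URL that end at a "/" boundary, while the keyword search analyzer leaves the query string as one token. The Python below is a simplified model of that behaviour, not the Lucene implementation; `path_prefixes`, `matches` and the sample URL are invented for this sketch.

```python
def path_prefixes(url, delimiter="/"):
    """Simplified model: every prefix that ends right before a delimiter, plus the whole string."""
    tokens = [url[:i] for i, ch in enumerate(url) if ch == delimiter and i > 0]
    tokens.append(url)
    return tokens


def matches(indexed_url, query):
    """The query stays a single token, so it must equal one of the indexed prefixes."""
    return query in path_prefixes(indexed_url)


doc_url = "https://www.elastic.co/guide/en/index.html"
print(matches(doc_url, "https://www.elastic.co/guide"))  # query ends at a segment boundary
print(matches(doc_url, "https://www.elastic.co/gui"))    # query ends mid-segment
```

In this model only queries that end on a whole path segment match, which is why the queries above pass the prefix without a trailing slash.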

Optimizations

We can avoid the performance hit of the script-based terms aggregation at the cost of a larger index. Instead of scripting, we would create fields for the first 3-4 levels and store them in the index. That lets us skip the scripted aggregation on the first 3-4 levels, where the amount of data is largest, and use it only for deeper levels, where there is significantly less data to aggregate over.

Example:

{
  "context.page.url": "https://www.elastic.co",
  "context.page.url.level1": "https://www.elastic.co",
  "context.page.url.level2": "https://www.elastic.co/guide",
  "context.page.url.level3": "https://www.elastic.co/guide/en"
}

Ingest pipeline to achieve this

This rudimentary ingest pipeline will parse the first levels. We could also imagine doing it in APM Server instead.

PUT _ingest/pipeline/levels
{
    "description": "parse levels",
    "processors": [
      {
        "script": {
          "source": """
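            def s = ctx['context.page.url'];
            // "https://" is 8 characters, so the first '/' found from index 8 ends the host.
            def i1 = s.indexOf('/', 8);
            def i2 = i1 < 0 ? -1 : s.indexOf('/', i1 + 1);
            def i3 = i2 < 0 ? -1 : s.indexOf('/', i2 + 1);
            // Fall back to the full URL when it has fewer levels, like the scripted aggregation does.
            ctx['context.page.url-levels.level1'] = i1 < 0 ? s : s.substring(0, i1);
            ctx['context.page.url-levels.level2'] = i2 < 0 ? s : s.substring(0, i2);
            ctx['context.page.url-levels.level3'] = i3 < 0 ? s : s.substring(0, i3);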
          """
        }
      }
    ]
}

Note: this also needs a separate mapping update
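For readers less familiar with Painless, this is roughly what the pipeline script computes, written as a Python sketch. Reusing the full URL when a path has fewer than three levels is an assumption added here so the function does not fail on short URLs; `url_levels` and the sample URL are made-up names.

```python
def url_levels(url, levels=3):
    """Return {'level1': ..., 'level2': ..., ...} with the leading prefixes of url."""
    result = {}
    pos = url.find("/", 8)  # skip the scheme; "https://" is 8 characters
    for n in range(1, levels + 1):
        # Assumption: once the URL runs out of segments, reuse the full URL.
        result[f"level{n}"] = url if pos < 0 else url[:pos]
        if pos >= 0:
            pos = url.find("/", pos + 1)
    return result


print(url_levels("https://www.elastic.co/guide/en/index.html"))
```

The same logic could live in APM Server instead of an ingest pipeline; the sketch only shows the string handling, not where it runs.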


For level 4 and up, we'd fall back to the scripted aggregation. This trades index size for faster queries.

This query would show all sub paths of https://www.elastic.co/guide, grouped by the third level: https://www.elastic.co/guide/*

GET apm-6.4.0-transaction-2018.11.23-reindex/_search?size=0
{
  "query": {
    "match": {"context.page.url.hierarchical": "https://www.elastic.co/guide"}
  },
  "aggs": {
    "txs": {
      "terms": {
        "field": "context.page.url.level3",
        "order": {
          "duration_sum": "desc"
        }
      },
      "aggs" : {
        "duration_avg" : { "avg": { "field" : "transaction.duration.us" } },
        "duration_sum" : { "sum": { "field" : "transaction.duration.us" } },
        "duration_p99" : { "percentiles": { "field" : "transaction.duration.us", "percents" : [99] } }
      }
    }
  }
}
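Putting the two approaches together, a client could use a stored level field when the requested depth is covered and fall back to the scripted aggregation otherwise. The sketch below only builds the request body as a Python dict and never talks to Elasticsearch. The field name pattern context.page.url.levelN follows the query above; `STORED_LEVELS = 3` and the helper names are assumptions.

```python
STORED_LEVELS = 3  # assumption: level1..level3 are indexed as separate fields

# Same Painless source as in the scripted queries, with the offset as a placeholder.
PAINLESS_TEMPLATE = (
    "def d = doc['context.page.url'].value; "
    "if (d.length() > {n}) {{ def c = d.indexOf('/', {n}); "
    "if (c>0) {{ return d.substring(0,c);}}}} return d"
)


def prefix_depth(prefix):
    """Count levels in a prefix: the host is level 1, each path segment adds one."""
    without_scheme = prefix.split("://", 1)[-1].strip("/")
    return len(without_scheme.split("/"))


def build_query(prefix):
    """Build the search body that lists the paths one level below prefix."""
    prefix = prefix.rstrip("/")
    group_level = prefix_depth(prefix) + 1  # group by one level below the prefix
    if group_level <= STORED_LEVELS:
        terms = {"field": f"context.page.url.level{group_level}"}
    else:
        source = PAINLESS_TEMPLATE.format(n=len(prefix) + 1)
        terms = {"script": {"source": source, "lang": "painless"}}
    terms["order"] = {"duration_sum": "desc"}
    duration = "transaction.duration.us"
    return {
        "query": {"match": {"context.page.url.hierarchical": prefix}},
        "aggs": {
            "txs": {
                "terms": terms,
                "aggs": {
                    "duration_avg": {"avg": {"field": duration}},
                    "duration_sum": {"sum": {"field": duration}},
                    "duration_p99": {"percentiles": {"field": duration, "percents": [99]}},
                },
            }
        },
    }
```

For example, `build_query("https://www.elastic.co/guide")` groups on the stored level3 field, while `build_query("https://www.elastic.co/guide/en")` produces the scripted variant with offset 32.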

It's possible that there's an even better way to do the querying. We should investigate that if we choose to do this.
