Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add exclude lists for flask integration #630

Merged
merged 14 commits into from
May 3, 2020
Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions ext/opentelemetry-ext-flask/CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,9 @@

## Unreleased

- Add exclude list for paths and hosts
([#630](https://github.com/open-telemetry/opentelemetry-python/pull/630))

## 0.6b0

Released 2020-03-30
Expand Down
10 changes: 10 additions & 0 deletions ext/opentelemetry-ext-flask/README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,16 @@ Installation

pip install opentelemetry-ext-flask

Configuration
-------------

Exclude lists
*************
Excludes certain hosts and paths from being tracked. Pass in comma delimited string into environment variables.

Excluded hosts: OPENTELEMETRY_PYTHON_FLASK_EXCLUDED_HOSTS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It needs to be explained here that the values of these environment variables are regexes that should match with everything but the r"(https?|ftp)://.*?/" (are there any other protocols besides http, https and ftp that we want to take into consideration?) part of the URL the user wants excluded from tracing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've included an explanation of what hosts and paths means. I don't think we need to inform the user of how the implementation is actually doing this. As for other protocols, these are the ones being excluded in OpenCensus, any new protocols I feel can be added later on.

Excluded paths: OPENTELEMETRY_PYTHON_FLASK_EXCLUDED_PATHS


References
----------
Expand Down
51 changes: 38 additions & 13 deletions ext/opentelemetry-ext-flask/src/opentelemetry/ext/flask/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -51,10 +51,14 @@ def hello():
import flask

import opentelemetry.ext.wsgi as otel_wsgi
from opentelemetry import context, propagators, trace
from opentelemetry import configuration, context, propagators, trace
from opentelemetry.auto_instrumentation.instrumentor import BaseInstrumentor
from opentelemetry.ext.flask.version import __version__
from opentelemetry.util import time_ns
from opentelemetry.util import (
disable_tracing_hostname,
disable_tracing_path,
time_ns,
)

logger = logging.getLogger(__name__)

Expand All @@ -80,17 +84,18 @@ def wrapped_app(environ, start_response):
environ[_ENVIRON_STARTTIME_KEY] = time_ns()

def _start_response(status, response_headers, *args, **kwargs):
span = flask.request.environ.get(_ENVIRON_SPAN_KEY)
if span:
otel_wsgi.add_response_attributes(
span, status, response_headers
)
else:
logger.warning(
"Flask environ's OpenTelemetry span "
"missing at _start_response(%s)",
status,
)
if not _disable_trace(flask.request.url):
span = flask.request.environ.get(_ENVIRON_SPAN_KEY)
if span:
otel_wsgi.add_response_attributes(
span, status, response_headers
)
else:
logger.warning(
"Flask environ's OpenTelemetry span "
"missing at _start_response(%s)",
status,
)

return start_response(
status, response_headers, *args, **kwargs
Expand All @@ -102,6 +107,9 @@ def _start_response(status, response_headers, *args, **kwargs):

@self.before_request
def _before_flask_request():
# Do not trace if the url is excluded
if _disable_trace(flask.request.url):
return
environ = flask.request.environ
span_name = (
flask.request.endpoint
Expand Down Expand Up @@ -132,6 +140,9 @@ def _before_flask_request():

@self.teardown_request
def _teardown_flask_request(exc):
# Not traced if the url is excluded
if _disable_trace(flask.request.url):
return
activation = flask.request.environ.get(_ENVIRON_ACTIVATION_KEY)
if not activation:
logger.warning(
Expand All @@ -150,6 +161,20 @@ def _teardown_flask_request(exc):
context.detach(flask.request.environ.get(_ENVIRON_TOKEN))


def _disable_trace(url):
excluded_hosts = configuration.Configuration().FLASK_EXCLUDED_HOSTS
excluded_paths = configuration.Configuration().FLASK_EXCLUDED_PATHS
Comment on lines +165 to +166
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this separation necessary? I think we can just define a sequence of regular expressions in one environment variable and if the hostname or path matches any regular expression it is rejected. This will be useful for the end user who would not need to repeat a lot of related paths in the environment variable, but use a single regex that matches them all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I worry a little bit about that, as it makes it hard if you happen to have a host or path that share the same name.

Horrible example, but what about the following:

host: localhost
path: check_if_localhost

and you only want to filter for the localhost host.

if excluded_hosts:
excluded_hosts = str.split(excluded_hosts, ",")
if disable_tracing_hostname(url, excluded_hosts):
return True
if excluded_paths:
excluded_paths = str.split(excluded_paths, ",")
if disable_tracing_path(url, excluded_paths):
return True
return False


class FlaskInstrumentor(BaseInstrumentor):
"""A instrumentor for flask.Flask

Expand Down
37 changes: 33 additions & 4 deletions ext/opentelemetry-ext-flask/tests/test_flask_integration.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,12 +13,14 @@
# limitations under the License.

import unittest
from unittest.mock import patch

from flask import Flask, request
from werkzeug.test import Client
from werkzeug.wrappers import BaseResponse

from opentelemetry import trace as trace_api
from opentelemetry.configuration import Configuration
from opentelemetry.test.wsgitestutil import WsgiTestBase


Expand All @@ -45,18 +47,31 @@ def setUp(self):
# No instrumentation code is here because it is present in the
# conftest.py file next to this file.
super().setUp()

Configuration._instance = None # pylint:disable=protected-access
lzchen marked this conversation as resolved.
Show resolved Hide resolved
Configuration.__slots__ = []
self.app = Flask(__name__)

def hello_endpoint(helloid):
if helloid == 500:
raise ValueError(":-(")
return "Hello: " + str(helloid)

def excluded_endpoint():
return "excluded"

def excluded2_endpoint():
return "excluded2"

self.app.route("/hello/<int:helloid>")(hello_endpoint)
self.app.route("/excluded")(excluded_endpoint)
self.app.route("/excluded2")(excluded2_endpoint)

self.client = Client(self.app, BaseResponse)

def tearDown(self):
Configuration._instance = None # pylint:disable=protected-access
Configuration.__slots__ = []

def test_only_strings_in_environ(self):
"""
Some WSGI servers (such as Gunicorn) expect keys in the environ object
Expand All @@ -80,9 +95,8 @@ def test_simple(self):
expected_attrs = expected_attributes(
{"http.target": "/hello/123", "http.route": "/hello/<int:helloid>"}
)
resp = self.client.get("/hello/123")
self.assertEqual(200, resp.status_code)
self.assertEqual([b"Hello: 123"], list(resp.response))
self.client.get("/hello/123")

span_list = self.memory_exporter.get_finished_spans()
self.assertEqual(len(span_list), 1)
self.assertEqual(span_list[0].name, "hello_endpoint")
Expand Down Expand Up @@ -126,6 +140,21 @@ def test_internal_error(self):
self.assertEqual(span_list[0].kind, trace_api.SpanKind.SERVER)
self.assertEqual(span_list[0].attributes, expected_attrs)

@patch.dict(
"os.environ", # type: ignore
{
"OPENTELEMETRY_PYTHON_FLASK_EXCLUDED_HOSTS": "http://localhost/excluded",
"OPENTELEMETRY_PYTHON_FLASK_EXCLUDED_PATHS": "excluded2",
},
)
def test_excluded_path(self):
self.client.get("/hello/123")
self.client.get("/excluded")
self.client.get("/excluded2")
span_list = self.memory_exporter.get_finished_spans()
self.assertEqual(len(span_list), 1)
self.assertEqual(span_list[0].name, "hello_endpoint")


if __name__ == "__main__":
unittest.main()
36 changes: 29 additions & 7 deletions opentelemetry-api/src/opentelemetry/util/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,9 +11,10 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import re
import time
from logging import getLogger
from typing import Union
from typing import Sequence, Union

from pkg_resources import iter_entry_points

Expand All @@ -33,18 +34,39 @@ def time_ns() -> int:
return int(time.time() * 1e9)


def _load_provider(provider: str) -> Union["TracerProvider", "MeterProvider"]: # type: ignore
def _load_provider(
provider: str,
) -> Union["TracerProvider", "MeterProvider"]: # type: ignore
try:
return next( # type: ignore
iter_entry_points(
"opentelemetry_{}".format(provider),
name=getattr( # type: ignore
Configuration(), provider, "default_{}".format(provider), # type: ignore
name=getattr(
Configuration(), # type: ignore
provider,
"default_{}".format(provider),
),
)
).load()()
except Exception: # pylint: disable=broad-except
logger.error(
"Failed to load configured provider %s", provider,
)
logger.error("Failed to load configured provider %s", provider)
raise


# Pattern for matching up until the first '/' after the 'https://' part.
URL_PATTERN = "^(https?|ftp):\\/\\/.*\\/"
lzchen marked this conversation as resolved.
Show resolved Hide resolved


def disable_tracing_path(url: str, excluded_paths: Sequence[str]) -> bool:
lzchen marked this conversation as resolved.
Show resolved Hide resolved
if excluded_paths:
# Match only the part after the first '/' that is not in URL_PATTERN
regex = "{}({})".format(URL_PATTERN, "|".join(excluded_paths))
if re.match(regex, url):
return True
return False


def disable_tracing_hostname(
url: str, excluded_hostnames: Sequence[str]
) -> bool:
return url in excluded_hostnames