Monitor Data Pipeline Step

Ivan Zhang edited this page Nov 1, 2023 · 1 revision

Tracking each step of a pipeline lets you confirm that every step runs as expected. Panda Patrol stores the start time, end time, and status of each step in a database. This gives you a high-level overview of the pipeline, so you know where to look when something goes wrong. Adding a monitor takes one line of code.

Usage

To monitor a node in your data pipeline, add the monitor context manager at the point where you want monitoring to start. The context manager takes one parameter:

  • group_name: str - The patrol group that this monitor belongs to

# Import the monitor context manager
from panda_patrol.patrols import monitor
...
def data_pipeline_step():
    # Do something before monitoring
    ...
    # Start monitoring
    with monitor(group_name="data_pipeline"):
        # Do something
        ...

This creates a patrol called Run Status that tracks the status of this data pipeline step. The status is set to success when the monitored block finishes. If the block raises an exception, the status is set to failure and the exception is logged. The start time and end time are logged in both cases.
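
To make the success and failure behavior concrete, here is a minimal sketch of how a context manager of this kind could record a run. It is not Panda Patrol's implementation: sketch_monitor, the RUNS list (an in-memory stand-in for the database), and the record fields are all hypothetical names chosen for illustration. The sketch also assumes the exception is re-raised after being recorded, which the description above does not specify.

```python
import contextlib
from datetime import datetime, timezone

# Hypothetical in-memory stand-in for the database that stores run records.
RUNS = []


@contextlib.contextmanager
def sketch_monitor(group_name: str):
    """Illustrative only: record start time, end time, and status of a block."""
    record = {
        "group_name": group_name,
        "patrol": "Run Status",
        "start_time": datetime.now(timezone.utc),
        "end_time": None,
        "status": None,
        "exception": None,
    }
    try:
        yield record
        # The block finished without raising.
        record["status"] = "success"
    except Exception as exc:
        # The block raised: mark the run as failed and keep the error text.
        record["status"] = "failure"
        record["exception"] = repr(exc)
        raise  # Assumption: the error still propagates to the caller.
    finally:
        # The end time is recorded for both outcomes.
        record["end_time"] = datetime.now(timezone.utc)
        RUNS.append(record)


# A step that succeeds.
with sketch_monitor(group_name="data_pipeline"):
    pass

# A step that fails; the caller handles the re-raised exception.
try:
    with sketch_monitor(group_name="data_pipeline"):
        raise ValueError("bad input")
except ValueError:
    pass

for run in RUNS:
    print(run["group_name"], run["patrol"], run["status"], run["exception"])
```

Writing the record in the finally clause is what ensures an end time exists for failed runs as well as successful ones.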