Observability and telemetry tools - OpenSRE Documentation

Observability and telemetry tools collect, store, and visualize metrics, logs, and traces reported by systems and applications. They are effective at showing what was reported over time and at supporting alerting and dashboards across many environments. Tracer complements these tools by observing what actually executes at runtime. It captures execution behavior directly from the host and container runtime and organizes this data by pipeline, run, task, and tool.

If you’re new to Tracer or want a conceptual overview, see How Tracer fits in your stack.

What observability tools do well

Observability and telemetry platforms are designed to:

Collect metrics emitted by applications and system exporters
Store and query time-series data
Visualize metrics, logs, and traces in dashboards
Trigger alerts based on thresholds or rules

They are typically system-centric or service-centric, and rely on reported telemetry.

Where reported telemetry stops

Because observability and telemetry tools depend on emitted signals and periodic collection, they often lack visibility into:

Short-lived processes and subprocesses that start and finish between scrapes
Execution behavior between metric intervals, including stalls and idle time
Resource contention inside containers or tasks, such as I/O blocking or memory pressure
How reported metrics map to specific pipeline runs, steps, or tools
How infrastructure cost relates to actual execution, rather than to hosts, services, or time windows
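
The first gap in this list can be illustrated with a small simulation. This is a minimal sketch, not how Tracer or any particular collector is implemented. It assumes a collector that samples at fixed instants (0, interval, 2 × interval, and so on) and can only see a process that is alive at one of those instants. `ProcessSpan`, `visible_to_scraper`, the process names, and the 15-second interval are hypothetical, chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessSpan:
    """Lifetime of one process, in seconds since the start of the run (hypothetical model)."""
    name: str
    start_s: float
    end_s: float


def visible_to_scraper(span: ProcessSpan, interval_s: float) -> bool:
    """Return True if at least one scrape instant falls inside the process lifetime.

    Assumes scrapes happen exactly at 0, interval_s, 2 * interval_s, ...
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # First scrape instant at or after the process start.
    first_scrape = math.ceil(span.start_s / interval_s) * interval_s
    return first_scrape <= span.end_s


spans = [
    ProcessSpan("aligner", 0, 120),   # long-running, sampled many times
    ProcessSpan("index", 16, 29),     # starts and exits between scrapes at 15 s and 30 s
    ProcessSpan("checksum", 31, 33),  # starts and exits between scrapes at 30 s and 45 s
]
missed = [s.name for s in spans if not visible_to_scraper(s, interval_s=15)]
print(missed)
```

Under these assumptions the long-running process is sampled repeatedly, while the two short-lived ones never coincide with a scrape and leave no trace in the collected series.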

Because of these gaps, cost is typically inferred from aggregated usage metrics (for example, instance-hours or container uptime) rather than attributed to the execution units that caused the spend. This makes it difficult to answer questions such as which pipeline step drove a cost increase, whether resources sat idle during execution, or which tools are responsible for sustained spend. These questions require execution-aware attribution, not just reported metrics.

Conclusions about performance bottlenecks or cost drivers may therefore be incomplete or misleading, particularly in pipelines with heterogeneous tools, nested execution, or variable workloads. These limitations are a direct consequence of relying on reported telemetry rather than observing execution itself.

Execution-level observation captures what actually runs on the system:

When processes are scheduled
When they block on I/O
How memory is allocated and reclaimed
How long resources are consumed by each execution unit

This removes the need to infer execution behavior from indirect signals and allows performance and cost to be attributed to observed runtime activity, rather than to proxies.
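
To make the difference between proxy-based and execution-based attribution concrete, the sketch below splits the cost of one instance across execution units in proportion to the core-seconds each was observed to consume, and reports the remainder as idle. It is an illustrative model under stated assumptions, not Tracer's attribution method: `attribute_cost`, the `idle` bucket, the step names, and all figures are hypothetical.

```python
def attribute_cost(
    instance_cost: float,
    wall_s: float,
    cores: int,
    busy_core_s: dict[str, float],
) -> dict[str, float]:
    """Split an instance's cost by observed core-seconds per execution unit.

    Assumes cost is spread evenly over every core-second of capacity;
    capacity that no execution unit used is reported under "idle".
    """
    capacity = wall_s * cores
    if capacity <= 0:
        raise ValueError("wall_s and cores must be positive")
    if sum(busy_core_s.values()) > capacity:
        raise ValueError("observed usage exceeds instance capacity")

    shares = {
        name: used / capacity * instance_cost
        for name, used in busy_core_s.items()
    }
    shares["idle"] = instance_cost - sum(shares.values())
    return shares


# One hour on a 4-core instance that cost 4.00 (hypothetical figures).
shares = attribute_cost(
    instance_cost=4.00,
    wall_s=3600,
    cores=4,
    busy_core_s={"align": 7200, "sort": 1800},
)
print(shares)
```

An instance-hour view would report the full 4.00 against the host. In this model, the same hour splits into the share used by each step and a separate idle share, which is the kind of question aggregated usage metrics cannot answer on their own.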

What Tracer adds

Tracer observes execution directly from the operating system and runtime. When used alongside observability and telemetry tools, it adds:

Execution-level visibility for pipelines, runs, tasks, and tools
Observed CPU, memory, disk, and network behavior
Insight into stalls, idle execution, and contention
Resource usage and cost attribution aligned with actual work

Tracer does not replace metrics backends or dashboards. It adds execution context that reported telemetry alone cannot provide.
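
Organizing observations by pipeline, run, task, and tool amounts to rolling up per-process measurements along that hierarchy. The sketch below shows one way such a roll-up could look. It is an assumption-laden illustration, not Tracer's data model or API: `Sample`, `roll_up`, the field names, and the example labels are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple

LEVELS = ("pipeline", "run", "task", "tool")


class Sample(NamedTuple):
    """One observed measurement tagged with its execution context (hypothetical schema)."""
    pipeline: str
    run: str
    task: str
    tool: str
    cpu_s: float
    peak_rss_mb: float


def roll_up(samples: list[Sample], level: str) -> dict[tuple[str, ...], dict[str, float]]:
    """Aggregate samples up to the given level of the hierarchy."""
    depth = LEVELS.index(level) + 1  # raises ValueError for an unknown level
    totals: dict[tuple[str, ...], dict[str, float]] = defaultdict(
        lambda: {"cpu_s": 0.0, "peak_rss_mb": 0.0}
    )
    for s in samples:
        key = tuple(getattr(s, name) for name in LEVELS[:depth])
        totals[key]["cpu_s"] += s.cpu_s
        # Max of per-sample peaks: a lower bound on the combined peak
        # if tools within the same group ran concurrently.
        totals[key]["peak_rss_mb"] = max(totals[key]["peak_rss_mb"], s.peak_rss_mb)
    return dict(totals)


samples = [
    Sample("rnaseq", "run-1", "align", "bwa", 120.0, 2048.0),
    Sample("rnaseq", "run-1", "align", "samtools", 30.0, 512.0),
    Sample("rnaseq", "run-1", "qc", "fastqc", 10.0, 256.0),
]
by_task = roll_up(samples, "task")
for key, totals in by_task.items():
    print(key, totals)
```

The same samples can be rolled up at any level, so a question about a whole pipeline and a question about a single tool are answered from the same observations rather than from separately configured dashboards.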

How Tracer works with current observability and telemetry tools

The pages below describe how Tracer works alongside common observability platforms used in scientific, data, HPC, and cloud environments. Select a tool to see what execution context Tracer adds to it.

Tracer and Datadog

Pipeline-level insight within broad observability: Tracer organizes execution behavior around pipelines instead of services or hosts.

Tracer and Grafana

Execution-aware dashboards without manual wiring: Tracer provides pipeline-aware views that reduce the need to infer execution behavior from generic dashboards.

Tracer and Prometheus

Observed execution versus scraped metrics: Tracer captures runtime behavior that may not appear in scraped or aggregated metrics.

When Tracer is useful with observability tools

Tracer is most useful alongside observability and telemetry platforms when teams need to:

Understand pipeline behavior beyond reported metrics
Diagnose performance issues involving short-lived tasks
Attribute resource usage and cost to specific workflows or tools
Reduce manual dashboard configuration and metric correlation

Tracer focuses on execution behavior. Observability tools continue to provide metric storage, dashboards, and alerting across broader systems.

Where to go next

How Tracer fits in your stack – conceptual overview
Individual integration pages – tool-specific execution gaps and observability comparisons
