RFC: OpenSearch SQL/PPL Telemetry Integration #5300

@penghuo

Problem Statement

The OpenSearch SQL/PPL plugin has no integration with OpenSearch's core telemetry framework. There is no distributed tracing for query execution, making it difficult to diagnose latency issues across the parse → analyze → optimize → compile → execute → materialize pipeline. The existing profiling framework (QueryProfiling) provides wall-clock timing but is disconnected from the standard OpenSearch telemetry export pipeline (OTel SDK → OTLP → observability backends).

Goals

  • P0: Add distributed tracing spans to the PPL Calcite query execution pipeline
  • Follow OTel semantic conventions for database spans (Elasticsearch for db.namespace/db.collection.name, PostgreSQL/MySQL for db.operation.name)
  • Integrate with OpenSearch's TelemetryAwarePlugin interface
  • Graceful degradation via NoopTracer when telemetry is disabled

Non-Goals

  • Metrics migration (P1 — separate follow-up)
  • Tracing the legacy v2 engine path (SQL currently uses v2 exclusively; QueryService.shouldUseCalcite() gates Calcite to PPL only)
  • SQL query tracing (future work when SQL migrates to Calcite)

Background

OpenSearch Telemetry Framework

OpenSearch provides a backend-agnostic telemetry framework:

  • libs/telemetry/ — interfaces: Tracer, Span, SpanScope, MetricsRegistry
  • plugins/telemetry-otel/ — OTel SDK implementation that exports via BatchSpanProcessor
  • Plugins access the framework by implementing TelemetryAwarePlugin, which provides Tracer and MetricsRegistry

The framework is gated behind opensearch.experimental.feature.telemetry.enabled (defaults to false). When disabled, all tracing operations are no-ops via NoopTracer.

PPL Query Execution Pipeline (Calcite Path)

PPL text
  → Parse (PPLSyntaxParser → UnresolvedPlan AST)
  → Analyze (CalciteRelNodeVisitor → RelNode logical plan)
  → Optimize (HepPlanner → optimized RelNode)
  → Compile (RelRunner.prepareStatement → PreparedStatement)
  → Execute (PreparedStatement.executeQuery → ResultSet)
  → Materialize (ResultSet → QueryResponse)

Design

Plugin Interface

SQLPlugin implements TelemetryAwarePlugin. OpenSearch calls two createComponents() methods separately:

  1. TelemetryAwarePlugin.createComponents(... Tracer, MetricsRegistry): called first, and only when telemetry is enabled. Stores the Tracer reference and returns an empty collection (no components).
  2. Plugin.createComponents(...): always called. Creates all plugin components. The stored Tracer (the real one, or the NoopTracer default) is passed to OpenSearchPluginModule for Guice binding.

public class SQLPlugin extends Plugin
    implements ActionPlugin, ScriptPlugin, SystemIndexPlugin,
               JobSchedulerExtension, ExtensiblePlugin, TelemetryAwarePlugin {

    private Tracer tracer = NoopTracer.INSTANCE;

    // Telemetry-enabled: stores Tracer, returns empty
    @Override
    public Collection<Object> createComponents(
        ..., Tracer tracer, MetricsRegistry metricsRegistry) {
        this.tracer = tracer;
        return Collections.emptyList();
    }

    // Always called: creates all components with stored Tracer
    @Override
    public Collection<Object> createComponents(...) {
        modules.add(new OpenSearchPluginModule(executionEngineExtensions, tracer));
        // ... component creation ...
    }
}

Tracer is bound via Guice @Provides @Singleton in OpenSearchPluginModule and injected into all instrumented components.

Span Hierarchy

Language-agnostic naming. Query language is an attribute (db.query.type), not part of the span name.

EXECUTE Request (7 spans)

[OpenSearch Transport Action]                 ← SERVER (auto, OpenSearch core)
  └── opensearch.query                        ← CLIENT (root span)
        ├── opensearch.query.parse            ← INTERNAL
        ├── opensearch.query.analyze          ← INTERNAL
        ├── opensearch.query.optimize         ← INTERNAL
        ├── opensearch.query.compile          ← INTERNAL
        ├── opensearch.query.execute          ← INTERNAL
        │     └── transport indices:data/read/search  ← auto, OpenSearch core
        │           └── [phase/query] → [phase/fetch] → [phase/expand]
        └── opensearch.query.materialize      ← INTERNAL

EXPLAIN Request (4-5 spans)

SIMPLE mode skips compile. Non-SIMPLE modes (STANDARD, EXTENDED, COST) include compile since OpenSearchRelRunners.run() calls prepareStatement() to capture the physical plan.

opensearch.query (db.operation.name="EXPLAIN")
  ├── opensearch.query.parse
  ├── opensearch.query.analyze
  ├── opensearch.query.optimize
  └── opensearch.query.compile              ← non-SIMPLE modes only

Span Attributes

Root Span (opensearch.query)

Following OTel DB semantic conventions (Elasticsearch for cluster/index, PostgreSQL/MySQL for db.operation.name):

Attribute           Convention        Value
db.system.name      ES semconv        "opensearch"
db.namespace        ES semconv        Cluster name
db.operation.name   PG/MySQL semconv  "EXECUTE" or "EXPLAIN"
db.query.text       DB semconv        Raw PPL query
db.query.summary    DB semconv        Command structure (e.g., "source | where | stats")
db.query.type       Custom            "ppl"
db.query.id         Custom            QueryContext.getRequestId() (UUID)
server.address      DB semconv        Node host address
server.port         DB semconv        Node transport port

db.query.summary is extracted by QuerySummaryExtractor, a regex-based utility that produces a low-cardinality pipe-delimited command structure suitable for grouping in observability backends.
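
As a rough illustration of what a regex-based extraction like this could look like, the sketch below blanks out quoted literals, splits on pipes, and keeps the leading keyword of each command. The class name QuerySummarySketch, the summarize method, and the exact patterns are assumptions made for this example; they are not the plugin's QuerySummaryExtractor, which may handle more PPL syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the actual QuerySummaryExtractor.
final class QuerySummarySketch {
    // Quoted literals and backtick identifiers may contain '|', so they are blanked out first.
    private static final Pattern QUOTED =
        Pattern.compile("'(?:[^'\\\\]|\\\\.)*'|\"(?:[^\"\\\\]|\\\\.)*\"|`[^`]*`");
    // The command keyword is the first run of letters/underscores in each pipe segment.
    private static final Pattern LEADING_COMMAND = Pattern.compile("^\\s*([A-Za-z_]+)");

    static String summarize(String ppl) {
        String stripped = QUOTED.matcher(ppl).replaceAll("?");
        List<String> commands = new ArrayList<>();
        for (String segment : stripped.split("\\|")) {
            Matcher m = LEADING_COMMAND.matcher(segment);
            if (m.find()) {
                commands.add(m.group(1).toLowerCase(Locale.ROOT));
            }
        }
        // Only keywords survive, so field names and literals never reach the attribute.
        return String.join(" | ", commands);
    }
}
```

With this sketch, "source=accounts | where age > 30 | stats count() by state" summarizes to "source | where | stats". Because literal values and field names are dropped, queries that differ only in their arguments share one summary, which keeps the attribute's cardinality low.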

db.query.id uses QueryContext.getRequestId() — a UUID generated at the start of doExecute(), before any query processing. It propagates via Log4j ThreadContext and is already used for log correlation.

Phase Span Attributes

Phase spans (INTERNAL) carry error and error.type on failure. Additional phase-specific attributes (e.g., opensearch.query.plan.node_count, opensearch.query.result.rows) are defined in the design spec but not yet implemented — they will be added as the instrumentation matures.

Instrumentation Points

Each component that owns a phase receives Tracer via Guice and creates its own span. Parent-child relationships propagate automatically via ThreadContextBasedTracerContextStorage.

Span                          Component                               Method
opensearch.query              TransportPPLQueryAction                 doExecute()
opensearch.query.parse        PPLService                              plan()
opensearch.query.analyze      QueryService                            executeWithCalcite()
opensearch.query.optimize     CalciteToolsHelper.OpenSearchRelRunners run()
opensearch.query.compile      CalciteToolsHelper.OpenSearchRelRunners run()
opensearch.query.execute      OpenSearchExecutionEngine               execute(RelNode, ...)
opensearch.query.materialize  OpenSearchExecutionEngine               execute(RelNode, ...)

Async Span Lifecycle

The execution model is asynchronous: doExecute() returns before query execution finishes. The actual work runs on the sql-worker thread pool.

Key insight: SpanScope and Span have different lifecycles.

  • SpanScope is thread-local. Opened and closed on the transport thread via try/finally. Its only job is to be active during pplService.execute() so OpenSearchQueryManager.withCurrentContext() captures the span context in ThreadContext for worker thread propagation.

  • Span is thread-safe. Created on the transport thread and ended on the worker thread in the async listener callback. An AtomicBoolean guard ensures endSpan() runs exactly once, whichever callback fires.

// Transport thread
Span rootSpan = tracer.startSpan(SpanCreationContext.client().name("opensearch.query")...);
SpanScope spanScope = tracer.withSpanInScope(rootSpan);
// Shared by the listener and the synchronous catch block so the span ends exactly once
AtomicBoolean ended = new AtomicBoolean(false);

try {
    ActionListener<...> tracedListener = new ActionListener<>() {
        @Override public void onResponse(...) {
            try { listener.onResponse(response); }
            finally { if (ended.compareAndSet(false, true)) rootSpan.endSpan(); }
        }

        @Override public void onFailure(Exception e) {
            try { rootSpan.setError(e); listener.onFailure(e); }
            finally { if (ended.compareAndSet(false, true)) rootSpan.endSpan(); }
        }
    };

    pplService.execute(request, tracedListener, ...);
} catch (Exception e) {
    // Synchronous failure: report it only if the listener has not already completed
    if (ended.compareAndSet(false, true)) {
        rootSpan.setError(e); rootSpan.endSpan(); listener.onFailure(e);
    }
} finally {
    spanScope.close(); // Close scope on transport thread
}

Object                  Created On               Closed/Ended On                   Thread-Safe?
Span                    Transport thread         Worker thread (async callback)    Yes
SpanScope               Transport thread         Transport thread (finally block)  Must be same thread
ThreadContext snapshot  Captured at submit time  Restored on worker thread         Yes (immutable)
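
The capture-at-submit, restore-on-worker behavior can be pictured with a small stand-alone sketch. It uses a plain ThreadLocal and a single-thread executor as stand-ins for ThreadContext and the sql-worker pool. The names ContextPropagationSketch, CURRENT_SPAN, and observedOnWorker are hypothetical, and the withCurrentContext method here only mirrors the idea behind OpenSearchQueryManager.withCurrentContext(), not its implementation.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: a ThreadLocal stands in for ThreadContext.
final class ContextPropagationSketch {
    private static final ThreadLocal<String> CURRENT_SPAN = new ThreadLocal<>();

    /** Snapshots the caller's current span at wrap time and restores it around the task. */
    static Runnable withCurrentContext(Runnable task) {
        final String captured = CURRENT_SPAN.get(); // taken on the submitting thread
        return () -> {
            String previous = CURRENT_SPAN.get();
            CURRENT_SPAN.set(captured);
            try {
                task.run();
            } finally {
                if (previous == null) CURRENT_SPAN.remove(); else CURRENT_SPAN.set(previous);
            }
        };
    }

    /** Returns the span a worker thread sees for a task wrapped while the scope was open. */
    static String observedOnWorker(String spanName) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            String[] seen = new String[1];
            Runnable wrapped;
            CURRENT_SPAN.set(spanName);        // scope opened on the submitting thread
            try {
                wrapped = withCurrentContext(() -> seen[0] = CURRENT_SPAN.get());
            } finally {
                CURRENT_SPAN.remove();         // scope closed on that same thread
            }
            worker.submit(wrapped).get();      // runs after the scope is already closed
            return seen[0];
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            worker.shutdown();
        }
    }
}
```

The point of the sketch is that the scope only needs to be open while the task is wrapped. The worker still sees the parent span afterwards, because it reads the snapshot rather than the submitting thread's live state.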

Error Handling

Synchronous Phases (parse, analyze, optimize, compile)

Standard try/catch/finally with span.setError(e) + re-throw + span.endSpan() in finally.
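
A minimal sketch of that pattern follows. It uses a small stand-in interface rather than the OpenSearch Span type so that it runs on its own. PhaseSpanSketch, PhaseSpan, traced, and RecordingSpan are hypothetical names introduced for illustration; only setError(e) and endSpan() come from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of the synchronous phase pattern.
final class PhaseSpanSketch {
    /** Stand-in for the two Span operations the pattern relies on. */
    interface PhaseSpan {
        void setError(Exception e);
        void endSpan();
    }

    /** Runs one phase: records the error, re-throws it, and always ends the span. */
    static <T> T traced(PhaseSpan span, Supplier<T> phaseWork) {
        try {
            return phaseWork.get();
        } catch (RuntimeException e) {
            span.setError(e);
            throw e;            // the caller still sees the original failure
        } finally {
            span.endSpan();     // runs on both the success and the failure path
        }
    }

    /** Records calls in order so the sequence can be inspected. */
    static final class RecordingSpan implements PhaseSpan {
        final List<String> events = new ArrayList<>();
        @Override public void setError(Exception e) { events.add("error:" + e.getMessage()); }
        @Override public void endSpan() { events.add("end"); }
    }
}
```

On success the recording span sees only "end". On failure it sees the error first and then "end", and the exception still reaches the caller unchanged.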

Async Boundaries (execute, materialize)

Inside client.schedule() lambdas, exceptions must route to listener.onFailure() — never re-thrown. Re-throwing bypasses the listener chain and leaks the root span.

client.schedule(() -> {
    try (...) {
        // ... phase work ...
        listener.onResponse(response);
    } catch (Throwable t) {
        if (t instanceof VirtualMachineError) throw (VirtualMachineError) t;
        Exception e = (t instanceof Exception) ? (Exception) t : new RuntimeException(t);
        listener.onFailure(e);
    }
});

Pre-existing bug fixed: OpenSearchExecutionEngine.execute(RelNode, ...) previously caught SQLException and re-threw as RuntimeException without calling listener.onFailure(). Fixed to catch Throwable, re-throw only VirtualMachineError, and route everything else to listener.onFailure().

Telemetry Control

No SQL plugin-specific toggle. Controlled entirely by OpenSearch core:

Level           Setting                                            Default  Effect
Feature Flag    opensearch.experimental.feature.telemetry.enabled  false    Gates all telemetry settings
Tracer Feature  telemetry.feature.tracer.enabled                   false    Enables tracer infrastructure
Tracer Toggle   telemetry.tracer.enabled                           false    Dynamic on/off for tracing
Sampling        telemetry.tracer.sampler.probability               0.01     Fraction of traces exported

When telemetry is disabled, Tracer is NoopTracer — all span operations are no-ops with near-zero overhead. No conditional checks needed in application code.
