Problem Statement
The OpenSearch SQL/PPL plugin has no integration with OpenSearch's core telemetry framework. There is no distributed tracing for query execution, making it difficult to diagnose latency issues across the parse → analyze → optimize → compile → execute → materialize pipeline. The existing profiling framework (QueryProfiling) provides wall-clock timing but is disconnected from the standard OpenSearch telemetry export pipeline (OTel SDK → OTLP → observability backends).
Goals
- P0: Add distributed tracing spans to the PPL Calcite query execution pipeline
- Follow OTel semantic conventions for database spans (Elasticsearch for db.namespace/db.collection.name, PostgreSQL/MySQL for db.operation.name)
- Integrate with OpenSearch's TelemetryAwarePlugin interface
- Graceful degradation via NoopTracer when telemetry is disabled
Non-Goals
- Metrics migration (P1 — separate follow-up)
- Tracing the legacy v2 engine path (SQL currently uses v2 exclusively; QueryService.shouldUseCalcite() gates Calcite to PPL only)
- SQL query tracing (future work when SQL migrates to Calcite)
Background
OpenSearch Telemetry Framework
OpenSearch provides a backend-agnostic telemetry framework:
- libs/telemetry/: interfaces Tracer, Span, SpanScope, MetricsRegistry
- plugins/telemetry-otel/: OTel SDK implementation that exports via BatchSpanProcessor
- Plugins access the framework by implementing TelemetryAwarePlugin, which provides Tracer and MetricsRegistry
The framework is gated behind opensearch.experimental.feature.telemetry.enabled (defaults to false). When disabled, all tracing operations are no-ops via NoopTracer.
PPL Query Execution Pipeline (Calcite Path)
    PPL text
      → Parse       (PPLSyntaxParser → UnresolvedPlan AST)
      → Analyze     (CalciteRelNodeVisitor → RelNode logical plan)
      → Optimize    (HepPlanner → optimized RelNode)
      → Compile     (RelRunner.prepareStatement → PreparedStatement)
      → Execute     (PreparedStatement.executeQuery → ResultSet)
      → Materialize (ResultSet → QueryResponse)
Design
Plugin Interface
SQLPlugin implements TelemetryAwarePlugin. OpenSearch calls two createComponents() methods separately:
- TelemetryAwarePlugin.createComponents(... Tracer, MetricsRegistry): called first when telemetry is enabled. Stores the Tracer reference and returns an empty collection (no components).
- Plugin.createComponents(...): always called. Creates all plugin components. The stored Tracer (the real one, or the NoopTracer default) is passed to OpenSearchPluginModule for Guice binding.
    public class SQLPlugin extends Plugin
        implements ActionPlugin, ScriptPlugin, SystemIndexPlugin,
            JobSchedulerExtension, ExtensiblePlugin, TelemetryAwarePlugin {

      // Default to the no-op tracer so components work when telemetry is disabled
      private Tracer tracer = NoopTracer.INSTANCE;

      // Telemetry-enabled: stores Tracer, returns empty
      @Override
      public Collection<Object> createComponents(
          ..., Tracer tracer, MetricsRegistry metricsRegistry) {
        this.tracer = tracer;
        return Collections.emptyList();
      }

      // Always called: creates all components with stored Tracer
      @Override
      public Collection<Object> createComponents(...) {
        modules.add(new OpenSearchPluginModule(executionEngineExtensions, tracer));
        // ... component creation ...
      }
    }
Tracer is bound via Guice @Provides @Singleton in OpenSearchPluginModule and injected into all instrumented components.
Span Hierarchy
Language-agnostic naming. Query language is an attribute (db.query.type), not part of the span name.
EXECUTE Request (7 spans)
[OpenSearch Transport Action] ← SERVER (auto, OpenSearch core)
└── opensearch.query ← CLIENT (root span)
├── opensearch.query.parse ← INTERNAL
├── opensearch.query.analyze ← INTERNAL
├── opensearch.query.optimize ← INTERNAL
├── opensearch.query.compile ← INTERNAL
├── opensearch.query.execute ← INTERNAL
│ └── transport indices:data/read/search ← auto, OpenSearch core
│ └── [phase/query] → [phase/fetch] → [phase/expand]
└── opensearch.query.materialize ← INTERNAL
EXPLAIN Request (4-5 spans)
SIMPLE mode skips compile. Non-SIMPLE modes (STANDARD, EXTENDED, COST) include compile since OpenSearchRelRunners.run() calls prepareStatement() to capture the physical plan.
opensearch.query (db.operation.name="EXPLAIN")
├── opensearch.query.parse
├── opensearch.query.analyze
├── opensearch.query.optimize
└── opensearch.query.compile ← non-SIMPLE modes only
Span Attributes
Root Span (opensearch.query)
Following OTel DB semantic conventions (Elasticsearch for cluster/index, PostgreSQL/MySQL for db.operation.name):
| Attribute | Convention | Value |
|---|---|---|
| db.system.name | ES semconv | "opensearch" |
| db.namespace | ES semconv | Cluster name |
| db.operation.name | PG/MySQL semconv | "EXECUTE" or "EXPLAIN" |
| db.query.text | DB semconv | Raw PPL query |
| db.query.summary | DB semconv | Command structure (e.g., "source \| where \| stats") |
| db.query.type | Custom | "ppl" |
| db.query.id | Custom | QueryContext.getRequestId() (UUID) |
| server.address | DB semconv | Node host address |
| server.port | DB semconv | Node transport port |
db.query.summary is extracted by QuerySummaryExtractor, a regex-based utility that produces a low-cardinality pipe-delimited command structure suitable for grouping in observability backends.
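The extraction described above could look roughly like the following. This is a minimal sketch, not the actual QuerySummaryExtractor: the class name QuerySummarySketch, the method summarize, and both regular expressions are illustrative assumptions. It masks quoted literals so that a pipe inside a string is not treated as a command separator, then keeps only the leading keyword of each pipe-separated segment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for QuerySummaryExtractor; names and regexes are assumptions. */
final class QuerySummarySketch {
    // Single-quoted, double-quoted, or backtick-quoted literals (backslash escapes allowed in the first two).
    private static final Pattern QUOTED =
        Pattern.compile("'(?:[^'\\\\]|\\\\.)*'|\"(?:[^\"\\\\]|\\\\.)*\"|`[^`]*`");
    // First run of letters/underscores in a segment, i.e. the command keyword.
    private static final Pattern LEADING_WORD = Pattern.compile("^\\s*([A-Za-z_]+)");

    static String summarize(String ppl) {
        // Mask literals first so pipes inside strings do not split a command.
        String masked = QUOTED.matcher(ppl).replaceAll("?");
        List<String> commands = new ArrayList<>();
        for (String segment : masked.split("\\|")) {
            Matcher m = LEADING_WORD.matcher(segment);
            if (m.find()) {
                commands.add(m.group(1).toLowerCase(Locale.ROOT));
            }
        }
        return String.join(" | ", commands);
    }
}
```

Because field names, literals, and index names are dropped, two queries that differ only in their arguments map to the same summary, which is what keeps the attribute low-cardinality.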
db.query.id uses QueryContext.getRequestId() — a UUID generated at the start of doExecute(), before any query processing. It propagates via Log4j ThreadContext and is already used for log correlation.
Phase Span Attributes
Phase spans (INTERNAL) carry error and error.type on failure. Additional phase-specific attributes (e.g., opensearch.query.plan.node_count, opensearch.query.result.rows) are defined in the design spec but not yet implemented — they will be added as the instrumentation matures.
Instrumentation Points
Each component that owns a phase receives Tracer via Guice and creates its own span. Parent-child relationships propagate automatically via ThreadContextBasedTracerContextStorage.
| Span | Component | Method |
|---|---|---|
| opensearch.query | TransportPPLQueryAction | doExecute() |
| opensearch.query.parse | PPLService | plan() |
| opensearch.query.analyze | QueryService | executeWithCalcite() |
| opensearch.query.optimize | CalciteToolsHelper.OpenSearchRelRunners | run() |
| opensearch.query.compile | CalciteToolsHelper.OpenSearchRelRunners | run() |
| opensearch.query.execute | OpenSearchExecutionEngine | execute(RelNode, ...) |
| opensearch.query.materialize | OpenSearchExecutionEngine | execute(RelNode, ...) |
Async Span Lifecycle
The execution model is asynchronous: doExecute() returns before query execution finishes. The actual work runs on the sql-worker thread pool.
Key insight: SpanScope and Span have different lifecycles.
- SpanScope is thread-local. Opened and closed on the transport thread via try/finally. Its only job is to be active during pplService.execute() so OpenSearchQueryManager.withCurrentContext() captures the span context in ThreadContext for worker thread propagation.
- Span is thread-safe. Created on the transport thread, ended on the worker thread in the async listener callback via AtomicBoolean guard for exactly-once semantics.
    // Transport thread
    Span rootSpan = tracer.startSpan(SpanCreationContext.client().name("opensearch.query")...);
    SpanScope spanScope = tracer.withSpanInScope(rootSpan);
    try {
      ActionListener<...> tracedListener = new ActionListener<>() {
        // Guards endSpan() so the root span is ended exactly once
        private final AtomicBoolean ended = new AtomicBoolean(false);

        @Override public void onResponse(...) {
          try { listener.onResponse(response); }
          finally { if (ended.compareAndSet(false, true)) rootSpan.endSpan(); }
        }

        @Override public void onFailure(Exception e) {
          try { rootSpan.setError(e); listener.onFailure(e); }
          finally { if (ended.compareAndSet(false, true)) rootSpan.endSpan(); }
        }
      };
      pplService.execute(request, tracedListener, ...);
    } catch (Exception e) {
      // Synchronous failure before the work was handed off
      rootSpan.setError(e); rootSpan.endSpan(); listener.onFailure(e);
    } finally {
      spanScope.close(); // Close scope on transport thread
    }
| Object | Created On | Closed/Ended On | Thread-Safe? |
|---|---|---|---|
| Span | Transport thread | Worker thread (async callback) | Yes |
| SpanScope | Transport thread | Transport thread (finally block) | Must be same thread |
| ThreadContext snapshot | Captured at submit time | Restored on worker thread | Yes (immutable) |
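The capture-at-submit, restore-on-worker behavior in the table can be illustrated without any OpenSearch classes. The sketch below is an assumption-laden stand-in: a plain ThreadLocal plays the role of the thread-local span context, and the withCurrentContext method here only mimics what the text attributes to OpenSearchQueryManager.withCurrentContext(); it is not that method's implementation.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Illustrative only: a ThreadLocal stands in for the thread-local span context. */
final class ContextPropagationSketch {
    static final ThreadLocal<String> CURRENT_SPAN = new ThreadLocal<>();

    /** Snapshots the caller's current span at wrap time and restores it around the task. */
    static Runnable withCurrentContext(Runnable task) {
        final String captured = CURRENT_SPAN.get(); // immutable snapshot taken on the submitting thread
        return () -> {
            String previous = CURRENT_SPAN.get();
            CURRENT_SPAN.set(captured);
            try {
                task.run();
            } finally {
                CURRENT_SPAN.set(previous); // avoid leaking context into the pooled thread
            }
        };
    }

    /** Opens a "scope", wraps the work, closes the scope, then runs the work on a worker thread. */
    static String spanSeenOnWorker() {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        String[] seen = new String[1];
        try {
            CURRENT_SPAN.set("opensearch.query");   // scope opened on the submitting thread
            Runnable wrapped = withCurrentContext(() -> seen[0] = CURRENT_SPAN.get());
            CURRENT_SPAN.remove();                   // scope closed before the worker runs
            worker.submit(wrapped).get();            // get() also makes the worker's write visible
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            worker.shutdown();
        }
        return seen[0];
    }
}
```

The worker still observes the span even though the scope on the submitting thread was closed first, which is why the scope only needs to be active at the moment the snapshot is taken.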
Error Handling
Synchronous Phases (parse, analyze, optimize, compile)
Standard try/catch/finally with span.setError(e) + re-throw + span.endSpan() in finally.
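As a minimal sketch of that pattern, the helper below wraps one synchronous phase. PhaseSpan is a hypothetical two-method stand-in for the setError/endSpan calls mentioned in this document, not the OpenSearch Span interface, and runPhase and RecordingSpan are illustrative names rather than part of the plugin.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class SyncPhaseTracingSketch {
    /** Hypothetical stand-in exposing only the two span calls used by the pattern. */
    interface PhaseSpan {
        void setError(Exception e);
        void endSpan();
    }

    /** Runs one synchronous phase: record the error, re-throw it, and always end the span. */
    static <T> T runPhase(PhaseSpan span, Supplier<T> phase) {
        try {
            return phase.get();
        } catch (RuntimeException e) {
            span.setError(e);
            throw e; // callers still see the original exception
        } finally {
            span.endSpan(); // runs on both the success and the failure path
        }
    }

    /** Test double that records calls so the ordering can be inspected. */
    static final class RecordingSpan implements PhaseSpan {
        final List<String> events = new ArrayList<>();
        @Override public void setError(Exception e) { events.add("error:" + e.getMessage()); }
        @Override public void endSpan() { events.add("end"); }
    }
}
```

On failure the recorded order is error first, then end, so the error is attached before the span is closed.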
Async Boundaries (execute, materialize)
Inside client.schedule() lambdas, exceptions must route to listener.onFailure() — never re-thrown. Re-throwing bypasses the listener chain and leaks the root span.
    client.schedule(() -> {
      try (...) {
        // ... phase work ...
        listener.onResponse(response);
      } catch (Throwable t) {
        // Only JVM-fatal errors propagate; everything else goes to the listener
        if (t instanceof VirtualMachineError) throw (VirtualMachineError) t;
        Exception e = (t instanceof Exception) ? (Exception) t : new RuntimeException(t);
        listener.onFailure(e);
      }
    });
Pre-existing bug fixed: OpenSearchExecutionEngine.execute(RelNode, ...) previously caught SQLException and re-threw as RuntimeException without calling listener.onFailure(). Fixed to catch Throwable, re-throw only VirtualMachineError, and route everything else to listener.onFailure().
Telemetry Control
No SQL plugin-specific toggle. Controlled entirely by OpenSearch core:
| Level | Setting | Default | Effect |
|---|---|---|---|
| Feature Flag | opensearch.experimental.feature.telemetry.enabled | false | Gates all telemetry settings |
| Tracer Feature | telemetry.feature.tracer.enabled | false | Enables tracer infrastructure |
| Tracer Toggle | telemetry.tracer.enabled | false | Dynamic on/off for tracing |
| Sampling | telemetry.tracer.sampler.probability | 0.01 | Fraction of traces exported |
When telemetry is disabled, Tracer is NoopTracer — all span operations are no-ops with near-zero overhead. No conditional checks needed in application code.