Skip to content

[Feature] Add validation hardening, sort/limit restrictions, and integration tests for vectorSearch()#5324

Open
mengweieric wants to merge 8 commits intoopensearch-project:feature/vector-search-p0from
mengweieric:feature/vector-search-p0-hardening
Open

[Feature] Add validation hardening, sort/limit restrictions, and integration tests for vectorSearch()#5324
mengweieric wants to merge 8 commits intoopensearch-project:feature/vector-search-p0from
mengweieric:feature/vector-search-p0-hardening

Conversation

@mengweieric
Copy link
Copy Markdown
Collaborator

@mengweieric mengweieric commented Apr 8, 2026

Summary

Adds validation hardening, query restrictions, and integration tests on top of the
core vectorSearch() execution pipeline in #5320.

What this PR adds

Validation hardening

  • Option mutual exclusivity: exactly one of k, max_distance, min_score
  • k range validation: [1, 10000]
  • LIMIT > k rejection in top-k mode
  • Sort restriction: only ORDER BY _score DESC allowed; rejects non-_score
    fields and _score ASC
  • Radial size policy: caps max_distance/min_score queries at maxResultWindow
    to prevent unbounded result sets

pushDownSort() contract preservation

  • Preserves base OpenSearchIndexScanQueryBuilder.pushDownSort() contract for
    sort.getCount() → limit pushdown. SQL always sets count=0, but PPL or future
    callers may combine sort+limit in one LogicalSort node. Focused test verifies
    LogicalSort(count=7) is correctly pushed down as request size.

Test coverage

  • VectorSearchIndexTest (9 tests): knn JSON generation, options rendering,
    non-numeric option quoting, nested fields
  • VectorSearchIT (16 integration tests): 4 _explain DSL shape verification +
    8 validation error tests + 3 ORDER BY restriction tests + 1 LIMIT success test
  • Additional unit tests across existing test files: multi-sort rejection,
    case-insensitive arg lookup, sort count preservation, radial LIMIT modes,
    resolver argument count edge cases, NaN/Infinity edge cases

Benchmark results (against live OpenSearch + k-NN)

  • Semantic correctness: 10/10 on all 8 test cases (top-k, radial, post-filter, hotels)
  • SQL overhead: +1.4ms to +2.6ms p50 vs native DSL

Test plan

  • ./gradlew spotlessCheck — PASS
  • ./gradlew :opensearch:test :core:test :sql:test — PASS
  • ./gradlew :integ-test:integTest -Dtests.class="*VectorSearchIT" — 16/16 PASS
  • Correctness benchmark: 10/10 semantic match across A1, A2, A4, A5, A6a, A6b, A7-tight, A8

@mengweieric mengweieric added SQL feature skip-diff-analyzer Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis. skip-diff-reviewer Maintainer to skip code-diff-reviewer check, after reviewing issues in AI analysis. labels Apr 8, 2026
@mengweieric mengweieric changed the title Add vectorSearch execution pipeline, validation hardening, and integration tests [Feature] Add vectorSearch execution pipeline, validation hardening, and integration tests Apr 8, 2026
@mengweieric mengweieric changed the title [Feature] Add vectorSearch execution pipeline, validation hardening, and integration tests [Feature] Add vectorSearch validation hardening, and integration tests Apr 8, 2026
@mengweieric mengweieric changed the title [Feature] Add vectorSearch validation hardening, and integration tests Add validation hardening, sort/limit restrictions, and integration tests for vectorSearch() Apr 8, 2026
@mengweieric mengweieric changed the title Add validation hardening, sort/limit restrictions, and integration tests for vectorSearch() [Feature] Add validation hardening, sort/limit restrictions, and integration tests for vectorSearch() Apr 8, 2026
- Enforce exactly one of k, max_distance, or min_score
- Validate k is in [1, 10000] range
- Add 6 tests: mutual exclusivity (3 combos), k too small, k too
  large, k boundary values (1 and 10000)

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
VectorSearchQueryBuilder now accepts options map and rejects
pushDownLimit when LIMIT exceeds k. Radial modes (max_distance,
min_score) have no LIMIT restriction.

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
- Create VectorSearchIndexTest: 7 tests covering buildKnnQueryJson()
  for top-k, max_distance, min_score, nested fields, multi-element
  and single-element vectors, numeric option rendering
- Add edge case tests to VectorSearchTableFunctionImplementationTest:
  NaN vector component, empty option key/value, negative k, NaN for
  max_distance and min_score (6 new tests)
- Add VectorSearchQueryBuilderTest: min_score radial mode LIMIT,
  pushDownSort delegation to parent (2 new tests)
- Extract buildKnnQueryJson() as package-private for direct testing

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
Test too-many (5) and zero arguments paths in
VectorSearchTableFunctionResolver to complement existing
too-few (2) test.

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
- Cap radial mode (max_distance/min_score) results at maxResultWindow
  to prevent unbounded result sets
- Reject ORDER BY on non-_score fields and _score ASC in vectorSearch
  since knn results are naturally sorted by _score DESC
- Add 12 integration tests: 4 _explain DSL shape verification tests
  and 8 validation error path tests

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
- Add multi-sort expression test: ORDER BY _score DESC, name ASC
  correctly rejects the non-_score field (VectorSearchQueryBuilderTest)
- Add case-insensitive argument name lookup test to verify
  TABLE='x' resolves same as table='x' (Implementation test)
- Add non-numeric option fallback test: verifies string options
  are quoted in JSON output (VectorSearchIndexTest)
- Add 4 integration tests: ORDER BY _score DESC succeeds,
  ORDER BY non-score rejects, ORDER BY _score ASC rejects,
  LIMIT within k succeeds (VectorSearchIT, now 16 tests)

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
The base OpenSearchIndexScanQueryBuilder.pushDownSort() pushes
sort.getCount() as a limit when non-zero. Our override validated
_score DESC and returned true, but did not preserve this contract.

SQL always sets count=0, so this was not reachable today, but PPL
or future callers may set a non-zero count to combine sort+limit
in one LogicalSort node. Preserve the behavior defensively.

Add focused test: LogicalSort(count=7) with _score DESC verifies
the count is pushed down as request size.

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
@mengweieric mengweieric force-pushed the feature/vector-search-p0-hardening branch from 08ab0c6 to 534b5f4 Compare April 9, 2026 18:46
if (hasK) {
parseIntOption(options, "k");
int k = Integer.parseInt(options.get("k"));
if (k < 1 || k > 10000) {
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- Unit test: compound AND predicate survives pushdown into bool.filter
- Integration test: compound WHERE (term + range) produces bool query
- Integration test: radial max_distance with WHERE produces bool query

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
@mengweieric mengweieric force-pushed the feature/vector-search-p0-hardening branch from 5b0373c to 0e02488 Compare April 9, 2026 22:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature skip-diff-analyzer Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis. skip-diff-reviewer Maintainer to skip code-diff-reviewer check, after reviewing issues in AI analysis. SQL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant