[Feature] Add validation hardening, sort/limit restrictions, and integration tests for vectorSearch()#5324
Open
mengweieric wants to merge 8 commits intoopensearch-project:feature/vector-search-p0from
Conversation
- Enforce exactly one of k, max_distance, or min_score - Validate k is in [1, 10000] range - Add 6 tests: mutual exclusivity (3 combos), k too small, k too large, k boundary values (1 and 10000) Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
VectorSearchQueryBuilder now accepts options map and rejects pushDownLimit when LIMIT exceeds k. Radial modes (max_distance, min_score) have no LIMIT restriction. Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
- Create VectorSearchIndexTest: 7 tests covering buildKnnQueryJson() for top-k, max_distance, min_score, nested fields, multi-element and single-element vectors, numeric option rendering - Add edge case tests to VectorSearchTableFunctionImplementationTest: NaN vector component, empty option key/value, negative k, NaN for max_distance and min_score (6 new tests) - Add VectorSearchQueryBuilderTest: min_score radial mode LIMIT, pushDownSort delegation to parent (2 new tests) - Extract buildKnnQueryJson() as package-private for direct testing Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
Test too-many (5) and zero arguments paths in VectorSearchTableFunctionResolver to complement existing too-few (2) test. Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
- Cap radial mode (max_distance/min_score) results at maxResultWindow to prevent unbounded result sets - Reject ORDER BY on non-_score fields and _score ASC in vectorSearch since knn results are naturally sorted by _score DESC - Add 12 integration tests: 4 _explain DSL shape verification tests and 8 validation error path tests Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
- Add multi-sort expression test: ORDER BY _score DESC, name ASC correctly rejects the non-_score field (VectorSearchQueryBuilderTest) - Add case-insensitive argument name lookup test to verify TABLE='x' resolves same as table='x' (Implementation test) - Add non-numeric option fallback test: verifies string options are quoted in JSON output (VectorSearchIndexTest) - Add 4 integration tests: ORDER BY _score DESC succeeds, ORDER BY non-score rejects, ORDER BY _score ASC rejects, LIMIT within k succeeds (VectorSearchIT, now 16 tests) Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
The base OpenSearchIndexScanQueryBuilder.pushDownSort() pushes sort.getCount() as a limit when non-zero. Our override validated _score DESC and returned true, but did not preserve this contract. SQL always sets count=0, so this was not reachable today, but PPL or future callers may set a non-zero count to combine sort+limit in one LogicalSort node. Preserve the behavior defensively. Add focused test: LogicalSort(count=7) with _score DESC verifies the count is pushed down as request size. Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
08ab0c6 to
534b5f4
Compare
mengweieric
commented
Apr 9, 2026
| if (hasK) { | ||
| parseIntOption(options, "k"); | ||
| int k = Integer.parseInt(options.get("k")); | ||
| if (k < 1 || k > 10000) { |
Collaborator
Author
There was a problem hiding this comment.
- Unit test: compound AND predicate survives pushdown into bool.filter - Integration test: compound WHERE (term + range) produces bool query - Integration test: radial max_distance with WHERE produces bool query Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
5b0373c to
0e02488
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds validation hardening, query restrictions, and integration tests on top of the
core vectorSearch() execution pipeline in #5320.
What this PR adds
Validation hardening
k,max_distance,min_scorekrange validation:[1, 10000]LIMIT > krejection in top-k modeORDER BY _score DESCallowed; rejects non-_scorefields and
_score ASCmax_distance/min_scorequeries atmaxResultWindowto prevent unbounded result sets
pushDownSort() contract preservation
OpenSearchIndexScanQueryBuilder.pushDownSort()contract forsort.getCount()→ limit pushdown. SQL always setscount=0, but PPL or futurecallers may combine sort+limit in one
LogicalSortnode. Focused test verifiesLogicalSort(count=7)is correctly pushed down as request size.Test coverage
VectorSearchIndexTest(9 tests): knn JSON generation, options rendering,non-numeric option quoting, nested fields
VectorSearchIT(16 integration tests): 4_explainDSL shape verification +8 validation error tests + 3 ORDER BY restriction tests + 1 LIMIT success test
case-insensitive arg lookup, sort count preservation, radial LIMIT modes,
resolver argument count edge cases, NaN/Infinity edge cases
Benchmark results (against live OpenSearch + k-NN)
Test plan
./gradlew spotlessCheck— PASS./gradlew :opensearch:test :core:test :sql:test— PASS./gradlew :integ-test:integTest -Dtests.class="*VectorSearchIT"— 16/16 PASS