Expected Behavior
The benchmark should build a disk index and then run a search benchmark on my SIFT1M dataset.
Actual Behavior
The run fails with the following output:
######################
# Running Job 1 of 1 #
######################
Disk Index Build
Data Type: float32
Data File: /home/deniz/dev/thesis/datasets/sift/sift_base.fvecs
Distance: squared_l2
Dim: 128
Max Degree: 70
L Build: 125
Build Threads: 32
Build RAM Limit GB: 48
PQ Chunks: 33
Quantization: pq, chunks 32
Save Path: data.index
Error: Dimension must be greater than zero
This is unexpected, since the dimension is set to 128 in the config and the log itself reports Dim: 128.
Example Code
Create a run.json file as follows:
{
  "search_directories": [
    "<path to the directory with sift1M dataset>"
  ],
  "jobs": [
    {
      "type": "disk-index",
      "content": {
        "source": {
          "disk-index-source": "Build",
          "dim": 128,
          "data_type": "float32",
          "data": "sift_base.fvecs",
          "distance": "squared_l2",
          "max_degree": 70,
          "alpha": 2,
          "l_build": 125,
          "num_threads": 32,
          "build_ram_limit_gb": 48,
          "quantization_type": "PQ_32",
          "num_pq_chunks": 32,
          "save_path": "data.index"
        },
        "search_phase": {
          "queries": "sift_query.fvecs",
          "groundtruth": "sift_groundtruth.ivecs",
          "search_list": [10, 20, 40],
          "beam_width": 4,
          "recall_at": 10,
          "num_threads": 1,
          "is_flat_search": false,
          "distance": "squared_l2",
          "vector_filters_file": null
        }
      }
    }
  ]
}
Then run:
cargo run --release --package diskann-benchmark --features "disk-index product-quantization" -- run --input-file ./run.json --output-file output.json
Dataset Description
- Dimensions: 128
- Number of Points: 1M
- Data type: float32
Error
######################
# Running Job 1 of 1 #
######################
Disk Index Build
Data Type: float32
Data File: $$REDACTED$$/datasets/sift/sift_base.fvecs
Distance: squared_l2
Dim: 128
Max Degree: 70
L Build: 125
Build Threads: 32
Build RAM Limit GB: 48
PQ Chunks: 33
Quantization: pq, chunks 32
Save Path: data.index
Error: Dimension must be greater than zero
Your Environment
- Fedora Workstation 42
- commit: 779977c
Additional Details
I believe the program tries to read the dimension from the metadata of my vector data file at diskann-providers/src/utils/file_util.rs:32, but for some reason it reads the dimension as 0. The base vector file I used can be downloaded from: ftp://ftp.irisa.fr/local/texmex/corpus/sift.tar.gz
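One possible explanation, not confirmed against the code at file_util.rs:32: an .fvecs file has no global header. In the TEXMEX layout, every vector is stored as a 4-byte little-endian integer dimension followed by that many float32 components. If the loader instead expects a header of two little-endian u32 values, [num_points][dim] (an assumption about the expected input format), then the first 8 bytes of an .fvecs file would be read as num_points = 128 and dim = the bit pattern of the first vector's first component. A first component of 0.0 would then give dim = 0. The sketch below is a minimal, standalone check of that hypothesis. The function names are hypothetical and are not part of diskann-providers.

```rust
use std::convert::TryInto;
use std::fs::File;
use std::io::Read;

/// Copy 4 bytes starting at `offset`, or return None if the slice is too short.
fn read4(bytes: &[u8], offset: usize) -> Option<[u8; 4]> {
    bytes.get(offset..offset + 4)?.try_into().ok()
}

/// Interpret the first 8 bytes as an ASSUMED header layout:
/// [num_points: u32][dim: u32], both little-endian.
fn as_bin_header(bytes: &[u8]) -> Option<(u32, u32)> {
    let num_points = u32::from_le_bytes(read4(bytes, 0)?);
    let dim = u32::from_le_bytes(read4(bytes, 4)?);
    Some((num_points, dim))
}

/// Interpret the first 8 bytes as the start of an .fvecs record:
/// [dim: i32][first component: f32], both little-endian.
fn as_fvecs_prefix(bytes: &[u8]) -> Option<(i32, f32)> {
    let dim = i32::from_le_bytes(read4(bytes, 0)?);
    let first_component = f32::from_le_bytes(read4(bytes, 4)?);
    Some((dim, first_component))
}

/// Synthetic stand-in for the start of an .fvecs file:
/// dimension 128, first component 0.0 (illustrative values only).
fn synthetic_prefix() -> Vec<u8> {
    let mut prefix = Vec::new();
    prefix.extend_from_slice(&128i32.to_le_bytes());
    prefix.extend_from_slice(&0.0f32.to_le_bytes());
    prefix
}

fn main() -> std::io::Result<()> {
    // With a path argument, inspect the first 8 bytes of that file;
    // without one, fall back to the synthetic prefix.
    let prefix: Vec<u8> = match std::env::args().nth(1) {
        Some(path) => {
            let mut buf = [0u8; 8];
            File::open(path)?.read_exact(&mut buf)?;
            buf.to_vec()
        }
        None => synthetic_prefix(),
    };

    println!("read as [num_points][dim] header: {:?}", as_bin_header(&prefix));
    println!("read as .fvecs record prefix:     {:?}", as_fvecs_prefix(&prefix));
    Ok(())
}
```

Running this with the path to sift_base.fvecs as the only argument shows both interpretations of the real file. If the header interpretation reports a dim of 0 there as well, the likely cause is a file-format mismatch rather than the "dim" field in run.json.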