[Question] Very slow running speed #56

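Hi, thanks for open-sourcing your work. I am trying to reproduce the meta-llama/Meta-Llama-3-8B-Instruct results on infinite-bench with an A100 80 GB, and it runs very slowly. Is this a limitation of my hardware (for example CPU or RAM), or is the algorithm itself just not very fast? I could not find a detailed time/space analysis in the paper, only this statement: "In terms of efficiency, InfLLM achieves a 34% decrease in time consumption while using only 34% of the GPU memory compared to the full-attention models".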

```bash
read kv_retrieval.jsonl
Pred kv_retrieval
  2%|██▍                                       | 8/500 [07:18<7:22:35, 53.97s/it]
```
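To put the progress bar above in perspective, here is a rough back-of-the-envelope sketch in Python. It only uses the numbers visible in the log (500 samples, 8 done, 53.97 s/it); the helper names `eta_seconds` and `format_hms` are made up for illustration and are not part of the repository.

```python
def eta_seconds(total_items: int, done_items: int, sec_per_item: float) -> float:
    """Remaining wall-clock time, assuming the per-item rate stays constant."""
    return (total_items - done_items) * sec_per_item


def format_hms(seconds: float) -> str:
    """Format a duration in seconds as H:MM:SS."""
    whole = int(round(seconds))
    hours, rem = divmod(whole, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    # Numbers copied from the tqdm line in the log above.
    total, done, rate = 500, 8, 53.97
    print("remaining:", format_hms(eta_seconds(total, done, rate)))
    print("full run :", format_hms(total * rate))
```

At this rate the kv_retrieval task alone takes roughly seven and a half hours, which is why I am asking whether this is expected.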

Here is the relevant configuration (taken from the repository, unmodified):

```yaml
model:
  type: inf-llm
  path: meta-llama/Meta-Llama-3-8B-Instruct
  block_size: 128
  fattn: false
  n_init: 128
  n_local: 4096
  topk: 16
  repr_topk: 4
  max_cached_block: 32
  exc_block_size: 512
  base: 500000
  distance_scale: 1.0

max_len: 2147483647
chunk_size: 8192
conv_type: llama-3-inst
```
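For reference, a small sketch of the arithmetic this config implies. It rests on my own reading of the parameters, which may be wrong: I assume each attention step sees `n_init` initial tokens, `topk` retrieved blocks of `block_size` tokens each, and `n_local` local tokens, and that `max_cached_block` is the number of memory blocks kept on the GPU. The function names are hypothetical.

```python
# Values copied from the YAML config above.
config = {
    "block_size": 128,
    "n_init": 128,
    "n_local": 4096,
    "topk": 16,
    "max_cached_block": 32,
}


def attended_tokens(cfg: dict) -> int:
    """Tokens visible per attention step, under the assumed reading:
    initial tokens + retrieved blocks + local window."""
    return cfg["n_init"] + cfg["topk"] * cfg["block_size"] + cfg["n_local"]


def gpu_cached_tokens(cfg: dict) -> int:
    """Tokens held in the GPU block cache, assuming max_cached_block
    counts blocks of block_size tokens."""
    return cfg["max_cached_block"] * cfg["block_size"]


if __name__ == "__main__":
    print("attended tokens per step:", attended_tokens(config))
    print("tokens in GPU block cache:", gpu_cached_tokens(config))
```

If this reading is right, the attention window itself is small, so I would guess the time goes into block lookup and CPU/GPU transfer rather than the attention computation, but I have not profiled it.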
