[Question] Running speed is very slow #56

@sherlcok314159

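Hi, and thanks to your group for open-sourcing this work. I tried to reproduce the meta-llama/Meta-Llama-3-8B-Instruct results on infinite-bench with an A100 80 GB and found it very slow. Is this a limitation of my hardware (e.g. CPU / memory), or is the algorithm itself simply not very fast? I could not find a concrete time / space analysis in the paper, only this statement: "In terms of efficiency, InfLLM achieves a 34% decrease in time consumption while using only 34% of the GPU memory compared to the full-attention models".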

read kv_retrieval.jsonl
Pred kv_retrieval
  2%|██▍                                       | 8/500 [07:18<7:22:35, 53.97s/it]
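To put the progress bar above in perspective, here is a minimal sketch that redoes its arithmetic: the remaining time if the displayed rate of 53.97 s/it stays constant for the remaining 492 samples. The helper names `projected_seconds` and `fmt_hms` are made up for illustration and are not part of the repository or of tqdm.

```python
def projected_seconds(done: int, total: int, sec_per_item: float) -> float:
    """Remaining wall-clock seconds, assuming a constant per-item rate."""
    return (total - done) * sec_per_item


def fmt_hms(seconds: float) -> str:
    """Format a duration in seconds as H:MM:SS."""
    whole = int(round(seconds))
    hours, rem = divmod(whole, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Figures read off the log: 8 of 500 samples done, 53.97 s per sample.
remaining = projected_seconds(done=8, total=500, sec_per_item=53.97)
print(fmt_hms(remaining))
```

This lands within a few seconds of the 7:22:35 shown in the log (the bar works from an unrounded rate), so the kv_retrieval task alone would take roughly seven and a half hours at this speed.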

The relevant configuration is below (taken from the repository, unmodified):

model:
  type: inf-llm
  path: meta-llama/Meta-Llama-3-8B-Instruct
  block_size: 128
  fattn: false
  n_init: 128
  n_local: 4096
  topk: 16
  repr_topk: 4
  max_cached_block: 32
  exc_block_size: 512
  base: 500000
  distance_scale: 1.0

max_len: 2147483647
chunk_size: 8192
conv_type: llama-3-inst
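As a rough aid for reasoning about this configuration, the sketch below derives two numbers from it. It rests on assumptions about what the fields mean, which I have not verified against the code: that each step attends to `n_init` initial tokens, `n_local` local tokens, and `topk` retrieved blocks of `block_size` tokens each, and that the prompt is fed to the model `chunk_size` tokens at a time. The 100,000-token prompt length is a hypothetical value, not a measurement.

```python
import math

# Values copied from the config above.
cfg = {
    "block_size": 128,
    "n_init": 128,
    "n_local": 4096,
    "topk": 16,
    "chunk_size": 8192,
}


def attended_tokens(cfg: dict) -> int:
    """Tokens attended per step under the assumed field meanings:
    initial tokens + local window + retrieved memory blocks."""
    return cfg["n_init"] + cfg["n_local"] + cfg["topk"] * cfg["block_size"]


def chunk_passes(prompt_tokens: int, chunk_size: int) -> int:
    """Number of chunked passes needed to consume a prompt (assumed behavior)."""
    return math.ceil(prompt_tokens / chunk_size)


print(attended_tokens(cfg))                      # 128 + 4096 + 16 * 128
print(chunk_passes(100_000, cfg["chunk_size"]))  # hypothetical prompt length
```

If those assumptions hold, the attention window per step stays fixed at a few thousand tokens regardless of prompt length, so the per-sample time would be dominated by how many chunks have to be encoded and by the block lookup and offloading work done for each one.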
