InfLLM-v2 attention kernel #3

Description

@maxin9966

It seems that vLLM does not support the InfLLM-v2 attention kernel. How much of a performance improvement does InfLLM-v2 offer, and what are the trade-offs? For example, is there any performance loss? Also, how can CPM.cu be set up to serve an OpenAI-compatible API, the way vLLM does?
