InferX Endpoint | Qwen2.5-Coder-1.5B-Instruct

Metadata

Name

Qwen2.5-Coder-1.5B-Instruct

Provider

Qwen

Parameter Size

1.50B

GPU Count

1

Context Length

2000

Concurrency

146.64x

Cold Start TTFT

—

Recommended Use Cases

Coding assistant, Code generation

Log In To Use This Endpoint

This public page shows the published endpoint metadata and integration shape. Log in to get a tenant-scoped endpoint URL, inference API key, and the interactive playground.

Integration

Use these values in Dify, OpenWebUI, Continue, OpenCode, or any OpenAI-compatible client that asks for a base URL, API key, and model name.

API Base URL

https://dev1.inferx.net/funccall/<tenant>/endpoints/Qwen2.5-Coder-1.5B-Instruct/v1

Model Name

Qwen/Qwen2.5-Coder-1.5B-Instruct

API Key

<INFERENCE_API_KEY>

An inference API key is required for this endpoint. Until one is available, the sample request below keeps the correct request shape and uses a placeholder token.

Sample REST Call
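
The request shape can be sketched with Python's standard library. This is a minimal illustration, not the platform's official sample: it assumes the endpoint accepts the usual OpenAI-style `Authorization: Bearer` header and serves the `completions` route under the base URL (the Model Spec lists `v1/completions` as the sample path). The helper names `build_completion_request` and `complete` are hypothetical, and `<tenant>` and `<INFERENCE_API_KEY>` remain placeholders until you log in.

```python
import json
import urllib.request

# Values copied from the Integration section; <tenant> and the key are placeholders.
BASE_URL = "https://dev1.inferx.net/funccall/<tenant>/endpoints/Qwen2.5-Coder-1.5B-Instruct/v1"
MODEL = "Qwen/Qwen2.5-Coder-1.5B-Instruct"
API_KEY = "<INFERENCE_API_KEY>"


def build_completion_request(prompt, base_url=BASE_URL, api_key=API_KEY, max_tokens=1000):
    """Build (but do not send) a POST to the OpenAI-style completions route."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0,
        # The published spec streams; streaming is disabled here to keep the sketch short.
        "stream": False,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: Bearer-token auth, as in other OpenAI-compatible servers.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def complete(prompt, timeout=90):
    """Send the request and return the first completion's text."""
    request = build_completion_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a real tenant and key substituted, `complete("write a quick sort algorithm.")` would send the sample prompt from the Model Spec. The 90-second timeout mirrors the spec's `loadingTimeout` value, on the assumption that the field is measured in seconds.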

Model Spec

{
    "image": "vllm/vllm-openai:v0.12.0",
    "commands": [
        "--model",
        "Qwen/Qwen2.5-Coder-1.5B-Instruct",
        "--trust-remote-code",
        "--gpu-memory-utilization",
        "0.95",
        "--max-model-len",
        "2000",
        "--tensor-parallel-size=1"
    ],
    "resources": {
        "GPU": {
            "Count": 1,
            "vRam": 13000
        }
    },
    "envs": [],
    "policy": {
        "Obj": {
            "min_replica": 0,
            "max_replica": 1,
            "standby_per_node": 1,
            "parallel": 50,
            "queue_len": 100,
            "queue_timeout": 30.0,
            "scalein_timeout": 1.0,
            "scaleout_policy": {
                "WaitQueueRatio": {
                    "wait_ratio": 0.1
                }
            },
            "runtime_config": {
                "graph_sync": false
            }
        }
    },
    "sample_query": {
        "apiType": "text2text",
        "raw_query": false,
        "path": "v1/completions",
        "prompt": "write a quick sort algorithm.",
        "prompts": [
            "Write a Python function that computes Fibonacci numbers. Explain time complexity.",
            "Translate the following Chinese text to English: \u4eca\u5929\u5929\u6c14\u5f88\u597d\u3002",
            "Explain general relativity in simple language.",
            "Write a legal contract clause about liability and indemnification.",
            "Summarize the plot of a fantasy novel involving dragons.",
            "Solve this calculus integral: \u222b x^3 log(x) dx",
            "Generate a JSON schema describing a user profile.",
            "Explain why emojis like \ud83d\ude00\ud83d\udd25\ud83d\ude80 represent byte-level tokens."
        ],
        "dataUrl": "",
        "max_tokens": 0,
        "body": {
            "max_tokens": "1000",
            "model": "Qwen/Qwen2.5-Coder-1.5B-Instruct",
            "stream": "true",
            "temperature": "0"
        },
        "loadingTimeout": 90
    }
}
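
Note that `sample_query.body` stores `max_tokens`, `stream`, and `temperature` as strings. Whether InferX converts these before forwarding the request is not stated on this page. The sketch below assumes a client that replays the sample query itself and wants typed JSON values, as OpenAI-compatible servers generally expect. The helper names `coerce` and `typed_payload` are hypothetical.

```python
import json

# Excerpt of the "sample_query" object from the Model Spec above.
SAMPLE_QUERY = {
    "path": "v1/completions",
    "prompt": "write a quick sort algorithm.",
    "body": {
        "max_tokens": "1000",
        "model": "Qwen/Qwen2.5-Coder-1.5B-Instruct",
        "stream": "true",
        "temperature": "0",
    },
}


def coerce(value):
    """Turn a string such as "1000" or "true" into a JSON scalar; keep anything else."""
    if not isinstance(value, str):
        return value
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError:
        return value  # ordinary text, e.g. the model name
    # Accept only numbers and booleans; leave other parsed shapes as the original string.
    return parsed if isinstance(parsed, (bool, int, float)) else value


def typed_payload(sample_query):
    """Merge the prompt into the body and coerce string-encoded scalars."""
    payload = {key: coerce(val) for key, val in sample_query["body"].items()}
    payload["prompt"] = sample_query["prompt"]
    return payload


if __name__ == "__main__":
    print(json.dumps(typed_payload(SAMPLE_QUERY), indent=2))
```

The resulting dictionary can be serialized and posted to the `path` given in the spec, relative to the endpoint root.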

InferX Beta — Serverless GPU Inference Platform, Built for Agent-Native Workloads
