Documentation | Deployments

API Reference

Resources

Predictions

Models

Deployments

Methods

client.deployments.list():

CursorURLPage

DeploymentListResponse

get/deployments

Get a list of deployments associated with the current account, including the latest release configuration for each deployment.

Example cURL request:

curl -s \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  https://api.replicate.com/v1/deployments

The response will be a paginated JSON array of deployment objects, sorted with the most recent deployment first:

{
  "next": "http://api.replicate.com/v1/deployments?cursor=cD0yMDIzLTA2LTA2KzIzJTNBNDAlM0EwOC45NjMwMDAlMkIwMCUzQTAw",
  "previous": null,
  "results": [
    {
      "owner": "replicate",
      "name": "my-app-image-generator",
      "current_release": {
        "number": 1,
        "model": "stability-ai/sdxl",
        "version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
        "created_at": "2024-02-15T16:32:57.018467Z",
        "created_by": {
          "type": "organization",
          "username": "acme",
          "name": "Acme Corp, Inc.",
          "avatar_url": "https://cdn.replicate.com/avatars/acme.png",
          "github_url": "https://github.com/acme"
        },
        "configuration": {
          "hardware": "gpu-t4",
          "min_instances": 1,
          "max_instances": 5
        }
      }
    }
  ]
}

client.deployments.create(, ):

DeploymentCreateResponse

post/deployments

Create a new deployment:

Example cURL request:

curl -s \
  -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "name": "my-app-image-generator",
        "model": "stability-ai/sdxl",
        "version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
        "hardware": "gpu-t4",
        "min_instances": 0,
        "max_instances": 3
      }' \
  https://api.replicate.com/v1/deployments

The response will be a JSON object describing the deployment:

{
  "owner": "acme",
  "name": "my-app-image-generator",
  "current_release": {
    "number": 1,
    "model": "stability-ai/sdxl",
    "version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
    "created_at": "2024-02-15T16:32:57.018467Z",
    "created_by": {
      "type": "organization",
      "username": "acme",
      "name": "Acme Corp, Inc.",
      "avatar_url": "https://cdn.replicate.com/avatars/acme.png",
      "github_url": "https://github.com/acme"
    },
    "configuration": {
      "hardware": "gpu-t4",
      "min_instances": 1,
      "max_instances": 5
    }
  }
}

Parameters

body:

DeploymentCreateParams

hardware: string

The SKU for the hardware used to run the model. Possible values can be retrieved from the hardware.list endpoint.

max_instances: number

(maximum: 20, minimum: 0)

The maximum number of instances for scaling.

min_instances: number

(maximum: 5, minimum: 0)

The minimum number of instances for scaling.

model: string

The full name of the model that you want to deploy e.g. stability-ai/sdxl.

name: string

The name of the deployment.

version: string

The 64-character string ID of the model version that you want to deploy.

Returns

DeploymentCreateResponse{

current_release?:

CurrentRelease

name?: string

The name of the deployment.

owner?: string

The owner of the deployment.

Request example

import Replicate from 'replicate';

const replicate = new Replicate({
  bearerToken: process.env['REPLICATE_API_TOKEN'], // This is the default and can be omitted
});

const deployment = await replicate.deployments.create({
  hardware: 'hardware',
  max_instances: 0,
  min_instances: 0,
  model: 'model',
  name: 'name',
  version: 'version',
});

console.log(deployment.current_release);

200Example

{
  "current_release": {
    "configuration": {
      "hardware": "hardware",
      "max_instances": 0,
      "min_instances": 0
    },
    "created_at": "2019-12-27T18:11:19.117Z",
    "created_by": {
      "type": "organization",
      "username": "username",
      "avatar_url": "https://example.com",
      "github_url": "https://example.com",
      "name": "name"
    },
    "model": "model",
    "number": 0,
    "version": "version"
  },
  "name": "name",
  "owner": "owner"
}

client.deployments.delete(, ): void

delete/deployments/{deployment_owner}/{deployment_name}

Delete a deployment

Deployment deletion has some restrictions:

You can only delete deployments that have been offline and unused for at least 15 minutes.

Example cURL request:

curl -s -X DELETE \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  https://api.replicate.com/v1/deployments/acme/my-app-image-generator

The response will be an empty 204, indicating the deployment has been deleted.

client.deployments.get(, ):

DeploymentGetResponse

get/deployments/{deployment_owner}/{deployment_name}

Get information about a deployment by name including the current release.

Example cURL request:

curl -s \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  https://api.replicate.com/v1/deployments/replicate/my-app-image-generator

The response will be a JSON object describing the deployment:

{
  "owner": "acme",
  "name": "my-app-image-generator",
  "current_release": {
    "number": 1,
    "model": "stability-ai/sdxl",
    "version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
    "created_at": "2024-02-15T16:32:57.018467Z",
    "created_by": {
      "type": "organization",
      "username": "acme",
      "name": "Acme Corp, Inc.",
      "avatar_url": "https://cdn.replicate.com/avatars/acme.png",
      "github_url": "https://github.com/acme"
    },
    "configuration": {
      "hardware": "gpu-t4",
      "min_instances": 1,
      "max_instances": 5
    }
  }
}

client.deployments.update(, ):

DeploymentUpdateResponse

patch/deployments/{deployment_owner}/{deployment_name}

Update properties of an existing deployment, including hardware, min/max instances, and the deployment's underlying model version.

Example cURL request:

curl -s \
  -X PATCH \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"min_instances": 3, "max_instances": 10}' \
  https://api.replicate.com/v1/deployments/acme/my-app-image-generator

The response will be a JSON object describing the deployment:

{
  "owner": "acme",
  "name": "my-app-image-generator",
  "current_release": {
    "number": 2,
    "model": "stability-ai/sdxl",
    "version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
    "created_at": "2024-02-15T16:32:57.018467Z",
    "created_by": {
      "type": "organization",
      "username": "acme",
      "name": "Acme Corp, Inc.",
      "avatar_url": "https://cdn.replicate.com/avatars/acme.png",
      "github_url": "https://github.com/acme"
    },
    "configuration": {
      "hardware": "gpu-t4",
      "min_instances": 3,
      "max_instances": 10
    }
  }
}

Updating any deployment properties will increment the number field of the current_release.

Deployments

Predictions

Deployments.Predictions

Methods

client.deployments.predictions.create(, ):

Prediction

post/deployments/{deployment_owner}/{deployment_name}/predictions

Create a prediction for the deployment and inputs you provide.

Example cURL request:

curl -s -X POST -H 'Prefer: wait' \
  -d '{"input": {"prompt": "A photo of a bear riding a bicycle over the moon"}}' \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H 'Content-Type: application/json' \
  https://api.replicate.com/v1/deployments/acme/my-app-image-generator/predictions

The request will wait up to 60 seconds for the model to run. If this time is exceeded the prediction will be returned in a "starting" state and need to be retrieved using the predictions.get endpoint.

For a complete overview of the deployments.predictions.create API check out our documentation on creating a prediction which covers a variety of use cases.