Documentation | Deployments › Predictions

API Reference

Resources

Predictions

Models

Create a prediction using a deployment

Deployments

Methods

client.deployments.list():

CursorURLPage

DeploymentListResponse

get/deployments

Get a list of deployments associated with the current account, including the latest release configuration for each deployment.

Example cURL request:

curl -s \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  https://api.replicate.com/v1/deployments

The response will be a paginated JSON array of deployment objects, sorted with the most recent deployment first:

{
  "next": "http://api.replicate.com/v1/deployments?cursor=cD0yMDIzLTA2LTA2KzIzJTNBNDAlM0EwOC45NjMwMDAlMkIwMCUzQTAw",
  "previous": null,
  "results": [
    {
      "owner": "replicate",
      "name": "my-app-image-generator",
      "current_release": {
        "number": 1,
        "model": "stability-ai/sdxl",
        "version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
        "created_at": "2024-02-15T16:32:57.018467Z",
        "created_by": {
          "type": "organization",
          "username": "acme",
          "name": "Acme Corp, Inc.",
          "avatar_url": "https://cdn.replicate.com/avatars/acme.png",
          "github_url": "https://github.com/acme"
        },
        "configuration": {
          "hardware": "gpu-t4",
          "min_instances": 1,
          "max_instances": 5
        }
      }
    }
  ]
}

client.deployments.create(, ):

DeploymentCreateResponse

post/deployments

Create a new deployment:

Example cURL request:

curl -s \
  -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "name": "my-app-image-generator",
        "model": "stability-ai/sdxl",
        "version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
        "hardware": "gpu-t4",
        "min_instances": 0,
        "max_instances": 3
      }' \
  https://api.replicate.com/v1/deployments

The response will be a JSON object describing the deployment:

{
  "owner": "acme",
  "name": "my-app-image-generator",
  "current_release": {
    "number": 1,
    "model": "stability-ai/sdxl",
    "version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
    "created_at": "2024-02-15T16:32:57.018467Z",
    "created_by": {
      "type": "organization",
      "username": "acme",
      "name": "Acme Corp, Inc.",
      "avatar_url": "https://cdn.replicate.com/avatars/acme.png",
      "github_url": "https://github.com/acme"
    },
    "configuration": {
      "hardware": "gpu-t4",
      "min_instances": 1,
      "max_instances": 5
    }
  }
}

client.deployments.delete(, ): void

delete/deployments/{deployment_owner}/{deployment_name}

Delete a deployment

Deployment deletion has some restrictions:

You can only delete deployments that have been offline and unused for at least 15 minutes.

Example cURL request:

curl -s -X DELETE \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  https://api.replicate.com/v1/deployments/acme/my-app-image-generator

The response will be an empty 204, indicating the deployment has been deleted.

client.deployments.get(, ):

DeploymentGetResponse

get/deployments/{deployment_owner}/{deployment_name}

Get information about a deployment by name including the current release.

Example cURL request:

curl -s \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  https://api.replicate.com/v1/deployments/replicate/my-app-image-generator

The response will be a JSON object describing the deployment:

{
  "owner": "acme",
  "name": "my-app-image-generator",
  "current_release": {
    "number": 1,
    "model": "stability-ai/sdxl",
    "version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
    "created_at": "2024-02-15T16:32:57.018467Z",
    "created_by": {
      "type": "organization",
      "username": "acme",
      "name": "Acme Corp, Inc.",
      "avatar_url": "https://cdn.replicate.com/avatars/acme.png",
      "github_url": "https://github.com/acme"
    },
    "configuration": {
      "hardware": "gpu-t4",
      "min_instances": 1,
      "max_instances": 5
    }
  }
}

client.deployments.update(, ):

DeploymentUpdateResponse

patch/deployments/{deployment_owner}/{deployment_name}

Update properties of an existing deployment, including hardware, min/max instances, and the deployment's underlying model version.

Example cURL request:

curl -s \
  -X PATCH \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"min_instances": 3, "max_instances": 10}' \
  https://api.replicate.com/v1/deployments/acme/my-app-image-generator

The response will be a JSON object describing the deployment:

{
  "owner": "acme",
  "name": "my-app-image-generator",
  "current_release": {
    "number": 2,
    "model": "stability-ai/sdxl",
    "version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
    "created_at": "2024-02-15T16:32:57.018467Z",
    "created_by": {
      "type": "organization",
      "username": "acme",
      "name": "Acme Corp, Inc.",
      "avatar_url": "https://cdn.replicate.com/avatars/acme.png",
      "github_url": "https://github.com/acme"
    },
    "configuration": {
      "hardware": "gpu-t4",
      "min_instances": 3,
      "max_instances": 10
    }
  }
}

Updating any deployment properties will increment the number field of the current_release.

Deployments

Predictions

Deployments.Predictions

Methods

client.deployments.predictions.create(, ):

Prediction

post/deployments/{deployment_owner}/{deployment_name}/predictions

Create a prediction for the deployment and inputs you provide.

Example cURL request:

curl -s -X POST -H 'Prefer: wait' \
  -d '{"input": {"prompt": "A photo of a bear riding a bicycle over the moon"}}' \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H 'Content-Type: application/json' \
  https://api.replicate.com/v1/deployments/acme/my-app-image-generator/predictions

The request will wait up to 60 seconds for the model to run. If this time is exceeded the prediction will be returned in a "starting" state and need to be retrieved using the predictions.get endpoint.

For a complete overview of the deployments.predictions.create API check out our documentation on creating a prediction which covers a variety of use cases.

Parameters

params:

PredictionCreateParams

deployment_owner: string

Path param: The name of the user or organization that owns the deployment.

deployment_name: string

Path param: The name of the deployment.

input: unknown

Body param: The model's input as a JSON object. The input schema depends on what model you are running. To see the available inputs, click the "API" tab on the model you are running or get the model version and look at its openapi_schema property. For example, stability-ai/sdxl takes prompt as an input.

Files should be passed as HTTP URLs or data URLs.

Use an HTTP URL when:

you have a large file > 256kb
you want to be able to use the file multiple times
you want your prediction metadata to be associable with your input files

Use a data URL when:

you have a small file <= 256kb
you don't want to upload and host the file somewhere
you don't need to use the file again (Replicate will not store it)

stream?: boolean

Body param: This field is deprecated.

Request a URL to receive streaming output using server-sent events (SSE).

This field is no longer needed as the returned prediction will always have a stream entry in its urls property if the model supports streaming.

webhook?: string

Body param: An HTTPS URL for receiving a webhook when the prediction has new output. The webhook will be a POST request where the request body is the same as the response body of the get prediction operation. If there are network problems, we will retry the webhook a few times, so make sure it can be safely called more than once. Replicate will not follow redirects when sending webhook requests to your service, so be sure to specify a URL that will resolve without redirecting.

webhook_events_filter?: Array<"start" | "output" | "logs" | 1 more...>

Body param: By default, we will send requests to your webhook URL whenever there are new outputs or the prediction has finished. You can change which events trigger webhook requests by specifying webhook_events_filter in the prediction request:

start: immediately on prediction start
output: each time a prediction generates an output (note that predictions can generate multiple outputs)
logs: each time log output is generated by a prediction
completed: when the prediction reaches a terminal state (succeeded/canceled/failed)

For example, if you only wanted requests to be sent at the start and end of the prediction, you would provide:

{
  "input": {
    "text": "Alice"
  },
  "webhook": "https://example.com/my-webhook",
  "webhook_events_filter": ["start", "completed"]
}

Requests for event types output and logs will be sent at most once every 500ms. If you request start and completed webhooks, then they'll always be sent regardless of throttling.

cancelAfter?: string

Header param: The maximum time the prediction can run before it is automatically canceled. The lifetime is measured from when the prediction is created.

The duration can be specified as string with an optional unit suffix:

s for seconds (e.g., 30s, 90s)
m for minutes (e.g., 5m, 15m)
h for hours (e.g., 1h, 2h30m)
defaults to seconds if no unit suffix is provided (e.g. 30 is the same as 30s)

You can combine units for more precision (e.g., 1h30m45s).

The minimum allowed duration is 5 seconds.

Prefer?: string

Header param: Leave the request open and wait for the model to finish generating output. Set to wait=n where n is a number of seconds between 1 and 60.

See https://replicate.com/docs/topics/predictions/create-a-prediction#sync-mode for more information.

Returns

Prediction

Request example

import Replicate from 'replicate';

const replicate = new Replicate({
  bearerToken: process.env['REPLICATE_API_TOKEN'], // This is the default and can be omitted
});

const prediction = await replicate.deployments.predictions.create({
  deployment_owner: 'deployment_owner',
  deployment_name: 'deployment_name',
  input: { prompt: 'Tell me a joke', system_prompt: 'You are a helpful assistant' },
});

console.log(prediction.id);

200Example

{
  "id": "id",
  "created_at": "2019-12-27T18:11:19.117Z",
  "data_removed": true,
  "error": "error",
  "input": {
    "foo": "bar"
  },
  "model": "model",
  "output": {},
  "status": "starting",
  "urls": {
    "cancel": "https://example.com",
    "get": "https://example.com",
    "web": "https://example.com",
    "stream": "https://example.com"
  },
  "version": "string",
  "completed_at": "2019-12-27T18:11:19.117Z",
  "deadline": "2019-12-27T18:11:19.117Z",
  "deployment": "deployment",
  "logs": "logs",
  "metrics": {
    "total_time": 0
  },
  "started_at": "2019-12-27T18:11:19.117Z"
}