Predictions

Methods

cancel(prediction_id):
post/predictions/{prediction_id}/cancel

Cancel a prediction that is currently running.

Example cURL request that creates a prediction and then cancels it:

# First, create a prediction
PREDICTION_ID=$(curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "prompt": "a video that may take a while to generate"
    }
  }' \
  https://api.replicate.com/v1/models/minimax/video-01/predictions | jq -r '.id')

# Echo the prediction ID
echo "Created prediction with ID: $PREDICTION_ID"

# Cancel the prediction
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  https://api.replicate.com/v1/predictions/$PREDICTION_ID/cancel
create(params):
post/predictions

Create a prediction for the model version and inputs you provide.

Example cURL request:

curl -s -X POST -H 'Prefer: wait' \
  -d '{"version": "replicate/hello-world:5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa", "input": {"text": "Alice"}}' \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H 'Content-Type: application/json' \
  https://api.replicate.com/v1/predictions

The request will wait up to 60 seconds for the model to run. If this time is exceeded, the prediction will be returned in a "starting" state and will need to be retrieved using the predictions.get endpoint.

For a complete overview of the predictions.create API, check out our documentation on creating a prediction, which covers a variety of use cases.
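If the wait window elapses, the "starting" prediction can be polled with the predictions.get endpoint until it reaches a terminal status. A minimal shell sketch (requires curl and jq; the is_terminal helper is illustrative, not part of the API):

```shell
# Decide whether a prediction status is terminal.
is_terminal() {
  case "$1" in
    succeeded|failed|canceled) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll the prediction until it finishes (needs a valid token and prediction ID).
if [ -n "${REPLICATE_API_TOKEN:-}" ] && [ -n "${PREDICTION_ID:-}" ]; then
  STATUS=starting
  until is_terminal "$STATUS"; do
    sleep 1
    STATUS=$(curl -s \
      -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
      "https://api.replicate.com/v1/predictions/$PREDICTION_ID" | jq -r '.status')
  done
  echo "Prediction finished with status: $STATUS"
fi
```

In production you would also cap the number of polls or use webhooks instead of a busy loop.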

Parameters
input: unknown

Body param: The model's input as a JSON object. The input schema depends on what model you are running. To see the available inputs, click the "API" tab on the model you are running or get the model version and look at its openapi_schema property. For example, stability-ai/sdxl takes prompt as an input.

Files should be passed as HTTP URLs or data URLs.

Use an HTTP URL when:

  • you have a large file > 256kb
  • you want to be able to use the file multiple times
  • you want your prediction metadata to be associable with your input files

Use a data URL when:

  • you have a small file <= 256kb
  • you don't want to upload and host the file somewhere
  • you don't need to use the file again (Replicate will not store it)
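The data-URL path above can be sketched in shell. to_data_url is a hypothetical helper, not part of the API; it base64-encodes a small local file into a data URL that can be embedded directly in the input object (the model and input field name are illustrative):

```shell
# Hypothetical helper: encode a small (<= 256kb) file as a data URL.
# $1 = file path, $2 = MIME type (e.g. image/png)
to_data_url() {
  printf 'data:%s;base64,%s' "$2" "$(base64 < "$1" | tr -d '\n')"
}

# Pass the file inline as a data URL; Replicate will not store it.
if [ -n "${REPLICATE_API_TOKEN:-}" ] && [ -f input.png ]; then
  curl -s -X POST \
    -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
    -H "Content-Type: application/json" \
    -d "{\"input\": {\"image\": \"$(to_data_url input.png image/png)\"}}" \
    https://api.replicate.com/v1/predictions
fi
```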
version: string

Body param: The identifier for the model or model version that you want to run. This can be specified in a few different formats:

  • {owner_name}/{model_name} - Use this format for official models. For example, black-forest-labs/flux-schnell. For all other models, the specific version is required.
  • {owner_name}/{model_name}:{version_id} - The owner and model name, plus the full 64-character version ID. For example, replicate/hello-world:9dcd6d78e7c6560c340d916fe32e9f24aabfa331e5cce95fe31f77fb03121426.
  • {version_id} - Just the 64-character version ID. For example, 9dcd6d78e7c6560c340d916fe32e9f24aabfa331e5cce95fe31f77fb03121426
stream?: boolean

Body param: This field is deprecated.

Request a URL to receive streaming output using server-sent events (SSE).

This field is no longer needed as the returned prediction will always have a stream entry in its urls property if the model supports streaming.

webhook?: string

Body param: An HTTPS URL for receiving a webhook when the prediction has new output. The webhook will be a POST request where the request body is the same as the response body of the get prediction operation. If there are network problems, we will retry the webhook a few times, so make sure it can be safely called more than once. Replicate will not follow redirects when sending webhook requests to your service, so be sure to specify a URL that will resolve without redirecting.

webhook_events_filter?: Array<"start" | "output" | "logs" | "completed">

Body param: By default, we will send requests to your webhook URL whenever there are new outputs or the prediction has finished. You can change which events trigger webhook requests by specifying webhook_events_filter in the prediction request:

  • start: immediately on prediction start
  • output: each time a prediction generates an output (note that predictions can generate multiple outputs)
  • logs: each time log output is generated by a prediction
  • completed: when the prediction reaches a terminal state (succeeded/canceled/failed)

For example, if you only wanted requests to be sent at the start and end of the prediction, you would provide:

{
  "version": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
  "input": {
    "text": "Alice"
  },
  "webhook": "https://example.com/my-webhook",
  "webhook_events_filter": ["start", "completed"]
}

Requests for event types output and logs will be sent at most once every 500ms. If you request start and completed webhooks, then they'll always be sent regardless of throttling.
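For instance, that filtered request could be sent with cURL as follows (the webhook URL is a placeholder from the example above):

```shell
# Request body: only "start" and "completed" webhook events.
BODY='{
  "version": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
  "input": {"text": "Alice"},
  "webhook": "https://example.com/my-webhook",
  "webhook_events_filter": ["start", "completed"]
}'

if [ -n "${REPLICATE_API_TOKEN:-}" ]; then
  curl -s -X POST \
    -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
    -H "Content-Type: application/json" \
    -d "$BODY" \
    https://api.replicate.com/v1/predictions
fi
```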

cancelAfter?: string

Header param: The maximum time the prediction can run before it is automatically canceled. The lifetime is measured from when the prediction is created.

The duration can be specified as string with an optional unit suffix:

  • s for seconds (e.g., 30s, 90s)
  • m for minutes (e.g., 5m, 15m)
  • h for hours (e.g., 1h, 2h30m)
  • defaults to seconds if no unit suffix is provided (e.g. 30 is the same as 30s)

You can combine units for more precision (e.g., 1h30m45s).

The minimum allowed duration is 5 seconds.

Prefer?: string

Header param: Leave the request open and wait for the model to finish generating output. Set to wait=n where n is a number of seconds between 1 and 60.

See https://replicate.com/docs/topics/predictions/create-a-prediction#sync-mode for more information.

Returns
id: string
created_at: string
(format: date-time)

The time that the prediction was created

data_removed: boolean

Whether the prediction output has been deleted

error: string | null

An error string if the model status is "failed"

input: Record<string, unknown>

The prediction input

model: string

The name of the model that created the prediction

output: unknown

The prediction output, which can be any JSON-serializable value, depending on the model

status: "starting" | "processing" | "succeeded" | "failed" | "canceled"
urls:

URLs for working with the prediction

version: string | "hidden"

The ID of the model version that created the prediction

completed_at?: string
(format: date-time)

The time that the model completed the prediction and all outputs were uploaded

deadline?: string
(format: date-time)

The absolute time at which the prediction will be automatically canceled if it has not completed

deployment?: string

The name of the deployment that created the prediction

logs?: string

The log output from the model

metrics?:

Additional metrics associated with the prediction

started_at?: string
(format: date-time)

The time that the model began the prediction

get(prediction_id):
get/predictions/{prediction_id}

Get the current state of a prediction.

Example cURL request:

curl -s \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu

The response will be the prediction object:

{
  "id": "gm3qorzdhgbfurvjtvhg6dckhu",
  "model": "replicate/hello-world",
  "version": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
  "input": {
    "text": "Alice"
  },
  "logs": "",
  "output": "hello Alice",
  "error": null,
  "status": "succeeded",
  "created_at": "2023-09-08T16:19:34.765994Z",
  "data_removed": false,
  "started_at": "2023-09-08T16:19:34.779176Z",
  "completed_at": "2023-09-08T16:19:34.791859Z",
  "metrics": {
    "predict_time": 0.012683
  },
  "urls": {
    "web": "https://replicate.com/p/gm3qorzdhgbfurvjtvhg6dckhu",
    "get": "https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu",
    "cancel": "https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu/cancel"
  }
}

status will be one of:

  • starting: the prediction is starting up. If this status lasts longer than a few seconds, then it's typically because a new worker is being started to run the prediction.
  • processing: the predict() method of the model is currently running.
  • succeeded: the prediction completed successfully.
  • failed: the prediction encountered an error during processing.
  • canceled: the prediction was canceled by its creator.

In the case of success, output will be an object containing the output of the model. Any files will be represented as HTTPS URLs. You'll need to pass the Authorization header to request them.

In the case of failure, error will contain the error encountered during the prediction.

Terminated predictions (with a status of succeeded, failed, or canceled) will include a metrics object with a predict_time property showing the amount of CPU or GPU time, in seconds, that the prediction used while running. It won't include time waiting for the prediction to start. The metrics object will also include a total_time property showing the total time, in seconds, that the prediction took to complete.

All input parameters, output values, and logs are automatically removed after an hour, by default, for predictions created through the API.

You must save a copy of any data or files in the output if you'd like to continue using them. The output key will still be present, but its value will be null after the output has been removed.

Output files are served by replicate.delivery and its subdomains. If you use an allow list of external domains for your assets, add replicate.delivery and *.replicate.delivery to it.
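Because output data is removed after the retention window, save a local copy of any output files promptly. A download sketch, passing the Authorization header as noted above (the prediction response JSON here is illustrative, not a real prediction):

```shell
# Illustrative response from a file-producing model; the ID and
# replicate.delivery URL are placeholders, not real values.
RESPONSE='{"id": "abc123", "status": "succeeded", "output": ["https://replicate.delivery/pbxt/example/out.png"]}'

# Pull the first output URL out of the response.
OUTPUT_URL=$(printf '%s' "$RESPONSE" | jq -r '.output[0]')

# Save a local copy before the output is removed.
if [ -n "${REPLICATE_API_TOKEN:-}" ] && [ "$OUTPUT_URL" != null ]; then
  curl -s -H "Authorization: Bearer $REPLICATE_API_TOKEN" -o out.png "$OUTPUT_URL"
fi
```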

list():
get/predictions

Get a paginated list of all predictions created by the user or organization associated with the provided API token.

This will include predictions created from the API and the website. It will return 100 records per page.

Example cURL request:

curl -s \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  https://api.replicate.com/v1/predictions

The response will be a paginated JSON array of prediction objects, sorted with the most recent prediction first:

{
  "next": null,
  "previous": null,
  "results": [
    {
      "completed_at": "2023-09-08T16:19:34.791859Z",
      "created_at": "2023-09-08T16:19:34.907244Z",
      "data_removed": false,
      "error": null,
      "id": "gm3qorzdhgbfurvjtvhg6dckhu",
      "input": {
        "text": "Alice"
      },
      "metrics": {
        "predict_time": 0.012683
      },
      "output": "hello Alice",
      "started_at": "2023-09-08T16:19:34.779176Z",
      "source": "api",
      "status": "succeeded",
      "urls": {
        "web": "https://replicate.com/p/gm3qorzdhgbfurvjtvhg6dckhu",
        "get": "https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu",
        "cancel": "https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu/cancel"
      },
      "model": "replicate/hello-world",
      "version": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa"
    }
  ]
}

id will be the unique ID of the prediction.

source will indicate how the prediction was created. Possible values are web or api.

status will be the status of the prediction. Refer to get a single prediction for possible values.

urls will be a convenience object that can be used to construct new API requests for the given prediction. If the requested model version supports streaming, this will have a stream entry with an HTTPS URL that you can use to construct an EventSource.

model will be the model identifier string in the format of {model_owner}/{model_name}.

version will be the unique ID of the model version used to create the prediction.

data_removed will be true if the input and output data has been deleted.
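To walk every page, fetch the first page and keep requesting the URL in next until it is null. A sketch (requires curl and jq; the next_url and counting helpers are illustrative):

```shell
# Extract the next-page URL from a page of results ("" when there is no next page).
next_url() { printf '%s' "$1" | jq -r '.next // empty'; }

# Count all predictions by walking the paginated list (needs a valid token).
if [ -n "${REPLICATE_API_TOKEN:-}" ]; then
  URL="https://api.replicate.com/v1/predictions"
  TOTAL=0
  while [ -n "$URL" ]; do
    PAGE=$(curl -s -H "Authorization: Bearer $REPLICATE_API_TOKEN" "$URL")
    TOTAL=$((TOTAL + $(printf '%s' "$PAGE" | jq '.results | length')))
    URL=$(next_url "$PAGE")
  done
  echo "Total predictions: $TOTAL"
fi
```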

Domain types

Prediction{…}
PredictionOutput = unknown

The prediction output, which can be any JSON-serializable value, depending on the model

PredictionRequest{…}