Predictions
Methods
Cancel a prediction that is currently running.
Example cURL request that creates a prediction and then cancels it:
# First, create a prediction
PREDICTION_ID=$(curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "prompt": "a video that may take a while to generate"
    }
  }' \
  https://api.replicate.com/v1/models/minimax/video-01/predictions | jq -r '.id')
# Echo the prediction ID
echo "Created prediction with ID: $PREDICTION_ID"
# Cancel the prediction
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  "https://api.replicate.com/v1/predictions/$PREDICTION_ID/cancel"
Create a prediction for the model version and inputs you provide.
Example cURL request:
curl -s -X POST -H 'Prefer: wait' \
  -d '{"version": "replicate/hello-world:5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa", "input": {"text": "Alice"}}' \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H 'Content-Type: application/json' \
  https://api.replicate.com/v1/predictions
The request will wait up to 60 seconds for the model to run. If this time is exceeded, the prediction will be returned in a "starting" state and will need to be retrieved using the predictions.get endpoint.
For a complete overview of the predictions.create API, check out our documentation on creating a prediction, which covers a variety of use cases.
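As a rough Python counterpart to the cURL request above, the sketch below builds (but does not send) the same request using only the standard library. The helper name build_create_request and its parameters are illustrative assumptions, not part of the API; only the URL, headers, and body shape are taken from the example above.

```python
import json
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"


def build_create_request(token, version, model_input, wait=True):
    """Build a POST request for predictions.create without sending it.

    Hypothetical helper: the name and signature are illustrative only.
    """
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    if wait:
        # Ask the API to hold the connection open while the model runs,
        # mirroring the 'Prefer: wait' header in the cURL example.
        headers["Prefer"] = "wait"
    body = json.dumps({"version": version, "input": model_input}).encode("utf-8")
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")
```

Passing the returned object to urllib.request.urlopen would send it; keeping construction separate from sending makes the request easy to inspect before any network call is made.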
Get the current state of a prediction.
Example cURL request:
curl -s \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu
The response will be the prediction object:
{
  "id": "gm3qorzdhgbfurvjtvhg6dckhu",
  "model": "replicate/hello-world",
  "version": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa",
  "input": {
    "text": "Alice"
  },
  "logs": "",
  "output": "hello Alice",
  "error": null,
  "status": "succeeded",
  "created_at": "2023-09-08T16:19:34.765994Z",
  "data_removed": false,
  "started_at": "2023-09-08T16:19:34.779176Z",
  "completed_at": "2023-09-08T16:19:34.791859Z",
  "metrics": {
    "predict_time": 0.012683
  },
  "urls": {
    "web": "https://replicate.com/p/gm3qorzdhgbfurvjtvhg6dckhu",
    "get": "https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu",
    "cancel": "https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu/cancel"
  }
}
status will be one of:
starting: the prediction is starting up. If this status lasts longer than a few seconds, then it's typically because a new worker is being started to run the prediction.
processing: the predict() method of the model is currently running.
succeeded: the prediction completed successfully.
failed: the prediction encountered an error during processing.
canceled: the prediction was canceled by its creator.
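The status values above split into in-progress states (starting, processing) and terminal states (succeeded, failed, canceled). A client that polls for completion might be sketched as follows. The function names, the polling interval, and the injected fetch callable are all assumptions of this sketch; fetch stands in for whatever code calls the predictions.get endpoint and returns the decoded prediction object.

```python
import time

TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def is_terminal(status):
    """Return True once a prediction can no longer change state."""
    return status in TERMINAL_STATUSES


def wait_for_prediction(fetch, prediction_id, interval=1.0, max_polls=60, sleep=time.sleep):
    """Poll until the prediction reaches a terminal status.

    `fetch` is a caller-supplied function (hypothetical) that takes a
    prediction ID and returns the prediction object as a dict.
    """
    for attempt in range(max_polls):
        prediction = fetch(prediction_id)
        if is_terminal(prediction["status"]):
            return prediction
        if attempt < max_polls - 1:
            # Pause between polls, but not after the final attempt.
            sleep(interval)
    raise TimeoutError(f"prediction {prediction_id} still running after {max_polls} polls")
```

Injecting fetch and sleep keeps the loop independent of any particular HTTP library and lets it be exercised without network access.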
In the case of success, output will be an object containing the output of the model. Any files will be represented as HTTPS URLs. You'll need to pass the Authorization header to request them.
In the case of failure, error will contain the error encountered during the prediction.
Terminated predictions (with a status of succeeded, failed, or canceled) will include a metrics object with a predict_time property showing the amount of CPU or GPU time, in seconds, that the prediction used while running. It won't include time waiting for the prediction to start. The metrics object will also include a total_time property showing the total time, in seconds, that the prediction took to complete.
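Since predict_time excludes waiting and total_time covers the whole prediction, their difference gives a rough idea of how long a prediction spent outside the model's run. The helper below is a small sketch under that reading; its name is hypothetical, and interpreting the difference as queueing or startup overhead is an assumption rather than something the API states.

```python
def non_predict_seconds(metrics):
    """Return total_time minus predict_time, or None if either value is missing."""
    predict_time = metrics.get("predict_time")
    total_time = metrics.get("total_time")
    if predict_time is None or total_time is None:
        # Some responses (such as the example above) carry only predict_time.
        return None
    return total_time - predict_time
```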
All input parameters, output values, and logs are automatically removed after an hour, by default, for predictions created through the API.
You must save a copy of any data or files in the output if you'd like to continue using them. The output key will still be present, but its value will be null after the output has been removed.
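Because output can be a string, a list, or a nested object depending on the model, finding the files worth saving takes a small traversal. The sketch below collects every HTTPS URL string in an output value. The function name is hypothetical, and treating any string that starts with https:// as a file is a heuristic assumption: a model could also return a URL as ordinary text.

```python
def iter_output_urls(output):
    """Yield every HTTPS URL string found anywhere in a prediction output."""
    if isinstance(output, str):
        if output.startswith("https://"):
            yield output
    elif isinstance(output, list):
        for item in output:
            yield from iter_output_urls(item)
    elif isinstance(output, dict):
        for value in output.values():
            yield from iter_output_urls(value)
    # Numbers, booleans, and null (None after removal) contain no URLs.
```

Each URL yielded this way could then be downloaded and stored before the data is removed.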
Output files are served by replicate.delivery and its subdomains. If you use an allow list of external domains for your assets, add replicate.delivery and *.replicate.delivery to it.
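An allow-list check matching those two patterns might look like the sketch below. The function name is hypothetical; the only facts taken from the text are the domain replicate.delivery and its subdomains.

```python
from urllib.parse import urlsplit


def is_replicate_delivery_url(url):
    """Return True if the URL's host is replicate.delivery or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    # Match the bare domain, or a suffix that begins at a label boundary,
    # so that look-alike hosts such as "evilreplicate.delivery" are rejected.
    return host == "replicate.delivery" or host.endswith(".replicate.delivery")
```

Comparing the parsed hostname, rather than searching the raw URL string, avoids accepting hosts that merely contain the domain somewhere in their name or path.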
Get a paginated list of all predictions created by the user or organization associated with the provided API token.
This will include predictions created from the API and the website. It will return 100 records per page.
Example cURL request:
curl -s \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  https://api.replicate.com/v1/predictions
The response will be a paginated JSON object whose results array contains prediction objects, sorted with the most recent prediction first:
{
  "next": null,
  "previous": null,
  "results": [
    {
      "completed_at": "2023-09-08T16:19:34.791859Z",
      "created_at": "2023-09-08T16:19:34.907244Z",
      "data_removed": false,
      "error": null,
      "id": "gm3qorzdhgbfurvjtvhg6dckhu",
      "input": {
        "text": "Alice"
      },
      "metrics": {
        "predict_time": 0.012683
      },
      "output": "hello Alice",
      "started_at": "2023-09-08T16:19:34.779176Z",
      "source": "api",
      "status": "succeeded",
      "urls": {
        "web": "https://replicate.com/p/gm3qorzdhgbfurvjtvhg6dckhu",
        "get": "https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu",
        "cancel": "https://api.replicate.com/v1/predictions/gm3qorzdhgbfurvjtvhg6dckhu/cancel"
      },
      "model": "replicate/hello-world",
      "version": "5c7d5dc6dd8bf75c1acaa8565735e7986bc5b66206b55cca93cb72c9bf15ccaa"
    }
  ]
}
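To walk through more than one page, a client can keep requesting pages until none remain. The sketch below assumes that next holds the URL of the following page, or null on the last page; that reading, the function name, and the injected fetch_page callable are assumptions of this example rather than documented behavior.

```python
def iter_predictions(fetch_page, first_url="https://api.replicate.com/v1/predictions"):
    """Yield prediction objects across all pages, in the order the API returns them.

    `fetch_page` is a caller-supplied function (hypothetical) that takes a URL
    and returns the decoded JSON page as a dict.
    """
    url = first_url
    while url:
        page = fetch_page(url)
        yield from page["results"]
        # Assumed: "next" is the next page's URL, or null (None) when done.
        url = page.get("next")
```

Because it is a generator, a caller can stop iterating early without fetching the remaining pages.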
id will be the unique ID of the prediction.
source will indicate how the prediction was created. Possible values are web or api.
status will be the status of the prediction. Refer to get a single prediction for possible values.
urls will be a convenience object that can be used to construct new API requests for the given prediction. If the requested model version supports streaming, this will have a stream entry with an HTTPS URL that you can use to construct an EventSource.
model will be the model identifier string in the format of {model_owner}/{model_name}.
version will be the unique ID of the model version used to create the prediction.
data_removed will be true if the input and output data have been deleted.
Domain types
The prediction output, which can be any JSON-serializable value, depending on the model