Deployments
Methods
Create a new deployment:
Example cURL request:
curl -s \
-X POST \
-H "Authorization: Bearer $REPLICATE_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "my-app-image-generator",
"model": "stability-ai/sdxl",
"version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
"hardware": "gpu-t4",
"min_instances": 0,
"max_instances": 3
}' \
https://api.replicate.com/v1/deployments
The response will be a JSON object describing the deployment:
{
"owner": "acme",
"name": "my-app-image-generator",
"current_release": {
"number": 1,
"model": "stability-ai/sdxl",
"version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
"created_at": "2024-02-15T16:32:57.018467Z",
"created_by": {
"type": "organization",
"username": "acme",
"name": "Acme Corp, Inc.",
"avatar_url": "https://cdn.replicate.com/avatars/acme.png",
"github_url": "https://github.com/acme"
},
"configuration": {
"hardware": "gpu-t4",
"min_instances": 0,
"max_instances": 3
}
}
}
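The same create call can be built from Python with only the standard library. This is a minimal sketch. build_create_deployment_request is a hypothetical helper name, and the token is passed in by the caller (for example from the REPLICATE_API_TOKEN environment variable, as in the cURL example). The request is only constructed here, not sent.

```python
import json
import urllib.request

API_BASE = "https://api.replicate.com/v1"

def build_create_deployment_request(token: str, name: str, model: str, version: str,
                                    hardware: str, min_instances: int,
                                    max_instances: int) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/deployments request from the cURL example."""
    body = {
        "name": name,
        "model": model,
        "version": version,
        "hardware": hardware,
        "min_instances": min_instances,
        "max_instances": max_instances,
    }
    return urllib.request.Request(
        f"{API_BASE}/deployments",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it. Keeping construction separate from sending makes the request easy to inspect or log first.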
Delete a deployment
Deployment deletion has some restrictions:
- You can only delete deployments that have been offline and unused for at least 15 minutes.
Example cURL request:
curl -s -X DELETE \
-H "Authorization: Bearer $REPLICATE_API_TOKEN" \
https://api.replicate.com/v1/deployments/acme/my-app-image-generator
The response will be an empty 204 No Content response, indicating that the deployment has been deleted.
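Because deletion only succeeds after 15 minutes offline and unused, a client may want to check this before calling the endpoint. A minimal sketch, assuming you track when the deployment last served traffic yourself (the last_active input and both helper names are hypothetical; the API does not expose them in this section):

```python
import urllib.request
from datetime import datetime, timedelta, timezone

API_BASE = "https://api.replicate.com/v1"
MIN_IDLE = timedelta(minutes=15)  # idle period required by the restriction above

def is_deletable(last_active: datetime, now: datetime) -> bool:
    """Client-side pre-check: has the deployment been idle for at least 15 minutes?"""
    return now - last_active >= MIN_IDLE

def build_delete_request(token: str, owner: str, name: str) -> urllib.request.Request:
    """Build (but do not send) DELETE /v1/deployments/{owner}/{name}."""
    return urllib.request.Request(
        f"{API_BASE}/deployments/{owner}/{name}",
        headers={"Authorization": f"Bearer {token}"},
        method="DELETE",
    )
```

The server remains the authority on whether deletion is allowed; this check only avoids requests that are certain to be rejected.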
Get information about a deployment by name, including its current release.
Example cURL request:
curl -s \
-H "Authorization: Bearer $REPLICATE_API_TOKEN" \
https://api.replicate.com/v1/deployments/acme/my-app-image-generator
The response will be a JSON object describing the deployment:
{
"owner": "acme",
"name": "my-app-image-generator",
"current_release": {
"number": 1,
"model": "stability-ai/sdxl",
"version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
"created_at": "2024-02-15T16:32:57.018467Z",
"created_by": {
"type": "organization",
"username": "acme",
"name": "Acme Corp, Inc.",
"avatar_url": "https://cdn.replicate.com/avatars/acme.png",
"github_url": "https://github.com/acme"
},
"configuration": {
"hardware": "gpu-t4",
"min_instances": 1,
"max_instances": 5
}
}
}
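A short sketch of building the lookup URL and pulling the commonly needed fields out of the response shown above. The helper names and the shape of the returned summary are illustrative choices, not part of the API.

```python
import json
from urllib.parse import quote

API_BASE = "https://api.replicate.com/v1"

def deployment_url(owner: str, name: str) -> str:
    """URL for GET /v1/deployments/{owner}/{name}; each path segment is percent-encoded."""
    return f"{API_BASE}/deployments/{quote(owner, safe='')}/{quote(name, safe='')}"

def summarize_deployment(payload: str) -> dict:
    """Extract identity, release number, version and scaling settings from a deployment object."""
    deployment = json.loads(payload)
    release = deployment["current_release"]
    config = release["configuration"]
    return {
        "id": f'{deployment["owner"]}/{deployment["name"]}',
        "release": release["number"],
        "version": release["version"],
        "hardware": config["hardware"],
        "instances": (config["min_instances"], config["max_instances"]),
    }
```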
Get a list of deployments associated with the current account, including the latest release configuration for each deployment.
Example cURL request:
curl -s \
-H "Authorization: Bearer $REPLICATE_API_TOKEN" \
https://api.replicate.com/v1/deployments
The response will be a paginated JSON object whose results array contains deployment objects, sorted with the most recent deployment first:
{
"next": "http://api.replicate.com/v1/deployments?cursor=cD0yMDIzLTA2LTA2KzIzJTNBNDAlM0EwOC45NjMwMDAlMkIwMCUzQTAw",
"previous": null,
"results": [
{
"owner": "acme",
"name": "my-app-image-generator",
"current_release": {
"number": 1,
"model": "stability-ai/sdxl",
"version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
"created_at": "2024-02-15T16:32:57.018467Z",
"created_by": {
"type": "organization",
"username": "acme",
"name": "Acme Corp, Inc.",
"avatar_url": "https://cdn.replicate.com/avatars/acme.png",
"github_url": "https://github.com/acme"
},
"configuration": {
"hardware": "gpu-t4",
"min_instances": 1,
"max_instances": 5
}
}
}
]
}
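To collect every deployment, keep requesting the URL in next until it is null. A minimal sketch; fetch is a hypothetical callable (GET with the Authorization header, returning the body as text) injected so the paging logic can be exercised without network access.

```python
import json
from typing import Callable, Iterator

def iter_deployments(first_url: str, fetch: Callable[[str], str]) -> Iterator[dict]:
    """Yield every deployment, following the "next" cursor URL until it is null."""
    url = first_url
    while url:
        page = json.loads(fetch(url))
        yield from page["results"]
        url = page["next"]  # None (JSON null) on the last page
```

The next value is used exactly as returned, cursor included, rather than rebuilt by hand.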
Update properties of an existing deployment, including hardware, min/max instances, and the deployment's underlying model version.
Example cURL request:
curl -s \
-X PATCH \
-H "Authorization: Bearer $REPLICATE_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{"min_instances": 3, "max_instances": 10}' \
https://api.replicate.com/v1/deployments/acme/my-app-image-generator
The response will be a JSON object describing the deployment:
{
"owner": "acme",
"name": "my-app-image-generator",
"current_release": {
"number": 2,
"model": "stability-ai/sdxl",
"version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
"created_at": "2024-02-15T16:32:57.018467Z",
"created_by": {
"type": "organization",
"username": "acme",
"name": "Acme Corp, Inc.",
"avatar_url": "https://cdn.replicate.com/avatars/acme.png",
"github_url": "https://github.com/acme"
},
"configuration": {
"hardware": "gpu-t4",
"min_instances": 3,
"max_instances": 10
}
}
}
Updating any deployment property will increment the number field of the current_release.
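A PATCH body only needs the fields being changed. This sketch assumes the updatable fields use the same names as the create request (hardware, min_instances, max_instances, version); build_update_body and release_advanced are hypothetical helpers.

```python
import json
from typing import Optional

def build_update_body(hardware: Optional[str] = None,
                      min_instances: Optional[int] = None,
                      max_instances: Optional[int] = None,
                      version: Optional[str] = None) -> str:
    """JSON body for PATCH /v1/deployments/{owner}/{name} with only the fields that change."""
    fields = {
        "hardware": hardware,
        "min_instances": min_instances,
        "max_instances": max_instances,
        "version": version,
    }
    # Omit unset fields; note that 0 is a real value and is kept.
    return json.dumps({k: v for k, v in fields.items() if v is not None})

def release_advanced(before: dict, after: dict) -> bool:
    """True if the update produced a new release (current_release.number went up)."""
    return after["current_release"]["number"] > before["current_release"]["number"]
```

Calling build_update_body(min_instances=3, max_instances=10) produces the same body as the cURL example.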
Predictions
Methods
Create a prediction for the deployment and inputs you provide.
Example cURL request:
curl -s -X POST -H 'Prefer: wait' \
-d '{"input": {"prompt": "A photo of a bear riding a bicycle over the moon"}}' \
-H "Authorization: Bearer $REPLICATE_API_TOKEN" \
-H 'Content-Type: application/json' \
https://api.replicate.com/v1/deployments/acme/my-app-image-generator/predictions
The request will wait up to 60 seconds for the model to run. If this time is exceeded, the prediction will be returned in a "starting" state and will need to be retrieved using the predictions.get endpoint.
For a complete overview of the deployments.predictions.create API, check out our documentation on creating a prediction, which covers a variety of use cases.
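When the wait window runs out, the client has to poll until the prediction reaches one of the terminal states listed for webhooks (succeeded, canceled, failed). A minimal sketch, assuming the prediction JSON carries id and status fields; get_prediction is a hypothetical wrapper around the predictions.get endpoint, and sleep is injectable for testing.

```python
import time
from typing import Callable

TERMINAL = {"succeeded", "failed", "canceled"}

def wait_for_prediction(prediction: dict, get_prediction: Callable[[str], dict],
                        interval: float = 1.0, timeout: float = 600.0,
                        sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the prediction reaches a terminal state or the timeout elapses."""
    waited = 0.0
    while prediction["status"] not in TERMINAL:
        if waited >= timeout:
            raise TimeoutError(f"prediction {prediction['id']} still {prediction['status']}")
        sleep(interval)
        waited += interval
        prediction = get_prediction(prediction["id"])
    return prediction
```

For long-running models, a webhook (described below) avoids polling altogether.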
Path param: The name of the user or organization that owns the deployment (acme in the example URL above).
Path param: The name of the deployment (my-app-image-generator in the example URL above).
Body param: The model's input as a JSON object. The input schema depends on what model you are running. To see the available inputs, click the "API" tab on the model you are running, or get the model version and look at its openapi_schema property. For example, stability-ai/sdxl takes prompt as an input.
Files should be passed as HTTP URLs or data URLs.
Use an HTTP URL when:
- you have a large file > 256kb
- you want to be able to use the file multiple times
- you want your prediction metadata to be associable with your input files
Use a data URL when:
- you have a small file <= 256kb
- you don't want to upload and host the file somewhere
- you don't need to use the file again (Replicate will not store it)
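For the small-file case, a data URL is the file's bytes base64-encoded behind a data:MIME;base64, prefix. A minimal sketch; reading "256kb" as 256 × 1024 bytes is an assumption, and to_data_url is a hypothetical helper.

```python
import base64

DATA_URL_LIMIT = 256 * 1024  # assumption: "256kb" interpreted as 256 KiB

def to_data_url(content: bytes, mime_type: str) -> str:
    """Encode small file contents as a base64 data URL for use as a model input."""
    if len(content) > DATA_URL_LIMIT:
        raise ValueError("file is larger than 256kb; host it and pass an HTTP URL instead")
    return f"data:{mime_type};base64,{base64.b64encode(content).decode('ascii')}"
```

The resulting string goes directly into the input object, e.g. {"input": {"image": to_data_url(data, "image/png")}}, where the image key is a placeholder for whatever file input the model defines.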
Body param: This field is deprecated. It requested a URL to receive streaming output using server-sent events (SSE). It is no longer needed, because the returned prediction will always have a stream entry in its urls property if the model supports streaming.
Body param: An HTTPS URL for receiving a webhook when the prediction has new output. The webhook will be a POST request where the request body is the same as the response body of the get prediction operation. If there are network problems, we will retry the webhook a few times, so make sure it can be safely called more than once. Replicate will not follow redirects when sending webhook requests to your service, so be sure to specify a URL that will resolve without redirecting.
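Since webhooks may be retried, the receiver should tolerate duplicates. One way to do that, sketched here under the assumption that a retry resends an identical body, is to key each delivery on a hash of the raw body. WebhookDeduplicator is a hypothetical class, and its in-memory set stands in for the shared, persistent store a real service would need.

```python
import hashlib
import json
from typing import Callable

class WebhookDeduplicator:
    """Run `process` at most once per distinct webhook body."""

    def __init__(self, process: Callable[[dict], None]) -> None:
        self._process = process
        self._seen = set()  # SHA-256 hex digests of bodies already handled

    def handle(self, raw_body: bytes) -> bool:
        """Return True if the delivery was new and processed, False if it was a repeat."""
        key = hashlib.sha256(raw_body).hexdigest()
        if key in self._seen:
            return False
        self._process(json.loads(raw_body))
        # Record only after processing succeeds, so a failed attempt can be retried.
        self._seen.add(key)
        return True
```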
Body param: By default, we will send requests to your webhook URL whenever there are new outputs or the prediction has finished. You can change which events trigger webhook requests by specifying webhook_events_filter in the prediction request:
- start: immediately on prediction start
- output: each time a prediction generates an output (note that predictions can generate multiple outputs)
- logs: each time log output is generated by a prediction
- completed: when the prediction reaches a terminal state (succeeded/canceled/failed)
For example, if you only wanted requests to be sent at the start and end of the prediction, you would provide:
{
"input": {
"text": "Alice"
},
"webhook": "https://example.com/my-webhook",
"webhook_events_filter": ["start", "completed"]
}
Requests for event types output and logs will be sent at most once every 500ms. If you request start and completed webhooks, they will always be sent regardless of throttling.
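A small request-body builder can catch misspelled event names and non-HTTPS webhook URLs before the request is sent. This is a client-side sketch only; prediction_body is a hypothetical helper, and the allowed events are the four listed above.

```python
from typing import Sequence

WEBHOOK_EVENTS = ("start", "output", "logs", "completed")

def prediction_body(inputs: dict, webhook: str, events: Sequence[str]) -> dict:
    """Build a prediction request body with a validated webhook_events_filter."""
    unknown = set(events) - set(WEBHOOK_EVENTS)
    if unknown:
        raise ValueError(f"unknown webhook events: {sorted(unknown)}")
    if not webhook.startswith("https://"):
        raise ValueError("webhook must be an HTTPS URL")
    return {"input": inputs, "webhook": webhook, "webhook_events_filter": list(events)}
```

prediction_body({"text": "Alice"}, "https://example.com/my-webhook", ["start", "completed"]) yields the example request body shown above.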
Header param: The maximum time the prediction can run before it is automatically canceled. The lifetime is measured from when the prediction is created.
The duration can be specified as a string with an optional unit suffix:
- s for seconds (e.g., 30s, 90s)
- m for minutes (e.g., 5m, 15m)
- h for hours (e.g., 1h, 2h30m)
- no suffix defaults to seconds (e.g., 30 is the same as 30s)
You can combine units for more precision (e.g., 1h30m45s).
The minimum allowed duration is 5 seconds.
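The duration format can be checked client-side before sending. A minimal sketch, assuming whole-number values with units in h, m, s order; the server's own parser may accept more forms, and parse_duration is a hypothetical helper.

```python
import re

_PLAIN = re.compile(r"[0-9]+")
_UNITS = re.compile(r"(?:([0-9]+)h)?(?:([0-9]+)m)?(?:([0-9]+)s)?")
MIN_SECONDS = 5  # minimum allowed duration stated above

def parse_duration(text: str) -> int:
    """Convert a duration such as "90s", "2h30m" or "30" into whole seconds."""
    if _PLAIN.fullmatch(text):
        seconds = int(text)  # no unit suffix: seconds
    else:
        match = _UNITS.fullmatch(text)
        if not text or match is None:
            raise ValueError(f"invalid duration: {text!r}")
        hours, minutes, secs = (int(g) if g else 0 for g in match.groups())
        seconds = hours * 3600 + minutes * 60 + secs
    if seconds < MIN_SECONDS:
        raise ValueError(f"duration {text!r} is shorter than {MIN_SECONDS}s")
    return seconds
```

For example, 1h30m45s is 3600 + 1800 + 45 = 5445 seconds.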
Header param: Leave the request open and wait for the model to finish generating output. Set to wait=n, where n is a number of seconds between 1 and 60.
See https://replicate.com/docs/topics/predictions/create-a-prediction#sync-mode for more information.
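A tiny helper for building the Prefer header value: bare wait, as in the cURL example above, or wait=n with n in the documented 1 to 60 range. prefer_wait is a hypothetical name.

```python
from typing import Optional

def prefer_wait(seconds: Optional[int] = None) -> str:
    """Value for the Prefer header: "wait", or "wait=n" with 1 <= n <= 60."""
    if seconds is None:
        return "wait"
    if not 1 <= seconds <= 60:
        raise ValueError("wait must be between 1 and 60 seconds")
    return f"wait={seconds}"
```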