Deployments
Methods
Create a new deployment:
Example cURL request:
curl -s \
-X POST \
-H "Authorization: Bearer $REPLICATE_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"name": "my-app-image-generator",
"model": "stability-ai/sdxl",
"version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
"hardware": "gpu-t4",
"min_instances": 0,
"max_instances": 3
}' \
https://api.replicate.com/v1/deployments
The response will be a JSON object describing the deployment:
{
"owner": "acme",
"name": "my-app-image-generator",
"current_release": {
"number": 1,
"model": "stability-ai/sdxl",
"version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
"created_at": "2024-02-15T16:32:57.018467Z",
"created_by": {
"type": "organization",
"username": "acme",
"name": "Acme Corp, Inc.",
"avatar_url": "https://cdn.replicate.com/avatars/acme.png",
"github_url": "https://github.com/acme"
},
"configuration": {
"hardware": "gpu-t4",
"min_instances": 0,
"max_instances": 3
}
}
}
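The same request can be assembled from Python. What follows is a minimal sketch using only the standard library; the helper name `build_create_deployment_request` is hypothetical and not part of any official client. The request is built but not sent, so the snippet runs without a valid token.

```python
import json
import os
import urllib.request

API_URL = "https://api.replicate.com/v1/deployments"


def build_create_deployment_request(token, name, model, version, hardware,
                                    min_instances, max_instances):
    """Build (but do not send) the POST request shown in the cURL example."""
    body = {
        "name": name,
        "model": model,
        "version": version,
        "hardware": hardware,
        "min_instances": min_instances,
        "max_instances": max_instances,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_create_deployment_request(
    # Placeholder token is used when the environment variable is not set.
    token=os.environ.get("REPLICATE_API_TOKEN", "<your-token>"),
    name="my-app-image-generator",
    model="stability-ai/sdxl",
    version="da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
    hardware="gpu-t4",
    min_instances=0,
    max_instances=3,
)
print(request.get_method(), request.full_url)
```

To send it, pass `request` to `urllib.request.urlopen` and decode the response body with `json.load`.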
Request body parameters:
- hardware: The SKU for the hardware used to run the model. Possible values can be retrieved from the hardware.list endpoint.
- max_instances: The maximum number of instances for scaling.
- min_instances: The minimum number of instances for scaling.
- model: The full name of the model that you want to deploy, e.g. stability-ai/sdxl.
- name: The name of the deployment.
- version: The 64-character string ID of the model version that you want to deploy.
Response fields:
- name: The name of the deployment.
- owner: The owner of the deployment.
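The parameter descriptions above can be turned into a quick client-side check before a request is sent. This is only a sketch: `check_deployment_params` is a hypothetical helper, the 64-character rule comes from the version description, and the model-name and scaling-bound rules are assumptions rather than documented API behavior.

```python
def check_deployment_params(params):
    """Return a list of problems; an empty list means none were found."""
    problems = []

    # From the parameter description: the version ID is a 64-character string.
    if len(params.get("version", "")) != 64:
        problems.append("version must be a 64-character string ID")

    # The model is a full name such as stability-ai/sdxl. Requiring exactly
    # one "/" is an assumption based on that example.
    if params.get("model", "").count("/") != 1:
        problems.append("model should look like owner/name")

    # Assumption (not stated in the reference): the scaling bounds are
    # non-negative and the minimum does not exceed the maximum.
    low = params.get("min_instances", 0)
    high = params.get("max_instances", 0)
    if low < 0 or high < low:
        problems.append("expected 0 <= min_instances <= max_instances")

    return problems


example = {
    "name": "my-app-image-generator",
    "model": "stability-ai/sdxl",
    "version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
    "hardware": "gpu-t4",
    "min_instances": 0,
    "max_instances": 3,
}
print(check_deployment_params(example))
```

The API remains the authority on what it accepts; a check like this only catches obvious mistakes early.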
Delete a deployment
Deployment deletion has some restrictions:
- You can only delete deployments that have been offline and unused for at least 15 minutes.
Example cURL request:
curl -s -X DELETE \
-H "Authorization: Bearer $REPLICATE_API_TOKEN" \
https://api.replicate.com/v1/deployments/acme/my-app-image-generator
The response will have a 204 status code and an empty body, indicating that the deployment has been deleted.
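A Python equivalent of the delete call might look like the sketch below. The helper names `build_delete_deployment_request` and `is_deleted` are hypothetical, and the request is only constructed, not sent.

```python
import urllib.request
from urllib.parse import quote


def build_delete_deployment_request(token, owner, name):
    """Build (but do not send) the DELETE request for the deployment owner/name."""
    # quote() escapes characters that are not safe in a URL path segment.
    url = (
        "https://api.replicate.com/v1/deployments/"
        f"{quote(owner, safe='')}/{quote(name, safe='')}"
    )
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="DELETE",
    )


def is_deleted(status_code):
    """Per the text above, a 204 status means the deployment was deleted."""
    return status_code == 204


request = build_delete_deployment_request(
    "<your-token>", "acme", "my-app-image-generator"
)
print(request.get_method(), request.full_url)
```

When the request is sent with `urllib.request.urlopen`, the status code is available as `response.status` and can be passed to `is_deleted`.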
Get information about a deployment by name, including the current release.
Example cURL request:
curl -s \
-H "Authorization: Bearer $REPLICATE_API_TOKEN" \
https://api.replicate.com/v1/deployments/acme/my-app-image-generator
The response will be a JSON object describing the deployment:
{
"owner": "acme",
"name": "my-app-image-generator",
"current_release": {
"number": 1,
"model": "stability-ai/sdxl",
"version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
"created_at": "2024-02-15T16:32:57.018467Z",
"created_by": {
"type": "organization",
"username": "acme",
"name": "Acme Corp, Inc.",
"avatar_url": "https://cdn.replicate.com/avatars/acme.png",
"github_url": "https://github.com/acme"
},
"configuration": {
"hardware": "gpu-t4",
"min_instances": 1,
"max_instances": 5
}
}
}
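Once the response is decoded, the scaling configuration sits under `current_release.configuration`. The sketch below reads those fields from a trimmed copy of the sample response; `summarize_deployment` is a hypothetical helper introduced only for illustration.

```python
import json

# Trimmed copy of the sample response above.
SAMPLE_RESPONSE = """
{
  "owner": "acme",
  "name": "my-app-image-generator",
  "current_release": {
    "number": 1,
    "model": "stability-ai/sdxl",
    "configuration": {
      "hardware": "gpu-t4",
      "min_instances": 1,
      "max_instances": 5
    }
  }
}
"""


def summarize_deployment(deployment):
    """Flatten the fields most often needed when inspecting a deployment."""
    release = deployment["current_release"]
    config = release["configuration"]
    return {
        "deployment": f"{deployment['owner']}/{deployment['name']}",
        "release": release["number"],
        "model": release["model"],
        "hardware": config["hardware"],
        "min_instances": config["min_instances"],
        "max_instances": config["max_instances"],
    }


summary = summarize_deployment(json.loads(SAMPLE_RESPONSE))
print(summary)
```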
Get a list of deployments associated with the current account, including the latest release configuration for each deployment.
Example cURL request:
curl -s \
-H "Authorization: Bearer $REPLICATE_API_TOKEN" \
https://api.replicate.com/v1/deployments
The response will be a paginated JSON object whose results array contains deployment objects, sorted with the most recent deployment first:
{
"next": "http://api.replicate.com/v1/deployments?cursor=cD0yMDIzLTA2LTA2KzIzJTNBNDAlM0EwOC45NjMwMDAlMkIwMCUzQTAw",
"previous": null,
"results": [
{
"owner": "acme",
"name": "my-app-image-generator",
"current_release": {
"number": 1,
"model": "stability-ai/sdxl",
"version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
"created_at": "2024-02-15T16:32:57.018467Z",
"created_by": {
"type": "organization",
"username": "acme",
"name": "Acme Corp, Inc.",
"avatar_url": "https://cdn.replicate.com/avatars/acme.png",
"github_url": "https://github.com/acme"
},
"configuration": {
"hardware": "gpu-t4",
"min_instances": 1,
"max_instances": 5
}
}
}
]
}
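Because the list is paginated, a client usually follows the `next` URL until it is null. The sketch below shows one way to do that, assuming each page has the `next` and `results` fields from the sample response. `fetch_page` and `iter_deployments` are hypothetical names, and the fetch function is injectable so the loop can be exercised without network access.

```python
import json
import urllib.request


def fetch_page(url, token):
    """Fetch one page of results and decode it as JSON."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_deployments(token, fetch=fetch_page,
                     first_url="https://api.replicate.com/v1/deployments"):
    """Yield every deployment, following "next" until it is null."""
    url = first_url
    while url:
        page = fetch(url, token)
        yield from page["results"]
        url = page.get("next")


# Offline demonstration with two canned pages instead of real HTTP calls.
fake_pages = {
    "page-1": {"next": "page-2", "previous": None,
               "results": [{"name": "first"}]},
    "page-2": {"next": None, "previous": "page-1",
               "results": [{"name": "second"}, {"name": "third"}]},
}
names = [
    deployment["name"]
    for deployment in iter_deployments(
        "<your-token>",
        fetch=lambda url, token: fake_pages[url],
        first_url="page-1",
    )
]
print(names)
```

Calling `iter_deployments(token)` with the defaults would issue real requests, one per page.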
Update properties of an existing deployment, including hardware, min/max instances, and the deployment's underlying model version.
Example cURL request:
curl -s \
-X PATCH \
-H "Authorization: Bearer $REPLICATE_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{"min_instances": 3, "max_instances": 10}' \
https://api.replicate.com/v1/deployments/acme/my-app-image-generator
The response will be a JSON object describing the deployment:
{
"owner": "acme",
"name": "my-app-image-generator",
"current_release": {
"number": 2,
"model": "stability-ai/sdxl",
"version": "da77bc59ee60423279fd632efb4795ab731d9e3ca9705ef3341091fb989b7eaf",
"created_at": "2024-02-15T16:32:57.018467Z",
"created_by": {
"type": "organization",
"username": "acme",
"name": "Acme Corp, Inc.",
"avatar_url": "https://cdn.replicate.com/avatars/acme.png",
"github_url": "https://github.com/acme"
},
"configuration": {
"hardware": "gpu-t4",
"min_instances": 3,
"max_instances": 10
}
}
}
Updating any deployment properties will increment the number field of the current_release.
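A partial update can be sketched in Python by sending only the fields that change. The helper names below are hypothetical; `release_was_incremented` compares two deployment objects, such as one fetched before the update and the one returned by it.

```python
import json
import urllib.request


def build_update_deployment_request(token, owner, name, **changes):
    """Build (but do not send) a PATCH request carrying only the changed fields."""
    return urllib.request.Request(
        f"https://api.replicate.com/v1/deployments/{owner}/{name}",
        data=json.dumps(changes).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


def release_was_incremented(before, after):
    """Return True if the release number grew between two deployment objects."""
    return after["current_release"]["number"] > before["current_release"]["number"]


request = build_update_deployment_request(
    "<your-token>", "acme", "my-app-image-generator",
    min_instances=3, max_instances=10,
)
print(request.get_method(), request.data.decode("utf-8"))
```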
Predictions
Deployments.Predictions
Methods
Create a prediction for the deployment and inputs you provide.
Example cURL request:
curl -s -X POST -H 'Prefer: wait' \
-d '{"input": {"prompt": "A photo of a bear riding a bicycle over the moon"}}' \
-H "Authorization: Bearer $REPLICATE_API_TOKEN" \
-H 'Content-Type: application/json' \
https://api.replicate.com/v1/deployments/acme/my-app-image-generator/predictions
The request will wait up to 60 seconds for the model to run. If this time is exceeded, the prediction will be returned in a "starting" state and will need to be retrieved using the predictions.get endpoint.
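The wait-then-retrieve flow can be sketched as follows. Several things here are assumptions rather than part of this reference: the helper names, the `id` and `status` fields on the prediction object, and the treatment of "processing" as a second unfinished state. Retrieval is passed in as a callable (`get_prediction`) because the predictions.get endpoint is documented elsewhere.

```python
import json
import time
import urllib.request

# Assumed set of states in which a prediction is not yet finished.
PENDING_STATES = ("starting", "processing")


def build_prediction_request(token, owner, name, prediction_input):
    """Build (but do not send) the POST request from the cURL example."""
    return urllib.request.Request(
        f"https://api.replicate.com/v1/deployments/{owner}/{name}/predictions",
        data=json.dumps({"input": prediction_input}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Prefer": "wait",
        },
        method="POST",
    )


def wait_until_done(prediction, get_prediction, poll_interval=1.0, max_polls=120):
    """Poll with get_prediction(id) while the prediction is still pending."""
    for _ in range(max_polls):
        if prediction["status"] not in PENDING_STATES:
            break
        time.sleep(poll_interval)
        prediction = get_prediction(prediction["id"])
    return prediction


request = build_prediction_request(
    "<your-token>", "acme", "my-app-image-generator",
    {"prompt": "A photo of a bear riding a bicycle over the moon"},
)
print(request.get_method(), request.full_url)
```

If the create call returns a prediction that is still "starting", passing it to `wait_until_done` along with a function that calls predictions.get would keep checking until it leaves the pending states or `max_polls` is reached.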
For a complete overview of the deployments.predictions.create API, check out our documentation on creating a prediction, which covers a variety of use cases.