Long-Running Operations
# Problem
When a client needs the response from an API request that may take a long time to complete, making the client wait until the task is done is often a poor user experience. Worse, the client may never receive the response at all, because the network connection can be closed or time out before the task finishes.
# Solution
We designed and implemented a long-running operations (LRO) API framework to address this problem. With this framework, when the client calls a time-consuming API, the server returns a long-running operation status endpoint. The client can then get the API response asynchronously by polling the status endpoint periodically.
# Use cases
This workflow is somewhat more complex than a synchronous API, so we do not apply the framework to all APIs.
APIs that adopt this framework meet both of the following requirements:
- The API is time-consuming
- The client needs the API response
If the API finishes quickly, the additional complexity is unnecessary. As a rule of thumb, an API that could take more than 10 seconds to complete can be considered time-consuming.
If the API is time-consuming but the client doesn't need the API response, the framework is still not needed. In this case, the server executes the operation asynchronously and simply returns HTTP status code 202 (Accepted) without an operation status URL.
# REST API
# Create Long-Running Operation
Please check out the ExtremeCloud IQ API Swagger UI. Every API with the [LRO] prefix in its API summary supports the long-running operation framework.
For example:
GET /devices
POST /devices/:cli
POST /devices/{id}/:reset
...
# Request
The request is exactly the same as a normal API request, except that it supports the `async` request parameter.
- If there is no `async` parameter or `async=false`, the API server returns the API response synchronously, exactly like a normal synchronous API.
- If the API supports the long-running operation framework and the client explicitly sets `async=true` in the request parameters, the API becomes asynchronous and the API server returns an operation status URL immediately. The client should then follow the long-running operation workflow to get the API response.
NOTE
- All long-running operation APIs support both synchronous and asynchronous modes.
- Clients can freely choose the mode they need, but they should be aware of the limitations of each mode.
- By default, all long-running operation APIs are in synchronous mode, i.e. `async=false`. Clients need to explicitly set `async=true` to enable asynchronous mode.
- The synchronous mode is convenient for testing, but it's not recommended for production use.
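
As a minimal sketch of opting in to asynchronous mode, the helper below adds the `async` query parameter to a request URL using only the Python standard library. The function name `with_async` and the `api.example.com` host in the usage note are hypothetical and not part of the API.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_async(url: str, enabled: bool = True) -> str:
    """Return `url` with the `async` query parameter set to true or false."""
    parts = urlsplit(url)
    # Drop any existing `async` value so the parameter appears exactly once.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "async"
    ]
    query.append(("async", "true" if enabled else "false"))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `with_async("https://api.example.com/devices?page=1")` yields `https://api.example.com/devices?page=1&async=true`.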
# Response
Instead of directly returning the API response, the server responds with an HTTP 202 (Accepted) status code and additional HTTP response headers that indicate the location and frequency the client should use to poll the operation status.
Below are the additional HTTP response headers:
Header | Description |
---|---|
Location | A URL the client should poll for the operation status and API response |
Retry-After | The estimated time in seconds until the operation is done. The client should wait this long before sending another Get Long-Running Operation Status API request. This header is designed to prevent polling clients from overwhelming the API server with retries. |
NOTE
- The API service validates the request before starting the long-running operation.
- If the request is invalid, the server replies immediately with an error code such as `HTTP 400 (Bad Request)`.
- The client is not required to honor the `Retry-After` header when sending polling requests, but we recommend waiting a minimum of `500ms` before sending the next request; otherwise the server may reject the request through rate limiting.
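
The header handling described above could be sketched as follows. This is an illustration only: `parse_accepted` is a hypothetical helper, and falling back to the 500 ms minimum when `Retry-After` is missing or unparsable is an assumption rather than documented server behavior.

```python
MIN_POLL_INTERVAL_SECONDS = 0.5  # the 500 ms minimum recommended above


def parse_accepted(status_code: int, headers: dict) -> tuple:
    """Return (status_url, delay_seconds) from a 202 (Accepted) response."""
    if status_code != 202:
        raise ValueError(f"expected HTTP 202, got {status_code}")
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    status_url = lowered["location"]
    try:
        retry_after = float(lowered.get("retry-after", 0))
    except ValueError:
        # Assumption: an unparsable value falls back to the minimum interval.
        retry_after = 0.0
    return status_url, max(retry_after, MIN_POLL_INTERVAL_SECONDS)
```

The returned delay never drops below the recommended minimum, which keeps a client clear of rate limiting even when the estimate is very small.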
# Get Long-Running Operation Status
# Request
Send a GET request to the `Location` header value from the Long-Running Operation API response:
GET /operations/{operationId}
# Response
The Long-Running Operation response will be in the following format:
Field | Type | Mandatory | Description |
---|---|---|---|
id | String | ✅ | The operation's unique identifier. |
metadata | OperationMetadata | ✅ | See below for the details. |
done | Boolean | ✅ | Indicates whether the operation is done. |
response | Any | ❌ | The API response of the associated operation if the status is `SUCCEEDED`. The type is exactly the same as the corresponding synchronous API response. |
error | XiqError | ❌ | The error details if the status is `FAILED` or `CANCELED`. |
NOTE
- If `done` is `false`, the client should keep polling the operation status until it becomes `true`.
- When `done` is `true`, the client can:
  - Get the API response from the `response` field when `metadata.status` is `SUCCEEDED`, or
  - Get the error details from the `error` field when `metadata.status` is `FAILED` or `CANCELED`.
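
The polling rules in the note above can be sketched as a small loop. To avoid assuming any particular HTTP library, the status request is passed in as a hypothetical `fetch_status` callable that returns the parsed operation as a dict. `OperationFailed`, `wait_for_result`, and the `max_polls` budget are illustrative names and choices, not part of the API.

```python
import time
from typing import Any, Callable, Dict


class OperationFailed(Exception):
    """Raised when a finished operation did not succeed."""


def wait_for_result(
    fetch_status: Callable[[], Dict[str, Any]],
    interval_seconds: float = 1.0,
    max_polls: int = 600,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll until `done` is true, then return `response` or raise on failure."""
    for _ in range(max_polls):
        operation = fetch_status()
        if operation["done"]:
            if operation["metadata"]["status"] == "SUCCEEDED":
                return operation.get("response")
            # FAILED or CANCELED: surface the `error` field to the caller.
            raise OperationFailed(operation.get("error"))
        sleep(interval_seconds)
    raise TimeoutError("operation did not finish within the polling budget")
```

Injecting `sleep` keeps the loop testable without real delays; in practice the interval would normally come from the `Retry-After` header.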
# OperationMetadata format
Field | Type | Mandatory | Description |
---|---|---|---|
status | String | ✅ | The current operation status. See Operation Status for the details. |
cancelable | Boolean | ✅ | Indicates whether the client can send a cancellation request for the operation in its current status. |
percentage | Integer | ❌ | The progress as a percentage, from 0 to 100. This is not guaranteed to be accurate. Optional: present only if the backend has set it. Only needed for very long operations that support percentage calculation. |
step | String | ❌ | The name of the current step of a multi-step operation while it is running. Optional: present only if the backend has set it. Only needed for very long, multi-step operations, for a better user experience. |
create_time | Timestamp | ✅ | The creation time of the operation. This is the time when the operation entered PENDING status. |
update_time | Timestamp | ✅ | The last update time of the operation. Any operation state change updates this value. |
start_time | Timestamp | ❌ | The start time of the operation. This is the time when the operation entered RUNNING status. Optional: present only if the operation has started. |
end_time | Timestamp | ❌ | The end time of the operation. This is the time when the operation was done. Optional: present only if the operation is done. |
expires_in | Long | ✅ | The number of seconds remaining until the operation expires and is to be deleted. |
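
As an illustration of the mandatory and optional fields in the table above, a client might model the metadata as follows. This is a sketch, not an official client type: the timestamp wire format is not specified here, so timestamps are kept as raw values, and `from_dict` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class OperationMetadata:
    # Mandatory fields.
    status: str
    cancelable: bool
    create_time: Any  # raw value; the timestamp format is an open assumption
    update_time: Any
    expires_in: int
    # Optional fields, absent until the backend or lifecycle sets them.
    percentage: Optional[int] = None
    step: Optional[str] = None
    start_time: Any = None
    end_time: Any = None

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "OperationMetadata":
        # Mandatory keys use [] so a missing field fails loudly with KeyError.
        return cls(
            status=data["status"],
            cancelable=data["cancelable"],
            create_time=data["create_time"],
            update_time=data["update_time"],
            expires_in=data["expires_in"],
            percentage=data.get("percentage"),
            step=data.get("step"),
            start_time=data.get("start_time"),
            end_time=data.get("end_time"),
        )
```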
# Operation Status
There are 6 operation statuses in a long-running operation lifecycle:
- PENDING
- RUNNING
- CANCELING
- SUCCEEDED
- FAILED
- CANCELED
TIP
- The `RUNNING` and `CANCELING` statuses indicate the operation is in progress.
- The `SUCCEEDED`, `FAILED`, and `CANCELED` statuses indicate the operation is done.
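
The status groups above map naturally onto an enumeration. The sketch below is one possible client-side representation; the helper names are hypothetical, and `PENDING` is deliberately left out of both groups because the tip above does not classify it.

```python
from enum import Enum


class OperationStatus(str, Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    CANCELING = "CANCELING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    CANCELED = "CANCELED"


IN_PROGRESS_STATUSES = frozenset({OperationStatus.RUNNING, OperationStatus.CANCELING})
DONE_STATUSES = frozenset(
    {OperationStatus.SUCCEEDED, OperationStatus.FAILED, OperationStatus.CANCELED}
)


def is_in_progress(status: str) -> bool:
    """True for RUNNING and CANCELING; raises ValueError on unknown values."""
    return OperationStatus(status) in IN_PROGRESS_STATUSES


def is_done(status: str) -> bool:
    """True for SUCCEEDED, FAILED, and CANCELED; raises ValueError on unknown values."""
    return OperationStatus(status) in DONE_STATUSES
```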
# Cancel Long-Running Operation
# Request
When `cancelable` is `true` in the operation metadata, clients are allowed to send a cancellation request to the API server to cancel the corresponding long-running operation.
POST /operations/{operationId}/:cancel
NOTE
- When the operation is in `PENDING` status, it can always be canceled.
- When the operation is in `RUNNING` status, the client needs to further check the `cancelable` flag in the operation metadata. The cancellation request may be rejected if `cancelable` is `false`.
- Simply checking the `cancelable` flag of the operation metadata, regardless of the operation status, is sufficient: the API server always returns the correct `cancelable` value based on the current status and the capability of the long-running operation.
- The cancellation request can still be rejected even when `cancelable` is `true`, because the operation status may have changed by the time the API server receives the cancellation request; for example, the operation may already be done.
# Response
- If the cancellation request is accepted, the server starts asynchronous cancellation of the long-running operation and responds with an `HTTP 200 (OK)` status code. The server makes its best effort to cancel the operation, but success is not guaranteed.
- If the operation doesn't support cancellation, the server returns an `UNIMPLEMENTED` error code.
- If the operation is already canceling, the server ignores duplicate cancellation requests.
- If the operation is already done, the server returns a `FAILED_PRECONDITION` error code.
The client can use Get Long-Running Operation Status to check whether the cancellation succeeded or whether the operation completed despite the cancellation request.
WARNING
On successful cancellation, the operation is not deleted; instead, it becomes an operation with status `CANCELED`.
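
Because cancellation is best effort, a client typically requests it and then confirms the outcome by polling. The sketch below assumes two hypothetical callables: `fetch_status`, which returns the parsed operation, and `send_cancel`, which issues the `POST /operations/{operationId}/:cancel` request. How the `UNIMPLEMENTED` and `FAILED_PRECONDITION` errors are surfaced depends on the HTTP client and is not modeled here.

```python
import time
from typing import Any, Callable, Dict


def cancel_and_confirm(
    fetch_status: Callable[[], Dict[str, Any]],
    send_cancel: Callable[[], Any],
    interval_seconds: float = 1.0,
    max_polls: int = 600,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Request cancellation if allowed, then return the final operation status."""
    operation = fetch_status()
    if not operation["done"] and operation["metadata"]["cancelable"]:
        # The request may still be rejected if the status changed in the
        # meantime, so the final status below is the source of truth.
        send_cancel()
    polls = 0
    while not operation["done"]:
        if polls >= max_polls:
            raise TimeoutError("operation did not finish within the polling budget")
        sleep(interval_seconds)
        operation = fetch_status()
        polls += 1
    return operation["metadata"]["status"]
```

A return value of `CANCELED` means the cancellation took effect; `SUCCEEDED` or `FAILED` means the operation finished on its own first.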
# Delete Long-Running Operation
# Request
When the operation is in `PENDING` status or is done (in `SUCCEEDED`, `FAILED`, or `CANCELED` status), clients can send a delete request to delete the operation status.
DELETE /operations/{operationId}
# Response
- If the deletion is successful, the server deletes the long-running operation status and responds with an `HTTP 200 (OK)` status code.
- If the operation is in `RUNNING` or `CANCELING` status, the server returns a `FAILED_PRECONDITION` error code.
TIP
Clients don't need to manually delete every operation status; statuses can be deleted automatically after the expiration time.
WARNING
The server does not automatically cancel a long-running operation that is still running. Send a cancellation request first, then delete the operation.
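
The deletion rules above can be summarized as a small decision function that tells a client what to do next with an operation it wants to clean up. The function name and the `"delete"`, `"cancel"`, and `"wait"` return values are illustrative only.

```python
DELETABLE_STATUSES = frozenset({"PENDING", "SUCCEEDED", "FAILED", "CANCELED"})


def next_cleanup_step(status: str, cancelable: bool) -> str:
    """Return "delete", "cancel", or "wait" for an operation to be cleaned up."""
    if status in DELETABLE_STATUSES:
        # PENDING or done: a DELETE request is allowed.
        return "delete"
    if status == "RUNNING" and cancelable:
        # Still running: send a cancellation request before deleting.
        return "cancel"
    # CANCELING, or RUNNING without cancellation support: a DELETE request
    # would return FAILED_PRECONDITION, so keep polling until done.
    return "wait"
```

A cleanup routine would call this after each status poll and stop once it has acted on `"delete"`.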