After finishing work on the new pages for browsing and searching storage logs, one thing became clear: the retry mechanism for pushing videos to storage needed an overhaul as well.
Old Retry Mechanism
Until now, any kind of failure in our pipeline (copying, transcoding or storing) triggered a retry every 2 minutes for a period of between 3 and 4 calendar days.
If, after all those retries, the recording still could not be copied between our servers, transcoded to .mp4 or pushed to your (S)FTP, S3 or Dropbox storage, it was removed from the retry queue and marked as failed.
Any failure would thus trigger between 2160 and 2880 retries over those 3 to 4 calendar days, generating a lot of CPU and network usage and, when the push to storage failed, a lot of log entries. There was also the chance of hosts banning our servers’ IPs for too many (S)FTP connection attempts.
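The retry counts above follow directly from the 2-minute interval. A quick sketch of the arithmetic (the constant and function names are made up for illustration and are not taken from our server code):

```python
# Old mechanism: one retry every 2 minutes for 3 to 4 calendar days.
RETRY_INTERVAL_MINUTES = 2  # interval stated in the post


def old_retry_count(days: int) -> int:
    """Number of 2-minute retries that fit into the given number of days."""
    minutes = days * 24 * 60
    return minutes // RETRY_INTERVAL_MINUTES


print(old_retry_count(3))  # 2160
print(old_retry_count(4))  # 2880
```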
New Retry Mechanism
Today we have updated our server-side code with a revised retry mechanism.
When a recording fails at any point in the pipeline (copy, transcoding or push to storage), it will be retried 12 times over a little more than 3 days in the following sequence:
- 1st failed (live) attempt
- 2 minutes after 1st failed (live) attempt
- 5 minutes after the last retry
- 10 minutes after the last retry
- 15 minutes after the last retry
- 30 minutes after the last retry
- 60 minutes after the last retry
- 2 hours after the last retry
- 4 hours after the last retry
- 6 hours after the last retry
- 12 hours after the last retry
- 24 hours after the last retry
- 24 hours after the last retry
This will lead to a total of 13 attempts (1 live attempt + 12 retries) over the span of 74 hours and 2 minutes for every process in the pipeline.
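The totals can be checked by adding up the intervals in the list. This is only a sketch of the arithmetic; `RETRY_DELAYS` is an illustrative name, not something from our actual code:

```python
from datetime import timedelta

# Delay before each of the 12 retries, in the order listed above.
RETRY_DELAYS = (
    [timedelta(minutes=m) for m in (2, 5, 10, 15, 30, 60)]
    + [timedelta(hours=h) for h in (2, 4, 6, 12, 24, 24)]
)

total_attempts = 1 + len(RETRY_DELAYS)       # 1 live attempt + 12 retries
total_span = sum(RETRY_DELAYS, timedelta())  # time from live attempt to last retry

print(total_attempts)  # 13
print(total_span)      # 3 days, 2:02:00 (74 hours and 2 minutes)
```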
If the failure is related to a storage error, each recording will have a maximum of 13 log entries, one for each attempt. In such cases you can now easily work out when the next retry will be made by looking at the date of the last failed (S)FTP, S3 or Dropbox log entry for that recording and taking into account the number of failed attempts so far. For example, if a recording failed to be pushed to (S)FTP at 12:31:06 and it is the 6th time it has failed, the next attempt (the 7th) will be made 60 minutes later, at roughly 13:32.
As an added benefit, CPU and network usage on our transcoding servers will be lower, resulting in higher availability for processing newly made recordings.
Our retry mechanism looks for failed recordings to retry at the beginning of each minute, so the actual intervals can be up to 1 minute longer than those listed above. For example, if a video fails to be pushed to your FTP at 12:47:02, the 1st retry will be at 12:50:00, 2 minutes and 58 seconds later.
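If you want to estimate the next attempt from a log entry yourself, the calculation can be sketched as below. This is a rough approximation, not our actual implementation: the function name is hypothetical, and it assumes the due time is simply rounded up to the start of the next minute (a due time that lands exactly on a minute boundary is assumed to run at that minute).

```python
from datetime import datetime, timedelta
from typing import Optional

# Delay before each of the 12 retries, as listed in this post.
RETRY_DELAYS = (
    [timedelta(minutes=m) for m in (2, 5, 10, 15, 30, 60)]
    + [timedelta(hours=h) for h in (2, 4, 6, 12, 24, 24)]
)


def estimate_next_attempt(last_failure: datetime, failed_attempts: int) -> Optional[datetime]:
    """Estimate when the next attempt runs, or None if no retries are left.

    failed_attempts counts every failure so far, including the live attempt.
    """
    if failed_attempts < 1:
        raise ValueError("failed_attempts must be at least 1")
    if failed_attempts > len(RETRY_DELAYS):
        return None  # all 13 attempts used; the recording is marked as failed
    due = last_failure + RETRY_DELAYS[failed_attempts - 1]
    # Assumption: retries are picked up at the beginning of a minute,
    # so round any mid-minute due time up to the next full minute.
    if due.second or due.microsecond:
        due = due.replace(second=0, microsecond=0) + timedelta(minutes=1)
    return due


# The two examples from this post (the date itself is arbitrary):
print(estimate_next_attempt(datetime(2024, 1, 1, 12, 47, 2), 1))  # 2024-01-01 12:50:00
print(estimate_next_attempt(datetime(2024, 1, 1, 12, 31, 6), 6))  # 2024-01-01 13:32:00
```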
Storage Errors Notification Emails
Because the maximum number of attempts is now just 13, we now send an email to the Pipe account email address for every storage error that occurs while Pipe tries to push a recording to your storage.
You’ll now be immediately aware of any kind of issues related to your storage, while easy access to the storage logs helps you quickly pinpoint the issue.
These emails are enabled by default, but you can disable them from the environment settings page in your dashboard.