My understanding is that the downtime we had was limited to the Matrix server. We have been trying for months to upgrade, and ultimately had to bring on a DBA to troubleshoot why database migrations during upgrades were taking days and never completing. As part of this troubleshooting, I believe we ran a few database maintenance steps that locked the Matrix database, taking that service down. This would be unrelated to email or other services, though, and Social was unaffected as well, so if customers had issues with email, that would have been something separate as far as I know.
In any case, my hope is that we will soon be in a position where we can (finally) upgrade Matrix. As anyone who has managed Matrix knows, the protocol frequently introduces backwards-incompatible changes that break either older servers or older clients. Managing a Matrix server in production means balancing support for older clients, whose users may not be ready to upgrade, against updating the server so that backwards-incompatible changes don’t isolate you from the rest of the Matrix network.
I also hope that Matrix stabilizes soon (Matrix in general, not just our instance) so that future upgrades don’t break backwards compatibility.