This was almost 20 years ago now, so yes and no! We wired up the network monitoring systems, which built out the hierarchy of network gear, then used a fairly lightweight filter/rules engine to dedupe and normalize events. For example, a Cisco 6500 switch might throw 100 events when an interface dropped. We could roll up 90% of them with the filter. Another device would send a junk “interface down” alert periodically… except an attribute would say “is_down=false” lol
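To give a feel for that kind of filter, here is a rough Python sketch. It is not the original engine: the event fields (`device`, `interface`, `type`, `is_down`) and the single drop rule are made up for illustration.

```python
from collections import OrderedDict


def normalize(raw):
    """Map a raw event dict to a (device, interface, type) key, or None to drop it.

    Field names here are hypothetical, not from any real monitoring product.
    """
    # The junk case: the alert claims "interface down" but its own attribute disagrees.
    if raw.get("type") == "interface_down" and raw.get("is_down") == "false":
        return None
    return (raw["device"], raw.get("interface"), raw["type"])


def dedupe(raw_events):
    """Roll repeated events up into one entry per key, with a repeat count."""
    rolled = OrderedDict()
    for raw in raw_events:
        key = normalize(raw)
        if key is None:
            continue  # filtered out as noise
        rolled[key] = rolled.get(key, 0) + 1
    return rolled


if __name__ == "__main__":
    burst = [
        {"device": "sw1", "interface": "Gi1/1", "type": "interface_down", "is_down": "true"}
    ] * 3
    junk = [
        {"device": "rtr9", "interface": "eth0", "type": "interface_down", "is_down": "false"}
    ]
    for key, count in dedupe(burst + junk).items():
        print(key, count)
```

The three identical events collapse into one entry with a count of 3, and the contradictory alert is dropped before it reaches anyone.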
So we pulled in our business-artifact mapping system (this would be ServiceNow today), the on-call rosters, the network topology, runbook/kb, and some other goodies and grabbed the attention of the right person at the right time with specific guidance about what to do.
Basically if a switch, server or critical app failed, we immediately knew what system was impacted, the scope of the impact, who to inform and who to call to resolve. Eventually we expanded it to batch non-critical failures and schedule repairs during outage windows and identify specific dev teams for components of larger apps.
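The impact lookup boils down to walking a dependency graph and joining the result against an ownership table. A minimal sketch, assuming a toy topology and roster; every name below (`DOWNSTREAM`, `OWNERS`, the hosts and teams) is hypothetical and not taken from the real system:

```python
from collections import deque

# Hypothetical topology: each node maps to the things that depend on it.
DOWNSTREAM = {
    "core-sw1": ["srv-db1", "srv-web1"],
    "srv-db1": ["billing-app"],
    "srv-web1": ["billing-app", "intranet"],
}

# Hypothetical on-call mapping for business apps.
OWNERS = {
    "billing-app": "payments-oncall",
    "intranet": "it-helpdesk",
}


def impacted(failed):
    """Breadth-first walk returning everything downstream of the failed node."""
    seen = {failed}
    order = []
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for child in DOWNSTREAM.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def who_to_call(failed):
    """Sorted list of on-call groups owning any impacted node."""
    return sorted({OWNERS[n] for n in impacted(failed) if n in OWNERS})


if __name__ == "__main__":
    print(impacted("core-sw1"))
    print(who_to_call("core-sw1"))
```

A core switch failure fans out to both servers and both apps, while a failure of `srv-db1` alone only reaches `billing-app`, which is the "scope of the impact" part.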
I left after that. It was a fun project and a big break for me, all because I was the only person who had heard of Prolog in a happy hour conversation!