Scope of the Case Study
Centralized Monitoring of Data Centers and Retail Stores
Introduction
A global leader in cosmetics and beauty required an advanced centralized monitoring solution to oversee its expansive IT infrastructure. With 10 data centres and 850 retail store locations relying on seamless network performance, the organization needed to monitor more than 2,000 devices efficiently across diverse locations. The client partnered with Tetra to deploy and optimize Nagios XI on Azure, ensuring robust monitoring, alerting, and reporting capabilities tailored to its business requirements.
Challenges Faced
- Decentralized Monitoring Systems: Existing monitoring lacked a centralized framework, leading to inefficiencies in oversight.
- Complex Network Infrastructure: Monitoring 850 stores, each with unique configurations, and integrating Cisco Meraki devices required customization.
- Alert Overload: Ineffective alerting mechanisms hampered issue prioritization and resolution.
- Limited Dashboard Insights: Absence of a unified and intuitive dashboard for real-time visibility across data centres and retail locations.
- Integration Gap: Lack of integration between monitoring systems and ServiceNow for incident reporting.
Project Scope & Objectives
The primary goals of the project were:
- Deploy Nagios XI on Azure to enable centralized monitoring.
- Integrate Cisco Meraki devices with Nagios for advanced performance insights.
- Establish seamless alerting through ServiceNow integration.
- Create actionable dashboards for management and technical teams.
- Build a secure and scalable monitoring environment.
Key Solutions Delivered by Tetra
1. Deployment of Centralized Monitoring
- Set up a single Nagios XI Enterprise Server on Azure to monitor all data centre and retail store locations.
- Ensured robust SNMP trap implementation for device communication and performance tracking.
2. Integration of Cisco Meraki Devices
Tetra developed custom API integrations to bring Meraki devices into Nagios XI, enabling advanced monitoring capabilities such as:
- Client Metrics: Device connection stats, security events, and latency monitoring.
- Device Performance: Network uplink performance, bandwidth usage, and latency history.
- Wireless Health: Failed connection tracking, latency stats, and connection status monitoring.
- Organizational Metrics: Device status, license state, and traffic analysis across networks.
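As a minimal sketch of how an integration like this might surface organization-level device status in Nagios XI, the Python function below turns a list of device records into a Nagios plugin result. It assumes records shaped like those from the Meraki Dashboard API device-status listing (a `status` field with values such as `online`, `alerting`, or `offline`). The function name and the severity mapping are illustrative assumptions, not the implementation delivered in this project.

```python
from collections import Counter

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def summarize_device_statuses(devices):
    """Return (exit_code, output_line) for a list of device status records.

    Assumption: each record is a dict with a "status" key, as in the
    Meraki Dashboard API device-status listing.
    """
    if not devices:
        return UNKNOWN, "UNKNOWN - no devices returned"

    counts = Counter(d.get("status", "unknown") for d in devices)
    offline = counts.get("offline", 0)
    alerting = counts.get("alerting", 0)

    # Hypothetical severity mapping: any offline device is critical,
    # any alerting device is a warning.
    if offline:
        code, label = CRITICAL, "CRITICAL"
    elif alerting:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"

    # Performance data after the pipe lets Nagios graph the counts over time.
    perfdata = " ".join(
        f"{name}={counts.get(name, 0)}" for name in ("online", "alerting", "offline")
    )
    text = f"{label} - {offline} offline, {alerting} alerting, {len(devices)} total"
    return code, f"{text} | {perfdata}"
```

A wrapper script would print the output line and exit with the returned code, which is how Nagios plugins report state.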
3. Advanced Dashboards and Reporting
- Created NagVis dashboards tailored for management and operations teams, providing insights into bandwidth, latency, and device performance.
- Mapped parent-child relationships between network components for better correlation and root cause analysis.
- Enabled Business Process Intelligence (BPI) dashboards for process mapping and real-time tracking.
4. Integration with ServiceNow
- Linked Nagios alerts with ServiceNow for automated incident creation and escalation.
- Mapped escalation matrices for prioritized email notifications to stakeholders.
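To illustrate the kind of mapping involved, the sketch below builds an incident payload from a Nagios alert. It assumes incidents are created through the ServiceNow Table API (a POST to `/api/now/table/incident`) using the standard `short_description`, `description`, `impact`, and `urgency` fields. The state-to-priority table, the function name, and the sample host name are hypothetical and would be replaced by the client's own escalation matrix.

```python
import json

# Hypothetical mapping from Nagios service state to ServiceNow (impact, urgency).
# ServiceNow uses 1 for the highest and 3 for the lowest value.
STATE_TO_PRIORITY = {
    "CRITICAL": (1, 1),
    "WARNING": (2, 2),
    "UNKNOWN": (3, 3),
}


def build_incident(host, service, state, output):
    """Return an incident payload dict, or None if the state opens no incident."""
    if state not in STATE_TO_PRIORITY:
        # OK/recovery notifications and unrecognized states are skipped here.
        return None
    impact, urgency = STATE_TO_PRIORITY[state]
    return {
        "short_description": f"{state}: {service} on {host}",
        "description": output,
        "impact": str(impact),
        "urgency": str(urgency),
    }


if __name__ == "__main__":
    # "store-0412-mx" is an invented host name used only for illustration.
    payload = build_incident("store-0412-mx", "Uplink Latency", "CRITICAL", "latency 480 ms")
    print(json.dumps(payload, indent=2))
```

Keeping the payload builder separate from the HTTP call makes the escalation mapping easy to review and test without a ServiceNow instance.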
5. Documentation and Training
- Developed comprehensive documentation, SOPs, and blueprint guides for seamless operation.
- Conducted training sessions and walkthroughs for identified personnel, ensuring a smooth handover.
- Designed secure, hardened environments for dashboards and reports.
6. Post-Deployment Support
- Provided one month of remote support post-deployment to address any operational challenges and fine-tune the setup.
Architecture Overview
- Deployment Platform: Azure-hosted Nagios XI instance.
- Devices Monitored: 2,000+ devices, including Cisco Meraki switches, routers, and firewalls.
- Integration Points: ServiceNow for alerting and incident management, and custom Meraki APIs for detailed device monitoring.
Monitoring Features Enabled
1. Network Monitoring
- Latency and bandwidth tracking for devices and networks.
- Historical data analysis for device performance optimization.
- Parent-child mapping for better device correlation and troubleshooting.
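The value of parent-child mapping for troubleshooting can be shown with a small sketch. When a failing device's parents are all failing too, the device is treated as unreachable rather than down, so attention goes to the upstream fault. The function below is a simplified illustration of that idea, not Nagios's actual reachability logic, and the host names are invented for the example.

```python
def classify_failures(parents, failing):
    """Label each failing host as DOWN (likely root cause) or UNREACHABLE.

    parents: dict mapping a host to the list of its parent hosts.
    failing: set of hosts whose checks are currently failing.
    """
    result = {}
    for host in failing:
        host_parents = parents.get(host, [])
        # A host with parents is unreachable only when every parent is failing.
        if host_parents and all(p in failing for p in host_parents):
            result[host] = "UNREACHABLE"
        else:
            result[host] = "DOWN"
    return result


if __name__ == "__main__":
    # Hypothetical store topology: firewall -> switch -> access point.
    topology = {"store-fw": [], "store-sw": ["store-fw"], "store-ap": ["store-sw"]}
    states = classify_failures(topology, {"store-fw", "store-sw", "store-ap"})
    for host in sorted(states):
        print(host, states[host])
```

In this example only the firewall is reported as down, which points operators at a single upstream device instead of three separate alerts.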
2. Custom Metrics with Cisco Meraki APIs
- Client Metrics: Security events and connection stats.
- Device Metrics: Loss and latency history, uplink performance.
- Wireless Health: Latency stats, failed connections, and client connection stats.
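A latency check built on such metrics might look like the following sketch. It assumes samples shaped like entries from the Meraki loss-and-latency history (dicts with a `latencyMs` value that may be missing or null). The warning and critical thresholds are placeholder values, not figures from this deployment.

```python
def evaluate_latency(samples, warn_ms=100.0, crit_ms=250.0):
    """Return (nagios_exit_code, output_line) from a list of latency samples.

    Assumption: each sample is a dict with a "latencyMs" key; entries whose
    value is None are ignored.
    """
    values = [s["latencyMs"] for s in samples if s.get("latencyMs") is not None]
    if not values:
        return 3, "UNKNOWN - no latency samples"

    avg = sum(values) / len(values)
    if avg >= crit_ms:
        code, label = 2, "CRITICAL"
    elif avg >= warn_ms:
        code, label = 1, "WARNING"
    else:
        code, label = 0, "OK"

    # Perfdata format: label=value[unit];warn;crit
    perfdata = f"latency={avg:.1f}ms;{warn_ms};{crit_ms}"
    return code, f"{label} - avg latency {avg:.1f}ms | {perfdata}"
```

Averaging over the sampled window keeps a single spike from raising an alert; a production check might prefer a percentile or a consecutive-breach rule instead.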
3. Real-Time Alerting and Dashboards
- Integrated alerting through ServiceNow so that issues are raised as incidents and routed for prompt resolution.
- Created dashboards highlighting key metrics such as bandwidth, latency, and wireless health.
Results Achieved
- Centralized Monitoring Efficiency: Unified monitoring of 10 data centres and 850 store locations through a single Nagios XI instance.
- Improved Incident Management: ServiceNow integration streamlined incident reporting and escalation, reducing resolution times.
- Advanced Visibility: Dashboards provided actionable insights into network performance, device health, and organizational metrics.
- Custom Monitoring: Tailored Cisco Meraki integrations enabled detailed tracking of device and network performance.
- Operational Readiness: Comprehensive documentation and training empowered the client team to manage and scale the system effectively.
Conclusion
By deploying a centralized monitoring system for the client, Tetra transformed how the organization managed its IT infrastructure. The integration of Nagios XI with Cisco Meraki and ServiceNow, combined with customized dashboards and automation, enabled the client to maintain high performance and reliability across its data centres and stores.
Partner with Tetra to redefine your IT monitoring strategy. Let us help you streamline your operations, gain actionable insights, and achieve seamless infrastructure management.