ESS-ESO Network Maintenance Policy
The Network Systems and Services Group (NSSG) of the Office of Information Technology is responsible for monitoring and maintaining the network devices in the ESS-ASB datacenter and at several satellite locations. This responsibility includes tracking and repairing problems that arise on the ESS-ASB network (and on the satellite-location networks for which NSSG is responsible), installing new network devices, and maintaining existing ones.
The networks for which NSSG is responsible are composed of a variety of network devices that provide customers with connectivity to the Rutgers University Business and Student Services housed in ESS-ASB, to single network segments in Annex I and the Field office, and to the ESS-TD systems and services LANs. These devices include switches, firewalls, VPNs, content services switches (CSS), and SSL accelerators. As with all computer equipment, these devices must be maintained: they require regular hardware and software maintenance and upgrades and, in the event of a failure, replacement. In addition, some devices require configuration changes (for example, ACL changes) to support applications housed within the network segments for which ESS-NSSG is responsible.
ESS-NSSG performs operations that may impact the services provided through ESS. To minimize service disruption, NSSG performs this work within announced maintenance windows whenever possible. This document clarifies the policies and procedures governing maintenance work on the networks for which ESS-NSSG is responsible and the communication of issues that occur on these networks.
A disruption consists of any event or condition that may negatively impact either performance or service to our customers. Disruptions are classified as one of the following types:
Transparent - Transparent maintenance should not impact network services.
Degradation - Degradation is a reduction in network service to a device or area. Degradation can occur with respect to connectivity or other services that are provided on the network. The service or network is still functioning, but it is functioning in a diminished capacity.
Outage - An outage is a loss of network service to a device or area. An outage can occur with respect to connectivity or other services that are provided on the network. The affected service or network is not functioning at all; for connectivity, this means a total loss of connectivity.
Disruptions can be caused by failure of a network device or server, power outages, failed cables, lost configurations, improper configuration, failure of a service, denial of service attacks, and a variety of other activities that can occur on a network. In all cases, a disruption will result in maintenance activity.
Switch outages - Because a switch connects its attached equipment to the rest of the network, a switch outage impacts everything that uses that switch.
Firewall outages - The firewalls within ESS are set up to handle outages differently (see the list below). One behavior is common to all of them: during a firewall outage, access from outside the firewall to any system behind it (a service name or a server) is denied.
FWSM firewall outage
- Any traffic from or to the web servers will fail.
- Any traffic between systems on the inside (databases, servers, desktops, etc.) and destinations outside ESS-ASB will fail.
- Traffic between systems on the inside will be unaffected (for example, database-to-mainframe traffic or desktop drive-mapping authentication).
WOLP, Annex I firewall outage
- All traffic will fail during an outage.
ESS-VPN Tunnel firewall outage
- Traffic between ESS-ASB and Annex I will fail.
- Traffic between Annex I and the ESS-Hill desktop address space will fail.
Hill firewall outage
- Any traffic from or to the servers that reside behind the firewall will fail.
- Any traffic from or to the workgroup address space will fail.
- Any traffic from or to the address spaces behind the ESS-VPN tunnel will fail.
- Any traffic from or to the Annex I address space will fail.
CSS outages - Any traffic to or from the web servers behind the affected CSS will fail during the outage. For example, if the test CSS is out, all test VIPs and test web servers will be unavailable, but production VIPs and production web servers will be unaffected, as will all database and desktop traffic. Note that some non-web-server applications on the production and test tiers are also impacted by a CSS outage. These include:
- CMAN (dbcons-t1.rutgers.edu)
- SFTP (stage.ess.rutgers.edu)
- SMTP (smtp.ess.rutgers.edu)
- Data at Rest (dar.rutgers.edu)
- Mainframe 3270 emulation (rutadmin.rutgers.edu)
- CMAN (dbcons-t2.rutgers.edu)
- Mainframe 3270 emulation (test-rutadmin.rutgers.edu)
- CVS sftp (cvs.ess.rutgers.edu)
All work performed on the networks for which ESS-NSSG is responsible is conducted as a form of maintenance. This work may or may not result in a disruption of service, depending on the scope of the activity. There are two types of maintenance activities:
Scheduled Maintenance - Proactive work to address service enhancements or changes, architecture modifications, infrastructure upgrades, equipment replacement or reconfiguration, etc. Regularly scheduled maintenance occurs as follows:
Production Network Maintenance (devices which could potentially affect production services)
- Tuesday mornings between 0700 and 0800
- Thursday mornings between 0700 and 0800
- Thursday evenings between 1700 and 2200
Development/Test Network Maintenance (devices which could potentially affect Dev/Test and internal ESS systems)
- Tuesday mornings between 0700 and 0800
- Thursday mornings between 0700 and 0800
- Thursday afternoons between 1200 and 1300
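For tooling that needs to check whether a proposed change falls inside a regularly scheduled window, the schedule above can be encoded as data. The following Python sketch is illustrative only: the `MAINTENANCE_WINDOWS` table and the `in_maintenance_window` function are hypothetical names, times are assumed to be local time, and each window is assumed to include its start time but not its end time.

```python
from datetime import datetime, time

# Hypothetical encoding of the regularly scheduled windows listed above.
# Weekday numbers follow datetime.weekday() (Monday == 0).
TUESDAY, THURSDAY = 1, 3

MAINTENANCE_WINDOWS = {
    # Devices which could potentially affect production services
    "production": [
        (TUESDAY, time(7, 0), time(8, 0)),
        (THURSDAY, time(7, 0), time(8, 0)),
        (THURSDAY, time(17, 0), time(22, 0)),
    ],
    # Devices which could potentially affect Dev/Test and internal ESS systems
    "dev_test": [
        (TUESDAY, time(7, 0), time(8, 0)),
        (THURSDAY, time(7, 0), time(8, 0)),
        (THURSDAY, time(12, 0), time(13, 0)),
    ],
}


def in_maintenance_window(network: str, when: datetime) -> bool:
    """Return True if `when` (assumed local time) is inside a window for `network`.

    Windows are treated as half-open intervals: start <= t < end (an assumption).
    """
    for weekday, start, end in MAINTENANCE_WINDOWS[network]:
        if when.weekday() == weekday and start <= when.time() < end:
            return True
    return False
```

For example, `in_maintenance_window("production", datetime(2024, 3, 7, 18, 30))` returns True because 7 March 2024 is a Thursday, while the same call with `"dev_test"` returns False. Keeping the windows in a table rather than in branching logic means a schedule change only touches the data.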
Emergency Maintenance - Reactive or proactive work to address an extant service disruption or a credible threat of one. This includes responding to power failures, device failures, security vulnerabilities, etc. These activities will be announced and lead time will be considered, but remediating disruptions and mitigating credible threats take precedence.
All disruptions and maintenance activities are communicated to ESS staff. NSSG staff will send an electronic notification of scheduled maintenance on the Friday before the week in which the maintenance is scheduled. Scheduled maintenance will also be announced at the regular Weekly Review Meeting (held on the first business day of each week). The ESS Service Manager(s) are responsible for passing scheduled maintenance announcements on to parties outside ESS as needed (for example, business users who may be affected).
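The notification timing above can be computed from the maintenance date. This Python sketch is illustrative only: `announcement_dates` is a hypothetical helper, and it assumes that weeks begin on Monday, that Monday is the first business day, and that there are no holidays (a holiday would move the Weekly Review Meeting).

```python
from datetime import date, timedelta


def announcement_dates(maintenance_day: date) -> tuple[date, date]:
    """Return (electronic notice date, Weekly Review Meeting date).

    Assumptions: weeks start on Monday, Monday is the first business day,
    and holidays are not accounted for.
    """
    # Monday of the week containing the maintenance
    week_start = maintenance_day - timedelta(days=maintenance_day.weekday())
    # Electronic notification: the Friday before that week
    friday_notice = week_start - timedelta(days=3)
    # Weekly Review Meeting: first business day of the week (assumed Monday)
    review_meeting = week_start
    return friday_notice, review_meeting
```

For maintenance on Thursday, 7 March 2024, it returns 1 March 2024 (a Friday) for the electronic notice and 4 March 2024 (a Monday) for the Weekly Review Meeting.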
Announcements attempt to clearly indicate the nature of the event and, where practical, include a summary of the event, network status, start and end times, and a list of affected devices and/or services. Announcements are made both before and after network activity takes place. The following announcements are made before the activity:
Regularly Scheduled Maintenance Announcement - Made in advance of the work. An initial announcement is made on the Friday before the week of the work, and a second announcement is made at the regular Weekly Review Meeting.
Emergency Maintenance Announcement - Made when a disruption, or a credible threat of one, requires work to be performed on a network or service. The announcement is made as soon as the work needed to restore the network to normal operation has been determined.
The following is a list of announcements that take place once maintenance or a disruption has been resolved:
Resolved Maintenance Announcement - Made when scheduled work has been completed.
Resolved Disruption Announcement - Made when a degradation or outage is resolved. A final notice will be sent to communicate that the network has returned to normal operation.
For any scheduled maintenance, the ESS Service Manager(s) are responsible for determining and coordinating the testing of services once the maintenance window(s) close. NSSG recommends that testing take place as soon as practical.