One of the most important goals for any enterprise is keeping its systems running 24×7, an activity commonly termed Production Support.
Businesses see it as a crucial link in ensuring efficiency, so it is all the more important to run Production Support itself in an efficient way.
What to Explore
1. Proactive Monitoring & Alerting
Proactive monitoring is important to make sure corrective steps are taken before issues such as missed SLAs (Service Level Agreements) occur. More focus in this area will reduce the volume of downstream issue-resolution tickets. Monitoring does not necessarily mean keeping an eye on each and every job constantly. It is recommended to look for automation, such as developing a script that periodically sends alerts to the support team. By building checkpoints into the job stream, you’ll minimize your monitoring efforts.
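As a rough illustration of checkpoint-based alerting, the sketch below flags checkpoints that are late or close to their deadline. The `Checkpoint` class, the `find_at_risk` function, the job names, and the 15-minute warning window are all hypothetical; a real script would read job status from the scheduler and send the messages by email or pager.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Checkpoint:
    """A hypothetical checkpoint in a job stream (names are illustrative)."""
    job_name: str
    expected_by: datetime                    # time the job should have finished by
    completed_at: Optional[datetime] = None  # None means not yet finished


def find_at_risk(checkpoints: List[Checkpoint], now: datetime,
                 warn_window: timedelta = timedelta(minutes=15)) -> List[str]:
    """Return alert messages for checkpoints that are late or near their deadline."""
    alerts = []
    for cp in checkpoints:
        if cp.completed_at is not None:
            continue  # finished checkpoints need no attention
        if now > cp.expected_by:
            alerts.append(f"LATE: {cp.job_name} missed its checkpoint at {cp.expected_by:%H:%M}")
        elif cp.expected_by - now <= warn_window:
            alerts.append(f"AT RISK: {cp.job_name} is due by {cp.expected_by:%H:%M}")
    return alerts


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 2, 0)
    stream = [
        Checkpoint("extract_orders", datetime(2024, 1, 1, 1, 30), datetime(2024, 1, 1, 1, 20)),
        Checkpoint("load_warehouse", datetime(2024, 1, 1, 1, 45)),
        Checkpoint("publish_reports", datetime(2024, 1, 1, 2, 10)),
    ]
    for message in find_at_risk(stream, now):
        print(message)  # a real script would notify the support team here
```

Running a check like this on a schedule means the team only looks at the jobs that need attention, rather than watching every job.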
2. Prioritize and Permanently Fix Issues
Applying a “permanent fix” is the mantra the team needs to adopt while fixing abends. Temporary fixes create a system that is “patched together” and susceptible to inefficiencies. Another challenge in implementing fixes is prioritizing failed job streams. Priority should be based on the criticality of the streams. Typically, upstream applications need to be taken up ahead of downstream ones. An experienced team also proactively extrapolates the issues it has encountered to other applications and fixes those as a preventive measure, before an abend even happens.
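One way to make this prioritization rule concrete is to sort failed streams by criticality first and by position in the dependency chain second. The sketch below assumes a numeric criticality scale (1 = most critical) and a `depth` field (0 = most upstream); both fields, and the stream names, are illustrative rather than part of any particular scheduler.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailedStream:
    """A failed job stream; the fields are assumptions made for this sketch."""
    name: str
    criticality: int  # 1 = most critical (assumed scale)
    depth: int        # 0 = most upstream; larger values are further downstream


def prioritize(failed: List[FailedStream]) -> List[FailedStream]:
    """Order failures by criticality, then upstream before downstream."""
    return sorted(failed, key=lambda s: (s.criticality, s.depth, s.name))


if __name__ == "__main__":
    failures = [
        FailedStream("reports", criticality=2, depth=3),
        FailedStream("billing_load", criticality=1, depth=2),
        FailedStream("source_extract", criticality=1, depth=0),
    ]
    for stream in prioritize(failures):
        print(stream.name)
```

Among equally critical failures, the upstream stream comes first because downstream streams usually cannot recover until it is fixed.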
3. Provide Timely Status Updates
Effective communication involves all of the stakeholders and is key to successfully managing production support. Open lines of communication reduce pressure and leave more time to fix issues, rather than spending it in explanatory meetings that can last hours.
A best practice in providing updates on critical abend situations is:
- Acknowledge the issue
- Analyze and communicate ETA
- Send status updates every 15-20 minutes if fixing the issue takes longer. Using standard template(s) for these updates is recommended so that the message is conveyed clearly.
- Notify all stakeholders on the resolution
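A standard update template can be as simple as a fixed message format that the team fills in. The sketch below is a minimal example using Python's `string.Template`; the field names, the message layout, and the default 15-minute interval are assumptions chosen to match the cadence suggested above.

```python
from datetime import datetime, timedelta
from string import Template

# Hypothetical message layout; adapt the fields to your own template.
STATUS_TEMPLATE = Template(
    "[$stage] $job_stream | Issue: $summary | ETA: $eta | Next update: $next_update"
)


def build_status_update(stage: str, job_stream: str, summary: str,
                        eta: datetime, sent_at: datetime,
                        interval_minutes: int = 15) -> str:
    """Fill the template and compute when the next update is due."""
    next_update = sent_at + timedelta(minutes=interval_minutes)
    return STATUS_TEMPLATE.substitute(
        stage=stage,
        job_stream=job_stream,
        summary=summary,
        eta=eta.strftime("%H:%M"),
        next_update=next_update.strftime("%H:%M"),
    )


if __name__ == "__main__":
    print(build_status_update(
        stage="IN PROGRESS",
        job_stream="nightly_billing",
        summary="load step abended",
        eta=datetime(2024, 1, 1, 3, 0),
        sent_at=datetime(2024, 1, 1, 2, 0),
    ))
```

Because every update carries the same fields in the same order, stakeholders can scan a thread of updates quickly and see when to expect the next one.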
4. Reporting
Reporting needs to be broken into two parts: operational reporting and performance reporting.
Operational reporting would typically include end-of-job-stream statistics, all issues encountered during the run with their status, and finally the SLA status (Met/Missed). It is recommended that such daily and weekly operational reports be automated to bring in efficiencies.
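As a small sketch of what an automated operational report might compute, the example below derives the SLA status (Met/Missed) for each stream from its finish time. The `StreamRun` record and its fields are hypothetical; in practice the data would come from the scheduler's run history.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class StreamRun:
    """One run of a job stream; fields are assumptions made for this sketch."""
    stream: str
    sla_deadline: datetime
    finished_at: datetime
    open_issues: int


def operational_report(runs: List[StreamRun]) -> List[str]:
    """Build one report line per stream with its SLA status."""
    lines = []
    for run in runs:
        status = "Met" if run.finished_at <= run.sla_deadline else "Missed"
        lines.append(
            f"{run.stream}: finished {run.finished_at:%H:%M}, "
            f"SLA {status}, open issues: {run.open_issues}"
        )
    return lines


if __name__ == "__main__":
    runs = [
        StreamRun("nightly_billing", datetime(2024, 1, 1, 6, 0), datetime(2024, 1, 1, 5, 40), 0),
        StreamRun("inventory_sync", datetime(2024, 1, 1, 6, 0), datetime(2024, 1, 1, 6, 25), 1),
    ]
    for line in operational_report(runs):
        print(line)
```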
Performance reporting would include metrics such as “quality of resolution” (first time right), suggestions and recommendations for improving job stream stability, and the number of jobs versus abend percentage versus number of resources (see the samples below). Such a report helps in quantifying the value and ROI of the support service.
Production Support Performance Measurement – Sample Metric Reports
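The performance metrics mentioned above can be computed with simple ratios. The sketch below assumes two definitions that are not spelled out in this article: abend percentage as abends divided by total jobs run, and "first time right" as the share of resolved issues that were not reopened. Adjust both to match how your team actually measures them.

```python
def abend_percentage(total_jobs: int, abends: int) -> float:
    """Abends as a percentage of all jobs run (assumed definition)."""
    if total_jobs <= 0:
        raise ValueError("total_jobs must be positive")
    return 100.0 * abends / total_jobs


def first_time_right_rate(resolved: int, reopened: int) -> float:
    """Share of resolved issues that did not need rework (assumed definition)."""
    if resolved <= 0:
        raise ValueError("resolved must be positive")
    return 100.0 * (resolved - reopened) / resolved


if __name__ == "__main__":
    # Illustrative numbers only, not real measurements.
    print(f"Abend percentage: {abend_percentage(2000, 30):.1f}%")
    print(f"First time right: {first_time_right_rate(40, 2):.1f}%")
```

Tracking these figures alongside the number of support resources over time is what lets the report speak to value and ROI.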
5. Standard Operating Procedure (SOP) document
Having a detailed standard operating procedure ensures that support is carried out in an expected manner. It standardizes the processes and provides step-by-step instructions that enable each support team member to perform tasks consistently across the workforce. The SOP document must include:
- Escalation procedures
- A communication framework consisting of communication modes with other support teams and stakeholders
- A listing of critical jobs and job streams with associated SLAs
- The team composition information with associated contact information
- Key infrastructure teams that are needed such as DBAs, Administrators, and Development team leads
- References related to restore/recovery procedures, which come in handy especially when major installs are going on
- And basically, anything that could come in handy during a crisis situation!
As part of the SOP, it is important to document the exact handover process conducted between shifts and to allocate adequate time to these handovers. The recommendation is to follow a standard handover template, which will ensure information is not lost in transit.
It’s also a good idea to build a knowledge base of known issues and how they were handled. Using the SOP document, together with appropriate training, becomes a critical aspect of overall production support management. It is the key to a motivated team ready to take on the rigors of production support.
A quick tip: It is very important to revisit the SOP on a regular basis to keep it current with the practices followed by the team.
Using the Five Keys to Achieve Return on Investment
Bitwise Production Support Framework
With experience and expertise in supporting mid-size to large production environments, Bitwise has laid down a production support efficiency framework that ensures definite ROI. It covers various aspects where the focus is on generating value by “Doing More With Less”.
The aspects stated below look at ROI purely from a production support services standpoint:
Optimal Staffing Model:
A tailor-made staffing model that meets the specific needs of the client organization ensures cost reductions of about 50% to 75%.
Application Availability:
Innovations, optimizations, and automation built through monitoring and altering processes, job run times, schedules, etc. can ensure SLA compliance of up to 99%.
Application Stability:
A continuous focus on fixing jobs permanently through corrective and preventive measures, together with proactive planning for system outages, limits abends and can increase the stability of the application from 98% to 99.5% over a period of time.
Team Performance:
Team performance is achieved through persistent focus on incorporating the culture of first-time-right, collaborative approach, continual improvements, and rewarding contributions. This ensures an increase in the team’s productivity to take on more responsibilities.
Proactive Risk Identification and Mitigation:
With insight into the environment and a proactive approach, the production support team identifies risks and recommends mitigations in terms of application availability, stability, and performance.
If you are facing any challenges on support, please feel free to contact us. We will be more than happy to help you.