The Challenge with Outsourcing IT Operations and Workload Automation
March 14, 2018 7:29 am
Many organizations outsource their IT operations for various reasons, including lowering operating costs, improving service, and freeing up resources to address more strategic needs in the organization. While many organizations claim to meet the goals of their outsourcing arrangements as a whole, typically several areas within operations don’t meet expectations. One area where I’ve seen this happen more often than not is workload automation, particularly monitoring and reporting. Why is this area prone to service quality issues in outsourcing arrangements? There are multiple reasons, and they tend to center on process maturity within the organization and the skill sets of the service provider.
Let’s take a look at these challenges with Workload Automation and some possible solutions.
Challenge Number One: Process Maturity for Workload Automation
Many organizations have made large strides over the last several years to align their IT and business processes. This is a result of software vendors offering tools and services that provide this alignment through an additional layer of abstraction. This happens either within the individual tools themselves, or within a single tool (manager of managers) that resides on top, making sense of all of the complexities within the individual tools and providing a correlated view. This correlated view can be configured to provide various perspectives based on business processes, applications, business units, or other groupings – depending on how the business operates. These perspective views can then be monitored to help identify problems that are impacting overall processing. When problems occur, you can then drill down to the root cause in order to effectively and efficiently resolve the issue through the use of individual domain tools by subject matter experts.
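As a rough illustration of the correlated view described above, the following Python sketch rolls individual job statuses up to a business-process grouping and then drills down to the jobs behind a problem. The job names, process names, and status values are all hypothetical; a real manager of managers would pull this data from the underlying domain tools.

```python
from collections import defaultdict

# Hypothetical job records: (job name, business process, status).
JOBS = [
    ("load_orders", "Order to Cash", "OK"),
    ("bill_customers", "Order to Cash", "FAILED"),
    ("post_ledger", "Financial Close", "OK"),
    ("reconcile_accounts", "Financial Close", "OK"),
]

# Assumed severity ranking used to pick the worst status in a group.
SEVERITY = {"OK": 0, "LATE": 1, "FAILED": 2}


def correlated_view(jobs):
    """Roll job statuses up to the worst status per business process."""
    view = defaultdict(lambda: "OK")
    for _name, process, status in jobs:
        if SEVERITY[status] > SEVERITY[view[process]]:
            view[process] = status
    return dict(view)


def drill_down(jobs, process):
    """List the jobs in one business process that are not OK."""
    return [name for name, proc, status in jobs
            if proc == process and status != "OK"]


if __name__ == "__main__":
    print(correlated_view(JOBS))
    print(drill_down(JOBS, "Order to Cash"))
```

The same records could just as easily be grouped by application or business unit; the point is only that the grouping key, not the individual tool, defines the perspective being monitored.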
The ability of these individual tools to provide information to this manager of managers is critical. The manager of managers, in turn, must be able to expose the operational data it consumes in a way that reflects the underlying perspective being used. This allows operators to speak the same language as the subject matter experts. This is what I refer to as the DevOps conundrum: How can you get operations and development speaking the same language? To add to this complexity, companies have outsourced operations (and sometimes development as well) to different service providers who are tasked with the challenge of bridging this communication gap.
While many of these operational tools have evolved to “play nicely” within this setup, some have not. Workload automation (aka job scheduling) definitely has not. This is due to the legacy nature of the tools themselves, as well as the lack of investment that workload automation vendors have made in their products. These tools were not originally built with business process alignment in mind. Their data models were built with a focus on speed and reliability, and they were built to be domain specific, without consideration for adjacent tools and the greater business process. It’s also due to the growing complexity of the processes that have been built with these products over time. After many years of organic growth, these processes have become nearly unmanageable. With little or no ability to dissect the hundreds or thousands of jobs that make up these job scheduling processes, customers and service providers are left playing a guessing game, trying to find root causes and fix newly surfaced problems on the fly. Basically, developers are flying blind because they have little or no capability to design, build, test, and analyze their workload processes. When something inevitably does go wrong, it’s like finding the proverbial needle in the haystack. As if this weren’t difficult enough, imagine what happens when these processes are turned over to a third party to manage. The results are sub-optimal, to say the least.
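To make the needle-in-the-haystack problem concrete, here is a minimal Python sketch of what dissecting a job stream can look like once its dependencies are available as data. It assumes the dependencies can be exported as a simple map from each job to the jobs it waits on; the job names and the set of jobs that did not complete are hypothetical, and this is not how any particular scheduler exposes its data.

```python
# Hypothetical dependency map: each job lists the jobs it waits on.
DEPENDS_ON = {
    "extract_sales": [],
    "extract_inventory": [],
    "merge_feeds": ["extract_sales", "extract_inventory"],
    "build_report": ["merge_feeds"],
    "email_report": ["build_report"],
}

# Hypothetical set of jobs that failed or never ran in last night's cycle.
NOT_COMPLETED = {"extract_inventory", "merge_feeds",
                 "build_report", "email_report"}


def root_causes(job, depends_on, not_completed):
    """Return the upstream-most jobs behind a job that did not complete."""
    causes = set()
    seen = set()
    stack = [job]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        # Predecessors that also did not complete are better suspects.
        broken_parents = [p for p in depends_on.get(current, [])
                          if p in not_completed]
        if broken_parents:
            stack.extend(broken_parents)
        elif current in not_completed:
            # No broken predecessor: the problem starts here.
            causes.add(current)
    return sorted(causes)


if __name__ == "__main__":
    print(root_causes("email_report", DEPENDS_ON, NOT_COMPLETED))
```

With five jobs this is trivial to do by eye. With thousands of jobs and years of organic growth, it is exactly the kind of analysis that is impractical without tooling.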
Challenge Number Two: Service Provider Skills for Workload Automation
Another major challenge I’ve seen through my interactions with customers involves knowledge and skills. The knowledge base and skill set around workload automation are definitely shrinking. Along with that, the traditional skill sets people do have with workload automation haven’t evolved. It’s not all about how to design and build a workload schedule, nor is it all about how to manage one.
Specific product knowledge for the market-leading workload automation products from companies like IBM, CA, and BMC has definitely declined over the last decade. These three companies probably make up at least 70% of workload products being used in the market. I think the decline of knowledge is mostly due to aging in the workforce and the lack of “glamour” associated with the workload market and its tools. From an aging perspective, many of the people who chose a career as a specialist in these technologies are retiring or leaving the workforce. To compound that, being a domain expert in workload automation isn’t considered to be glamorous, so the younger workforce is either not drawn to it at all, or simply looks at it as a stepping stone to something they consider bigger and better. Most of the people I know who are considered specialists have been involved with workload automation for over 20 years. For others, the role is a secondary responsibility they’ve been tasked with because there is nobody else to address it.
A lot of companies think they can address this situation by simply outsourcing it, along with the rest of operations and/or development. Guess what: the service providers face exactly the same challenges with skill sets and knowledge. They also cannot train people overnight on the knowledge they need to manage a workload environment. Adding to this challenge is the lack of business and workload alignment I mentioned earlier that exists between development and operations – especially when you’ve outsourced one area and not the other. Even if you outsource both, the challenge still exists because you’ve simply moved the “situation” to another team – one which likely also lacks the requisite underlying knowledge and skills.
What’s the solution?
While I know there are other factors that contribute to the challenges I describe here, these two are the most common that I’ve experienced in my thirty-plus years in this space. While I’ve seen several innovative companies solve this (with much pain and expense), the majority that I see continue to struggle. My personal opinion about the ones that struggle is that they don’t really understand the problem. Then there are others who do understand, but don’t have the resources to address it – and then eventually an outage or other catastrophe occurs with the business. Usually, that’s when my phone rings!
Education usually tops the list when it comes to addressing this part of the problem. This includes educating on both the technology and the business processes. However, even the educated continue to struggle, due to constant change, the legacy of the workloads, and the lack of a system to understand and manage their workloads’ dynamic nature and complexity. The workload products being used today do very little, if anything, to address this underlying problem, and so are themselves a large part of the problem. There is also a large gap in understanding between the business’ operational requirements and its business requirements. This gap primarily emanates from how these workload processes were designed, some of them many years ago, mainly with operational requirements in mind. The organization’s lack of understanding of its own workload processes is also due to a lack of consistency in defining procedures, as well as turnover within the team that is responsible for managing them. Typically, there is little documentation, or the documentation that exists is outdated.
As I mentioned above, the deployed technology itself contributes significantly to these problems. The technology I am referring to is the job scheduling products that are used to develop, monitor, and manage the workload processes. Many of these products are decades old, and the model that is built into them is based on outdated operational methodologies, with little or no consideration for business requirements. The vendors can’t change these models because they are embedded deeply into the products and their architectures. Changing them would require a large investment and a painful migration process for the thousands of customers who use them. Instead, many customers continue to use these outdated products, often along with in-house band-aids that sometimes help, but usually also create even more complexity. All the while, these products continue to run on the “edge of their seat” on a daily basis.
Our company, Terma Software, helps companies address these widespread workload problems through an approach that uses both our optimization methodology and our proven analytics platform. Terma has developed a methodology called the Workload Service Model that is based on years of experience working with customers using enterprise workload automation products. This methodology is supported through our market-leading workload analytics platform, Terma Analytics. We don’t replace your job scheduling product, as many of the workload vendors propose you do, swapping one set of problems out for another. We provide a layer of business intelligence over your legacy job scheduler that allows you to apply both a business and an operational model to your workload. Our methodology addresses the needs of both operations and development, as well as the end-users and customers they support. Some of the capabilities within the business intelligence layer we provide include predictive monitoring, SLA definition and management, workload modeling and analysis, and reporting. We at Terma like to say that we help customers modernize their workload products and the processes that these products support. I think our many customers would agree with this statement.