Agile teams need agile infrastructures
Agility, DevOps and Continuous Delivery create new dependencies between software development, application life-cycle management and IT Service Management (ITSM). These innovations therefore impose new requirements on the infrastructure as well as on the related ITSM teams. Agile teams need agile infrastructures.
The technology stack for Continuous Delivery has become a highly complex piece of “machinery”. It includes everything from source code management via build systems and artifact repositories to automated testing and deployment staging, and from development and test environments to production. Each element must be optimally controlled.
If “happy hackers” are left to deal with these challenges on their own, the solution will be geared towards the needs of a specific software project or team. To create repeatable deployment channels for re-use across your portfolio, ITSM experts must participate from an early stage.
In fact, today software packages increasingly constitute an infrastructure universe of their own. If developers package applications in Docker containers there will be an impact on ITSM. The impact will be greater still if you deploy containers via a Cluster Management solution such as Kubernetes. You achieve auto-scaling, service discovery, load balancing, rolling upgrades and the like, and thereby automate what previously required a manual ITSM procedure.
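As a minimal sketch of what such orchestration looks like in practice, a Kubernetes Deployment manifest along the following lines lets the cluster itself handle replica management and rolling upgrades that would otherwise be manual ITSM procedures. The service name, image and replica count are hypothetical examples:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service            # hypothetical service name
spec:
  replicas: 3                 # Kubernetes keeps three pods running at all times
  strategy:
    type: RollingUpdate       # upgrade pods gradually, without downtime
    rollingUpdate:
      maxUnavailable: 1       # at most one pod offline during an upgrade
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
      - name: my-service
        image: registry.example.com/my-service:1.2.3  # hypothetical image
        ports:
        - containerPort: 8080
```

Pairing a Deployment like this with a HorizontalPodAutoscaler adds the auto-scaling behavior mentioned above; service discovery and load balancing are provided by Kubernetes Services.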
To support such scenarios, you have to establish a well-oiled and fine-tuned production line. From the first stages of system development to the final stages of service delivery, you must achieve automated quality assurance. And this is not possible unless your ITSM team works closely with your developers. Without a well-informed and actively involved ITSM team, agile development will rarely reach its full potential.
It is about people and organization as well as technology
The transformation to Continuous Delivery impacts not only your developers and ITSM staff but the entire IT domain: its management, the DBAs, the business analysts, the user experience designers, the quality assurance experts – all the different competencies involved in the production of your software solution. If your current engineering, QA and operations work in isolated silos, there will be cultural barriers as well. While developers focus on churning out new features – i.e. on achieving change – your operations team strives to keep systems stable – which means avoiding change at all costs.
The goal of your cultural revolution is to achieve repeatability and to shorten the cycle time for deploying new features into production.
From Continuous Integration to Continuous Delivery
The shift was introduced in the 1990s with Continuous Integration and test-driven and agile software development. By breaking down feature lists into a backlog of manageable tasks, agile development models strive to complete functional versions of their software at the end of every sprint. Sprint cycles typically last a week or two, sometimes more. This means Development can hand over a new version of the software to your QA and/or Operations teams at the end of every sprint.
Over time, Continuous Integration evolved with more automation. By enhancing build systems such as Jenkins or TeamCity, developers automate unit testing and functional testing on the fly. With such features in place, a vision of Continuous Delivery seems within reach.
However, most organizations were, and still are, not ready for this change. So far, only the first part – Development – has become truly agile. Between Development, QA and Operations, most organizations still have solid walls. This leads to a continued waterfall model between the development and production stages. Every error found during integration testing generates yet another sprint iteration of one, two or more weeks.
Release management still important
Continuous Delivery means you are potentially able to release new versions every day, as developers check in their code changes. However, this is rarely what you would like to happen.
A release needs to meet other criteria as well. You need to consider the organization’s marketing strategy, possible contractual constraints and other non-technical aspects. Continuous Delivery simply means you have the ABILITY to release every time developers check their code into the trunk.
To reach Continuous Delivery nirvana, you must furthermore collect data and provide metrics to control your flow of new features into production. Measurable quality and sufficient meta-data help your non-technical teams determine when a continuously delivered piece of software is ready for release.
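As a minimal sketch of such a metrics-driven release gate, the function below blocks a release whenever any quality metric misses its threshold. The metric names and threshold values are hypothetical examples, not a prescribed standard:

```python
# Sketch of a metrics-driven release gate: a build is releasable only
# when every quality metric meets its threshold. Metric names and
# threshold values below are hypothetical examples.

def ready_for_release(metrics, thresholds):
    """Return True only if every threshold is satisfied by the metrics."""
    # Metrics where a HIGHER value is better (everything else: lower is better).
    higher_is_better = {"test_coverage", "pass_rate"}
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is None:
            return False  # missing data blocks the release
        if name in higher_is_better:
            if value < limit:
                return False
        elif value > limit:
            return False
    return True

build = {"test_coverage": 0.87, "pass_rate": 1.0, "open_critical_bugs": 0}
gates = {"test_coverage": 0.80, "pass_rate": 1.0, "open_critical_bugs": 0}
print(ready_for_release(build, gates))  # True: all gates met
```

Note that a gate like this only answers the technical question of whether the build is releasable; the non-technical criteria above still decide whether it should actually be released.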
At Data Ductus we have been successful in synchronizing our system development and IT service management teams into new engineering practices to achieve a faster and more secure service delivery. We are happy to share our experience with you. In fact, we have packaged this experience in a DevOps workshop offering.
Do you need expert help with your DevOps and Continuous Delivery channels?
SIAM and the art of multi-sourcing
Ten years ago, IT organizations debated whether to outsource and what to outsource. Today, this question makes no sense. In the era of cloud computing, all IT organizations do multi-sourcing in one way or another. And this has a profound effect on how you manage your IT service delivery. Service Integration and Management (SIAM) offers a relevant new governance model.
Best practices and process description frameworks, such as ITIL (Information Technology Infrastructure Library), matured over many years. However, this was at a time when organizations had one or a few major outsourcing partners. Today, organizations employ new multi-provider sourcing models, while IT consumer behavior has changed considerably.
ITIL Strategic Partners
Diagram 1 illustrates ITIL’s supplier categorization and the Supplier and Contracts Database (part of ITIL Service Design). Strategic suppliers involve “…senior managers and the sharing of confidential information for long-term plans…”. In the past, these were typically outsourcing partners with whom the customer built long-term relationships. In the age of cloud computing, changes are taking place, especially with regard to strategic partner relations.
From processes to automation
Today, organizations are shifting base services from former strategic partners to new IaaS or SaaS providers. The explicit goal is to gain the benefits of a competitive market. It allows a more frequent re-negotiation of contract conditions and, if needed, a transition to providers that better meet the organization’s changing requirements.
Rather than cultivating the relation with one or a small number of strategic partners, organizations can buy services from whichever provider offers the best deal at any point in time.
And indeed, from an ITIL perspective, the shift to cloud services has a much broader impact. In many respects, ITIL helps establish governance in manual processes. By comparison, cloud-based business models strive to replace these manual processes with automation. And while new automated processes also require governance, the approach is different.
The chaining of SLAs is complicated as it is in ITIL terms, as the second diagram illustrates. Third parties and subcontractors are involved on many levels. Replicated across many service providers, the SLA chaining model becomes overwhelming. A new model for multi-sourcing management – or rather a complementary layer building on an existing model – is required. The good news is that such models are emerging.
Service Integration and Management (SIAM)
Service Integration and Management (SIAM) is a governance layer based on ITIL. It explicitly addresses the challenge of multi-sourcing management. Diagram 3 provides an overview*:
* The image is based on an illustration in the White Paper “An introduction to Service Integration and Management and ITIL®” by Kevin Holland, available at the web site of Axelos, the organization behind ITIL.
Between internal service consumers and related service providers, a centralized and separate management layer is introduced: the Service Integration and Management (SIAM) function. Thus, the organization’s service consumers don’t have to interact with multiple service providers for their service requests, support issues, incident or problem management procedures, etc. Instead, the SIAM function takes on the responsibility of orchestrating the services for the organization.
Innovation leadership becomes an important side-effect. To improve and innovate a service combining multiple service providers, you need a holistic perspective. An individual supplier simply does not have sufficient overview to understand how the organization’s future needs will evolve. The SIAM function, however, has this perspective and a self-interest in ensuring that innovation simplifies service consumption while adding business value.
Cloud operations introduce new risks
Beyond SIAM and ITIL, IT governance frameworks also have gaps when it comes to cloud services. The use of cloud providers entails a greater dependency on third parties, which, for instance, means:
- Issues in a cloud provider’s external interfaces may propagate to an incident in your organization
- An attack on other tenancies in the provider’s data centers may impact your organization
- Organizational or technical failures in the provider’s (possibly immature) operations become your problem
- Auditing and assurance require the assistance of an independent assurance process
The dynamic nature of cloud services also means that you can expect:
- The location of data processing facilities may change dynamically due to autoscaling or load balancing
- As a result, data processing may take place across national boundaries
- The legal framework becomes difficult to understand since multiple jurisdictions may apply
Regulatory compliance can also become an issue if:
- Privacy sensitive information crosses country borders
- Your organization’s contractual obligations to third parties conflict with the provider’s business model
Finally, the dependence on the internet means business critical services are exposed to vulnerabilities in a public infrastructure outside your control.
Weak standards and best practice frameworks for cloud service governance
All of these risks are difficult to capture in SLAs. The ISO/IEC standards 17788, 17789 and 19086:1-4 help define the terminology and frame of reference. But they do not yet provide a relevant assurance framework. For instance, you cannot demand that your cloud provider become certified against these standards and then rest assured that everything is under control.
The trend towards hybrid cloud computing models requires a considerable amount of professional common sense. Data Ductus prides itself on the ability to offer a good portion of it. You can read more about this in our boutique cloud technology services section. If we can be of any help, do not hesitate to let us know.
What you buy is what you get (but not always what you want)
When you plan to migrate from a self-hosted to a cloud-hosted infrastructure, you probably focus first of all on service providers’ technologies and business models. But the migration also raises a wide range of other urgent questions. As a customer of cloud infrastructure services, you take on new types of responsibilities that your organization may not be prepared for. The shift means innovation leadership and application life-cycle management, especially for homegrown applications, will have to adapt.
The dynamics and speed with which technology evolves are difficult to capture in Service Level Agreements (SLAs). SLAs are static by nature, while applications and services evolve dynamically. Agreements intended to assure clients that they get what they pay for can even unintentionally preserve the status quo.
You get not what you ask for but what you can easily measure through SLAs: server uptime, service request response times, the time needed to fix an incident or a problem, etc. Do these controls ensure you enjoy the benefits of innovation and optimization? Or are they incentives for providers to continue business as usual although the world has already changed?
Application life-cycle management and cloud infrastructure challenges
The figure illustrates how cloud computing models, whether they provide infrastructure (IaaS), platforms (PaaS) or software (SaaS) as a service, leave the client with responsibilities in relation to service consumers. The cloud service provider concentrates on meeting SLA requirements. But this does not necessarily cater for all end-user needs. End-users may need further support to become productive contributors within their organizations.
Thus, the client’s IT organization still needs to make sure end-users can access and use the service as intended. And if this service involves contributions from multiple vendors, they all need to be coordinated.
The IT department that disappeared
In the “good old days” – or “horrible days”, whichever you prefer – when the organization’s own IT department delivered IT services, end-users called “IT” to resolve issues. If the IT staff was service-minded and knowledgeable, this helped solve problems without much bureaucracy. Furthermore, it helped the IT organization understand the needs of users and the CIO to push for innovation.
Those who remember the horrible times may object that innovation was rarely possible. The IT department was not sufficiently flexible and innovative, or it lacked the necessary resources. Nonetheless, replacing the IT department with an SLA-driven XaaS vendor leads to a cultural shift.
You cannot outsource your own innovation leadership
In our experience, the greater the customer’s responsibility, the greater the challenge. Does the organization run homegrown applications on its IaaS provider’s infrastructure? This typically leads to new requirements on application life-cycle management. The IaaS vendor meets SLA requirements but does nothing to ensure the application is duly maintained. The client’s in-house or contracted developers must play a role in the service delivery chain. Thus, 1st line, 2nd line and 3rd line support procedures require many parties to cooperate.
Innovative and proactive application life-cycle management therefore has to integrate with and complement the cloud service.
At Data Ductus, we are used to situations like these. In fact, we have a whole service area focusing on application life-cycle management, both from an ITSM and a systems development perspective. We are always looking for new ways to ensure clients get the benefits of innovation, going beyond the narrow scope of SLA check boxes. If you are interested in exchanging experiences, do not hesitate to let us know.
(Re-) Introducing a Service Desk on-prem
After years of outsourcing Service Desks to service providers, large organizations invest in walk-up support counters. To fully understand this trend, you need to look beyond the Service Desk itself.
For years, in-house Service Desk staff were made redundant in cost reduction efforts as large organizations outsourced their help desk services to external service providers. Some were in offshore locations, available only via email or phone. But now Gartner’s 2017 hype cycle for ITSM puts “Walk-Up Support” and “IT Support Live Chat” at the peak of its curve.
Why the Service Desk was outsourced
Did the outsourcing partners do such a poor job? Not necessarily. Of course there are many stories of irate customers struggling with Service Desk staff who have little if any domain knowledge. But many organizations have been satisfied with the services provided by their “long-distance” partners. Thus, a lack of partner trust is not the primary driver motivating this change.
Instead, we should probably see this shift in the context of other related trends. Commonly mentioned drivers include:
- End-users’ expectation levels. Technology-savvy end-users with a considerable set of personal gadgets expect to be treated with at least the same level of courtesy that they get as private consumers.
- Mobility. With the increased use of mobiles and tablets, the work-force has become more mobile and more exposed to error-prone user interfaces.
- BYOD. The driver many analysts mention is the bring-your-own-device trend which also entails aspects of the above bullet points.
- With digitalization our service requirements change and our demands on the Service Desk staff increase. Business processes that become completely dependent on digitized automation need the support of a staff with considerable local domain knowledge.
Cloud services and multi-sourcing
Although all of the above may be true, we believe another reason is equally important: the changing sourcing and cloud strategies of large organizations. The walk-up counter is not a replacement of other Service Desk channels – it is complementary. Today, our work-place environments often depend on a large number of services from multiple providers – in-house, outsourced or in the cloud. As a result, you sometimes have to consult many different Service Desks to solve even a simple problem. It therefore pays off to shield end-users from the complexities of the organization’s service delivery.
A Service Desk as a single point of contact for complex support chains
In organizations with a highly mobile work-force using services from multiple providers, even simple issues, such as password resets or WiFi connectivity problems, can be out of reach for an external Service Desk provider.
Nonetheless, the walk-up support staff itself remains dependent on the organization’s sourcing partners’ and cloud service providers’ respective service desks.
Thus, we believe this trend can only be understood in the context of multi-sourcing and cloud migrations. These are the same trends which call for a Service Integration and Management (SIAM) layer within the organization. Modern organizations need ways to orchestrate services which in turn depend on many levels of first line, second line and third line support. Providing end-users with the convenience of a single point of contact then becomes key. The on-premise Service Desk thus helps large organizations orchestrate services from multiple vendors.
How does this relate to your experience? Do you have a different interpretation of this trend? Would you like to compare notes?
Network and application monitoring from an end-user perspective
In any type of assessment, you need to consider the WYMIWYG aspect (“What you measure is what you get”). Thus, in network and application monitoring, your primary focus should not (only) be on what can be easily measured. Typically, that means things like: Is the server running? Can users access the access points? Do users get a 200 success code in response to HTTP requests?
However, what you should ideally be measuring are things such as:
Is the user experience satisfactory? Do we meet business requirements? Can we become more proactive instead of spending efforts on incident response?
In other words, you should monitor from an end-user perspective.
Network and application monitoring practices typically suffer from two problems:
- Too many false positives. With a focus on what we can measure rather than what we need to measure, we generate huge numbers of alarms that need some kind of (costly) follow-up, even when the problem requires no action.
- Too few accurate problem indicators. In spite of this huge flow of alarms, we fail to verify the most important aspects. Is the user experience acceptable? Do we meet business requirements?
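As a minimal sketch of the difference, the check below treats a synthetic end-user probe as healthy only when both the technical signal (HTTP status) and a user-facing signal (response time) are acceptable. The two-second latency budget is a hypothetical example threshold:

```python
# Sketch of an end-user-oriented health check: a probe is only "healthy"
# if the request succeeded AND the user-facing response time is acceptable.
# The 2-second latency budget is a hypothetical example threshold.

def evaluate_probe(status_code, response_seconds, latency_budget=2.0):
    """Classify one synthetic end-user probe of a web application."""
    if status_code != 200:
        return "alert: request failed"
    if response_seconds > latency_budget:
        # A plain status-code check would report "OK" here - the user would not.
        return "alert: degraded user experience"
    return "healthy"

print(evaluate_probe(200, 0.4))   # healthy
print(evaluate_probe(200, 6.5))   # alert: degraded user experience
print(evaluate_probe(503, 0.1))   # alert: request failed
```

Real synthetic-monitoring agents replay whole user transactions from the client side; this sketch only illustrates the classification logic that separates “technically up” from “usable”.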
Cloud services and mobility increase the network and application monitoring complexity
In addition, the rapid shift to cloud computing, new sourcing models and new consumption patterns lead to new monitoring needs. Does the Software as a Service (SaaS) provider really deliver up-times as promised? When a server is unavailable, is my Infrastructure as a Service (IaaS) at fault or do I call the network service provider? If end-users complain about homegrown business applications, do they experience flaws in the responsive GUI on their (BYOD) tablets or do we have a back-end problem?
Focus on the user experience
Modern monitoring technologies offer a broad variety of options. We can shift from a focus on protocols and IT components and instead target the user experience and overall business requirements. With agents deployed on clients, we can measure what users actually experience, namely the application performance on their screens. By measuring the key performance indicators that matter from a business perspective, we can better control our cloud services and many other applications and services.
Focus on business criticalities
Legacy monitoring techniques offer great value as long as the objectives are clearly defined. Examples of these include:
- Network components running with factory settings often represent a severe security hazard. If you simply monitor their availability, rather than their availability in a desired configured state, your monitoring creates a false sense of security rather than an adequate security alarm.
- Monitoring server up-time is crucial in an IaaS environment. Yet, it may not be the primary concern from a business perspective. A service running clustered across multiple servers is not necessarily severely impacted by one single server going down. The application service itself, however, must be performing without flaws. Thus, application-level monitoring becomes key, whereas alarms related to the performance of individual servers – rather than service clusters – may be counter-productive.
- A focus on the end-user experience, rather than the health of individual technical components, helps improve performance in IT service delivery.
- By reducing the number of false positives in ever-growing streams of alarms, IT organizations save costs and deliver more value to the business.
- Well-configured monitoring tools help IT organizations identify threats ahead of time rather than spending time on incident management.
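The first bullet above can be made concrete with a small sketch: instead of only checking that a device responds, compare its running configuration against a desired baseline, so that a reachable but misconfigured device still raises an alarm. The baseline settings are hypothetical examples:

```python
# Sketch: monitor availability *in a desired configured state*, not just
# availability. The baseline settings below are hypothetical examples.

DESIRED_BASELINE = {
    "admin_password_changed": True,   # factory default credentials replaced
    "telnet_enabled": False,          # insecure management protocol disabled
    "firmware": "2.4.1",              # expected firmware version
}

def audit_device(reachable, running_config, baseline=DESIRED_BASELINE):
    """Return a list of alarms for one network device."""
    if not reachable:
        return ["alarm: device unreachable"]
    alarms = []
    for setting, desired in baseline.items():
        if running_config.get(setting) != desired:
            alarms.append(f"alarm: {setting} deviates from desired state")
    return alarms  # empty list means available AND correctly configured

# A device that is up, but still running factory settings, raises two alarms:
print(audit_device(True, {"admin_password_changed": False,
                          "telnet_enabled": True,
                          "firmware": "2.4.1"}))
```

An availability-only check would have reported this device as green; the configuration audit turns the same data into an adequate security alarm.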
At Data Ductus our experienced teams offer operational and advisory services in this field. If you’re interested, don’t hesitate to get in touch.