Identifying your organization’s critical business functions is foundational to all kinds of strategic planning initiatives. Every organization has a definitive list that they review at least annually, right? Not so much, it turns out.
Most lists of critical business functions are too high-level to be actionable. In a world where billions of assets are connected to the internet and each other, the definition of critical business functions must include dependencies that could contribute to a larger failure if they were damaged or disrupted. This is especially true of operational technology (OT) and Internet of Things (IoT) assets that control physical and digital processes, which often are omitted entirely from such lists.
A thorough discussion of critical business functions would include the importance of performing a business impact analysis (BIA) and creating business continuity and disaster recovery plans — all essential practices beyond the scope of this post. Here, we address the more immediate problem of knowing where to begin. We define critical business functions for industrial and cyber-physical environments, including critical infrastructure, and the importance of identifying and prioritizing them so you can allocate adequate resources to protect them.
What Are Critical Business Functions in OT/IoT?
Every organization has its crown jewels: the core processes and activities that enable it to generate revenue or otherwise fulfill its mission. These are the processes that need the most protection. In industrial organizations, OT and IoT assets manage these crown jewel processes, often autonomously. They control processes that, if compromised, could significantly impact the organization’s operations, reputation, financial health or very existence.
Critical business functions are defined by each organization’s unique purpose; they can’t be dictated externally. The most recent version of the TSA Security Directive for Pipelines, TSA SD Pipeline-2021-01D, illustrates this point. Among other minor updates, it adds this vague definition of what it calls “business critical functions,” perhaps in response to calls for clarification:
“Business critical functions” means the Owner/Operator’s determination of capacity or capabilities to support functions necessary to meet operational needs and supply chain expectations.
Itemizing and ranking your critical business functions is a start, but it won’t get you very far toward the goal of protecting them. Even when conducting a BIA, too often organizations stop at a very high level, without collecting detailed information and mapping functions to specific services, applications, hardware and employees.
For example, in a manufacturing organization, critical functions typically include production, quality control and distribution. To be meaningful, many more details must be known about each of these functions and their dependencies. Within production alone, can you answer:
- What specific machines, processes and facilities are critical?
- What programmable logic controllers (PLCs) are used?
- What power supply units cannot fail?
- What engineering workstations are vulnerable to attack?
- What sensors, cameras, wireless and other IoT devices enable robots to operate and communicate autonomously?
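One way to make these questions concrete is to record each function alongside the assets it depends on, then check which categories are still blank. The sketch below is only an illustration: the class, the category names and the asset identifiers are all hypothetical, and a real inventory would come from your asset management tooling rather than a hand-typed dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class CriticalFunction:
    """One business function and the assets it depends on (names are hypothetical)."""
    name: str
    # Maps a dependency category (e.g. "plcs") to the asset identifiers in it.
    dependencies: dict[str, list[str]] = field(default_factory=dict)

    def unmapped(self, required: list[str]) -> list[str]:
        """Return the required categories that have no assets recorded yet."""
        return [c for c in required if not self.dependencies.get(c)]


# Categories loosely follow the questions above; the identifiers are made up.
REQUIRED = [
    "machines",
    "plcs",
    "power_supplies",
    "engineering_workstations",
    "iot_devices",
]

production = CriticalFunction(
    name="production",
    dependencies={
        "machines": ["filling-line-1"],
        "plcs": ["plc-07", "plc-12"],
        "engineering_workstations": ["ews-02"],
    },
)

gaps = production.unmapped(REQUIRED)
print(gaps)  # ['power_supplies', 'iot_devices']
```

Any category that shows up in `gaps` is a question about production you can’t yet answer.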
Downtime Impact: A Proxy for Criticality
It only takes one thorough tour of an industrial plant to understand why downtime is verboten. Especially in industries where continuous production is the norm, unexpected downtime (whether due to a natural event, technology failure, cyberattack or anything other than scheduled maintenance) can be costly. This is the case in pharmaceuticals, chemical manufacturing, paper and steel mills, oil and gas refineries, and more.
If a continuous production process goes down, you can’t just resume operations with the flip of a switch. In addition to evaluating safety risks, operators must recalibrate the machinery, conduct test runs and gradually increase production rates (which produces waste) while monitoring critical parameters such as temperature, pressure and speed until consistent output is resumed. Don’t forget that some system components are decades old and have bespoke, “as-operated” configurations that exist only in a veteran employee’s head.
IoT devices present similar challenges. Connected to the internet by definition, they collect and exchange data with other devices in their environments to automate processes without human intervention. If disrupted, they usually revert to their factory settings and must be reconfigured and recalibrated in a specific order, sometimes painstakingly, before they function as desired again.
Determine Criticality Based on Recovery Time Objective
A useful metric for determining critical business functions is recovery time objective (RTO): the maximum acceptable time that a function can be unavailable or disrupted before it causes significant damage to your organization. This calculation is a key part of a formal BIA, but even a guesstimate is helpful. If the time needed to restore the function is long and the tolerance for an outage is low, the function is critical. Typically, critical functions are essential for life safety, to fulfill legal obligations or to maintain core business operations. As such, they must be resumed immediately or within a few hours. As described in the previous section, the time it takes to restore OT and IoT devices to optimal conditions may exceed the RTO.
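To show how even rough numbers can drive prioritization, here is a minimal sketch that compares each function’s RTO with an estimated recovery time. Everything in it is an assumption made for illustration: the function names, the hour values and the four-hour cutoff standing in for “within a few hours.” A formal BIA would supply real figures.

```python
from dataclasses import dataclass


@dataclass
class FunctionEstimate:
    name: str
    rto_hours: float           # maximum tolerable outage
    est_recovery_hours: float  # rough estimate of the time needed to restore


def is_critical(f: FunctionEstimate, rto_threshold_hours: float = 4.0) -> bool:
    """Treat a function as critical when its outage tolerance is a few hours or less.

    The 4-hour default is an illustrative assumption, not a standard.
    """
    return f.rto_hours <= rto_threshold_hours


def recovery_gap_hours(f: FunctionEstimate) -> float:
    """Hours by which the estimated recovery time exceeds the RTO (0 if it fits)."""
    return max(0.0, f.est_recovery_hours - f.rto_hours)


# Hypothetical guesstimates for three functions.
estimates = [
    FunctionEstimate("payroll", rto_hours=72, est_recovery_hours=8),
    FunctionEstimate("packaging line", rto_hours=4, est_recovery_hours=6),
    FunctionEstimate("reactor temperature control", rto_hours=1, est_recovery_hours=12),
]

# Largest gap first: these are the functions where recovery is slowest
# relative to how long the organization can tolerate the outage.
ranked = sorted(estimates, key=recovery_gap_hours, reverse=True)
for f in ranked:
    print(f.name, is_critical(f), recovery_gap_hours(f))
```

Functions that are both critical and have a large gap are the ones where recovery would overrun the RTO, so they are natural candidates for extra protection.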
Borrow From the Technical Resilience Navigator
If you’re ready to get serious about identifying your critical business functions, the Federal Energy Management Program (FEMP) offers a step-by-step approach to doing so as part of resilience planning. While designed for energy and water organizations, the FEMP Technical Resilience Navigator (TRN) is suited to any organization that manages critical services.
“Site-Level Planning Action 4: Identify Critical Functions” offers practical instructions for mapping functions to facilities and weighting their criticality. A downloadable resource document provides dozens of examples. It doesn’t call out where OT and IoT dependencies exist, but we’ve extracted a few entries below and added obvious dependencies. Can you think of others?
Identify Your Critical Business Functions While Also Protecting Them
Automating your OT and IoT asset inventory is the first step in cyber resilience; you can’t protect what you can’t see. It’s also the only way to uncover all components of your critical business functions. The Nozomi Networks platform is purpose-built for OT and IoT environments. It enables you to:
- Discover assets you didn’t know you have. Typically, organizations that go from a manual inventory process to a real-time automated asset inventory discover they have at least 30% more assets than they thought they had, most of them OT and IoT. You gain complete visibility into the true extent of your asset infrastructure, including which ones are interrelated or interdependent.
- Collect granular information on every asset. This includes device type and function; hardware manufacturer, model and serial number; operating system or firmware version; network address, MAC address and host name; communication protocols used; applications running; and vulnerabilities and risks.
- Understand traffic patterns. Continuous monitoring captures which devices are talking to each other, and whether that traffic falls within or outside of policy, so you can see where segmentation may be needed, especially between OT and IT networks. It also shows which assets are connected to the internet and therefore vulnerable.
- See who’s connecting and from where. Industrial environments depend on a steady stream of third-party vendors who service their own equipment, much of which touches the crown jewels. Technicians increasingly remote in to perform maintenance or upgrades, often without secure connections.
- Baseline normal behavior and detect anomalies. Before you can determine whether anomalous behavior is caused by a cyber incident, misconfiguration or other irregularity, you must establish a baseline of normal activity. This is a laborious undertaking if done manually but is ideally suited for machine learning (ML). Our network, wireless and endpoint sensors use ML to learn the normal behavior of OT and IoT process variables collected from network traffic. Once these baselines are established, the platform uses high-speed behavior analytics to monitor and detect anomalies.
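To illustrate the baselining idea in the last item above, the sketch below learns a mean and standard deviation from known-good readings of one process variable, then flags values that stray too far from it. This is a deliberately simple statistical stand-in, not the machine learning the platform uses. The sample temperatures and the threshold of three standard deviations are assumptions chosen for the example.

```python
from statistics import mean, stdev


def build_baseline(samples: list[float]) -> tuple[float, float]:
    """Summarize normal behavior as (mean, sample standard deviation)."""
    return mean(samples), stdev(samples)


def is_anomalous(value: float, baseline: tuple[float, float],
                 z_threshold: float = 3.0) -> bool:
    """Flag a reading that lies more than z_threshold deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly flat baseline: any different value is a deviation.
        return value != mu
    return abs(value - mu) / sigma > z_threshold


# Hypothetical temperature readings collected during normal operation.
normal_temps = [70.1, 69.8, 70.3, 70.0, 69.9, 70.2, 70.1, 69.7]
baseline = build_baseline(normal_temps)

# One reading inside the usual range, one well outside it.
flags = [is_anomalous(v, baseline) for v in (70.2, 75.0)]
print(flags)  # [False, True]
```

A flag like this only tells you that something deviated from normal. Deciding whether the cause is a cyber incident, a misconfiguration or another irregularity still takes context.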
Protect Your Critical Business Functions, and Everything Else
Identifying your critical business functions gets to the heart of cyber resilience. It’s not enough to list them and check the box; you need to identify all OT, IoT and IT assets that enable those functions and understand their context.
The good news is that automated tools don’t care whether an asset or process supports a critical business function. The Nozomi Networks platform automates continuous discovery, visualization, monitoring and threat detection for all assets in industrial control networks, whether or not they enable critical business functions. It’s up to you to factor organizational risk tolerance into your policies and alert tuning. That alone takes work. But it’s a lot easier when you can see what’s happening in your environment.