Measuring What Matters: OS as a Foundation for Modern KPIs

Overview

Capital projects consistently underperform despite advancements in planning, digital tools, and enterprise reporting. One key contributor is the continued reliance on conventional metrics, originally designed for administrative oversight, to manage and control production. These metrics fail to reflect the true nature of execution: variability, flow, bottlenecks, and system dynamics. This paper proposes an alternative measurement framework based on Operations Science (OS). Drawing from proven principles in manufacturing and complex project delivery, the framework emphasizes leading, prescriptive, and production-oriented metrics. It presents an approach for metric selection, analysis, and control supported by digital modeling tools, enabling teams to predict, visualize, and optimize project production behavior in real time. This framework empowers capital project teams to transition from reactive management to proactive control, improving performance, reducing risk, and enabling better outcomes.

Keywords: Operations Science; Measurement Framework; Production Systems

Authors

Chet Carlson

Strategic Project Solutions, Inc.

Chet Carlson is the Lead Production System Analyst for Factory Physics Inc. and its parent company, Strategic Project Solutions, Inc. He is responsible for leading Production System Analysis and Optimization efforts to map, model, simulate, analyze, and optimize permanent and temporary production systems. Chet has worked directly with numerous organi ...

Paper

Measuring What Matters: OS as a Foundation for Modern KPIs

Introduction

Measurement is foundational to how organizations understand, evaluate, and improve their operations. In any system, each data point reduces uncertainty and provides insight into underlying dynamics. As Doug Hubbard explains, measurement does not eliminate uncertainty, but it reduces it, improving the quality of decisions even with imperfect information [1]. Accurate, timely, and relevant metrics align strategic objectives with organizational activities, guide behavior through feedback, and enable both proactive and corrective control.

In this sense, measurement serves three essential functions in any enterprise. First, it informs decision-making, offering visibility into system states and performance gaps and enabling comparison of alternatives. Second, it enables control, particularly in production environments, by allowing teams to observe deviations from a target and adjust actions in response. Third, and perhaps most significantly, measurement influences human behavior, which in turn ripples through production system behavior. Metrics act as proxies for performance expectations, shaping how individuals and teams act. What gets measured gets managed, but unfortunately, what gets measured also gets manipulated: people will naturally optimize to meet a metric, even at the expense of larger goals. This dynamic is captured by Goodhart’s Law, which warns that when a measure becomes a target, it ceases to be a good measure. The proxy loses its informational value and instead drives behavior that may be misaligned with the enterprise’s true objectives [2].

Despite widespread advancements in project planning, digital dashboards, and enterprise controls, capital projects remain chronically underperforming. These large, complex, and capital-intensive endeavors continue to suffer from schedule delays, budget overruns, and execution inefficiencies. One root cause lies in the metrics used to manage them. Most project environments still rely on legacy performance metrics that were never designed to control or optimize dynamic production systems.

These traditional indicators serve a legitimate purpose for reporting scope, schedule, and budget performance. However, they are retrospective and often counterproductive for managing the real-time behavior of work-in-process (WIP), capacity constraints, variability, and system interactions. They tell us what happened, but not what is happening or what to go do about it. Consider the scenario the six project teams in Figure 1 found themselves in. Clearly, each project was facing significant issues when these snapshots were taken. However, from these measurements alone, it is unclear what exactly the teams must do to correct course. If that were clear, the deviation would have been corrected, not allowed to continue.

Figure 1: Conventional Metrics Tell Us Something is Wrong but Not What to Go Do About It

This paper analyzes how commonly used project metrics can misalign with organizational goals and unintentionally encourage dysfunctional behavior. It introduces an Operations Science-based measurement framework, intended to supplement existing metrics, such as those used in project controls, by offering an operational lens rooted in how work actually flows. Rather than discarding traditional KPIs, this paper argues for a broader measurement philosophy: one that balances administrative reporting with actionable control, shifting from purely retrospective observation to real-time intervention when it matters most.

The Persistent Use of Administrative Metrics for Project Production

Capital projects remain among the most complex and risk-laden undertakings. Despite decades of investment in planning tools, oversight mechanisms, and digital dashboards, project outcomes have remained stubbornly poor. Cost overruns and schedule delays are still the norm, not the exception. Major studies by IPA, CII, and McKinsey highlight recurring surface-level issues such as poor front-end loading, stakeholder misalignment, and weak continuity. But a deeper and often overlooked cause is the continued reliance on measurement systems that track activity rather than control production. Legacy metrics like Earned Value (EV), Schedule and Cost Performance Indices (SPI, CPI), Percent Complete, and Labor Productivity remain valuable for reporting against a baseline and for contract alignment, but they are not tailored for production system design and control.

The Role and Limits of Traditional Project Metrics

Conventional project management metrics play a legitimate role in tracking scope, schedule, and budget. They help owners, contractors, and financiers monitor adherence to these baselines, particularly in complex contractual environments. These tools support standardized reporting, enable contractual accountability, and offer a shared language across stakeholders.

Table 1. Conventional Project Management Metrics

However, their limitation lies in their application to production. These metrics are fundamentally descriptive; they reflect what has already occurred. They do not model or manage the dynamic behaviors of production systems. They assume that work packages are uniform, linear, and sequenced properly. In reality, earned value accrual is based on accounting conventions, not flow completion, and can give the illusion of healthy progress even when key deliverables remain uncoordinated or unready [3]. While essential for administration, these metrics are insufficient for execution.

Why Conventional Metrics Persist

The persistence of these metrics is due not only to utility but also to their deep entrenchment in contracts, enterprise systems, and reporting tools. They are institutional defaults: familiar and widely accepted. Project Controls departments are evaluated on their ability to deliver EV reports. Stakeholders (owners, financiers, executives) are trained to read status dashboards built from legacy KPIs. Benchmarking practices further reinforce them by promoting what is common as what is best, even when these metrics fail to control or optimize performance. As Shenoy notes in [4], thirty years of benchmarking have not produced breakthrough improvements in project performance. Benchmarking identifies common, standard industry practices, but it is not designed to reveal the physics of execution.

Misaligned Incentives and the Gaming of Progress

A key input into EVM is Percent Complete, often governed by negotiated “Rules of Credit,” where partial activities, such as piping installed without flanges or instrumentation installed without testing, may be credited as 80–90% complete. While these conventions standardize progress reporting and facilitate milestone-based payments, they can inadvertently distort reality. Teams are incentivized to meet credit criteria rather than deliver readiness for downstream trades. Activities that are “in progress” or “complete” according to the system may still be blocked by unresolved constraints, engineering decisions, or missing upstream deliverables, creating invisible “unprogressible” packages. In many projects, this causes the classic S-curve distortion, where 90% of reported progress is achieved by midpoint, but the final 10% drags on for months due to incomplete integration, punchlist resolution, and rework [3]. These behaviors often stem from pay structures, contract milestones, and dashboard expectations, encouraging local optimization at the expense of global flow. As Kerr (1975) warned: “We reward A while hoping for B.” [5]

WIP Inflation and the Illusion of Efficiency

A damaging behavior driven by traditional metrics is the inflation of Work in Process (WIP) between trades. Subcontractors often push for large buffers of completed work before starting their tasks, claiming it improves efficiency: “Allows us to get in and get out,” “Lets us go fast once we start.” While this may boost their individual productivity, it causes systemic drag, creating idle time, rework, and mismatches across trades. Capital is tied up in unfinished work, and delays propagate as sequencing falters. In one project, upstream WIP accumulation caused field crews to appear productive, but downstream trades faced saturation, rework, and sequence mismatches. Contractors were busy, but the system was stalled [6].

These behaviors are often reinforced by the structure of contracts. Under lump-sum contracts, contractors are incentivized to protect their own scope and margin, not to optimize for overall flow. This leads to push-based work practices, delayed issue reporting, and completion of work out of context, just to meet payment triggers. The cascading effect is what the authors in [7] call “the work being pushed from one party to another,” often leading to fragmentation and less transparency in the production system.

The Psychology Behind the Persistence

Even when flaws are known, traditional metrics remain sticky due to human biases. Status quo bias leads decision-makers to prefer familiar tools. Overconfidence bias reinforces the belief that metrics used in past projects are still valid. Social proof and illusion of causality (“this metric was used on a successful project, so it must work”) discourage critical evaluation. Managers feel secure using tools that appear objective, even if those tools obscure real constraints. These cognitive biases, documented by Kahneman and others [8] [5], help explain why organizations defend failing metrics. When pay, promotion, or contract fulfillment depend on these KPIs, their use becomes even harder to challenge.

A Gap in Education

One of the most overlooked reasons traditional metrics persist is the way project professionals are educated. This includes both hands-on education (e.g., a professional who never had the opportunity to earn a university degree but learned project by project) and formal education (e.g., undergraduate and graduate degrees and certifications). Most university programs and certification bodies, including those following the PMBOK, teach EVM, CV and SV, and Percent Complete as foundational tools, some even going so far as to state that operations management is outside the scope of project management because it is seen as a body of knowledge concerned only with operating an asset. While these concepts serve an important purpose, students are rarely exposed to the fundamental concepts of production, such as WIP, cycle time, capacity utilization, and variability, that govern the behavior of the projects they will go on to run. This creates a knowledge gap: graduates can track progress but cannot control production, because they were never taught how. As PPI notes, “Era 2 thinking” treats project planning and management as administrative processes rather than as dynamic systems to be optimized. As stated above, those who do not learn conventional project management in school are often trained within the work environment; there are project engineers who may be responsible for providing data for EVM without knowing what EVM is.

Changing this dynamic requires enhancing prevailing mental models with an understanding of how production systems behave. To effectively manage and control project execution, organizations must shift from project administration and its associated controls to production systems and their associated control. That means rethinking what is measured, why it is measured, and how measurement drives behavior. The next section introduces a set of metrics grounded in Operations Science, designed not only to describe project outcomes but to actively control, optimize, and improve the flow of work across complex project environments.

One fundamental leap that must be mentioned is the ability to see projects as production systems. This concept is not readily accepted in construction, or even in manufacturing (see Schmenner).

Developing an Operations Science Based Measurement Framework

The first and most important step in designing an effective measurement system is to begin with the end in mind: What are we trying to achieve? Metrics must support operational and strategic goals, not just reporting obligations. Too often, organizations inherit metrics from financial systems, industry templates, or compliance frameworks without questioning their relevance. When measurement is misaligned with purpose, it will almost certainly drive the wrong behaviors. For example, rewarding task starts instead of work completions inflates activity, opens work fronts, and earns credit, but slows overall progress as work-in-process (WIP) increases and cycle times extend (it takes longer to complete a given scope of work). Defining the right production-based measurements requires clarity of intent, an understanding of production dynamics, and discipline in aligning metrics to decision-making and control.

Step 1: Select Measurements Based on Decisions

To define meaningful measurements, organizations should start by answering four key questions:

  1. What does success look like, operationally and financially?
  2. What variables control those outcomes?
  3. What behavior do we want to encourage? What are we optimizing for?
  4. What decisions need to be made, how quickly, and by whom?

Next, define the boundaries of what should be measured. Boundaries should be defined based on project or business objectives. In capital projects, boundaries may span the entire scope from engineering release through field installation across multiple trades or focus on a small segment, such as foundation work or module assembly. Clearly defining start/end points, control scope, and system interfaces (e.g., material readiness, inspections, handoffs) ensures metrics are aligned and relevant.

Furthermore, the metrics in this framework satisfy the following criteria. First, they are grounded in scientifically validated relationships that describe the behavior of production systems. Second, each measurement should trigger a decision or action that supports planning and execution. This includes deciding not to change anything, for example when metrics are within their expected range. Importantly, measurement utility exists on a continuum. As described by Thomas Davenport and formalized in the Gartner Analytics Maturity Model [9] [10], metrics evolve from descriptive (what happened), to diagnostic (why it happened), to predictive (what might happen), and finally to prescriptive (what should be done). Descriptive metrics serve as a foundation for understanding system behavior, but they become significantly more powerful when they guide decisions or influence proactive action. An effective measurement system actively shapes behavior and stabilizes performance when faced with variability.

Figure 2: Metric Maturity Model

Third, they enable optimization to enhance performance in the context of the organization’s objectives. The measurements selected should be able to answer questions like: Is demand being met? How long will the rest of the work take? How much work can or should we commit to today? Where, and what, is the bottleneck? Is the system overloaded? Is flow stable or erratic? Do we have too many workfronts open? Are we working on too many deliverables at the same time?

Balancing Leading and Lagging Metrics

An effective measurement system requires a balance of leading and lagging indicators. Leading indicators are predictive in nature and change before the outcomes or events they are meant to signal. They enable proactive intervention. Think of leading indicators as thermostats; they sense and trigger responses before conditions get too hot or cold. Lagging metrics are retrospective in nature. They reflect an outcome after it has already occurred. They are used for reporting, accountability, or confirmation. Think of lagging indicators as thermometers; they tell you what has already happened (you already have the fever). By integrating leading indicators, project teams can gain early warning signals and take action before performance deteriorates. The table below summarizes the key differences between these indicators.

Table 2: Characteristics of Leading and Lagging Indicators

The following is a non-exhaustive list of metrics that support planning, execution, and optimization of production systems.

Table 3: Production-Based Metrics

While each of these metrics supports production planning, execution, and optimization, their utility depends on how and when they are measured, and on whether they are embedded within a production model or simply observed post hoc. Many of these metrics, such as process time variation, throughput, and service level, are based on recorded performance, making them inherently retrospective. They are emergent properties of execution, making them observational rather than predictive.

However, some of these metrics play a critical role in proactive control. Take variability: while it reflects historical behavior, it reveals the patterns and probabilities that drive future instability. For example, repeated variation in process durations may indicate the need for increased buffer sizing; frequent arrival mismatches may signal problems with readiness or upstream coordination. In this sense, variability metrics serve as diagnostic inputs that support better decision-making, particularly when linked to upstream planning levers like release control, batching strategy, or crew sequencing.
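To make the link between variability and buffer sizing concrete, the sketch below applies the Kingman (VUT) queueing approximation popularized in Factory Physics [14] to a single station: with utilization held constant, tripling the variability coefficients multiplies expected queue time ninefold. This is a minimal sketch; all inputs are assumed, illustrative values, not project data.

```python
# A minimal sketch, assuming illustrative inputs: how process-time variability
# drives queue time at a station, via the Kingman (VUT) approximation
# popularized in Factory Physics [14].

def expected_queue_time(ca, ce, utilization, process_time):
    """Approximate mean queue time: V (variability) x U (utilization) x T (time).

    ca: coefficient of variation of arrivals
    ce: coefficient of variation of effective process times
    utilization: station utilization, 0 < u < 1
    process_time: mean effective process time (days)
    """
    v = (ca**2 + ce**2) / 2.0              # variability term
    u = utilization / (1.0 - utilization)  # utilization term
    return v * u * process_time            # VUT equation

# Same station and utilization; only variability differs.
low = expected_queue_time(ca=0.5, ce=0.5, utilization=0.85, process_time=2.0)
high = expected_queue_time(ca=1.5, ce=1.5, utilization=0.85, process_time=2.0)
print(f"Queue time at low variability:  {low:.1f} days")   # ~2.8 days
print(f"Queue time at high variability: {high:.1f} days")  # ~25.5 days
```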

Moreover, in environments that leverage Production System Modeling, the boundary between lagging and leading indicators becomes more fluid. By embedding historical variability data into production system models or digital twins, organizations can predict likely performance outcomes, test mitigation scenarios, and establish dynamic control limits. In this context, variability metrics shift from being passive reflections to active predictors of system behavior. They retain their observational nature, but when connected to a production system model, they gain leading utility.
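As a minimal sketch of this idea, the following Monte Carlo fragment resamples hypothetical historical step durations to produce a forward-looking cycle time distribution. A production-grade model or digital twin would add far more structure, but the principle of turning recorded variability into prediction is the same; the three-step flow and the duration samples are assumptions for illustration.

```python
import random

# A minimal sketch, assuming hypothetical historical data: resampling recorded
# step durations to turn observed variability into a forward-looking cycle
# time distribution, the basic move behind model-based prediction.

historical_durations = {  # days per step, from (hypothetical) daily logs
    "fabricate": [4, 5, 5, 6, 9],
    "deliver":   [1, 1, 2, 2, 4],
    "install":   [3, 3, 4, 5, 8],
}

def predict_cycle_times(history, runs=10_000, seed=7):
    """Monte Carlo resampling of observed durations for a serial 3-step flow."""
    rng = random.Random(seed)
    return sorted(
        sum(rng.choice(samples) for samples in history.values())
        for _ in range(runs)
    )

totals = predict_cycle_times(historical_durations)
print(f"Median predicted cycle time: {totals[len(totals) // 2]} days")
print(f"P90 predicted cycle time:    {totals[int(len(totals) * 0.9)]} days")
```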

Figure 3: Leading, Lagging, Descriptive, and Prescriptive Metrics

Thus, while some metrics are technically lagging, they are essential to enabling leading behaviors. When embedded within a predictive framework, they help teams buffer proactively, avoid overcommitment, and maintain execution stability in the face of uncertainty.

Another insight from Operations Science is that metrics must reflect controlling variables, not just outcomes: for example, controlling WIP and then observing the resulting throughput and cycle time. Throughput is limited by the system’s bottleneck and by how much work has been released to it. Measuring execution progress without tracking constraint status or buffer availability often leads to overloading, rework, and schedule collapse [11]. In this view, WIP, buffering strategy, and batching, to name a few, represent the “control knobs” of the system.
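The relationship underlying the WIP control knob is Little’s Law (WIP = Throughput × Cycle Time), a foundational Operations Science identity (Hopp & Spearman [14]). A minimal sketch with assumed figures:

```python
# A minimal sketch, with assumed figures, of the control-knob relationship:
# Little's Law (WIP = Throughput x Cycle Time), a foundational Operations
# Science identity (Hopp & Spearman [14]).

def implied_cycle_time(wip, throughput):
    """Long-run average cycle time implied by Little's Law."""
    return wip / throughput

throughput = 6.0  # work packages completed per day (bottleneck-limited outcome)

for wip in (300, 450):  # candidate WIP levels: the control knob
    print(f"WIP = {wip}: cycle time ~ {implied_cycle_time(wip, throughput):.0f} days")
# Releasing 50% more work does not raise completions once the bottleneck is
# saturated; the extra WIP shows up instead as longer cycle times.
```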

Finally, defining the right measurements also requires anticipating unintended consequences. As noted by behavioral economists and captured in E. M. Goldratt’s maxim, “Tell me how you will measure me, and I will tell you how I will behave,” every metric shapes behavior. A rigorous metric selection process should include scenario testing: “If we reward this metric, how might teams respond? Could it drive dysfunction?”

Step 2: Collect and Analyze Metrics

Production information should be collected in real time or close to it. Longer horizons between data-capture events reduce the quantity of data available and the responsiveness of the system when changes are needed. Variability occurs as work is executed daily; collecting information at horizons of a week or more misses important root causes of variability affecting the system. In critical production environments, decisions may need to be made at daily or shift-based intervals.

In practice, capital project environments generate substantial volumes of production data ranging from daily work logs and material availability to crew productivity and actual completion times. While collecting this data is foundational, the real value lies in an organization’s ability to analyze, synthesize, and act upon it. Otherwise, why capture it?

Raw data alone does not improve performance; actionable insight does. In complex production systems, especially those with interdependent trades, constrained resources, and high variability, manually piecing together trends, bottlenecks, and optimization opportunities can overwhelm even experienced teams. That’s why leading organizations increasingly leverage specialized production system modeling and control solutions. These systems not only provide hindsight (what happened) and insight (what is happening), but more importantly, they offer foresight: the ability to anticipate constraints, delays, and variability before they impact outcomes.

These platforms are designed to:

  • Convert granular event data into actionable performance metrics
  • Visualize flow across time and space
  • Simulate the impact of specific production strategies and / or proposed changes to existing ones
  • Drive real-time collaborative decision-making

By embedding this analytical intelligence, teams gain faster feedback, better visibility, and greater agility to address emerging issues before they escalate into schedule or cost overruns. The transition from reactive reporting to proactive production control often starts with the right tools. When deployed properly, production modeling and control solutions do not replace project expertise, they amplify it.
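As a minimal sketch of the first capability above, the following fragment converts hypothetical start/finish event records into two basic flow metrics, cycle time and throughput. The records and package names are assumptions for illustration; real platforms ingest this data at scale from field logs and enterprise systems.

```python
from datetime import date

# A minimal sketch of converting granular event data into flow metrics. The
# records are hypothetical; real platforms ingest these from field logs.

events = [  # (work package, started, finished)
    ("spool-101", date(2025, 3, 3), date(2025, 3, 10)),
    ("spool-102", date(2025, 3, 4), date(2025, 3, 14)),
    ("spool-103", date(2025, 3, 6), date(2025, 3, 13)),
]

cycle_times = [(done - start).days for _, start, done in events]
span = (max(done for *_, done in events) - min(start for _, start, _ in events)).days

print(f"Average cycle time: {sum(cycle_times) / len(cycle_times):.1f} days")
print(f"Throughput: {len(events) / span:.2f} work packages per day")
```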

Step 3: Monitor, Control, and Improve

With real-time insights and production-focused metrics in place, the production team must now shift from passive observation to active control. As defined by PPI, control is the action of regulating or limiting the behavior of processes in a production system [12]. Enabling control in a production environment is the entire purpose of this measurement framework. This phase involves continuously comparing actual system performance to expected behavior and making targeted adjustments to steer the project toward desired outcomes.

Establish performance targets and control limits. Each metric should have defined control limits that reflect expected variability and target thresholds based on desired system performance. These boundaries enable teams to detect early signs of deviation or instability, identify the erosion of buffers, and monitor the production system’s responsiveness and resilience. These limits must remain stable unless there is a significant change to the system’s demand, mix, or configuration. Frequent changes are ill-advised, as they introduce noise into the production system, increase variability, and degrade performance.
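A minimal sketch of calibrating such limits from historical variability, here using simple three-sigma bounds around observed daily completions; the data are hypothetical, and a production system model would typically set these limits more rigorously:

```python
import statistics

# A minimal sketch, with hypothetical data, of setting control limits from
# historical variability: three-sigma bounds around observed daily completions.

daily_completions = [6, 5, 7, 6, 4, 8, 6, 5, 7, 6]  # units per day, observed

center = statistics.mean(daily_completions)
sigma = statistics.stdev(daily_completions)
lcl, ucl = center - 3 * sigma, center + 3 * sigma
print(f"Center line {center:.1f}, control limits [{lcl:.1f}, {ucl:.1f}]")

today = 2  # today's completions
if not lcl <= today <= ucl:
    print("Signal: investigate a special cause; do not retune the limits first.")
```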

Use feedback loops to interpret metric behavior. If cycle time is increasing, is WIP accumulating or is throughput dropping? If service levels are falling, is there a constraint on capacity or an increase in variability upstream? If WIP is increasing with no increase in completions, is the release policy too aggressive? These relationships enable diagnosis rather than guesswork.
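These diagnostic chains can be expressed as simple decision rules. The sketch below is a deliberately simplified illustration with assumed trend inputs and rules, not a calibrated diagnostic engine:

```python
# A minimal sketch of the feedback-loop reasoning above, expressed as simple
# decision rules over metric trends (+1 rising, 0 flat, -1 falling). The
# rules are illustrative assumptions.

def diagnose(cycle_time_trend, wip_trend, throughput_trend):
    """Map joint metric trends to a likely cause, per the questions above."""
    if cycle_time_trend > 0 and wip_trend > 0 and throughput_trend <= 0:
        return "WIP accumulating without completions: release policy too aggressive."
    if cycle_time_trend > 0 and throughput_trend < 0:
        return "Throughput dropping: check bottleneck capacity and upstream variability."
    return "Within expected behavior: no intervention needed."

print(diagnose(cycle_time_trend=+1, wip_trend=+1, throughput_trend=0))
```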

The effectiveness of control depends on frequency. Daily or shift-based monitoring is often required in volatile environments to detect instability early. Control limits should be calibrated using historical system variability or through production system models, accounting for both process noise and event-driven exceptions. Aggregating or measuring metrics too infrequently (e.g., weekly) can hide underlying issues that manifest much faster.

To illustrate this, consider the following production process used to install pipe spools into a new pipe rack. Before the work began, a production system model was built to analyze the performance of the current design strategy and to calculate an optimal control limit for WIP (referred to as CONWIP in Figure 4). This limit is greater than the minimum WIP needed to meet demand (Min WIP in Figure 4) and less than the average expected WIP with no control policy (Push WIP in Figure 4). The purpose of this limit is to keep cycle times short relative to the current production strategy while maintaining enough throughput to meet demand. Quantitatively, Min WIP, CONWIP, and Push WIP equal 295, 310, and 452 pipe spools, resulting in cycle times of 49, 51, and 75 days, respectively. This reveals an opportunity to reduce cycle times by 32% at no additional cost.

Site teams must control WIP to capture this value. This requires site teams to monitor the quantity of WIP (i.e., the number of pipe spool installation work fronts started and not yet finished) and to allow or stop the release of new work into the system. The longer the team goes between WIP measurements, the less responsive it will be to changes in the system. Weekly measurements, for example, mean the team will go a week at a time without knowing whether cycle times are growing or throughput is falling.

Figure 4: Calculating and Using Optimal WIP as a Control Policy
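A minimal sketch of the daily WIP check this implies, using the CONWIP limit of 310 pipe spools from the model in Figure 4; the WIP-counting and release logic are simplified assumptions:

```python
# A minimal sketch of a daily CONWIP release gate, using the 310-spool limit
# from the production system model in Figure 4. The counting and release
# logic are simplified assumptions.

CONWIP_LIMIT = 310  # optimal WIP (pipe spools) from the model

def releases_allowed(started, finished):
    """Spools that may be released today without breaching the CONWIP cap."""
    wip = started - finished  # work fronts started and not yet finished
    return max(0, CONWIP_LIMIT - wip)

# Example: 840 spools started, 535 finished -> WIP = 305, so release up to 5.
print(f"Release up to {releases_allowed(started=840, finished=535)} spools today.")
```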

When data is synthesized and visualized properly, field teams can prioritize work in the most effective way, adjust batch sizes and release timing dynamically, reallocate crews and equipment to where they have the greatest impact, and escalate production-based risks with clarity and evidence. This creates a self-regulating control system.

With production system modeling tools in place, teams can go beyond “what happened” to testing “what if,” and gain access to robust predictions. Examples include testing production system designs and strategies: What if we reduced batch sizes by 50% (sketched below)? What if we delayed release until prerequisites were 100% ready? What if we preassembled this component before installing it? What if we added one more crew at the bottleneck? Simulating these options before implementing them in the field lowers the risk of experimentation and provides evidence for process changes. The key shift in this phase is from observation to intervention.
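As a minimal sketch of the first “what if” above, the fragment below estimates how halving a hypothetical batch size changes average wait-to-batch time, using the standard Operations Science approximation that a unit waits on average (B − 1)/(2·ra) for its batch to form [14]; the rates and batch sizes are assumptions.

```python
# A minimal sketch of one "what if": halving a (hypothetical) batch size and
# estimating the effect on wait-to-batch time, via the standard Operations
# Science approximation (B - 1) / (2 * ra) [14].

def wait_to_batch(batch_size, arrival_rate):
    """Average days a unit waits for its batch to fill (arrival_rate in units/day)."""
    return (batch_size - 1) / (2 * arrival_rate)

arrival_rate = 6.0  # spools arriving per day (assumed)
for batch in (40, 20):  # current batch size vs. a 50% reduction
    print(f"Batch of {batch}: ~{wait_to_batch(batch, arrival_rate):.1f} days waiting to batch")
# 40 -> ~3.3 days; 20 -> ~1.6 days: the reduction roughly halves this queue.
```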

Conclusion

The way we measure performance fundamentally shapes how we manage it. In capital projects, misaligned or incomplete metrics have unintentionally encouraged local optimization, hidden instability, and contributed to chronic underperformance. While traditional project metrics remain valuable for contractual, financial, and reporting purposes, they were never designed to reflect the realities of dynamic production systems. Relying solely on them has constrained our ability to diagnose problems, control flow, and improve outcomes in real time.

Figure 5: Augmented Measurement System with Production-Based Metrics

By adding an Operations Science-based framework, project leaders can equip their teams with metrics that reflect the true behavior of production systems. An OS framework embedded in decision-making and supported by production modeling and control solutions, enables proactive control, faster feedback, and measurable improvements in schedule adherence, project cost, and cash flow. The future of project performance lies in better control. That control begins with a deeper understanding of what drives each project production system.

References

[1] D. W. Hubbard, How to Measure Anything: Finding the Value of "Intangibles" in Business, 3rd ed., Hoboken, NJ: Wiley, 2014.

[2] M. Strathern, "Improving Ratings: Audit in the British University System," European Review, vol. 5, no. 3, pp. 305-321, 1997.

[3] T. R. Zabelle and R. J. Arbulu, "Schedules vs. Project Production Systems," The Journal of Project Production Management, vol. 6, 2023.

[4] R. G. Shenoy, "The Limitations of Capital Project Benchmarking," The Journal of Project Production Management, vol. 3, 2018.

[5] S. Kerr, "On the Folly of Rewarding A While Hoping for B," Academy of Management Journal, no. 18, pp. 769-783, 1975.

[6] T. R. Zabelle, H. J. Choo, M. Spearman and E. Pound, "A "Gap" in Current Project Management and the Impact on Project Outcomes," The Journal of Project Production Management, vol. 3, 2018.

[7] G. Fischer, J. Hartung and P. Massih, "Does Contracting Strategy Matter?," The Journal of Project Production Management, vol. 5, 2021.

[8] D. Kahneman, Thinking, Fast and Slow, New York, NY, USA: Farrar, Straus and Giroux, 2011.

[9] T. Davenport and J. Harris, Competing on Analytics: The New Science of Winning, Boston, MA, USA: Harvard Business School Press, 2007.

[10] Gartner Research, "Analytics Maturity Model," 2012. [Online]. Available: https://www.gartner.com/en/documents/2095915.

[11] R. Shenoy, "A Comparison of Lean Construction with Project Production Management," The Journal of Project Production Management, vol. 2, 2017.

[12] Project Production Institute, "Glossary: Control," [Online]. Available: https://projectproduction.org/resources/glossary/#control.

[13] R. J. Arbulu, "The Unintended Consequences of Excessive Inventory on Capital Projects," The Journal of Project Production Management, vol. 7.

[14] W. J. Hopp and M. L. Spearman, Factory Physics, Waveland Press, 2011.

© 2025 Project Production Institute. All Rights Reserved.