The Limits of Reliability: Why Technology Alone Can’t Fix Poor Design

Asset-intensive industries, such as oil and gas, power generation, manufacturing, mining, transportation, and utilities, rely heavily on complex physical assets. The reliability of these assets directly affects safety, productivity, cost and competitiveness.

In recent years, technologies such as condition monitoring, predictive analytics, and artificial intelligence (AI) have promised to revolutionize maintenance and maximize uptime. These technologies are powerful and have an important role to play, but here’s the hard truth: the inherent reliability of equipment is determined by its original design.

No matter how sophisticated your monitoring systems, predictive models, or maintenance strategies, you cannot surpass the reliability ceiling set during the design phase. If the foundation is weak, technology can only help you manage the consequences; it cannot erase poor design decisions.

What Is Inherent Reliability?

Think of inherent reliability as a ceiling: the maximum level of reliability a piece of equipment can achieve, even under perfect operating and maintenance conditions. This ceiling is established during design and depends on factors such as:

  • Component selection and quality;
  • System configuration and redundancy;
  • Material specifications and engineering tolerances;
  • Environmental compatibility and lifecycle considerations.

Design limitations often result from aggressive cost cutting, insufficient engineering analysis, underestimating operational or environmental stresses, ignoring historical failure data, or overlooking maintainability and operability. Once equipment is built, you can only operate within this ceiling. You can harvest inherent reliability, but you can’t exceed it unless you go back to redesign.
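The ceiling can be made concrete with standard series and parallel reliability arithmetic. The sketch below is illustrative only: it assumes independent component failures, the component values and the pump-train example are hypothetical, and the `achieved_reliability` function is a simplified assumption (a maintenance effectiveness factor between 0 and 1) rather than an established model.

```python
from math import prod


def series_reliability(components):
    """Reliability of a chain where every component must work.

    Assumes independent failures: R = R1 * R2 * ... * Rn.
    """
    return prod(components)


def parallel_reliability(components):
    """Reliability of redundant components where any one is sufficient.

    Assumes independent failures: R = 1 - (1 - R1) * ... * (1 - Rn).
    """
    return 1 - prod(1 - r for r in components)


def achieved_reliability(inherent, maintenance_effectiveness):
    """Simplified assumption: operations and maintenance can only
    harvest a fraction (0..1) of the inherent reliability."""
    if not 0 <= maintenance_effectiveness <= 1:
        raise ValueError("maintenance_effectiveness must be between 0 and 1")
    return inherent * maintenance_effectiveness


# Hypothetical pump train: motor, coupling, pump (all must work).
single_train = series_reliability([0.98, 0.99, 0.95])

# Even perfect maintenance only reaches the design ceiling.
best_case = achieved_reliability(single_train, 1.0)

# Raising the ceiling takes a design change, e.g. a second redundant train.
redundant_trains = parallel_reliability([single_train, single_train])

print(f"single train ceiling:    {single_train:.4f}")
print(f"best case as designed:   {best_case:.4f}")
print(f"redesigned (redundant):  {redundant_trains:.4f}")
```

Under these assumptions, no value of the maintenance factor lifts the single train above its design figure; only the reconfigured system does.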

Inherent reliability is a cornerstone of reliability theory. It differentiates between failures caused by external factors and those “baked in” during design. As highlighted in reliability-centered maintenance (RCM) literature: “No maintenance strategy or operational discipline can make a system more reliable than it was designed to be.” Understanding this concept is crucial. It defines the ultimate boundary for what technology and maintenance can achieve.

Why Design Matters More Than Technology

According to the foundational RCM studies by F. Stanley Nowlan and Howard F. Heap, whose findings have been validated across industries including oil and gas, design deficiencies, rather than maintenance gaps or operational missteps, account for a significant share (up to 60 percent) of chronic reliability failures. Note that this percentage may vary by study or industry.

In asset-intensive industries, design issues are a persistent contributor to failures, even with advanced monitoring systems in place. Examples include:

  • Poor material choices for operating conditions;
  • Suboptimal fluid dynamics in pumps and piping;
  • Inadequate corrosion allowances;
  • Lack of redundancy or oversights in system configuration.

Too often, organizations treat these as maintenance challenges, rather than recognizing them as design problems. And design problems cannot be patched forever.

Where Technology Helps and Where It Doesn’t

AI and machine learning (ML) have transformed asset management by analyzing vast amounts of operational data to predict failures and optimize maintenance. Technology can:

  • Provide real-time visibility into asset health;
  • Detect anomalies early;
  • Forecast potential failures and estimate remaining useful life;
  • Recommend maintenance actions.

However, technology cannot fix design flaws. If a pump is undersized, a motor overheats by design, or a system lacks redundancy, no algorithm can change that. Technology can only manage the consequences, not eliminate the root cause.

For example, sensors may monitor pump vibrations, but if the pump’s design inherently causes cavitation due to fluid dynamics errors, the system will still fail, just more predictably. Monitoring systems cannot prevent premature bearing failure if the asset was underdesigned from the start.

To truly maximize reliability, organizations must combine predictive technologies with a commitment to design review and improvement. Chronic failures flagged by AI should trigger questions like: Is this a process issue or does it reveal a design ceiling requiring engineering intervention?
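One way to put that question into practice is a simple triage rule over failure history: when the same failure mode keeps recurring on the same asset, route it to engineering review instead of another maintenance work order. The sketch below is a hypothetical illustration; the event fields (`asset`, `mode`), the asset tags, and the threshold of three repeats are assumptions, not an industry standard.

```python
from collections import Counter


def flag_design_review(failure_events, repeat_threshold=3):
    """Return (asset, failure_mode) pairs that recur often enough to
    suggest a design ceiling rather than a one-off maintenance issue.

    failure_events: iterable of dicts with hypothetical keys
    "asset" and "mode". repeat_threshold is an assumed cutoff.
    """
    counts = Counter((e["asset"], e["mode"]) for e in failure_events)
    return sorted(pair for pair, n in counts.items() if n >= repeat_threshold)


# Hypothetical failure history, e.g. exported from a maintenance system.
events = [
    {"asset": "P-101", "mode": "cavitation"},
    {"asset": "P-101", "mode": "cavitation"},
    {"asset": "M-7", "mode": "bearing wear"},
    {"asset": "P-101", "mode": "cavitation"},
    {"asset": "P-101", "mode": "seal leak"},
]

candidates = flag_design_review(events)
print(candidates)
```

A real program would weigh consequence and cost as well as frequency; the point here is only that repeat failures become an engineering question by default.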

Executive Accountability

Ultimate accountability for asset reliability starts at the top. Executive decisions shape design philosophies, resource allocation, and risk appetite, cascading through engineering and operations. Decisions on funding, support for frameworks such as design for reliability (DfR), and the prioritization of safety and reliability set the tone for an organization’s performance.

Short-term cost pressures, lack of technical insight, or competing business targets often lead to incremental investments in maintenance and monitoring, but not in fundamental redesigns. Over time, these missed opportunities manifest as chronic failures, escalating costs, and even catastrophic events.

Why Understanding Limits Matters

Recognizing the role of inherent reliability allows leaders to:

  1. Set Realistic Expectations – No maintenance strategy or monitoring system can outperform a flawed design.
  2. Guide Smarter Investments – Focus on design reviews, redesigns and reliability-centered design, rather than endlessly buying more sensors or software.
  3. Improve Long-Term Return on Investment (ROI) – Investing up front in reliable design may increase initial costs, but significantly reduces downtime, lifecycle expenses, and safety risks.
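The ROI argument can be checked with back-of-the-envelope lifecycle costing. The sketch below compares a hypothetical low-cost design with a more robust one; every figure is invented for illustration, and the model deliberately ignores discounting, inflation, and safety consequences.

```python
def lifecycle_cost(capital_cost, annual_maintenance,
                   annual_downtime_hours, downtime_cost_per_hour, years):
    """Undiscounted lifecycle cost: capital plus yearly maintenance
    and downtime losses over the asset life (simplifying assumption)."""
    annual_cost = annual_maintenance + annual_downtime_hours * downtime_cost_per_hour
    return capital_cost + years * annual_cost


# Hypothetical figures for two design options over a 20-year life.
low_cost_design = lifecycle_cost(
    capital_cost=1_000_000,
    annual_maintenance=50_000,
    annual_downtime_hours=120,
    downtime_cost_per_hour=5_000,
    years=20,
)
robust_design = lifecycle_cost(
    capital_cost=1_300_000,
    annual_maintenance=40_000,
    annual_downtime_hours=40,
    downtime_cost_per_hour=5_000,
    years=20,
)

print(f"low-cost design: {low_cost_design:,}")
print(f"robust design:   {robust_design:,}")
print(f"difference:      {low_cost_design - robust_design:,}")
```

With these assumed inputs, the higher up-front spend is recovered many times over through reduced downtime; real comparisons would use site-specific data and a discount rate.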

Design for Reliability: The Path Forward

To break the cycle of poor reliability, organizations must focus on design by:

  • Integrating DfR Principles – Embed reliability engineering throughout the project lifecycle, from conceptual design to manufacturing. Identify and mitigate potential issues before equipment is built.
  • Conducting Rigorous Design Reviews – Independent, structured reviews uncover potential failure modes and allow proactive correction.
  • Learning From Past Failures – Institutionalize lessons from reliability events into procurement, contracting and design processes to avoid repeating mistakes.
  • Encouraging Cross-Functional Collaboration – Operations, maintenance and engineering teams must work together to identify recurring failures and guide redesigns effectively.
  • Ensuring Executive Empowerment – Reliability is not just technical; it is an executive responsibility. Boards must fund reliability goals, embed them in key performance indicators (KPIs), and treat known design risks as governance issues.
  • Managing Assets Across Their Lifecycle – Design limitations discovered midlife or end of life should feed back into future projects and legacy asset replacement planning. Lifecycle costing must reflect the long-term impact of poor design.
  • Using Monitoring to Inform Design – Condition monitoring and AI are most valuable when trends inform design improvements, triggering engineering investigations and potential redesigns.

Conclusion

Reliability is not just about technology or maintenance—it starts with design. Technology can enhance operations, detect problems early, and optimize maintenance, but it cannot break through the ceiling set by poor design. To achieve true, sustainable reliability, organizations must confront a simple, often uncomfortable truth: sometimes, the only solution is to go back to the drawing board and redesign.