Frontiers in Diagnostics — An Interactive Primer

1

The gap between the textbook and the fleet.

A clean benchmark dataset, a labeled fault library, a stationary operating envelope, unlimited compute, a single asset in a single laboratory. Textbook fault diagnosis lives in that world. Fleets live somewhere else. This monograph is about the gap between them — and the techniques, some mature, some speculative, that try to close it.

The symptoms of the gap are familiar to anyone who has deployed a diagnostic system. A classifier that scored 98% in validation drops to 71% on a new production year. An algorithm that runs comfortably in simulation does not fit in the ECU's 64 KB of flash. A certification body asks “why did the model predict that fault?” and the team does not know. A real asset produces thousands of healthy hours and zero labeled failures. A vehicle has fifty ECUs and no one owns the fault localization problem across all of them.

Figure 1 · Interactive. Twelve assets, same specification, same threshold. Increase heterogeneity: units start aging at different rates, under different loads, with different sensor calibrations. The single shared threshold starts to misfire: some healthy units raise alarms, while others develop faults without ever tripping it. Fleet variation is the central challenge of deploying diagnostics at scale.

2

Domain adaptation — training here, deploying there.

A classifier trained on the source domain (dyno, lab, simulation, last year's fleet) must run on the target domain (road, customer, this year's fleet). If the distributions are different — and they always are — accuracy drops. Domain adaptation and transfer learning are the machinery that bridges the two.

The unifying idea: learn features that are invariant across source and target, so a classifier trained on source features transfers to target. Formally, minimize source classification loss plus some measure of source/target distribution distance — MMD, adversarial domain classifier loss, or a contrastive alignment objective.
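As one concrete instance of a "distribution distance", the sketch below estimates squared MMD with a Gaussian kernel between two sets of feature vectors. The synthetic 2-D features, the kernel width `gamma`, and the helper names are assumptions chosen for illustration; in a real pipeline the same quantity would be computed on the learned features and added to the classification loss.

```python
import math
import random

def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def mmd_squared(source, target, gamma=0.5):
    """Biased estimate of squared MMD between two samples of feature vectors."""
    def mean_kernel(xs, ys):
        total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
        return total / (len(xs) * len(ys))
    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2 * mean_kernel(source, target))

def sample_features(n, mean, rng):
    """Hypothetical 2-D features: unit-variance Gaussian around `mean`."""
    return [[rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)] for _ in range(n)]

rng = random.Random(0)
source = sample_features(100, 0.0, rng)    # e.g. dyno data
same = sample_features(100, 0.0, rng)      # fresh draw from the same domain
shifted = sample_features(100, 1.5, rng)   # e.g. road data with a shifted mean

mmd_same = mmd_squared(source, same)
mmd_shifted = mmd_squared(source, shifted)
print(f"MMD^2 same domain:    {mmd_same:.3f}")
print(f"MMD^2 shifted domain: {mmd_shifted:.3f}")
```

The estimate stays near zero for two draws from the same domain and grows with the shift, which is what makes it usable as an alignment penalty.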

Figure 2 · Interactive. Three panels, same task. Left: the classifier is trained on labeled source data and draws a clean boundary. Middle: the same boundary applied to target data; the distribution has shifted, and the boundary misfires. Right: after domain adaptation, the target features have been realigned and the original boundary recovers accuracy.

What the practitioner picks between

Unsupervised DA: No target labels. Learn invariant features via MMD minimization, an adversarial domain classifier (DANN), or optimal-transport alignment. The hardest setting, and the most common in practice.
Semi-supervised DA: A handful of target labels are available. Fine-tune a source model on them with heavy regularization. Cheap and effective when a few target failures exist.
Few-shot / meta-learning: Train a model to be quickly adaptable to new domains. Useful for fleets with many segments and few examples per segment.
Test-time adaptation: Adjust batch-norm statistics or a small number of parameters at inference, without changing the backbone. Popular because it requires no retraining pipeline.
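The last option can be illustrated with a deliberately small stand-in. The sketch below freezes a threshold classifier on a standardized feature, then re-estimates only the normalization statistics from unlabeled target data, which is the same move as refreshing batch-norm statistics. The 1-D feature, the sensor offset and gain, and the balanced class mix are all assumptions of this toy; with a very different fault rate in the target, re-centering on the target mean would bias the boundary.

```python
import random
import statistics

def make_domain(n, offset, gain, rng):
    """Synthetic feature: healthy ~ N(0,1), faulty ~ N(4,1), seen through a sensor offset and gain."""
    samples = []
    for _ in range(n):
        faulty = rng.random() < 0.5
        raw = rng.gauss(4.0 if faulty else 0.0, 1.0)
        samples.append((gain * raw + offset, faulty))
    return samples

def feature_stats(samples):
    """Mean and standard deviation of the feature; labels are ignored."""
    values = [x for x, _ in samples]
    return statistics.fmean(values), statistics.pstdev(values)

def accuracy(samples, mean, std):
    """Frozen classifier: predict 'faulty' when the standardized feature is positive."""
    correct = sum(((x - mean) / std > 0.0) == faulty for x, faulty in samples)
    return correct / len(samples)

rng = random.Random(1)
source = make_domain(2000, offset=0.0, gain=1.0, rng=rng)
target = make_domain(2000, offset=5.0, gain=1.5, rng=rng)   # recalibrated sensor

src_mean, src_std = feature_stats(source)   # statistics frozen at training time
tgt_mean, tgt_std = feature_stats(target)   # re-estimated at test time, no labels used

acc_source = accuracy(source, src_mean, src_std)
acc_frozen = accuracy(target, src_mean, src_std)
acc_adapted = accuracy(target, tgt_mean, tgt_std)
print(f"source: {acc_source:.2f}  target (frozen stats): {acc_frozen:.2f}  "
      f"target (adapted stats): {acc_adapted:.2f}")
```

The decision rule itself never changes; only the two normalization numbers do, which is why this family of methods needs no retraining pipeline.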

3

Physics-informed neural networks — physics as regularizer.

A neural network trained on a handful of noisy measurements can fit them perfectly — and predict absolute nonsense in between. Physics-informed neural networks (PINNs) fix this by adding a term to the loss that penalizes violations of known physical laws.

L(θ) = L_data(θ) + λ · L_physics(θ)

The data loss is the usual — predictions should match observations. The physics loss is a penalty on how badly the network’s outputs violate a known differential equation, conservation law, or constraint. A motor model might demand dψ/dt = v − R·i. A battery model might demand charge conservation. A crack-growth model might demand Paris’ law. The physics term is evaluated via automatic differentiation through the network itself.

The payoff is generalization. A data-only network interpolates noise; a physics-regularized network interpolates reality.

Figure 3 · Interactive (controls: physics weight λ, noise level; curves: true y(t), data-only NN, PINN (λ-weighted), noisy observations). An asset whose state obeys a known first-order decay dy/dt = −k·y. Given a handful of noisy observations, a high-capacity NN fits through every point and oscillates wildly between them. Increase λ and the fit smooths onto the physical manifold. The PINN uses the physics prior where data is sparse.
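To make the two loss terms concrete for the decay law dy/dt = −k·y, the sketch below evaluates them for two candidate fits. It is not a trained network: the candidates are plain functions, a central finite difference stands in for automatic differentiation, and the observations are noise-free so the numbers are reproducible. The value of `K`, the sample times, and the "wiggly" candidate are assumptions picked so that both candidates pass through every observation.

```python
import math

K = 0.8  # assumed decay rate of the asset

def physics_residual(model, t, k, h=1e-4):
    """Violation of dy/dt = -k*y at time t; finite difference stands in for autodiff."""
    dydt = (model(t + h) - model(t - h)) / (2 * h)
    return dydt + k * model(t)

def pinn_loss(model, observations, collocation, k, lam):
    """Return (total, data, physics) for L = L_data + lam * L_physics."""
    data = sum((model(t) - y) ** 2 for t, y in observations) / len(observations)
    physics = sum(physics_residual(model, t, k) ** 2 for t in collocation) / len(collocation)
    return data + lam * physics, data, physics

def physical(t):
    """Candidate that obeys the decay law exactly."""
    return math.exp(-K * t)

def wiggly(t):
    """Candidate that hits every observation but oscillates in between."""
    return math.exp(-K * t) + 0.1 * math.sin(20 * t)

# Eight observations at times where the oscillation happens to vanish
observations = [(n * math.pi / 10, math.exp(-K * n * math.pi / 10)) for n in range(8)]
# Collocation points: where the physics is checked, with no data required
collocation = [3.0 * j / 49 for j in range(50)]

for name, model in (("physical", physical), ("wiggly", wiggly)):
    total, data, physics = pinn_loss(model, observations, collocation, K, lam=0.5)
    print(f"{name:8s}  data={data:.2e}  physics={physics:.2e}  total={total:.2e}")

_, data_p, phys_p = pinn_loss(physical, observations, collocation, K, lam=0.5)
_, data_w, phys_w = pinn_loss(wiggly, observations, collocation, K, lam=0.5)
```

With λ = 0 the two candidates are indistinguishable, because both match the data. Any positive λ separates them, since only the physics term sees what happens between observations.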

4

Explainable AI — the certification problem.

A deep classifier says “bearing fault.” A certification auditor asks why. “Because the weights in layer 7 produced a logit of 3.4” is not an answer. Explainable AI (XAI) for diagnostics is the discipline of producing human-interpretable justifications for a model’s decisions — justifications that hold up under ISO 26262, DO-178C, and the attention of a skeptical safety engineer.

The core tools are saliency, SHAP values, attention visualization, and counterfactual explanations. Each produces some form of attribution map: which regions of the input drove this decision, and by how much?
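One of the simplest attribution schemes is occlusion: replace one input bin with a baseline value and record how much the classifier score drops. The sketch below applies it to a stand-in scoring function; the bin indices, weights, and spectrum values are hypothetical, and a real study would wrap the trained model's logit instead.

```python
def occlusion_saliency(score_fn, spectrum, baseline=0.0):
    """Attribution per bin = drop in score when that bin alone is set to the baseline."""
    reference = score_fn(spectrum)
    saliency = []
    for i in range(len(spectrum)):
        occluded = list(spectrum)
        occluded[i] = baseline
        saliency.append(reference - score_fn(occluded))
    return saliency

# Hypothetical stand-in classifier: the "bearing fault" score depends only on bins 12 and 24
FAULT_BIN_WEIGHTS = {12: 1.0, 24: 0.5}

def bearing_fault_score(spectrum):
    return sum(weight * spectrum[i] ** 2 for i, weight in FAULT_BIN_WEIGHTS.items())

spectrum = [0.1] * 32
spectrum[5] = 2.0    # strongest tone in the input, but ignored by the classifier
spectrum[12] = 0.9   # tone at the (hypothetical) BPFO bin
spectrum[24] = 0.6   # its second harmonic

saliency = occlusion_saliency(bearing_fault_score, spectrum)
top_bins = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)[:2]
print("most influential bins:", top_bins)
```

The loudest bin in the spectrum receives zero attribution here, which is the distinction an auditor cares about: what the input contains versus what the model used.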

Figure 4 · Interactive. Switch between fault classes. The rust-colored spectrum is what the classifier saw; the teal highlights are the saliency map, showing which frequency bins the network actually used to decide. Bearing faults light up BPFO (ball-pass frequency, outer race) tones. Broken rotor bars highlight the (1 ± 2s)f_s sidebands, where s is the slip and f_s the supply frequency. The model’s reasoning becomes inspectable.

The honest caveats

Saliency methods disagree: Different attribution methods (gradient, integrated gradients, SHAP, LIME) applied to the same model produce different explanations. Some fail basic sanity checks, producing nearly the same map after the model’s weights are randomized (Adebayo et al., 2018).
Faithfulness ≠ plausibility: A plausible-looking explanation is not necessarily what the model actually used. Proving faithfulness is itself a research problem.
Post-hoc vs. by-design: Explanations generated after training (saliency) are less trustworthy than architectures that are interpretable by construction (attention, prototype-based models, shallow rule-based ensembles).

5

Edge deployment — the 64 KB constraint.

A state-of-the-art diagnostic classifier has a hundred million parameters and runs on a GPU. An automotive microcontroller has 64 KB of flash, 16 KB of RAM, a 200 MHz scalar core, and a 1 ms deadline. Getting the first into the second is the edge deployment problem.

Four techniques dominate production:

Quantization: Replace float32 weights with int8 or int4. Typically a 4×–8× size reduction at a small accuracy cost. Modern toolchains (TFLite Micro, ONNX Runtime, CMSIS-NN) make this routine.
Pruning: Remove weights that don’t contribute. Structured pruning (whole channels or filters) is more ECU-friendly than unstructured pruning because it actually shrinks the dense matrix operations, whereas scattered zeros save nothing without sparse kernels.
Knowledge distillation: Train a small “student” network to mimic a large “teacher”. The student inherits the teacher’s decision surface at a fraction of the parameter count.
Feature engineering: Often overlooked. Hand-designed spectral/statistical features followed by a tiny classifier (decision tree, linear model) are spectacularly efficient and still meet many production accuracy requirements.
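The first technique in the list can be sketched in a few lines. What follows is symmetric per-tensor int8 quantization in plain Python, assuming a single scale per weight tensor and randomly generated weights; production toolchains typically add per-channel scales, zero points, and calibration of activations, none of which is shown.

```python
import random

def quantize_int8(weights):
    """Symmetric per-tensor quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map the stored integers back to approximate float weights."""
    return [scale * v for v in q]

rng = random.Random(0)
weights = [rng.gauss(0.0, 0.2) for _ in range(1000)]   # hypothetical layer weights

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

max_error = max(abs(w - r) for w, r in zip(weights, restored))
float32_bytes = 4 * len(weights)
int8_bytes = len(q) + 4          # one byte per weight plus one float32 scale
print(f"scale={scale:.5f}  max rounding error={max_error:.5f}")
print(f"size: {float32_bytes} B -> {int8_bytes} B")
```

The rounding error per weight is bounded by half the scale, so a tensor with a few large outliers pays for them with coarser resolution everywhere else. That is one reason per-channel scales are common in practice.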

Figure 5 · Interactive. An Infineon AURIX-class ECU with hard budgets: 64 KB flash, 16 KB RAM, 1 ms deadline, 95% target accuracy. The baseline model blows every budget. Slide compression to shrink it and notice that accuracy barely moves until 4× compression, then degrades sharply. Knowledge distillation is a better trade than pure quantization.

6

Digital twins at fleet scale.

A digital twin is a model of a specific asset — calibrated to its individual parameters, updated with its individual data. At fleet scale, every vehicle has its own twin; every twin learns continuously; aggregate patterns across twins become fleet-level insights. This is different from a single lab model in two ways: each twin is personal, and there are thousands of them.

The diagnostic value comes from individualization. A fleet-average model flags anything that looks unusual on average. A per-asset twin flags anything unusual for that asset’s history. A slow drift invisible against population variance becomes obvious against the asset’s own baseline.

Each tile shows one asset’s twin residual — the difference between measured behavior and what its individually-calibrated twin predicts. Healthy assets stay flat. A developing fault drives its own unit’s residual above threshold while the others stay calm.
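The contrast between a fleet-level limit and a per-asset residual can be reproduced with a toy fleet. In the sketch below each "twin" is reduced to the unit's own baseline mean and spread, which is a strong simplification of a calibrated model; the unit offsets, noise level, and drift size are assumptions.

```python
import random
import statistics

rng = random.Random(7)
N_UNITS = 12
# Hypothetical heterogeneity: each unit runs at its own operating offset
offsets = [-2.75 + 0.5 * u for u in range(N_UNITS)]

def measure(unit, drift=0.0):
    """One noisy reading from a unit, optionally with a fault-induced drift."""
    return offsets[unit] + drift + rng.gauss(0.0, 0.2)

FAULTY_UNIT, DRIFT = 6, 1.0   # index 6 corresponds to "Unit-07"
baseline = [[measure(u) for _ in range(200)] for u in range(N_UNITS)]
live = [[measure(u, DRIFT if u == FAULTY_UNIT else 0.0) for _ in range(50)]
        for u in range(N_UNITS)]

# Fleet-level check: 3-sigma band around the pooled baseline of all units
pooled = [x for unit in baseline for x in unit]
fleet_mean, fleet_std = statistics.fmean(pooled), statistics.pstdev(pooled)
fleet_flags = [u for u in range(N_UNITS)
               if abs(statistics.fmean(live[u]) - fleet_mean) > 3 * fleet_std]

def twin_residual(u):
    """Measured behavior minus what the unit's own baseline predicts."""
    return statistics.fmean(live[u]) - statistics.fmean(baseline[u])

# Per-asset check: 3-sigma band around each unit's own baseline
twin_flags = [u for u in range(N_UNITS)
              if abs(twin_residual(u)) > 3 * statistics.pstdev(baseline[u])]

print("flagged by fleet threshold:", fleet_flags)
print("flagged by per-asset twin: ", twin_flags)
```

The drift is about five times the unit's own noise but well under one standard deviation of the fleet, so only the per-asset check sees it.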

Figure 6 · Interactive. Twelve assets, twelve digital twins, twelve personal residuals. Play the fleet operation and inject a fault: only Unit-07’s tile flags. Compare this with the fleet-variation figure (Figure 1): the same fault, a different architecture. Personalization defeats heterogeneity.

7

Anomaly detection without labels.

In the lab, a researcher has labeled faults. In the field, an engineer has thousands of hours of healthy operation and almost no labeled failures. “Thousands of hours” is the data; “we do not know what the failures look like” is the problem. Unsupervised anomaly detection learns the shape of normal and flags everything else.

One-class SVM: Fits a tight boundary around the healthy cloud in feature space. Classical, interpretable, scales poorly.
Isolation Forest: Grows random trees; anomalies are isolated in few splits. Scales well, handles mixed feature types.
Autoencoder reconstruction: Train to reconstruct healthy data. The reconstruction error is the anomaly score. Natural for high-dimensional sensor data.
Deep SVDD: Neural generalization of one-class SVM. Learns a feature space in which healthy data is tightly clustered.
Density estimation: Normalizing flows or Gaussian mixture models; anomalies are low-probability samples.
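All five methods share the same workflow: fit on healthy data only, derive a score, and set the boundary from the healthy scores themselves. The sketch below shows that workflow with the simplest possible density model, a single Gaussian with independent features, rather than any of the specific methods listed. The two features, their healthy distributions, and the 99% boundary are assumptions.

```python
import random
import statistics

rng = random.Random(3)
# Healthy-only training set with two hypothetical features (vibration RMS, temperature)
healthy = [(rng.gauss(1.0, 0.1), rng.gauss(60.0, 2.0)) for _ in range(500)]

means = [statistics.fmean(column) for column in zip(*healthy)]
stds = [statistics.pstdev(column) for column in zip(*healthy)]

def anomaly_score(x):
    """Squared distance from the healthy mean, in units of per-feature standard deviation."""
    return sum(((v - m) / s) ** 2 for v, m, s in zip(x, means, stds))

# Boundary: the score that roughly 99% of healthy training points stay under
train_scores = sorted(anomaly_score(x) for x in healthy)
threshold = train_scores[int(0.99 * len(train_scores))]

def is_anomaly(x):
    return anomaly_score(x) > threshold

false_alarms = sum(is_anomaly(x) for x in healthy)
print(f"threshold={threshold:.2f}  training false alarms={false_alarms}/{len(healthy)}")
print("typical point flagged: ", is_anomaly((1.0, 60.0)))
print("high-vibration flagged:", is_anomaly((2.0, 60.0)))
```

No fault example appears anywhere in the fit. The price is that the false-alarm rate is chosen by hand, through the quantile, rather than learned.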

Figure 7 · Interactive. The left panel shows the training set, which is entirely healthy. The learned boundary (dashed) tightly encloses the healthy cloud. The right panel streams new observations one at a time; each is green if inside the boundary and red if outside. No fault labels were ever provided. The boundary alone is doing the detection.

8

Distributed FDI — when the system is many computers.

A modern vehicle has fifty or more ECUs — each with its own sensors, its own local diagnostics, its own bandwidth-limited view of the rest of the system. No single ECU sees everything. And yet, somehow, the vehicle must localize a fault that cuts across ECUs — a sensor drift that poisons both the brake controller and the stability controller, a CAN-bus glitch that looks like a motor fault from one ECU and a powertrain fault from another.

Distributed FDI is the architecture for this: local residuals computed at each node, sparse communication between nodes, and a consensus or message-passing algorithm that produces a global diagnosis without any node having access to all the data.

Figure 8 · Interactive. A vehicle’s ECU graph: VCU, BMS, four motor drives, ABS, DAS, each a node with local sensing. Inject a fault at a random ECU. Initially only that ECU knows. Propagate rounds: each ECU shares its belief with its graph neighbors, and the network converges on a single localized fault with rising confidence. No central computer ever sees all the data.
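The propagation rounds can be sketched as average consensus over the ECU graph. Everything specific below is an assumption: the edge list, the initial evidence values, and the choice of Metropolis weights, which are one standard way to make neighbor averaging converge on a connected undirected graph. The sketch is not Byzantine-robust; a single node reporting false values would shift the average.

```python
# Hypothetical ECU graph: the VCU as a hub, plus a few direct links between peers
EDGES = [("VCU", "BMS"), ("VCU", "M1"), ("VCU", "M2"), ("VCU", "M3"), ("VCU", "M4"),
         ("VCU", "ABS"), ("VCU", "DAS"), ("M1", "M2"), ("M3", "M4"), ("ABS", "DAS")]
NODES = sorted({node for edge in EDGES for node in edge})
NEIGHBORS = {node: [] for node in NODES}
for a, b in EDGES:
    NEIGHBORS[a].append(b)
    NEIGHBORS[b].append(a)

def consensus_round(belief):
    """One synchronous round: every node moves toward its neighbors' beliefs."""
    updated = {}
    for i in NODES:
        updated[i] = {}
        for fault in NODES:
            value = belief[i][fault]
            for j in NEIGHBORS[i]:
                # Metropolis weight for edge (i, j)
                weight = 1.0 / (1 + max(len(NEIGHBORS[i]), len(NEIGHBORS[j])))
                value += weight * (belief[j][fault] - belief[i][fault])
            updated[i][fault] = value
    return updated

# belief[node][fault] = that node's evidence that `fault` is the faulty ECU
belief = {node: {fault: 0.0 for fault in NODES} for node in NODES}
belief["M3"]["M3"] = 1.0     # the faulty drive sees a clear local residual
belief["VCU"]["M3"] = 0.5    # the VCU sees a symptom it cannot attribute:
belief["VCU"]["M4"] = 0.5    # it fits either M3 or M4

for _ in range(50):
    belief = consensus_round(belief)

diagnosis = {node: max(belief[node], key=belief[node].get) for node in NODES}
print(diagnosis)
```

Each node only ever exchanges its belief vector with direct neighbors, yet all of them end up holding the fleet-wide average of the evidence and therefore the same diagnosis.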

What distributed FDI has to handle

Limited bandwidth: CAN / Ethernet bus slots are budgeted. Diagnostic messages compete with control traffic. Protocols must be compact and infrequent.
Partial observability: Each node sees a slice of the system. Local residuals are ambiguous; isolation requires fusion across nodes.
Heterogeneous computation: Some ECUs have spare cycles, others are saturated. Diagnostic work must flow to where it fits.
Fault in the diagnostic channel itself: Messages can be lost, nodes can fail silent. The protocol must be Byzantine-robust: one rogue node should not poison the consensus.

9

References & further reading

Domain adaptation & transfer learning

Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., Marchand, M., Lempitsky, V. — “Domain-adversarial training of neural networks.” JMLR, 2016. The DANN paper, the foundational adversarial domain-adaptation method.
Long, M., Cao, Y., Wang, J., Jordan, M. I. — “Learning transferable features with deep adaptation networks.” ICML, 2015. MMD-based domain adaptation.
Wang, J., Lan, C., Liu, C., Ouyang, Y., Qin, T., Lu, W., Chen, Y., Zeng, W., Yu, P. S. — “Generalizing to unseen domains: A survey on domain generalization.” IEEE TKDE, 2022. Modern survey that includes PHM applications.
Li, X., Zhang, W., Ding, Q., Sun, J.-Q. — “Cross-domain fault diagnosis of rolling element bearings using deep generative networks.” IEEE Trans. Industrial Electronics, 2019. One of many DA-for-bearings papers; a good entry point.

Physics-informed neural networks

Raissi, M., Perdikaris, P., Karniadakis, G. E. — “Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations.” J. Computational Physics, 2019. The foundational PINN paper.
Karniadakis, G. E., Kevrekidis, I. G., Lu, L., Perdikaris, P., Wang, S., Yang, L. — “Physics-informed machine learning.” Nature Reviews Physics, 2021. The current authoritative overview.
Shen, S., Lu, H., Sadoughi, M., Hu, C., Nemani, V., Thelen, A., Webster, K., Darr, M., Sidon, J., Kenny, S. — “A physics-informed deep learning approach for bearing fault detection.” Engineering Applications of Artificial Intelligence, 2021.

Explainable AI for diagnostics

Lundberg, S. M. and Lee, S.-I. — “A unified approach to interpreting model predictions.” NeurIPS, 2017. The SHAP paper.
Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D. — “Grad-CAM: Visual explanations from deep networks via gradient-based localization.” ICCV, 2017. The dominant saliency method for CNN-based classifiers.
Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I., Hardt, M., Kim, B. — “Sanity checks for saliency maps.” NeurIPS, 2018. An essential critical read — shows that many saliency methods are unreliable.
Rudin, C. — “Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead.” Nature Machine Intelligence, 2019. The manifesto for by-design interpretability in safety-critical settings.

Edge deployment & efficient inference

Jacob, B., Kligys, S., Chen, B., Zhu, M., Tang, M., Howard, A., Adam, H., Kalenichenko, D. — “Quantization and training of neural networks for efficient integer-arithmetic-only inference.” CVPR, 2018. The int8 inference reference.
Hinton, G., Vinyals, O., Dean, J. — “Distilling the knowledge in a neural network.” NeurIPS workshop, 2015. Knowledge distillation.
Lin, J., Chen, W.-M., Lin, Y., Cohn, J., Gan, C., Han, S. — “MCUNet: Tiny deep learning on IoT devices.” NeurIPS, 2020. Explicitly targeted at MCU-scale deployment.
ARM CMSIS-NN and STMicroelectronics X-CUBE-AI documentation. Production-grade references for actual MCU deployment pipelines.

Digital twins at fleet scale

Tao, F., Zhang, H., Liu, A., Nee, A. Y. C. — “Digital twin in industry: State-of-the-art.” IEEE Trans. Industrial Informatics, 2019. Broad survey across domains.
Grieves, M. and Vickers, J. — “Digital twin: Mitigating unpredictable, undesirable emergent behavior in complex systems.” In Transdisciplinary Perspectives on Complex Systems, 2017. Source of the now-standard “digital twin” terminology.
Kapteyn, M. G., Pretorius, J. V. R., Willcox, K. E. — “A probabilistic graphical model foundation for enabling predictive digital twins at scale.” Nature Computational Science, 2021.

Unsupervised anomaly detection

Schölkopf, B., Platt, J. C., Shawe-Taylor, J., Smola, A. J., Williamson, R. C. — “Estimating the support of a high-dimensional distribution.” Neural Computation, 2001. The one-class SVM.
Liu, F. T., Ting, K. M., Zhou, Z.-H. — “Isolation forest.” ICDM, 2008. The canonical tree-based anomaly detector.
Ruff, L., Vandermeulen, R., Goernitz, N., Deecke, L., Siddiqui, S. A., Binder, A., Müller, E., Kloft, M. — “Deep one-class classification” (Deep SVDD). ICML, 2018.
Ruff, L. et al. — “A unifying review of deep and shallow anomaly detection.” Proc. IEEE, 2021. The modern survey.

Distributed & networked FDI

Shames, I., Teixeira, A. M. H., Sandberg, H., Johansson, K. H. — “Distributed fault detection for interconnected second-order systems.” Automatica, 2011.
Ferrari, R. M. G., Parisini, T., Polycarpou, M. M. — “Distributed fault detection and isolation of large-scale discrete-time nonlinear systems: An adaptive approximation approach.” IEEE TAC, 2012.
Boem, F., Ferrari, R. M. G., Parisini, T. — “Distributed fault detection and isolation of continuous-time non-linear systems.” European J. Control, 2011.
Daigle, M. J., Koutsoukos, X. D., Biswas, G. — “Distributed diagnosis in formations of mobile robots.” IEEE Trans. Robotics, 2007. Multi-agent distributed diagnosis.