The gap between the textbook and the fleet.
A clean benchmark dataset, a labeled fault library, a stationary operating envelope, unlimited compute, a single asset in a single laboratory. Textbook fault diagnosis lives in that world. Fleets live somewhere else. This monograph is about the gap between them — and the techniques, some mature, some speculative, that try to close it.
The symptoms of the gap are familiar to anyone who has deployed a diagnostic system. A classifier that scored 98% in validation drops to 71% on a new production year. An ECU that runs the algorithm in simulation cannot fit it in 32 KB of flash. A certification body asks “why did the model predict that fault?” and the team does not know. A real asset produces thousands of healthy hours and zero labeled failures. A vehicle has fifty ECUs and no one owns the fault localization problem across all of them.
Domain adaptation — training here, deploying there.
A classifier trained on the source domain (dyno, lab, simulation, last year's fleet) must run on the target domain (road, customer, this year's fleet). If the distributions are different — and they always are — accuracy drops. Domain adaptation and transfer learning are the machinery that bridges the two.
The unifying idea: learn features that are invariant across source and target, so a classifier trained on source features transfers to target. Formally, minimize source classification loss plus some measure of source/target distribution distance — MMD, adversarial domain classifier loss, or a contrastive alignment objective.
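As a rough illustration of the "classification loss plus distribution distance" objective, the sketch below computes a biased estimate of squared MMD with a Gaussian kernel and adds it to a source loss. The function names, the bandwidth of 3.0, the weight `lam`, and the synthetic feature batches are all assumptions made for the example, not values taken from any particular method.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth):
    """Gaussian kernel matrix: k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd_squared(source_feats, target_feats, bandwidth=3.0):
    """Biased estimate of squared MMD between two batches of feature vectors."""
    k_ss = rbf_kernel(source_feats, source_feats, bandwidth)
    k_tt = rbf_kernel(target_feats, target_feats, bandwidth)
    k_st = rbf_kernel(source_feats, target_feats, bandwidth)
    return k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean()

def total_loss(source_cls_loss, source_feats, target_feats, lam=0.5):
    """Source classification loss plus a weighted source/target distance."""
    return source_cls_loss + lam * mmd_squared(source_feats, target_feats)

# Synthetic stand-ins for feature batches (assumed, for illustration only)
rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(200, 8))
target_same = rng.normal(0.0, 1.0, size=(200, 8))     # same distribution
target_shifted = rng.normal(1.5, 1.0, size=(200, 8))  # shifted distribution

print("MMD^2, matched domains:", mmd_squared(source, target_same))
print("MMD^2, shifted domains:", mmd_squared(source, target_shifted))
```

In a real pipeline the features would come from a shared encoder and the penalty would be minimized jointly with the classifier; here the point is only that the distance term is small for matched batches and large for shifted ones.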
What the practitioner picks between
- Unsupervised DA: No target labels. Learn invariant features via MMD minimization, adversarial domain classifier (DANN), or optimal-transport alignment. The hardest setting, the most common in practice.
- Semi-supervised DA: A handful of target labels available. Fine-tune a source model on them with heavy regularization. Cheap and effective when a few target failures exist.
- Few-shot / meta-learning: Train a model to be quickly adaptable to new domains. Useful for fleets with many segments and few examples per segment.
- Test-time adaptation: Adjust batch-norm statistics or a small number of parameters at inference, without changing the backbone. Popular because it requires no retraining pipeline.
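Of the four options, test-time adaptation is the simplest to sketch. The example below re-estimates normalization statistics from an unlabeled target batch and leaves everything else untouched. The function names, the momentum parameter, and the synthetic data are assumptions for illustration; a real network would apply this per batch-norm layer.

```python
import numpy as np

def adapt_norm_stats(running_mean, running_var, target_batch, momentum=0.1):
    """Blend stored normalization statistics toward those of an unlabeled target batch."""
    batch_mean = target_batch.mean(axis=0)
    batch_var = target_batch.var(axis=0)
    new_mean = (1.0 - momentum) * running_mean + momentum * batch_mean
    new_var = (1.0 - momentum) * running_var + momentum * batch_var
    return new_mean, new_var

def normalize(x, mean, var, eps=1e-5):
    """Standardize features with the given statistics, as a batch-norm layer does at inference."""
    return (x - mean) / np.sqrt(var + eps)

# Statistics learned on the source domain (assumed: zero mean, unit variance)
source_mean = np.zeros(4)
source_var = np.ones(4)

# Unlabeled target batch with a shifted, rescaled distribution (synthetic)
rng = np.random.default_rng(0)
target_batch = rng.normal(3.0, 2.0, size=(256, 4))

stale = normalize(target_batch, source_mean, source_var)
adapted_mean, adapted_var = adapt_norm_stats(
    source_mean, source_var, target_batch, momentum=1.0
)
adapted = normalize(target_batch, adapted_mean, adapted_var)

print("mean with source statistics: ", stale.mean(axis=0))
print("mean with adapted statistics:", adapted.mean(axis=0))
```

Setting `momentum=1.0` replaces the stored statistics outright; smaller values blend gradually, which is the safer choice when target batches are small.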
Physics-informed neural networks — physics as regularizer.
A neural network trained on a handful of noisy measurements can fit them perfectly — and predict absolute nonsense in between. Physics-informed neural networks (PINNs) fix this by adding a term to the loss that penalizes violations of known physical laws.
The data loss is the usual — predictions should match observations. The physics loss is a penalty on how badly the network’s outputs violate a known differential equation, conservation law, or constraint. A motor model might demand dψ/dt = v − R·i. A battery model might demand charge conservation. A crack-growth model might demand Paris’ law. The physics term is evaluated via automatic differentiation through the network itself.
The payoff is generalization. A data-only network interpolates noise; a physics-regularized network interpolates reality.
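The two-term loss can be sketched without a deep-learning framework. The example below scores a candidate flux trajectory against the motor equation dψ/dt = v − R·i, using finite differences (`np.gradient`) as a stand-in for automatic differentiation. The function name, the weight `lam`, the constant voltage and current, and both candidate trajectories are assumptions chosen for illustration.

```python
import numpy as np

def pinn_style_loss(psi_pred, t, v, i, resistance, psi_obs, obs_idx, lam=1.0):
    """Data loss plus physics penalty for the flux model d(psi)/dt = v - R*i."""
    # Data term: mean squared error at the sparse observation points
    data_loss = np.mean((psi_pred[obs_idx] - psi_obs) ** 2)
    # Physics term: squared residual of the ODE at every time point.
    # Finite differences stand in for automatic differentiation here.
    dpsi_dt = np.gradient(psi_pred, t)
    residual = dpsi_dt - (v - resistance * i)
    physics_loss = np.mean(residual ** 2)
    return data_loss + lam * physics_loss, data_loss, physics_loss

# Assumed operating point: constant voltage and current, so psi(t) = 1.5 * t
t = np.linspace(0.0, 1.0, 201)
v = np.full_like(t, 2.0)
i = np.full_like(t, 1.0)
resistance = 0.5

# Five observations, taken where both candidates below agree with the truth
obs_idx = np.array([0, 50, 100, 150, 200])
psi_obs = 1.5 * t[obs_idx]

smooth = 1.5 * t                                       # obeys the physics
wiggly = 1.5 * t + 0.2 * np.sin(40.0 * np.pi * t)      # fits the data, violates the physics

total_smooth, data_smooth, phys_smooth = pinn_style_loss(
    smooth, t, v, i, resistance, psi_obs, obs_idx
)
total_wiggly, data_wiggly, phys_wiggly = pinn_style_loss(
    wiggly, t, v, i, resistance, psi_obs, obs_idx
)
print("smooth candidate: data", data_smooth, "physics", phys_smooth)
print("wiggly candidate: data", data_wiggly, "physics", phys_wiggly)
```

Both candidates match the five observations, so a data-only loss cannot tell them apart; only the physics term penalizes the oscillation between measurements.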
Explainable AI — the certification problem.
A deep classifier says “bearing fault.” A certification auditor asks why. “Because the weights in layer 7 produced a logit of 3.4” is not an answer. Explainable AI (XAI) for diagnostics is the discipline of producing human-interpretable justifications for a model’s decisions — justifications that hold up under ISO 26262, DO-178C, and the attention of a skeptical safety engineer.
The core tools are saliency, SHAP values, attention visualization, and counterfactual explanations. Each produces some form of attribution map: which regions of the input drove this decision, and by how much?
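To make "which inputs drove this decision, and by how much" concrete, the sketch below computes exact Shapley attributions by enumerating every feature coalition, with absent features set to a baseline value. This is the brute-force definition, not the approximations used by SHAP tooling, and it is only feasible for a handful of features. The scoring function and its three inputs are hypothetical stand-ins for a trained model.

```python
import itertools
import math
import numpy as np

def exact_shapley(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline).

    Features absent from a coalition take their baseline value.
    Cost grows as 2^n, so this only suits a few features.
    """
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = (
                math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            )
            for coalition in itertools.combinations(others, size):
                without_i = baseline.copy()
                for j in coalition:
                    without_i[j] = x[j]
                with_i = without_i.copy()
                with_i[i] = x[i]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi

def hypothetical_fault_score(z):
    # Stand-in for a trained model: two linear terms plus one interaction
    return 0.8 * z[0] + 0.3 * z[1] + 0.5 * z[0] * z[2]

x = np.array([2.0, 1.0, 3.0])         # hypothetical input features
baseline = np.array([0.0, 0.0, 0.0])  # hypothetical "healthy" reference input

phi = exact_shapley(hypothetical_fault_score, x, baseline)
print("attributions:", phi)
print("sum of attributions:", phi.sum())
print("score minus baseline score:",
      hypothetical_fault_score(x) - hypothetical_fault_score(baseline))
```

The attributions sum to the difference between the score and the baseline score, and the interaction term is split evenly between the two features that share it. The choice of baseline changes the result, which is one reason different attribution methods can disagree.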
The honest caveats
- Saliency methods disagree: Different attribution methods (gradient, integrated gradients, SHAP, LIME) applied to the same model produce different explanations. Some fail basic sanity checks, for example by producing nearly the same map after the model's weights are randomized.
- Faithfulness ≠ plausibility: A plausible-looking explanation is not necessarily what the model actually used. Proving faithfulness is itself a research problem.
- Post-hoc vs. by-design: Explanations generated after training (saliency) are less trustworthy than architectures that are interpretable by construction (attention, prototype-based models, shallow rule-based ensembles).
Edge deployment — the 64 KB constraint.
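A state-of-the-art diagnostic classifier can have on the order of a hundred million parameters and run on a GPU. At float32 that is roughly 400 MB of weights. An automotive microcontroller has 64 KB of flash, 16 KB of RAM, a 200 MHz scalar core, and a 1 ms deadline, so the weights alone exceed the flash budget by more than three orders of magnitude. Getting the first into the second is the edge deployment problem.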
Four techniques dominate production:
- Quantization: Replace float32 weights with int8 or int4. This gives a 4× (int8) to 8× (int4) size reduction, typically at a small accuracy cost. Modern toolchains (TFLite Micro, ONNX Runtime, CMSIS-NN) make this routine.
- Pruning: Remove weights that don’t contribute. Structured pruning (whole channels or filters) is more ECU-friendly than unstructured pruning because it shrinks the dense matrices themselves, whereas unstructured sparsity only pays off with sparse kernels.
- Knowledge distillation: Train a small “student” network to mimic a large “teacher”. The student approximates the teacher’s decision surface at a fraction of the parameter count.
- Feature engineering: Often overlooked. Hand-designed spectral or statistical features followed by a tiny classifier (decision tree, linear model) are spectacularly efficient and still meet many production accuracy requirements.
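The first of these techniques fits in a few lines. The sketch below applies symmetric per-tensor int8 quantization to a random weight matrix and reports the size and worst-case rounding error. It is a minimal illustration of the arithmetic, assuming a single scale per tensor; production toolchains add per-channel scales, activation quantization, and calibration, none of which is shown. The function names and the weight matrix are made up for the example.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor quantization: float32 weights -> int8 values plus one scale."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate float32 weights."""
    return q.astype(np.float32) * np.float32(scale)

# Synthetic weight matrix standing in for one layer of a trained model
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(64, 32)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print("float32 bytes:", w.nbytes, "int8 bytes:", q.nbytes)
print("max abs error:", float(np.max(np.abs(w - w_hat))), "half a step:", scale / 2.0)
```

The rounding error per weight is bounded by half a quantization step, so the accuracy cost depends on how the largest weight compares with the typical one: a single outlier stretches the scale for the whole tensor.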
Digital twins at fleet scale.
A digital twin is a model of a specific asset — calibrated to its individual parameters, updated with its individual data. At fleet scale, every vehicle has its own twin; every twin learns continuously; aggregate patterns across twins become fleet-level insights. This is different from a single lab model in two ways: each twin is personal, and there are thousands of them.
The diagnostic value comes from individualization. A fleet-average model flags anything that looks unusual on average. A per-asset twin flags anything unusual for that asset’s history. A slow drift invisible against population variance becomes obvious against the asset’s own baseline.
Figure: per-asset twin residuals. Each tile shows one asset’s twin residual, the difference between measured behavior and what its individually calibrated twin predicts. Healthy assets stay flat. A developing fault drives its own unit’s residual above threshold while the others stay calm.
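The difference between a fleet-average view and a per-asset view can be reproduced with synthetic data. In the sketch below, each asset's early-life mean stands in for its individually calibrated twin, which is a deliberate simplification. The fleet size, offsets, noise level, drift magnitude, and window lengths are all assumed values chosen to make the effect visible.

```python
import numpy as np

rng = np.random.default_rng(1)
n_assets, n_hours = 20, 500

# Hypothetical fleet: large unit-to-unit offsets, small per-unit noise
asset_offsets = np.linspace(-8.0, 8.0, n_assets)
signal = asset_offsets[:, None] + rng.normal(0.0, 0.2, size=(n_assets, n_hours))

# One asset develops a slow drift over its last 200 hours (1.5 units in total)
drifting_asset = 3
signal[drifting_asset, 300:] += np.linspace(0.0, 1.5, 200)

def fleet_zscore(signal, window):
    """Recent mean of each asset, scored against the spread of the whole fleet."""
    recent = signal[:, -window:].mean(axis=1)
    return (recent - recent.mean()) / recent.std()

def own_baseline_zscore(signal, baseline_hours, window):
    """Recent mean of each asset, scored against that asset's own early history."""
    baseline = signal[:, :baseline_hours]
    recent = signal[:, -window:].mean(axis=1)
    return (recent - baseline.mean(axis=1)) / baseline.std(axis=1)

fleet_z = fleet_zscore(signal, window=50)
own_z = own_baseline_zscore(signal, baseline_hours=200, window=50)

print("drifting asset, fleet-relative score:", fleet_z[drifting_asset])
print("drifting asset, own-baseline score:  ", own_z[drifting_asset])
print("largest own-baseline score among healthy assets:",
      np.max(np.abs(np.delete(own_z, drifting_asset))))
```

Against the fleet, the drift is buried in unit-to-unit variation; against the asset's own baseline it stands several noise standard deviations clear of every healthy unit.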
Anomaly detection without labels.
In the lab, a researcher has labeled faults. In the field, an engineer has thousands of hours of healthy operation and almost no labeled failures. “Thousands of hours” is the data; “we do not know what the failures look like” is the problem. Unsupervised anomaly detection learns the shape of normal and flags everything else.
- One-class SVM: Fits a tight boundary around the healthy cloud in feature space. Classical, interpretable, scales poorly.
- Isolation Forest: Grows random trees; anomalies are isolated in few splits. Scales well, handles mixed feature types.
- Autoencoder reconstruction: Train to reconstruct healthy data. The reconstruction error is the anomaly score. Natural for high-dimensional sensor data.
- Deep SVDD: Neural generalization of one-class SVM. Learns a feature space in which healthy data is tightly clustered.
- Density estimation: Normalizing flows or Gaussian mixture models; anomalies are low-probability samples.
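The reconstruction-error idea can be shown with a linear autoencoder, which is equivalent to PCA and needs no training loop. The sketch below fits it on healthy data only, sets a threshold from the healthy error distribution, and scores a sample with an offset on one channel. The mixing matrix, noise level, 99.9th-percentile threshold, and fault size are assumptions for illustration; a nonlinear autoencoder would replace the SVD step.

```python
import numpy as np

def fit_linear_autoencoder(healthy, n_components):
    """Fit a PCA-style linear encoder/decoder on healthy data only."""
    mean = healthy.mean(axis=0)
    _, _, vt = np.linalg.svd(healthy - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(x, mean, components):
    """Squared error between each sample and its projection onto the learned subspace."""
    centered = x - mean
    reconstructed = centered @ components.T @ components
    return np.sum((centered - reconstructed) ** 2, axis=1)

rng = np.random.default_rng(0)

# Hypothetical plant: six sensor channels driven by two latent operating variables.
# The last channel carries no latent signal, only noise.
mixing = np.array([
    [1.0, 0.5, -0.3, 0.8, 0.2, 0.0],
    [0.1, -0.7, 0.9, 0.4, -0.6, 0.0],
])

def sample_healthy(n):
    latent = rng.normal(size=(n, 2))
    return latent @ mixing + rng.normal(0.0, 0.05, size=(n, 6))

train = sample_healthy(1000)
mean, components = fit_linear_autoencoder(train, n_components=2)
threshold = np.quantile(reconstruction_error(train, mean, components), 0.999)

healthy_test = sample_healthy(500)
faulty = sample_healthy(1)
faulty[0, 5] += 1.0  # sensor offset that breaks the learned correlations

healthy_scores = reconstruction_error(healthy_test, mean, components)
fault_score = reconstruction_error(faulty, mean, components)[0]

print("threshold:", threshold)
print("fraction of healthy test samples flagged:", np.mean(healthy_scores > threshold))
print("fault score:", fault_score)
```

No fault label is used anywhere: the detector only knows what healthy data looks like, and the offset is flagged because it cannot be reconstructed from the healthy subspace.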
Distributed FDI — when the system is many computers.
A modern vehicle has fifty or more ECUs — each with its own sensors, its own local diagnostics, its own bandwidth-limited view of the rest of the system. No single ECU sees everything. And yet, somehow, the vehicle must localize a fault that cuts across ECUs — a sensor drift that poisons both the brake controller and the stability controller, a CAN-bus glitch that looks like a motor fault from one ECU and a powertrain fault from another.
Distributed FDI is the architecture for this: local residuals computed at each node, sparse communication between nodes, and a consensus or message-passing algorithm that produces a global diagnosis without any node having access to all the data.
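The fusion step can be sketched with linear average consensus: each node holds one local residual, exchanges values only with its bus neighbours, and all nodes converge to the fleet-of-ECUs mean without any node seeing every value. The five-node ring, the step size, the round count, and the residual values are assumptions for the example. The sketch assumes reliable links and honest nodes, so it addresses neither message loss nor a faulty node.

```python
import numpy as np

def average_consensus(local_values, adjacency, step, n_rounds):
    """Each node repeatedly nudges its estimate toward its neighbours' estimates."""
    x = np.asarray(local_values, dtype=float).copy()
    degree = adjacency.sum(axis=1)
    laplacian = np.diag(degree) - adjacency
    for _ in range(n_rounds):
        # One communication round: node k only uses values from its neighbours
        x = x - step * (laplacian @ x)
    return x

# Hypothetical topology: five ECUs on a ring, each talking to two neighbours
n_nodes = 5
adjacency = np.zeros((n_nodes, n_nodes))
for k in range(n_nodes):
    adjacency[k, (k + 1) % n_nodes] = 1.0
    adjacency[(k + 1) % n_nodes, k] = 1.0

# Local residual magnitudes; node 2 sees something the others barely notice
local_residuals = np.array([0.1, 0.0, 2.4, 0.2, 0.1])

fused = average_consensus(local_residuals, adjacency, step=0.3, n_rounds=50)
print("per-node estimate after consensus:", fused)
print("true mean of local residuals:     ", local_residuals.mean())
```

The step size must stay below 2 divided by the largest Laplacian eigenvalue for the iteration to converge; 0.3 satisfies that for this ring. Averaging supports detection, while isolating which subsystem is at fault needs structured residuals or a richer message than a single scalar.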
What distributed FDI has to handle
- Limited bandwidth: CAN / Ethernet bus slots are budgeted. Diagnostic messages compete with control traffic. Protocols must be compact and infrequent.
- Partial observability: Each node sees a slice of the system. Local residuals are ambiguous; isolation requires fusion across nodes.
- Heterogeneous computation: Some ECUs have spare cycles, others are saturated. Diagnostic work must flow to where it fits.
- Fault in the diagnostic channel itself: Messages can be lost, nodes can fail silent. The protocol must be Byzantine-robust: one rogue node should not poison the consensus.
References & further reading
Domain adaptation & transfer learning
- Ganin, Y., Ustinova, E., Ajakan, H., Germain, P., Larochelle, H., Laviolette, F., Marchand, M., Lempitsky, V. — “Domain-adversarial training of neural networks.” JMLR, 2016. The DANN paper, the foundational adversarial domain-adaptation method.
- Long, M., Cao, Y., Wang, J., Jordan, M. I. — “Learning transferable features with deep adaptation networks.” ICML, 2015. MMD-based domain adaptation.
- Wang, J., Lan, C., Liu, C., Ouyang, Y., Qin, T., Lu, W., Chen, Y., Zeng, W., Yu, P. S. — “Generalizing to unseen domains: A survey on domain generalization.” IEEE TKDE, 2022. Modern survey that includes PHM applications.
- Li, X., Zhang, W., Ding, Q., Sun, J.-Q. — “Cross-domain fault diagnosis of rolling element bearings using deep generative networks.” IEEE Trans. Industrial Electronics, 2019. One of many DA-for-bearings papers; a good entry point.
Physics-informed neural networks
- Raissi, M., Perdikaris, P., Karniadakis, G. E. — “Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations.” J. Computational Physics, 2019. The foundational PINN paper.
- Karniadakis, G. E., Kevrekidis, I. G., Lu, L., Perdikaris, P., Wang, S., Yang, L. — “Physics-informed machine learning.” Nature Reviews Physics, 2021. The current authoritative overview.
- Shen, S., Lu, H., Sadoughi, M., Hu, C., Nemani, V., Thelen, A., Webster, K., Darr, M., Sidon, J., Kenny, S. — “A physics-informed deep learning approach for bearing fault detection.” Engineering Applications of Artificial Intelligence, 2021.
Explainable AI for diagnostics
- Lundberg, S. M. and Lee, S.-I. — “A unified approach to interpreting model predictions.” NeurIPS, 2017. The SHAP paper.
- Selvaraju, R. R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D. — “Grad-CAM: Visual explanations from deep networks via gradient-based localization.” ICCV, 2017. The dominant saliency method for CNN-based classifiers.
- Adebayo, J., Gilmer, J., Muelly, M., Goodfellow, I., Hardt, M., Kim, B. — “Sanity checks for saliency maps.” NeurIPS, 2018. An essential critical read — shows that many saliency methods are unreliable.
- Rudin, C. — “Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead.” Nature Machine Intelligence, 2019. The manifesto for by-design interpretability in safety-critical settings.
Edge deployment & efficient inference
- Jacob, B., Kligys, S., Chen, B., Zhu, M., Tang, M., Howard, A., Adam, H., Kalenichenko, D. — “Quantization and training of neural networks for efficient integer-arithmetic-only inference.” CVPR, 2018. The int8 inference reference.
- Hinton, G., Vinyals, O., Dean, J. — “Distilling the knowledge in a neural network.” NeurIPS workshop, 2015. Knowledge distillation.
- Lin, J., Chen, W.-M., Lin, Y., Cohn, J., Gan, C., Han, S. — “MCUNet: Tiny deep learning on IoT devices.” NeurIPS, 2020. Explicitly targeted at MCU-scale deployment.
- ARM CMSIS-NN and STMicroelectronics X-CUBE-AI documentation. Production-grade references for actual MCU deployment pipelines.
Digital twins at fleet scale
- Tao, F., Zhang, H., Liu, A., Nee, A. Y. C. — “Digital twin in industry: State-of-the-art.” IEEE Trans. Industrial Informatics, 2019. Broad survey across domains.
- Grieves, M. and Vickers, J. — “Digital twin: Mitigating unpredictable, undesirable emergent behavior in complex systems.” In Transdisciplinary Perspectives on Complex Systems, 2017. Source of the now-standard “digital twin” terminology.
- Kapteyn, M. G., Pretorius, J. V. R., Willcox, K. E. — “A probabilistic graphical model foundation for enabling predictive digital twins at scale.” Nature Computational Science, 2021.
Unsupervised anomaly detection
- Schölkopf, B., Platt, J. C., Shawe-Taylor, J., Smola, A. J., Williamson, R. C. — “Estimating the support of a high-dimensional distribution.” Neural Computation, 2001. The one-class SVM.
- Liu, F. T., Ting, K. M., Zhou, Z.-H. — “Isolation forest.” ICDM, 2008. The canonical tree-based anomaly detector.
- Ruff, L., Vandermeulen, R., Goernitz, N., Deecke, L., Siddiqui, S. A., Binder, A., Müller, E., Kloft, M. — “Deep one-class classification” (Deep SVDD). ICML, 2018.
- Ruff, L. et al. — “A unifying review of deep and shallow anomaly detection.” Proc. IEEE, 2021. The modern survey.
Distributed & networked FDI
- Shames, I., Teixeira, A. M. H., Sandberg, H., Johansson, K. H. — “Distributed fault detection for interconnected second-order systems.” Automatica, 2011.
- Ferrari, R. M. G., Parisini, T., Polycarpou, M. M. — “Distributed fault detection and isolation of large-scale discrete-time nonlinear systems: An adaptive approximation approach.” IEEE TAC, 2012.
- Boem, F., Ferrari, R. M. G., Parisini, T. — “Distributed fault detection and isolation of continuous-time non-linear systems.” European J. Control, 2011.
- Daigle, M. J., Koutsoukos, X. D., Biswas, G. — “Distributed diagnosis in formations of mobile robots.” IEEE Trans. Robotics, 2007. Multi-agent distributed diagnosis.