Mathematical modeling of behavioral risk control under partial observability
DOI: https://doi.org/10.15407/fmmit2026.42.050
Keywords:
mathematical modeling, loss of control, reinforcement learning, partially observable Markov decision process, latent state, risk-sensitive control, recurrent policy, conditional value at risk
Abstract
This paper addresses the problem of mathematical modeling and prevention of transient control loss in stochastic human-machine systems characterized by a high cost of error. It is argued that classical control approaches based on Markov decision processes (MDP) are fundamentally limited for this task: since the true psychological state of the controlled object is a latent variable, the application of MDP inevitably leads to the problem of perceptual aliasing. To describe the hidden dynamics, a theoretical model is proposed that formalizes the control problem as a partially observable Markov decision process. The framework of recurrent reinforcement learning serves as the algorithmic basis. It is demonstrated that integrating the long short-term memory architecture provides the necessary mechanism for aggregating a sequence of noisy observations into a coherent behavioral trajectory, enabling the agent to infer the hidden risk level. Furthermore, a mathematical model for composite reward shaping is developed, departing from the standard maximization of expected return. By utilizing the conditional value at risk metric, the proposed model optimizes the control policy while accounting for heavy-tailed risks and worst-case scenarios of behavioral escalation. This work establishes a rigorous theoretical foundation for transitioning from static classification systems to algorithms for proactive and adaptive user support under conditions of uncertainty.
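The abstract does not state the exact form of the risk-sensitive objective. As a minimal sketch of the idea, assuming the common lower-tail definition of conditional value at risk over sampled episode returns, the snippet below contrasts the expected return with the mean of the worst fraction of outcomes. The function name `empirical_cvar`, the parameter `alpha`, and the sample returns are illustrative assumptions and do not come from the paper.

```python
import math


def empirical_cvar(returns, alpha):
    """Mean of the worst alpha-fraction of returns (lower tail).

    Assumption: higher return is better, so "risk" lives in the
    smallest values. alpha=0.2 means "the worst 20% of episodes".
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(returns)  # ascending: worst outcomes first
    # Small tolerance guards against float round-up such as 0.3 * 10.
    k = max(1, math.ceil(alpha * len(ordered) - 1e-12))
    tail = ordered[:k]
    return sum(tail) / k


# Hypothetical episode returns: mostly benign, with two rare escalations.
returns = [1.0, 1.2, 0.9, 1.1, -8.0, 1.0, 0.8, 1.3, 1.0, -5.0]

mean_return = sum(returns) / len(returns)
cvar_20 = empirical_cvar(returns, alpha=0.2)

print(f"expected return: {mean_return:.2f}")
print(f"CVaR at 20%:     {cvar_20:.2f}")
```

In this toy sample the expected return averages over the two escalation episodes, while the tail mean is determined by them alone. That is the sense in which a CVaR-based criterion is sensitive to heavy-tailed, worst-case behavior.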
License
Copyright (c) 2026 Олександр Чабан, Володимир Гладун (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.