Mathematical modeling of behavioral risk control under partial observability

Authors

  • Олександр Чабан
  • Володимир Гладун

DOI

https://doi.org/10.15407/fmmit2026.42.050

Keywords

mathematical modeling, loss of control, reinforcement learning, partially observable Markov process, latent state, risk-sensitive control, recurrent policy, conditional value at risk

Abstract

This paper addresses the problem of mathematical modeling and prevention of transient control loss in stochastic human-machine systems characterized by a high cost of error. It is argued that classical control approaches based on Markov decision processes (MDP) are fundamentally limited for this task: since the true psychological state of the controlled object is a latent variable, the application of MDP inevitably leads to the problem of perceptual aliasing. To describe the hidden dynamics, a theoretical model is proposed that formalizes the control problem as a partially observable Markov decision process. The framework of recurrent reinforcement learning serves as the algorithmic basis. It is demonstrated that integrating the long short-term memory architecture provides the necessary mechanism for aggregating a sequence of noisy observations into a coherent behavioral trajectory, enabling the agent to infer the hidden risk level. Furthermore, a mathematical model for composite reward shaping is developed, departing from the standard maximization of expected return. By utilizing the conditional value at risk metric, the proposed model optimizes the control policy while accounting for heavy-tailed risks and worst-case scenarios of behavioral escalation. This work establishes a rigorous theoretical foundation for transitioning from static classification systems to algorithms for proactive and adaptive user support under conditions of uncertainty.
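As a rough illustration of why the abstract moves away from maximizing expected return, the sketch below compares the mean of a set of episode returns with an empirical conditional value at risk (the average of the worst alpha-fraction of returns). The abstract gives no formulas, so the estimator, the function name `empirical_cvar`, the level `alpha`, and the sample returns are all assumptions made for illustration, not the authors' model.

```python
import math


def empirical_cvar(returns, alpha):
    """Average of the worst alpha-fraction of returns (lower tail).

    Hypothetical helper for illustration; the paper's own CVaR
    formulation is not given in the abstract.
    """
    ordered = sorted(returns)                     # ascending: worst outcomes first
    k = max(1, math.ceil(alpha * len(ordered)))   # number of samples in the tail
    tail = ordered[:k]
    return sum(tail) / len(tail)


# Made-up episode returns: mostly fine, with one rare escalation episode.
returns = [10.0, 9.0, 11.0, 10.0, -50.0, 10.0, 9.0, 11.0, 10.0, 10.0]

mean_return = sum(returns) / len(returns)
cvar_10 = empirical_cvar(returns, alpha=0.1)
cvar_20 = empirical_cvar(returns, alpha=0.2)

print(mean_return, cvar_10, cvar_20)
```

With these sample values the mean stays positive while the tail average is strongly negative, which is the kind of rare, costly outcome an expectation-only objective tends to hide.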

References

Bordelon, B., Cotler, J., Pehlevan, C., & Zavatone-Veth, J. A. (2025). Dynamically learning to integrate in recurrent neural networks [Preprint]. arXiv. https://doi.org/10.48550/arXiv.2503.18754

Boucherie, R. J., & van Dijk, N. M. (Eds.). (2017). Markov decision processes in practice. Springer International Publishing. https://doi.org/10.1007/978-3-319-47766-4

Chen, Y. F., Everett, M., Liu, M., & How, J. P. (2017). Socially aware motion planning with deep reinforcement learning. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 1343–1350). https://doi.org/10.1109/IROS.2017.8202312

Chrisman, L. (1992). Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. In Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI-92) (pp. 183–188).

Chow, Y., Ghavamzadeh, M., Janson, L., & Pavone, M. (2018). Risk-constrained reinforcement learning with percentile risk criteria. Journal of Machine Learning Research, 18(167), 1–51.

Chow, Y. F., Tamar, A., Mannor, S., & Pavone, M. (2015). Risk-sensitive and robust decision-making: A CVaR optimization approach. In Advances in Neural Information Processing Systems 28 (NeurIPS 2015) (pp. 1522–1530). https://papers.neurips.cc/paper/6014-risk-sensitive-and-robust-decision-making-a-cvar-optimization-approach.pdf

Cunningham, P., Cord, M., & Delany, S. J. (2008). Supervised learning. In M. Cord & P. Cunningham (Eds.), Machine learning techniques for multimedia: Case studies on organization and retrieval (pp. 21–49). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-540-75171-7_2

Dabney, W., Ostrovski, G., Silver, D., & Munos, R. (2018). Implicit quantile networks for distributional reinforcement learning. In Proceedings of the 35th International Conference on Machine Learning (ICML) (Vol. 80, pp. 1096–1105). Proceedings of Machine Learning Research. https://proceedings.mlr.press/v80/dabney18a.html

Garcia, F., & Rachelson, E. (2013). Markov decision processes. In O. Sigaud & O. Buffet (Eds.), Markov decision processes in artificial intelligence (pp. 1–38). Wiley. https://doi.org/10.1002/9781118557426.ch1

Hausknecht, M., & Stone, P. (2015). Deep recurrent Q-learning for partially observable MDPs [Preprint]. arXiv. https://doi.org/10.48550/arXiv.1507.06527

Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.

Kaelbling, L. P., Littman, M. L., & Cassandra, A. R. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1–2), 99–134. https://doi.org/10.1016/S0004-3702(98)00023-X

Lieder, F., & Griffiths, T. L. (2020). Resource-rational analysis: Understanding human cognition as the optimal use of limited computational resources. Behavioral and Brain Sciences, 43, Article e1. https://doi.org/10.1017/S0140525X1900061X

Liu, B. (2011). Supervised learning. In Web data mining: Exploring hyperlinks, contents, and usage data (pp. 63–132). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-19460-3

Lovejoy, W. S. (1991). A survey of algorithmic methods for partially observed Markov decision processes. Annals of Operations Research, 28(1), 47–65. https://doi.org/10.1007/BF02055574

Mattera, A., Alfieri, V., Granato, G., & Baldassarre, G. (2025). Chaotic recurrent neural networks for brain modelling: A review. Neural Networks, 184, Article 107079. https://doi.org/10.1016/j.neunet.2024.107079

Nasteski, V. (2017). An overview of the supervised machine learning methods. Horizons, 4, 51–62. https://doi.org/10.20544/HORIZONS.B.04.1.17.P05

Ni, X., & Lai, L. (2024). Robust risk-sensitive reinforcement learning with conditional value-at-risk. In Proceedings of the 2024 IEEE Information Theory Workshop (ITW) (pp. 520–525). IEEE. https://doi.org/10.1109/ITW61385.2024.10806953

Puterman, M. L. (1990). Markov decision processes. In D. P. Heyman & M. J. Sobel (Eds.), Stochastic models (Vol. 2, pp. 331–434). Elsevier. https://doi.org/10.1016/S0927-0507(05)80172-0

Rafferty, A. N., Brunskill, E., Griffiths, T. L., & Shafto, P. (2016). Faster teaching via POMDP planning. Cognitive Science, 40(6), 1290–1332. https://doi.org/10.1111/cogs.12290

Ren, X., Wei, W., Xia, L., & Huang, C. (2025). A comprehensive survey on self-supervised learning for recommendation. ACM Computing Surveys, 58(1), 1–38. https://doi.org/10.1145/3746280

Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms [Preprint]. arXiv. https://doi.org/10.48550/arXiv.1707.06347

Published

2026-06-18

How to Cite

Чабан, О., & Гладун, В. (2026). Mathematical modeling of behavioral risk control under partial observability. PHYSICO-MATHEMATICAL MODELLING AND INFORMATIONAL TECHNOLOGIES, (42), 50–57. https://doi.org/10.15407/fmmit2026.42.050