Current models of human visual decision making based on sequentially presented samples posit that people make their next decision by unconsciously compensating for the statistical discrepancy between measurements collected in the distant past and those collected very recently. While this proposal is compelling, it does not qualify as a rigorous model of human decision making. In addition, recent empirical evidence suggests that human visual decision making not only balances long- and short-term summary statistics of sequences but also, in parallel, encodes salient features such as repetitions and relies on a generic assumption of a non-discriminative flat prior over events in the environment. In this study, we developed a normative model that captures these characteristics. Specifically, we built a constrained Bayesian ideal observer whose generative model has the following features. First, data are generated randomly but, depending on the parameter settings, not necessarily independently. Second, the system has a memory capacity denoted by a small window size t and a world representation denoted by a large window size T, the latter reflecting the observer’s belief about the volatility of the world, i.e., the extent to which changes should be represented. Third, events are described by an appearance probability p_i, which is not constant in time but changes according to a Markovian update and has a strong initial peak at 50%. Fourth, observations are noisy, so the observer can collect only a limited amount of information (γ_i) from each sample image. We implemented this model and, by fitting it to human data, determined the optimal values of T and t and inferred the evolving p_i for each subject. Our model captured the behavior of human observers, for example their deviation from the binomial distribution based on T and t, and the negative correlation between recent and past decisions.
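
The generative assumptions listed above can be illustrated with a minimal simulation sketch. It is not the fitted model from this study. The Gaussian random-walk update, the step size `step_sd`, the use of a single constant `gamma` as the probability of perceiving an event correctly (the text allows a per-sample value), and the function names `simulate_sequence` and `window_mean` are all assumptions made for illustration only.

```python
import random


def simulate_sequence(n_trials, step_sd=0.05, gamma=0.8, seed=0):
    """Illustrative sketch: drifting appearance probability, binary events,
    and noisy observations. All parameter choices are assumptions."""
    rng = random.Random(seed)
    p = 0.5  # start at 50%, where the initial belief is said to peak
    probs, events, observed = [], [], []
    for _ in range(n_trials):
        # Markovian update: the next p depends only on the current p
        # (assumed here to be a Gaussian random walk clipped to [0, 1]).
        p = min(1.0, max(0.0, p + rng.gauss(0.0, step_sd)))
        # The event occurs with the current appearance probability.
        event = 1 if rng.random() < p else 0
        # Noisy observation: perceived correctly with probability gamma
        # (a stand-in for the limited information per sample).
        seen = event if rng.random() < gamma else 1 - event
        probs.append(p)
        events.append(event)
        observed.append(seen)
    return probs, events, observed


def window_mean(values, size):
    """Mean of the most recent `size` values (all values if fewer exist)."""
    if size < 1 or not values:
        raise ValueError("need size >= 1 and a non-empty sequence")
    recent = values[-size:]
    return sum(recent) / len(recent)


if __name__ == "__main__":
    probs, events, observed = simulate_sequence(200)
    short_term = window_mean(observed, 5)   # small window t (assumed t = 5)
    long_term = window_mean(observed, 50)   # large window T (assumed T = 50)
    print(short_term, long_term)
```

Comparing `short_term` with `long_term` gives a rough picture of the short- versus long-window summary statistics that the observer is assumed to balance. The window sizes 5 and 50 are arbitrary placeholders, not values estimated from data.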