We investigated how well people discriminate between different statistical structures in letter sequences. Specifically, we asked to what extent people rely on feature-based aspects versus lower-level statistics of the input when it is generated by simple or by more hierarchical processes. Using two symbols, we generated twelve-element sequences according to one of three generative processes: a biased coin toss, a two-state Markov process, and a hierarchical Markov process in which the states of the higher-order model determine the parameters of the lower-order model. Subjects performed sequence discrimination in a two-alternative forced-choice (2-AFC) task: in each test trial they had to decide whether two sequences originated from the same process or from different ones. We analyzed stimulus properties of the three sets of strings and trained a machine learning algorithm to discriminate between the stimulus classes based either on the identity of the elements in the strings or on a feature vector derived for each string. The feature vector used 13 of the most common features, split evenly between summary statistics (mean, variance, etc.) and feature-based descriptors (repetitions, alternations). The learning algorithm and the subjects were trained and tested on the same sequences, which let us identify the most significant features used by the machine and by humans and compare the two rankings. Not only was there significant agreement between the feature rankings for machine and humans, but both used a mixture of feature-based and statistical descriptors. The two most important features for humans were the ratio between the relative frequencies of the symbols and the existence of repeating triples. We also found a consistent asymmetry between repetition and alternation: repetitions of length three or more were consistently ranked higher than alternations of the same length. Finally, we found that, without further help, humans did not take into account the complexity of the generative processes.
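
To make the three generative processes and a few of the descriptors concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the study's actual code: the symbols `A` and `B`, every probability value (`p_a`, `p_stay`, `p_switch`, `stay_by_state`), the function names, and the exact feature definitions are all hypothetical, since the abstract does not specify them. In particular, the frequency ratio is computed here as the rarer symbol's count over the more common one's, which is only one possible reading of "ratio between the relative frequencies".

```python
import random

SYMBOLS = "AB"  # assumed two-symbol alphabet
LENGTH = 12     # twelve-element sequences, as stated in the abstract


def _next_symbol(rng, prev, p_stay):
    """Repeat prev with probability p_stay, otherwise switch symbol."""
    if rng.random() < p_stay:
        return prev
    return "B" if prev == "A" else "A"


def biased_coin(rng, p_a=0.7, n=LENGTH):
    """Independent draws; p_a is a hypothetical bias toward 'A'."""
    return "".join("A" if rng.random() < p_a else "B" for _ in range(n))


def markov(rng, p_stay=0.8, n=LENGTH):
    """Two-state Markov chain with a fixed (hypothetical) stay probability."""
    seq = [rng.choice(SYMBOLS)]
    for _ in range(n - 1):
        seq.append(_next_symbol(rng, seq[-1], p_stay))
    return "".join(seq)


def hierarchical_markov(rng, p_switch=0.2, stay_by_state=(0.9, 0.2), n=LENGTH):
    """A hidden higher-order state picks the lower-order chain's p_stay."""
    state = rng.randrange(2)
    seq = [rng.choice(SYMBOLS)]
    for _ in range(n - 1):
        if rng.random() < p_switch:  # higher-order transition
            state = 1 - state
        seq.append(_next_symbol(rng, seq[-1], stay_by_state[state]))
    return "".join(seq)


def longest_run(seq, alternating=False):
    """Longest stretch of repeats (or of strict alternation) in a non-empty seq."""
    best = current = 1
    for prev, cur in zip(seq, seq[1:]):
        extends = (prev != cur) if alternating else (prev == cur)
        current = current + 1 if extends else 1
        best = max(best, current)
    return best


def features(seq):
    """A few illustrative descriptors; not the study's 13-feature vector."""
    counts = [seq.count(s) for s in SYMBOLS]
    return {
        "freq_ratio": min(counts) / max(counts),  # assumed definition
        "has_repeating_triple": "AAA" in seq or "BBB" in seq,
        "longest_repetition": longest_run(seq),
        "longest_alternation": longest_run(seq, alternating=True),
    }


if __name__ == "__main__":
    rng = random.Random(0)
    for generate in (biased_coin, markov, hierarchical_markov):
        s = generate(rng)
        print(generate.__name__, s, features(s))
```

Feature dictionaries of this kind could be fed to any standard classifier to obtain a feature ranking; the abstract does not name the learning algorithm used, so none is assumed here.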
