Although some animals, such as honeybees (Apis mellifera), are excellent visual learners, little is known about their spontaneously emerging internal representations of the visual environment. We investigated whether learning mechanisms and the resulting internal representations are similar across species by using the same modified visual statistical learning paradigm in honeybees and humans. Observers performed an unrelated discrimination task while being exposed to complex visual stimuli consisting of simple shapes with varying underlying statistical structures. Familiarity tests were used to assess the emergent internal representation in three conditions, each testing whether one of three statistics (single-shape frequencies, co-occurrence probabilities, and conditional probabilities between neighboring shapes) was sufficient for solving the familiarity task. We found an increasingly complex representation of the visual environment as we moved from honeybees to human infants and to adults. Honeybees automatically learned the joint probabilities of the shapes after extended familiarization, but they did not show sensitivity to the conditional probabilities, nor did they concurrently learn the single-element frequencies. As previous studies have shown, infants implicitly learn joint and conditional probabilities, but they are not sensitive to concurrent element frequencies either. Adult results in this study were in line with previous results showing that adults spontaneously acquire all three statistics. These results could be reproduced by a progression of models: while honeybee behavior could be captured by a learning method based on a simple counting strategy, humans learned differently. Replicating infant behavior required a probabilistic chunk-learner algorithm. The same model could also replicate adult behavior, but only if it was further extended by co-representation of higher-order chunks and low-level elements.
In conclusion, we found a progression of increasingly complex visual learning mechanisms that were necessary to account for the differences among the honeybee, human infant, and human adult behavioral results.
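
To make the three statistics named above concrete, the following minimal sketch shows how single-shape frequencies, joint (co-occurrence) probabilities, and conditional probabilities could be tallied by simple counting. It is an illustration only, not the models used in the study: the function name `shape_statistics`, the representation of a scene as a set of shape labels, and the simplification that any two shapes in the same scene count as co-occurring (spatial adjacency is ignored) are all assumptions made here.

```python
from collections import Counter
from itertools import combinations


def shape_statistics(scenes):
    """Tally three statistics over a list of scenes (hypothetical helper).

    Each scene is an iterable of shape labels. Assumption: two shapes
    co-occur if they appear in the same scene; adjacency is ignored.
    """
    n = len(scenes)
    single = Counter()  # number of scenes containing each shape
    pair = Counter()    # number of scenes containing each unordered pair

    for scene in scenes:
        shapes = sorted(set(scene))  # sort so pair keys are canonical
        single.update(shapes)
        pair.update(combinations(shapes, 2))

    # Single-shape frequency: fraction of scenes containing the shape.
    freq = {s: c / n for s, c in single.items()}
    # Joint probability: fraction of scenes containing both shapes.
    joint = {p: c / n for p, c in pair.items()}
    # Conditional probability: cond[(x, given)] = P(x | given).
    cond = {}
    for (a, b), c in pair.items():
        cond[(a, b)] = c / single[b]
        cond[(b, a)] = c / single[a]
    return freq, joint, cond


# Toy input: four scenes built from four shape labels.
scenes = [{"A", "B"}, {"A", "B"}, {"A", "C"}, {"C", "D"}]
freq, joint, cond = shape_statistics(scenes)
```

In this toy input the pair A and B has a higher joint probability than A and C, and the two conditional probabilities for A and B are asymmetric (B never appears without A, but A sometimes appears without B). That asymmetry is the kind of difference that separates sensitivity to joint probabilities from sensitivity to conditional probabilities.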