In contrast with the traditional deterministic view of perception, a number of recent studies have argued that perception is best captured by probabilistic computations. A crucial aspect of real-world scenes is that conflicting cues render stimuli ambiguous, so that multiple hypotheses are compatible with the input. Although the effects of perceptual uncertainty on perceptual decisions have been well characterized, its effects on learning have not been studied. Statistically optimal learning requires combining evidence from all alternative hypotheses, weighted by their respective certainties, not only from the most probable interpretation. We tested whether human observers can learn about and make inferences in such situations. We used an unsupervised visual learning paradigm in which ecologically relevant but conflicting cues gave rise to alternative hypotheses as to how unknown, complex, multi-shape visual scenes should be segmented. The strength of the conflicting segmentation cues, "high-level" statistically learned features and "low-level" grouping features of the input, was systematically manipulated in a series of experiments, and human performance was compared with that of Bayesian model averaging. We found that humans weighted and combined alternative scene descriptions according to their reliability, demonstrating an optimal treatment of uncertainty in learning. These results capture not only the way adults learn to segment new visual scenes, but also the qualitative shift in learning performance from 8-month-old infants to adults. Our results suggest that models of perceptual learning that evaluate only the single hypothesis with the "best explanatory power," instead of performing model averaging, are not sufficient to characterize human visual learning.
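As a minimal formal sketch of the contrast at stake (the notation below is ours and purely illustrative, not taken from the models in the paper): a model-averaging learner predicts a new scene $x$ by summing over all candidate segmentation hypotheses $h$, weighted by their posterior probabilities given the previously observed scenes $\mathcal{D}$, whereas a "best single hypothesis" learner commits to the maximum a posteriori hypothesis alone:

\[
P(x \mid \mathcal{D}) \;=\; \sum_{h} P(x \mid h)\, P(h \mid \mathcal{D})
\qquad\text{versus}\qquad
P(x \mid \mathcal{D}) \;\approx\; P\bigl(x \mid \hat{h}\bigr),
\quad \hat{h} = \operatorname*{arg\,max}_{h} P(h \mid \mathcal{D}).
\]

When one hypothesis dominates the posterior, the two predictions coincide; they diverge precisely in the ambiguous, conflicting-cue regime studied here.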