Based on “phantom-word” experiments in auditory word-segmentation tasks, two contradictory sets of results, and accordingly two contradictory proposals about human language learning, were reported recently. According to the first, transitional probabilities are weighted more heavily than word frequencies, but relying purely on statistical information is not sufficient for word learning; additional prosodic cues need to be used. Using the same paradigm but obtaining the opposite result, the other study claims that statistical information is sufficient to segment streams into words and that this behavior can be faithfully captured by a chunk-learning model. To resolve this contradiction, we show that an empirically supported probabilistic chunk-learning schema places the original test in a new context: as observers are exposed to the stream, they develop an internal representation that initially favors the simpler description of the environment that phantom words provide, but with more exposure shifts to a preference for true words instead. Therefore, segmentation through statistical learning is possible. To reduce possible biases introduced by one’s native language, we test this explanation with segmentation tasks over non-linguistic noise streams rather than classical word-segmentation tasks. Our preliminary results in this non-linguistic acoustic domain are in line with our proposal.
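
The tension the abstract describes hinges on how phantom words arise: syllable sequences whose internal transitional probabilities match those of true words even though they never occur in the stream. The following minimal Python sketch illustrates this construction; the four-word lexicon, the stream length, and the min-TP scoring rule are illustrative assumptions, not the stimuli or model of the cited studies.

```python
import random
from collections import Counter

# Hypothetical 4-word lexicon with syllable overlap, so that the
# "phantom words" ti-bi-go and pa-bi-ku have the same internal
# transitional probabilities (TPs) as real words, yet never occur.
WORDS = [("ti", "bi", "ku"), ("pa", "bi", "go"),
         ("da", "ro", "mu"), ("se", "ro", "ni")]
PHANTOMS = [("ti", "bi", "go"), ("pa", "bi", "ku")]

def make_stream(n_tokens, rng):
    """Concatenate n_tokens randomly chosen words, avoiding immediate repeats."""
    stream, prev = [], None
    for _ in range(n_tokens):
        w = rng.choice([w for w in WORDS if w is not prev])
        stream.extend(w)
        prev = w
    return stream

def tp_table(stream):
    """Forward transitional probabilities TP(a -> b) = count(a b) / count(a)."""
    uni = Counter(stream)
    bi = Counter(zip(stream, stream[1:]))
    return {pair: c / uni[pair[0]] for pair, c in bi.items()}

def min_tp(triple, tps):
    """Score a trisyllable by its weakest internal transition."""
    a, b, c = triple
    return min(tps.get((a, b), 0.0), tps.get((b, c), 0.0))

rng = random.Random(0)
stream = make_stream(2000, rng)
tps = tp_table(stream)
trigrams = Counter(zip(stream, stream[1:], stream[2:]))

# Words and phantoms score alike on TPs, but phantoms have zero frequency:
for t in WORDS[:2] + PHANTOMS:
    print("-".join(t), f"min TP = {min_tp(t, tps):.2f}", f"freq = {trigrams[t]}")
```

Under these assumptions, a learner weighting transitional probabilities would accept the phantoms, while one weighting whole-chunk frequency would reject them, which is exactly the contrast the competing proposals turn on.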