While studies of visual statistical learning focus on how specific chunks are learned from the co-occurrence of observable elements, they typically neglect the role that knowledge about the higher-level structure of these chunks plays in learning. We studied this role of structural knowledge by investigating how initial exposure to only horizontal or only vertical shape-pairs in scenes affected the subsequent implicit learning of both vertical and horizontal pairs defined by completely novel shapes. In six experiments, we found that participants with more explicit knowledge of individual pairs were immediately able to generalize structural knowledge, extracting new pairs with the matching orientation better, and they retained this ability after both awake and sleep consolidation. In contrast, participants with weaker, more implicit knowledge and without consolidation showed a structural novelty effect, learning new non-matching pairs better. However, after sleep consolidation, this pattern reversed and they showed generalization similar to that of the “explicit” participants. This reversal did not occur after awake consolidation of the same duration, as participants showed strong proactive interference and learned no new pairs. We validated our findings with multiple measures of explicitness, both at the participant level (free report) and at the item level (confidence judgments), and by inducing explicitness via instructions. Furthermore, matched-sample analysis revealed that the difference between “explicit” and “implicit” participants was not predicted by different strengths of learning in the first exposure phase, but only by the quality of the structural knowledge. Our results show that knowledge of the higher-level structure underlying visual chunks is automatically extracted even in an unsupervised setup and has differential effects depending on the complexity of the extracted knowledge. Moreover, sleep consolidation facilitates the transformation of structural knowledge in memory.
Overall, these results highlight how momentary learning interacts with already acquired structural knowledge, leading to complex hierarchical knowledge of the visual environment.