While studies on spatial visual statistical learning typically focus on specific co-occurrence-based element chunks, they neglect the role of learning the structure underlying these chunks. We study this structural learning by investigating how first learning only horizontal or only vertical shape pairs affects the subsequent learning of both vertical and horizontal pairs (i.e. pairs matching and not matching the initially learned orientation) defined by novel tokens. In four experiments, we show that participants with more explicit knowledge of the pairs immediately generalise structural knowledge: they extract new pairs with the matching orientation better, and they retain this ability after both awake and sleep consolidation. In contrast, participants with more implicit knowledge who have no consolidation show a structural novelty effect, learning new non-matching pairs better. However, after sleep consolidation this pattern reverses, and these participants show generalisation similar to that of the explicit participants. This reversal does not occur after awake consolidation of the same duration; instead, participants show strong proactive interference and learn no new pairs. Our results show that knowledge of the higher-order structure underlying visual chunks is extracted in vision and has different effects depending on the quality of the extracted knowledge. In addition, sleep consolidation facilitates the transformation of memory into structural knowledge.