The emergence in the brain of the concept that defines a discrete object is a fundamental yet poorly understood process. In two statistical learning experiments, we show that humans can form such abstract object concepts from purely visual statistics or from physical interactions, and that these concepts nevertheless generalize across the two modalities. Participants saw a sequence of visual scenes composed of multiple objects. Each object consisted of abstract shapes, and object identity was defined only by the shape co-occurrences across scenes (Exp. 1) or by the physical effort required to pull the scene apart (Exp. 2). In Experiment 1, observers learned the visual statistical contingencies across the scenes (measured with a visual familiarity test), and this knowledge also generalized to their judgments of how they would pull apart novel scenes (in a pulling-apart-object test). In Experiment 2, participants learned the haptic statistics across scenes: in the pulling-apart-object test, they preferred the easier pulling directions defined by the underlying object boundaries. Moreover, this haptic learning also biased participants’ judgments in the purely visual familiarity test. Thus, objects can be extracted solely from visual or haptic statistics while still retaining an integrated quality that allows generalization across modalities, a hallmark of object-like representations.