Recently, we provided evidence that, like simple visual stimuli such as Gabor patches, rich episodic stimuli are encoded in and recalled from long-term memory together with their subjective uncertainty, indicating a probabilistic representation of memory details. However, it is unknown how this probabilistic representation and episodic recall accuracy are affected at various input (set) sizes, in situations with added underlying regularities (a compact distribution of possible orientations), and when subjects’ attention levels vary. To address these questions, we conducted multiple memory experiments (N = 180) in which participants first viewed a varying number of individually presented oriented objects and later had to recall the objects and their orientations together with their subjective uncertainty. Probabilistic encoding was indicated by calibratedness, the degree of correlation between memory accuracy and subjective uncertainty. We found that at smaller set sizes, added orientation regularity significantly improved episodic recall, while increased attention modulated the critical set size at which this effect appeared. At larger set sizes, calibratedness became biased at lower levels of certainty but disappeared completely only when subjects failed to recall the stimulus. As with accuracy, high attention also modulated the critical set size at which the bias in calibratedness emerged. Importantly, objects recalled with the highest accuracy remained unaffected by the underlying input structure, in both reported orientation and calibratedness. Our results demonstrate that people successfully extract underlying input regularities with both low and high attention during object viewing, although in each case they seem to prioritize item-based encoding and use the underlying structure as a guide.
Further, calibratedness remains high whenever object memory is vivid and disappears only when subjects cannot remember anything about the individual objects, suggesting that probabilistic encoding is the default form of representing the details of long-term episodic memories, regardless of attention, set size, and input characteristics.
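
To make the calibratedness measure concrete, the following is a minimal sketch of how a correlation between recall error and reported uncertainty could be computed for one participant. It is an illustration under stated assumptions, not the analysis used in the experiments: it assumes orientations are given in degrees on a full 360° circle, that uncertainty is a numeric rating where larger values mean less certainty, and that a Pearson correlation is an acceptable summary. The function names `angular_error`, `pearson`, and `calibratedness` are hypothetical.

```python
import math


def angular_error(reported_deg, true_deg):
    """Absolute angular difference in degrees, wrapped to the range [0, 180]."""
    diff = (reported_deg - true_deg + 180.0) % 360.0 - 180.0
    return abs(diff)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def calibratedness(reported_deg, true_deg, uncertainty):
    """Correlation between absolute recall error and reported uncertainty.

    Assumption: higher uncertainty ratings should accompany larger errors,
    so a value near 1 indicates well-calibrated uncertainty reports.
    """
    errors = [angular_error(r, t) for r, t in zip(reported_deg, true_deg)]
    return pearson(errors, uncertainty)


# Hypothetical trials: reported orientation, true orientation, uncertainty rating.
reported = [10.0, 350.0, 95.0, 200.0]
true = [0.0, 10.0, 90.0, 120.0]
ratings = [2.0, 3.0, 1.0, 5.0]
score = calibratedness(reported, true, ratings)
```

Under these assumptions, a score near zero would correspond to the case described above in which calibratedness disappears, for example when the stimulus is not recalled at all.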