110k categories of densely packed items in stores
SKU-110K images were collected from thousands of supermarket stores around the world, including locations in the United States, Europe, and East Asia. Dozens of paid associates acquired our images using their personal cellphone cameras. Images were originally taken at a resolution of no less than five megapixels but were then JPEG-compressed at one megapixel. Otherwise, phone and camera models were neither regulated nor documented. Image quality and view settings were also unregulated, so our images represent different scales, viewing angles, lighting conditions, noise levels, and other sources of variability. Bounding box annotations were provided by skilled annotators. We chose experienced annotators over unskilled Mechanical Turkers, as we found that the boxes obtained this way were more accurate and did not require voting schemes to verify correct annotations. We did, however, visually inspect each image along with its detection labels to filter out obvious localization errors.
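The text does not say how the reduction from five or more megapixels to one megapixel was carried out. As an illustration only, the arithmetic of such a reduction can be sketched as below, assuming a uniform rescale that preserves the aspect ratio. The function name `one_megapixel_size` and the `target_pixels` parameter are hypothetical, and the actual pipeline may have differed.

```python
import math


def one_megapixel_size(width, height, target_pixels=1_000_000):
    """Return (new_width, new_height) holding roughly `target_pixels` pixels.

    Assumption: a uniform rescale that keeps the aspect ratio. Images that
    are already at or below the target are returned unchanged.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    pixels = width * height
    if pixels <= target_pixels:
        return width, height

    # Pixel count scales with the square of the linear scale factor,
    # so the linear factor is the square root of the pixel ratio.
    scale = math.sqrt(target_pixels / pixels)
    new_width = max(1, round(width * scale))
    new_height = max(1, round(height * scale))
    return new_width, new_height


# A typical 4:3 five-megapixel capture (2592 x 1944).
print(one_megapixel_size(2592, 1944))  # (1155, 866)
```

Under this assumption, a 2592 x 1944 capture shrinks to about 45% of its original width and height, which is worth keeping in mind when judging how small individual items appear in the released images.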