COCO is a large-scale object detection, segmentation, and captioning dataset. It contains more than 200,000 labeled images, 1.5 million object instances, 80 object categories, 91 stuff categories, and 250,000 people annotated with keypoints. Its key features include object segmentation, recognition in context, and superpixel stuff segmentation.
For more information, please read Microsoft COCO: Common Objects in Context.
Annotations are distributed as JSON files, one per task (object detection, keypoint detection, captioning, stuff segmentation, and panoptic segmentation). Each file is a single JSON object whose top-level fields ("info", "licenses", "images", "annotations", "categories") hold arrays of records.
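As a rough sketch of that structure, the snippet below builds a minimal, hypothetical detection-style annotation object (real files such as instances_train2017.json have the same top-level keys but far more entries and fields) and shows how the "annotations", "images", and "categories" arrays cross-reference each other by id:

```python
import json

# A minimal, toy COCO-style annotation object for object detection.
# The field names mirror the real format; the actual values are made up.
coco = {
    "info": {"description": "toy example", "year": 2017},
    "images": [
        {"id": 1, "file_name": "000000000001.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        # bbox is [x, y, width, height] in pixels; iscrowd=0 marks a single instance
        {"id": 10, "image_id": 1, "category_id": 18, "bbox": [10, 20, 100, 150],
         "area": 15000.0, "iscrowd": 0},
    ],
    "categories": [
        {"id": 18, "name": "dog", "supercategory": "animal"},
    ],
}

# Round-trip through JSON, as you would when loading a downloaded annotation file
data = json.loads(json.dumps(coco))

# Index annotations by image id to find every object labeled in a given image
anns_by_image = {}
for ann in data["annotations"]:
    anns_by_image.setdefault(ann["image_id"], []).append(ann)

cat_names = {c["id"]: c["name"] for c in data["categories"]}
for img in data["images"]:
    for ann in anns_by_image.get(img["id"], []):
        print(img["file_name"], cat_names[ann["category_id"]], ann["bbox"])
```

In practice the same joins are usually done via the pycocotools COCO API rather than by hand; the point here is only to illustrate how the arrays relate.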
The LVIS dataset uses the same annotation format as COCO, so the comprehensive annotation guide written for LVIS, available here, applies equally well.
You can download the dataset here. You can choose between a 2014 and a 2017 version of the dataset.
No official model has been provided for this dataset.
No official benchmarks have been provided for this dataset. However, you can consult the object detection, keypoint detection, stuff segmentation, panoptic segmentation, or captioning leaderboards.
The dataset has associated annual challenges, held from 2015 to 2020; these can be consulted in the Tasks tab on the official website. Each challenge comprises between two and four tasks, such as object detection, keypoint detection, stuff segmentation, and panoptic segmentation.
The dataset is licensed under the CC BY 4.0 license.