This assignment is due on May 7, 2021 at 11:59 pm (UTC+8).

Please download the training set (images and labels) and the testing set (images only) from Google Drive.

Download the data from Tsinghua Cloud if you can't open the link above.

1. Introduction to the competition

This mini-competition consists of two tasks: object detection and pedestrian detection. You should train a separate model for each task.

The goal of the object detection task is to recognize objects from a number of visual object classes in street scenes. Detecting foreground objects is among the most critical requirements for self-driving applications. Six of the most common object classes have been selected: 1) car; 2) person; 3) Van; 4) Cyclist; 5) Tram; 6) Truck.

The goal of the pedestrian detection task is to design a detection model that is robust in crowded scenes. In crowds, people occlude one another with large overlaps, which makes detecting each individual considerably harder.
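To see why crowds are hard, consider how much two neighbouring pedestrians overlap. The plain-Python sketch below computes the intersection-over-union (IoU) of two made-up boxes. At 0.6, the IoU exceeds the 0.5 threshold that is commonly used for non-maximum suppression (NMS). As a result, standard NMS would discard one of two correct detections.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


# Hypothetical example: two 40x100 px pedestrians, the second shifted 10 px to the right.
person_a = (0, 0, 40, 100)
person_b = (10, 0, 50, 100)

nms_thresh = 0.5  # a commonly used NMS IoU threshold (assumption; check your config)
overlap = iou(person_a, person_b)
suppressed = overlap > nms_thresh  # True: NMS would drop the lower-scoring person

print(overlap)  # 0.6
```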

Requirements of Assignment 1:

  • If you use a model ensemble, the number of different models is limited to 3.
  • No external datasets other than CUHK-SYSU are allowed.
  • You can submit at most twice per day.

Final Report Submission:

  • Describe how you converted the CUHK-SYSU dataset.
  • Describe your data augmentation.
  • Describe your network architecture and training method.
  • Describe other strategies you used to improve performance.

2. Getting Started

Install MMDetection from its GitHub repository.

Download the datasets. They follow the PASCAL VOC 2012 format. A sample config file is also available on Google Drive.
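Because the data follow the PASCAL VOC layout, the built-in `VOCDataset` can load them once the class list is overridden. Below is a minimal sketch of the relevant fields in an MMDetection 2.x-style config; a config file is ordinary Python. The directory layout, split file name, and exact class strings are assumptions. Copy the class names from the `<name>` tags in your XML files and compare the result against the sample config.

```python
# MMDetection 2.x-style config fragment (plain Python dicts).
# Assumptions: data_root, the split file name, and the exact class strings are hypothetical.
dataset_type = 'VOCDataset'
data_root = 'data/VOCdevkit/VOC2012/'  # hypothetical location of the unpacked training set

# Class names as listed in the task description; they must match the <name> tags exactly.
classes = ('car', 'person', 'Van', 'Cyclist', 'Tram', 'Truck')

data = dict(
    samples_per_gpu=2,  # images per GPU
    workers_per_gpu=2,  # data-loading workers per GPU
    train=dict(
        type=dataset_type,
        classes=classes,  # overrides the 20 default VOC classes
        ann_file=data_root + 'ImageSets/Main/train.txt',  # one image id per line
        img_prefix=data_root))  # directory containing Annotations/ and JPEGImages/

# The box head must predict exactly as many classes as the dataset defines.
model = dict(roi_head=dict(bbox_head=dict(num_classes=len(classes))))
```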

Dataset Statistics
Task                    Training images   Testing images
Object Detection        3000              1500
Pedestrian Detection    3000              1500

3. Add Customized Dataset

Some competitions allow participants to use external data to improve a model's generalization and reduce over-fitting. In this competition, we provide an additional pedestrian detection dataset, CUHK-SYSU, which you can use to extend your training set.

Please refer to the MMDetection documentation and the README.txt included with the datasets.
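One possible conversion writes each CUHK-SYSU image's pedestrian boxes into a VOC-style XML file under `Annotations/`. The sketch below uses only the standard library. It assumes you have already read each image's boxes from the CUHK-SYSU annotation files as `(x, y, w, h)` tuples in pixels, so check README.txt for the actual format. The function name `to_voc_xml` is hypothetical. The `person` label is also an assumption and must match the label used in the pedestrian training set.

```python
import xml.etree.ElementTree as ET


def to_voc_xml(filename, width, height, boxes_xywh, label="person"):
    """Build a VOC-style annotation string from (x, y, w, h) pixel boxes (hypothetical helper)."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"
    for x, y, w, h in boxes_xywh:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        ET.SubElement(obj, "difficult").text = "0"
        bnd = ET.SubElement(obj, "bndbox")
        # Convert (x, y, w, h) to corner coordinates and clip them to the image.
        x1, y1 = max(0, x), max(0, y)
        x2, y2 = min(width, x + w), min(height, y + h)
        for tag, val in (("xmin", x1), ("ymin", y1), ("xmax", x2), ("ymax", y2)):
            ET.SubElement(bnd, tag).text = str(int(round(val)))
    return ET.tostring(root, encoding="unicode")
```

Write each returned string to `Annotations/<image id>.xml`. Then add the image id to the split file that your config points to.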

4. Data Analysis and Data Augmentation

Are there any problems in the data? Possible answers: the long-tail problem, the occlusion problem, the multi-scale problem, etc.
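A quick way to check for a long-tail or multi-scale problem is to count objects per class and per size bucket. The sketch below parses VOC XML annotations with the standard library. The small/medium/large area thresholds (32² and 96² pixels) follow the COCO convention and are only one reasonable choice. The directory path in the usage comment is hypothetical.

```python
import glob
import os
import xml.etree.ElementTree as ET
from collections import Counter


def analyze(xml_docs):
    """Count objects per class and per COCO-style size bucket in VOC XML documents."""
    class_counts = Counter()
    scale_counts = Counter()
    for doc in xml_docs:
        root = ET.fromstring(doc)
        for obj in root.iter("object"):
            class_counts[obj.findtext("name")] += 1
            box = obj.find("bndbox")
            w = float(box.findtext("xmax")) - float(box.findtext("xmin"))
            h = float(box.findtext("ymax")) - float(box.findtext("ymin"))
            area = w * h
            if area < 32 ** 2:
                scale_counts["small"] += 1
            elif area < 96 ** 2:
                scale_counts["medium"] += 1
            else:
                scale_counts["large"] += 1
    return class_counts, scale_counts


def load_annotations(ann_dir):
    """Read every .xml file in ann_dir as bytes (lets the parser honour the XML encoding)."""
    docs = []
    for path in sorted(glob.glob(os.path.join(ann_dir, "*.xml"))):
        with open(path, "rb") as f:
            docs.append(f.read())
    return docs


# Example (hypothetical path):
# classes, scales = analyze(load_annotations("data/VOCdevkit/VOC2012/Annotations"))
```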

Some data augmentation methods can help alleviate these problems. For example, random erasing (paper link) can improve a model's robustness to occluded objects. Please analyze the characteristics of the datasets and implement suitable data augmentations.
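Random erasing picks a random rectangle in the image and fills it with random pixel values, which simulates occlusion. Below is a minimal NumPy sketch of the idea. It uses the paper's default hyper-parameters: erasing probability 0.5, erased area between 0.02 and 0.4 of the image, and aspect ratio between 0.3 and 1/0.3. It works on an H x W x C `uint8` array and leaves bounding boxes untouched. Wiring it into an MMDetection pipeline, for example as a custom transform, is left to you.

```python
import numpy as np


def random_erasing(img, rng, p=0.5, area_range=(0.02, 0.4), min_aspect=0.3, max_attempts=100):
    """Random Erasing on an H x W (x C) uint8 image; returns a new array or `img` itself.

    p: probability of erasing; area_range: erased area as a fraction of the image;
    aspect ratios are drawn uniformly from [min_aspect, 1 / min_aspect].
    """
    if rng.random() >= p:
        return img  # no erasing this time
    h, w = img.shape[:2]
    for _ in range(max_attempts):
        target_area = rng.uniform(*area_range) * h * w
        aspect = rng.uniform(min_aspect, 1.0 / min_aspect)
        eh = int(round(np.sqrt(target_area * aspect)))
        ew = int(round(np.sqrt(target_area / aspect)))
        if 0 < eh < h and 0 < ew < w:  # the rectangle must fit inside the image
            y = int(rng.integers(0, h - eh + 1))
            x = int(rng.integers(0, w - ew + 1))
            out = img.copy()
            # Fill the rectangle with uniform random pixel values.
            out[y:y + eh, x:x + ew] = rng.integers(0, 256, size=(eh, ew) + img.shape[2:], dtype=np.uint8)
            return out
    return img  # no valid rectangle found


# Usage: rng = np.random.default_rng(0); augmented = random_erasing(image, rng)
```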

5. Model Design and Optimizer Settings

MMDetection categorizes model components into 5 types:

  • backbone: usually a fully convolutional network (FCN) that extracts feature maps, e.g., ResNet, MobileNet.
  • neck: the component between backbones and heads, e.g., FPN, PAFPN.
  • head: the component for specific tasks, e.g., bbox prediction and mask prediction.
  • roi extractor: the part for extracting RoI features from feature maps, e.g., RoI Align.
  • loss: the component in head for calculating losses, e.g., FocalLoss, L1Loss, and GHMLoss.
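Focal loss is one example of the loss component. It down-weights easy examples so that training focuses on hard ones: FL(p_t) = -α_t (1 - p_t)^γ log(p_t). The NumPy sketch below implements the sigmoid (binary) form with the common defaults α = 0.25 and γ = 2. It only illustrates the formula and is not MMDetection's `FocalLoss` implementation.

```python
import numpy as np


def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Element-wise binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p = 1.0 / (1.0 + np.exp(-logits))        # predicted probability of the positive class
    p_t = np.where(targets == 1, p, 1.0 - p)  # probability assigned to the true label
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(np.clip(p_t, 1e-12, 1.0))


logits = np.array([-4.0, 2.0])  # an easy negative and a hard negative
targets = np.array([0, 0])
ce = np.log1p(np.exp(logits))   # binary cross-entropy for target 0
fl = sigmoid_focal_loss(logits, targets)
ratio = fl / ce                 # much smaller for the easy negative than for the hard one
```

With γ = 0 and α = 0.5, the expression reduces to half the ordinary binary cross-entropy. This makes a handy sanity check.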

In this competition, we provide the results of a Faster R-CNN baseline trained with the default config file. You should improve at least one of a) backbone + neck, b) RoI extractor, or c) loss functions to achieve better performance or shorter inference time than the baseline. We also encourage you to try new detection paradigms, such as OneNet (paper link) and Sparse R-CNN (paper link).
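As a starting point for options a) and c), the MMDetection 2.x-style override config below makes two changes: it swaps the FPN neck for PAFPN, and it replaces the L1 box-regression loss with GIoU loss. Both modules ship with MMDetection 2.x. The `_base_` file name is hypothetical, so point it at your baseline config. The channel list assumes a ResNet-50 backbone as in the default config. The learning rate is scaled linearly from the default schedule's total batch size of 16, and you should adjust it for your own setup.

```python
# Hypothetical override config in MMDetection 2.x style (config files are plain Python).
# Assumption: the _base_ path is a placeholder for your baseline Faster R-CNN config.
_base_ = './faster_rcnn_r50_fpn_baseline.py'

model = dict(
    # a) backbone + neck: replace FPN with PAFPN (adds a bottom-up path aggregation branch)
    neck=dict(
        type='PAFPN',
        in_channels=[256, 512, 1024, 2048],  # ResNet-50 stage outputs (assumed backbone)
        out_channels=256,
        num_outs=5),
    roi_head=dict(
        bbox_head=dict(
            # c) loss: GIoU loss compares boxes directly, so regress decoded boxes
            reg_decoded_bbox=True,
            loss_bbox=dict(type='GIoULoss', loss_weight=10.0))))

# Default schedule: lr=0.02 for a total batch of 16 (8 GPUs x 2 images).
# Scaled linearly here for 1 GPU x 2 images.
optimizer = dict(type='SGD', lr=0.02 * 2 / 16, momentum=0.9, weight_decay=0.0001)
```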

6. Model Ensemble

Model ensembling is an effective technique for improving final performance on machine learning tasks. An ensemble consists of a concrete, finite set of alternative models. Usually, the more the member models differ from one another, the better the ensemble performs.

Please investigate and implement model ensembling or other techniques commonly used in competitions.
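A simple ensembling baseline for detection works in two steps. First, pool the boxes that several models predict for the same image and class. Then run NMS on the pooled set. The plain-Python sketch below does this with optional per-model score weights, and its function names are hypothetical. Common alternatives are Weighted Boxes Fusion, which averages overlapping boxes, and Soft-NMS, which down-weights them instead of discarding them.

```python
def iou(a, b):
    """IoU of two boxes; only the first four fields (x1, y1, x2, y2) are used."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(dets, iou_thr=0.5):
    """Greedy NMS over (x1, y1, x2, y2, score) tuples; returns kept boxes, highest score first."""
    keep = []
    for d in sorted(dets, key=lambda d: d[4], reverse=True):
        if all(iou(d, k) <= iou_thr for k in keep):
            keep.append(d)
    return keep


def ensemble_nms(per_model_dets, weights=None, iou_thr=0.5):
    """Pool one image's detections for ONE class from several models, then run NMS."""
    if weights is None:
        weights = [1.0] * len(per_model_dets)
    pooled = [
        (x1, y1, x2, y2, score * w)
        for dets, w in zip(per_model_dets, weights)
        for (x1, y1, x2, y2, score) in dets
    ]
    return nms(pooled, iou_thr)
```

Run it separately for each class and each test image. Weighting the stronger models higher lets their boxes win ties.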

7. Submitting your work

Your final grade depends on two aspects:

  • You need to submit detection results on the provided testing set. We will evaluate your model's performance (AP and recall).
  • You need to submit your final code and the corresponding report.
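To check your results before submitting, you can evaluate on a held-out part of the training set. Below is a sketch of PASCAL VOC-style average precision with all-point interpolation for one class. It assumes each detection has already been matched to the ground truth (typically at IoU ≥ 0.5) and marked as a true or false positive. The exact evaluation protocol for this competition is not specified here, so treat the resulting numbers as an estimate.

```python
def average_precision(detections, num_gt):
    """VOC-style AP (all-point interpolation) and final recall for one class.

    detections: list of (score, is_true_positive) pairs, already matched to ground truth.
    num_gt: number of ground-truth boxes of this class.
    """
    if num_gt == 0:
        return 0.0, 0.0
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in sorted(detections, key=lambda d: d[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Pad the curve, then make precision monotonically non-increasing from right to left.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Area under the interpolated precision-recall curve.
    ap = sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))
    return ap, (recalls[-1] if recalls else 0.0)
```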