Please upload a txt file in the form of firstname_lastname.txt. Each line will have your answer to the corresponding question, e.g. Q1: 1. Note that there can be multiple correct answers per question, in which case separate your answers with a comma, e.g. Q1: 1, 2, 3. Example file:
Q1: 1
Q2: 1, 2
Q3: 3, 4
...
Upload your answers to this dropbox link by Tuesday, April 28, EOD.
Object detection from an image tackles the task of:
- Predicting whether an object category is present in the image
- Predicting the number of object instances in the image
- Predicting the bounding boxes and object category of all object instances in the image
Semantic segmentation from an image tackles the task of:
- Marking each pixel in the image with foreground vs. background
- Marking each pixel in the image with the object category it belongs to
- Marking each pixel in the image with its depth ordering, from closest to the camera to furthest
Instance segmentation from an image tackles the task of:
- Detecting all objects in the image and predicting their silhouette
- Predicting the transformation of an object silhouette to fit a target mask
- Predicting the instances in an image that could be replaced by other similar shapes
R-CNN trains a CNN to classify object region candidates, which are produced by a bottom-up region proposal pipeline, into C classes.
- True
- False
For an image I with M object candidates, R-CNN will make this number of forward passes with the CNN:
- M/2
- M
- 1
Faster R-CNN trains a region proposal network (RPN), which shares the same network with the object detector:
- True
- False
In Faster R-CNN, the RPN predicts object regions which are:
- Category agnostic
- Category specific
In Faster R-CNN, the RPN places anchors of different aspect ratios and sizes at each location in the image. How many anchors are there per location:
- 3
- 9
- 128
Mask R-CNN introduces an ROIAlign layer which pools features from an input feature map for each proposed region by:
- Rounding the region box coordinates to integer grid locations and then sampling the features using bilinear interpolation
- Sampling features from a region using bilinear interpolation and without any rounding of the box coordinates.
ROIAlign maintains a one-to-one alignment between:
- The color of the predicted object and the object in the image
- The pooled region features and the corresponding features in the image feature map
- The depth ordering of the predicted object and the object in the image
In FCN, the final output of the predicted semantic masks has the same resolution as the input image:
- True
- False
In FCN, the Deconvolution layer (or Transpose Convolution) with stride s results in an output of resolution:
- s times that of the input
- 1/s times that of the input
- +s of the input resolution
Consider the following little net: nn.Conv2d(3, 8, 3, stride=2) -> F.relu -> nn.Conv2d(8, 128, 3, stride=2) -> F.relu -> nn.ConvTranspose2d(128, 16, 3, stride=2). If the input to the net is [B, 3, H, W], then the output shape is:
- [B, 128, H, W]
- [B, 16, H/2, W/2]
- [B, 16, H/4, W/4]
- [B, 128, H/2, W/2]
- [B, 128, H/4, W/4]
- [B, 16, H, W]
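One way to reason about the little net above is to trace the spatial size through each layer by hand. The sketch below is only an illustration, not part of the quiz: it uses the standard output-size formulas for convolution and transposed convolution, assuming the layer defaults of padding 0, dilation 1, and output padding 0. The helper names (`conv2d_out`, `conv_transpose2d_out`, `little_net_shape`) are hypothetical and chosen for this example.

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution (assumes dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose2d_out(size, kernel, stride=1, padding=0, output_padding=0):
    """Spatial output size of a transposed convolution (assumes dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def little_net_shape(b, h, w):
    """Trace an input of shape [b, 3, h, w] through the three layers."""
    dims = (h, w)
    # nn.Conv2d(3, 8, 3, stride=2)
    dims = tuple(conv2d_out(d, 3, stride=2) for d in dims)
    # nn.Conv2d(8, 128, 3, stride=2)
    dims = tuple(conv2d_out(d, 3, stride=2) for d in dims)
    # nn.ConvTranspose2d(128, 16, 3, stride=2); ReLU does not change shape
    dims = tuple(conv_transpose2d_out(d, 3, stride=2) for d in dims)
    # The channel count is set by the last layer's out_channels.
    return (b, 16, *dims)
```

Calling `little_net_shape` with concrete values for B, H, and W lets you compare the result against the listed options. Because no padding is used, the computed sizes are only approximately the idealized fractions of H and W.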
Your next project mandates that you automatically predict how many chairs there are in any unseen image. What do you do?
- You run an FCN model pretrained on COCO
- You run a Mask R-CNN model pretrained on COCO
- Neither of the above
Your next project mandates that you automatically predict how many buffalos there are in any unseen image. What do you do?
- You run an FCN model pretrained on COCO
- You run a Mask R-CNN model pretrained on COCO
- Neither of the above