[Paper Review] A brief Two-stage instance segmentation review



Parkchanmin 2022. 12. 2. 11:47

Instance segmentation methods can be divided into two-stage, one-stage, and multi-stage approaches.

Two stage instance segmentation

Top down instance segmentation
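- Refers to methods that perform instance segmentation on features extracted by the backbone, typically by detecting each object first and then predicting a mask inside its region.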
- R-CNN
  - The R-CNN family is the most widely known example and a typical two-stage approach. A Region Proposal Network (RPN) extracts candidate regions from the features, and a bounding box is predicted for each region. An improved version, Mask R-CNN, adds a mask branch, and Mask Scoring R-CNN further scores each predicted mask by its IoU.
- Contour information
  - Mask R-CNN has no notion of direction, so even a slight change in point of view changes the contour information and the mask becomes inaccurate. MaskLab addresses this by also learning direction.
  - Other proposed methods describe the contour in polar coordinates around an interior centroid to obtain the instance mask, or use Chebyshev polynomials or the snake algorithm to obtain an accurate contour.
- Dense sliding window
  - Representative examples are DeepMask and SharpMask. These generate masks from FCN features; SharpMask improves on DeepMask by feeding in features in a skip-connection-like manner.
- Multi level feature
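  - PANet builds on the networks above by drawing features from a wider range of layers, as shown in the figure attached below.
- Advantages and disadvantages
  - The network is fairly simple yet gives robust performance, and it is easy to apply because only a mask branch has to be added.
  - However, these methods depend heavily on object detection, so if the detector performs poorly, instance segmentation performance drops considerably as well. An object that segmentation could have found may be missed by the detector, which is a drawback.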
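
The mask-scoring idea mentioned under R-CNN above can be illustrated with a small sketch. This is not the implementation of any of the cited networks: the names `mask_iou` and `rescore` are hypothetical, masks are plain nested lists rather than tensors, and multiplying the classification score by the mask IoU is shown only as an assumed, simplified way of folding mask quality into the final score.

```python
from typing import List

Mask = List[List[int]]  # binary mask: 1 = object pixel, 0 = background


def mask_iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks of the same shape."""
    same_rows = len(pred) == len(target)
    if not same_rows or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")
    inter = 0
    union = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks: define the IoU as 0.0 to avoid dividing by zero.
    return inter / union if union else 0.0


def rescore(class_score: float, estimated_mask_iou: float) -> float:
    """Hypothetical helper: scale the class confidence by the mask quality."""
    return class_score * estimated_mask_iou


if __name__ == "__main__":
    pred = [[1, 1, 0],
            [0, 1, 0]]
    target = [[1, 0, 0],
              [0, 1, 1]]
    iou = mask_iou(pred, target)
    print(iou, rescore(0.9, iou))
```

At training time the IoU can be computed against the ground truth as above; at inference time a network would have to estimate it, which is the part this sketch leaves out.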

Bottom up instance segmentation
- The main difference among bottom-up instance segmentation methods is how they perform pixel-level semantic projection and how they aggregate that projection into different object instances.
- These methods distinguish different instances directly at the pixel level.
- They are still two-stage methods, but they take a different approach.
- Object box cropping
  - DeepCRFs combines pixel-level prediction with object detection: semantic segmentation and CRF-based pixel-wise classification produce the instance segmentation, and a CRF is then applied again to obtain the final result.
  - Dynamically Instantiated Network makes this more end-to-end by adding shape term and global term branches.
- Contour information
  - Another approach applies Watershed on top of the semantic segmentation mask (Deep Watershed Transform).
- Pixel center categorization & Depth information
- Clustering based on off-the-shelf semantic segmentation architectures
  - These methods use pixel affinity information as the clustering clue to distinguish object instances.
- Disadvantages
  - First, a robust semantic segmentation backbone is needed to project each pixel into a high-dimensional space. Second, the postprocessing method generalizes poorly and cannot handle complex cases.
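
The aggregation step described above (grouping pixels into instances from a semantic mask) can be sketched in its simplest form. This is only an illustration under a strong assumption: the "affinity" here is a hypothetical binary rule (two 4-adjacent pixels of the same class belong together), whereas the methods above learn their affinities. The function name `group_instances` is made up for this example.

```python
from collections import deque
from typing import List


def group_instances(semantic: List[List[int]], target_class: int) -> List[List[int]]:
    """Label 4-connected regions of `target_class` with ids 1, 2, ...; 0 elsewhere."""
    height = len(semantic)
    width = len(semantic[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    next_id = 0
    for y in range(height):
        for x in range(width):
            if semantic[y][x] != target_class or labels[y][x]:
                continue
            # Unvisited pixel of the target class: start a new instance here.
            next_id += 1
            labels[y][x] = next_id
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and semantic[ny][nx] == target_class and not labels[ny][nx]:
                        labels[ny][nx] = next_id
                        queue.append((ny, nx))
    return labels


if __name__ == "__main__":
    semantic = [[1, 1, 0, 1],
                [0, 0, 0, 1],
                [1, 0, 0, 0]]
    for row in group_instances(semantic, target_class=1):
        print(row)
```

The sketch also shows the weakness listed under the disadvantages: two touching objects of the same class would be merged into one instance, because this fixed postprocessing rule has no way to separate them.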

Multi-stage method

cascade network
- Instance presence -> mask generation -> classification (Multi-task Network Cascades)
- The RPN generates bboxes -> better bboxes are generated from the original feature map and the IoU with the previous bboxes, which improves instance segmentation (Cascade R-CNN)
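
The role of the bbox IoU in the cascade can be sketched as follows. This is a simplified illustration, not the Cascade R-CNN implementation: it only shows how progressively stricter IoU thresholds keep fewer, better-aligned boxes at each stage, and it omits the per-stage box regressors that refine the boxes in between. The thresholds (0.5, 0.6, 0.7) and the name `positives_per_stage` are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 < x2, y1 < y2


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so non-overlapping boxes get an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def positives_per_stage(
    proposals: Sequence[Box],
    gt: Box,
    thresholds: Sequence[float] = (0.5, 0.6, 0.7),  # assumed stage thresholds
) -> List[List[Box]]:
    """For each stage threshold, keep the proposals whose IoU with `gt` passes it."""
    return [[p for p in proposals if box_iou(p, gt) >= t] for t in thresholds]


if __name__ == "__main__":
    gt = (0.0, 0.0, 10.0, 10.0)
    proposals = [
        (0.0, 0.0, 10.0, 10.0),  # IoU 1.00
        (0.0, 0.0, 10.0, 6.5),   # IoU 0.65
        (0.0, 0.0, 10.0, 5.5),   # IoU 0.55
        (0.0, 0.0, 10.0, 4.0),   # IoU 0.40
    ]
    for t, kept in zip((0.5, 0.6, 0.7), positives_per_stage(proposals, gt)):
        print(t, len(kept))
```

In an actual cascade, the boxes passed to a later stage would already have been refined by the previous stage, so more of them would survive the stricter threshold than in this static example.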