Infoscience

doctoral thesis

Fast and Future: Towards Efficient Forecasting in Video Semantic Segmentation

Courdier, Evann Pierre Guy  
2024

Deep learning has revolutionized the field of computer vision, a success largely attributable to the growing size of models, datasets, and computational power. At the same time, a critical pain point arises: many computer vision applications are deployed on low-power embedded devices and must run in real time. The challenge is greater still for semantic segmentation, a dense prediction task that demands substantial memory and computational resources. This thesis explores techniques to streamline real-time segmentation networks, improve their efficiency, and handle potential ambiguity in their predictions.

First, we introduce a latency-aware segmentation metric that combines the mean Intersection over Union with the network's processing time, providing a practical measure for applied settings. We emphasize the concept of "anticipation" in real-time networks: these systems should be capable of predicting the segmentation of future inputs. We therefore design an anticipatory convolutional network built around a novel convolution layer that reduces computation by reusing features computed for previous video frames, exploiting their temporal coherence. Next, we present "patch-pausing", a method to accelerate transformer-based segmentation networks. This technique halts the processing of image patches deemed to be already correctly segmented, as judged by the network's confidence in its prediction. Our experimental results indicate that more than half of the patches can be paused early in the process with minimal impact on segmentation accuracy. The thesis concludes with a discrete diffusion model for segmentation, which allows sampling multiple potential segmentations for a given input while accurately following the training data distribution. Embedding this diffusion model in an autoregressive scheme, we demonstrate its capacity to generate long-term future segmentation predictions.

The implementation and evaluation of these approaches contribute to the ongoing efforts to improve real-time segmentation networks and facilitate more efficient deployment of computer vision applications on low-power devices.
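
The confidence test behind patch-pausing, as the abstract describes it, can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the use of the maximum softmax probability as the confidence score, and the 0.95 threshold are all assumptions made for illustration, and the actual pausing criterion may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_active_patches(patch_logits, active, threshold=0.95):
    """Return the patch indices that should keep being processed.

    patch_logits: dict mapping patch index -> class logits taken from an
        intermediate prediction (hypothetical input format).
    active: patch indices still being processed.
    threshold: assumed confidence level above which a patch is paused.
    """
    still_active = []
    for idx in active:
        # Assumption: confidence is the highest class probability.
        confidence = max(softmax(patch_logits[idx]))
        if confidence < threshold:
            still_active.append(idx)
    return still_active


# Toy intermediate predictions for four patches and three classes.
example_logits = {
    0: [8.0, 0.0, 0.0],  # confident, so it is paused
    1: [1.0, 0.9, 0.8],  # ambiguous, so it stays active
    2: [0.0, 6.0, 0.0],  # confident, so it is paused
    3: [2.0, 1.5, 0.0],  # ambiguous, so it stays active
}
remaining = select_active_patches(example_logits, active=[0, 1, 2, 3])
print(remaining)
```

In this toy example only the two ambiguous patches would be passed on to later layers, which is where the computational saving would come from.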

Name: EPFL_TH9858.pdf
Type: N/a
Access type: openaccess
License Condition: copyright
Size: 35.19 MB
Format: Adobe PDF
Checksum (MD5): 1b0f9217c67d8b7102f3f31889c20cf4


Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved