Infoscience

conference paper

18 GHz Ultraviolet Astrocomb via Chip-Integrated Harmonic Generation

Current machine learning models for vision are often highly specialized and limited to a single modality and task. In contrast, recent large language models exhibit a wide range of capabilities, hinting at a possibility for similarly versatile models in computer vision. In this paper, we take a step in this direction and propose a multimodal training scheme called 4M. It consists of training a single unified Transformer encoder-decoder using a masked modeling objective across a wide range of input/output modalities - including text, images, geometric, and semantic modalities, as well as neural network feature maps. 4M achieves scalability by unifying the representation space of all modalities through mapping them into discrete tokens and performing multimodal masked modeling on a small randomized subset of tokens. 4M leads to models that exhibit several key capabilities: (1) they can perform a diverse set of vision tasks out of the box, (2) they excel when fine-tuned for unseen downstream tasks or new input modalities, and (3) they can function as a generative model that can be conditioned on arbitrary modalities, enabling a wide variety of expressive multimodal editing capabilities with remarkable flexibility. Through experimental analyses, we demonstrate the potential of 4M for training versatile and scalable foundation models for vision tasks, setting the stage for further exploration in multimodal learning for vision and other domains.
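
The abstract describes 4M's core training step: every modality is first quantized into discrete tokens, and a single Transformer encoder-decoder is trained with a masked modeling objective over a small randomized subset of those tokens. The sketch below illustrates that step in PyTorch under stated assumptions; the model sizes, the shared token vocabulary, the three-modality layout, and the visible/target split are illustrative choices, not the authors' implementation.

# Minimal sketch of multimodal masked modeling over discrete tokens
# (module sizes, vocabulary, and masking ratios are assumptions, not the 4M code).
import torch
import torch.nn as nn

class TinyMultimodalMaskedModel(nn.Module):
    def __init__(self, vocab_size=1024, n_modalities=3, d_model=128, max_len=64):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.modality_emb = nn.Embedding(n_modalities, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True,
        )
        self.head = nn.Linear(d_model, vocab_size)

    def embed(self, tokens, modality_ids, positions):
        # Sum of token, modality, and position embeddings identifies each token.
        return (self.token_emb(tokens)
                + self.modality_emb(modality_ids)
                + self.pos_emb(positions))

    def forward(self, src, src_mod, src_pos, tgt, tgt_mod, tgt_pos):
        # The encoder sees only the visible (unmasked) tokens; the decoder
        # reconstructs the masked targets, queried by modality and position.
        memory_in = self.embed(src, src_mod, src_pos)
        query_in = self.embed(torch.zeros_like(tgt), tgt_mod, tgt_pos)
        out = self.transformer(memory_in, query_in)
        return self.head(out)

# A fake tokenized sample: three modalities (e.g. image, depth, and text tokens),
# each already quantized into a shared discrete vocabulary of 1024 codes.
torch.manual_seed(0)
tokens = torch.randint(0, 1024, (1, 48))                          # 48 tokens total
modality_ids = torch.repeat_interleave(torch.arange(3), 16).unsqueeze(0)
positions = torch.arange(48).unsqueeze(0)

# Randomly keep a small subset as encoder input and predict another subset,
# mirroring the "small randomized subset of tokens" in the abstract.
perm = torch.randperm(48)
src_idx, tgt_idx = perm[:12], perm[12:24]                         # assumed 25%/25% split

model = TinyMultimodalMaskedModel()
logits = model(tokens[:, src_idx], modality_ids[:, src_idx], positions[:, src_idx],
               tokens[:, tgt_idx], modality_ids[:, tgt_idx], positions[:, tgt_idx])
loss = nn.functional.cross_entropy(logits.transpose(1, 2), tokens[:, tgt_idx])
loss.backward()  # one masked-modeling training step

In the scheme described above, conditioning on tokens from chosen modalities rather than a random subset is what enables the multimodal generation capability mentioned in the abstract.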

Details
Type
conference paper
Scopus ID

2-s2.0-85214922084

Author(s)
Ludwig, Markus • Ayhan, Furkan • Voumard, Thibault • Wildi, Thibault • Gaafar, Mahmoud A. • Grassani, Davide • Obrzud, Ewelina • Schmidt, Tobias • Bouchy, François • Villanueva, Luis Guillermo • …
Editors
Oh, A. • Neumann, T. • Globerson, A. • Saenko, K. • Hardt, M. • Levine, S.
Corporate authors
the EUROfusion Tokamak Exploitation Team
Date Issued

2023

Publisher

Optical Society of America

Published in
European Quantum Electronics Conference, EQEC 2023 in Proceedings Conference on Lasers and Electro-Optics/Europe, CLEO/Europe 2023 and European Quantum Electronics Conference EQEC 2023, Part of Conference on Lasers and Electro-Optics/Europe, CLEO/Europe 2023 and European Quantum Electronics Conference, EQEC 2023
ISBN of the book

9781713884156

Book part number

5

Series title/Series vol.

Advances in Neural Information Processing Systems; 36

ISSN (of the series)

1049-5258

Article Number

624

Start page

58363

End page

58408

Peer reviewed

REVIEWED

Written at

EPFL

EPFL units
NEMS  
Event name
European Quantum Electronics Conference
Event place
Munich, Germany
Event date
2023-06-26 - 2023-06-30

Funder / Funding(s) / Grant Number / Grant URL
Hanlin Goh and Elmira Amirloo Abolfathi

Available on Infoscience
January 26, 2025
Use this identifier to reference this record
https://infoscience.epfl.ch/handle/20.500.14299/244957

Infoscience is a service managed and provided by the Library and IT Services of EPFL. © EPFL, all rights reserved.