nano-JEPA: Democratizing Video Understanding with Personal Computers



Bibliographic Details
Main authors: Rostagno, Adrián, Iparraguirre, Javier, Ermantraut, Joel, Tobio, Lucas, Foissac, Segundo, Aggio, Santiago, Friedrich, Guillermo Rodolfo
Format: Conference object
Language: English
Published: 2024
Online access: http://sedici.unlp.edu.ar/handle/10915/176281
Description
Summary: The Video Joint Embedding Predictive Architecture (V-JEPA) has shown great promise in self-supervised video representation learning. However, its substantial computational demands, which often necessitate powerful GPU clusters, limit accessibility for many researchers. We introduce nano-JEPA, a streamlined adaptation of V-JEPA designed to run efficiently on resource-constrained personal computers, including machines with only CPUs. Additionally, we present the nano-datasets repository, which facilitates the creation of manageable subsets from large public video datasets. Our work aims to democratize research in this field, enabling broader participation and experimentation with V-JEPA-like models. We demonstrate that nano-JEPA, trained on smaller datasets and modest hardware, can still achieve reasonable performance on downstream tasks, opening doors for further exploration and innovation.
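The summary does not describe how the nano-datasets repository builds its subsets. As a rough illustration of the general idea only, the Python sketch below draws a reproducible, class-balanced subset from a manifest of (video path, label) pairs. The function name `make_subset`, the manifest format, and the example file names are hypothetical and are not taken from the nano-datasets repository.

```python
import random
from collections import defaultdict


def make_subset(manifest, per_class, num_classes=None, seed=0):
    """Return a reproducible, class-balanced subset of (video_path, label) pairs.

    Hypothetical helper, not the nano-datasets API.
    manifest: list of (video_path, label) tuples (assumed format).
    per_class: maximum number of clips kept for each label.
    num_classes: if given, keep only the first num_classes labels in sorted order.
    """
    rng = random.Random(seed)  # fixed seed so the same subset is produced every run
    by_label = defaultdict(list)
    for path, label in manifest:
        by_label[label].append(path)

    labels = sorted(by_label)
    if num_classes is not None:
        labels = labels[:num_classes]

    subset = []
    for label in labels:
        paths = sorted(by_label[label])  # sort first so shuffling is order-independent
        rng.shuffle(paths)
        subset.extend((p, label) for p in paths[:per_class])
    return subset


# Usage with a hypothetical manifest of 3 classes x 10 clips each.
manifest = [
    (f"videos/{c}_{i}.mp4", c)
    for c in ("archery", "bowling", "juggling")
    for i in range(10)
]
subset = make_subset(manifest, per_class=3, num_classes=2)
print(len(subset))
```

Sampling a fixed number of clips per class keeps the label distribution balanced, and the fixed seed makes the subset reproducible across machines, which matters when comparing small-scale runs.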