nano-JEPA: Democratizing Video Understanding with Personal Computers

The Video Joint Embedding Predictive Architecture (V-JEPA) has shown great promise in self-supervised video representation learning. However, its substantial computational demands, which often necessitate powerful GPU clusters, limit accessibility for many researchers. We introduce nano-JEPA, a streamlin...

Bibliographic Details
Main authors: Rostagno, Adrián; Iparraguirre, Javier; Ermantraut, Joel; Tobio, Lucas; Foissac, Segundo; Aggio, Santiago; Friedrich, Guillermo Rodolfo
Format: Conference object
Language: English
Published: 2024
Subjects:
Online access: http://sedici.unlp.edu.ar/handle/10915/176281
Contributed by:
id I19-R120-10915-176281
record_format dspace
spelling I19-R120-10915-176281
2025-02-07T20:05:00Z
http://sedici.unlp.edu.ar/handle/10915/176281
nano-JEPA: Democratizing Video Understanding with Personal Computers
Rostagno, Adrián
Iparraguirre, Javier
Ermantraut, Joel
Tobio, Lucas
Foissac, Segundo
Aggio, Santiago
Friedrich, Guillermo Rodolfo
2024-10
2024
2025-02-07T16:57:48Z
en
Computer Science
feature prediction
unsupervised learning
visual representations
video
joint-embedding predictive architecture
The Video Joint Embedding Predictive Architecture (V-JEPA) has shown great promise in self-supervised video representation learning. However, its substantial computational demands, which often necessitate powerful GPU clusters, limit accessibility for many researchers. We introduce nano-JEPA, a streamlined adaptation of V-JEPA designed to run efficiently on resource-constrained personal computers, even those with only CPUs. Additionally, we present the nano-datasets repository, facilitating the creation of manageable subsets from large public video datasets. Our work aims to democratize research in this field, enabling broader participation and experimentation with V-JEPA-like models. We demonstrate that nano-JEPA, trained on smaller datasets and less powerful hardware, can still achieve reasonable performance on downstream tasks, opening doors for further exploration and innovation.
Red de Universidades con Carreras en Informática
Conference object
http://creativecommons.org/licenses/by-nc-sa/4.0/
Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
application/pdf
94-103
institution Universidad Nacional de La Plata
institution_str I-19
repository_str R-120
collection SEDICI (UNLP)
language English
topic Computer Science
feature prediction
unsupervised learning
visual representations
video
joint-embedding predictive architecture
description The Video Joint Embedding Predictive Architecture (V-JEPA) has shown great promise in self-supervised video representation learning. However, its substantial computational demands, which often necessitate powerful GPU clusters, limit accessibility for many researchers. We introduce nano-JEPA, a streamlined adaptation of V-JEPA designed to run efficiently on resource-constrained personal computers, even those with only CPUs. Additionally, we present the nano-datasets repository, facilitating the creation of manageable subsets from large public video datasets. Our work aims to democratize research in this field, enabling broader participation and experimentation with V-JEPA-like models. We demonstrate that nano-JEPA, trained on smaller datasets and less powerful hardware, can still achieve reasonable performance on downstream tasks, opening doors for further exploration and innovation.
format Conference object
author Rostagno, Adrián
Iparraguirre, Javier
Ermantraut, Joel
Tobio, Lucas
Foissac, Segundo
Aggio, Santiago
Friedrich, Guillermo Rodolfo
author_sort Rostagno, Adrián
title nano-JEPA: Democratizing Video Understanding with Personal Computers
publishDate 2024
url http://sedici.unlp.edu.ar/handle/10915/176281
_version_ 1845116777788342272