nano-JEPA: Democratizing Video Understanding with Personal Computers
| Main authors: | |
|---|---|
| Format: | Conference object |
| Language: | English |
| Published: | 2024 |
| Subjects: | |
| Online access: | http://sedici.unlp.edu.ar/handle/10915/176281 |
| Contributed by: | |
| Summary: | The Video Joint Embedding Predictive Architecture (V-JEPA) has shown great promise in self-supervised video representation learning. However, its substantial computational demands, which often necessitate powerful GPU clusters, limit accessibility for many researchers. We introduce nano-JEPA, a streamlined adaptation of V-JEPA designed to run efficiently on resource-constrained personal computers, even those with only CPUs. Additionally, we present the nano-datasets repository, which facilitates the creation of manageable subsets from large public video datasets. Our work aims to democratize research in this field, enabling broader participation and experimentation with V-JEPA-like models. We demonstrate that nano-JEPA, trained on smaller datasets and more modest hardware, can still achieve reasonable performance on downstream tasks, opening doors for further exploration and innovation. |