Sampling RTB transactions in an online machine learning setting

We (the machine learning team at Jampp) strive to predict click-through rates (CTR) and conversion rates (CVR) for the real-time bidding (RTB) online advertising market by means of an in-house online machine learning platform based on a state-of-the-art stochastic gradient descent estimator. Our est...

Descripción completa

Guardado en:
Detalles Bibliográficos
Autor principal: Pita, Carlos
Formato: Articulo
Lenguaje:Inglés
Publicado: 2017
Materias:
Acceso en línea:http://sedici.unlp.edu.ar/handle/10915/135156
https://publicaciones.sadio.org.ar/index.php/EJS/article/view/20
Aporte de:
Descripción
Sumario:We (the machine learning team at Jampp) strive to predict click-through rates (CTR) and conversion rates (CVR) for the real-time bidding (RTB) online advertising market by means of an in-house online machine learning platform based on a state-of-the-art stochastic gradient descent estimator. Our estimation framework has already been covered in a previous paper, so here we want to focus on some peripheral aspects of our platform that, in spite of being of a somewhat ancillary nature, nevertheless tend to dominate development efforts and overall system complexity; namely, in order to feed the learning system we first need to sample a very high-volume stream of out-of-order and scattered-in-time events and consolidate them into a sequence of observations representing the underlying market transactions, each observation composed of a set of features and a response, from which the estimator is ultimately able to learn. This paper is written in a down-to-earth fashion: we describe a number of particular difficulties the general problem of sampling in an online high-volume setting poses and then we present our concrete answers to those difficulties based on real, hands-on, experience.