Sampling RTB transactions in an online machine learning setting


Bibliographic Details
Main author: Pita, Carlos
Format: Conference object
Language: English
Published: 2016
Subjects:
Online access: http://sedici.unlp.edu.ar/handle/10915/56845
http://45jaiio.sadio.org.ar/sites/default/files/AGRANDA-11.pdf
Contributed by: SEDICI (UNLP), Universidad Nacional de La Plata
Description
Summary: We (the machine learning team at Jampp) strive to predict click-through rates (CTR) and conversion rates (CVR) for the real-time bidding (RTB) online advertising market by means of an in-house online machine learning platform based on a state-of-the-art stochastic gradient descent estimator. Our estimation framework has already been covered in a previous paper, so here we focus on some peripheral aspects of our platform that, despite their somewhat ancillary nature, tend to dominate development effort and overall system complexity. Namely, in order to feed the learning system we first need to sample a very high-volume stream of out-of-order and scattered-in-time events and consolidate them into a sequence of observations representing the underlying market transactions. Each observation is composed of a set of features and a response, from which the estimator is ultimately able to learn. This paper is written in a down-to-earth fashion: we describe a number of particular difficulties that the general problem of sampling in an online high-volume setting poses, and then present our concrete answers to those difficulties, based on real, hands-on experience.
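This record does not describe the paper's actual mechanism, so the following is only a minimal sketch of the general idea in the summary: sample a stream of out-of-order events, then consolidate the events of each transaction into one observation (features plus response). Everything named here is an assumption made for illustration, including the event fields (`transaction_id`, `type`, `timestamp`, `features`), the hash-based sampling rule, the fixed waiting window, and the `Consolidator` class.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def is_sampled(transaction_id: str, rate: float) -> bool:
    """Decide deterministically whether a transaction is in the sample.

    Hashing the id (rather than drawing a random number per event) gives
    every event of a transaction the same decision, whatever order the
    events arrive in.
    """
    digest = hashlib.sha256(transaction_id.encode("utf-8")).digest()
    # Keep 53 bits so the quotient is an exact float in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < rate


@dataclass
class Pending:
    first_seen: float                 # earliest event timestamp seen so far
    features: Optional[dict] = None   # set once the impression arrives
    clicked: bool = False             # response: did a click arrive?


class Consolidator:
    """Hypothetical helper: buffers events per transaction, then emits
    (features, response) pairs once a waiting window has elapsed."""

    def __init__(self, rate: float, window: float):
        self.rate = rate
        self.window = window
        self.pending = {}

    def add(self, event: dict) -> None:
        tid = event["transaction_id"]
        if not is_sampled(tid, self.rate):
            return  # transaction is outside the sample; drop the event
        entry = self.pending.setdefault(tid, Pending(event["timestamp"]))
        entry.first_seen = min(entry.first_seen, event["timestamp"])
        if event["type"] == "impression":
            entry.features = event["features"]
        elif event["type"] == "click":
            entry.clicked = True  # may arrive before the impression

    def flush(self, now: float) -> list:
        """Emit observations for transactions whose window has closed."""
        ready = [tid for tid, e in self.pending.items()
                 if e.first_seen + self.window <= now]
        observations = []
        for tid in ready:
            entry = self.pending.pop(tid)
            if entry.features is not None:  # a click with no impression is discarded
                observations.append((entry.features, int(entry.clicked)))
        return observations
```

The waiting window trades latency for label accuracy: a response that arrives after the window closes is missed and its transaction is emitted as a negative, which is one of the difficulties any such design has to weigh.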