Publication: Design and development of a worldwide-scale measurement methodology and its application in network measurements and online advertising auditing
Publication date
2020-09
Defense date
2020-09-08
Abstract
Online advertising has evolved into a key component of the Internet as we know it today. It is a highly complex ecosystem that manages to reach billions of users in a short period of time. It has global coverage and can reach specific audiences based on demographic, geographic, and behavioral aspects. The capabilities offered by the online advertising ecosystem have opened a new era in research that has attracted the interest of the scientific community.
This thesis takes advantage of the nature of online advertising and builds a novel methodology capable of inserting JavaScript code into an ad, which runs every time the ad is displayed on a user's device. This methodology opens up new opportunities for measurement. Specifically, the methodology is applied to two different purposes in this thesis: (1) performing network measurements from the end-user perspective, and (2) auditing the transparency of the online advertising ecosystem from the advertisers' perspective.
In the context of Internet measurements, the methodology is implemented in a solution called AdTag. Its design (including technical, deployment, and economic factors) and its potential to analyze a wide range of aspects of Internet connectivity from the browser are discussed and evaluated. Several experiments demonstrate AdTag's ability to reach millions of nodes in a short period of time. The possibility of selecting measurement nodes according to their geographic location is also demonstrated. In this thesis, we show the usefulness of AdTag for network measurements in two specific use cases.
First, we study the DNS infrastructure, one of the most critical systems on the Internet. Our analysis addresses questions such as understanding the actual DNS infrastructure configured by ISPs and understanding the DNS choices of end users, whether they use their ISPs' private resolvers or configure third-party DNS resolvers to improve security and web performance. Leveraging the scale offered by the online advertising ecosystem, two advertising campaigns were launched that obtained more than 3 million DNS resolutions, which allow the identification and study of more than 76k DNS resolvers covering more than 25k ASes in 178 countries. Analysis of the data provides new insights into the DNS infrastructure, such as user preferences regarding third-party providers. Our results indicate that 13% of users use third-party DNS providers (such as Google, OpenDNS, Level 3, and Cloudflare). In addition, this research finds that many ISPs providing both mobile and fixed access networks separate the DNS infrastructure serving each type of access network.
The second use case analyzes the browser market landscape through active measurements. We leverage AdTag to develop an active measurement platform that obtains the brand and version of the device receiving the ad. We show that the sample obtained with our methodology is very similar to that offered by state-of-the-art techniques based on passive measurements. However, our solution has some advantages over passive solutions: the ability to carry out geographically and demographically targeted measurements, and its accessibility to a wider group of scientists and practitioners. The performance, accuracy, and capabilities of this methodology are analyzed through real experiments that, in total, produced more than 6M measurements.
The lack of transparency in the online advertising ecosystem motivates the second part of this thesis. In particular, we have developed Q-Tag, a novel methodology for auditing the quality metrics of online advertising so that advertisers can obtain reliable information about the real performance of their advertising campaigns.
The first version of Q-Tag was deployed on Google AdWords. The results reveal that AdWords appears to provide incomplete information to advertisers. In particular, they show that: (i) AdWords did not report 57% of the publishers on which ad impressions from our campaigns were shown, (ii) AdWords reports a large fraction of impressions as contextually significant based on (undisclosed) criteria other than the publishers' theme, (iii) a higher CPM investment does not lead to impressions being delivered to more popular publishers, (iv) AdWords does not offer default control over the frequency cap (the limit of impressions per user), and (v) about 10% of the ad impressions in two of the campaigns were delivered to IPs belonging to data centers.
The second version of Q-Tag was developed to measure the viewability metric. This standard metric is used to assess whether or not an ad impression was viewed by a user. Q-Tag has been deployed in production by a Demand Side Platform (DSP) to measure the viewability rate of advertising campaigns.
Taking advantage of this DSP's infrastructure, the performance of Q-Tag has been compared with that of a commercial solution. Both techniques report a similar overall viewability rate of 50% (that is, 50% of the impressions meet the viewability standard and are therefore considered viewed). However, Q-Tag is able to measure the viewability metric in 93% of the ads served by the DSP, compared with 74% of the ads measured by the commercial solution.
In summary, the research carried out in this thesis shows the potential of the large-scale, ad-based measurement methodology, which offers a range of possibilities beyond those presented here. The methodology can unravel different aspects of Internet infrastructure and performance from the end-user perspective, and it provides an independent tool for advertisers to measure the quality of their advertising campaigns.
Online advertising has evolved into a key component of the Internet we know today. It is a very complex ecosystem that manages to reach billions of users in a short period of time. It has global coverage, and it is able to target specific audiences based on demographic, geographic, and behavioral aspects. The capabilities offered by the online advertising ecosystem have opened a new era in research that has attracted the interest of the scientific community.
This thesis leverages the nature of online advertising and builds a novel methodology capable of inserting JavaScript code into an ad that runs every time the ad is displayed on a user's device. This methodology opens up new measurement opportunities. Specifically, the methodology is applied for two different purposes in this thesis: (1) performing network measurements from the end-user perspective, and (2) auditing the transparency of the online advertising ecosystem from the advertisers' perspective.
In the context of Internet measurements, this methodology is implemented in a solution referred to as AdTag. Its design (including technical, deployability, and economic factors) and its potential to analyze a wide range of aspects of Internet connectivity from the browser are discussed and evaluated. Several experiments are performed that prove the ability of AdTag to reach millions of nodes in a short period of time. Furthermore, the possibility of selecting the measurement nodes based on their geographical location is also demonstrated. In this thesis, we showcase the utility of AdTag to conduct network measurements in two specific use cases.
First, we study the DNS infrastructure, one of the most critical Internet systems. Our analysis addresses issues such as grasping the real DNS infrastructure configured by the ISPs and understanding end users' DNS choices, whether they use their ISPs' private resolvers or set up third-party DNS resolvers to improve security and web performance. Harnessing the scale offered by the online advertising ecosystem, two ad campaigns have been launched, triggering more than 3M DNS lookups, which allow the identification and study of more than 76k recursive DNS resolvers supporting more than 25k eyeball ASes in 178 countries. The data analysis provides new insights into the DNS infrastructure, such as user preferences towards third parties. Our results indicate that 13% of users use third-party DNS providers (such as Google, OpenDNS, Level 3, and Cloudflare). In addition, this research detects that many ISPs providing both mobile and fixed access networks have decided to separate the DNS infrastructure that serves each access technology type.
The second use case consists of analyzing the browser market landscape with active measurements. We leverage AdTag to develop an active measurement platform to obtain the brand and the version of the device receiving the ad. We show that the landscape picture obtained with our methodology is very similar to that offered by state-of-the-art techniques based on passive measurements. However, our solution presents some advantages over passive solutions: the ability to conduct geographically and demographically targeted measurements and its accessibility to a larger group of scientists and practitioners. The performance, accuracy, and capabilities of this methodology are analyzed through real experiments that, in total, produced more than 6M measurements.
The lack of transparency in the online advertising ecosystem motivates the second part of this thesis. In particular, we have developed Q-Tag, a novel methodology that serves to audit reported quality metrics so that advertisers can obtain trustworthy information about the real performance of their advertising campaigns.
The first version of Q-Tag was deployed in Google AdWords. The results reveal that AdWords seems to provide incomplete information to advertisers. In particular, they show that: (i) AdWords did not report 57% of the publishers where ad impressions from our campaigns were delivered, (ii) AdWords reports a large fraction of contextually significant impressions based on (undisclosed) criteria other than the publisher's theme, (iii) higher CPM investment does not lead to impressions being delivered to more popular publishers, (iv) AdWords does not offer default control of the frequency cap (limit of impressions per user), and (v) about 10% of ad impressions in two of the campaigns were delivered to IPs from data centers.
The second version of Q-Tag was developed to measure the viewability metric. This standard metric serves to assess whether or not an ad impression was viewed by a user. Q-Tag has been deployed in production by a Demand Side Platform (DSP) to measure the viewability rate of its ad campaigns. Taking advantage of the infrastructure of this DSP, the performance of Q-Tag has been compared with that of a commercial solution. Both techniques report a similar overall viewability rate of 50% (i.e., 50% of the ad impressions meet the viewability standard and thus are considered viewed). However, Q-Tag is able to measure the viewability metric in 93% of the ads served by the DSP, compared with 74% of the ads measured by the commercial solution.
In summary, the research conducted in this thesis showcases the potential of the proposed large-scale ad-based measurement methodology, which offers a wider range of possibilities beyond those presented in this thesis. The methodology can unravel different aspects of Internet infrastructure and performance from the user perspective, and it provides an independent tool for advertisers to measure the quality of their advertising campaigns.
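The abstract reports two distinct viewability figures: the share of served ads for which viewability could be measured at all (93% versus 74%) and the share of measured impressions that met the standard (about 50%). The following minimal Python sketch illustrates that distinction. It is not the Q-Tag implementation: the abstract does not state the viewability threshold, so the sketch assumes the commonly cited display-ad criterion (at least 50% of the ad's pixels in view for at least one continuous second), and the function names, the sample format, and the example data are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# One sample: (timestamp in seconds, fraction of the ad's pixels in view).
# This format is an assumption made for illustration only.
Sample = Tuple[float, float]


def is_viewed(samples: Sequence[Sample],
              min_fraction: float = 0.5,
              min_seconds: float = 1.0) -> bool:
    """Return True if the ad stayed at or above min_fraction in view
    for at least min_seconds without interruption (assumed criterion)."""
    run_start: Optional[float] = None
    for timestamp, fraction in samples:
        if fraction >= min_fraction:
            if run_start is None:
                run_start = timestamp  # a new continuous run begins
            if timestamp - run_start >= min_seconds:
                return True
        else:
            run_start = None  # the run is broken; start over
    return False


def summarize(impressions: List[Optional[Sequence[Sample]]]) -> Tuple[float, float]:
    """Return (measured_rate, viewability_rate).

    An impression is None when no visibility samples could be collected.
    measured_rate    = measured impressions / all served impressions
    viewability_rate = viewed impressions / measured impressions
    """
    measured = [s for s in impressions if s is not None]
    viewed = sum(1 for s in measured if is_viewed(s))
    measured_rate = len(measured) / len(impressions) if impressions else 0.0
    viewability_rate = viewed / len(measured) if measured else 0.0
    return measured_rate, viewability_rate


# Hypothetical example data: four served impressions, one not measurable.
example = [
    [(0.0, 0.6), (0.5, 0.7), (1.0, 0.8)],              # in view for a full second
    [(0.0, 0.6), (0.5, 0.2), (1.0, 0.9), (1.5, 0.9)],  # run interrupted at 0.5 s
    None,                                              # could not be measured
    [(0.0, 0.1)],                                      # never sufficiently in view
]
measured_rate, viewability_rate = summarize(example)
```

Keeping the two rates separate reflects how the abstract reports them: two tools can agree on the viewability rate while differing in how many impressions they are able to measure.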
Description
International Doctorate Mention (Mención Internacional en el título de doctor)
Keywords
Online advertising auditing, Measurement, JavaScript code, AdTag