Online ads, exclusive online communities, and the potential for adverse impacts from big data analytics
Online advertising today is a big data analytic marvel, deciding in the time it takes to load a web page which ad, among billions, to deliver to which of billions of domains. Large and fast for sure, but what are the ramifications of some of the decisions ad networks are making?
Big Data Analytics
Consumer differentiation is at the heart of online advertising. Delivering ads based on such differentiation is the very idea behind it. For example, if young women with children tend to purchase baby products and retired men with bass boats tend to purchase fishing supplies, and you know the viewer is one of these two types, then it is more efficient to offer ads for baby products to the young mother and fishing rods to the fisherman, not the other way around. While some differentiation among consumers may be beneficial, it may not always be desirable. Societies have identified groups of people – sometimes referred to in the law as “protected classes” – to protect them from specific forms of discrimination. In the United States, these groups include, at minimum, classes based on race, color, religion, sex, or national origin. Laws to guard against discrimination include the Equal Credit Opportunity Act, the Americans with Disabilities Act, the Fair Housing Act, and Title VII of the Civil Rights Act. To determine whether a protected group experiences illegal discrimination or is being treated unfairly, government entities often measure whether practices, intentional or not, have a disproportionate adverse impact on the protected group.
There are many ways online ads get delivered these days. Consider a big data ad network that mediates between advertisers and websites without tracking the behavior of users. An advertiser provides the ad network with possible ads to deliver, keywords that describe the websites on which it wants the ads displayed, and a bid to pay based on a reader viewing or clicking the delivered ad. The ad network may operate a real-time auction across bids having similar criteria based on a “quality score” for each bid. A quality score includes many factors such as the past performance of the ad and characteristics of the company’s website [1]. The ad having the highest quality score appears first on the web page, the second-highest second, and so on, and the ad network may elect not to show any ad if it considers the bid too low or if showing the ad exceeds a threshold (for example, a maximum account total for the advertiser). Could different groups of people, including “protected classes,” see entirely different ads? If the offer and group are subject to legal protections, could the result have a disproportionate adverse impact? Even if they are not subject to legal protection, can some ads be offensive or harmful to some audiences?
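The ranking step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any real ad network works: the `Bid` fields, the `min_bid` floor, and the `account_max` ceiling are hypothetical names chosen to mirror the paragraph.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    advertiser: str
    ad: str
    bid_amount: float                   # hypothetical: offer per view or click
    quality_score: float                # hypothetical: combined score from the network
    spent: float = 0.0                  # hypothetical: amount already charged
    account_max: float = float("inf")   # hypothetical: maximum account total


def rank_ads(bids, min_bid, slots):
    """Return the ads to show, highest quality score first."""
    eligible = [
        b for b in bids
        if b.bid_amount >= min_bid                     # bid is not too low
        and b.spent + b.bid_amount <= b.account_max    # threshold not exceeded
    ]
    eligible.sort(key=lambda b: b.quality_score, reverse=True)
    return [b.ad for b in eligible[:slots]]
```

Notice that nothing in this sketch looks at who is reading the page. The audience never enters the decision, which is the gap the questions above point at.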
Capturing Online Ads
Google's online ad network is a big data analytic machine that delivers 30 billion ads per day [2]. MixRank is a commercial company that captures appearances of online ads delivered by Google's ad network each day, taking care to do so without behavioral advertising or re-targeting effects [3]. For each ad, the MixRank archive stores the appearance of the ad, the domain on which the ad appeared, the date and time of the ad appearance, and any keywords the domain may have used to describe itself. The domain is the wugster.com, wugster.org, or similar part of a webpage address (or URL). MixRank focuses heavily on the top advertisers of Google, so ad campaigns by smaller advertisers may not appear. Further, MixRank cannot possibly survey all web domains, so some domains, specifically many less visited domains, may be underrepresented.
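To make the notion of a domain concrete, here is a rough Python sketch of one archived ad appearance and of pulling the domain out of a URL. The `AdAppearance` record and its field names are our own illustration, not MixRank's actual schema, and `domain_of` is a naive helper that only strips a leading "www."; a careful implementation would need a list of public suffixes.

```python
from dataclasses import dataclass, field
from datetime import datetime
from urllib.parse import urlparse


@dataclass
class AdAppearance:
    # Hypothetical record mirroring the fields described in the text.
    ad_image: str                 # the appearance of the ad
    domain: str                   # where the ad appeared
    seen_at: datetime             # date and time of the appearance
    keywords: list = field(default_factory=list)  # domain's self-description


def domain_of(url):
    """Naive domain extraction: hostname minus a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host
```

For example, `domain_of("http://www.wugster.com/page")` gives `wugster.com`.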
Omega Psi Phi: An Example of Online Ad Delivery
In 2011, Omega Psi Phi, a high-profile historically black fraternity, celebrated its 100th anniversary. The domain, omegapsiphi2011.com, hosted information about the event. The website boasts that among members of the fraternity are many prominent black men, including Bill Cosby, Congressman Clyburn, Gov. Wilder, Michael Jordan, Shaquille O'Neal, presidents of colleges and universities and many others. See Figure 1 for an archived copy of the website.
According to MixRank, more than 2000 different ads appeared on omegapsiphi2011.com. There were many different kinds of ads, including ads for graduate degree programs (such as Figure 2a) and vacations (such as Figure 2b). Ads offering credit cards appeared (such as Figure 2c). Ads suggesting that the audience had an arrest record also appeared (Figure 2d).
The Instant Checkmate ad (Figure 2d bottom) has attention-grabbing animation that flashes in the background among bright high contrast colors. For the audience of this website, the flashing neon-like appearance of “Your” arrest records may carry unwanted presumptions. (I have done prior research on the offensive and suggestive nature of some Instant Checkmate ads [4].) While this ad may be offensive to the audience of this domain, there may be other websites where it would be less likely to offend. For example, the ad might be more clearly relevant if displayed on a website that attempts to help you locate government records about yourself. Are online advertising analytic engines unable to distinguish?
The texts of the credit card ads (Figure 2c) do not appear to carry presumptions about the audience, but not all credit card offers are the same. Since financial services are the largest segment of online advertisers [5] and credit card offers to some groups have some legal protections, examining the credit card ads that appeared on omegapsiphi2011.com may help us move from the anecdotal example to more generalizable knowledge about online ads. Are there domains specific to demographic groups? If so, how popular were these domains? What kinds of credit card ads appeared on them?
Figure 1. Archived image of omegapsiphi2011.com. Source: http://archive.today/www.omegapsiphi2011.com
Figure 2. Sample of ads that appeared on omegapsiphi2011.com. Source: mixrank.com.
This past summer we launched the Summer Research Fellows Program at the Federal Trade Commission (FTC); see earlier Tech@FTC post Save the World. Krysta Dummit, James Graves, Paul Lisker, and Jinyan Zang are the four fellows who worked with me during the summer to explore research directions for the FTC. This blog post describes one of the summer projects. More results from summer work will appear shortly. While all fellows contributed to all efforts, Jinyan and I primarily worked to gain generalizable insights from the omegapsiphi2011.com example.
Exploration #1 Exclusive Audiences
Can domains specific to demographic groups be found?
ComScore data provides a broad overview of the browsing behavior of some Americans, similar to Nielsen ratings for television [6]. The service recruited 46,000 households in the United States and tracked web browsing from those machines for a year. Associated with each household are the demographics: race, ethnicity, head of household education, head of household age, household income, and whether children are present. The resulting data for 2013 recorded more than 150 million different websites visited. Even though the comScore data may not necessarily reflect the overall viewing habits of Americans, we can use the comScore data to determine whether domains exist that may be visited more by one demographic group than any other.
We defined a group’s exclusivity index for a domain as the percentage of members of the group visiting the domain divided by the sum of the percentages of members of all comparable groups visiting the domain [7]. The higher the ratio, the more exclusive the domain is to the group.
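Written as a formula, the definition looks like this. The symbols are our own shorthand and do not appear elsewhere in this post: $G$ is the set of comparable groups, $n_g(d)$ is the number of households in group $g$ that visited domain $d$, and $N_g$ is the number of households in group $g$ in the panel.

```latex
E_g(d) = \frac{p_g(d)}{\sum_{h \in G} p_h(d)},
\qquad
p_g(d) = \frac{n_g(d)}{N_g}
```

The index lies between 0 and 1. It equals $1/|G|$ when every group visits the domain at the same rate and approaches 1 when only group $g$ visits it.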
For example, the ratio of the percentage of Latino households to the sum of the percentages of all racial groups visiting univision.com is the Latino exclusivity index for univision.com. The number of Latino households visiting univision.com is 2099. The total number of Latino households in the comScore panel is 7457. The percentage is 2099 divided by 7457 (or 0.2815). For whites, the percentage visiting univision.com is 205 divided by 19525 (or 0.0105), for blacks it is 137 divided by 9900 (or 0.0138), and for Asians it is 28 divided by 2724 (or 0.0103). So, the exclusivity index of univision.com for Latinos is 0.2815 divided by (0.0105 + 0.0138 + 0.0103 + 0.2815), or 0.89.
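The univision.com arithmetic above can be checked with a short Python sketch. The function name `exclusivity_index` and the dictionary layout are ours, chosen for illustration; the household counts are the ones quoted in the paragraph.

```python
def exclusivity_index(visitors, panel_sizes, group):
    """Share of `group`'s visit rate among the visit rates of all groups."""
    rates = {g: visitors[g] / panel_sizes[g] for g in panel_sizes}
    return rates[group] / sum(rates.values())


# Households visiting univision.com, and panel totals, from the text above.
visitors = {"Latino": 2099, "white": 205, "black": 137, "Asian": 28}
panel_sizes = {"Latino": 7457, "white": 19525, "black": 9900, "Asian": 2724}

index = exclusivity_index(visitors, panel_sizes, "Latino")
print(round(index, 2))  # 0.89
```

Dividing by each group's panel size first matters. Whites outnumber Latinos in the panel by more than two to one, so raw visitor counts alone would understate how concentrated the audience is.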
We computed exclusivity indices for the top thousand domains visited by each combination of demographic groups within the comScore data. This gave a total of 7975 distinct domains. We consider any domain whose exclusivity index is more than two standard deviations above the mean of the exclusivity indices for those groups across all popular domains to be “highly exclusive.” Two standard deviations above the mean of exclusivity indices for race across all popular domains is 0.47, so univision.com is highly exclusive to Latinos at 0.89.
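The cut-off rule can be sketched as follows. This is a hedged illustration: the post does not say whether a population or sample standard deviation was used, so the choice of `pstdev` here is an assumption, and the function name `highly_exclusive` and the toy values are ours, not comScore data.

```python
from statistics import mean, pstdev


def highly_exclusive(indices, k=2.0):
    """Return (cutoff, domains whose index exceeds mean + k standard deviations).

    `indices` maps domain -> exclusivity index for one group.
    Assumption: population standard deviation (pstdev).
    """
    values = list(indices.values())
    cutoff = mean(values) + k * pstdev(values)
    return cutoff, {d for d, v in indices.items() if v > cutoff}


# Made-up values for illustration only: nine ordinary domains and one outlier.
toy = {f"site{i}.example": 0.25 for i in range(9)}
toy["outlier.example"] = 0.9
cutoff, flagged = highly_exclusive(toy)
```

With real data the indices would be computed separately for each demographic dimension (race, age, income, and so on), since each has its own mean and spread.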
Figure 3 shows exclusivity scores by race/ethnicity for the most popular domains for race/ethnicity groups in the comScore data. The further left a domain appears, the more popular it was among demographic combinations of comScore households (i.e., the better its popularity rank). The leftmost triangle above the cut-off line is univision.com. Other highly exclusive domains for Latinos include taringa.net, musica.com, and terra.com. For black households in the comScore data, highly exclusive domains included worldstarhiphop.com, footlocker.com, datpiff.com and bet.com. For Asian households, indiatimes.com, youku.com, leagueoflegends.com, and baidu.com were highly exclusive domains. The most highly exclusive domain for whites was legacy.com.
Figure 3. Popular domains exclusivity scores by race. Domains with an exclusivity score that is more than two standard deviations above the mean (0.47 or above) are considered highly exclusive to a group. Data source: comScore [6].
Exploration #2 Popularity of Domains having Exclusive Audiences
Are domains with exclusive audiences just among the least popular domains?
It may be difficult to guess whether any particular domain has an exclusive audience. For example, footlocker.com is a website that sells sneakers and shoes and was highly exclusive to black households in the comScore data. A sufficiently greater percentage of black households visited the domain than did any other race. The website's keywords and content do not signal the exclusivity of this site to blacks. Is there a relationship between the popularity of the domain and the likelihood of having an exclusive audience?
Alexa.com surveys Internet traffic and ranks the popularity of the most popular billion domains. The most popular domain is ranked 1 and the least popular would be ranked one billion. We fetched the Alexa rank for each of the 7975 domains that were among the thousand most popular domains for each demographic combination in the comScore data, not just race. Any domains that were not found were assigned a rank of one billion and one. Figure 4 shows the percentage of highly exclusive domains within ranges of ranks. Exclusive audiences appear in all ranks, including highly popular and less popular domains. So, a domain of any popularity rank can have an exclusive audience.
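A tally of this kind could be sketched as below. Several details are assumptions on our part: the power-of-ten rank ranges, the reading of "percentage" as the share of domains within each range that are highly exclusive, and the names `rank_bucket`, `exclusive_share_by_bucket`, and `MISSING_RANK`. The post does not specify the ranges used in Figure 4.

```python
from collections import Counter

MISSING_RANK = 1_000_000_001  # assumed rank for domains Alexa did not report


def rank_bucket(rank):
    """Label a rank by an assumed power-of-ten range, e.g. '1-10', '11-100'."""
    upper = 10
    while rank > upper:
        upper *= 10
    lower = 1 if upper == 10 else upper // 10 + 1
    return f"{lower}-{upper}"


def exclusive_share_by_bucket(ranks, exclusive_domains):
    """Percentage of domains in each rank range that are highly exclusive.

    `ranks` maps domain -> Alexa rank, or None when no rank was found.
    """
    totals, exclusive = Counter(), Counter()
    for domain, rank in ranks.items():
        bucket = rank_bucket(rank if rank is not None else MISSING_RANK)
        totals[bucket] += 1
        if domain in exclusive_domains:
            exclusive[bucket] += 1
    return {b: 100.0 * exclusive[b] / totals[b] for b in totals}
```

Unranked domains land in their own range at the far end, so they stay visible in the tally instead of being dropped.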
Figure 4. Popularity of domains with highly exclusive audiences. Popularity rank based on Alexa retrievals September 2014.
Exploration #3 Credit Card Ads Delivered to Exclusive Audiences
What kinds of credit card ads appeared on domains having exclusive audiences?
The financial industry is the largest advertiser online. Laws and regulations that have provisions for protected groups govern many financial instruments, including credit cards. Credit card ads appeared on omegapsiphi2011.com. What was the nature of those offers?
We began by searching “best credit cards” and “worst credit cards” on the Google search engine. We collected the first 25 credit cards for each category, as mentioned in these third-party sources. Figure 5 shows the results for each group in alphabetical order. American Express Blue was the most highly praised card and First Premier Bank cards were the most harshly criticized cards. Not all highly praised cards require stellar credit (e.g., Capital One Secured Card). Some of the card issuers (namely, Capital One, Chase, and Citi) have cards on both lists.
Figure 5. List of most highly praised and criticized credit cards compiled from top ranked online articles of “best” and “worst” credit cards using Google search results. Retrieved August 2014. Listed alphabetically.
Credit card ads that appeared on omegapsiphi2011.com were offers for criticized cards or were generic. No ads naming any of the praised cards appeared. It is important to note that credit card ads did not dominate the online experience at the domain. Still, this anecdotal example does raise questions. How do domains where praised ads appeared compare to those that showed ads for criticized cards?
MixRank reported 244 distinct ads that explicitly offered one of the praised or criticized cards and 5298 domains publishing those ads (omegapsiphi2011.com being one of the publishers). Figure 6 shows a sample of domains that published the single most frequently captured ad for the most praised card (American Express Blue) and similarly for the most frequently captured ad for the most criticized card (First Premier). The samples are drawn from a total of 268 and 182 publishers, respectively. Visual inspection suggests the publishers of the advertisement for the praised card had an educational focus. The list of publishers of the most popular ad for the criticized card does not seem to have an overarching focus.
Figure 6. Samples of domains that published (a) the single most frequently captured ad for the most praised card and (b) the single most frequently captured ad for the most criticized card. Source: Mixrank.com retrieved August 2014.
We fetched the Alexa rank for each domain that published an ad for a highly praised or criticized card (a total of 5298 publishers). Figure 7 shows that ads for criticized cards appeared across the popularity ranks of domains, with greater percentages at the most and least popular domains. Publishers of ads for praised cards appeared more often in the mid-range of popular domains.
Figure 7. Popularity rank of domains publishing ads for the most praised or criticized credit cards (a total of 5298 publishers). Alexa rank for the domain as retrieved in September 2014.
Ads for praised cards did appear in domains having exclusive audiences. Domains exclusive to Asians (e.g., visajourney.com and deals2buy.com) published ads for praised Capital One, Citi, and Discover cards. The domain, seekingalpha.com, is exclusive to households with income greater than 100K in the comScore data; it published an ad for a praised card. Ads for the Capital One Secured Card appeared on domains exclusive to ages 25-29 and another exclusive to ages over 65.
In summary, some websites have audiences that are more exclusive to one demographic group than to other groups. A sufficiently higher percentage of members from that group visits the site than do percentages of members from any other groups. Online ad delivery may take website popularity and keywords into account, but may have little or no knowledge of audience exclusivity. As a result, perils may exist for big data online ad delivery. Ad copy that may seem appropriate for one audience may carry unwanted presumptions and inappropriate content to another audience. Depending on the websites, audiences, and distribution of ads across domains, disproportionate adverse impacts involving legally protected groups may be possible. This work does not attempt to show that a disproportionate adverse impact resulted, only that it could be occurring.
What Do You Think?
This inquiring mind wants to know what you think. Perhaps you have your own approach to describe, an experiment to report, or a comment to make.
For more information about this research, watch our presentation at FTC's Workshop, Big Data: A Tool for Inclusion or Exclusion? A copy of the presentation, Digging into the Data, is also available.
1. Google AdSense; http://google.com/adsense.
2. Koetsier J. 30 billion times a day, Google runs an ad (13 million times, it works). VentureBeat Insight. October 25, 2012. Based on an analysis by Larry Kim of Wordstream of a sample of Google Data. http://venturebeat.com/2012/10/25/30-billion-times-a-day-google-runs-an-ad-13-million-times-it-works/
3. Email and phone conversations with officials at mixrank.com. August-September 2014.
4. Sweeney L. Discrimination in Online Ad Delivery. Communications of the ACM, Vol. 56 No. 5, Pages 44-54. (Also, see earlier version without a first solution Data Privacy Lab White Paper 1071-1. Harvard University. Cambridge. January 2013. http://arxiv.org/abs/1301.6822 and http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2208240)
5. Bruene J. One-quarter of 50 Largest Online Advertisers are From Financial Services. NetBanker. May 2007. http://www.netbanker.com/2007/05/financial_services_portion_of_50_largest_online_advertisers.html
6. comScore Media Metrix. 2013 U.S. Panel. http://www.comscore.com/metrix/xpc.asp
7. Steven Levitt, co-author of the book Freakonomics, introduced the notion of a “black name index” in his PhD thesis that examined distinctively black names. He defined the index as the percentage of times the name is given to black babies divided by the sum of the percentage of times the name is given to black babies and to white babies. Fryer R and Levitt S. The Causes and Consequences of Distinctively Black Names. Quarterly Journal of Economics. 2004; 119(3):767-805. We generalized this computation to more than two groups.
The author’s views are his or her own, and do not necessarily represent the views of the Commission or any Commissioner.