Wednesday, April 30, 2014

Who’s Afraid of the Big Bad Data?


SEGA is coming out with what might be the last, best hope for the Alien games franchise: Alien Isolation. It takes place 15 years after Alien and 42 years before the events of Aliens. It follows Ripley’s daughter (who is mentioned in a scene cut from the theatrical release of Aliens) as she investigates her mother’s disappearance and winds up on a space station that is, of course, inhabited by an Alien. In short, it is a well-designed (the screenshots look like they are from ‘Alien’), reasonably intelligent, possibly worthy addition to the Alien canon and, especially, to the hit-and-miss legacy of Alien computer games.

That may or may not have interested you.

Why it interested me was this: I got an ad for it in my Twitter feed. Now, I have bought—and played—every Aliens game on the market. I even, kinda, enjoyed the hugely maligned Colonial Marines. I’m a sucker for Aliens games—but I hadn’t heard of Alien Isolation. When it appeared in my Twitter feed as a promoted Tweet I was intrigued.

I don’t have any gamers on my Twitter feed. I don’t blog (mostly) about computer games. I’m not active on any computer gaming fora. The Tweet was clearly a targeted advertisement—someone in the Alien Isolation marketing food-chain had paid money to get their message out to me: and they were right to do so. I’ll probably buy the game.

So the question is … how? Pitching Alien Isolation to me is the holy grail of direct marketing because:

  • I was not aware of it. Maybe I shoulda been (and when it came out I definitely would have been)—but I am a pre-order market for them which makes it worthwhile.
  • It is deeply interesting to me: I paid for Colonial Marines which I knew was bad. This is in my sweet spot for interest. It is perfectly targeted.
  • It is not ‘obvious.’ While people who know me would, at the very least, be able to guess I’d have interest in an Aliens game, (a) there is, so far as I know, zero evidence of that in my Twitter feed, (b) there are no tracking cookies or other browser-based intelligence that could give it away, and (c) there is remarkably little digital data on that at all (Steam would show two Aliens games in my library, but that’s about it).

What Does Big Data Know About You?

I’m reading Patrick Tucker’s The Naked Future. He’s a futurist who envisions a time in the not-so-far future (say, 5-10 years) when the vast amounts of telemetry we produce (our digital footprint) are able to expose our present, and our likely future, to everyone (well, at least to us and, most likely, some big data providers). He covers things like OkCupid’s predictions of love from online matches. He talks about crime prediction techniques that use ambient data to tell where the next crimes will likely be committed. He talks about a drone program that sees cities in real time.


There are chapters on how data mining could create custom educational plans, how earthquake prediction could work, and so on. The upshot of the analysis is this: even today—much less in the future—we produce enough data about ourselves for predictive engines to make some very good guesses about what we might like, who we might like, or where we might be located.

In the future this could improve dramatically.

On The Other Hand

On the other hand, almost NO ONE is doing as good a job as Twitter did with the Alien game. Let’s see:


  • BP Driver Rewards. A better way to purchase gas … I guess.
  • Drivers Auto Mart: A way to sell my car.
  • T-100 Thyroid Support. I don’t need Thyroid pills … I don’t think.
  • Small Valves Save Lives: An ad for replacement aortic valves without open-heart surgery
  • Fidelity Investments: Financial services
  • Equifax: Get a credit report
  • Merrill Edge: Online stock trading.

Now, I don’t drive a whole lot. I don’t need (so far as I know) heart valves or Thyroid pills. I already have credit protection so I’m not that concerned about my specific report. I will need a new car in a year or so—so the car thing isn’t stupid—but it’s not useful to me now. I already have a Merrill Edge account. In other words, it’s all either a complete miss (Merrill Edge wasted resources pitching to me) or an almost complete miss.


Amazon should be one of the best at predicting what I’d like. After all, they know what I’ve already spent a ton of money on—not just through Amazon—but through their Amazon Chase credit card. What do they show me?

  • Some graphic novels in the series I’ve already got (a new East of West release)
  • Some cell phone chargers. I’d researched them and bought a couple as gifts.
  • Some Wi-Fi stuff, a keyboard, a phone head-set. I bought a new wireless router a few days ago. I got a keyboard months ago. I got a few phone head-sets a week ago.
  • Pens that go on your keychain (I bought one two weeks ago). A key-chain (the one I bought a while ago)
  • Some cables (I got one a while back), a headset for the computer (I got one a while ago)
  • Some Transmetropolitan graphic novels (good—but I have them)
  • A wallet (I got a wallet a few weeks ago)
  • A bunch of Minecraft stuff (I got some Minecraft stuff for gifts)

In other words: If I already bought it, Amazon wants to pitch it to me. This is stupid: If I already bought it, I don’t need it. Amazon should be one of the best and instead they’re one of the worst.
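To make the complaint concrete, here is a minimal Python sketch of the kind of filter I would expect a recommendation engine to apply. It is purely illustrative: the `Item` class, the `consumable` flag, and the sample data are all made up for this example, and nothing here reflects how Amazon actually works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    category: str
    # Hypothetical flag: marks things a shopper plausibly buys again and again.
    consumable: bool = False


def filter_recommendations(candidates, purchased):
    """Drop candidates the shopper already owns, unless they are consumables."""
    owned = {item.name for item in purchased}
    return [c for c in candidates if c.consumable or c.name not in owned]


# Made-up purchase history and candidate list, for illustration only.
purchased = [
    Item("wallet", "accessories"),
    Item("wireless router", "electronics"),
    Item("razor blades", "grooming", consumable=True),
]
candidates = [
    Item("wallet", "accessories"),
    Item("wireless router", "electronics"),
    Item("razor blades", "grooming", consumable=True),
    Item("Alien Isolation", "games"),
]

suggestions = filter_recommendations(candidates, purchased)
print([s.name for s in suggestions])
```

The `consumable` exception is the one design choice worth noting: re-pitching razor blades may make sense, while re-pitching a wallet bought a few weeks ago probably does not.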


Twitter nailed it with the Alien game. They also want to sell me a subscription to shaving razors, which interests me greatly. On the downside, they pitched Sarah Palin’s something-or-other, which isn’t a stupid guess but is definitely a miss. I went and looked today and didn’t see any promoted tweets, so there’s that (periodically I’ll get a World News Daily tweet promising some bring-down-Obama news).

Why are these (save for Twitter) all so bad?

Why Is Predictive Selling So Bad?

I’ve done some research. It seems the reason predictive selling is so bad is that Google isn’t doing it. What do I mean? Well, the problem is three-fold:

  1. The basic data … isn’t good. I went to AboutTheData, Acxiom’s public service, to see what they knew about me. You have to enter everything but your bank account number into that site, so I don’t suggest it. But for me?
    1. They got my birth date and gender right. My ethnicity is technically right—but I’m … erm … White Hispanic—not Hispanic.
    2. They have my marital status as single. Wrong.
    3. They have my political party as Republican: you can decide if that’s right or not.
    4. They don’t know much about my occupation—but they think it’s Professional / Technical
    5. They know where my house is—fine—but they have no idea what I drive.
    6. Household economic data is … catastrophically low. They clearly have no idea what either my wife or I make.
    7. They know nothing about my household data or interests.
  2. I tried the WatchDogs app. It’s for a game, but it’s good. It analyzes your Facebook data and finds: You (it can locate me with 92.8% accuracy); who I care about (“Collateral Damage”: it found nothing); Stalkers (close FB friends: a few); Liabilities (people who tag me and thus expose me); Obsessions (it found my Hapkido martial arts school); and Scapegoats (people I know but am not close to and would ‘sacrifice’).
    1. It grossly underestimated my salary but got my age and residence right.
    2. It says I’m most likely found in the gym (not bad)
  3. Google’s profile is pathetic. It has a correct age range and knows I like action movies—but that’s about it (NOTE: Google knows much, much more than that about me—but that’s what, apparently, advertisers see).

Secondly? The data is not being shared. Lots of players see part of the picture but only one (Google) sees nearly all of it.

  1. My favorite movies (Netflix knows—but is not sharing). Google sees my alert emails when Netflix sends me something. Knowing which movies I watch would tell you a lot about my personality and would definitely allow other movie makers to pitch me shows I might enjoy.
  2. What consumer goods I already own and how happy I am with them. I need to buy a TV and have searched it on the web and Amazon—but Best Buy doesn’t know this (nor will they).
  3. What my hobbies are beyond a narrow range. A human searching me on the web could find out, easily, about many of my hobbies and areas of interest. This would allow marketing. Google has much of this data but is neither leveraging it nor sharing it with, for example, game vendors.
  4. What my aspirational goals are. I have been looking at cars I like—Google knows this. So do tracking cookies—but it would take someone doing some serious work to know what I expect to pay … and even worse, when I expect to buy one (I hope not for a year or two).
  5. What my reading interests are (Amazon knows this—or should, but is not being insightful with it). Amazon’s book suggestions are horrible. I’m well aware my favorite authors have many books out. Show me some books that I’m not aware of.
  6. What my pain points are—what I would pay to have done for me. The shaving razor subscription is good here (I hate shopping for razor blades). No one else is coming close (grocery delivery would make my day).

The upshot is this: We produce all this telemetry. Our phones track us every minute of the day. Our web-searches show our interests and thinking. Our email content shows communication and importance. Facebook shows how we want to present ourselves. Twitter knows what (short) messages get our attention and which voices we are interested in. Amazon knows what happens when “interest level” breaches the “action” threshold. Steam (and Amazon) know which games we buy. Steam knows which games we play.

Netflix knows what our interests are for 1-2 hours a day (estimated). HBO knows if we’re watching Game of Thrones—so does Comcast.

The two problems are that (a) the data is not being shared and (b) there is precious little insight coming out of the recommendation engines these multi-billion dollar companies have. Netflix is trying, yes. But there’s no excuse for Amazon’s recommendations and Facebook’s ads. Where data does exist it is often out of date and patchy (Acxiom) or simply contained (Facebook). When we use a data-gathering source in a limited way (LinkedIn) the picture may not be complete.

When we use a source comprehensively (Google) the data is unstructured. The tools to take full advantage of this do not yet exist.

The Future

The sharing problem will not easily go away. There is little incentive for Netflix to have a privacy policy that lets it share with Amazon. Google isn’t going to share the content of your email with Facebook. Apple isn’t highly motivated to sell your geolocation to buyers. Yes, money motivates—but there is reason to think that certain activities that cross the creepiness threshold will have real blowback.

Technology should solve the insight problem if we can get good enough sample sizes to mine and smart enough systems to ponder the mess of information we’re producing. We should see strides in that in a few years and big ones in a decade but I have to wonder. I think there is a fundamental missing piece here which is creativity. The idea that Twitter is searching the space of video games is interesting—but I’m not sure that’s actually what’s happening here. I don’t see any other evidence for it (no other games).

Machines can produce emergent behavior which feels like creativity but they do not yet actually create it. The horizon for making out-of-the-box leaps about what I’d want may be beyond the 15 year horizon I’m looking at here. The (very smart) TV show Person of Interest speculates that a machine that is capable of predicting human behavior has to be “at least as smart as a human” (in its own way, at least) and I think they’re maybe on to something there. If product placement was always handled with the skill that Michael Jordan played basketball, for example, we wouldn’t call it ‘product placement’—it’d just be ‘stuff we see in movies that we recognize.’

So there’s a level of skill in targeted advertising and a level of line-of-sight and a level of intuition and creativity that I think our current Big Data vendors are simply nowhere near approaching. I was frankly surprised at Acxiom’s level of errors about me. I’m kind of insulted by Amazon. I think Facebook is patently naive. Twitter … interests me. Is a curated 140-character environment somehow more structured when it comes to understanding the user? Maybe so.

Is Google, for all their access to me, simply unable to get me a game I’ll almost surely buy? Or were they just not approached with the ad? They’ll see this post. Their massive data-banks will store it. Their engines will analyze it … and they’ll learn … nothing? I think we’re probably more than a decade away from Patrick Tucker’s Big Data Utopia.


  1. Good post. There's another enterprise that has almost as much data about you as Google, but is woefully under-valuing and under-monetizing it - your bank. The fact that banks have not yet started to meaningfully add value to the transaction stream to help you make better financial decisions is, to me, jaw dropping (a low-balance alert doesn't really cut it in this day and age). Perhaps Big Data is too new for banks, or the talent too expensive, or the algorithms too complex, or the concern around the 'creep factor' too much to overcome, though I have to think that people would opt in to prescriptive analytics that help them make better decisions for themselves and their loved ones. So many potential use cases.

    1. I know something about this too! So--banks DO know a few things (family structure, marital status, and they do have a much better idea of my finances). In some cases they don't see the big picture either (I bank at several different banks as a result of my marriage and her existing banking framework, who had the best mortgage program for buying our house, and so on)--but yes: my bank would understand my hobbies to a degree if they put two and two together.

      What they have is a massive scale of data. They get to see how customers evolve over their lifetimes--something Amazon, Twitter, and Facebook can maybe only *buy* at this point. They understand and participate in risk-analysis: Amazon doesn't have to model whether its customers will "pay it back."

      So, yes--banks (and insurance companies) have a ton of data.

      More than Google? Well ... I think my bank can place me with pin-point accuracy in the what-can-he-buy zone. I think banks can make very strong predictions about what my big purchases will be (house, car, kid, marriage, college).

      On the other hand, banks are probably not as good at recommending me a book to kindle for the airplane ride as Amazon ought to be (OUGHT to be). Amazon (through the Amazon rewards card I use for EVERYTHING) knows when and how often I go to the movies with my wife. My bank doesn't.

      Google and Skype (Gmail, weekly Skype call) know which of those Facebook friends I speak with every week. My bank doesn't know much about my personal network...

      So, yeah: I think banking data *combined* with another data source would be a powerhouse.

      Maybe Amazon should buy some banks.

      And Netflix. They need to get their recommendations improved.

    2. That was The Omnivore--thanks for the read!!

  2. (Veronica here.)

    I have a Facebook anecdote. Once I was shopping over on and looking at a particular pair of Prada wedge boots. Which, okay, these are certainly a rather *specific* product to look at. Then I browsed over to Facebook, which popped up a Nordstrom’s ad for those very boots.

    There is no way that could be a coincidence.

    Thing is, I have no idea if this is a good strategy or not. I mean, they know I am interested in the boots. On the other hand, I already know where to find them, for how much, and exactly what to do to get them. I was just on the Nordstrom’s site.

    I don’t assume super competence, but nor do I assume incompetence. These companies do metrics, and “conversion” is a big deal. It’s possible that posting follow up ads like that on Facebook has some percentage of conversion that makes it worthwhile. Maybe not.

    I bought the boots.

    Data mining is a complex topic. For some things, these techniques are very good. For others, not so much. There are diminishing returns. Plus there is the “Curse of Dimensionality” (Google it), which severely limits finding those *most subtle* patterns.

    We’ve already picked much of the low hanging fruit.

    Thing is, for every person who knows a thing or two about the actual math, there are a hundred online journalists who read crap articles by other online journalists (and maybe skim an O’Reilly book), and who breathlessly repeat golly-gee predictions.

    1. In The Naked Future the author talks about how print media is in trouble because in the electronic space "ads follow you wherever you go." I suppose the theory is that if they keep throwing it in your face you will eventually purchase--and that may well be true.

      But each 'imprint' after you bought them is wasted entirely. Of course the banner system doesn't know you bought them--and that's the point.

      -The Omnivore

    2. (Veronica again.)

      Well, I think these companies must be happy to get better “imprints” than they get from TV/Radio, so there is that. We have to compare these techniques to the alternatives.

      That said, I agree that Amazon trying to sell you something you bought *on Amazon* seems goofy.

      But then, I wouldn’t want to judge without seeing the numbers. Maybe those duplicate ads get clicked a lot. Perhaps folks go to read reviews, and then maybe add a review of their own, which they might not have. Maybe they click for no obvious reason — as long as they click! — and then are attracted by the “customers also bought…” list. Who knows. They have the numbers. We do not.

      (Which doesn’t rule out the possibility their algorithm is just wonky.)

    3. Another possibility we shouldn't rule out: that some people are deliberately trying to confound merchants' similarities engines. Oh, I know their numbers are far too insignificant to have any noticeable impact in most cases - their "customers also bought" lists quickly grow distressingly accurate - but in the edge cases where purchase volumes are low, pranksters just might manage to deliver a surprise or two.

      You wouldn't ordinarily suspect much population overlap between purchasers of Anne of Green Gables and Canadian walrus porn (Rule 34 practically guarantees this: Tusk Lust? Ice Floe Hos?), but hey, I don't make the rules. And if consequently gets its datasets crossed up enough to simultaneously market Harley-Davidson accessories and Hello Kitty merch to the same demographic segments, well... You never know. Untapped markets and all that.

      It wouldn't surprise me a bit if customers who've downloaded anything by Justin Bieber got pitched that nautical classic How to Avoid Huge Ships. Because, duh.

      -- Ω

    4. @Veronica: I was wondering about that this AM: Do people who have 'bought cables' go out and 'buy more cables'? Possible--but (a) I kinda doubt it and (b) I don't.

      Amazon already has a 1-to-1 relationship with me--my sign-in account. They can see I don't buy a bunch of stuff over and over. If they have to treat me like everyone else, okay--but Netflix is better than that ...

      @Ω: We know people are doing hilarious fake reviews on Amazon (and probably lots of non-hilarious fake reviews ... the SEO guys need -something- to do today). I like the idea of a Project Chaos group trying to mess with digital footprints but it seems less and less likely as more bugs get caught (Heartbleed would have been USEFUL in that regard!--although Amazon wasn't compromised, alas).

      But yeah: We can only dream that Big Data has worms of nonsense running through its veins! It's the sci-fi Lovecraftian dystopia we've all been waiting for!

      -The Omnivore