Byte Me (Harder)

If you find yourself perusing IBM's website researching big data (why not?) you may stumble upon an interesting claim:

Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals to name a few. This data is big data.

If you Google "quintillion" the results list IBM's webpage, a few dictionaries, and some websites dedicated to large numbers. In other words, it's too abstract a figure. The best way I can describe it is if you had 2.5 quintillion hamsters you could cover the Earth in hamsters approximately 52 times. And at the rate described above, you could do that every day.

So what is a byte? Per Wikipedia, historically the byte "was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable unit of memory in many computer architectures." I like that the definition includes "in a computer," as though it might happen elsewhere. In that context, of all the characters ever produced on this planet in the history of man, 90% of them were created in the last two years: neat.

What I find interesting are the more peculiar ways humans are using this data. A New Yorker article titled "Looking for Someone," which profiles the online dating site OkCupid (among others), provided an excellent result: the most innocuous question you can ask to determine the likelihood of getting lucky.

For the uninitiated, OkCupid is the creation of four math majors from Harvard. The objective of the site is to match humans for dating / mating by asking users hundreds of questions to identify suitable partners. By aggregating and analyzing the answers to all of these questions (data), the company revealed some startling conclusions. The challenge, however, was matching individuals comfortable sharing certain private information about themselves with others who had the same interest but lacked the comfort to say so online. And by interest I mean sex. Christian Rudder, one of the four nerdy founders, identified the gap-bridging question:

Rudder has discovered, for example, that the answer to the question “Do you like the taste of beer?” is more predictive than any other of whether you’re willing to have sex on a first date. (That is, people on OK Cupid who have answered yes to one are likely to have answered yes to the other.)*

Thank you, big data.

In 2014 Rudder published his findings in Dataclysm, which I only recently picked up and have found entertaining thus far. In addition to the humorous statistics, the text has a surprisingly human component for a book concerned with data. As he writes:

My girl is two and I can tell you that nothing makes the arc of time more clear than the creases in the back of your hand as it teaches plump little fingers to count: one, two, tee.*

If the anecdotes bore you, worry not: a book on such a topic could hardly be so introspectively sentimental throughout. In his introduction Rudder claims OkCupid's tagline should've been "Making the Ineffable Totally Effable." My favorite line so far.

Note: some reviews claim that Dataclysm involves only basic statistics, and others claim it is offensive. If you are looking for hard science or a politically correct narrative, consider that this blog identifies with an angry doodled doe. 

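Also, quintillion is defined as either a 1 with 18 zeros behind it or a 1 with 30 zeros behind it. The definition changes based on your location: the short scale used in the United States gives the former, and the long scale traditional in much of continental Europe gives the latter. That is not a small difference.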
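
For the arithmetically inclined, here is a minimal Python sketch of just how not-small that difference is, using the 2.5 quintillion figure from IBM's claim. The constant and function names are my own inventions for illustration, and I am assuming the two definitions in play are 10^18 and 10^30.

```python
# Two definitions of "quintillion", depending on where you live.
SHORT_SCALE_QUINTILLION = 10**18  # a 1 followed by 18 zeros
LONG_SCALE_QUINTILLION = 10**30   # a 1 followed by 30 zeros

BYTES_PER_EXABYTE = 10**18  # SI definition of an exabyte


def daily_bytes(quintillion: int) -> int:
    """Bytes created per day if "2.5 quintillion" uses the given definition."""
    # 2.5 is written as 5/2 so the arithmetic stays in exact integers.
    return 5 * quintillion // 2


def scale_gap() -> int:
    """How many times larger the long-scale quintillion is than the short-scale one."""
    return LONG_SCALE_QUINTILLION // SHORT_SCALE_QUINTILLION


if __name__ == "__main__":
    short = daily_bytes(SHORT_SCALE_QUINTILLION)
    long_ = daily_bytes(LONG_SCALE_QUINTILLION)
    print(f"Short scale: {short:,} bytes per day")
    print(f"Long scale:  {long_:,} bytes per day")
    print(f"In exabytes (short scale): {short / BYTES_PER_EXABYTE}")
    print(f"Long scale is {scale_gap():,} times larger")
```

Read the short-scale way, the daily figure is 2.5 exabytes. Read the long-scale way, it is 10^12 times that, which is a lot of hamsters.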

Footnotes denoted by asterisk and listed in order:

Nick Paumgarten. "Looking for Someone." The New Yorker 4 Jul. 2011. Web.

Christian Rudder, Dataclysm (New York: Broadway Books, 2014), p. 39.

Ibid., p. 20.