Have you seen LinkedIn’s infographic announcing that they have reached 100 million users? If not, follow the link and look at the stack of business cards on the right side. The stack is so high that you will have to scroll to get an idea of the immensity of it.

You want some insight? Yeah? I’m going to take the thickness of a business card and multiply it by 100 million people. Then I’m going to express that as feet. Then I’m going to compare that to Mt. Everest. Oh, that’s SO interesting.
We’ve seen this style of graphic before, and we’re going to see it again. But why do these train wrecks play out over and over again? The justification behind them goes as follows. People can’t handle raw large numbers such as 100 million; they need context. So far, so good; there isn’t anything wrong with that argument. But not all comparisons are created equal. My complaint is that LinkedIn’s comparison doesn’t give useful context.
I can imagine LinkedIn’s infographic team having this internal dialogue:
Infographic Ian: We all know that LinkedIn is about business professionals. They don’t fuck around like those Facebook kids. I think we should convey that somehow. They are professionals.
Marketing Mason: Right! Nothing says professional like business cards. Let’s include a business card in our visualization.
Data Scientist Sarah: Really? How many of our users still actually have business cards anymore? How many even relate to a business card as being important?
Visualizing Valerie: Don’t be so negative, Sarah. I agree with Ian! Business cards are a great idea. Let’s go with that.
Ian: Visualization is about small multiples. We can’t just have one business card. We need lots of them.
Mason: Quit using fancy infographic speak, Ian. I’ll boil it down. What if we take all of those people’s business cards and stack them on top of each other…
Ian: (thinking to himself) Oh, the irony…
Mason: … how tall would that be?
Sarah: (wrinkling her brow while trying to figure out how you could stack 100 million business cards under various wind conditions) For an average thickness business card, that stack would be about 100,000 feet tall.
Mason: That seems big to me. Right? I’m not really sure. It is just a number. We need to give it context.
Ian: Right! We need to compare it against something big that people understand.
Valerie: I like mountains. How tall is Mt. Everest? It would be awesome if our stack of business cards was taller than Mt. Everest!
Sarah: Mt. Everest’s elevation is about 29,000 feet above sea level.
Valerie: Wow, our business card stack is more than three times taller than Everest.
Mason: I love that we are comparing ourselves to something HUGE and yet we still are BIGGER.
Valerie: FTW.
Sarah: But… the elevation of Olympus Mons is about 82,000 feet.
Ian: Shit. If we use Olympus Mons as a reference then we aren’t winning as much as we thought.
Mason: Fuck. What would the investors think if we weren’t always winning?
Valerie: Olympus What?
Sarah: Olympus Mons is a volcano on Mars.
Mason: We don’t have any Martian users, do we? I just don’t want to offend any potential users.
Ian: No, not yet.
Valerie: Most of us here (looks at Sarah) live on planet Earth. So we don’t have to compare ourselves against some big hill on Mars. Crisis averted.
Mason: Valerie is right. FTW.
Sarah: I simply don’t get the connection between an arbitrary planetary feature and a stack of 100 million antiquated pieces of paper that would blow over with the slightest breeze.
Ian: (Ignoring Sarah) Thanks everybody. With this awesome visualization, we are going to rock it. In fact, we are so awesome, we have already won. So I’m going home. Right after I relax on the throne while reading the Feltron report.
It boils down to this: there is no conceptual connection between quantity of people and the height of Mt. Everest. Or Olympus Mons.
My point is simple: contextualizing is important, but arbitrary comparisons do not make for good context. It is possible and preferable to tell a story in creative, compelling ways without resorting to meaningless metaphors.
Intel: I just watched this Intel video about their optical lab. It talks about Light Peak, a multi-protocol fiber-based technology that gives great bandwidth: 10 Gbps at first and moving to as high as 100 Gbps later.
Intel seems interested in moving this into the consumer space, not just the telecom or networking space. What do consumers care about? I see four factors: bandwidth, durability, cost, and cable length.
Bandwidth: This is relatively new technology, and consumers will be excited about the high bandwidth even if it comes at a price premium for adapters and cables. Light Peak’s 10 Gbps of bandwidth rocks. With this much capacity on a multi-protocol wire, you could run your networking, video, and who knows what else over one cable. The “who knows what else” part leaves a lot to the imagination.
Durability: Consumers need cables that are easy to handle. How can fiber cables be durable enough for consumer use? Handling fiber requires some care: you have to pay attention to not crimping it, bending it too much, or tensioning it too much.
Cost: Cost is not the key driver but still important. I don’t have any good estimates here. The best I can do is ask a question: how does one make fiber cables that are suitable for consumer usage in terms of price? My understanding is that the fiber itself is relatively cheap, but the terminators and interface technology are somewhat more expensive.
Length: I have mixed thoughts about the advantages that fiber brings. At present, I just don’t see consumers wanting to connect a hard drive or monitor 5+ meters away, much less 30 meters away. However, in the future, it does open up some interesting possibilities for audio-video installations in homes and businesses. Perhaps the technology will shift how people think about interconnecting devices.
Apple’s Light Peak: I’ve read that Apple will likely release the new MacBook Pros with a Light Peak port. Apple’s implementation, called Thunderbolt, will work with copper wiring, if I understand it correctly. It is not clear to me if it works with copper and fiber interchangeably.
Here’s a shout out to Katherine who told me about Tribler: “Oh nice! A completely decentralized BitTorrent protocol: http://bit.ly/3Xk8fC for Mac and Windows”
I’m downloading Tribler now, so I cannot yet comment on the software or user experience, but I have a few thoughts rattling around that I will share.
Tribler’s origins are interesting: Tribler is a research project of the Electrical Engineering, Mathematics and Computer Science faculty of the Delft University of Technology. A file-sharing research project based out of a university? Sounds scandalous. Seriously. From what I understand, university IT departments do their best to discourage file sharing programs! In 2007, the U.S. Congress talked about getting universities to be more aggressive in stopping illegal file sharing.
Some articles on the Web (such as Truly Decentralized BitTorrent Downloading Has Finally Arrived) imply that Tribler is the first truly decentralized BitTorrent client. But wait! Isn’t BitTorrent (BT) already completely decentralized? Have my illegal downloading skills and knowledge atrophied? So I did some reading and got clear. The BT Protocol is typically, but not always, dependent upon various tracker files that have to live somewhere. That said, there are variants that can make it completely decentralized:
Torrent files are typically published on websites or elsewhere, and registered with at least one tracker. The tracker maintains lists of the clients currently participating in the torrent … Alternatively, in a trackerless system (decentralized tracking) every peer acts as a tracker. Azureus was the first … BitTorrent client to implement such a system through the distributed hash table (DHT) method. An alternative and incompatible DHT system, known as Mainline DHT, was later developed and adopted by the BitTorrent (Mainline), µTorrent, Transmission, rTorrent, KTorrent, BitComet, and Deluge clients. (Source: Wikipedia’s article on the BitTorrent Protocol)
My attempts to articulate the subtleties made me realize that the terms “centralized”, “decentralized”, and “distributed” are somewhat muddled, obscuring the awesomeness of BT. Tracker files for the “typical” BitTorrent are “centralized” in the sense that they have to live somewhere in their entirety. This is confusing because the Internet is often described as decentralized, even though in most cases, the pieces of content do have to live somewhere in their entirety. What makes BT so powerful is that the files themselves are segmented and spread across multiple clients in a way that allows the mesh to transfer the files efficiently and robustly. (I choose the word “robustly” deliberately; advocates of efficiency often seem to forget that full-steam ahead, perfect-operating-conditions efficiency doesn’t happen all that often in the real world.)
It might be more insightful to say that BT clients know how to heal their files rather than to be pedestrian and state that BT clients are merely transferring files around. In other words, BitTorrent is powerful because it is both “decentralized” and “segmented” for its content.
The modifications described above in the Wikipedia quote — namely, making the tracker files decentralized and segmented — make BT even more resilient.
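To make the “segmented” and “heal” ideas concrete, here is a minimal Ruby sketch. It is not the BitTorrent wire protocol and not how any particular client is written; the tiny piece size and the method names are assumptions made for illustration. It only shows why splitting content into independently hashed pieces lets a client detect and re-fetch just the damaged parts.

```ruby
require "digest"

PIECE_SIZE = 4 # bytes; unrealistically small so the example is easy to follow

# Split content into fixed-size pieces, roughly as a torrent does with a file.
def split_into_pieces(data, piece_size = PIECE_SIZE)
  (0...data.bytesize).step(piece_size).map do |offset|
    data.byteslice(offset, piece_size)
  end
end

# Hash every piece so each one can be verified on its own.
def piece_hashes(pieces)
  pieces.map { |piece| Digest::SHA1.hexdigest(piece) }
end

# Return the indexes of pieces whose hash does not match the expected hash.
def damaged_pieces(pieces, expected_hashes)
  pieces.each_index.reject do |i|
    Digest::SHA1.hexdigest(pieces[i]) == expected_hashes[i]
  end
end

original = split_into_pieces("segmented and robust")
hashes   = piece_hashes(original)

received    = original.map(&:dup)
received[2] = "XXXX" # simulate one piece arriving corrupted

bad = damaged_pieces(received, hashes)
bad.each { |i| received[i] = original[i] } # "heal": re-fetch only the bad pieces

puts bad.inspect   # prints [2]
puts received.join # prints segmented and robust
```

Only piece 2 has to be fetched again; the other four pieces are kept as they are.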
P.S. The Tribler folks are at least a little clueless in one regard. They describe their Mac OS X download as an “Apple” download; this sort of signals that they don’t understand that Apple makes other devices that are not Mac OS X compatible (hello iPad!). Yesterday I noticed that iPad sales revenues now exceed those of the MacBook family.
(Most recently edited on 2011-01-23.)
It strikes me as inaccurate, brazen, and silly when itsy-bitsy companies name their top dog “CEO.” If you are going to be a “chief” executive officer, it helps to have other executive officers around [1], not to mention other employees. Maybe I’m oversimplifying it, but if you don’t have a board, you don’t really have a true CEO.
[1] “The core duty of a CEO is to facilitate business outside of the company while guiding employees and other executive officers towards a central objective.” (From Wikipedia’s article describing the position of Chief Executive Officer)
I just checked out Cablegate: The Game, which is, technically, an interesting example of user-driven (yes, crowd-sourced) entity extraction. I tried out the game by tagging entities on a cable giving background about Israel before a visit from the Deputy Secretary, James Steinberg.
Aside: Why aren’t more people asking if the cables you see are authentic copies? This reveals my skeptical nature, but when I see information anywhere, one of my first reactions is: how authentic do I think it is? I don’t have answers, so I remain open minded and Bayesian about it.
Anyhow, this is not the place for me to go into my opinions on WikiLeaks or the associated themes. This post is just about the technical/social aspects of this particular game, as it is a really timely and crafty example of blending a game and a distributed work platform. I view it also as an interesting example of hypertext annotation, which I’ve been interested in for quite some time.
I just played around with two separate browser windows pointing to the same cable. Here are some technical observations:
- Points scored in one browser window did reflect in the other. (However, to discover this, I needed to do a page refresh. That’s understandable, but not quite state of the art.)
- I only got points for some terms I highlighted, but not others. There is some process that I don’t understand that determines if a highlighted phrase is point worthy. I’d like to learn how that is set up.
I have a few technical critiques:
- Instead of using “name” the interface should say “person”. “Name” is ambiguous because it could refer to any proper noun, including a place or location.
- If you first tag a last name, such as “Steinberg” but later want to tag the first and last names together “James Steinberg” the interface will not allow it.
- The interface changes the case of text you highlight. For example, if I highlight “QME” as a topic (which is used to describe Israel’s Qualitative Military Edge) it will show up as “Qme”.
I have a few technical suggestions:
- Expand the UI to allow users to select boilerplate page breaks that occur on some cables and mark them as such. This would allow later processing to treat the content of a cable as one uninterrupted chunk.
- Expand the UI to allow the connecting of synonyms, references, or acronyms. “Steinberg” and “James Steinberg” for example should be connected, as well as “QME” and “Qualitative Military Edge.”
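The critiques and suggestions above could be sketched as a small data structure. To be clear, this is a hypothetical illustration: I have not seen Cablegate’s code, and `EntityRegistry` and its method names are made up. The sketch uses “person” rather than “name” as a kind, keeps the text exactly as typed, and links short forms and acronyms to one canonical entity.

```ruby
# Hypothetical registry; the class and method names are illustrative only.
class EntityRegistry
  def initialize
    @canonical = {} # lowercased surface form => canonical name (case preserved)
    @kinds     = {} # canonical name => :person, :place, :topic, ...
  end

  # Register an entity along with any aliases, short forms, or acronyms.
  def add(canonical_name, kind, aliases = [])
    @kinds[canonical_name] = kind
    ([canonical_name] + aliases).each do |form|
      @canonical[form.downcase] = canonical_name
    end
  end

  # Look up the canonical name for a highlighted phrase, ignoring case.
  def resolve(surface_form)
    @canonical[surface_form.downcase]
  end

  # Look up the kind of entity a highlighted phrase refers to.
  def kind_for(surface_form)
    @kinds[resolve(surface_form)]
  end
end

registry = EntityRegistry.new
registry.add("James Steinberg", :person, ["Steinberg"])
registry.add("Qualitative Military Edge", :topic, ["QME"])

puts registry.resolve("Steinberg") # prints James Steinberg
puts registry.resolve("qme")       # prints Qualitative Military Edge
puts registry.kind_for("QME")      # prints topic
```

Lookups are case-insensitive, but the stored text is never re-cased, so “QME” stays “QME”.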
If you didn’t realize that by playing Cablegate you were providing very valuable information to the owner of the site, Paul, who supports WikiLeaks, then I have a bridge in Alaska I can sell you.
The web world is all kinds of hot and bothered about ranking systems. Part of the enthusiasm arises because people are just so eager to express their opinions. They want to rate things. On the flip side, site owners want to aggregate these preferences and understand what they mean at a broader level.
I just checked out two questions on the statistics stack exchange web site: A basic voting system question and What are some of the best ranking algorithms with inputs as up and down votes?. (Of course I had to comment, rate, and participate. Reading these questions was quite stimulating.)
The web community is both ahead and behind the economics field when it comes to ranking systems.
Some parts of the web are ahead, such as Slashdot, Hacker News, Netflix, Amazon, LinkedIn, Stack Overflow. They drive innovation because they have lots of data and have strong incentives to make useful recommendations. They are faster than most (if not all) political entities when it comes to trying out new feedback channels. I would also bet that they are quicker than academics at writing papers. Smart web players figure out their ranking systems by building, experimenting, and adjusting. They learn organically.
Some parts of the web community, however, are also very behind. Few web developers and/or interaction designers are aware of the theory of voting systems, a fascinating topic area which broadly answers the question of “how do you convert a bunch of individual preferences or rankings into a group metric?” Even at times when the Web community seems to be savvy (for example, when Data.gov mentions its use of Bayesian Ratings) they seem to rely on purely statistical thinking, which misses a lot of the key insights that you get from economics (e.g. voting systems and game theory).
To pick a voting system, you choose a ballot style and a tallying method. There is not one right choice. The best you can do is find a pretty good voting system (by that I mean one with more desirable than undesirable properties AND that has group buy-in) for a problem space.
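Here is a small Ruby sketch of why the tallying method matters. The ballots are made up for illustration, and I picked plurality and the Borda count only because they are two well-known tallying methods; this is not a recommendation of either. The ballot style is the same in both cases (each voter ranks three candidates), yet the two tallies crown different winners.

```ruby
# Made-up ranked ballots: each ballot lists candidates from most to least preferred.
ballots = [%w[A B C]] * 4 + [%w[B C A]] * 3 + [%w[C B A]] * 2

# Plurality: only the first choice on each ballot counts.
def plurality(ballots)
  tally = Hash.new(0)
  ballots.each { |ranking| tally[ranking.first] += 1 }
  tally
end

# Borda count: with n candidates, first place earns n - 1 points,
# second place n - 2, and so on down to 0 for last place.
def borda(ballots)
  tally = Hash.new(0)
  ballots.each do |ranking|
    ranking.each_with_index do |candidate, position|
      tally[candidate] += ranking.length - 1 - position
    end
  end
  tally
end

# The candidate with the highest score in a tally.
def winner(tally)
  tally.max_by { |_candidate, score| score }.first
end

puts winner(plurality(ballots)) # prints A
puts winner(borda(ballots))     # prints B
```

Candidate A has the most first-place votes, but a majority of voters rank A last, which the Borda count picks up and plurality ignores.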
There are many pitfalls when it comes to eliciting the preferences of a group. It is important to think about social issues such as equality or fairness. There are so many ways to get it wrong! For example, many Web systems reward the “first comment” or are overly sensitive to the first ranking. Thank goodness that we are seeing movement in the right direction now. How many years did we have to suffer through craptastic comment systems before a more unified system (Disqus) started catching on and showing a halfway decent way to do it?
From what I’ve seen, web developers who are new to ranking systems are more likely to copy what some other web site does and are very unlikely to read economic or political writing about voting systems. That is a shame. To paraphrase, overgeneralize, and take out of context Zed Shaw (he would be proud), the web development community can be a ghetto. They could gain a lot from cross-fertilization; the problems of ranking systems are not new and certainly not unique to the web.
When it comes to ranking systems and community voting, I don’t buy the traditional argument that it is smart to just get something out there and iterate later. Getting it wrong upfront may create the wrong initial conditions for your community and hamper participation. Building the right values into your ranking and voting system is the heart of participation. It deserves some forethought.
(Most recent edits made on January 12, 2011.)
Dear Rubyists,
Here is a little trick if you want empty hash lookups to yell at you:
h = Hash.new(RuntimeError)
h[:wonk] = true
h[:dj]
=> RuntimeError
I’m using this in my seeds.rb file in a Rails 3 project. I want to store a bunch of seed data and refer to it in other objects. This little tip helps me find mistakes faster.
-djwonk
Update #1! Whoops! This doesn’t actually cause the RuntimeError to be raised. It just returns the constant (RuntimeError). I think what I am trying to do might not be easy. The Hash documentation says “It is not possible to set the a default to a Proc that will be executed on each key lookup.” Bummer!
Update #2: This does the trick!
class AngryHash < Hash
  alias :[] :fetch
end
h = AngryHash.new
h[:wonk] = true
h[:dj]
IndexError: key not found
Yes, I coded in anger.
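Another option that avoids subclassing, sketched here on the assumption of Ruby 1.9 or later (where `KeyError` exists): the block form of `Hash.new` runs its block on every lookup of a missing key, so the block can raise. The variable name `strict` is just for illustration.

```ruby
# The default block is called for each missing key, so it can raise
# instead of returning a default value.
strict = Hash.new { |_hash, key| raise KeyError, "key not found: #{key.inspect}" }
strict[:wonk] = true

puts strict[:wonk] # prints true

begin
  strict[:dj]
rescue KeyError => e
  puts e.message # prints key not found: :dj
end
```

Calling `fetch` directly on a plain hash (`h.fetch(:dj)`) gets the same angry behavior without any setup, at the cost of remembering to use it at every call site.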
For the Ruby developers that use bundler, consider using this instead of just a plain `require 'bundler/setup'`
2011-01-23 Update: now I find myself using the plain `require 'bundler/setup'`. I can’t warn everybody about everything; you have to draw the line somewhere.
