Friday, October 2, 2009

Amazon's Whispernet: the door swings both ways

Amazon's Whispernet and Whispersync have been touted as a revolution, an innovation in the content pipeline that will propel eBooks from a niche market into the mainstream. If you have a Kindle, Whispernet means that you have access to all of Amazon's Kindle content from anywhere there is Sprint coverage, and that your user-generated content is automatically backed up and synced across multiple devices by Amazon. For free. Great! It also means that Amazon pulls data from your Kindle; obviously, Amazon retrieves the content that it backs up, but perhaps it gets other information as well. To co-opt an old adage: if you can see Amazon, Amazon can see you. And what Amazon sees (and how often) affects your data privacy and ownership. Given Amazon's track record with transparency, Kindle users should be proactive in learning more about features of their device that are not advertised.

So just what information is sent to Amazon by your Kindle? According to Amazon:
"Information Received. The Device Software will provide Amazon with data about your Device and its interaction with the Service (such as available memory, up-time, log files and signal strength) and information related to the content on your Device and your use of it (such as automatic bookmarking of the last page read and content deletions from the Device). Annotations, bookmarks, notes, highlights, or similar markings you make in your Device are backed up through the Service. Information we receive is subject to the Privacy Notice."
This description is very vague. Just what kind of information is logged on the device, and how much of it is sent to Amazon? What information related to content is sent? Does Amazon only receive information related to content that is downloaded from Amazon, or will they receive information about, for instance, the fan-fiction you downloaded? How often is this information sent to Amazon?

If the Kindle communicated with Amazon via WiFi, you could set up your access point or router to intercept traffic from the Kindle and find the answers to the above questions that way. However, the Kindle communicates with Amazon over a Sprint 3G network via an account you have no access to, so you'd have to somehow intercept and interpret the cell signal. My understanding of wireless technology is extremely limited, but I'm guessing that you'd have to either passively intercept the signal (a federal crime in the United States) or somehow execute a man-in-the-middle attack against your Kindle (for instance, by setting up or impersonating a cell station, which would also probably constitute at least one federal crime).

Without being able to intercept traffic from the Kindle, the only way to determine exactly what information is sent by a Kindle is hacking it, which is against the Amazon Kindle License and Terms of Use:
"No Reverse Engineering, Decompilation, Disassembly or Circumvention. You may not, and you will not encourage, assist or authorize any other person to, modify, reverse engineer, decompile or disassemble the Device or the Software, whether in whole or in part, create any derivative works from or of the Software, or bypass, modify, defeat or tamper with or circumvent any of the functions or protections of the Device or Software or any mechanisms operatively linked to the Software, including, but not limited to, augmenting or substituting any digital rights management functionality of the Device or Software."

Luckily, there are people far more tech-savvy, more willing to risk bricking their Kindles, and more willing to risk getting banned by Amazon than I am. They have figured out what information gets sent to Amazon and shared sample logs in a forum post. Since those sample logs are just snippets, my interpretation may not be entirely accurate, but what seems to get sent to Amazon is:
  • the times at which you switch screens (e.g. from the list of books to a particular book)
  • the details of the book you are reading (i.e. the title, authors, the Amazon Standard Identification Number, content type, publisher, publication date, display title and authors, length, when you last accessed the book, your last location in the book, whether it's encrypted, whether it's a sample, whether it is newly downloaded to your Kindle, the path to the file on the Kindle system, whether there is text-to-speech metadata)
  • the details of your device - I'm not sure how to interpret all the data, but it seems like: your EVDO network information, signal strength, your latitude and longitude, and more
Obviously, because the Kindle has a 3G radio, Sprint knows where you are. But why is this information sent to Amazon as well? The terms of use did not mention receiving information regarding your location coordinates.
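To make that comparison concrete, here is a minimal sketch that checks the fields from the list above against the examples the terms of use actually name. All of the field names are my own hypothetical labels, not the real log format, and treating "last location in the book" as the same thing as "last page read" is an assumption on my part. Because the terms only give examples ("such as..."), a field showing up here means it is not explicitly named, not that it is necessarily outside the terms.

```python
# Hypothetical labels for the examples named in the terms of use.
# These are illustrative names, not identifiers from the Kindle software.
NAMED_IN_TERMS = {
    "available_memory", "uptime", "log_files", "signal_strength",
    "last_page_read", "content_deletions",
    "annotations", "bookmarks", "notes", "highlights",
}

# Hypothetical labels for the fields that seem to be sent, based on my
# reading of the sample logs (see the list above). Not the real format.
OBSERVED_IN_LOGS = {
    "screen_switch_times",
    "title", "authors", "asin", "content_type", "publisher",
    "publication_date", "length", "last_accessed",
    "last_page_read",  # assumed equivalent to "last location in the book"
    "is_encrypted", "is_sample", "is_new", "file_path",
    "has_tts_metadata",
    "evdo_network_info", "signal_strength", "latitude", "longitude",
}


def not_named_in_terms(observed, named):
    """Return the observed fields that the terms do not explicitly name."""
    return sorted(observed - named)


unnamed_fields = not_named_in_terms(OBSERVED_IN_LOGS, NAMED_IN_TERMS)
for field in unnamed_fields:
    print(field)
```

Under these assumptions, latitude and longitude land on the "not explicitly named" side, while signal strength does not.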

From the various complaints I've read about Whispersync not applying to PDFs on the Kindle DX, and the phrasing in various reviews regarding Whispersync, I assume Amazon does not sync your non-Amazon content across your devices. However, it's possible that Amazon still retrieves the same kind of details about your non-Amazon content as it does about your Amazon content (e.g. title, filename, and the other fields listed above).

Furthermore, when you use the web browser, all traffic goes through an Amazon web proxy (which is understandable - the proxy can optimize pages for display on the Kindle, for example by filtering out large images and/or video that the Kindle can't display anyway). The existence of this proxy is not disclosed to the Kindle user (although perhaps Amazon did not mention it in the Kindle terms of use because the browser is experimental). So if you use the Kindle for web browsing, Amazon also receives information about which sites you visit.

There is an obvious reason why Amazon wants all this data, a reason that doesn't involve tinfoil hats: market research. The Kindle records what users actually read rather than what they remember reading, which gives Amazon a wealth of accurate data about reading habits. I doubt companies could buy data like this at any price, because conventional studies rely on self-reporting, which is notoriously subject to recall bias. Setting up a market research study at such a scale, with a sample size numbering in the hundreds of thousands, if not millions, would probably be prohibitively expensive. The data the Kindle collects is commercially valuable to Amazon, possibly valuable enough to offset what it pays to Sprint for connectivity.

While Amazon's motivations are likely benign, there may still be negative externalities associated with the data collection.  For instance, although this information is generated by their users, the users cannot access it and have no control over how long it is stored or how it is used.  Is losing control over personal data a reasonable price for a user to pay for the convenience of Whispernet? Maybe so - it depends on the preferences of each Kindle user.  However, the price is likely one that most users aren't even aware of and do not know how to evaluate. Which brings me to another cliché I was considering as a title for this blog post: there is no such thing as a free lunch.