Thursday, January 25, 2018

ok, so about the Data Guy/Author Earnings thing.

Specifically, this.

I'm agin' it, as the actress said to the bishop. Here are a few thoughts.

1. The data that Data Guy is using isn't publicly available. It is data extracted from Amazon's private servers by 3rd party robot web crawlers. If Data Guy were selling this data, with the only difference being that Amazon had set their no-robots policy strictly enough that he would have had to actively crack through Amazon's web security, then the criminal act would be obvious.

2. A counterfactual. Suppose instead that Statistical Lady (SL) had spent the past couple of years extracting buyer data from Amazon's web servers, and publicly analyzing it in general terms. That is, suppose SL had published a series of analyses of buyer data in terms of the average Amazon customer, the anonymous distribution of Amazon customers, and so on.

To this point, Statistical Lady has been proper in treating the data she extracted, randomizing and anonymizing the results so that no individual Amazon customer can be extracted from her analysis. Further, SL has been doing this analysis freely. In effect, SL has been a social data scientist using the data she extracted in a reasonable, fair-use manner with no possibility of remuneration.

But now Statistical Lady changes tactics. Now, she sets up a commercial service, and changes her analysis. Rather than anonymizing the data, SL is ranking the top 100,000 Amazon customers, using pseudonyms or fuzzed-out names in public, while at the same time offering anyone who comes along the opportunity to buy that data, unmodified and in complete personal detail.

That is, SL now offers to sell to all comers the personal buying habits of every Amazon customer.

Again, at this point, the behavior would obviously have crossed the line.

In particular, a customer purchasing from Amazon, whether explicitly or not, whether in black letter law or not, has an expectation that their consumer habits on Amazon should not be available by purchase to third parties. If, for some reason, Amazon's terms of service were to allow them to sell the buying data of individual consumers to anyone with the wherewithal to purchase it, no questions asked, then, whether this action is legal or not, one would expect that Amazon's customers would, within hours, abandon the company completely.

And this is especially important for the non-obvious aspects of the Amazon customer base. Consider what would happen if Amazon offered to sell the equivalent usage data for their Web Services to anyone who wants it. In that case, one would expect that Amazon's Web Services would be bankrupt within hours.

Now, from the counterfactual, it seems clear that Amazon is not likely to make their retail customers' buying habits available, since, whether or not they have a legal right to do so, from a practical business point of view such an action would be insane, in my opinion.

Thus, as a general matter, one would expect that Amazon has a significant interest in ensuring that buying data on their servers is considered, as a practical matter, inviolate, protected, and most of all, private.

For every buyer, there must, of course, be a seller. Thus, as a matter of fair trading, if the buyer has a practical expectation of privacy, then so should the seller. Thus, the counterfactual indicates that in the real life example, whatever the practical legal impediments to Data Guy's commercial activities, as an ethical matter, Data Guy has completely crossed the line by selling his data in the way that he has.

The only practical way to thread the needle, in my opinion, is for Data Guy to systematically randomize and completely anonymize his commercially available data set, in a 3rd party auditable manner. No opt ins, opt outs, nothing. Such things are half-measures, in my opinion, and are more than likely simply a lawsuit waiting to happen.

