Friday, August 07, 2015

Our life in bits and bytes

kw: book reviews, nonfiction, algorithms, prediction, sociology

What would life be like if the atoms that make us up were just big enough to see, if we could witness directly how they slide, merge and separate? How complex could our life be if the sum total of our lives could be described by, say, 1,000 characteristics, or perhaps 100? How about 10?

Yet how quick we are to pigeonhole people according to one or two, or at most five, distinguishing items! What do most of us know about, for example, Yo-Yo Ma? Male, Chinese, famous musician (maybe you know he is a cellist), … anything else? How about that he is French-born, a Harvard graduate, and has earned 19 Grammys? That's six items, more than most people probably know about him.

To what extent do you think you could predict his tastes and buying habits from these six items? If another person shares these six characteristics, to what extent will he also share certain tastes in clothing or food or books to read? Some people wish us to think "to a great extent". In The Formula: How Algorithms Solve All Our Problems and Create More by Luke Dormehl, some of the people he interviewed claim to do just that. (Maybe you've made a profile on a dating site that starts matching you up when you've entered no more than four or five items. And how fully have you completed your Facebook profile?) But some go to quite an extreme in another direction, using "big data" to pry inside our skulls.

What kind of big data? All your searches on Google, Yahoo, Alta Vista, Bing, or whatever; every click, tweet, Facebook or LinkedIn update, blog post, or online chat. We create tons of data about our day-to-day, even moment-by-moment activities. There was recently an item on the noon radio news about a company that aggregates such data and sells "packages" to companies, who pay $1 million to $2 million on some periodic basis for them (that's all I remember; I was listening with half an ear while folding laundry). Why is all that data so valuable? Because businesses believe they can better predict which products will sell to what kind of people if they crunch it.

A few months ago a handle on a drawer broke. Naturally, the cabinet is decades old and nothing even remotely similar in style could be found at Home Depot or a decorator's salon. So of course I looked online for something with the right spacing of mounting holes, with an appearance that would be compatible with the cabinet, in a set of four, so the handles would all match. It took a few days. I bought a set I liked, online, and installed them. For the next several months, however, ads about cabinet door handles appeared everywhere I went online: Google, Facebook, Amazon, eBay. They all knew I'd been looking for door hardware. None of them knew I was done looking! (Google, are you listening? Do, please, close the loop and collect purchase data also.)

What is The Formula? Luke Dormehl calls it an Algorithm. What is an algorithm? To anyone but a mathematician it is a Recipe or a Procedure. I used to have a book, which I used into unusability: How to Keep Your Volkswagen Alive: A Manual of Step-by-Step Procedures for the Compleat Idiot by John Muir and Richard Sealey. With its help I kept my 1966 Bug alive into Moon Unit territory. The "procedures" were recipes, or algorithms, for things like setting valve clearances, changing a wheel bearing, or overhauling an engine. In computer science, an algorithm is the detailed set of instructions that tells a computer, very, very exactly, what you want it to do.
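For anyone who has never seen one written down, here is perhaps the oldest recipe of all, Euclid's method for finding the greatest common divisor, expressed in Python. This is my illustration, not the author's, but it shows what "very, very exactly" means: each step is unambiguous enough for a machine to follow.

```python
def gcd(a: int, b: int) -> int:
    """Euclid's algorithm: a recipe precise enough for a machine to follow."""
    while b != 0:        # Step 1: if b is zero, a is the answer; stop.
        a, b = b, a % b  # Step 2: replace (a, b) with (b, a mod b); repeat.
    return a

print(gcd(1966, 40))  # → 2
```

Nothing is left to judgment or common sense; that is both the power of an algorithm and, as we'll see, its limitation.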

Here is the kicker. A traditional algorithm is carried out in a procedural manner (never mind the claims of non-procedural, object-oriented computer language gurus; at root, a computer CPU carries out a series of procedural instructions), according to a "computer code" or "program" written in one or more formal languages. Some time ago I looked at the internal release notes for the Android OS used in many cell phones. That version, at least, released in 2009, had modules written in 40 computer languages. No matter how complex the program or program system, the instructions are written by a person, or perhaps by many persons, and no matter how many, their knowledge is finite. There are also time constraints, so the final product will be biased: firstly by the limitations of the programmer(s), secondly by tactical decisions about what to leave out for the sake of time or efficiency, and thirdly by the simplifications or shortcuts this or that programmer might have made so that some operation was easier to code. It may also be biased by the inner prejudices of the programmer(s).

Another kicker: a kind of start-stop-start process has been going on around neural networks, which try to mimic the way our brains are wired. There are two kinds, hardware and software. Hardware neural nets are difficult to construct and more difficult to change, but they have much greater speed, yielding almost immediate results. Because people who can wire up such hardware are quite rare compared to people who can write computer software, hardware nets are also rare, and nearly all the research is being done using software simulations. "Machine learning" by neural nets can be carried out by either hard- or software nets, but I'll defer remarks on one significant difference for the moment.

A neural network created for a specific task—letter recognition in handwritten text, for example—is trained by providing two kinds of inputs. One is a series of target images to "view", perhaps in the form of GIF files, or with appropriate wiring, a camera directly attached. The other is the "meaning" that each target image is to have. A training set may have five exemplars of the lower-case "a", along with five indicators meaning "that is an a", five of "b" and their indicators, and so forth. The innards of the net somehow extract and store various characteristics of the training data set. Then it is "shown" an image to identify, and it will produce some kind of output, perhaps the ASCII code for the letter.

The inner workings of neural nets are pretty opaque, and perhaps unknowable without extremely diligent enumeration of all the things happening at every connection inside. But at the root, in a software neural network there is a traditional algorithm that describes the ways the network connections will interact, which ones will take input or make output, which ones will store things worth "remembering", and so forth. This is one reason that software nets are rather slow, even on pretty fast hardware. The simulation program cannot produce the wholly parallel processing that a hardware net uses (brains use wholly parallel processing, and are hard put to do linear processing, the opposite of computer CPUs). If the net is small, with only a few dozen or a few hundred nodes, the node-by-node computations can be accomplished rapidly, but a net that can recognize faces, for example, has to be a lot bigger than that. It will be hundreds of times slower.

Now for the other significant difference. The computer running the simulation is digital, while a hardware network is analog. I remember, the first time I used a computer, being quite impressed to see calculations with 7-8 digits of significance, and, if I used double precision, 15 digits. That sounds very precise, and for many uses, it is. Fifteen-digit precision means one can specify the size of something about the size of a continent to the nearest nanometer, which is about the size of five or ten atoms. However, a long series of calculations will not maintain such a level of precision. For many practical uses, calculations of much lower precision are sufficient: before computers came along, buildings and bridges were built, and journeys planned, and a slide rule was accurate enough to do the calculations. My best precision using a slide rule was 3-4 digits. But "real-life" systems are typically nonlinear, and intermediate quantities tend to partly cancel one another out. You might start with very accurate measurements (though it's quite unlikely they are more accurate than 4-6 digits). Run a simulation based upon those figures a few dozen steps, and somewhere along the line there might be a calculation similar to this:

324.871 659 836 648 - 324.860 521 422 697 → 0.011 138 413 951 016 4

If you've been counting digits, you might notice that the final four digits, 0164, are superfluous… where did they come from? They are rounding error, both from representing the two numbers above in binary format, and from converting the result back into decimal form for display. But the bigger problem is that, of the remaining digits, only 11 are useful. Four have been lost. Further, even if you start with decimal numbers that can be represented exactly in binary form, such as 75/64 = 1.171 875 and 43/128 = 0.335 937 5, multiplying them results in 3,225/8,192 = 0.393 676 757 812 5, which has 13 digits of precision, whereas the original numbers had seven each. Thus it typically takes twice as many digits to represent the result of a multiplication as were needed to represent the two multiplicands.
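The arithmetic is easy to check. Here is a small Python experiment of my own, using the very numbers from the text; the fractions module keeps an exact rational copy of each value, so we can measure how much the 64-bit float subtraction loses.

```python
from fractions import Fraction

# The subtraction from the text, done two ways: once in ordinary 64-bit
# floats, and once exactly, treating the decimal strings as rationals.
approx = 324.871659836648 - 324.860521422697
exact = Fraction("324.871659836648") - Fraction("324.860521422697")

# The absolute error is tiny, but it is large relative to the ~0.0111 result:
# the matching leading digits cancelled, taking their precision with them.
error = abs(Fraction(approx) - exact)
print(f"{approx:.16f}")  # only about 11 of these digits are trustworthy

# Multiplication, by contrast, loses nothing in this case: both factors are
# short binary fractions, so the wider product still fits a double exactly.
assert 1.171875 * 0.3359375 == 0.3936767578125  # 75/64 * 43/128 = 3225/8192
```

The lesson matches the text: the subtraction quietly threw away several digits, while the multiplication merely needed more of them.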

I could go on longer, but an interested person can find ways to determine error propagation in all kinds of digital systems, many of which have long been studied. By contrast, an analog system is not limited by rounding errors. Rather, real wires and real electronic components have thermal noise, which can trouble systems that run at temperatures we might find comfortable. Further, extracting the outputs in numerical form takes delicate equipment, and the more accurately you want those output numbers, the more delicate and expensive the equipment gets. However, until readout, the simulation runs with no errors due to subtraction or multiplication, other than gradual amplification of thermal noise.

Suffice it to say, both direct procedural algorithms and neural network machine-learning systems are in use everywhere, trying to predict what the public is going to do, be it buying, voting, dating, relocating, or whatever. That is the main reason for science, after all: predicting the future. Medical science in the form of a doctor (or more than one) looks at a sick person and first tries to find a diagnosis, an evaluation of what the problem is. The next step is a prognosis, a prognostication or prediction; it is the doctor's expectation of the progress of the disease or syndrome, either under one treatment or another, or under none. A chemist trying to determine how to make a new polymer will use knowledge of chemical bonding to predict what a certain mixture of certain chemicals will produce. Then the experiment is carried out, either to confirm the prediction or, if it fails, to learn what went against expectation and why. The experiments that led to the invention of Nylon took ten years. But based upon them, many other kinds of polymers later proved easier and quicker to develop. It is even so in biological science. Insect or seashell collecting can be a fun hobby, but a scientist will visit a research museum (or several) to learn all the places a certain animal lives, and when various specimens were collected, and then determine if there is a trend such as growing or shrinking population. Is the animal going extinct? Or is it flourishing and increasing its range worldwide?

In the author's view, The Formula represents the algorithms used in the business world, broadly construed, to predict what you might like, and thus present you with advertising to trigger your desire for that thing. My experience with cabinet handles shows that they often get their timing wrong: many cool and interesting ads showed up, but too late. However, that isn't the author's point. The predictive methods that choose which ads to show us, which prospective partners to offer on eHarmony or OkCupid, or how to manage a politician's image all tend to narrow our choices. A case in point from the analog world: one of the best jobs I had before going into Engineering came about because an Employment Agent, leafing through job sheets, muttered, "You wouldn't be interested in that," and I quickly said, "Try me!"

Try making some Google searches while logged in to Google, and then (perhaps using a different browser, and if you're really into due diligence, on a different computer network such as a library), making the same searches while not logged in. The "hits" in the main column will be similar, or possibly the same. But the ads on the right are tailored to your own search history and other indicators that Google has gathered.

Is all this a bad thing? Maybe. You can game the system a little, but as time goes on, your history will more and more outweigh things you do differently today. Sure, I got a sudden influx of ads about cabinet handles after searching for same, but if I had a history as a very skilled handyman (I don't!), the exact ads I saw might have been quite different. And I might have also seen ads about certain power tools intended to make the mounting of new cabinet handles even easier.

The author has four concerns and spends a chapter on each.

  1. Are algorithms objective? They cannot be. Programmers are not objective, and machine learning is dependent on the training set, which depends on the persons who create it, and they are not objective.
  2. Can an algorithm really predict human relationships? We have proverbs that give us pause, such as, "Opposites attract", and "If you're not near the one you love, you'll love the one you're near".
  3. Can algorithms make the law more fair? I was once asked by a supervisor if I thought he was fair. I replied, "Too much concern for fairness can result in harshness. We (his 'direct reports') wish to be treated not just fairly but well. We'd like a little mercy with our justice." Mr. Dormehl cites the case of an experiment with an inflexible computer program, given the speed records from a car on a long-distance trip. It issued about 500 virtual tickets. A different program, that averaged speed over intervals just a little longer, issued one ticket.
  4. Can an algorithm create art? Since all the programs created to date operate by studying what makes existing artworks more or less popular, they can only copy the past. True creation means doing what has not been done. Picasso and others who developed Cubism did so against great opposition. Now their works sell for millions. It was art even before it was popular, but the "populace" didn't see it that way for a couple decades.
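The speeding-ticket experiment in item 3 is easy to caricature in a few lines of Python. This sketch is mine, with invented numbers on a much smaller scale than the case Dormehl cites, but it shows how a detail buried inside the algorithm, the averaging interval, swings the outcome:

```python
LIMIT = 65  # mph; an assumed speed limit for illustration

# One speed reading per minute: mostly legal, with brief spikes while passing.
samples = [63, 64, 67, 62, 64, 66, 63, 61, 68, 64, 62, 63]

# Algorithm 1 (inflexible): ticket every single sample over the limit.
strict_tickets = sum(1 for s in samples if s > LIMIT)

# Algorithm 2: ticket only when the average over a 4-minute window exceeds it.
WINDOW = 4
averaged_tickets = sum(
    1
    for i in range(0, len(samples), WINDOW)
    if sum(samples[i:i + WINDOW]) / len(samples[i:i + WINDOW]) > LIMIT
)

print(strict_tickets, averaged_tickets)  # → 3 0
```

Same driver, same trip, same data; only the algorithm's notion of "speeding" changed. Scale the trip up to cross-country length and you get the 500-tickets-versus-one contrast from the book.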

The book closes with a thoughtful section titled "How to Stay Human in the World of the Formula." While he has some suggestions, I think the best way is to avoid being totally predictable. In many ways, that is hard for me, because I am a man of regular habits. I'm quite happy eating the same meat-and-cheese sandwich for lunch day after day, taking the same route to a work place (or these days, a place I volunteer), eating at a certain kind of restaurant and eschewing most "fine dining" places, wearing a certain kind of garb depending on the season, playing (on acoustic instruments, not electronic devices) certain kinds of music to the exclusion of others, and so forth. But I am also the kind of guy who, when making a mobile, makes it quite different from any other I have ever made: different materials, different color schemes, and different numbers of hanging objects clustered—or not—in various ways. I made one out of feathers once; not my most successful mobile. When I write a formal document or a letter for sending via snail mail, though I type it because handwriting is so slow, I usually pick a new typeface in which to print it; I have a collection of nearly 2,000 font files, carefully selected either for readability or as specialized drop caps (I love drop caps, though I am careful in their use). I haven't bothered to try alternate typefaces for this blog, because there are only 7 available anyway, and the default is as good as any.

The author proposes that we "learn more about the world of The Formula". Sure. But as long as Google's PageRank and Facebook's EdgeRank are black boxes, and as long as everyone out there from LinkedIn to Amazon and Netflix keeps tweaking their own black-box "recommendation engines", it will be a kind of arms race between the cleverest consumers and the marketers. But hasn't that always been true?
