To AI or not to AI
Welcome to another special edition of “Mediocrity and Madness”! Usually this podcast is dedicated to the ever-widening gap between talk and reality in our big organizations, most notably in our global corporates. Well, I might have to admit that in some cases the undertone is a tiny bit angry and another bit tongue-in-cheek. The title might indicate that.
Today’s episode is not like this. Well, it is, but in a different way. Upon reflection, it still addresses a mighty chasm between talk and reality, but the reason for this chasm appears more forgivable to me than the many dysfunctions we appear to have accepted against better judgement. Today’s podcast is about artificial intelligence and our struggles to put it to use in business.
This podcast is to some measure inspired by what I learned in and around two Allianz programs, “IT Literacy for top executives” and “AI for the business”, which I had the privilege and the pleasure to help develop and facilitate.
I am tempted to begin this episode with the same claim I used in the last (German) one: With artificial intelligence it is like with teenage sex. Everybody talks about it, but nobody really knows how it works. Everybody thinks that everyone else does it. Thus, everybody claims he does it.
And again, Dan Ariely gets all the credit for coining that phrase, with “Big Data” instead of “artificial intelligence”, which is actually somewhat related anyway. Or not. As we will see later.
To begin with, the big question is:
What is “artificial intelligence” after all?
The straightforward way to answer that question is to first define what intelligence is in general and then apply the notion that “artificial” simply means the same is done by machines. Yet here the problem begins. There simply is no proper definition of intelligence. Some might say intelligence is what distinguishes man from animal, but that’s not very helpful either. Where’s the border?
When I was a boy, I read that a commonplace definition was that humans use tools while animals don’t. Leaving aside whether that little detail would be one to make us truly proud of our human intelligence, multiple examples of animals using tools have been found since.
To make a long story short, there is no proper and general definition of intelligence. Thus, we end up with some self-referentiality: “It’s intelligent if it behaves like a human.” In a way, that’s quite a dissatisfying definition, most of all because it leaves no room for types of intelligence that behave – or “are” – significantly non-human. “Black swan” says hello. But we’re detouring into philosophy. Back to our problem at hand: What is artificial intelligence after all?
Well, if it’s intelligent if it behaves like a human, then the logical answer to this question is: “Artificial intelligence is when a computer/machine behaves like a human.” For practical purposes, this is something we can work with. Yet even then another question looms: How do we evaluate whether it behaves like a human?
As we are used to some self-referentiality by now, the answer is quite straightforward: “It behaves like a human if other humans can’t tell the difference from human behavior.” This is actually the essence of what is called the “Turing test”, devised by the famous British mathematician Alan Turing, who, besides basically inventing what we today call computer science, helped crack the Enigma encryption during World War II.
Turing’s biography is as inspiring as it is tragic, and I wouldn’t mind if you stopped listening to this humble podcast and explored Turing in a bit more depth, for example by watching “The Imitation Game” starring Benedict Cumberbatch. If you decide to stay with me instead of Cumberbatch, this is where we finally are:
“Artificial intelligence is when a machine/robot behaves in a way that humans can’t discern that behavior from human behavior.”
As you might imagine, the respective tests have to be designed properly so that biases are avoided. And, of course, the questions or problems used to ascertain human or less human behavior have to be designed carefully, too. These are subjects of more advanced versions of the Turing test, but in the end the ultimate condition remains the same: A machine is regarded as intelligent if it behaves like a human.
It has taken us some time to establish this somewhat flawed, extremely human-centric but workable definition of machine intelligence. It poses some questions and helps answer others.
One question that is discussed around the Turing test is indeed whether would-be artificial intelligences should deliberately put a few mistakes into their behavior, even though they know better, just in order to appear more human. I think that question comes more from would-be philosophers than it is a serious one to consider. Yet you could argue that, if you take the Turing test seriously, the occasional mistake is appropriate in order to convince a human of being a fellow human. After all, “to err is human”. Again, the question appears a bit stupid to me. Would you really argue that something is intelligent only if it occasionally errs?
The other side of that coin though is quite relevant. In many discussions about machine intelligence, the implicit or explicit requirement appears to be: If it’s done by a machine, it needs to be 100%.
I suspect that’s because, when dealing with computer algorithms like calculating the trajectory of a moon rocket, we’re used to zero errors, provided that the programming is right, that there are no strange glitches in the hardware and that the input data isn’t faulty as such. Writing that, a puzzling thought enters my mind: We trust in machine perfection and expect human imperfection. Not a good outlook in regard to human supremacy.
Sorry, I’m on another detour. Time to get back to the question of intelligence. If we define intelligence as behavior indiscernible from human behavior, why then are we surprised when machine intelligence doesn’t yield 100% perfect results? Well, for the really complex problems it would actually be impossible to define what “100% perfect” even is, either ex ante or ex post, but let’s stick to the simpler problems for now: pattern recognition, predictive analysis, autonomous driving … . Intelligent beings make mistakes. Even those whose intelligence is focused on a specific task. Human radiologists falsely identify some spots on their pictures as signs of cancer, whilst they overlook others that actually are malignant. So do machines trained for the same purpose.
I am rather sure that the kind listener’s intuitive reaction at this point is: “Who cares? If the machine makes fewer errors than her human counterpart, let her take the lead!” And of course, this is the only logical conclusion. Yet here, quite often, lies one major barrier to embracing artificial intelligence. Our reaction to machines threatening to become better than us, but not totally perfect, is to poke at the outliers and inflate them until the use of machine intelligence feels somewhat disconcerting.
Well, they are competitors after all, aren’t they?
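For readers who like to see the arithmetic, here is a minimal Python sketch of the “fewer errors” comparison. It counts the two kinds of mistakes from the radiologist example, false positives (healthy tissue flagged as cancer) and false negatives (malignant spots overlooked), for a human and a machine reader. All scans, verdicts and reader names are invented for illustration; a real comparison would need a properly designed study.

```python
from dataclasses import dataclass


@dataclass
class ErrorCounts:
    false_positives: int  # healthy scans flagged as cancer
    false_negatives: int  # malignant scans that were overlooked

    @property
    def total(self) -> int:
        return self.false_positives + self.false_negatives


def count_errors(truth: list[bool], flagged: list[bool]) -> ErrorCounts:
    """Compare a reader's verdicts with the ground truth (True = malignant)."""
    fp = sum(1 for t, f in zip(truth, flagged) if f and not t)
    fn = sum(1 for t, f in zip(truth, flagged) if t and not f)
    return ErrorCounts(fp, fn)


# Invented ground truth for ten scans and verdicts of two hypothetical readers.
truth   = [True, False, False, True,  False, True, False, False, True,  False]
human   = [True, True,  False, False, False, True, True,  False, True,  False]
machine = [True, False, True,  True,  False, True, False, False, False, False]

h = count_errors(truth, human)
m = count_errors(truth, machine)
print(f"human: {h.total} errors, machine: {m.total} errors")
# → human: 3 errors, machine: 2 errors
```

In this toy case the machine is far from perfect, it still gets two scans wrong, yet it makes fewer mistakes than the human reader. That, and not perfection, is the comparison the argument above asks for.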
The radiologist case is especially illuminating. In fact, the problem is that amongst human radiologists there is a huge, huge spread in competency. Whilst a few radiologists are just brilliant at analyzing their pictures, others are comparatively poor. The gap results not only from experience or attitude; there are also significant differences from country to country, for example. Thus, even if the machine did not beat the very best radiologists, it would be a huge step ahead, saving many, many lives, if one could just provide a better average across the board, which is what commonly available machines geared to the task do. Guess what your average radiologist thinks about that. And never mind if the machine is not yet better than her best human colleagues: it is but a matter of weeks or months, or maybe a year or two, until she is, as we will see in a minute.
You still don’t believe that this impedes the adoption of artificial intelligence? Look at this example that made it into the feuilletons not long ago: autonomous driving. Suppose you’re sitting in a car that is driven autonomously by some kind of artificial intelligence. All of a sudden, another car, probably driven by a human intelligence, comes towards you on the rather narrow street you’re being driven through. Within microseconds, your car recognizes its choices: divert to the right and kill a group of kids playing there, divert to the left and kill some adults in their sixties, one of whom it recognizes as an important advisor to an even more important politician, or stay on course and kill both the occupants of the oncoming car … and, unfortunately, you yourself.
The dilemma has been stylized into a kind of fundamental question by some would-be philosophers, with the underlying notion of “if we can’t solve that dilemma rationally, we had better give up the whole idea of autonomous driving for good.” Well, I am exaggerating again, but there is some truth in that. Now, as the dilemma is inextricable as such: bye-bye, autonomous driving!
Of course, the real answer is anything but philosophical. Actually, it doesn’t matter what choice the intelligence driving our car makes. It might just as well roll a die in its random access memory. We have thousands of traffic victims every year anyway. Humankind has decided to live with that sad fact, as the advantages of mobility outweigh these bereavements. We have invented motor liability insurance for exactly that reason. Thus, the only and very pragmatic question has to be: Do the advantages of autonomous driving outweigh some sad accidents? And fortunately, in all probability autonomous driving will massively reduce the number of traffic accidents, so the question is actually a very simple one to deal with. Except perhaps for motor insurance companies … and some would-be philosophers.
Here’s another intriguing thing about artificial intelligence: irreversibility. As soon as machine intelligence has become better than man in a specific area, the competition is won forever by the machines. Or lost for humankind. Simple: as soon as your artificial radiologist beats her human colleague, the latter will never catch up again. On the contrary. The machine will improve further, in some cases very fast. Man might improve a little over time, but nowhere near the speed of his silicon colleague … or competitor … or potential replacement.
In some cases, the world splits into two parallel ones: the machine world and the human world. This is what happened in 1997 with the game of Chess, when Deep Blue beat the then world champion Garry Kasparov. Deep Blue wasn’t even an intelligence. It was essentially brute-force search with input from some chess-savvy programmers, but from then on humans had lost the game to the machines, forever. In today’s chess tournaments, it is not the best players on earth who compete but the best human players. They might use computers to improve their game, but none of them would stand the slightest chance against a halfway decent artificial chess intelligence … or even a brute-force algorithm.
The loss of chess for humankind is a rather ancient story compared to the game of Go. Go, being vastly more complex than chess, resisted the machines for about twenty years longer. Brute force doesn’t work for Go, and thus it took until 2016 for AlphaGo, an artificial intelligence designed by Google’s DeepMind to play Go, to finally conquer that stronghold of humanity. That year, AlphaGo defeated Lee Sedol, one of the best players in the world. In 2017, the program also defeated Ke Jie, the then top-ranking player in the world.
Most impressive, though, is that only a few months later DeepMind published another version of its Go genius: AlphaGo Zero. Whilst AlphaGo had been trained with huge numbers of Go matches played by human players, AlphaGo Zero was taught only the rules of the game and developed its skills purely by playing against versions of itself. After three days, this version beat her predecessor, the one that had won against Lee Sedol, 100:0. And only weeks later, another version was presented: AlphaZero learnt the games of Chess, Go and Shogi, another highly complex strategy game, in only a few hours and convincingly defeated the world-champion programs in each of them. By then, man was out of the picture for what can be considered an eternity by the measures of AI development cycles.
AlphaZero not only plays a better Go – or Chess – than any human does, it develops totally new strategies and tactics to play the game, it plays moves never considered reasonable before by its carbon-based predecessors. It has transcended its creators in the game and never again will humanity regain that domain.
This, you see, is the nature of artificial intelligence: as soon as it has gained superiority in a certain domain, this domain is forever lost for humankind. If anything, another technology will surpass its predecessor. We and our human brains won’t. We might comfort ourselves that it’s only rather mundane tasks that we cede to machines of specialized intelligence, that it’s still a long way towards a more universal artificial intelligence and that, after all, we’re the creators of these intelligences … . But the games of Chess and Go are actually not quite so mundane, and the development is somewhat exponential. Finally, a look into ancient mythology is anything but comforting. Take Greece as an example: the progenitor of the gods, Uranos, was emasculated by his offspring, the Titans, and these in turn were defeated and punished by their offspring, the Olympians, who then ruled the world, most notably Zeus, Uranos’ grandson.
Well, Greek mythology is probably not what the kind listener expects from a podcast about artificial intelligence. Hence, back to business.
AI is not necessarily BIG Data
Here’s a not so uncommon misconception: AI or advanced analytics is always Big Data or, more exactly, Big Data is a necessary prerequisite for advanced analytics.
We could make use of the AlphaZero example again. There could hardly be less data necessary. Just a few rules of the game and off we go! “Wait”, some will argue, “our business problems aren’t like this. What we want is predictive analysis, and that’s Big Data for sure!” I personally and vehemently believe this is a misconception. I actually assume it is a misconception with a purpose, but before sinking deeper into speculation, let’s look at an example, a real business problem.
I have spent quite some years in the insurance business, so please forgive me for using an insurance example. It is very simple. The idea is to use artificial intelligence for calculating insurance premiums, specifically for motor third-party liability insurance (TPL). Usually, this is a mandatory insurance. The risk it covers is that you – in your capacity of driving a car or parking it – damage an object that belongs to someone else or injure someone else. Usually, your insurance premium should reflect the risk you want to cover. Thus, in the case of TPL, the essential question from an actuary’s point of view is the following: Is the person under inspection a good driver or a not so good one? “Good” in the insurer’s sense: less prone to cause an accident and, if so, one that usually doesn’t come with big damage.
There are zillions of ways to approach that problem. The best would probably be to get an individual psychological profile of the respective person, add a decently detailed analysis of her driving patterns (where, when, …) and calculate the premium based on that analysis, maybe using some sort of artificial intelligence in order to cope with the complex set of data.
The traditional way is comparatively simplistic and indirect. We use a mere handful of data points, some of them related to the car, like type and registration code, some personal data, like age or homeownership, and some about driving patterns, mostly yearly mileage, and calculate a premium from these few by some rather simple statistical analysis.
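To give an idea of what such a simple analysis typically produces, here is a minimal Python sketch of a multiplicative tariff: a base premium multiplied by one factor (a “relativity”) per rating variable. The multiplicative structure is a common actuarial convention; the power class stands in for car type, and every number, band and name below is invented for illustration rather than taken from any real tariff.

```python
# Hypothetical relativities; in practice actuaries estimate such factors from
# claims data, for example with a generalized linear model.
BASE_PREMIUM = 300.0  # hypothetical annual base premium

AGE_BANDS = [(18, 25, 1.8), (25, 65, 1.0), (65, 120, 1.2)]  # [low, high) -> factor
POWER_FACTORS = {"low": 0.9, "medium": 1.0, "high": 1.4}      # proxy for car type
MILEAGE_BANDS = [(0, 10_000, 0.85), (10_000, 20_000, 1.0), (20_000, 10**9, 1.25)]
HOMEOWNER_FACTOR = 0.95


def band_factor(value: float, bands: list[tuple[float, float, float]]) -> float:
    """Return the factor of the band [low, high) that contains value."""
    for low, high, factor in bands:
        if low <= value < high:
            return factor
    raise ValueError(f"no band covers {value}")


def tpl_premium(age: int, power_class: str, yearly_km: int, homeowner: bool) -> float:
    """Multiply the base premium by one relativity per rating variable."""
    premium = BASE_PREMIUM
    premium *= band_factor(age, AGE_BANDS)
    premium *= POWER_FACTORS[power_class]
    premium *= band_factor(yearly_km, MILEAGE_BANDS)
    if homeowner:
        premium *= HOMEOWNER_FACTOR
    return round(premium, 2)


print(tpl_premium(19, "high", 15_000, homeowner=False))  # 300 × 1.8 × 1.4 × 1.0
print(tpl_premium(45, "medium", 8_000, homeowner=True))  # 300 × 1.0 × 1.0 × 0.85 × 0.95
```

The point of the sketch is how little data it needs: four inputs per customer, and all the “analysis” is condensed into a few tables of factors.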
If we were looking for more Big Data-ish solutions we could consider basing our calculation on social media timelines. Young males posting photos that show them Friday and Saturday nights in distant clubs with fancy drinks in their hands should emerge with way higher premiums than their geeky contemporaries who spend their weekends in front of some computers using their cars only to drive to the next fast food restaurant or once a week to the comic book shop. The shades in between might be subtle and an artificial intelligence might come up with some rather delicate distinctions.
And you might not even need a whole timeline. Just one picture might suffice. The forms of our faces, our haircut, the glasses we fancy, the jewelry we wear, the way we wrinkle our noses … might well be very good indicators of our driving behavior. Definitely a job for an artificial intelligence.
I’m sure you can imagine other avenues. Some are truly Big Data, others are rather small in terms of data … and fancy learning machines. The point is, these very different approaches may well yield very similar results, i.e., a few data related to your car might reveal quite as much about the question at hand as an analysis of your Instagram story.
The fundamental reason is that data as such are worthless. Only what we extract from data is valuable. This is the idea behind the so-called DIKW hierarchy: Data, Information, Knowledge, Wisdom. The true challenge is extracting wisdom from data. And the rule is not: more data, more wisdom. On the contrary. Too much data might in fact clutter the way to wisdom. And in any case, very different data might represent the same information, knowledge or wisdom.
As for our example, I must first of all admit that I have no analytical proof – or wisdom – for the specifics I am going to discuss, but I feel confident that the examples illustrate the point. Here we go.
The type of car – put into the right correlation with a few other data – might already contain most of the knowledge you could gain from a full-blown psychological analysis or a comprehensive inspection of a person’s social media profile. Data representing a 19-year-old male, living in a certain area of town, owning a used but rather high-powered car and driving a certain mileage per year might very well contain the same information with respect to our question about “good” driving as all the pictures we find in his Facebook timeline. And the other way around. The same holds true for the information we might get out of a single static photo.
Yet the Facebook timeline or the photo is welling over with information that is irrelevant for our specific problem. Or irrelevant altogether. And it is utterly difficult a) to get the necessary data in proper breadth and quality at all and b) to distill relevant information, knowledge and wisdom from this cornucopia of data.
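The claim that very different data can represent the same information can be made measurable. One standard way, borrowed here from information theory rather than taken from the podcast, is conditional entropy: how much uncertainty about the target remains once certain data are known. In the minimal Python sketch below, the toy portfolio is invented, and the second data source (“lifestyle”, a hypothetical stand-in for something like social media) is deliberately constructed to mirror the car type, so adding it removes no further uncertainty.

```python
from collections import Counter, defaultdict
from math import log2


def conditional_entropy(rows: list[dict], target: str, features: list[str]) -> float:
    """Empirical H(target | features) in bits; 0.0 means fully determined."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[tuple(row[f] for f in features)][row[target]] += 1
    n = len(rows)
    h = 0.0
    for counts in groups.values():
        group_size = sum(counts.values())
        for c in counts.values():
            p = c / group_size
            h -= (group_size / n) * p * log2(p)
    return h


# Invented toy portfolio. "lifestyle" is a hypothetical second data source
# that, in this example, happens to mirror the car type exactly.
rows = [
    {"car": "sports",  "lifestyle": "club_nights", "risk": "high"},
    {"car": "sports",  "lifestyle": "club_nights", "risk": "high"},
    {"car": "sports",  "lifestyle": "club_nights", "risk": "low"},
    {"car": "compact", "lifestyle": "comic_shop",  "risk": "low"},
    {"car": "compact", "lifestyle": "comic_shop",  "risk": "low"},
    {"car": "compact", "lifestyle": "comic_shop",  "risk": "high"},
]

for features in ([], ["car"], ["lifestyle"], ["car", "lifestyle"]):
    print(features, round(conditional_entropy(rows, "risk", features), 3))
```

Knowing the car type lowers the uncertainty about risk from 1 bit to about 0.918 bits; the lifestyle data alone achieve exactly the same, and adding them on top of the car type removes nothing more. Real data are never this clean, but the mechanism is the one described above: a redundant source adds cost, not wisdom.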
Again: more data does not necessarily mean more wisdom! It might. But one kind of data might – no: will – contain the same information as other kinds. Even the absence of data might contain information or knowledge. Assume for instance, you have someone explicitly denying her consent to using her data for marketing purposes. That might mean she is anxious about her data privacy which in turn might indicate that she is also concerned about other burning social and environmental issues which then might indicate she doesn’t use her car a lot and if so in a rather responsible way … .
You get the point. Most probably that whole chain of reasoning won’t work with that single piece of data in isolation, but put into the context of other data there might actually be wisdom. Actually, looking at the whole picture, this might not even be a chain of reasoning but rather a description of a certain state of things that defies decomposition into human logic. Which leads us to another issue with artificial intelligence.
The unboxing problem
Artificial intelligences, very much like their human contemporaries, can’t always be understood easily. That is, the logic, the chain of reasoning, the parameters that causally determine certain outcomes, decisions or predictions are in many cases less than transparent. At the same time, we humans demand from artificial intelligence what we can’t deliver for our own reasoning: this very transparency. Much like our demand for 100% machine perfection, some control instinct of ours claims: If it’s not transparent to us (humans), it isn’t worth much.
Hence, a line of research in the field of artificial intelligence has developed: “Unboxing the AI”.
Except for some specific cases, the outlook for this discipline isn’t too bright yet. The reason is the very way artificial intelligence works. Modeled on the human brain, artificial intelligences consist of so-called “neural networks”. A neural network is more or less a layered mesh of nodes. The strength of the connections between these nodes determines how the input to the network is translated into the output. Training the AI means varying the strengths of these connections until the network translates the input into the desired output in a decent manner. There are different topologies for these networks, tailored to certain classes of problems, but the thing as such is rather universal.
Hence AI projects can be rather simple by IT standards: define the right target function, collect proper training data, feed that data into your neural network, train it … . It takes but a couple of weeks and voilà, you have an artificial intelligence that you can throw at new data to solve your problem.
In short, what we can call “intelligence” is the state of strengths of all the connections in your network. The number of these connections can be huge, and the nature of the neural network is actually agnostic to the problem you want it to solve. “Unboxing” would thus mean extracting specific criteria backwards from such a huge and agnostic network. In our radiologist case, for example, we would have to find something like “serrated fringes” or “solid core” in nothing but this set of connection strengths. Have fun!
Well, you might approach the problem differently by simply probing your AI in order to learn whether and how it actually reacts to serrated fringes. But that approach has its limits, too. If you don’t know what to look for, or if the results are determined not by a single criterion but by the entirety of some data, looking for specifics becomes utterly difficult. Think of AlphaZero again. It develops strategies and moves that were unknown to man before. Can we really insist on understanding the logic behind them, neglecting the fact that Go as such has resisted straightforward tactics and logic patterns for the centuries humans have played it?
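To make “connection strengths” tangible, here is a minimal Python sketch of a tiny layered network: two inputs, two hidden nodes, one output. The weights are hand-picked (a textbook construction, not the result of training) so that the network computes the exclusive-or of its two inputs. Everything the network “knows” sits in the numbers in the hypothetical WEIGHTS table; the code around them is generic.

```python
from math import exp


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + exp(-z))


# Connection strengths, layer by layer: for each node a list of input weights
# and a bias. Hand-picked for illustration; training would normally find them.
WEIGHTS = [
    [([20.0, 20.0], -10.0),    # hidden node 1: roughly "x1 OR x2"
     ([-20.0, -20.0], 30.0)],  # hidden node 2: roughly "NOT (x1 AND x2)"
    [([20.0, 20.0], -30.0)],   # output node: roughly "h1 AND h2"
]


def forward(inputs: list[float], weights=WEIGHTS) -> list[float]:
    """Propagate the inputs through each layer of weighted connections."""
    activations = inputs
    for layer in weights:
        activations = [
            sigmoid(sum(w * a for w, a in zip(node_weights, activations)) + bias)
            for node_weights, bias in layer
        ]
    return activations


for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, round(forward(x)[0], 3))
```

Swap in other numbers and the very same code computes something entirely different, which is the sense in which the network is agnostic to the problem it solves.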
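The training step itself, varying the connection strengths until inputs map to the desired outputs, can be sketched with the oldest learning rule there is, the perceptron rule, on a single node. This is a deliberately minimal stand-in for the gradient-based training real networks use; the toy task (flag a case if at least one of two signals is present) and all function names are illustrative assumptions, not part of any real project.

```python
def predict(weights: list[float], bias: float, x: list[int]) -> int:
    """Fire (1) if the weighted sum of the inputs plus the bias is positive."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train_perceptron(data, epochs: int = 100, lr: float = 1.0):
    """Nudge the connection strengths after every mistake until none remain."""
    weights, bias = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in data:
            error = target - predict(weights, bias, x)
            if error:
                mistakes += 1
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:  # every training example is now mapped correctly
            break
    return weights, bias


# Steps 1 and 2: target function and training data (invented toy task).
training_data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
# Step 3: train.
weights, bias = train_perceptron(training_data)
# Step 4: apply the trained node to a case.
print(weights, bias, predict(weights, bias, [1, 0]))
```

Note that nothing in the loop knows what the task means; it only adjusts numbers until the outputs match the examples.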
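Probing can be sketched in a few lines: treat the model as a black box, switch one input on and watch the output move. In the minimal Python sketch below, the black box is a hypothetical stand-in (the feature names are borrowed from the radiologist example, the formula is invented) whose output depends on a combination of two criteria. Probing one criterion alone then gives different answers depending on the context it is probed in, which is exactly the limit described above.

```python
from math import exp


def black_box(features: dict[str, int]) -> float:
    """Hypothetical opaque model returning a score between 0 and 1.
    Invented for illustration: only the combination of both signs matters."""
    z = 4.0 * features["serrated_fringes"] * features["solid_core"] - 2.0
    return 1.0 / (1.0 + exp(-z))


def probe(model, baseline: dict[str, int], feature: str) -> float:
    """Change in model output when one binary feature is switched on."""
    off = dict(baseline, **{feature: 0})
    on = dict(baseline, **{feature: 1})
    return model(on) - model(off)


for core in (0, 1):
    effect = probe(black_box, {"serrated_fringes": 0, "solid_core": core}, "serrated_fringes")
    print(f"solid_core={core}: switching on serrated_fringes changes the score by {effect:+.2f}")
```

Asked in isolation, “does the model react to serrated fringes?” has no single answer here: not at all without a solid core, strongly with one. With thousands of interacting inputs instead of two, probing quickly runs out of steam.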
The question is: why “unboxing” after all? – Have you ever asked for unboxing a fellow human’s brain? OK, being able to do that for your adolescent kids’ brains would be a real blessing! But normally we don’t unbox brains. Why are we attracted by one person and not by another? Is it the colour of her eyes, her laughter lines, her voice, her choice of words …? Why do we find one person trustworthy and another one not? Is it the way she stands, her dress, her sincerity, her sense of humour? How do we solve a mathematical problem? Or a business one? When and how do the pieces fall into place? Where does the crucial idea emerge from?
Even when we strive to rationalize our decision making, there always remain components we cannot properly “unbox”, at least if the problem at hand is complex, and thus relevant, enough. We “factor in” strategic considerations, assumptions about the future, others’ expectations … . Parts of our reasoning are shaped by our personal experiences and our individual preferences, like our risk appetite, values, aspirations … . Unbox this!
Humankind has learnt to cope with the impossibility of “unboxing” brains or lives. We probe others and if we’re happy with the results, we start trusting. We cede responsibilities and continue probing. We cede more responsibilities … and sometimes we are surpassed by the very persons we promoted. Ah, I am entering philosophical grounds again. Apologies!
To make it short: I admit there are some cases in which you might need full transparency, complete “unboxing”. And in case you don’t get it, abandon the idea of using AI for the problem you had in mind.
But there are more cases in which the desire for unboxing is just another pretext for not charting new territory. If it’s intelligent if it behaves like a human, why do we ask so much more of machines than we would ask of man?
Again, I am drifting off into questions of dangerously fundamental nature. Let’s assume for once that we have overcome all our concerns, prejudices and excuses, that despite all of them, we have a business problem we full-heartedly want to throw artificial intelligence at. Then comes the biggest challenge of all.
The biggest challenge of all: how to operationalize it
Pretty much as in our discussion at the beginning of this post, on the face of it, it looks simple: unplug the human intelligence occupied with the work at hand and plug in the artificial one. If it is significant – quite a few AI projects are still more in the toy category – this comes with all the challenges we are used to in what we call change management. Automating tasks comes with adapting to new processes, jobs becoming redundant, layoffs, re-training and rallying the remaining workforce behind the new ways of working.
Yet changes related to artificial intelligence might have a very different quality. They are about “intelligence” after all, aren’t they? They are not about replacing repetitive, sometimes strenuous or boring work like welding metal or consolidating accounting records, they dig to the heart of our pride. Plus, the results are by default neither perfect nor “unboxable”. That makes it very hard to actually operationalize artificial intelligence. Here’s an example.
It is more than fifteen years old, from a time when a terabyte was still an incredible amount of storage, when data was still meant to be stored in warehouses rather than floating around in lakes or oceans, and when true machine learning was still a purely academic discipline. In short: the good old times. This gives us the privilege of stripping the example bare of complexity and buzz.
At that time, I was, together with a few others, responsible for developing Business Intelligence solutions in the area of insurance sales. We had our analytical data stored in the proverbial warehouse, some smart actuaries had applied multivariate statistics to that data and, hurrah, we got propensities to buy and to rescind for our customers.
Even with the simple means we had back then, these propensities were quite accurate. As an ex-post analysis showed, they hit the mark at 80%, applying the relevant metrics. Cutting the ranking at rather ambitious levels, we pushed the information to our agents: customers who, with a likelihood of more than 80%, were about to close a new contract or to cancel one … or both. The latter sounds a bit odd, but a deeper look showed that these were indeed customers who were intensely looking for new insurance without strong loyalty. If we won them, they would stay with us and their loyalty would improve; if a competitor won them, they would gradually transfer their portfolio to him.
You would think that would be a treasure trove for any salesforce in the world, wouldn’t you? Far from it! Most agents either ignored the information or, worse, discredited it. To the latter end, they used anecdotal evidence: “My mother-in-law was on the list”, they broadcast, “she would never cancel her contract”. Well, some analysis showed that she was on the list for a reason, but how would you fight a good story with the intricacies of multivariate statistics? Actually, the mother-in-law issue was more of a proxy for a deeper concern. Client relationship is supposed to be the core competency of any salesforce. And now along comes some algorithm or artificial intelligence that claims to understand at least a (major) part of that core competency as well as that very salesforce … . Definitely a reason to fight back, isn’t it?
Besides this, agents did not use the information because they didn’t regard it as very helpful. Many of the customers on the high-propensity-to-buy list were their “good” customers anyway, those with whom they were in regular contact already. They were indeed likely to make another purchase, but agents reasoned they would have contacted them anyway. So, don’t bother with that list.
Regarding the list of customers on the verge of rescinding, the problem was a different one. Agents had only very little (monetary) incentive to prevent these from doing so. There was a recurring commission but asked whether to invest valuable time into just keeping a customer or going for new business, most were inclined to choose the latter option.
I could go on and on with stories around that work, but I’d like to share only one more tidbit here before entering a brief review of what went wrong: What was the reaction of management higher up the food chain when all these facts trickled in? Well, they questioned the quality of the analysis and demanded to include more (today we would say “bigger”) data in order to improve that quality, like buying sociodemographic data, which was the fad at that time. Well, that might have increased the quality from 80% to 80+%, but remember the discussion we had around redundancy of data. The type of car you drive or the sum covered by your home insurance might say much more than sociodemographic data based on the area you live in. … Not to mention that eternal management mantra that 80% is good enough.
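The mechanics behind such lead lists are simple. The minimal Python sketch below uses invented customers, scores and outcomes (not the data of the project described here) to show a threshold cut on propensity scores and one plausible ex-post metric, the hit rate or precision of the resulting list, i.e. the share of flagged customers who actually went on to buy.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    propensity_to_buy: float  # model score between 0 and 1
    bought: bool              # outcome observed after the campaign


def leads_above(customers: list[Customer], threshold: float) -> list[Customer]:
    """Cut the ranking: keep customers scored above the threshold, best first."""
    ranked = sorted(customers, key=lambda c: c.propensity_to_buy, reverse=True)
    return [c for c in ranked if c.propensity_to_buy > threshold]


def hit_rate(leads: list[Customer]) -> float:
    """Ex-post precision: share of flagged customers who actually bought."""
    return sum(c.bought for c in leads) / len(leads) if leads else 0.0


# Invented example data.
customers = [
    Customer("C01", 0.93, True),  Customer("C02", 0.88, True),
    Customer("C03", 0.86, False), Customer("C04", 0.84, True),
    Customer("C05", 0.81, True),  Customer("C06", 0.79, False),
    Customer("C07", 0.65, True),  Customer("C08", 0.40, False),
]

leads = leads_above(customers, threshold=0.8)
print([c.customer_id for c in leads], hit_rate(leads))
```

Nothing in such a calculation says whether the flagged customers are the ones worth calling, or whether the agents would have called them anyway; that is a business question, not an analytical one.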
What went wrong?
First, the purpose of the action wasn’t thought through well enough from the start. We more or less just chose the easiest way. Certainly, the purpose couldn’t have been to provide agents with a list of leads they already knew were their best customers. From a business perspective, the group of “second-best customers” might have been much more attractive. Approaching that group and closing new contracts there would not only have created new business but also broadened the base of loyal customers and thus paved the way for longer-term success. The price would of course have been that these customers would have been more difficult to win over than the “already good” ones, so agents would have needed an incentive to invest effort in this group. Admittedly, going for the second-best group would have come with more difficulties. We might, for example, have faced many more mother-in-law anecdotes.
Second, there was no mechanism in place to foster the use of the information. Whether the agents worked on the leads or not didn’t matter, so why should they? Even worse with the churn list. From a long-term business perspective, it makes all the sense in the world to prevent customer churn, as winning new customers is way more expensive. It also makes perfect sense to try making your second-best customers more loyal, but from a salesman’s or -woman’s short-term perspective, boiling the soup of already good customers makes more sense. Thus, in order to operationalize AI, target systems might need a thorough overhaul. If you are serious, that is. The same holds true if you wanted, for example, to establish machine-assisted sentiment analysis in your customer care center.
Third, there was no good understanding of data and data analytics, either on the would-be users’ side or on the management side. This led to the “usual” reflexes on both sides: resistance on the one side and an overly simplified call for “better” on the other. Whatever “better” was supposed to mean.
Of course, neither the example nor the conclusions are exhaustive, but I hope they help illustrate the point: more often than not it is not the analytics part of artificial intelligence that is the tricky one. It is tricky indeed but there are smart and experienced people around to deal with that type of tricky business.
More often than not, the truly tricky part is to put AI into operations,
- to ask the right questions in the first place,
- to integrate the amazing opportunities in a consistent way into your organization, processes and systems,
- to manage a change that is more fundamental than simple automation and
- to resist the reflex that bigger is always better!
So much for today from “Mediocrity and Madness”, the podcast that usually deals with the ever-growing gap between corporate rhetoric and action. I dearly thank all the people who provided inspiration and input to these musings especially in and around the programs I mentioned in the intro, most notably Gemma Garriga, Marcela Schrank Fialova, Christiane Konzelmann, Stephanie Schneider, Arnaud Michelet and the revered Prof. Jürgen Schmidhuber!
Thank you for listening … and I hope to have you back soon!