This article highlights weak spots in artificial intelligence (AI) using case studies and guides us to focus on specific risks.
If you were to improve something, what is the first thing you would do? If you thought of assessing past performance, you are right. It is the most efficient and effective way to understand what needs improving and how to fix it.
And it should be no different for artificial intelligence (AI) projects. Automation and primitive forms of AI have been in trial and operation for many years. Some have worked as expected, while others have failed. I have chosen a few use-cases that have either failed or have not delivered the full benefits expected.
These case studies highlight weak spots in AI and guide us to focus on specific risks. Of course, these use-cases are merely a drop in the ocean, but they represent a few common aspects.
Microsoft chatbot Tay on Twitter
On March 23, 2016, Microsoft released an artificially-intelligent chatbot called Tay. Named after the acronym Thinking About You, Tay was developed to help understand how AI would interact with human users online.
Tay was programmed to ingest tweets and learn from them, communicating by responding to and interacting with users. The target audience was mainly American young adults. (I do not think the outcome would have been any different with a different target audience.) However, the experiment lasted only 24 hours before Tay had to be taken offline for publishing extremely offensive racist and sexist tweets.
Tay was a classic example of a system vulnerable to one of the most burning and pertinent issues in the data science world: garbage in, garbage out. It was built to learn from live, real-time conversations on Twitter, but it could not filter offensive or bigoted inputs in the process. Ultimately, Tay learned from user responses and mirrored the same kind of emotion and thinking. Since many of the tweets directed at it were abusive, racist and sexist, Tay's responses followed the same pattern.
Microsoft's second attempt to release Tay did not go well either, and the bot was taken down soon after. It has not been back online since.
Of course, Microsoft did not explicitly program Tay for discrimination of any kind. It is safe to assume that its initial training data did not have any discriminatory characteristics either. However, the feedback loop from which Tay was supposed to learn had a flaw.
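That flaw can be summarised as an unguarded feedback loop: whatever users fed the bot became its training data. A minimal sketch of the missing safeguard, a moderation gate between ingestion and learning, might look like the following. Everything here is an illustrative assumption (the names, the crude blocklist check), not Microsoft's actual design.

```python
# Illustrative sketch: gate a learning feedback loop so the bot never
# trains on unmoderated input. The blocklist stands in for a real
# moderation model; none of this reflects Tay's actual implementation.

BLOCKLIST = {"slur1", "slur2"}  # placeholder terms for demonstration


def is_offensive(text: str) -> bool:
    """Crude keyword check; a production system would use a trained classifier."""
    return bool(set(text.lower().split()) & BLOCKLIST)


class LearningBot:
    def __init__(self):
        self.training_corpus = []

    def ingest(self, tweet: str) -> bool:
        """Learn only from input that passes moderation; return whether accepted."""
        if is_offensive(tweet):
            return False  # discard rather than learn from it
        self.training_corpus.append(tweet)
        return True
```

With such a gate in place, abusive input is dropped before it can shape the bot's behaviour; the learning corpus grows only from acceptable material.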
Racial discrimination in Amazon’s same-day delivery
In early 2016, Amazon rolled out same-day delivery to its Prime programme subscribers, but only in a select group of American cities and a select group of neighbourhoods: those where the concentration of Prime subscribers was large enough to justify the operational costs.
However, it soon became an issue when people realised that predominantly non-white neighbourhoods were largely excluded from this offering. Customers from these areas were understandably unhappy.
When this issue was highlighted in several forums and the media, Craig Berman, Amazon’s vice president for global communications, asserted, “When it comes to same-day delivery, our goal is to serve as many people as we can.” He continued, “Demographics play no role in it. Zero.” From Amazon’s standpoint, this seemed a logical approach from a cost and efficiency perspective: prioritise areas with the most existing paying members over others.
A solely data-driven approach that looked at numbers instead of people reinforced long-entrenched inequality in access to services. For those who were excluded, the fact that it was inadvertent made little difference. Because racial demographics tend to correlate with location or zipcode, the result was indirect discrimination.
Evidently, wherever zipcodes are used as a data point in decision-making, there will be some inherent bias, because zipcodes usually represent tightly-knit communities of one kind. Better sampling of input data could help minimise this risk to a certain extent, but it would never reduce it to zero.
I remember that, about ten years ago, Indian banks used this zipcode-based strategy to screen out credit card and personal loan applicants. I do not know whether they still continue the practice.
Uber’s autonomous car kills a pedestrian
In 2018, a pedestrian in Arizona was killed by an automated vehicle operated by Uber. A preliminary report released by the National Transportation Safety Board (NTSB) stated that a human safety driver was present in the vehicle but was not in control of the car when the collision occurred.
An accident usually implies several things going wrong at once, which makes it difficult to investigate and to ascertain final accountability and root cause. Various factors contributed to this accident, including poor visibility of the pedestrian, inadequate safety systems in the autonomous car and a lack of oversight by the human safety driver.
The legal matter was eventually settled out of court, and further details were not released. However, issues around liability present significant complexity due to the multiple parties and actors involved. On this occasion, the vehicle was operated by Uber under the supervision of a human driver who was an Uber employee, and it ran autonomously using components and systems designed by various other tech companies.
Responsibility attribution with an AI system poses a significant dilemma and, unfortunately, there are no set standard guidelines to make it more transparent.
On occasions like these, where human lives are directly affected, clarity is undoubtedly needed, along with a universal framework that supports a standardised approach to dealing with such matters.
Although there was no clear and final legal verdict, the very fact that such ambiguity exists is a significant concern. When several actors are at play, human and non-human, direct and indirect, ascertaining accountability, root cause and liability is key to clarity.
DoS attack on smart home by a light bulb
In 2009, Raul Rojas, a computer science professor at Free University of Berlin, built one of Germany’s first smart homes. Everything in his house was connected to the Internet so that lights, music, television, heating and cooling could all be turned on and off from afar. Even the stove, oven and microwave could be turned off with Rojas’s computer.
The challenge with his setup was a ubiquitous one: there were several manufacturers and protocols at play. To address this, Rojas connected all the smart devices to a single hub, which acted as the central coordinator of all communication between devices and with the Internet.
A few years after the installation, in 2013, the smart home gave up and stopped responding to Rojas’ commands. In computer parlance, the system hung, or froze. When Rojas investigated, it turned out that there was just one culprit: a connected lightbulb!
He found that the light fixture had burned out and was trying to tell the hub that it needed attention. In doing so, it was continuously sending requests to the hub, which overloaded the network and eventually caused it to freeze. In other words, it was causing a denial of service (DoS) to the rest of the smart home devices by blocking their communication with the hub. Rojas changed the bulb, and the problem promptly vanished.
This issue, however, highlights a few potential problems with smart homes and so-called autonomous systems. When things go wrong and get out of hand, how can an end-user investigate the matter, take over control and put things back in order? Rojas had not designed a bypass mechanism through which he could commandeer the whole system, perhaps because it was an integration of several heterogeneous systems, or perhaps because he simply did not think of it.
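One defensive pattern that would have contained this failure is per-device rate limiting at the hub: a device that floods the bus gets quarantined instead of freezing everything else. Here is a hypothetical sketch of that idea; the class, thresholds and device names are my assumptions, not Rojas' actual setup.

```python
# Hypothetical sketch: a smart-home hub that rate-limits each device so
# a single faulty unit (e.g. a burnt-out bulb flooding the network with
# attention requests) cannot take down the whole system.
from collections import defaultdict


class Hub:
    MAX_REQUESTS_PER_WINDOW = 10   # illustrative threshold
    WINDOW_SECONDS = 1.0

    def __init__(self):
        self.request_times = defaultdict(list)  # device_id -> recent timestamps
        self.quarantined = set()

    def handle_request(self, device_id: str, now: float) -> str:
        if device_id in self.quarantined:
            return "dropped"  # ignore the misbehaving device entirely
        # Keep only timestamps inside the sliding window, then add this one.
        window = [t for t in self.request_times[device_id]
                  if now - t < self.WINDOW_SECONDS]
        window.append(now)
        self.request_times[device_id] = window
        if len(window) > self.MAX_REQUESTS_PER_WINDOW:
            self.quarantined.add(device_id)  # isolate the chatty device
            return "quarantined"
        return "served"
```

A bulb that fires a dozen requests within a second gets quarantined, while a well-behaved thermostat continues to be served; the rest of the home keeps working.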
Australian telco flushes millions of dollars
In early 2018, an Australian telecommunications company bit the bullet and rolled out an automation bot (auto-bot) for its incident handling process. Although the telco expected benefits from the start of the implementation and planned to save more than 25 per cent of its operational costs, the plan backfired.
The auto-bot was designed to intercept all network incidents and then follow a series of checks based on the problem type selected by users. Depending on the tests it performed, it would take one of three actions: resolve the incident remotely by fixing the issue programmatically; conclude that a technician’s visit to the customer’s premises was required and dispatch someone directly; or, if neither was apparent, present the case to a human operator for further investigation and decision.
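The three-way decision described above can be sketched roughly as follows. The incident fields and rules here are invented for illustration, not the telco's real system; the point of the sketch is the default branch, where anything the rules do not recognise with certainty falls through to a human rather than to the costly dispatch path.

```python
# Illustrative sketch of a three-way incident triage. All field names
# and rules are hypothetical; the key design choice is that ambiguity
# defaults to human escalation, not to the most expensive action.
from enum import Enum, auto


class Action(Enum):
    REMOTE_FIX = auto()
    DISPATCH_TECHNICIAN = auto()
    ESCALATE_TO_HUMAN = auto()


def triage(incident: dict) -> Action:
    # Act automatically only on incidents the rules recognise with certainty.
    if incident.get("type") == "line_reset" and incident.get("remote_reachable"):
        return Action.REMOTE_FIX
    if incident.get("type") == "physical_damage" and incident.get("confirmed"):
        return Action.DISPATCH_TECHNICIAN
    # Default: anything unclear goes to a person for a judgment call.
    return Action.ESCALATE_TO_HUMAN
```

Had the telco's bot defaulted to escalation in this way, the ambiguous scenarios described below would have reached operators instead of triggering needless field visits.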
This approach seemed sound and quite logical at first. However, within a few weeks of the auto-bot going live, the company realised it was sending far more technicians into the field than before. Field visits were a costly affair and had always been the last resort for fixing an issue, yet the auto-bot defaulted to them.
Upon investigation, the team found that there were a few incident scenarios a human operator could understand (by joining the dots) but which were not clear enough to be programmed into the auto-bot. In all such cases, a human operator would have made a different decision from the auto-bot.
The bigger problem was that, despite finding the flaw in its logic, the automation team was unable to turn the auto-bot off (unlike Microsoft, which could simply take Tay offline). They had implemented it in an all-or-nothing manner, sitting right between the user and operator interfaces. Either all incidents would go through the auto-bot and often be handled wrongly, or none would, and everything would have to be handled manually. And the telco was not ready for that workload; they had already let staff go to save costs.
Eventually, the telco set up another project to fix the auto-bot while it was still in operation, wasting several million dollars in the process. They spent money on two things: continuing service with the artificially-stupid auto-bot, and running a massive fix-up project that lasted more than a year.
Then the endowment effect kicked in. The company had no plans to go back and fix the problem at its roots, but instead kept pushing through and wasting an enormous amount of money. The crucial question is: who eventually paid for this?
In my view, this implementation went wrong on several levels, from system design through implementation to fixing the problems. But the first and foremost question is: why was there no plan B, a kill switch of some sort, to stop the auto-bot?
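A plan B need not be elaborate. Had the auto-bot sat behind a thin routing layer with a kill switch, operators could have diverted all incidents to a manual queue the moment the flaw was found. A minimal sketch under that assumption (the class and names are hypothetical, not the telco's architecture):

```python
# Hypothetical sketch of the missing kill switch: a routing layer that
# can instantly bypass the auto-bot and fall back to manual handling.


class IncidentRouter:
    def __init__(self, autobot_handler, manual_queue):
        self.autobot_handler = autobot_handler
        self.manual_queue = manual_queue
        self.autobot_enabled = True        # the kill switch

    def disable_autobot(self):
        """Flip the switch: all subsequent incidents go to humans."""
        self.autobot_enabled = False

    def route(self, incident):
        if self.autobot_enabled:
            return self.autobot_handler(incident)
        self.manual_queue.append(incident)  # fall back to human operators
        return "queued_for_human"
```

Because the switch sits outside the bot, it works even when the bot's own logic is broken; the trade-off is that someone must staff the manual queue, which is exactly the capacity the telco had given up.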
The auto-bot's development and rollout were not thoroughly tested for all potential scenarios and thus lacked the testing rigour that could have identified problems early on. And while the time required to fix the situation was too long, detecting the auto-bot's failure in the first place took considerably longer.
Did you notice a pattern here?
Whether it is Microsoft's chatbot Tay, Uber's autonomous car or the Australian telco's auto-bot, all of these failures have something in common. And it is not the technology!
In all these scenarios, either the creators of the AI or the businesses that deployed it were not careful enough. They did not follow the fundamental tenet of handling something as powerful as automation or primitive AI responsibly. We often say, “With great power comes great responsibility.” And yet, in all these cases, responsible design and deployment either did not happen or did not happen in full spirit.
Responsible behaviour is necessary not only in the deployment and use of AI but at every other stage too: conception and design, testing and implementation, and ongoing management and governance.
Almost all the cases discussed here had weaknesses at the solution conception stage, and these seeped directly into development.
In the cases of the chatbot, the telco's auto-bot and the autonomous car, there was not enough emphasis on solution quality. There may have been a few testing routines, enough to meet the requirements of an IT development framework, but not enough to meet an AI development framework, which does not exist!
In the case of Amazon, I would question the skillset of the decision-makers. Although the decision to roll out same-day delivery to only a few localities appeared inadvertent, thoughtful consideration would have raised these questions on reflection. The same goes for the telco case and the lightbulb situation, where the creators lacked thoughtfulness in the design of the solution.
Nine things to learn from this
While there are several AI use-cases to learn from, I specifically chose these five because they are indicative of what is commonly missing in AI design, development, deployment and governance: a thoughtful and responsible approach.
In summary, I would say:
- Data governance is essential from an ethical AI point of view. Creators of AI need to ensure they have robust data governance foundations, or their AI applications risk being fed inappropriate data and breaching several laws.
- Narrow AI is all about the relation between input and output: you provide input x and get output y. The nature of the input affects the output, such that indiscriminate input can lead to adverse outcomes. This is one good reason why rigorous testing is so important. Note that for AI systems, general IT testing mechanisms are usually not enough.
- Automated decisions are suitable when there is a large volume of decisions to be made, and the criteria are relatively uniform and uncontested. When discretion and exceptions are required, use automated systems only as a tool to assist humans, or do not use them at all. There are still several applications and use-cases that we cannot define as clearly as a game of chess.
- Humans must always be kept in the loop, no matter what, whether during the design phase, testing or deployment of AI. The majority of AI systems are still kids and need a responsible adult in charge. It is also always a good idea to ensure enough human resources are available to handle the likely workload.
- Most importantly, a transparent chain of accountability is necessary. If the question “Who is responsible for the decision made by this AI?” does not yield a single person's name, that gap needs to be fixed.
- For customer-facing AI systems, having a good experience at all times is crucial. If customers have a terrible experience, they will lose trust and eventually render the AI solution useless.
- Fairness, lack of bias, interpretability, explainability, transparency, repeatability and robustness are must-have characteristics of a trustworthy AI solution.
- As AI systems become more powerful, managing risk will become even more critical from the good governance and risk management points of view. Having such governance in place is not only an umbrella requirement for the industry but also a good idea for every business to have in-house.
- Ethics must always be upheld; just because something can be done does not mean it should be done. AI that negatively affects employees, customers or the general public, directly or indirectly, serves no good and should not exist.
American philosopher John Dewey said, “Failure is not a mere failure. It is instructive. The person who really thinks learns quite as much from his failures as from his successes.”
Anand Tamboli is a serial entrepreneur, speaker, award-winning published author and emerging technology thought leader