1/3. We are publishing a series of three articles dealing with artificial intelligence. The first article reflects on the biases of AI. The second article highlights AI regulation in the EU. In the third article, we seek to explain the decisions made by AI algorithms.
I took part in the AI summit in Lyon (France) a few days ago, a conference organized by OMNES Education, the school where I lecture. My articles analyze emerging technologies. Biases are everywhere, whether they are conscious or not (ideology, partial knowledge, neurosis, bills to pay…).
When talking about Artificial Intelligence, informed people know that beyond the tech, the most important element is the data — the size of the data sets, the recurrence of new sets, the quality of the data, etc. — that feeds the AI “brain”.
For non-experts, some of them reluctant, the bias of AI is therefore quite obvious: it is the bias related to data. As with any system, any entity, any research, avoiding biases is the key to success. But you have to be conscious of them! Human biases can never be corrected: this is what gives all the pleasure in our lives. Philosophically, it is the characteristic of the human being. The biases of an AI can be corrected (by humans or in a meta way, i.e. by specialized AIs): that is the nature of the artificial.
Since the disappearance of the Neanderthals, humans have held a monopoly on conceptual intelligence. The term “intelligence” describes the cognitive function of being aware of situations, learning from them and applying what was learned to make decisions. The term “artificial” refers to machines.
The term Artificial Intelligence (AI) is used when a machine mimics the cognitive functions that humans associate with other human minds. Unlike conventional software, whose rules are written by human programmers, an AI system can derive its own decision rules from data through the process of Machine Learning (ML).
The major challenges currently facing Artificial Intelligence
Three topics stand out as challenges in the field of machine learning (ML): bias and fairness, weak signals, and learning on networks.
This is only a partial view of the challenges in AI, which is a very broad and mostly interdisciplinary field. AI is a set of tools, methods and technologies that enable a system to perform tasks in a quasi-autonomous way, and there are different ways to achieve this.
In ML, the machine learns from examples: it trains itself so that it can then perform tasks efficiently. The great successes in this field are computer vision and machine listening, used for applications in biometrics for example, as well as natural language processing. One question that currently arises is how much confidence can be placed in ML tools, as deep learning requires very large volumes of data, which often come from the web.
Unlike datasets that were previously collected by researchers, web data is not acquired in a “controlled” way. And the massive nature of this data can sometimes lead to ignoring the methodological questions that should be asked to exploit the information it contains. For example, training a face recognition model directly from web data can lead to biases, in the sense that the model would not recognize all types of faces with the same efficiency. In this case, the bias can be induced by a lack of representativeness of the faces used for training.
However, the disparities in performance may also be due to the intrinsic difficulty of the prediction problem and/or to the limitations of current ML techniques.
It is well known, for example, that the level of performance achieved for the recognition of newborn faces by deep learning is much lower than for adult faces.
But today we have no clear theoretical insight into the link between the structure of the deep neural network used and the performance of the model for a given task.
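The uneven performance across types of faces described above can at least be measured. Here is a minimal sketch in Python, assuming each test example carries a group tag; the function names, the groups "A" and "B" and the toy data are all hypothetical and only illustrate the idea of comparing accuracy group by group.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy} for parallel lists of labels, predictions and group tags."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(per_group):
    """Largest difference in accuracy between any two groups."""
    return max(per_group.values()) - min(per_group.values())


# Hypothetical toy data: group "A" is well represented, group "B" is not.
y_true = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 1, 0, 1, 1, 1, 0, 0]
groups = ["A"] * 6 + ["B"] * 4

per_group = accuracy_by_group(y_true, y_pred, groups)
print(per_group)                # accuracy for each group
print(accuracy_gap(per_group))  # spread between best and worst group
```

A large gap does not say where the disparity comes from: as noted above, it may reflect unrepresentative training data, the intrinsic difficulty of the task, or the limits of the method.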
Could these biases ever be removed or reduced?
There are different types of bias. Some relate to the data: the so-called “selection” biases, linked to a lack of representativeness, “omission” biases, due to endogeneity, etc. Others are inherent to the choice of the neural network model and of the ML method, a choice that is inevitably restricted to the state of the art and limited by current technology.
Tomorrow, we may use other, more efficient, less computationally intensive representations of information, which could be deployed more easily, and which may reduce or eliminate these biases, but for the moment, they exist!
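To make selection bias concrete, here is a small sketch of one common correction, post-stratification reweighting. This is an illustration chosen for this example, not a method taken from the source: it assumes the true share of each group in the population is known, and the group names and values are hypothetical.

```python
def weighted_mean(values, weights):
    """Mean of values where each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical setup: the population is half "urban", half "rural",
# but the collected sample contains 8 urban records and only 2 rural ones.
population_share = {"urban": 0.5, "rural": 0.5}
sample = [("urban", 10.0)] * 8 + [("rural", 20.0)] * 2

# Share of each group actually observed in the sample.
sample_share = {
    group: sum(1 for g, _ in sample if g == group) / len(sample)
    for group in population_share
}

# Naive estimate: dominated by the over-represented group.
naive = sum(value for _, value in sample) / len(sample)

# Reweight each record by (population share / sample share) of its group.
weights = [population_share[g] / sample_share[g] for g, _ in sample]
corrected = weighted_mean([value for _, value in sample], weights)

print(naive, corrected)
```

The correction only works if the under-represented group is present in the sample at all and if the population shares are trustworthy, which is rarely guaranteed for data collected from the web.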
The role of training dataset quality in biases
Given the necessary volume, the data often comes from the web and is therefore not acquired in a sufficiently controlled manner to ensure its representativeness. This data can also be maliciously “contaminated”. This is currently an issue for the computer vision solutions that will equip autonomous vehicles. The vehicle can be deceived by manipulating the input information.
ML is based on a frequentist principle: it learns from how often situations occur in the training data. The question of the representativeness of the data during the learning phase is therefore a major issue. To take the example of autonomous driving, we see many vehicles on the road today, equipped with sensors to record as much experience as possible. That said, it is difficult to say how long it will be before we have seen enough situations to be able to deploy a system that is intelligent and reliable enough in this field to deal with all future situations.
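How a small manipulation of the input can deceive a model can be sketched on a toy linear classifier. The weights, the input and the perturbation budget `epsilon` below are hypothetical, and the sign-based perturbation is a simplified stand-in for the attacks studied on real vision systems, not a description of any particular one.

```python
def score(weights, bias, x):
    """Linear score: positive means class 1, otherwise class 0."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict(weights, bias, x):
    return 1 if score(weights, bias, x) > 0 else 0


def sign(v):
    return (v > 0) - (v < 0)


def perturb(weights, x, epsilon):
    """Shift every feature by at most epsilon in the direction that lowers the score."""
    return [xi - epsilon * sign(w) for w, xi in zip(weights, x)]


# Hypothetical model and input.
weights = [2.0, -1.0, 0.5]
bias = -0.5
x = [0.5, 0.2, 0.4]

x_adv = perturb(weights, x, epsilon=0.2)

print(predict(weights, bias, x))      # prediction on the clean input
print(predict(weights, bias, x_adv))  # prediction on the manipulated input
```

No feature moves by more than `epsilon`, yet the predicted class changes, because every small shift pushes the score in the same direction.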
There are applications for which the data available today allow the implementation of ML in a satisfactory way.
For other problems, in addition to experimental data, generative models will also be used, producing artificial data to account for adverse situations, but without being able to claim exhaustiveness. This is the case for ML applications in cybersecurity, in order to try to automatically detect malicious intrusions in a network for example.
Generally speaking, there are many problems for which the data available is too sparse to implement ML in a simple way. This is often the case in anomaly detection, especially for predictive maintenance of complex systems.
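When examples of failures are too rare to learn from, a simple statistical baseline is sometimes used instead of a trained model. The sketch below flags readings far from the median using the modified z-score (median absolute deviation with the usual 0.6745 constant and a 3.5 threshold). It is an illustrative assumption, not a technique named in the source, and the sensor readings are hypothetical.

```python
from statistics import median


def mad_anomalies(readings, threshold=3.5):
    """Return indices of readings whose modified z-score exceeds the threshold."""
    med = median(readings)
    mad = median(abs(r - med) for r in readings)
    if mad == 0:
        # All readings are (almost) identical: nothing can be ranked as unusual.
        return []
    return [
        i for i, r in enumerate(readings)
        if 0.6745 * abs(r - med) / mad > threshold
    ]


# Hypothetical temperature readings from a machine, with one abnormal spike.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 35.2, 20.2]
print(mad_anomalies(readings))
```

Such a rule needs no labeled failures, but it only catches values that are extreme on a single measurement, which is why richer approaches are explored for complex systems.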
In some cases, the hybridization of ML and symbolic techniques in AI could provide solutions. These avenues are being explored in the aviation domain, as well as in medical imaging. Beyond their efficiency, such approaches can also allow machines to make decisions that are easier to explain and interpret.
What is driving AI today?
The field of mathematics contributes a lot, especially in terms of efficient representation of information and algorithms. But it is also technological progress that drives AI forward.
Recent technical advances, particularly in the field of memory, have made it possible to implement deep neural network models.
Similarly, distributed computing architectures and dedicated programming frameworks have made it possible to scale up learning on large volumes of data. More frugal approaches still need to be designed!
Source: Stephan Clémençon