Interview with Sven Schlarb, scientist at AIT Austrian Institute of Technology GmbH.
How is the Co-Change project contributing to research in your institution?
Sven Schlarb: The Co-Change project brings together researchers who think about ethical principles and projects that integrate these principles into their work. In this way, responsible research enters a project's design phase rather than being limited to a monitoring function, so that ethics has a real impact on the choice of technology. Often the driver of a project is a new technology, and only from this driver does the search for stakeholders who are interested in the new technology begin. Research projects are often driven by enabling technologies, while the use case and the discussion with end-users and stakeholders are a secondary priority. In contrast to such approaches, we at AIT, in line with our applied research mission, aim to take end-users' perspectives as well as societal and policy requirements into account in the project scope and set-up from the very beginning. We already work in dedicated funding programmes where ethical and societal aspects are by definition part of the project scope, for example the Austrian KIRAS funding programme. Nevertheless, especially given the hugely disruptive effects of digitalisation technologies in many areas, there is a need to focus much more on ethical and societal aspects as market drivers for the development of digital innovation. What distinguishes the Co-Change lab is that it looks for ways to bring the ethical perspective in earlier, so that it becomes an essential part of the design phase and is not only limited to a monitoring function.
What does responsible research and innovation (RRI) mean in your field, and how is it a part of your research at present?
Sven Schlarb: My research focuses on natural language processing (NLP): processing unstructured text data in order to extract information from it. This includes analysing the grammar of natural language and identifying tokens of interest, which we call named entities. These can be persons, names, locations or dates: meaningful information that we can use, for example, to recognise events, such as people taking part in an event at a certain location on a certain date.
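To make this concrete, here is a minimal sketch of named-entity extraction using the open-source spaCy library; the model name and example sentence are illustrative choices, not necessarily the tooling used at AIT.

```python
import spacy

# Load a small pre-trained English pipeline (illustrative choice).
nlp = spacy.load("en_core_web_sm")

doc = nlp("Sven Schlarb spoke at a workshop in Vienna on 12 May 2021.")

# Print each recognised named entity with its type, e.g. PERSON, GPE, DATE.
for ent in doc.ents:
    print(ent.text, ent.label_)
```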
AI is a big word. In our work, it is mainly based on statistics and has to do with how often words appear in certain context. We create language models which use large amounts of text data as input to be able to find patterns in new text data. We can also use these models to predict words that probably follow in given word sequences. For example if I say: “this evening, I go to the…” we can imagine words that probably follow, such as “theatre” or “cinema.” We also know this from Google; if you type a sequence of words, you get suggestions.
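As a toy illustration of this statistical idea, the sketch below counts how often words follow one another in a tiny invented corpus and uses those counts to suggest continuations; real language models are trained on vastly larger data and richer statistics.

```python
from collections import Counter, defaultdict

# Tiny invented corpus; a real language model is trained on millions of
# sentences (for example the whole of Wikipedia).
corpus = [
    "this evening i go to the theatre",
    "this evening i go to the cinema",
    "tomorrow i go to the office",
]

# Count how often each word follows another (bigram statistics).
follows = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for current_word, next_word in zip(tokens, tokens[1:]):
        follows[current_word][next_word] += 1

# Suggest likely continuations of "... go to the".
print(follows["the"].most_common(2))  # [('theatre', 1), ('cinema', 1)]
```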
One of the projects that served as our main use case in the Co-Change project was COPKIT, a project together with law enforcement. Police organisations need to carry out criminal investigations. This sometimes entails extracting information from the dark net, where people can access websites that are not reachable through standard browsers. There, people can visit retail-style websites and buy, for example, drugs. To get a quick overview, data can be gathered and analysed using AI. But while there are certainly people who go to the dark net to buy drugs, there are others who simply sell their washing machine. Monitoring everything would mean observing people and collecting information about people who might not be doing anything illegal. In our research we therefore focus on use cases that avoid the use of personal information.
Furthermore, when processing natural language we often make use of pre-trained models, and there are a lot of datasets available on the internet for this purpose. Many people download dark net market data and use it to create language processing software. But just because data is on the internet does not mean that you can use it however you want. At the same time, as we are working on applied use cases, we need real market data, with real language use and real market operations, and such data always contains personal data.
How do you balance privacy and ethics with the need for real data in your research?
Sven Schlarb: A simple approach is to pseudonymise the data: we automatically replace the letters and digits of phone numbers and email addresses, for example, with random ones, so that we do not accidentally leak personal information. In COPKIT, an ethical, legal and privacy team has been involved from the very beginning, advising us on the ethical aspects to be considered when dealing with specific use cases and datasets.
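A minimal sketch of this kind of pseudonymisation is shown below, assuming simple regular expressions for email addresses and phone numbers; the patterns and replacement strategy are simplified illustrations, not the actual COPKIT pipeline.

```python
import random
import re
import string

def pseudonymise(text: str) -> str:
    """Replace email addresses and phone numbers with random stand-ins."""

    def scramble_email(match: re.Match) -> str:
        # Replace the local part with random letters and use a dummy domain.
        local = "".join(random.choice(string.ascii_lowercase)
                        for _ in match.group("local"))
        return f"{local}@example.org"

    def scramble_digits(match: re.Match) -> str:
        # Keep the number's shape but randomise every digit.
        return "".join(random.choice(string.digits) if c.isdigit() else c
                       for c in match.group())

    text = re.sub(r"(?P<local>[\w.+-]+)@[\w.-]+", scramble_email, text)
    text = re.sub(r"\+?\d[\d\s/-]{6,}\d", scramble_digits, text)
    return text

print(pseudonymise("Contact max.muster@mail.at or +43 660 1234567."))
```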
Another ethical problem is that data contains background information and biases that we might not be aware of, such as gender bias or racism. For example, we take the whole Wikipedia corpus, train a model on it, and then use this general model as a basis to draw domain-specific conclusions. By giving the machine these examples, it learns patterns of language usage, but there might be prejudices encoded that we are not aware of, so the model itself can be biased. For example, regarding professions: if you say that a person is doing room cleaning, the model might implicitly conclude that the person is a woman, while a programmer, on the other hand, would be assumed to be a man. These are biased conclusions that might come out of model predictions. So we saw that these things play a role, and that we need to pay attention to what we do, to which datasets we use, and to the kinds of biases they might contain.
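The deliberately tiny sketch below shows one way such bias enters: through skewed co-occurrence statistics in the training data. The corpus and word lists are invented purely for illustration.

```python
from collections import Counter

# Tiny invented corpus standing in for a large training set such as
# Wikipedia; sentences and word lists are purely illustrative.
corpus = [
    "she works as a cleaner and cleans rooms every day",
    "he works as a programmer and writes code every day",
    "she is a nurse",
    "he is an engineer",
]

FEMALE, MALE = {"she", "her"}, {"he", "him", "his"}

def gender_counts(profession: str) -> Counter:
    """Count gendered pronouns in sentences mentioning a profession."""
    counts = Counter()
    for sentence in corpus:
        words = set(sentence.split())
        if profession in words:
            counts["female"] += len(words & FEMALE)
            counts["male"] += len(words & MALE)
    return counts

# A model trained on such skewed data absorbs these associations.
print(gender_counts("cleaner"))     # Counter({'female': 1, 'male': 0})
print(gender_counts("programmer"))  # Counter({'male': 1, 'female': 0})
```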
Do you have a golden vision about how to balance societal/ethical values in your research?
Sven Schlarb: What I would like to see is that research is interdisciplinary, and that we think about how ethical principles can be formalised and used as an integral part of AI systems from a technical perspective. These formalised principles could then be used to correct AI decisions that are wrong, so that problematic conclusions or decisions are overruled when they fail validation against human-controlled principles.
These principles should reflect a community point of view; a single developer's opinion is not a democratic set-up. In our concrete scenario, this could be a rule-based technology that inherently applies ethical principles and is able to correct certain suggestions or decisions made by prediction models when it recognises that they are problematic in certain regards, as sketched below.
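Here is a minimal sketch of such a rule layer, assuming a hypothetical prediction format, rule set and confidence threshold; it only illustrates the idea of vetoing model output against explicitly formalised principles.

```python
# Hypothetical prediction format and rules, for illustration only.

def model_predict(text: str) -> dict:
    """Stand-in for a statistical prediction model's output."""
    return {"label": "suspicious",
            "uses_protected_attribute": True,
            "confidence": 0.91}

RULES = [
    # Each rule pairs a community-agreed principle with a test.
    ("no decision may rely on protected attributes",
     lambda p: not p["uses_protected_attribute"]),
    ("low-confidence predictions require human review",
     lambda p: p["confidence"] >= 0.8),
]

def validate(prediction: dict) -> dict:
    """Overrule a prediction that violates any formalised principle."""
    violated = [rule for rule, ok in RULES if not ok(prediction)]
    if violated:
        return {"label": "deferred-to-human-review", "reasons": violated}
    return prediction

print(validate(model_predict("some input text")))
```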
In that sense, it is important to have an interdisciplinary team involved in the early design phase of a project, so that it is not limited to technology or stakeholder interests alone. In my opinion, there are clear benefits for stakeholders if they know that what the technology concludes has been verified to comply with ethical principles.
That being said, we have to develop a new understanding of research activities. It is of crucial importance that we apply dedicated research in this very critical domain in order to understand the risks that new technologies pose to privacy and ethical requirements. End-users, service providers and policy makers have to be brought into an active discussion on how to handle new technologies. Only when we – developers as well as users of technology – understand how new technologies function, as well as their potential impacts, especially on privacy and ethics, will we be able to act as responsible technology developers and technology users.
To sum up, this is an essential part of our Center's mission statement. By working on such projects, we ensure that our future digital systems are highly secure, reliable and trustworthy, as well as conforming to the ethical rules of our society.