Transparent and accessible global data critical for European researchers: Interview with Professor Thomas Hartung

The landscape of scientific research is shaped by global digital infrastructures, with critical databases in health, medicine, climate, and the social sciences. The dominance of foreign and commercial databases poses strategic vulnerabilities for academic freedom, innovation, and research continuity in Europe. What are the risks associated with this concentrated ‘power’ of some databases? And how can we ensure that European researchers’ work is not affected by political or other abuse of access to crucial databases? Ahead of the STOA workshop “Data Sovereignty in Research: Global Dependencies, Risks, and the European Response” on 1 October, the ESMH spoke with Professor Thomas Hartung of the Johns Hopkins Bloomberg School of Public Health.


What is meant by ‘data sovereignty’ and ‘data resilience’?

Thomas Hartung: As field chief editor of Frontiers in Artificial Intelligence, I see data sovereignty not simply as “where the server sits” but as a question of who has the right to decide how data are governed, shared, and reused.

In Europe, this means data should be handled according to European values—privacy under GDPR, transparency under the Data Governance Act, and accessibility under the Open Science framework. Data resilience, on the other hand, is about ensuring that data remain available and usable, even under technical failures, political frictions, or economic shocks. To me, resilience means not only stable servers, but infrastructures that guarantee long-term accessibility, provenance of data, and portability across platforms.


Why do we need to discuss it now from a European perspective?

Thomas Hartung: Having professorships in the US and Germany, I see this mostly from a trans-Atlantic perspective. The timing is crucial because Europe is moving from regulating data protection to shaping the entire data economy. With the Data Act, the Data Governance Act, and the European Health Data Space, Europe is creating rules that will govern how research data can be reused across borders and sectors. These instruments directly affect how we as scientists access and link data. The urgency is heightened by the fact that many of the infrastructures researchers rely on are governed outside Europe. So if Europe wants to maintain academic freedom and scientific independence, it must ensure that core research data remain FAIR—findable, accessible, interoperable, reusable—and hosted under frameworks that reflect our scientific values.


If major infrastructures supporting databases are governed outside Europe, are there any fields of research where the EU leads when it comes to data?

Thomas Hartung: Absolutely. Europe has made remarkable contributions. The Copernicus program provides the world’s largest open Earth observation dataset, freely available to researchers worldwide. In life sciences, ELIXIR and EMBL-EBI set global standards for bioinformatics. EBRAINS, built on the Human Brain Project, provides an extraordinary atlas and modeling resource for neuroscience. And EuroHPC, with supercomputers like LUMI and Leonardo, ensures European sovereignty in compute-intensive science. These infrastructures are designed with openness in mind—open access to data, open metadata describing how the data were generated, and FAIR principles at their core.


In which areas do European researchers depend most on data that are governed outside Europe?

Thomas Hartung: In many domains, we rely on U.S.-based infrastructures. PubMed and PubMed Central are indispensable for biomedical research. The DOI system that structures scientific literature is largely governed by U.S.-based Crossref. arXiv, which has transformed the culture of preprints, is hosted by Cornell University. These are invaluable resources, and I want to be clear: the U.S. has provided the world with extraordinary open infrastructures. But dependency means that if policies, access models, or legal frameworks change, European research could be vulnerable. That is why mirroring, federating, and creating European nodes of such infrastructures is so important.


What does this dependency mean for academic freedom and open science?

Thomas Hartung: It shows how fragile our open science ecosystem can be. Academic freedom depends on researchers having uninterrupted access to data, literature, and metadata that allow reproducibility. If a commercial provider suddenly restricts APIs or increases costs, the ability to do text-and-data mining for AI research could be compromised. If metadata are incomplete, the transparency required for reproducibility is lost. As an editor, I argue strongly for open access publishing and for journals to require FAIR data and metadata to be deposited in trusted repositories. This is not a European versus American issue—it is about ensuring global science remains transparent, auditable, and accessible.
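The text-and-data mining mentioned here becomes trivial once literature is openly mirrored, and impossible once API access is cut. As a minimal illustration, the sketch below counts term frequencies across a locally mirrored set of open-access abstracts; the abstracts and terms are purely illustrative, not drawn from any real repository.

```python
from collections import Counter
import re

# Toy locally mirrored corpus of open-access abstracts.
# In practice these would be harvested from an open repository;
# the texts here are invented for illustration only.
abstracts = [
    "GDPR-compliant data sharing improves reproducibility in biomedical research.",
    "Open metadata and FAIR principles support reproducibility and data reuse.",
    "Federated infrastructures reduce dependency on single commercial providers.",
]

def term_frequencies(texts):
    """Count lower-cased word occurrences across a corpus."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts

freqs = term_frequencies(abstracts)
print(freqs["reproducibility"])  # the term appears in two abstracts
```

The point is not the counting itself but the precondition: this kind of analysis only works if the underlying texts can be held locally, which is exactly what restrictive APIs or licensing terms can foreclose.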


What are the other ways in which concentrated power over databases can affect researchers?

Thomas Hartung: The danger is subtle. Whoever controls the index of the scientific record controls visibility—what papers or datasets are easily found, and which remain obscure. Terms of service can limit how data are reused for machine learning or knowledge graphs. The concentration of power can lead to agenda-setting, simply by tweaking what appears first in a search result. This is why we need pluralism in infrastructures and mandatory openness for the data underlying scientific publications.


What infrastructures already support data resilience?

Thomas Hartung: Europe is not starting from scratch. The European Open Science Cloud is a federated platform for FAIR data. ELIXIR, EBRAINS, Copernicus, EuroHPC—all of these are designed to safeguard availability and align with European research priorities. Initiatives like Gaia-X aim to create a federated cloud and data ecosystem. Together they form the backbone of data resilience, but they need stronger political and financial backing to scale and compete with global giants.


How can we ensure that European researchers’ work is not affected by political or other abuse of access to crucial databases?

Thomas Hartung: The answer is openness and federation. If data and metadata are open, machine-readable, and mirrored across nodes, then no single provider can lock us out. FAIR principles must be the default. In practice, this means funding European mirrors of indispensable resources like PubMed, ensuring continuity APIs for text-mining, and building legal shields so researchers can switch providers without losing access. Above all, transparency and auditability must be non-negotiable—we must always know how data were generated and processed.
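"Open and machine-readable" metadata is something a repository can check mechanically. The sketch below validates that a dataset record carries the fields needed for provenance and portability; the field names are illustrative, loosely inspired by common repository metadata rather than any specific standard, and the record itself is hypothetical.

```python
# Required fields for a portable, auditable dataset record
# (illustrative set, not a formal standard).
REQUIRED_FIELDS = {"identifier", "title", "license", "access_url", "provenance"}

def missing_fields(record):
    """Return the required metadata fields absent from a record, sorted."""
    return sorted(REQUIRED_FIELDS - record.keys())

# Hypothetical record: identifier and URLs are made up for illustration.
record = {
    "identifier": "doi:10.0000/example",
    "title": "Example survey dataset",
    "license": "CC-BY-4.0",
    "access_url": "https://mirror-a.example.eu/data/1",
}

print(missing_fields(record))  # provenance information is missing
```

A check like this, run at deposit time, is one concrete way repositories can make "we must always know how data were generated and processed" enforceable rather than aspirational.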


Finally, as a researcher working in the U.S., how do you see the current situation regarding academic freedom?

Thomas Hartung: The United States remains one of the most vibrant research environments in the world. The openness of its scientific infrastructures, from PubMed to arXiv, is a gift to global science. At the same time, there are increasing political pressures at state level, and heated debates on free speech in academia. From my perspective, the real risk is less about censorship and more about procedural friction—restrictions on data access, burdensome compliance requirements, or sudden changes in the availability of platforms. The need for open, resilient infrastructures is only amplified in the era of AI as the global knowledge integrator – I have called this the beginning of “scAInce”. This is why I advocate globally for open access publishing, open data, and FAIR principles. They are the best safeguard we have to ensure that academic freedom and open science survive, no matter where we are in the world.

👉 Follow the webstream of the STOA event
