On the geopolitics of digital knowledge – by Domenico Fiormonte*

ChatGPT, the artificial intelligence launched by the consortium OpenAI, has captured the attention of the world’s media, triggering both apocalyptic reactions and palingenetic delusions. On the one hand we have the case of Geoffrey Hinton, the pioneering AI scientist who left Google in order to speak more freely about the risks of these technologies; on the other, Bill Gates prophesies with satisfaction (Microsoft is part of the OpenAI consortium) the end of education as we know it. While some early backers of OpenAI, such as Elon Musk, are even calling for a moratorium to curb further ‘disturbing’ developments, few have explained how the “machine” is made and how it works, or how it could suddenly bring into being the visions of science fiction, from Kubrick’s rebellious HAL 9000 computer to the Wachowski sisters’ Matrix.

ChatGPT is basically a powerful syntactic system: it does not really know what it is talking about, but it is convincing at simulating textual interaction. It therefore does not produce original knowledge, does not possess common sense and has no experience of the world. Its credibility rests on an essentially statistical nature, yet to the ordinary user it “appears intelligent.” This is due mainly to four factors:

  • the computational power (speed);
  • the quantity and quality of the data with which the neural network is fed;
  • the ability to “reverse” the search pathway within the Large Language Model (LLM) into a generative pathway (i.e., response creation; see the sketch after this list); and
  • finally, the ability to correct and recalibrate answers through human input.
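To make the third point more concrete, below is a minimal sketch of the “generative pathway”: a language model assigns probabilities to the possible next tokens and then samples from that distribution, one token at a time. This is purely illustrative and rests on stated assumptions: it uses the openly available Hugging Face transformers library and the small public GPT-2 model as a stand-in, since the models behind ChatGPT are far larger and not publicly released, and the human-feedback step (the fourth point) is not shown here.

```python
# Minimal illustration of next-token prediction, the statistical core of an LLM.
# Assumption: the Hugging Face transformers library and the small public GPT-2
# model, standing in for the much larger, proprietary models behind ChatGPT.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of Italy is"
inputs = tokenizer(prompt, return_tensors="pt")

# 1. The model outputs a probability distribution over the next token...
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]
probs = torch.softmax(logits, dim=-1)
top = torch.topk(probs, 5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(idx)):>12}  p={p.item():.3f}")

# 2. ...and "generation" simply samples from that distribution, token by token.
output = model.generate(**inputs, max_new_tokens=15, do_sample=True, top_k=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Even in this toy example the model “answers” only by continuing with statistically plausible text; nothing in the procedure involves knowing whether the continuation is true.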

Within these four points it is crucial to understand how the Large Language Model (i.e., the data repository) is constructed. Not surprisingly, this is the most obscure part of the whole process. In 2023 The Washington Post sought to shed light on it in an article mapping the “sources” used by Google Bard, one of the main competitors of ChatGPT. The Post, with the support of the Allen Institute for AI, analyzed some ten million websites drawn from Google’s C4 dataset, which is used to train not only Google’s AI products but also LLaMA (Facebook’s Large Language Model). The ten million sites analyzed by the newspaper were divided into eleven categories: Business & Industrial, Technology, News & Media, Arts & Entertainment, Science & Health, Hobbies & Leisure, Home & Garden, Community, Job & Education, Travel, and Law & Government. To give some examples: in the News & Media category, the top five sources are wikipedia.org, scribd.com (a subscription-based document-sharing platform), nytimes.com, latimes.com and theguardian.com. There are few surprises among the top five in the Science & Health category: journal.plos.org, frontiersin.org, link.springer.com, ncbi.nlm.nih.gov and nature.com. Finally, in the Law & Government category the top five sites are patents.google.com (in first place), patents.com, caselaw.findlaw.com, publications.parliament.uk and freepatentsonline.com. It is readily apparent that most of this content is generated in the USA, and that commercial and private-sector sources prevail (Wikipedia being the exception).
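To give a sense of how such a “source map” can be drawn up, the sketch below counts the web domains that contribute text to a small sample of the C4 corpus. This is not the methodology used by the Post and the Allen Institute, only an illustration of the underlying idea; it assumes the Hugging Face datasets library and the publicly hosted allenai/c4 snapshot, and it streams a few thousand records rather than the full corpus of ten million plus websites.

```python
# Illustrative sketch: count which web domains contribute text to (a sample of) C4.
# Assumptions: the Hugging Face datasets library and the public "allenai/c4" snapshot.
# This is NOT the Washington Post / Allen Institute methodology, just the basic idea.
from collections import Counter
from urllib.parse import urlparse

from datasets import load_dataset

# Stream the corpus so the full (multi-hundred-gigabyte) dataset is never downloaded.
c4_stream = load_dataset("allenai/c4", "en", split="train", streaming=True)

domains = Counter()
for i, record in enumerate(c4_stream):
    if i >= 5000:  # a few thousand records, for illustration only
        break
    domains[urlparse(record["url"]).netloc] += 1

# The most frequent source domains in this small sample.
for domain, count in domains.most_common(10):
    print(f"{count:5d}  {domain}")
```

Run over the whole corpus, and with category labelling added on top, this kind of counting is what produces rankings such as those quoted above.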

In conclusion, three aspects of this need to be emphasised:

1) these AI chatbots could not exist without us: not “us” in the sense of engineers and computer scientists, but the Internet users who have populated the web with content over roughly two decades;

2) the methods used to build these LLMs are, with few exceptions, totally opaque;

3) the sources used to construct these LLMs reflect heavy geographical, linguistic and cultural biases.

In short, the “knowledge” of artificial intelligences is predominantly Western and English-speaking. Moreover, the Post’s reconstruction reveals some interesting methodological points of contact with the Cambridge Analytica case: the choice of sources with which to feed the AI brings us back to the problem of “cultural units” and their biases. Ultimately, these tools are cultural weapons in the hands of very specific geopolitical actors, and media attention, even when it highlights tensions or contradictions, only reinforces their mythological status.

Perhaps the main challenge that the media and our societies will face in the coming years is not how to establish new rules (e.g., for the “ethical” use of AI), but how to ensure that we still have the right to know who is “governing” the processes of construction and representation of reality. It will be necessary to join all epistemic forces (journalism, research, education, academia, etc.) to identify and understand who is designing these technologies, who is disseminating them, for what purposes, and why. On this challenge to the entire intellectual world will depend not only the future of democracy, but probably also that of knowledge, of our cultures and our memories – at least those cultures and memories that we have been processing, transmitting and communicating since the first appearance of writing, more than five thousand years ago.


* This is an English translation of an excerpt from: Domenico Fiormonte’s “Geopolitica della conoscenza digitale”, in Frattolillo, Oliviero (ed.), La doppia sfida della transizione ambientale e digitale. Roma, Roma TrE-Press, pp. 57-84. The full paper is free to download at: https://romatrepress.uniroma3.it/libro/la-doppia-sfida-della-transizione-ambientale-e-digitale

DFID-funded technology for education Hub Inception Phase consultation retreat hosted at Royal Holloway, University of London

It was great to have hosted the three-day Inception Phase consultation retreat of the DFID-funded EdTech Hub from the evening of 29th July through to 1st August at Royal Holloway, University of London. This brought together some 30 members of the core team, funders and partners from the Overseas Development Institute, the Research for Equitable Access and Learning (REAL) Centre at the University of Cambridge, Brink, Jigsaw Consult, Results for Development, Open Development and Education, AfriLabs, BRAC, eLearning Africa and the World Bank, as well as members of the Intellectual Leadership Team from across the world, and representation from the Bill and Melinda Gates Foundation.

The meeting was designed to set in motion all of the activities and processes for the Inception Phase of the eight-year Hub, focusing especially on:

  • The Hub’s overall vision
  • The work of our three main spheres of activity
    • Research
    • Innovation, and
    • Engagement
  • Our governance structure
  • Our theory of change
  • Our ethical and safeguarding frameworks
  • Our communication strategy, and
  • Our use of Agile and adaptive approaches

The Hub aims to work in partnership to “galvanise a global community in pursuit of catalytic impact, focusing on evidence so we can collectively abandon what does not work and reallocate funding and effort to what does”.  Moreover, it is “committed to using rigorous evidence and innovation to improve the lives of the most marginalised”.

Above all, as the pictures below indicate, this meeting played an essential part in helping to build the trust and good working relationships that are so important in ensuring that this initiative, launched in June 2019, will achieve the ambitious goals that it has set.

 

Improving the management of digital government

Liz Quaglia and Tim Unwin from the UNESCO Chair in ICT4D attended the launch discussion for the Institute for Government’s new report on Improving the Management of Digital Government at a breakfast meeting on 21st June, which focused on the question “Who is responsible for effective, efficient and secure digital government?”.

Speakers at the event included:

  • Ciaran Martin, CEO, National Cyber Security Centre
  • Janet Hughes, Doteveryone
  • Bryan Glick, Editor, Computer Weekly

and it was moderated by Daniel Thornton from the Institute for Government, one of the co-authors of the report (the other being Lucy Campbell).

Concluding thoughts from the speakers included:

  • It is very difficult to deliver effective digital government, but we should not despair and must keep moving forward to make things better;
  • It is essential to have a joined-up approach across governments, with leadership at the highest level; and
  • How governments are organised is a secondary issue; what matters is beginning with a clear strategy, and then finding ways to deliver it.

The report itself makes interesting reading, and has wider relevance beyond the UK context.

Silvia Masiero’s seminar on big data and poverty in India

Silvia Masiero (Loughborough University, and Affiliated Member of the UNESCO Chair in ICT4D) has just given a fascinating seminar at the UNESCO Chair in ICT4D on The Affordances of Big Data for Poverty Reduction: Evidence from India, which raised many interesting questions about the relative benefits and challenges of biometric data, especially in the context of demonetisation in India. Slides of the presentation are available here, and her recent ICT4D briefing on the same subject is here.
