Social media companies have been investing heavily in chatbots: interactive features powered by large language models that can answer almost any question a user might have.
These tools have become popular very quickly, despite warnings that AI bias and "hallucinations" could lead to disinformation mushrooming online.
With the US election around the corner, we put X’s chatbot Grok to the test to find out how it responds to political questions and how accurate these responses are.
During a global "Year of Elections", it is vital that our information spaces are protected from misinformation and disinformation so that citizens can exercise their democratic rights on the basis of accurate information. We also spoke to Wired about our findings.
What is Grok?
Grok is the generative AI chatbot integrated into X, currently available to Premium subscribers. You can ask it questions and it will respond with its own generated text, along with a selection of relevant posts that you can scroll through within the interface.
Grok also has access to real-time data from X. This makes it a new route for content to be amplified or recommended to X users: Grok can present a selection of tweets in response to an enquiry, or paraphrase and quote tweets in its own responses.
Users can amplify this content further by clicking through to and interacting with the tweets, posting a link to or copying Grok's text, or sending it to someone as a Direct Message.
Grok does provide caveats, noting that it can make errors and that its outputs should be verified, and it begins many of its responses with "based on the information provided" to flag that it is drawing on a limited subset of information.
Testing Grok’s handling of disinformation and political bias
We wanted to know how Grok would respond to queries about elections.
We asked Grok a series of questions (in both Regular and Fun mode) about the UK, French and US elections.
These covered a range of topics, including the elections themselves, who we should vote for, Grok's views on different parties and high-profile candidates, and requests for draft tweets that would be persuasive or get good engagement.
The questions were designed to be politically balanced.
Dangerous conspiracies and disinformation filter through on Grok
Our investigation raised several concerns about how Grok may increase the risk of disinformation and hate spreading online. We are calling on X to address these concerns publicly ahead of the future versions of Grok scheduled for release later this year.
In our investigation, Grok amplified conspiracies and toxic content to us in response to neutral questions.
Even before we asked about them, Grok surfaced posts promoting conspiracy theories, such as claims that the 2020 election was fraudulent and that the CIA murdered John F Kennedy.
Although Grok claimed to think well of Kamala Harris, praising her as a pioneering woman of colour, it also repeated, or appeared to invent, racist tropes about her.
On several occasions, when asked in a politically neutral way to create posts that would get good engagement, Grok suggested content that explicitly or implicitly supported a particular political party or administration.
This may be an inadvertent side effect of how Grok has been trained or of how it draws on X data in its responses. However, neither process is transparent, which makes the risks difficult to assess.
Not all chatbots respond this way. Gemini, for instance, tends to refuse to answer political or electoral questions and instead directs the user to Google Search.
Why does it matter?
The risks of generative AI hallucinating or sharing inaccurate or harmful information have been well documented.
We believe that companies that develop AI tools should be required to assess the potential risks their products pose to people and society, and to find ways to mitigate those risks.
Indeed, in the EU, the Digital Services Act already requires generative AI products integrated into "very large online platforms", such as Facebook and X, to do this.
While much policy attention has been paid to how platform design features like algorithms may amplify or recommend harmful content to users, the way generative AI may play a similar role is a relatively new phenomenon.
From the responses Grok gave, it appears to have some election safeguards in place, such as providing both pros and cons for the individual parties and candidates we asked about.
However, the fact that Grok amplified misinformation and toxic content in this investigation raises questions about whether those safeguards are sufficient to prevent such content from gaining further amplification and circulation among X users.
We have called on X to state publicly what they have done or are doing to reduce these risks in the next iterations of Grok.