Automation of 'news headlines' on DONALD TRUMP using machine learning
Much has been said about the socio-cultural and political consequences of fake news and its dependence on digital media technologies for amplification. This is reflected in the concept of 'post-truth', in which objective and verifiable facts are less influential in shaping public opinion than claims crafted to appeal to emotions and personal beliefs regardless of their veracity. A clear example of misinformation spreading virally over social networks occurred during the 2016 US election, in which Donald Trump was elected president. More recent concerns have focused on new digital technologies such as machine learning and artificial intelligence (AI) as possible enablers of the next wave of fake news, which has been termed 'deep fakes'. The capacity of AI systems to generate text, audio and visuals on their own has fueled much debate about the dangers of misusing them to create 'deep fakes'. This concern is the primary motivation behind the artwork I present here.
'Deep Fakes' as inspirational concept for art
In this work I wanted to explore the creation of a fake online news outlet, in the form of a single webpage, as a representation of an art object. The art is not the website per se but the construction of a new reality based on fake headlines.
It is important to point out that the fake headlines were derived by training a machine learning algorithm on real news headlines.
Creation of news headlines using machine learning
In order to have an algorithm automatically generate news headlines, I used a Recurrent Neural Network (RNN) architecture, specifically Long Short-Term Memory (LSTM). RNNs and LSTMs are well suited to language modeling because they can predict the next character in a sequence of text from the characters that precede it, based on patterns learned from the input data.
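To make the idea of character-level prediction concrete, the following is a minimal sketch of how a text corpus can be sliced into (context, next character) training pairs before being fed to an LSTM. It is not the code used for this project: the function names, the window length of 10 and the sample headline are all assumptions chosen for illustration.

```python
def make_training_pairs(text, window=10):
    """Slice text into (context, next_char) pairs for a character-level model."""
    pairs = []
    for i in range(len(text) - window):
        context = text[i:i + window]   # the characters the model sees
        target = text[i + window]      # the character it must predict
        pairs.append((context, target))
    return pairs


def build_vocab(text):
    """Map each distinct character to an integer index."""
    return {ch: idx for idx, ch in enumerate(sorted(set(text)))}


# Invented sample headline, used only to illustrate the slicing.
sample = "Trump says trade deal is close\n"
pairs = make_training_pairs(sample, window=10)
vocab = build_vocab(sample)

# Integer-encode each pair, as a neural network would require.
encoded = [([vocab[c] for c in ctx], vocab[tgt]) for ctx, tgt in pairs]
```

Each pair asks the model the same question: given these ten characters, which character comes next? Upper- and lower-case letters are distinct entries in the vocabulary, so capitalization is something the model has to learn like any other pattern.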
I assembled a data set of more than 9,000 news headlines published between January 2015 and May 2018. A Python script made API calls to NewsAPI.org requesting headlines containing the word 'Trump', and the results were saved to a text file of 9,890 lines and 114,383 words. This text file was used as input to train the following two LSTMs:
(a) LSTM in Python with Keras
(b) char-rnn
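The data collection step described above can be sketched roughly as follows. This is not the original script: the endpoint URL, the query parameter names (`q`, `from`, `to`, `page`, `pageSize`, `apiKey`), the shape of the JSON response and the output file name are assumptions based on the public NewsAPI interface and should be checked against its current documentation. All function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint and parameter names; verify against the NewsAPI documentation.
BASE_URL = "https://newsapi.org/v2/everything"


def build_url(api_key, query="Trump", page=1, page_size=100,
              date_from=None, date_to=None):
    """Build one request URL for a page of search results."""
    params = {"q": query, "page": page, "pageSize": page_size, "apiKey": api_key}
    if date_from:
        params["from"] = date_from
    if date_to:
        params["to"] = date_to
    return BASE_URL + "?" + urlencode(params)


def extract_titles(payload, keyword="Trump"):
    """Keep only article titles that contain the keyword."""
    titles = []
    for article in payload.get("articles", []):
        title = (article.get("title") or "").strip()
        if keyword.lower() in title.lower():
            titles.append(title)
    return titles


def fetch_titles(api_key, **kwargs):
    """Request one page and return the matching titles (needs network access)."""
    with urlopen(build_url(api_key, **kwargs)) as response:
        payload = json.load(response)
    return extract_titles(payload)


def save_corpus(titles, path="headlines.txt"):
    """Write unique titles to a text file, one headline per line."""
    unique = list(dict.fromkeys(titles))  # drops duplicates, keeps order
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(unique) + "\n")
    return len(unique)
```

Writing one headline per line keeps the newline character in the corpus, which gives a character-level model a signal for where a headline ends.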
After experimenting with both of them on my MacBook Pro (CPU processing), I decided to continue with char-rnn because it processed the corpus text faster on my laptop. I trained the model several times, changing parameters and evaluating the resulting text, until I found a set of conditions that produced the most interesting results.
Here is a short list of news headlines that were generated:
'Comey revealed exactly how Donald Trump drama'
'Trump touts may drug dealers'
'Stormy Daniels sues Trump on tapper: Trump floats military ban on immigration 2qain attacks on $NRA guo'
'Trump tweet: Iran deal, a total killer for Trump got a smared under Trump as Mike Scott Pruittary republacure'
'Ex-Trump Low to Veet Math GRC Interview, It'
'Trump Reacts to Lifature Reacted, or Didn'
'Colbert: 'Sa Hope President Trump Is a Very Popor in Response'
'Happy Check: Sidnal Everson Cooper: Trump floats shutdown for 'slimeball' up 60,000% after Trump's trash talkin' Tweet a tactic against Robert'
'Trump says Playboy model for sex'
Interestingly, the algorithm learned the basic 'style' of news headlines, as well as where and when to use capital letters when mentioning names such as Trump, Colbert, Comey and Stormy Daniels. Grammatical errors and nonsensical words are evident. In theory, these could be reduced by increasing the number of headlines in the input text and by adjusting the model's parameters. On some occasions, the headlines were constructed by joining pieces of two or more headlines and adding new words to them. On others, the relationship among the words within a sentence was entirely new.
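One rough way to check observations like these is to list the words in a generated headline that never appear in the training corpus. The sketch below is an illustration only, not part of the original workflow: the helper names and the two-line corpus are invented, and the generated headline is taken from the list above.

```python
import re


def tokenize(line):
    """Lower-case a line and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", line.lower())


def novel_words(generated, corpus_lines):
    """Return the words in a generated headline that are absent from the corpus."""
    vocabulary = set()
    for line in corpus_lines:
        vocabulary.update(tokenize(line))
    return [word for word in tokenize(generated) if word not in vocabulary]


# Invented stand-in for the real training file of headlines.
corpus = ["Trump reacts to new poll", "Colbert mocks Trump"]
invented = novel_words("Trump Reacts to Lifature Reacted, or Didn", corpus)
```

With the full corpus in place of the stand-in, a long list of unseen words would point to invented vocabulary, while a short list would suggest the headline was mostly recombined from existing pieces.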
Placing of headlines into fake single-page website
As a means of constructing a fake single-page website that emulated a trustworthy news source, I took inspiration from the politics section of the New York Times online. The fake news source was named 'SPLASHED' and contained AI-generated titles and text as a visual metaphor for the appropriation of advanced digital technologies for misinformation (Figure 1).
Figure 1. SPLASHED is a single-page deep fake online news source in which headlines are created using machine learning.
A possible next step is to create a corpus of full articles written about Trump in order to train an LSTM to generate the body text of news stories for SPLASHED, making it as convincing a deep fake as possible.
I would like to thank Laurenzo's lab and the School of Creative Media (SCM), City University of Hong Kong, and Gene Kogan for introducing char-rnn during his workshop at SCM.