Automated detection of political ideology from text: a case study of newspapers in Uruguay


ABSTRACT_

Although objective reporting is the hallmark of professional journalism, several academics have argued that the media is ideologically biased. The association of the printed press (newspapers) with political parties has long been acknowledged in Uruguay. However, the lack of studies that empirically demonstrate and measure the extent of ideological bias towards political affiliations has prevented scholars from addressing the ideological diversity of newspapers. Here, the author describes the use of natural language processing and unsupervised machine learning algorithms in conjunction with network graph analysis to investigate the political leaning of five newspapers that published a total of 530 news articles on two political candidates from opposing parties during the election cycle of 2019.

THE INFLUENCE OF MEDIA IN SHAPING PUBLIC OPINION_

The media (radio, television, press, and online) has a vital role in society, and its main function is to communicate local, national and international events to the public. Media can certainly influence what people think about a range of national and international issues. Several factors contribute to the degree to which people are influenced by media, and these factors vary according to geographical and cultural contexts. For instance, a recent study co-authored by Hong Tien Vu and Peter Bobkowski from the University of Kansas showed that the strength of the effect of journalism on public opinion (in 16 countries on five continents) depended on the public's age, educational level, living area and political ideology, and also on national macro variables such as economic development and media freedom [1, 2]. The study found that agendas set by media showed a moderately high correlation with the issues the public considered most relevant, with countries such as South Korea, Taiwan, South Africa, Philippines, Mexico and Chile displaying statistically significant relationships between media and public agendas [2].

Media can also limit the scope of arguments and perspectives that inform public debate, and thereby shape not only public beliefs but also attitudes towards social change, as demonstrated by Happer and Philo in 2013 [3]. These authors showed that, for the issue of disability in the United Kingdom, an increase in printed media reporting that discussed the topic in unsympathetic terms led to negative public opinion on disability benefits and the persons who receive them. The authors also showed that repeated exposure to media messages related to climate change predisposed the attitudes and behavior of people towards adjusting their views and opinions to new information [3]. This means that public opinion can be selectively changed by exposure to media, a phenomenon that was recently shown to occur in the United States when university researchers intentionally intervened in 48 media outlets to activate public expression, causing citizens to discuss major issues of policy and politics as part of the ongoing collective 'national conversation' [4]. The authors found that their media intervention shifted the composition of Twitter opinions expressed in the national conversation by 2.3% in the ideological direction implied by their published articles, and increased public engagement and discussion on Twitter by 62.7% relative to a day's volume [4].

Overall, these works show that, in different countries, media had a considerable and tangible effect in shaping public opinion on public policy in general and on political issues in particular. This raises the question of whether media effects on public opinion are intentional and, more importantly, whether the press and media reflect reality or instead filter and shape it according to their biases (inherent or intentional) towards certain policies and political views.

IS MEDIA BIASED?_

Although the premise of professional journalism rests on objective reporting, several scholars have reported that the media consistently displays ideological bias [5-8]. Ideology is defined here as a system of ideas, beliefs and ideals that constitutes the basis for the political and economic theories that guide policy making [9]. Media bias is considered intentional if it results from a conscious act or choice and is sustained over time [5, 7]. In this sense, media bias is a systemic tendency rather than a series of isolated incidents, one that journalists and/or media owners purposely implement in order to obtain a concrete political, social and/or economic goal.

Three types of media bias have been described [5]: COVERAGE, GATEKEEPING, and STATEMENT. COVERAGE bias refers to the visibility of topics and entities, such as a person/politician or country, in media coverage. GATEKEEPING bias, also termed selection bias or agenda bias, refers to which stories media outlets select or discard for reporting. STATEMENT bias, also called presentation bias, relates to how articles and stories choose to report on concepts.

Media bias in news content could significantly impact the political attitudes of voters and thus influence the outcome of elections [3]. What then are the potential effects of biased news consumption? A likely outcome would be a reduction in the diversity of political views across the population, which in turn would diminish freedom of expression and democratic values. Indeed, it has been suggested that the presence and cultivation of ideologically diverse news content leads to healthier democracies [5]. Because of this, several countries have established laws that regulate media ownership as a means to limit the concentration of media outlets, and their associated political ideologies, in the hands of a few individuals or groups [10, 11]. Concentration of media ownership has been the status quo in Uruguay [12, 13], a country located in South America that is the subject of study in this work.

IS THE MEDIA IN URUGUAY IDEOLOGICALLY BIASED?_

According to Adolfo Garcé, a political scientist, the media in Uruguay was historically biased in political matters and closely associated with political parties until the 1950s, and has gradually acquired more independence in its political views ever since [14]. Following a period of dictatorship in which many media outlets were censored and closed, Uruguay re-established its democratic government in 1985, after which the media, and in particular the printed press, adjusted its political preferences to match the new ideological climate of the re-established political parties [15]. Since 1985, the primary type of journalism practiced in Uruguay has been 'declarative' in nature, and thus mainly characterized by reporters' citations of statements made by politicians [16]. Because of this, the ideological perspective of a media outlet, especially in the printed press, could become evident by analyzing the nature and frequency of citations derived from statements given by political figures who are aligned with the ideology of the outlet in question. Over the years, the author himself has noticed ideological bias towards a particular politician when reading the political section of one of the major newspapers in Uruguay. This prompted the author to engage in a research-based art project for the automatic detection of political ideology from text using machine learning and natural language processing algorithms.

OBJECTIVE_

The objective of the current work is to implement automated techniques to identify political ideology from text, mainly machine learning and natural language processing algorithms such as topic modeling and clustering. For the purpose of this study, the text comprised 440 newspaper articles written about two major political candidates running for president of Uruguay, published by journalists from three different media outlets between June 1 and July 25 of 2019. The text also included an additional 90 news articles written on two political debates held on October 1 and November 13 of 2019. The author also included documents containing the programmatic outlines championed by each candidate and their respective political parties.

BRIEF INTRODUCTION TO THE POLITICAL CONTEXT OF URUGUAY_

Uruguay has a presidential form of government with a division of power among the executive, legislative and judicial branches. The president, who is also the head of state, is directly elected by the people for a five-year term. The president then appoints a council of ministers, one for each administrative department [17]. The vice-president presides over the national legislature, a bicameral parliament also elected by the people for a five-year term and composed of a 31-member senate and a 99-member chamber of deputies. The previous national elections were held in October (1st round) and November (2nd round) of 2014, in which Dr. Tabaré Vázquez, representing the leftist political party known as Frente Amplio, was elected president. The current presidential and parliamentary elections took place in October (1st round) and November (2nd round) of 2019. For this election cycle, there were eleven candidates running for president, representing the following political parties:

Daniel Martínez > Frente Amplio (incumbent party)

Luis Lacalle Pou > Partido Nacional

Ernesto Talvi > Partido Colorado

Guido Manini Ríos > Cabildo Abierto

Gonzalo Abella > Unidad Popular

Pablo Mieres > Partido Independiente

César Vega > Partido Ecologista Radical e Intransigente

Edgardo Novick > Partido de la Gente

Gustavo Salle > Partido Verde Animalista

Daniel Goldman > Partido Digital

Rafael Fernández > Partido de los Trabajadores

The current study focused on newspaper articles written about two major candidates running for president of Uruguay in the recently held national elections: Daniel Martínez and Luis Lacalle Pou. These politicians were leading the public opinion polls [18] and represented political parties of contrasting ideologies (left and conservative, respectively). The newspaper articles written about Daniel Martínez and Luis Lacalle Pou date to the period around the primary elections held in Uruguay on June 30 of 2019, in which these politicians were elected to represent their political parties at the national elections in October and November of 2019.

The newspapers considered for this study were the following:

Primary Elections Period (June 1 to July 25 of 2019) - 440 news articles total:

El Observador (print & digital) https://www.elobservador.com.uy

La República (print & digital) https://www.republica.com.uy

Montevideo Portal (digital only) https://www.montevideo.com.uy/index.html

Political Debates between Daniel Martínez and Luis Lacalle Pou (October 1 and November 13 of 2019) - 90 news articles total:

El Observador

La República

Montevideo Portal

La Diaria (print & digital) https://ladiaria.com.uy

La Red21 (digital only) http://www.lr21.com.uy

The main corpus, used as the dataset to train the machine learning algorithms, was derived from a total of 440 articles published by El Observador, La República, and Montevideo Portal, and served to analyze document similarities (clustering) and to predict topics (topic modeling). Another 90 news articles on the political debates between the two candidates, published by La Diaria and La Red21 in addition to El Observador, La República and Montevideo Portal, were used instead to predict the political leaning of each newspaper based on the word context for a selected number of topics.

AUTOMATED DETECTION OF POLITICAL IDEOLOGY FROM TEXT_

In this study the author focused on news articles published by newspapers, which for many people are the primary source of information and thus play a pivotal role in shaping personal and public opinion. Automatic detection of political ideology from news articles is based on the notion that journalists can modulate the reader's perception of a political topic through word choice, for instance when the writer uses words with positive or negative connotations when referring to a political candidate or party, or varies the credibility of the source [7]. The ideological perspective of a journalist is also often expressed in the choice of topics, as journalists with opposing ideologies will choose to write on different topics and make them more salient according to their views. Nonetheless, newspapers do not explicitly express their political preferences, which makes the task of detecting political ideology in news articles somewhat difficult.

Machine learning in conjunction with natural language processing algorithms has been applied to the automated detection of political ideology from text. For instance, Elfardy et al. identified the ideological perspective of a person by using semantic features derived from the person's written texts [19]. The use of machine learning algorithms for detection of political ideology from news articles was also explored by Kulkarni and colleagues [20]. They proposed a model (based on a Bayesian approach with stochastic attention units) that leveraged not only the text contained within news articles but also their titles and hyperlink structure (news articles would provide weblinks to other media sources with similar political ideology) as a means to rank 59 news sources based on their predicted political ideology [20]. Gentzkow and Shapiro instead constructed an index of media political bias that measured the similarity of a news outlet's language to that of a congressional Republican or Democrat, according to written text derived from the Congressional Record in 2005 [5]. Their index measured the frequency of language that would 'sway' readers to the left or to the right on political issues, by examining the set of all phrases used by members of Congress during 2005 and identifying those phrases that were more frequently used by Democrats or Republicans. Consequently, they indexed newspapers by the degree to which they used 'politically charged' phrases in their news articles that were reminiscent of those used in political speeches by Democrat or Republican politicians [5]. The resulting index allowed the authors to compare newspapers to one another, rather than comparing them to any given standard of 'true' or 'unbiased' journalism [5]. Iyyer et al. implemented a recursive neural network for detection of ideological bias at the sentence level [21], differing from previous approaches that were based on 'bag of words' classifiers. The authors were interested in learning representations that could distinguish political bias given labeled data, with their dataset also derived from Congressional debates during 2005 [21]. Ahmed and Xing [22] developed topic models (multi-view Latent Dirichlet Allocation) capable of recognizing the ideological bias in a given document. Their approach was also capable of summarizing where the bias was manifested on a topical level, and provided readers with alternate views that would help them remain informed from different perspectives. Lazaridou and Krestel focused on the examination of 'selection bias' in the sense of how much space a newspaper dedicates to each political party; they also examined how often politicians were mentioned and how often they were quoted in news articles [8].
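The phrase-based idea behind such an index can be illustrated with a minimal sketch. It is a deliberately simplified stand-in and not the estimator used in [5]: it scores two-word phrases by their relative frequency in two reference texts and then averages those scores over the phrases found in an article. The function names, the scoring formula and the toy sentences are all assumptions made for illustration.

```python
import re
from collections import Counter

def bigrams(text):
    """Lowercase the text and return its list of adjacent word pairs."""
    words = re.findall(r"\w+", text.lower())
    return list(zip(words, words[1:]))

def partisan_phrases(left_text, right_text):
    """Score each bigram in [-1, 1]: -1 if only the left reference uses it,
    +1 if only the right reference uses it (illustrative formula)."""
    left, right = Counter(bigrams(left_text)), Counter(bigrams(right_text))
    n_left, n_right = sum(left.values()), sum(right.values())
    scores = {}
    for phrase in set(left) | set(right):
        f_left = left[phrase] / n_left      # relative frequency, left reference
        f_right = right[phrase] / n_right   # relative frequency, right reference
        scores[phrase] = (f_right - f_left) / (f_right + f_left)
    return scores

def slant(article_text, scores):
    """Average score of the scored bigrams found in an article;
    None when the article shares no bigram with the references."""
    hits = [scores[b] for b in bigrams(article_text) if b in scores]
    return sum(hits) / len(hits) if hits else None

# Hypothetical reference texts and articles, invented for this example.
left_ref = "public investment in social housing and public education"
right_ref = "lower taxes and less public spending and lower tariffs"
scores = partisan_phrases(left_ref, right_ref)

article_a = "the candidate promised lower taxes"
article_b = "the candidate defended social housing"
article_c = "the weather was fine"
print(slant(article_a, scores), slant(article_b, scores), slant(article_c, scores))
```

As in the index described above, such a score only places outlets relative to one another and to the chosen reference texts; it says nothing about an absolute standard of unbiased reporting.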

Despite all the technical advances previously mentioned, a key question remains to be answered: how will the automated detection of ideological bias in news articles eventually contribute to unbiased journalism and a more balanced coverage of political events and social issues for news readers in Uruguay? The work discussed here at least helps towards this goal by providing the Uruguayan reader with an objective analysis of political journalism and its inherent bias, so as to promote critical thinking and protect the reader from potential manipulation by the written press.

TECHNICAL IMPLEMENTATION & RESULTS_

Corpora assembly_

The corpus was assembled by retrieving articles containing the name of the political candidate (either Luis Lacalle Pou or Daniel Martínez), typing their names in the search box within the website of three newspapers: El Observador, Montevideo Portal, and La República. News articles on both political debates were downloaded from the newspapers' websites on the day of each debate or on the days that followed. The Python library NLTK [24] was used to estimate the size of the corpus in terms of tokens, which is shown in Table 1.
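A minimal sketch of this kind of size estimate follows. To stay dependency-free it uses a simple regular-expression tokenizer as a stand-in for NLTK's tokenizers, so the counts it produces would differ somewhat from those obtained with NLTK. The collection names follow the naming pattern mentioned in the text, while the function names and the sample sentences are hypothetical.

```python
import re

def tokenize(text):
    """Split text into word tokens (Unicode letters, digits, underscore)."""
    return re.findall(r"\w+", text)

def corpus_size(collections):
    """Return {collection name: (number of articles, number of tokens)}."""
    sizes = {}
    for name, articles in collections.items():
        n_tokens = sum(len(tokenize(article)) for article in articles)
        sizes[name] = (len(articles), n_tokens)
    return sizes

# Hypothetical miniature corpus: each collection is a list of article texts.
toy_corpus = {
    "El_Observador/Luis_Lacalle_Pou": [
        "Lacalle Pou presentó su programa.",
        "El candidato habló en Montevideo.",
    ],
    "La_República/Daniel_Martínez": [
        "Martínez visitó Salto.",
    ],
}
sizes = corpus_size(toy_corpus)
for name, (n_articles, n_tokens) in sizes.items():
    print(name, n_articles, n_tokens)
```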

Table 1. Structure of corpus used in this study in terms of number of news articles and document size in tokens for each newspaper and document of proposal for governance.

It is important to note that, because the search for articles on each political candidate was conducted independently even within the same newspaper, news articles mentioning both candidates are shared between files: that is, the same text file mentioning Luis Lacalle Pou and Daniel Martínez appears in both collections (a) El_Observador/Luis_Lacalle_Pou and (b) El_Observador/Daniel_Martínez.
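One way to see how much two such collections overlap is to compare content fingerprints of their articles. The sketch below is only an illustration of that check, not a step reported in this study; the hashing approach, the function names and the sample texts are assumptions.

```python
import hashlib

def fingerprint(text):
    """Hash the text after collapsing whitespace and lowercasing it."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def shared_articles(collection_a, collection_b):
    """Return the texts of collection_a that also occur in collection_b."""
    seen = {fingerprint(text) for text in collection_b}
    return [text for text in collection_a if fingerprint(text) in seen]

# Hypothetical article texts for the two collections of one newspaper.
lacalle_pou = [
    "Lacalle Pou y Martínez lideran las encuestas.",
    "Lacalle Pou recorrió el interior.",
]
martinez = [
    "Martínez presentó su equipo.",
    "Lacalle Pou y  Martínez lideran las encuestas.",  # same article, extra space
]
overlap = shared_articles(lacalle_pou, martinez)
print(len(overlap))
```

Counting the overlap in this way makes it possible to report how many articles are counted twice when per-candidate token totals are compared.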

The set of news articles retrieved for 'Daniel Martínez' contained more tokens than the set retrieved for 'Luis Lacalle Pou' in El Observador and Montevideo Portal, but not in La República. During the time period of the study (June 1 to July 25 of 2019), La República had a considerably smaller journalistic output in terms of published news articles, and consequently tokens, than the other two newspapers.

As part of the corpus, the author also included the programmatic lines / proposals for governance of each of the political candidates [25, 26]. The proposals for governance were used as a reference to compare word contexts of selected topics from news articles on the political debates published by the five newspapers shown in Table 1.

Examining mentions of political pre-candidates from news articles_

When news articles are assembled into a single text file according to their publication date, the location of words of interest along a temporal line can be determined. This positional information can be displayed using dispersion plots (Figures 1 and 2) and serves the purpose of investigating changes in language use over the studied time period (June 1 to July 25 of 2019, with primaries held on June 30 of 2019).
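The positional information behind a dispersion plot can be sketched as follows: articles are ordered by date, concatenated into one token sequence, and the offsets at which each word of interest occurs are recorded. This is a simplified illustration that uses a regular-expression tokenizer and hypothetical articles; NLTK's Text.dispersion_plot can draw such offsets directly from a token list.

```python
import re
from datetime import date

def dispersion_offsets(dated_articles, targets):
    """Concatenate articles in date order and return, for each target word,
    the token offsets at which it occurs, plus the total token count."""
    tokens = []
    for _, text in sorted(dated_articles, key=lambda pair: pair[0]):
        tokens.extend(re.findall(r"\w+", text.lower()))
    offsets = {target.lower(): [] for target in targets}
    for position, token in enumerate(tokens):
        if token in offsets:
            offsets[token].append(position)
    return offsets, len(tokens)

# Hypothetical (publication date, text) pairs, deliberately out of order.
articles = [
    (date(2019, 6, 30), "Martínez ganó la interna"),
    (date(2019, 6, 1), "Lacalle Pou lidera las encuestas"),
    (date(2019, 7, 25), "Lacalle y Martínez debaten"),
]
offsets, n_tokens = dispersion_offsets(articles, ["Lacalle", "Martínez"])
print(offsets, n_tokens)
```

Each list of offsets corresponds to one row of marks in a dispersion plot, with the horizontal axis running from the first to the last token of the date-ordered text.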

Figure 1. Dispersion plots for news articles searched and downloaded using the keyword 'Luis Lacalle Pou' from El Observador (top), Montevideo Portal (middle), and La República (bottom) newspapers, respectively. Each blue mark represents an instance of a word, whereas each row represents the entire corpus composed of news articles arranged into a single document according to publication date from June 1 (left) to July 25 (right) of 2019.

Figure 2. Dispersion plots for news articles searched and downloaded using the keyword 'Daniel Martínez' from El Observador (top), Montevideo Portal (middle), and La República (bottom) newspapers, respectively. Each blue mark represents an instance of a word, whereas each row represents the entire corpus composed of news articles arranged into a single document according to publication date from June 1 (left) to July 25 (right) of 2019.

From Figures 1 and 2 it can be seen that the leading candidates Luis Lacalle Pou and Daniel Martínez were mentioned more frequently across the entire time period than their competitors within the same party (Juan Sartori, Jorge Larrañaga, Enrique Antía, and Carlos Iafigliola for Partido Nacional; and Carolina Cosse, Mario