Natural Language Processing of Facebook messages and their inclusion in an abstract painting


Synopsis_

In this work the author attempted to hybridize aspects of contemporary digital communication (Facebook text messages, personal data derived from social media use, and text analysis and visualization using computer algorithms) with traditional abstract painting. As a result, the author conceived and created an abstract painting (acrylic on cardboard) containing regular mail envelopes that hold excerpts from incoming and outgoing Facebook messages. These messages came from exchanges between the author and 200 Facebook friends over a period of more than two years (January 1, 2017 to March 17, 2019). The work is meant to be interactive: visitors to the exhibit can open the envelopes and read the message excerpts. It explores the possibilities of physical abstract painting as a carrier of personal digital data, giving visitors insight into the social media activity of the artist who created it.

Creation of an abstract painting with regular mail envelopes_

An acrylic on cardboard painting (118 x 93 cm) containing regular mail envelopes was created in March 2019 (Figure 1). The visual content of the painting followed the artist's abstract style and was not directly related to data derived from the text analysis of Facebook messages. The composition stands on its own while also offering viewers a visual cue that the painting contains mail envelopes. These envelopes are the 'interactive agents' that bridge the physical and digital worlds: instead of ordinary letters, they contain excerpts from the digital exchanges the author held with Facebook friends.

The painting has 20 mail envelopes, which gave the author the opportunity to share with the audience 20 different conversational topics that took place via Facebook text messaging.

Figure 1. Acrylic on cardboard painting containing regular mail envelopes. Each envelope holds a printed excerpt from the Facebook text messages the author exchanged with 200 friends over a period of more than two years.

Natural Language Processing of Facebook text messages_

To curate and select interesting conversational topics from those he held with friends on Facebook over the two-year period, the author turned to text analysis using Natural Language Processing (NLP) algorithms written in Python. For this, the author manually created two text corpora: one containing all outgoing Facebook messages from the author to his friends, and another containing all incoming Facebook messages from friends to the author (Figure 2). The messages were written in English and Spanish. References to web links, email addresses, and phone numbers were removed from both corpora and were therefore excluded from all subsequent analysis.
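The original does not say how web links, email addresses, and phone numbers were detected. The sketch below shows one possible way to strip them with regular expressions before joining the messages into a corpus. The three patterns and the function names are assumptions for illustration, not the author's code; note that the phone pattern will also remove long digit runs such as ISO dates.

```python
import re

# Assumed patterns: the original does not describe how links, e-mail addresses,
# or phone numbers were found, so these regexes are illustrative only.
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
# Nine or more characters of digits, spaces, dots, dashes, or parentheses,
# starting and ending with a digit; this also catches long dates such as 2017-01-01.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def clean_message(text):
    """Remove URLs, e-mail addresses, and phone-number-like strings from one message."""
    for pattern in (URL_RE, EMAIL_RE, PHONE_RE):
        text = pattern.sub(" ", text)
    # Collapse the extra whitespace left behind by the substitutions.
    return " ".join(text.split())


def build_corpus(messages):
    """Join the cleaned messages of one direction (incoming or outgoing) into a corpus."""
    return "\n".join(clean_message(m) for m in messages)


outgoing = ["See https://example.org/film and write to me@example.com",
            "Call +1 (201) 555-0199 tonight"]
print(build_corpus(outgoing))
```

Running the same function separately over the incoming and outgoing message lists yields the two corpora described above.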

Figure 2. Sketch of the overall approach used to curate interesting texts from the Facebook message exchange between the author and his friends. Two text corpora were created: one containing outgoing messages from the author to 200 friends, and the other containing incoming messages from those 200 friends to the author. Both corpora were compared and analyzed using Natural Language Processing (NLP) and Complex Network Analysis (CNA) in Python. The resulting insights guided the curation of the text excerpts placed inside each mail envelope of the painting shown in Figure 1.

The Python library Natural Language Toolkit (NLTK) was used to mine the two corpora and extract insights into the topics the author and his friends wrote about and discussed. These insights were then treated as the 'conversational topics' to be printed and placed inside the mail envelopes on the painting.

The corpus of outgoing messages (author to friends) was 1.42 times larger than the corpus of incoming messages. Despite this, the lexical richness of the incoming messages (friends to author) was higher than that of the outgoing messages. Lexical richness is the number of distinct words divided by the total number of words (tokens) in a corpus, so a higher value means less repetition.
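Lexical richness as defined above takes only a couple of lines to compute. This sketch mirrors the `lexical_diversity` helper from the NLTK book and assumes case-sensitive tokens; the function names are illustrative. Its reciprocal is the average number of times each distinct word is used.

```python
def lexical_richness(tokens):
    """Distinct tokens divided by total tokens (case-sensitive, as given)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def average_repetition(tokens):
    """Average number of times each distinct token is used: the reciprocal of richness."""
    return len(tokens) / len(set(tokens)) if tokens else 0.0


sample = "the cat saw the dog and the cat ran".split()
print(round(lexical_richness(sample), 3))   # 0.667 (6 distinct / 9 tokens)
print(round(average_repetition(sample), 2))  # 1.5
```

Applied to the reported values, 1 / 0.20 = 5 and 1 / 0.26 ≈ 3.8, which matches the "5 times" and "about 4 times" averages for OuTe and InTe; likewise 47,516 / 33,390 ≈ 1.42 gives the size ratio.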

Text lengths:

(OuTe Corpus) Outgoing texts: 47,516 tokens

(InTe Corpus) Incoming texts: 33,390 tokens

Lexical Richness:

OuTe: 0.20 (each word is used about 5 times on average)

InTe: 0.26 (each word is used about 4 times on average)

When the top 100 most common words in each corpus were identified, cumulative frequency graphs showed that they accounted for 42.5% of all tokens in the outgoing corpus and 39.9% in the incoming corpus (Figure 3).
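The percentages behind Figure 3 can be reproduced without plotting: count word frequencies, take the 100 most common words, and divide their summed count by the corpus size. The sketch below uses `collections.Counter` and hypothetical function names; with NLTK installed, `nltk.FreqDist(tokens).plot(100, cumulative=True)` draws a cumulative graph of the kind shown in Figure 3 (it requires matplotlib).

```python
from collections import Counter


def cumulative_counts(tokens, n=100):
    """(word, running total) pairs for the n most frequent words: the data of a cumulative graph."""
    running, points = 0, []
    for word, count in Counter(tokens).most_common(n):
        running += count
        points.append((word, running))
    return points


def top_n_share(tokens, n=100):
    """Fraction of all tokens accounted for by the n most frequent word types."""
    points = cumulative_counts(tokens, n)
    return points[-1][1] / len(tokens) if points else 0.0


tokens = "a a a b b c".split()
print(cumulative_counts(tokens, 2))  # [('a', 3), ('b', 5)]
print(top_n_share(tokens, 2))
```

On the real corpora, `top_n_share(tokens, 100)` would correspond to the 42.5% and 39.9% figures, assuming the same tokenization was used.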

Figure 3a. Cumulative frequency graph of the top 100 most common words in the corpus of outgoing Facebook messages from the author to his friends.

Figure 3b. Cumulative frequency graph of the top 100 most common words in the corpus of incoming Facebook messages from friends to the author.

Although most of the frequent words shown in Figure 3 are function words of English and Spanish (personal pronouns and articles, for example), the author found 'work' and 'tango' interesting to analyze because they appear in both corpora. Notably, whereas the author frequently used words such as 'art', 'video', and 'media', these words were not frequent in incoming messages from friends.

Top 100 words:

OuTe Corpus: includes 'work' (102 occurrences), 'tango' (187 occurrences), 'art' (92 occurrences), 'video' (71 occurrences), and 'media' (61 occurrences)

InTe Corpus: includes 'work' (53 occurrences) and 'tango' (52 occurrences)

Since the painting (Figure 1) contains 20 mail envelopes, the author needed to find 20 interesting words in total that were shared between the two corpora. Because 'work' and 'tango' had already been found among the top 100 most frequent words, 18 additional shared words of interest were needed. To find them, the author focused on (1) words longer than 7 characters that were mentioned more than 7 times; (2) collocations; and (3) words ending in 'ing'.

(OuTe) Words longer than 7 characters and mentioned more than 7 times in OUTGOING Facebook messages written BY THE AUTHOR to friends:

['(practice', 'Argentina', 'Diciembre', 'Facebook', 'Gracias!', 'Holidays!', 'Jusleine', 'Leonardo', 'Leonardo,', 'Montevideo', 'Princeton', 'Richard,', 'Saludos!', 'University', 'University.', 'algorithm', 'alternative', 'apologize', 'articulo', 'artificial', 'artistas', 'artistic', 'artworks', 'artículos', 'audiovisual', 'available', 'background', 'building', 'buscando', 'collaborate', 'collection', 'community', 'computer', 'consulado', 'contaminación', 'daughter', 'different', 'electronic', 'electronica', 'entonces', 'entrance', 'escribir', 'festival', 'following', 'gracias!', 'gracias.', 'haciendo', 'included', 'inspired', 'interesa', 'interesante', 'interest', 'interested', 'interesting', 'intersection', 'learning', 'material', 'mensaje.', 'message.', 'mientras', 'milonga.', 'milongas', 'multimedia', 'opportunity', 'organizar', 'original', 'performance', 'performance)', 'performance.', 'practica', 'practice', 'preguntar', 'promotional', 'propuesta', 'proyecto', 'realizar', 'regards,', 'remember', 'research', 'response.', 'resulting', 'saludos!', 'shooting', 'something', 'soundtrack', 'speakers', 'surrealistic', 'technology', 'thinking', 'together', 'tomorrow', 'tonight.', 'trabajar', 'traditional', 'upcoming', 'visualization']

(InTe) Words longer than 7 characters and mentioned more than 7 times in INCOMING Facebook messages written BY FRIENDS to the author:

['Thursday', 'actually', 'alternative', 'available', 'different', 'electronic', 'entiendo', 'everything', 'festival', 'haciendo', 'important', 'interesting', 'learning', 'material', 'performance', 'practice', 'probably', 'proyecto', 'recording', 'schedule', 'something', 'thinking', 'tomorrow']

Shared words between OuTe and InTe (those later selected for the painting include alternative, electronic, festival, performance, and proyecto):

alternative - available - different - electronic - festival - haciendo - interesting - learning - material - performance - practice - proyecto - something - thinking - tomorrow
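Criterion (1) and the comparison of the two resulting lists reduce to a frequency filter followed by a set intersection. The sketch assumes each corpus is a list of case-sensitive whitespace tokens, which would be consistent with entries such as 'Leonardo,' and '(practice' in the lists above; the function names and toy data are hypothetical.

```python
from collections import Counter


def long_frequent_words(tokens, longer_than=7, more_than=7):
    """Sorted word types longer than `longer_than` characters seen more than `more_than` times."""
    counts = Counter(tokens)
    return sorted(w for w, c in counts.items() if len(w) > longer_than and c > more_than)


def shared_words(words_a, words_b):
    """Words present in both lists, sorted."""
    return sorted(set(words_a) & set(words_b))


# Toy corpora standing in for OuTe and InTe.
out_tokens = ["festival"] * 8 + ["tomorrow"] * 9 + ["tango"] * 20 + ["electronic"] * 7
in_tokens = ["festival"] * 8 + ["electronic"] * 10 + ["tomorrow"] * 3

out_long = long_frequent_words(out_tokens)  # ['festival', 'tomorrow']
in_long = long_frequent_words(in_tokens)    # ['electronic', 'festival']
print(shared_words(out_long, in_long))      # ['festival']
```

Python's default string sort places punctuation and capitalized words before lowercase ones, which is the ordering seen in the OuTe and InTe lists.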

A collocation is a sequence of words that occur together unusually often and that resist substitution with words of similar meaning. Clear examples in the results below are city names such as Hong Kong, Buenos Aires, and New York.

Collocations of words in OuTe:

Hong Kong; message finds; would like; new media; gracias por; Kind regards,; short film; media art; Happy Holidays!; creates surrealistic; Martin Calvino; machine learning; two weeks; regards, Martin; resulting images; Buenos Aires; surrealistic images; promotional material; so. Happy; computer algorithm

Collocations of words in InTe:

Hong Kong; Hola Martín,; New York; Hola Martín!; muchas gracias; Gracias por; gracias por; Muchas gracias; machine learning; alternative room; Hola Martín.; get back; would love; Hey Martin,; little bit; Hola Martin; right now.; 21.30 hs.y; hs.y sabados; Looking forward
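NLTK's `Text.collocations()` filters adjacent word pairs by frequency and by a stopword list before ranking them with an association measure (a likelihood ratio, in recent NLTK versions). The standard-library sketch below follows the same shape but swaps in pointwise mutual information (PMI), which is simpler to write out. It is not the author's code, its rankings will differ from NLTK's, and the function name and parameters are hypothetical.

```python
import math
from collections import Counter


def top_collocations(tokens, n=20, min_freq=2, stopwords=frozenset()):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Pairs seen fewer than `min_freq` times, or containing a stopword, are
    dropped before scoring. This is a simplified stand-in, not NLTK's scorer.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)
    scored = []
    for (w1, w2), pair_count in bigrams.items():
        if pair_count < min_freq or w1.lower() in stopwords or w2.lower() in stopwords:
            continue
        # PMI = log2(P(w1 w2) / (P(w1) P(w2))), with the token total used as the
        # denominator for both unigram and bigram probabilities (an approximation).
        pmi = math.log2(pair_count * total / (unigrams[w1] * unigrams[w2]))
        scored.append((pmi, w1, w2))
    # Highest PMI first; ties broken alphabetically so the output is deterministic.
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return [f"{w1} {w2}" for _, w1, w2 in scored[:n]]


tokens = "we met in hong kong and then hong kong again and we met".split()
print(top_collocations(tokens))                    # ['hong kong', 'we met']
print(top_collocations(tokens, stopwords={"we"}))  # ['hong kong']
```

The stopword filter matters in practice: without it, pairs of frequent function words tend to crowd out pairs like 'Hong Kong' or 'machine learning'.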

Interesting shared collocations:

Hong Kong - machine learning

The author was interested in analyzing the use of verbs or words ending with 'ing' and how their frequency may change in outgoing versus incoming Facebook messages:

Words in OuTe that end with 'ing' and were mentioned more than 7 times:

['DJing', 'asking', 'being', 'bring', 'building', 'coming', 'dancing', 'doing', 'during', 'following', 'going', 'interesting', 'learning', 'leaving', 'letting', 'living', 'looking', 'morning', 'playing', 'resulting', 'sending', 'shooting', 'something', 'taking', 'talking', 'thing', 'thinking', 'trying', 'upcoming', 'using', 'working', 'writing']

Words in InTe that end with 'ing' and were mentioned more than 7 times:

['being', 'dancing', 'doing', 'everything', 'getting', 'going', 'interesting', 'learning', 'looking', 'meeting', 'recording', 'something', 'talking', 'thinking', 'working']

Shared words between OuTe & InTe ending with '-ing':

being - dancing - doing - going - interesting - learning - looking - something - talking - thinking - working
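Comparing how often the shared '-ing' words are used on each side requires normalizing the raw counts, because the outgoing corpus is about 1.42 times larger. The sketch below applies the same "more than 7 occurrences" filter to both corpora and reports each shared word per 1,000 tokens. The normalization is an illustrative addition, not part of the original analysis. A plain suffix test also keeps non-verbs such as 'something', 'thing', and 'morning', as the lists above show.

```python
from collections import Counter


def ing_frequency_table(out_tokens, in_tokens, more_than=7):
    """Shared '-ing' words with their rates per 1,000 tokens in each corpus."""
    out_counts, in_counts = Counter(out_tokens), Counter(in_tokens)

    def frequent_ing(counts):
        # Word types ending in 'ing' that occur more than `more_than` times.
        return {w for w, c in counts.items() if w.endswith("ing") and c > more_than}

    shared = sorted(frequent_ing(out_counts) & frequent_ing(in_counts))
    return [
        (w, 1000 * out_counts[w] / len(out_tokens), 1000 * in_counts[w] / len(in_tokens))
        for w in shared
    ]


# Toy corpora: 1,000 outgoing tokens and 500 incoming tokens.
out_tokens = ["dancing"] * 8 + ["thing"] * 9 + ["work"] * 983
in_tokens = ["dancing"] * 10 + ["thing"] * 2 + ["x"] * 488
print(ing_frequency_table(out_tokens, in_tokens))  # [('dancing', 8.0, 20.0)]
```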

Examination of the contexts of words shared between incoming and outgoing Facebook messages_

A concordance view shows every occurrence of a given word together with its surrounding context. This makes it possible to compare the contexts in which words shared between incoming and outgoing Facebook messages were written. Based on the text analysis above, the author identified 20 words shared between incoming and outgoing messages that make interesting cases for a concordance. These words and their contexts will be printed and placed inside the envelopes of the painting shown in Figure 1 (20 envelopes, 20 shared words, 20 concordances) (Figure 4). The selected words are:

work - tango - alternative - electronic - festival - performance - proyecto - Hong Kong - machine learning - being - dancing - doing - going - interesting - learning - looking - something - talking - thinking - working
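NLTK produces these views with `Text.concordance(word, width=79, lines=25)`; the "Displaying 25 of ... matches" headers in this section are consistent with its default of 25 lines. NLTK matches the word case-insensitively, which may be why the incoming corpus shows 55 concordance matches for 'work' (including 'Work') while the frequency count above gives 53. The simplified keyword-in-context sketch below illustrates the idea; it is not NLTK's exact formatting, and the function name is hypothetical.

```python
def concordance(tokens, word, width=79, lines=25):
    """Keyword-in-context lines for `word`, matched case-insensitively.

    Returns (total number of matches, up to `lines` formatted rows).
    """
    target = word.lower()
    hits = [i for i, tok in enumerate(tokens) if tok.lower() == target]
    half = (width - len(word) - 2) // 2  # characters of context on each side
    rows = []
    for i in hits[:lines]:
        left = " ".join(tokens[max(0, i - 20):i])[-half:]
        right = " ".join(tokens[i + 1:i + 21])[:half]
        rows.append(f"{left.rjust(half)} {tokens[i]} {right}")
    return len(hits), rows


tokens = "I love my work and your Work is good".split()
total, rows = concordance(tokens, "work", width=30)
print(f"Displaying {len(rows)} of {total} matches:")
for row in rows:
    print(row)
```

Each row keeps the matched token in a fixed column, so differences between how a word is used in OuTe and InTe can be read down the page.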

Figure 4. A printout of the concordance for the word 'work', to be placed into one of the mail envelopes of the abstract painting.

Let's take a look at the concordance of the word 'work':

Outgoing messages: 25 out of 102 matches displayed

. I left you a phone message at your work in Princeton. I wanted to reach out

to possible implement collaborative work Hi Keiichi, thanks for your kind wor

hi, thanks for your kind words. Your work is also very interesting! Perhaps we

program! Thank you so much for your work and help! sure, no rush. I am thinki

itional dancers (female and male) to work along with you? Hi Santiago and Beat

st part comes: that is to submit the work to several Film_Festivals in the Ind

I am confident you guys will make it work on the spot with improvisation. Alre

does Mondays and Wednesdays evenings work with you? The short film will probab

program! Thank you so much for your work and help! here is the written essay

st part comes: that is to submit the work to several Film Festivals in the Ind

stead of the studio and you guys can work it out as best accommodate this sure

the evening Jusleine just get out of work at 6PM and lives in NJ at what time

earsal ok, what days besides Sundays work with you? Female dancer is Jusleine

do it? Sundays are Mondays (anytime) work best for me. because it is on the su

-provoking research, commentary, and work relating to current discourse and em

ht y de ahi la conexión) Ryota! your work is amazing... Ahi te mande mensaje w

ch other any other time! How is your work going? Kind regards Muchas gracias M

elligence. It occurred to me that my work may be of your interest to be featur

the other artists. Will study their work Hi Christiane, happy new year! I hop

sking me to extend the context of my work and include references to artists wo

f you know more artists whose visual work pertaining DNA has been influential

st part comes: that is to submit the work to several Film Festivals in the Ind

about possibly joining Montclair to work at the intersection of science and m

e delayed. I will stay for a year to work on artificial intelligence and art H

ideo as well. Does sometime in April work for you? I am leaving to Hong Kong i

Incoming messages: 25 out of 55 matches displayed

it allows it in ur own website might work no soundcloud doesnt work maybe a gi

site might work no soundcloud doesnt work maybe a github repo? and provide the

si po, christian es muy bueno a ver work work work ahora estoy terminando de

o, christian es muy bueno a ver work work work ahora estoy terminando de mezcl

ristian es muy bueno a ver work work work ahora estoy terminando de mezclar el

ays & times, and I’ll see what I can work out. Also, let me know how much time

But if we can't communicate i can't work with you on this Please always commu

have time to discuss this now. Im a work too Or 6 to 7:30 At the studio. Make

8pm. Not sure about after I usually work in Newark till 3pm My calendar is th

oing to be there for an entire year? Work is going well but there is much of i

in, I think there is a lot of visual work referencing DNA but more interesting

idency sounds amazing - what type of work are you doing there? Congratulations

Ok, let's do it!!! It probably would work better. Put your request on FB. To b

t now. Yes, that is the idea. Yes, I work on it everyday. I am also creating a

e come down then thanks! Yes we will work on it! Ok. We know the tango very we

se let me know if any of these times work for you. Hi Martin! I saw your post

son you are looking for. I currently work as a paraprofessional at Irving in o

009. Nice ad it is B&W, classy, nice work you did - this is three tango we did

rtist Gona do 1060 build in card man Work with chinese manufacture I got my ow

n? Hugs! Now give me some dates that work for you for djing and I check the ca

hone etc I have gmail yes that would work oh the file is too big. do you have

ethnomusicologist, Alan Lomax, whose work inspired Marco’s Master’s Thesis, Ge

of a Rhythm (2018), which remains a work in progress. In Genealogy, Marco doc

instrument?” Your Nyu email doesn’t work What’s your email Espanish o inglish

d sensibilities are displayed in her work on over 30 films, installations, vid

The text fragments above show that a concordance of 'work' captures the words occurring just before and after each occurrence. This gives valuable context for how the word was used throughout each corpus and makes it easier to interpret differences in its use between outgoing and incoming Facebook messages. Which other words appear in each corpus with a similar range of contexts to 'work'?

Words in OuTe with a similar range of contexts to 'work':

you - milonga - put - this - do - make - wife - tango - dj - email - idea - message - i - daughter - time - be - website - get - way - phd

Words in InTe with a similar range of contexts to 'work':

time - works - do - am - post - you - nyc - leave - life - is - it - said - we - be - other - u - recorded - person - like - growth
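NLTK's `Text.similar(word)` looks for words that occur in the same immediate contexts (the same left and right neighbours) as the target word. The simplified standard-library sketch below reimplements that idea. It keeps only lowercase alphabetic tokens and scores each candidate by how many of its occurrences fall in one of the target's contexts. NLTK's own scoring and tie-breaking differ in detail, so the outputs will not match exactly; the function name is hypothetical.

```python
from collections import Counter, defaultdict


def similar_words(tokens, word, num=20):
    """Words that appear between the same left and right neighbours as `word`."""
    words = [t.lower() for t in tokens if t.isalpha()]
    padded = ["*START*"] + words + ["*END*"]
    contexts = defaultdict(Counter)  # word -> Counter of (left, right) contexts
    for i in range(1, len(padded) - 1):
        contexts[padded[i]][(padded[i - 1], padded[i + 1])] += 1
    target = word.lower()
    if target not in contexts:
        return []
    target_contexts = set(contexts[target])
    scores = Counter()
    for candidate, ctx in contexts.items():
        if candidate == target:
            continue
        # Count the candidate's occurrences that sit in one of the target's contexts.
        shared = sum(n for c, n in ctx.items() if c in target_contexts)
        if shared:
            scores[candidate] = shared
    return [w for w, _ in scores.most_common(num)]


tokens = "i love my work . i love my time . i love my dog".split()
print(similar_words(tokens, "work"))  # ['time']
```

Here 'time' qualifies because it appears in the same ('my', 'i') context as 'work', which is the kind of evidence the word lists above rest on.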

Let's now take the word 'time' and run a concordance on the outgoing Facebook messages as a test case, to check whether it indeed shares a similar range of contexts with 'work' in that corpus:

Displaying 25 of 79 matches in OuTe:

r to the library there often, SO any time during 2019 would in principle work.

ple work. Would you like we set up a time to visit the lab and you can explain

tion? I am actively looking for full time employment right now Hi Sharon! Mart

ds, MC Hi Santiago, for the practice time at Nocturne I suggest you guys think

most. Do you want to have a practice time for dancing the soundtracks? here is

interested in participating and your time availability during next week or the

work at 6PM and lives in NJ at what time can you do it tonight ok, I thought

be nearby let us know so we can have time for talking about the film I see I w

came back to USA before the expected time since Tomas came to Boston for a 5 m

nks! please confirm me again at what time should I be there. I want to be with

should I be there. I want to be with time to set up ok, that's fine too. Great

enlivening him at NYC milongas each time I see you dancing on the floor + all

ses with him before? for us is first time thanks! great! let me know? hey Batt

le said yes and then they don't have time or they don't want to do it. I may n

nt are super small I better spend my time dancing when not working Super, let

ed artists out there. It will take a time for me as well as many applications

king a few classes with Naveira this time of year. I was wandering why don’t y

ow wow! next book? Prof told me next time I should prepare choreography to cut

e intended hey Batt, thanks for your time last Wed at TC. Please find the firs

o promote your book instead, so your time and effort can be compensated that w

me you guys met! Great! I hope next time I can catch up with you guys at a mi

thanks for these. I had a fantastic time just see your message now! I don't h

ter with me How many hours? Starting time to finish time? I can do that Great!

about tango, let me know. If I have time I would like to create two films in

de a resume/CV with bio? What's your time availability this week? for short in

Extending this procedure to all 20 selected words allowed the author to identify, for each of them, a group of words that share a similar range of contexts based on concordance (Figure 5). Interestingly, these words were not always shared between outgoing and incoming messages, which prompted the author to study their relationship to the 20 selected words using Complex Network Analysis (CNA).

Figure 5. Differences between OuTe (outgoing Facebook text messages) and InTe (incoming Facebook text messages) in the number of words that share a similar range of contexts (based on concordance) with each of the 20 selected words (shown on the Y-axis).

Complex Network Analysis of words with similar range of contexts_

Based on the data shown in Figure 5, the author explored the quantitative relationships in networks built by connecting each of the 20 selected words to the words that shared a similar range of contexts with it in the concordance analysis. The author manually created two CSV (comma-separated values) files, one each for OuTe and InTe. These files contained the data shown in Figure 5 and were analyzed and visualized with the Python library NetworkX (Figure 6).

Figure 6a. Network layout of the OuTe data shown in Figure 5, created with NetworkX and visualized with the pygraphviz module (Graphviz).

Figure 6b. Network layout of the InTe data shown in Figure 5, created with NetworkX and visualized with the pygraphviz module (Graphviz).

As Figure 6 shows, the topology of the networks differs between OuTe and InTe, revealing in visual form differences in word use between the author and his 200 Facebook friends. For instance, the words 'work' and 'working' have quite different relationships with their similar-context words in OuTe than in InTe.

Because the layouts produced by NetworkX and Graphviz were visually less than ideal, the author imported the networks created in NetworkX into Gephi, which can produce publication-quality network layouts. Gephi also provides tools to customize the layout according to 'degree' (the number of immediate neighbors, i.e. adjacent nodes) and 'community structure' (coloring nodes according to the tightly connected groups, called communities, that they form). The resulting network layouts are shown in Figure 7.
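Degree as described above can be computed directly from an edge list of the kind the CSV files are described as holding. The file layout assumed here (one "selected word, similar word" pair per row, no header) is a guess, since the original does not show the files. Duplicate pairs collapse into a single edge, as they would in `networkx.Graph`, and self-loops are dropped as a design choice of this sketch. In NetworkX itself, `dict(G.degree())` gives the equivalent mapping for a graph without self-loops, and `networkx.write_gexf(G, path)` writes a GEXF file that Gephi can open, which is one possible route for the export described above.

```python
import csv
import io
from collections import defaultdict


def degrees_from_csv(csv_text):
    """Node degrees of the undirected graph described by 'word,similar_word' rows."""
    neighbors = defaultdict(set)
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        a, b = row[0].strip(), row[1].strip()
        if a and b and a != b:  # ignore empty cells and self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {node: len(adjacent) for node, adjacent in neighbors.items()}


# Hypothetical rows in the assumed layout: selected word, similar-context word.
edges = "work,time\nwork,idea\nworking,time\nwork,time\n"
print(degrees_from_csv(edges))  # {'work': 2, 'time': 2, 'idea': 1, 'working': 1}
```

These degree values are what Gephi maps to node size in Figure 7; community detection (the node colors) is a separate step not covered by this sketch.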

Figure 7a. Network layout of the OuTe data shown in Figures 5 and 6, created with NetworkX and visualized with Gephi. Node size is proportional to 'degree' (number of adjacent nodes). Nodes are colored by 'community structure': nodes of the same color belong to the same community.

Figure 7b. Network layout of the InTe data shown in Figures 5 and 6, created with NetworkX and visualized with Gephi. Node size is proportional to 'degree' (number of adjacent nodes). Nodes are colored by 'community structure': nodes of the same color belong to the same community. Node colors in Figure 7b are unrelated to those in Figure 7a.

The Gephi visualizations make the differences in network topology between OuTe and InTe even more evident, which the author finds very interesting. Differences in node size between Figures 7a and 7b are clear, especially for the words 'something', 'learning', and 'alternative'. Community structures also differ between OuTe and InTe. For instance, 'interesting' and 'going' (see Figure 5) form a community in Figure 7a, whereas in Figure 7b 'interesting' forms a community not only with 'going' but also with 'dancing'.

Discussion_

In this work, two very different processes, plastic and algorithmic thinking, converged into a physical abstract painting. The painting is the physical agent that conveys a digital message: it reveals personal information about the artist by giving viewers the opportunity to read his digital exchanges with 200 Facebook friends over a period of more than two years. The curation of the written excerpts was facilitated and guided by Natural Language Processing and Complex Network Analysis algorithms that the author implemented for this project. Thus, a plastic idea (the painting with envelopes) was enhanced by an algorithmic one (text analysis and visualization).

References_

Natural Language Toolkit (NLTK): https://www.nltk.org

NetworkX: https://networkx.github.io

Graphviz: http://www.graphviz.org

Gephi: https://gephi.org

Supplemental Information_

Here is the concordance analysis for the remaining selected words:

INCOMING Facebook Messages written BY FRIENDS to author

Displaying 25 of 66 matches for 'tango':

lavor your life with a little bit of tango Tango dance therapy for individuals

your life with a little bit of tango Tango dance therapy for individuals and co

therapy for individuals and couples Tango dance evenings and private lessons.

ompania es Italina y se llama Regina tango shoes Te mande Las fotos si alguna v

ved to New York where he founded his tango band Tango Meditarraneo and lately L

York where he founded his tango band Tango Meditarraneo and lately Los Peores d

ditarraneo and lately Los Peores del Tango and has collaborated with various ar

g seriously in karate, he discovered tango and soon knew that his life would ne

he same. He diligently commenced his tango studies, learning from tango masters

ced his tango studies, learning from tango masters and milongueros from Argenti

, adding that he strongly recommends tango for everyone, of all ages. Here ya g

ery exciting: Martin tu relación con tango es muy fuerte correcto? Conoces al c

fident it will be a good spontaneous tango moment Wednesday! It’s all about the

ging part lol! No problem! As far as tango practice goes, Beata & I