Procedurally generated artworks based on multiple sequence alignment of orthologous gene copies

I present here a novel approach to the procedural generation of artworks in series based on multiple sequence alignment of orthologous gene copies. In the strategy developed, I assigned to each of the four nucleotides present in a string of DNA (A, G, C, T) a specific artwork I had previously created. All four artworks shared the same dimensions (825 x 315 pixels), with each one-pixel-wide column representing a single nucleotide in the string of DNA. This resulted in the visual representation of 825 nucleotides post-alignment.

The syntenic gene sequences selected corresponded to the Vernalization 1 gene (VRN1) from four different grass species: Brachypodium distachyon, Brachypodium stacei, Oryza sativa, and Zea mays. CDS sequences were downloaded from Phytozome and aligned using Clustal multiple sequence alignment. The total length of the sequence alignment was 852 base pairs, which means that an artwork 825 pixels wide represented almost the entire length of the aligned coding sequence for the genes of interest (825 of 852 positions, about 97%).

Visual re-construction of VRN1 CDS post-alignment_

Each of the VRN1 sequences was visually re-constructed by assigning each nucleotide to a different artwork:

nucleotide A: artwork #1 [Atardecer_Geometrico_MartinCalvino_2016]

nucleotide G: artwork #2 [Etereo_MartinCalvino_2016]

nucleotide C: artwork #3 [Interferencia_MartinCalvino_2016]

nucleotide T: artwork #4 [Sequence_MartinCalvino_c.2016]

Re-construction was procedurally generated by assigning each base pair within the DNA string to a one-pixel-wide column within the respective artwork. This meant that an 825-pixel-wide artwork would represent exactly 825 base pairs of a given CDS.

Artworks included were as follows:

The newly re-constructed artwork was then created by assigning a column from artwork #1 each time there was an A on the DNA string, a column from artwork #2 each time there was a G on the DNA string, a column from artwork #3 each time there was a C on the DNA string, and a column from artwork #4 each time there was a T on the DNA string.
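The column-by-column assembly described above can be sketched in a few lines of Python. This is a minimal illustration, not the code I used to produce the artworks: the function name `compose_from_alignment`, the helper `solid`, and the way gaps are handled (a fill pixel for `-` or any symbol without an assigned artwork) are all assumptions made for the example. Images are plain nested lists so the sketch has no dependencies; in practice the four artworks would be loaded with an imaging library.

```python
# Minimal sketch: build a composite image by taking column i from the
# artwork assigned to the nucleotide at alignment position i.
# Images are nested lists (rows of pixels) to keep the example dependency-free.

def compose_from_alignment(aligned_seq, artworks, gap_pixel=(0, 0, 0)):
    """Return a new image whose column i comes from artworks[aligned_seq[i]].

    aligned_seq: one row of the alignment, e.g. "ATG-C"
    artworks:    dict mapping "A", "G", "C", "T" to equally sized images
    gap_pixel:   fill used for gaps or unassigned symbols (an assumption;
                 the text above does not describe gap handling)
    """
    first = next(iter(artworks.values()))
    height = len(first)
    width = len(first[0])
    # Alignment positions beyond the artwork width are not drawn.
    n_cols = min(width, len(aligned_seq))

    composite = []
    for y in range(height):
        row = []
        for x in range(n_cols):
            source = artworks.get(aligned_seq[x].upper())
            row.append(source[y][x] if source is not None else gap_pixel)
        composite.append(row)
    return composite


def solid(value, width, height):
    """Hypothetical stand-in for a real artwork: an image filled with one value."""
    return [[value] * width for _ in range(height)]


# Toy 4 x 2 "artworks"; the real ones are 825 x 315 pixels.
artworks = {
    "A": solid("a", 4, 2),
    "G": solid("g", 4, 2),
    "C": solid("c", 4, 2),
    "T": solid("t", 4, 2),
}
demo = compose_from_alignment("AT-GCC", artworks, gap_pixel="-")
```

With the toy inputs, each row of `demo` takes its first column from the A artwork, its second from the T artwork, a gap fill in the third, and its fourth from the G artwork; the last two alignment positions fall outside the 4-pixel width and are dropped.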

For reference purposes, I've included below the multiple sequence alignment:

The resulting artworks were true to data shown in the alignment and thus visually represented differences in nucleotide composition caused by the evolutionary divergence of the gene in the four grass species mentioned.

Brachypodium distachyon VRN1 artwork (Bradi1g08340.1 CDS)_

Brachypodium stacei VRN1 artwork (Brast02G311100.1 CDS)_

Brachypodium stacei VRN1 artwork (Brast06G240300.1 CDS)_

Rice VRN1 artwork (LOC_Os03g54160.1 CDS)_

Maize VRN1 artwork (GRMZM2G032339_T02 CDS)_

Maize VRN1 artwork (GRMZM2G553379_T07 CDS)_

Nucleotide differences toward the 3' end of the CDS among the homologous copies of VRN1 in Brachypodium stacei and maize are visually evident in the artworks presented.
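To check such visual differences against the data, one could list the alignment columns where the sequences disagree. The sketch below is an assumption about how that might be done, not part of my original workflow; `variable_columns` is a hypothetical helper and the three short sequences are invented toy data, not the VRN1 alignment.

```python
def variable_columns(aligned_seqs):
    """Return 0-based alignment columns where the rows are not all identical.

    A gap ("-") counts as a difference, which is an assumption of this sketch.
    """
    return [
        i
        for i, column in enumerate(zip(*aligned_seqs))
        if len(set(column)) > 1
    ]


# Invented toy alignment: only the fourth column (index 3) varies.
toy_alignment = ["ATGCA", "ATGTA", "ATG-A"]
diff = variable_columns(toy_alignment)
```

Because each alignment column maps to one pixel column, the indices returned are also the x positions where the artworks in a series can differ.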

I've demonstrated a novel approach to artwork creation that is not only inspired by but based on nucleotide divergence at syntenic regions of closely related grass species. This approach will help in my ongoing efforts to develop a set of tools and techniques to support ARTE GAGAISTA as a new form of creative expression. GAGAISMO, as I've previously defined it, is the use of genome data as raw material for abstract and geometric expressionism.

Future work_

This technique can be applied to any group of syntenic loci across evolutionarily related species (not only plants) as a means of using nucleotide divergence as a creative force in the composition of artwork series. Similarly, divergence at the protein level can be harnessed for artwork creation when a unique artwork is assigned to each amino acid (this means 20 artworks for 20 amino acids). Furthermore, artwork series can be composed directly from multiple sequence alignments without any requirement of synteny.
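As a rough sketch of the protein-level idea, the mapping simply grows from 4 entries to 20. The artwork names below (`artwork_01` to `artwork_20`) are placeholders, and `column_plan` is a hypothetical helper; no such set of 20 artworks exists yet. Returning `None` for gaps is likewise an assumption.

```python
# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Placeholder names: one (not yet existing) artwork per amino acid.
artwork_for = {aa: f"artwork_{i + 1:02d}" for i, aa in enumerate(AMINO_ACIDS)}


def column_plan(aligned_protein, mapping):
    """For each alignment column, name the artwork that supplies that pixel column.

    Gaps and unassigned symbols yield None (an assumption of this sketch).
    """
    return [mapping.get(residue.upper()) for residue in aligned_protein]


# Invented toy row of a protein alignment.
plan = column_plan("MK-A", artwork_for)
```

The plan can then drive the same column-copying step used for nucleotides, with the only change being the size of the alphabet.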

Updated April 8, 2018_

To show how extensible this approach is, I repeated the process previously explained for the VRN1 gene and applied it to CONSTANS-like 14 related sequences derived from rice, millet, sorghum and maize. Their syntenic relationship had previously been shown in my work published in Genome Biology & Evolution. Based on a multiple sequence alignment of the four CDS sequences, I created artworks that reflect the evolutionary divergence of their nucleotide sequences post-alignment. The four artworks used as precursors to assemble the resulting visual composition representing each sequence post-alignment are not shown.

Sevir.6G233400.1 CDS_

Sobic.007G189800.1 CDS_

GRMZM2G159996_T01 CDS_

LOC_Os08g42440.1 CDS_

The alignment was 1,583 base pairs long and the paintings were 1,200 pixels wide; this means that each painting captured 75.8% of the aligned sequences and all the associated nucleotide divergence within the first 1,200 base pairs, more than enough to create subtle visual differences among the artworks in the series. Close inspection of the paintings will reveal those visual differences.
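The coverage figure follows from dividing the painting width by the alignment length, since one pixel column stands for one aligned position. A small sketch of that arithmetic, using a hypothetical helper `coverage`, applied to both alignments discussed in this post:

```python
def coverage(image_width_px, alignment_length_bp):
    """Fraction of the alignment that fits when one pixel column = one position."""
    shown = min(image_width_px, alignment_length_bp)
    return shown / alignment_length_bp


col14_pct = round(coverage(1200, 1583) * 100, 1)  # CONSTANS-like 14 series: 75.8
vrn1_pct = round(coverage(825, 852) * 100, 1)     # VRN1 series: 96.8
```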