The Oleaceae family comprises 24 genera and 600 species, and its members are found all over the world. One of the first cultivated tree crops within the Oleaceae is the olive (Olea europaea L.), which is mainly used to produce table olives and edible oil. The olive is native to the coastal regions of the Mediterranean, and more than 86 varieties of O. europaea are found in Anatolia, Turkey. Turkey is one of the largest producers of olive oil and the leading producer of black table olives in the world. The cultivar Gemlik accounts for 80% of Turkey's black table olive production. Consequently, much of the research in Turkey focuses on molecular and classical breeding programmes for Gemlik because of its economic importance.
Genetic studies of olive are essential for understanding the underlying genetic mechanisms and hence for improving the quality and quantity of olive products. Expressed sequence tags (ESTs), which are obtained by single-pass sequencing of complementary DNA (cDNA), provide genetic information about an organism. Because ESTs are generated from genes expressed at a particular developmental stage or in a particular tissue, they capture information on the mature transcripts of the coding regions of the genome. EST databases are therefore useful for functional studies, gene and marker discovery, and gene mapping. EST libraries have been established for more than 40 plant species; they support functional genomics studies in which the putative functions of genes are deduced by searching for homology with known genes.
Before this study, only about one thousand olive EST sequences were available in the GenBank database (as of February 2009). EST studies of olive are therefore insufficient, and the lack of sequence information may limit genetic studies of the species. In this paper, two separate cDNA libraries, constructed from young leaves and from fruits of the olive cultivar Gemlik, were used to establish a rich EST collection. Clones obtained from the cDNA libraries were sequenced to generate ESTs. The sequences were assembled and clustered with dedicated software and then deposited in GenBank. The EST sequences were annotated using BLAST and Blast2GO.
Method of preparing the Expressed Sequence Tag (EST) database
The research material used in this study was the olive (O. europaea) cultivar Gemlik (G 20/1). The methodology used to prepare the EST database for Olea europaea was described by Ozgenturk et al. (2010).
Before the cDNA libraries were constructed, total RNA was isolated from young leaves and unripe olive fruits using the RNeasy Plant Miniprep kit. The total RNA was then treated with the Oligotex mRNA Mini Kit to obtain purified mRNA, which was used to construct two separate cDNA libraries. Because RNA molecules are very unstable and difficult to amplify, mRNAs are converted into the more stable cDNAs so that the information they carry can be analysed (Mullinax & Sorge, 2003; Tovey, 2011). The cDNA libraries were constructed from the isolated mRNAs with the CloneMiner cDNA Library Construction Kit. Briefly, double-stranded cDNA (ds cDNA) was synthesised from mRNA using the primers provided in the kit. For first-strand synthesis, the biotin-attB2-Oligo(dT) primer bound to the poly(A) tail of the mRNA, and the mRNA served as the template for reverse transcription by SuperScript™ II Reverse Transcriptase. The newly synthesised single-stranded cDNA (ss cDNA) served as the template for second-strand synthesis by Escherichia coli DNA Polymerase I. The attB1 sequence was then added to the 5’ end of the ds cDNA by means of the attB1 Adapter.
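The strand relationships in first- and second-strand synthesis can be illustrated with a toy Python sketch. It is only a string-level model under stated assumptions: the mRNA sequence, the function names and the minimum poly(A) length are hypothetical, and the attB sites, the biotin label and all enzyme kinetics are ignored.

```python
# Toy model of oligo(dT)-primed cDNA synthesis. The sequence, function names
# and min_poly_a value are hypothetical and chosen only for illustration.
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str, min_poly_a: int = 10) -> str:
    """Return the first-strand cDNA (5'->3') templated by an mRNA (5'->3')."""
    mrna = mrna.upper()
    tail_length = len(mrna) - len(mrna.rstrip("A"))
    if tail_length < min_poly_a:
        raise ValueError("no poly(A) tail for the oligo(dT) primer to anneal to")
    # The new strand is antiparallel to its template, so the mRNA is read
    # backwards while each base is complemented.
    return "".join(RNA_TO_CDNA[base] for base in reversed(mrna))


def second_strand_cdna(first_strand: str) -> str:
    """Return the second strand (5'->3') templated by the first-strand cDNA."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(first_strand.upper()))


mrna = "AUGGCUUCAGGA" + "A" * 12   # hypothetical transcript with a poly(A) tail
first = first_strand_cdna(mrna)
second = second_strand_cdna(first)
print(first)
print(second)
```

In this model the first strand begins with the oligo(dT) stretch, and the second strand reproduces the mRNA sequence in DNA letters, which is why an EST read from the clone reflects the mature transcript.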
The synthesised ds cDNA was inserted into the pDONR222 vector and transformed into competent cells of E. coli strain DH5α. The two cDNA libraries were plated onto LB-kanamycin agar plates. Individual colonies were picked into 384-well plates containing SOB medium and incubated overnight. Glycerol was then added (10% v/v) and the cDNA libraries were stored at -80°C.
Complementary DNA (cDNA) clones were randomly selected, and plasmid DNA was isolated by the alkaline lysis method. The isolated DNA was digested, and the insert sizes were determined by 1% agarose gel electrophoresis. A total of 3840 randomly selected clones were amplified by polymerase chain reaction (PCR) using M13 universal primers. The cDNA inserts were sequenced on an ABI 3730 capillary sequencer to generate the ESTs.
Two separate cDNA libraries were established from mRNA extracted from young leaves and fruits, respectively. The average insert sizes of the leaf and fruit cDNA libraries were 1.6 kb and 1.1 kb, respectively. The leaf cDNA library consisted of 2.4 × 10^6 clones, of which 2304 were selected for sequencing. Of the 2.2 × 10^5 clones in the olive fruit cDNA library, 1536 were sequenced. Altogether, 3840 EST sequences were generated by sequencing the cloned cDNAs.
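The sampling figures above can be checked with a few lines of Python. This is a minimal sketch that uses only the numbers reported in the text; the function name is hypothetical, and the plate count assumes one clone per well of a 384-well plate.

```python
WELLS_PER_PLATE = 384  # plate format stated in the text

# Library sizes and numbers of sequenced clones, as reported in the text.
libraries = {
    "leaf": {"library_clones": 2.4e6, "sequenced": 2304},
    "fruit": {"library_clones": 2.2e5, "sequenced": 1536},
}


def sampling_summary(library_clones: float, sequenced: int) -> dict:
    """Summarise how much of a library was sampled (assumes one clone per well)."""
    full_plates, leftover = divmod(sequenced, WELLS_PER_PLATE)
    return {
        "full_plates": full_plates,
        "leftover_clones": leftover,
        "percent_of_library": 100 * sequenced / library_clones,
    }


summaries = {name: sampling_summary(**lib) for name, lib in libraries.items()}
total_sequenced = sum(lib["sequenced"] for lib in libraries.values())

for name, summary in summaries.items():
    print(name, summary)
print("total sequenced clones:", total_sequenced)
```

Under that assumption the sequenced clones fill exactly six plates for the leaf library and four for the fruit library, and in both cases well under 1% of the library was sampled.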
To obtain high-quality EST sequences, the raw sequences were processed with the Phred software, and low-quality bases, vector sequences and adapter sequences were removed. After processing, 106 low-quality EST sequences were discarded and the remaining 3734 ESTs were kept for contig assembly. The Contig Assembly Program 3 (CAP3) was used to assemble the leaf and fruit EST sequences into contigs: 205 contigs were generated from the 2228 leaf ESTs and 69 contigs from the 1506 fruit ESTs. In total, the two libraries yielded 274 contigs and 2478 singletons. Altogether, 3734 ESTs and 249 high-quality contigs were deposited in GenBank under accession numbers GO242703 to GO246436 and EZ421546 to EZ421794, respectively.
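The counts in this paragraph can be cross-checked against each other and against the GenBank accession ranges. The sketch below is a bookkeeping check only, not part of the original pipeline; the helper name is hypothetical, and the last figure assumes that every EST that is not a singleton was placed in a contig.

```python
import re

ACCESSION = re.compile(r"([A-Z]+)(\d+)")


def accession_range_size(first: str, last: str) -> int:
    """Count the accessions in an inclusive range such as GO242703..GO246436."""
    start, end = ACCESSION.fullmatch(first), ACCESSION.fullmatch(last)
    if not start or not end or start.group(1) != end.group(1):
        raise ValueError("accessions must share the same letter prefix")
    return int(end.group(2)) - int(start.group(2)) + 1


# Figures reported in the text.
raw_ests, discarded = 3840, 106
leaf_ests, fruit_ests = 2228, 1506
leaf_contigs, fruit_contigs = 205, 69
singletons = 2478

clean_ests = raw_ests - discarded
est_accessions = accession_range_size("GO242703", "GO246436")
contig_accessions = accession_range_size("EZ421546", "EZ421794")
total_contigs = leaf_contigs + fruit_contigs
# Assumption: ESTs that are not singletons were all assembled into contigs.
ests_in_contigs = clean_ests - singletons

print(clean_ests, leaf_ests + fruit_ests, est_accessions)
print(total_contigs, contig_accessions, ests_in_contigs)
```

The EST accession range matches the 3734 retained ESTs, and the contig accession range matches the 249 deposited contigs rather than the 274 assembled ones.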
To annotate the EST sequences, BLAST searches for gene homology were performed on the National Center for Biotechnology Information (NCBI) web server. Six hundred and thirty-five unique EST sequences showed more than 80% homology with sequences of known function in other species. A further 2024 EST sequences showed less than 80% homology to expressed proteins, hypothetical proteins, putative uncharacterised proteins and unknown proteins in the database. Another 1339 EST sequences showed no homology with any sequence in GenBank, and about 96.9% of the EST sequences established in this study differ from those in the existing olive sequence database at NCBI. These EST sequences can therefore be considered to represent novel genes in O. europaea. In addition to the BLAST analysis, gene ontology (GO) annotation of the contig sequences generated from the ESTs was performed with Blast2GO. GO terms were distributed among the biological process, molecular function and cellular component categories.
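A three-way split of this kind (above the threshold, below it, no hit) could be reproduced from BLAST tabular output roughly as follows. This is a hedged sketch, not the procedure used in the study: it assumes the standard 12-column tabular format of BLAST+ (query ID in column 1, percent identity in column 3, bit score in column 12), it treats "homology" as the percent identity of the best-scoring hit, and the query IDs, hit names and scores are invented for illustration.

```python
from collections import Counter


def best_identity_per_query(tabular_text: str) -> dict:
    """Map each query ID to the percent identity of its highest-bit-score hit."""
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, pident, bitscore = fields[0], float(fields[2]), float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (pident, bitscore)
    return {query: pident for query, (pident, _) in best.items()}


def classify(query_ids, tabular_text: str, threshold: float = 80.0) -> Counter:
    """Count queries above the identity threshold, at or below it, and without hits."""
    identities = best_identity_per_query(tabular_text)
    counts = Counter()
    for query in query_ids:
        if query not in identities:
            counts["no_hit"] += 1
        elif identities[query] > threshold:
            counts["above_threshold"] += 1
        else:
            counts["at_or_below_threshold"] += 1
    return counts


# Hypothetical hits: qseqid, sseqid, pident, length, mismatch, gapopen,
# qstart, qend, sstart, send, evalue, bitscore.
rows = [
    ["EST_001", "hitA", "92.5", "200", "15", "0", "1", "200", "1", "200", "1e-90", "350"],
    ["EST_001", "hitB", "75.0", "180", "45", "0", "1", "180", "5", "184", "1e-20", "120"],
    ["EST_002", "hitC", "64.3", "150", "53", "1", "1", "150", "9", "158", "1e-12", "88.2"],
]
tabular = "\n".join("\t".join(row) for row in rows)

counts = classify(["EST_001", "EST_002", "EST_003"], tabular)
print(dict(counts))
```

Taking only the best-scoring hit per query is one possible design choice; a different rule, such as the best identity over all hits, would shift sequences between the first two categories.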