Metagenomic Approach: Transforming In-Silico Research for Improved Biogas Production

The complexity of the microbial communities and metabolic pathways involved in biogas production is poorly understood, and numerous microorganisms in fermentation samples from biogas plants are still unclassified or unknown. Knowledge of the structure and function of these communities, and of the effects of trace-element supplementation, is needed to control and channel the energy sources the microbes produce, to capture and store useful by-products, and to screen for novel enzymes. In this review, we discuss the emerging idea that metagenome sequence data from a biogas-producing microbial community residing in the fermenter of a biogas plant provide the basis for a rational approach to improving the biotechnological process of biogas production. The composition and gene content of a biogas-producing consortium can be determined through a metagenomic approach. This allows the design of an optimal microbial community structure for any biogas plant, promising significant gains in the efficiency and economics of biogas production. It also supports the production, from the plant's sludge, of biofertilizer that is either nutritionally balanced or rich in a specific element needed for plant growth. Biogas-producing microbial communities from production-scale biogas plants supplied with different raw materials can be analyzed by a polyphasic approach to identify the best raw material composition for biogas production. The phylogenetic structure of the microbial community in a fermentation sample from a biogas plant can be analyzed by an integrated approach using clone library sequences and metagenome sequence data obtained by 454 pyrosequencing.


Introduction
Renewable energy production and waste management are currently major issues worldwide. Biogas production can address both, because biogas is a promising renewable energy carrier and its production technology also eliminates organic wastes. The decomposition of organic wastes by the microbial community is carried out under anaerobic conditions. The composition and activity of the various microbes are governed by environmental and internal factors such as substrate composition, temperature, pH, mixing and the geometry of the anaerobic digester. A clear understanding of the organization and behavior of the multifarious community involved in biogas production is crucial for optimizing its performance and achieving stable operation of the process. Moreover, to increase the yield of biogas, a detailed
Rising energy costs and concerns about long-term environmental sustainability have placed renewable energy sources at the center of debate. The development of renewable energy resources offers the chance to replace traditional fossil fuels and can help to reduce carbon dioxide emissions (Weiland et al., 2010). An economically attractive technology for generating bioenergy is the production of biogas, a mixture with methane (CH4) and carbon dioxide (CO2) as its main components and small amounts of hydrogen sulfide (H2S), nitrogen (N2), hydrogen (H2), ammonia (NH3) and carbon monoxide (CO) (Ohmiya et al., 2005). The most common and widespread use of biogas is the production of electricity and heat by combustion in combined heat and power units.
Biogas production takes place under anaerobic conditions and involves the microbial decomposition of organic matter, with methane as the main end product of the underlying metabolic pathways. Complex microbial consortia are responsible for biomass decomposition and biogas production, which proceeds through the stages of substrate hydrolysis, acidogenesis, acetogenesis and methanogenesis. However, most of these microbes, as well as their roles in biogas production, are currently unknown. Recently, the analysis of the structure, composition and activity of microbial communities in relation to input substrates and fermentation parameters in biogas plants has become a focus of research, since a better understanding of the composition and activity of this multifarious community is crucial for further optimization of reactor performance and fermentation process technologies (Ács et al., 2013). Unfortunately, microorganisms in the biosphere (including viruses, bacteria, archaea, fungi and protists) cannot readily be identified and quantified because of the limitations of culturing methods.
However, metagenomics, a culture-independent approach, is the study of genetic material recovered directly from environmental samples, and it has enabled researchers to analyze microbial communities comprehensively. The development of high-throughput methods based on Sanger sequencing of 16S rDNA amplicons opened up new avenues for such investigations (McHugh et al., 2003; Klocke et al., 2006). The mcrA gene encodes the α-subunit of methyl-coenzyme M reductase, a key enzyme of methanogenesis that occurs uniquely in methanogens, and changes in the organization of methanogenic communities under various conditions have been reported on the basis of this phylogenetic marker (Luton et al., 2002). Currently, the most widespread next-generation sequencing method for metagenomic purposes is 454 pyrosequencing, which has been used to characterize biogas-producing communities (Gill et al., 2006) and enables the study of consortia in various environments through identification of their members (Tyson et al., 2004). Community structure analysis of a biogas fermentation sample revealed that Clostridia (phylum Firmicutes) is the most prevalent taxonomic class, whereas species of the order Methanomicrobiales dominate among the methanogenic Archaea, the most abundant species being Methanoculleus marisnigri (Gill et al., 2006). The genus Methanoculleus plays a dominant role in hydrogenotrophic methanogenesis and Clostridia contribute to the decomposition of organic matter; Archaea play a crucial role in biogas production, and the Methanomicrobiales in particular are hydrogenotrophic methanogens.
In this review, we introduce the most abundant members of the biogas-producing community and highlight recent advances in metagenomics for analyzing microbial communities in biogas plants. Developments in several bioinformatics approaches relevant to the metagenomics of biogas-producing communities are also discussed, including taxonomic systems, sequence databases and sequence-alignment tools. In terms of taxonomic distribution, prokaryotes form the most abundant domain of the biogas-producing community, the predominant systematic groups being Bacteria and Archaea. Within the Bacteria domain, the phylum Firmicutes has been shown to be the most abundant, and the classes Clostridia and Bacilli make up the majority of the Bacteria in the biogas fermenter. Members of the classes Clostridia (36%) and Bacilli (11%), together with Bacteroidia (3%), Mollicutes (3%), Gammaproteobacteria (3%) and Actinobacteria (3%), account for the majority of the identified abundant species in the biogas fermenter. These strains were identified on the basis of best hits in the M5nr database. A number of sequence reads may show no homology to any known, sequenced microbial species, and it should be acknowledged that numerous uncultured or unidentified microbes, e.g. Candidatus Cloacamonas acidaminovorans, may be present in biogas fermenters. More than 1,000 representatives of the Bacteria domain have been identified in the metagenomic database (Roland et al., 2012). Members of the above-mentioned systematic groups have also been identified, on the basis of the presence or absence of cellulose-degrading activity and hydrogenase enzymes, in the anaerobic digestion of maize silage and of silage supplemented with animal manure (Kröber et al., 2009).
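Class-level percentages such as those quoted above are obtained by counting reads per class after each read has been assigned through its best database hit. The following is a minimal sketch under stated assumptions: the function name `class_abundance` and the toy read IDs are hypothetical, and reads without an acceptable hit are excluded from the denominator, so the percentages refer to identified reads only (one possible convention, not necessarily the one used in the cited study).

```python
from collections import Counter

def class_abundance(assignments):
    """Percentage of classified reads per taxonomic class.

    `assignments` maps a read ID to the class of its best database hit, or to
    None when the read had no acceptable hit. Unassigned reads are excluded
    from the denominator, so percentages refer to identified reads only.
    """
    counts = Counter(c for c in assignments.values() if c is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders classes from most to least abundant
    return {cls: 100.0 * n / total for cls, n in counts.most_common()}

# Hypothetical toy assignments, not data from the cited studies.
toy = {"r1": "Clostridia", "r2": "Clostridia", "r3": "Bacilli",
       "r4": None, "r5": "Bacteroidia"}
for cls, pct in class_abundance(toy).items():
    print(f"{cls}: {pct:.1f}%")
```

With the toy input, Clostridia come out at 50.0% and Bacilli and Bacteroidia at 25.0% each; counting the unassigned read in the denominator instead would lower every value, which is why the convention should be reported alongside the percentages.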

The Most Abundant Members of the Biogas-Producing Food Chain
Around 10% of the identified microbes in the biogas-producing community may belong to the Archaea. Within the Archaea domain, the order Methanomicrobiales accounts for most of the identified species, and at species level the hydrogenotrophic methanogens dominate, whereas acetotrophic methanogens are relatively poorly represented in the biogas community. Specialized Archaea convert the volatile organic acids (chiefly acetate), CO2 and H2 generated by the acetogens into methane. Aceticlastic and hydrogenotrophic methanogens can be distinguished in biogas fermenters; the hydrogenotrophic Archaea reduce CO2 to CH4 using H2 as the electron donor. In the domain Archaea, the order Methanomicrobiales predominates in the community, and the most abundant species is Methanoculleus marisnigri, an archaeon that has been found in several methanogenic consortia. Because Methanoculleus marisnigri JR1 is the only member of the genus Methanoculleus sequenced so far, it cannot be excluded that several members of the genus together produce the high abundance of Methanoculleus-related reads. Besides Methanoculleus, other representatives of the Methanomicrobiales may contribute to the abundance of hydrogenotrophic methanogens, e.g. Methanospirillum hungatei, Methanosphaerula palustris, Methanoregula boonei, Methanocorpusculum labreanum and Methanoplanus petrolearius. From the class Methanococci, Methanococcus maripaludis is also a hydrogenotrophic methanogen. Among the aceticlastic methanogens, Methanosarcina acetivorans has been found in relative majority. An unidentified archaeon related to the rice rhizosphere methanogens has also been detected in the anaerobic biogas community; it is characterized by a unique aerotolerant, H2/CO2-dependent lifestyle and by enzymes for carbohydrate metabolism and assimilatory sulfate reduction (Roland et al., 2012). With the GS FLX Titanium system, additional genera have been identified in the taxonomic profile.
These include Streptococcus, Acetivibrio, Garciella, Tissierella, Gracilibacter, Gelria, Dysgonomonas and Arcobacter (Sebastian et al., 2011). Streptococcus species have also been detected in mesophilic hydrogen-producing sludge and in a glucose-fed methanogenic bioreactor (Dollhopf et al., 2011; Fernandez et al., 2000), but the specific functions of the Streptococcus members dominating the bioreactor are not yet known. Sequences related to the genus Acetivibrio (Firmicutes) have been recovered from a community carrying out methanogenesis on cellulose under mesophilic conditions, and Acetivibrio species play a role in cellulose degradation. Species of the genera Garciella, Tissierella, Gracilibacter and Gelria (all Firmicutes) are also involved in different fermentative pathways (Bae et al., 2004). A reference species of the genus Gelria, Gelria glutamica, was isolated from a methanogenic enrichment culture and is able to grow in co-culture with a hydrogenotrophic methanogen (Gelria et al., 2002). Hydrogenotrophic methanogens were dominant in the fermentation sample analyzed by Sebastian et al. (2011). The genus Dysgonomonas belongs to the family Porphyromonadaceae of the order Bacteroidales; it has been isolated from stool samples and ferments glucose with the production of acids (Shah et al., 2009). Similarly, bacteria of the genus Alkaliflexus cluster within the order Bacteroidales and are anaerobic saccharolytic organisms (Zhilina et al., 2004). Other genera, such as Arcobacter, play no dominant role in the degradation of polysaccharides. The exact involvement of these members of the methanogenic consortium in the biogas production process remains unknown.
R.F. Mukti and S.S. Sinthee (2019) Int. J. Appl. Sci. Biotechnol. Vol 7(1): 6-11 This paper can be downloaded online at http://ijasbt.org&http://nepjol.info/index.php/IJASBT
Firmicutes and Methanomicrobiales play crucial roles in the anaerobic degradation of plant biomass, the Firmicutes in hydrolysis and acetogenesis and the Methanomicrobiales in methanogenesis.
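The two methanogenic routes discussed above can be summarized by their standard net stoichiometries. These are textbook overall reactions, given here for orientation rather than taken from the cited studies:

```latex
\begin{align}
\text{hydrogenotrophic:}\quad & \mathrm{CO_2} + 4\,\mathrm{H_2} \rightarrow \mathrm{CH_4} + 2\,\mathrm{H_2O} \\
\text{aceticlastic:}\quad & \mathrm{CH_3COOH} \rightarrow \mathrm{CH_4} + \mathrm{CO_2}
\end{align}
```

Because hydrogenotrophic methanogens consume H2, they keep the hydrogen partial pressure low enough for H2-producing acetogenic steps to remain thermodynamically feasible, which is why syntrophs such as Gelria glutamica grow in co-culture with such methanogens.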
One investigation studied microbial community succession in sour and healthy digesters. An Ion Torrent next-generation sequencing (NGS)-based metagenomic approach showed an abundance of hydrolytic bacteria and an exclusion of methanogens and syntrophic bacteria in the sour rice-straw digester. Functional gene analysis revealed a higher abundance of enzymes involved in acidogenesis and a lower abundance of enzymes associated with methanogenesis, such as methyl-coenzyme M reductase, F420-dependent reductase and formylmethanofuran dehydrogenase, in the sour digester. In the restored (healthy) digester, an increased abundance of methanogens (Methanomicrobia) and of genes involved in methanogenesis was observed, highlighting the revival of the pH-sensitive methanogenic community (Bioresour et al., 2006).
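Statements such as "higher" or "lower abundance" of a gene between a sour and a restored digester are usually made after normalizing read counts to sequencing depth. The sketch below assumes a reads-per-million (RPM) normalization with a pseudocount and reports a log2 fold change; the function names and the counts are hypothetical, and this is one common convention, not the method of the cited study.

```python
import math

def rpm(count, total_reads, pseudocount=1.0):
    """Reads per million, with a pseudocount so absent genes do not give zero."""
    return (count + pseudocount) * 1e6 / total_reads

def log2_fold_change(count_a, total_a, count_b, total_b, pseudocount=1.0):
    """log2(abundance in sample B / abundance in sample A), depth-normalized."""
    return math.log2(rpm(count_b, total_b, pseudocount) /
                     rpm(count_a, total_a, pseudocount))

# Hypothetical counts of reads annotated as a methanogenesis gene (e.g. mcrA)
# in a sour digester (A) and a restored digester (B); not data from the study.
lfc = log2_fold_change(count_a=3, total_a=1_000_000, count_b=63, total_b=2_000_000)
print(f"log2 fold change (restored vs sour): {lfc:.2f}")
```

A positive value means the gene is relatively more abundant in sample B. Normalizing first matters because the restored library in this toy example is twice as deep, so raw counts alone would overstate the difference.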

In-Silico Approaches for Metagenomic Analysis of the Biogas-Producing Community
The advent of next-generation sequencing (NGS), or high-throughput sequencing, has revolutionized microbial ecology and taken classical environmental studies to another level. The two NGS technologies most commonly used to date are the 454 Life Sciences and Illumina systems, with usage recently shifting in favor of the latter (Oulas et al., 2015). Clone library sequences enable high-resolution phylogenetic analysis of abundant taxonomic units, whereas metagenome sequence reads are better suited to describing the diversity of the community but cannot be used for self-contained phylogenetic analysis. An integrated analysis that considers both clone library sequences and shotgun metagenome reads therefore combines the advantages of both data types.
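One standard way to summarize community diversity from classified metagenome reads is the Shannon index, H' = -Σ p_i ln p_i, where p_i is the fraction of reads assigned to taxon i. The cited work does not prescribe this index; it is swapped in here purely as an illustration, and the function names are hypothetical.

```python
import math
from collections import Counter

def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one classified read is required")
    h = 0.0
    for n in counts:
        if n > 0:
            p = n / total
            h -= p * math.log(p)
    return h

def shannon_from_labels(labels):
    """Convenience wrapper: one taxon label per classified read."""
    return shannon_index(list(Counter(labels).values()))

# Hypothetical labels: two taxa with equal read counts give H' = ln 2.
print(f"{shannon_from_labels(['Clostridia'] * 2 + ['Bacilli'] * 2):.3f}")
```

H' is 0 for a single taxon and reaches ln(S) when S taxa are equally abundant. Read-based estimates are only approximate, since read counts also depend on genome size and database coverage.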
Metagenomics is a comparatively new field of research on natural microbial communities. It has been strongly advanced by high-throughput sequencing (HTS) technologies such as Roche's 454 sequencing on the Genome Sequencer (GS) FLX platform, including GS FLX Titanium, Applied Biosystems' SOLiD™ (sequencing by oligo ligation and detection) and Illumina's Genome Analyzer, which complement the earlier 16S rRNA gene clone libraries and Sanger sequencing of 16S rDNA amplicons. The resulting data are analyzed with software and databases such as CLC Bio Genomics Workbench, the MG-RAST software package, BLAST and BLASTX, CARMA, the RDP database (Release 10.10) and the ARB database, which are used to assign rDNA datasets and other reads to phylogenetic categories (Rothberg et al., 2008; Mardis et al., 2008; Metzker et al., 2010). CARMA is a software pipeline for characterizing the species composition and genetic potential of microbial samples from short, unassembled reads. WebCARMA is a refined version of CARMA, available as a web application, for the taxonomic and functional classification of unassembled ultra-short reads from metagenomic communities (Gerlach et al., 2009).
The taxonomic profiles of the GS FLX and Titanium datasets can be computed with the CARMA pipeline. The combined dataset of GS FLX and Titanium reads can then be used to identify key organisms involved in biogas production. For this purpose, all reads are classified according to Clusters of Orthologous Groups (COG), and a subset of COG entries representing (a) polysaccharide degradation, (b) acetogenesis and (c) methanogenesis within the fermentation process is selected. The COG assignments can be generated with the CARMA software. This approach identifies reads for which both functional and taxonomic information can be retrieved. Although not all reads classified into the selected COG categories necessarily belong to these pathways, the analysis provides insight into the relevance of different taxonomic groups for the hydrolysis, acetogenesis and methanogenesis steps of biomass fermentation. Reads may be misallocated to these processes because some COG entries include enzymes involved in different but functionally related pathways. Based on their COG assignment, reads can be classified as potentially coding for enzymes that degrade complex polymers, namely cellulose, hemicelluloses and lignin. A more detailed taxonomic analysis can reveal less-abundant genera that were missed in earlier taxonomic profiles of the same community. For this purpose, assembled contigs can be analyzed taxonomically on the MG-RAST server, the results filtered by e-value, percentage of homology and length of homology, and unassigned or unidentified sequences ignored (Sebastian et al., 2011). To gain insight into the biogas-producing community, environmental gene tags (EGTs) and COGs can be derived from DNA sequences generated by massively parallel sequencing.
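The COG-based step described above amounts to keeping only reads that carry both a COG assignment and a taxonomic assignment, then cross-tabulating taxa against the selected processes. In the minimal sketch below, the COG identifiers are hypothetical placeholders (not a curated list from the cited study), and a COG may map to more than one process, mirroring the misallocation caveat in the text.

```python
from collections import Counter, defaultdict

# Hypothetical COG-to-process sets; placeholders, not real COG selections.
PROCESS_COGS = {
    "polysaccharide degradation": {"COG_A", "COG_B"},
    "acetogenesis": {"COG_C"},
    "methanogenesis": {"COG_D", "COG_E"},
}

def process_by_taxon(reads, process_cogs=PROCESS_COGS):
    """Count reads per (process, taxon).

    `reads` is an iterable of (cog, taxon) pairs; either may be None.
    Reads lacking either functional or taxonomic information are skipped.
    """
    cog_to_processes = defaultdict(set)
    for process, cogs in process_cogs.items():
        for cog in cogs:
            cog_to_processes[cog].add(process)
    table = defaultdict(Counter)
    for cog, taxon in reads:
        if taxon is None:
            continue
        # .get() avoids adding unknown COGs to the defaultdict
        for process in cog_to_processes.get(cog, ()):
            table[process][taxon] += 1
    return {p: dict(c) for p, c in table.items()}

toy_reads = [("COG_A", "Clostridia"), ("COG_D", "Methanoculleus"),
             ("COG_X", "Clostridia"), ("COG_C", None)]
print(process_by_taxon(toy_reads))
```

Reads whose COG lies outside the selected sets, or that have no taxon, simply drop out, so the table reflects only the doubly annotated subset of reads.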
The raw sequence reads can be assembled into contigs with the CLC Bio Genomics Workbench software. The contigs are then uploaded to the MG-RAST server for automatic normalization, processing and evaluation, and those that pass quality control are aligned to sequences stored in a number of public databases. In this way, DNA sequences from SOLiD™ reads can be linked to taxa and metabolic functions. The taxonomic results obtained for the assembled contigs on the MG-RAST server are filtered by e-value, percentage of homology and length of homology. Earlier studies aimed at improving the understanding of microbial communities in biogas-producing anaerobic digesters by next-generation sequencing relied exclusively on pyrosequencing.
For clone libraries, 16S rDNA is amplified by PCR in a thermocycler, and controls containing no DNA are used to determine whether contaminants are amplified. Each PCR product is visualized by gel electrophoresis and ligated into a plasmid vector, and the recombinant vectors are used for transformation. Transformants are screened, and randomly selected colonies from each sludge sample are grown overnight (McHugh et al., 2003).
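When similarity searches are run locally rather than on MG-RAST, the same e-value, identity and length filtering can be applied to BLAST tabular output (`-outfmt 6`, default 12 columns), keeping the best-scoring surviving hit per query. This is a minimal sketch: the threshold defaults and function names are hypothetical choices, not MG-RAST's settings, and for BLASTX the alignment length is in amino acids.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    query: str
    subject: str
    pident: float   # percent identity
    length: int     # alignment length
    evalue: float
    bitscore: float

def parse_outfmt6(lines):
    """Parse BLAST tabular output with the default 12 -outfmt 6 columns."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        # columns: qseqid sseqid pident length mismatch gapopen
        #          qstart qend sstart send evalue bitscore
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]),
                        float(f[10]), float(f[11])))
    return hits

def best_filtered_hits(hits, max_evalue=1e-5, min_pident=60.0, min_length=50):
    """Drop hits failing any threshold, then keep the top bitscore per query."""
    best = {}
    for h in hits:
        if h.evalue > max_evalue or h.pident < min_pident or h.length < min_length:
            continue
        current = best.get(h.query)
        if current is None or h.bitscore > current.bitscore:
            best[h.query] = h
    return best
```

Queries with no surviving hit are absent from the result, which matches the practice of ignoring unassigned sequences in the downstream taxonomic profile.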
The DNA sequence of the 16S rRNA gene is widely used for taxonomic and phylogenetic studies; it contains hypervariable regions that can be exploited for accurate taxonomic assignment. To avoid distorting the taxonomic profiles, a BLAST homology search of the filtered pyrosequencing reads against the RDP database can be performed. Differences between the number of 16S rRNA sequences identified in the GS FLX dataset and the findings of a previous study can be explained by the data normalization step. Alternatively, the ARB database can be used instead of the RDP database. All identified 16S rRNA sequences can be classified taxonomically with the RDP classifier, which, for all taxonomic ranks except domain, was able to assign a larger fraction of the 16S rRNA fragments from the Titanium dataset. Nevertheless, only a small fraction of pyrosequencing reads actually contains 16S rRNA fragments (McHugh et al., 2003; Roland et al., 2012). In brief, the workflow for metagenomic analysis of silage can be as follows (Tennant et al., 2017):
1. Sample collection
2. DNA extraction and purification
3. Quantification of purified DNA
4. Shotgun sequencing library preparation
5. Library quantity and quality checking
6. DNA sequencing
7. Analysis of raw sequence data
8. Quality-control trimming and filtering of sequence data
9. Metagenome assembly
10. Paired-end read overlap
11. Taxonomic classification
12. Functional annotation
13. Visualization of CAZy annotation
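The quality-control trimming and filtering step of the workflow above can be sketched as follows. This assumes Phred+33 encoded FASTQ qualities and uses a simple rule (trim 3' bases below a quality threshold, then discard reads that became too short); the thresholds and function names are hypothetical, and dedicated trimming tools offer more elaborate strategies such as sliding windows.

```python
def phred33(qual):
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]

def trim_3prime(seq, qual, min_q=20):
    """Remove bases from the 3' end while their quality is below min_q."""
    scores = phred33(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]

def quality_filter(records, min_q=20, min_len=50):
    """Yield trimmed (id, seq, qual) records that are still >= min_len long."""
    for rid, seq, qual in records:
        tseq, tqual = trim_3prime(seq, qual, min_q)
        if len(tseq) >= min_len:
            yield rid, tseq, tqual

def read_fastq(path):
    """Minimal reader for 4-line FASTQ records (no multi-line sequences)."""
    with open(path) as fh:
        while True:
            header = fh.readline()
            if not header:
                break
            seq = fh.readline().rstrip()
            fh.readline()  # '+' separator line
            qual = fh.readline().rstrip()
            yield header[1:].strip(), seq, qual
```

For example, `quality_filter(read_fastq("reads.fastq"))` (hypothetical file name) yields only reads that survive trimming. In Phred+33, `I` encodes quality 40 and `#` encodes quality 2, so a read ending in `##` loses those two bases at `min_q=20`.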

Conclusion
In this review, we have given an overview of the field of metagenomics, important bioinformatic tools and possible workflows, illustrated with examples of successfully conducted biogas studies. Metagenomic analysis of biogas-producing microbial communities is a novel approach for studying the complex interactions among microbes in an environment, which is important both for basic research and for the practical improvement of renewable energy production from biomass. Metagenomics is a demanding application, since the complexity of the samples requires both high throughput and long reads. It is therefore important to compare results obtained on similar microbial communities with different analytical approaches, which can validate the various methodologies. It should be emphasized that microbes that are unknown or absent from the databases also contribute to the process. These cannot be studied with any of the current methods, but the rapid increase in available genome information justifies the use of novel high-throughput genomic methods in community analysis.