English Abstract
Background: Gastric cancer (GC) is the third leading cause of cancer-related death worldwide. Although molecular biology techniques (e.g., gene expression profiling and next-generation sequencing) have identified predictive markers, none is reliable. Thus, understanding the pathogenesis and underlying molecular mechanisms of GC is essential for identifying biomarkers that could improve survival rates. With developing technologies such as bioinformatics and next-generation sequencing (NGS), and publicly available databases such as Ensembl and GDC, it is now much easier and more reliable to analyze large numbers of patients and detect differentially expressed genes. Previous studies have identified long non-coding RNAs as circulating biomarkers, which are believed to be candidate molecules for non-invasive diagnosis.
Aim: To identify long non-coding RNAs (lncRNAs) that have a statistically significant effect on survival (p<0.01). After the differentially expressed lncRNAs (the potential biomarker candidates) are identified, one lncRNA is selected and its interactome is constructed to understand how this lncRNA regulates protein-coding genes, through miRNAs, in a specific pathway in GC.
Method: We analyzed 443 cases (414 gastric cancer patients, 29 healthy) extracted from the Genomic Data Commons (GDC), using the publicly available pipeline protocol and GDCRNATools, a Bioconductor package for data analysis (http://bioconductor.org/packages/devel/bioc/vignettes/GDCRNATools/inst/doc/GDCRNATools.html). The analysis was divided into 4 main sections: data organization and differential gene expression analysis; visualization of differentially expressed genes; competing endogenous RNA (ceRNA) network construction; and pathway enrichment and network analysis.
Results: After the analysis was completed in 414 cases, 20,993 genes were found to be differentially expressed. Of these, 2100 were protein-coding genes, 170 were lncRNAs, and the rest were pseudogenes and other gene types. Of the lncRNAs, 120 were upregulated (FC>1) and 50 were downregulated (FC<-1). Not all 170 lncRNAs were carried forward; only the statistically significant ones (p<0.01) were selected for further study. Only 5 lncRNAs were found to have a statistically significant effect on survival: AL353622.1, AL365181.3, LINC00884, and TNFRSF14-AS1 (upregulated in lower-stage patients) and AC125807.2 (upregulated in higher-stage patients). The interactome analysis revealed that the lncRNA AC125807.2 interacts with 9 key protein-coding genes (the broadest spectrum and highest number of interactions), including CCNB2, PRKDC, BUB1B, CDC25B, MCM4, E2F2, CDK6, PLK1, and PRKDC. Pathway analysis showed that the cell cycle pathway was the most affected in GC, where these lncRNAs play a decisive role through protein-coding genes. AC125807.2 was found to interact with these key protein-coding genes through miRNAs and is believed to play a deregulatory role, leading to the pathogenesis of GC.
Conclusion: These findings suggest that AC125807.2, AL353622.1, AL365181.3, LINC00884, and TNFRSF14-AS1 are potential biomarker candidates for GC, and that AC125807.2 may exert a sponge-type regulatory effect on 9 key protein-coding genes involved in the cell cycle pathway, thereby contributing to the pathogenesis of GC.
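
The selection criteria reported in the Results (FC>1 for upregulation, FC<-1 for downregulation, and p<0.01 for survival significance) can be illustrated with a minimal Python sketch. This is not the GDCRNATools pipeline used in the study, which runs in R; the `GeneStat` record, the `classify` function, and the toy gene names and values are all hypothetical and exist only to show how the thresholds combine.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    """Hypothetical per-gene summary; not a GDCRNATools data structure."""
    name: str
    fc: float          # fold-change value on the same scale as the FC thresholds
    survival_p: float  # p-value from the survival analysis


def classify(genes, fc_cutoff=1.0, p_cutoff=0.01):
    """Split genes into up/down-regulated sets and keep survival-significant ones."""
    up = [g.name for g in genes if g.fc > fc_cutoff]
    down = [g.name for g in genes if g.fc < -fc_cutoff]
    # A candidate must pass the fold-change threshold AND the survival p-value cutoff.
    significant = [
        g.name for g in genes
        if abs(g.fc) > fc_cutoff and g.survival_p < p_cutoff
    ]
    return {"up": up, "down": down, "survival_significant": significant}


# Toy values invented for illustration; they are NOT results from the study.
toy = [
    GeneStat("lnc_A", 2.3, 0.004),
    GeneStat("lnc_B", -1.7, 0.2),
    GeneStat("lnc_C", 0.4, 0.001),
    GeneStat("lnc_D", -2.1, 0.008),
]
result = classify(toy)
print(result)
```

In this sketch a gene such as `lnc_C` is excluded despite its low survival p-value because it does not pass the fold-change threshold, mirroring the two-stage selection described above.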