Self-updating map
A core requirement of applications operating within such a computational ecosystem is the ability to discover, access and analyze subsets of large data services, as underscored by the recent doubling of the number of recognized breast cancer subtypes (Curtis et al., 2012).
Although studies such as those reported by Curtis et al. (2012), and other large-scale integrative analyses using the TCGA (Cooper et al., 2012; Setty et al., 2012; TCGA Research Network, 2008; TCGA Research Network, 2011; Zeeberg et al., 2012), themselves make use of broad datasets, their results are often the starting point for further study of the numerous biomolecular bases for tumorigenesis.
However, to realize this possibility, a continually updated road map of files in the TCGA is required.
Creation of such a road map represents a significant data modeling challenge, due to the size and fluidity of this resource: each of the 33 cancer types is instantiated in only partially overlapping sets of analytical platforms, while the number of data files available doubles approximately every 7 months.
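To give a sense of what a doubling time of roughly 7 months implies, the short sketch below converts it into a growth factor over an arbitrary interval. It assumes steady exponential growth, which is a simplification rather than a claim about the actual TCGA release schedule; the function name is illustrative only.

```python
def growth_factor(months, doubling_months=7.0):
    """Factor by which the file count grows over `months`, assuming steady doubling."""
    return 2 ** (months / doubling_months)


# Print the implied growth over a few horizons (illustrative values only).
for months in (7, 12, 24):
    print(f"{months:>2} months: x{growth_factor(months):.2f}")
```

Under this assumption the number of files grows a little more than threefold per year, which is why a static, one-off index would go stale quickly.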
They also present a much more interoperable and reproducible alternative to the still pervasive use of data portals. Significantly, these enhancements enable the reporting of analysis results that can be fully traced to and reproduced using their source data.
Availability: A prepared dashboard, including links to source code and a SPARQL endpoint, is available.
Full text: https://bioinformatics.oxfordjournals.org/content/29/10/1333pdf (A self-updating road map of The Cancer Genome Atlas, David E. …; Advance Access publication April …)
Results: We developed an engine to index and annotate the TCGA files, relying exclusively on third-generation web technologies (Web 3.0). Specifically, this engine uses JavaScript in conjunction with the World Wide Web Consortium's (W3C) Resource Description Framework (RDF), and SPARQL, the query language for RDF, to capture metadata of files in the TCGA open-access HTTP directory.
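The idea of capturing file metadata as RDF can be illustrated with a minimal sketch. The engine described here is written in JavaScript; the Python below is only an illustration, and the base URL, the directory layout (disease study, center type, center, platform) and the vocabulary namespace are all hypothetical assumptions, not the actual TCGA structure or the authors' schema.

```python
from urllib.parse import urlparse

# Hypothetical vocabulary and path layout; neither is taken from the TCGA or the paper.
VOCAB = "http://example.org/tcga-schema#"
FIELDS = ("diseaseStudy", "centerType", "center", "platform")


def file_to_triples(url):
    """Return (subject, predicate, object) triples describing one file URL."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    *dirs, filename = segments
    # Assumption: the last four directories above the file carry the metadata.
    levels = dirs[-len(FIELDS):]
    triples = [(url, VOCAB + "fileName", filename)]
    for field, value in zip(FIELDS, levels):
        triples.append((url, VOCAB + field, value))
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples lines, writing every object as a plain literal.

    Literal escaping is not handled; values are assumed to contain no quotes.
    """
    return "\n".join(f'<{s}> <{p}> "{o}" .' for s, p, o in triples)


example = "http://example.org/tcga/brca/cgcc/center-a/platform-x/sample-001.txt"
print(to_ntriples(file_to_triples(example)))
```

Because the metadata is derived from the file's location alone, re-crawling the directory is enough to refresh the index, which is one plausible reading of what makes such a road map self-updating.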
The resulting index may be queried using SPARQL, and enables file-level provenance annotations as well as discovery of arbitrary subsets of files, based on their metadata, using web standard languages.
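Discovering a subset of files by metadata amounts to matching a set of property constraints against the index. The sketch below imitates that with a tiny in-memory list of triples; the file identifiers, property names and values are invented for illustration, and the SPARQL shown in the comment is only an approximation under that same hypothetical vocabulary, not a query taken from the published endpoint.

```python
# Hypothetical index of (file, property, value) triples with made-up names.
INDEX = [
    ("file1", "diseaseStudy", "brca"),
    ("file1", "platform", "platform-x"),
    ("file2", "diseaseStudy", "brca"),
    ("file2", "platform", "platform-y"),
    ("file3", "diseaseStudy", "gbm"),
    ("file3", "platform", "platform-x"),
]

# A roughly equivalent SPARQL query, under the same hypothetical vocabulary:
#   SELECT ?file WHERE {
#     ?file :diseaseStudy "brca" .
#     ?file :platform "platform-x" .
#   }


def select_files(index, **constraints):
    """Return the sorted subjects whose triples satisfy every property=value constraint."""
    matches = {s for s, _, _ in index}
    for prop, value in constraints.items():
        matches &= {s for s, p, o in index if p == prop and o == value}
    return sorted(matches)


print(select_files(INDEX, diseaseStudy="brca", platform="platform-x"))  # ['file1']
```

Each constraint narrows the candidate set by intersection, mirroring how the triple patterns in a SPARQL basic graph pattern share the same `?file` variable.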
Concurrent with the expansion of the TCGA, the tooling required to enable computational ecosystems for data-driven medical genomics (Almeida, 2010) is maturing rapidly, to the point that tools operating within and providing such ecosystems are beginning to appear (Almeida et al., 2012b).