  • The identification of new stem cell markers characteristic o

    2018-11-06

    The identification of new stem cell markers characteristic of a phenotypic subset or experimental state remains a major research goal. Researchers may wish to assess the uniqueness of an expression profile before embarking on experiments that rely on a reporter gene or antibody to select for a particular stem cell subset. Genes that are highly novel generally have few publications, and while data on these genes are likely to be collected in large-scale genomics datasets, finding and assessing this information can be challenging. Web-based tools such as BioGPS (Wu et al., 2009) or TiGER (Liu et al., 2008) do exist for rapid querying of single gene profiles across a tissue atlas; these tap into a common desire of researchers to quickly assess the expression of a single gene or a small gene set, but they lack relevant stem cell samples. A handful of databases, such as the mouse embryonic stem cell database FunGenEs (Schulz et al., 2009) and the haematopoiesis database SCDb (Hackney et al., 2002), focus on specialist stem cell datasets designed for consortia working in a specific area of stem cell biology. StemBase (Sandie et al., 2009) provides the most comprehensive mouse and human microarray collection focussed on stem cells, but the search terms are dataset-centric rather than gene-centric, and it can be difficult to use without explicit guidance or training. StemCellDB (Mallon et al., 2013) is a recently published expression database focussed on iPSC and ESC, hosting exemplary in-house generated data on highly curated stem cell lines. While it meets the community requirements for exemplary expression datasets on pluripotent stem cells, it lacks the breadth of experimental data available in the public domain and has limited visualisation functionality. Indeed, user interfaces for most existing gene expression data repositories are designed for bioinformaticians, not biologists (Pavelin et al., 2012).
Many focus on data storage, and bioinformatics expertise is required to pull out relevant datasets systematically and analyse them appropriately. The task of identifying, downloading, normalising and analysing the relevant stem cell datasets presents a significant barrier to this kind of query. Even in databases that offer ‘tissue-signatures’ that may include stem cell experiments, such as The Gene Expression Barcode (McCall et al., 2011) or the Human Gene Expression Atlas at EBI (Lukk et al., 2010), unwieldy search terms can make datasets of interest difficult to identify. This difficulty in filtering the large number of datasets for relevance and quality acts as a barrier for many stem cell biologists, so the available expression resources are effectively underutilised.

Stemformatics.org is an online data portal and a collection of visualisation tools, designed to help stem cell biologists identify and assess relevant datasets, gene sets and pathways. It addresses many of the problems identified above by hosting a growing collection of manually curated, high-quality public datasets and providing an intuitive biology-centric workflow to help researchers access gene profiles quickly. Stemformatics is aimed at stem cell biologists with minimal bioinformatics background and focuses on easy-to-interpret views of the data using interactive graphs and heat maps. The site provides all data in downloadable formats which can be readily opened by most common desktop spreadsheet programs, as well as a translation feature to assist users who wish to run more sophisticated analyses using external software such as GenePattern (Reich et al., 2006; Kuehn et al., 2008) or MeV (Howe et al., 2011). Stemformatics supports some basic analysis features, including sample comparison and identification of correlated gene patterns.
Flexible gene annotation features include the ability to create, manipulate and analyse private gene lists, and an integrated gene annotation function to help predict cell-surface proteins or membership in relevant pathways. Its biology-centric philosophy means that external tools and resources can be accessed quickly using common queries as the starting point, with the application automatically transforming the data into the required formats as needed. The resources described in this manuscript are available at www.stemformatics.org.
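
As a rough illustration of what "identification of correlated gene patterns" can mean in practice, the sketch below ranks genes by the Pearson correlation of their expression profile with that of a query gene. It is a minimal sketch under stated assumptions, not a description of how Stemformatics implements the feature: the text does not specify the correlation measure, and the function names, the 0.9 threshold, the gene names and the expression values are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        # A flat profile has no defined correlation; treat it as 0 here.
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / denom


def correlated_genes(expression, query, threshold=0.9):
    """Return (gene, r) pairs with r >= threshold, strongest first.

    `expression` maps a gene name to its values across the same samples.
    The threshold is an arbitrary illustrative cut-off.
    """
    target = expression[query]
    scored = [
        (gene, pearson(target, profile))
        for gene, profile in expression.items()
        if gene != query
    ]
    kept = [(gene, r) for gene, r in scored if r >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy, made-up expression values across five samples (not real data).
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [2.1, 3.9, 6.2, 8.0, 9.9],  # rises with GENE_A
    "GENE_C": [5.0, 4.0, 3.0, 2.0, 1.0],  # falls as GENE_A rises
    "GENE_D": [3.0, 3.0, 3.0, 3.0, 3.0],  # flat profile
}

hits = correlated_genes(expression, "GENE_A")
for gene, r in hits:
    print(f"{gene}\t{r:.3f}")
```

With this toy input only GENE_B passes the threshold; the anti-correlated GENE_C and the flat GENE_D are excluded. A rank-based measure such as Spearman correlation would be a reasonable alternative when expression values are not on a comparable linear scale.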