Bioinformatics Databases
  1. Huge amounts and variety of collected data
    1. Types (public)
      1. Nucleotide Sequences
      2. Protein Sequences
      3. 3D Structures
      4. Enzymes and Compounds: e.g., LIGAND, Chemical compounds and reactions
      5. Sequence Motifs: e.g., PROSITE, Pfam (Protein families database of alignments and hidden Markov models)
      6. Pathways: e.g., PATHWAYS, Metabolic and regulatory pathway maps
      7. Molecular Disease: e.g., OMIM (Online Mendelian Inheritance in Man)
      8. Protein Mutations: e.g., PMD (Protein Mutant Database)
      9. Gene Expressions: e.g., GEO (Gene Expression Omnibus)
      10. Gene Catalog: e.g., GENES (KEGG Genes Database)
    2. Enabled by embedded computing, control, robotics, automation
      1. High throughput sequencing
      2. Robot arms/manipulators
      3. Automated sequencing
    3. Faster computers generate more processed data, simulated data, visualized data
    4. Enabled by cheaper and higher capacity storage
      1. Price / capacity ratio falling faster than for CPUs (i.e., faster than Moore's Law)
      2. Faster computers to access and search that data
  2. Technology
    1. Data types
    2. Data formats
      1. Flat file delimited
      2. HTML
      3. XML
      4. FASTA
      5. PHYLIP
      6. PAUP
      7. MAML
      8. NEXUS
      9. FASTA+GAP
      10. mmCIF
      11. Definitions and examples
        1. http://workshop.molecularevolution.org/resources/fileformats/
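As an illustration of one of the formats listed above, here is a minimal sketch of reading FASTA records: each record is a header line starting with ">" followed by one or more sequence lines. The function name `parse_fasta` and the sample records are made up for this example; a production parser would also need to handle comments, encodings, and very large files.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence data may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Made-up records for illustration only.
sample = """>seq1 hypothetical example
ACGTAC
GTTA
>seq2
GGCC
"""
records = list(parse_fasta(sample))
```

Multi-line sequences are joined into one string, so `records` holds one `(header, sequence)` pair per ">" line.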
    3. Data structures
    4. Disk servers
    5. File servers
    6. File systems
      1. Sequential access
      2. Random access
        1. Indexes
      3. Hashing
      4. Indexed Sequential Access (ISAM, VSAM)
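The difference between sequential and indexed random access can be sketched on a small flat file. The example below scans the file once to record the byte offset of each record, then uses `seek` to jump directly to one record instead of rereading everything before it. The helper names `build_index` and `fetch` and the in-memory file contents are assumptions for illustration; the index is a Python dict, which is itself a hash table.

```python
import io


def build_index(handle):
    """Map each record ID to the byte offset of its '>' header line."""
    index = {}
    handle.seek(0)
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break  # end of file
        if line.startswith(b">"):
            record_id = line[1:].split()[0].decode("ascii")
            index[record_id] = offset
    return index


def fetch(handle, index, record_id):
    """Seek straight to one record and return its sequence."""
    handle.seek(index[record_id])
    handle.readline()  # skip the header line
    chunks = []
    for line in iter(handle.readline, b""):
        if line.startswith(b">"):
            break  # reached the next record
        chunks.append(line.strip())
    return b"".join(chunks).decode("ascii")


# Made-up flat file held in memory; a real file opened in binary mode works the same way.
flat_file = io.BytesIO(b">seq1\nACGT\nAC\n>seq2\nGGCC\n>seq3\nTTTT\n")
index = build_index(flat_file)
seq2 = fetch(flat_file, index, "seq2")
```

Building the index costs one sequential pass; after that each lookup reads only the requested record.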
    7. Problems with above
      1. Concurrent access
      2. Integrity of data
      3. Redundancy of data, lack of reuse
      4. Consistency of the data
      5. Reinventing access and search software
      6. Nonstandard formats and query methods
    8. Solution: database management systems (DBMS)
      1. Transaction processing
      2. Decision support/data mining
      3. Data repositories
      4. Data marts
      5. Data warehouses
      6. Data dictionaries
    9. Data models
      1. Flat
      2. Hierarchical
      3. Semi-structured (hybrid flat and hierarchical, e.g., XML)
      4. Network
      5. Relational
      6. Object-oriented
      7. Object-relational
      8. Deductive (like Prolog, but more)
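To make the semi-structured model concrete, the sketch below parses a small XML document in which records share a common shape but may carry a varying number of child elements. The element names (`genes`, `gene`, `name`, `synonym`) are hypothetical and do not follow any of the published bioinformatics XML standards.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one gene has two synonyms, the other has none.
doc = """
<genes>
  <gene id="g1">
    <name>PTEN</name>
    <synonym>MMAC1</synonym>
    <synonym>TEP1</synonym>
  </gene>
  <gene id="g2">
    <name>HBB</name>
  </gene>
</genes>
"""

root = ET.fromstring(doc)

# Walk the hierarchy: each <gene> may have zero or more <synonym> children.
synonyms = {
    gene.findtext("name"): [s.text for s in gene.findall("synonym")]
    for gene in root.findall("gene")
}
```

A flat table would need either a fixed number of synonym columns or a separate table; the hierarchical form simply repeats the child element as often as needed.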
    10. Database products
      1. Open source
        1. MySQL
        2. PostgreSQL
        3. Cloudscape
        4. Ingres
      2. Commercial
        1. Oracle
        2. IBM
          1. DB2
          2. Informix
        3. Sybase
        4. Microsoft
          1. SQL Server
          2. Access
        5. Apple - FileMaker Pro
        6. Hundreds more
    11. Interfaces
      1. Command line utilities
        1. Oracle sqlplus
        2. PostgreSQL psql
      2. ODBC, JDBC, etc.
      3. CGI/DBI
      4. PHP
      5. ColdFusion
      6. Many more
    12. Relational model
      1. Based on relational algebra
      2. Tables (relations)
        1. Rows and columns
        2. Records and fields
        3. Tuples and attributes
      3. Constraints on values
        1. Not null
        2. Data type
        3. User defined complex data types (ORDBMS)
        4. "Key" columns (primary keys)
        5. Foreign keys: referential integrity
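The constraints listed above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch, not a recommended schema: the `gene` and `protein` tables, their columns, and the placeholder sequence are invented for illustration. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys unenforced by default

# Hypothetical two-table schema: each protein row must point at an existing gene.
conn.execute("""
    CREATE TABLE gene (
        gene_id INTEGER PRIMARY KEY,          -- key column
        symbol  TEXT NOT NULL UNIQUE          -- data type plus NOT NULL constraint
    )
""")
conn.execute("""
    CREATE TABLE protein (
        protein_id INTEGER PRIMARY KEY,
        gene_id    INTEGER NOT NULL REFERENCES gene(gene_id),  -- foreign key
        sequence   TEXT NOT NULL
    )
""")

conn.execute("INSERT INTO gene (gene_id, symbol) VALUES (1, 'PTEN')")
# 'ACDEFGHIK' is a placeholder, not a real protein sequence.
conn.execute("INSERT INTO protein (gene_id, sequence) VALUES (1, 'ACDEFGHIK')")


def try_insert(sql):
    """Return the constraint error message, or None if the row was accepted."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


# Violates NOT NULL on gene.symbol.
null_error = try_insert("INSERT INTO gene (gene_id, symbol) VALUES (2, NULL)")
# Violates referential integrity: there is no gene with gene_id 99.
fk_error = try_insert("INSERT INTO protein (gene_id, sequence) VALUES (99, 'ACDE')")
```

Both rejected inserts raise `sqlite3.IntegrityError`, so the invalid rows never reach the tables; this is the kind of integrity check that ad hoc flat files leave to every individual program.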
      4. Schema
        1. Entity relationship diagrams
        2. Data definitions
        3. Chado schema
          1. http://www.gmod.org/schema/doc/sequence.html
        4. Ensembl schema
          1. http://www.csd.abdn.ac.uk/~gjlk/ensemblfdm/daplex_schema.shtml
          2. http://apr2005.archive.ensembl.org/Docs/schema_description.html
        5. XML (example standards)
          1. AGAVE (Architecture for Genomic Annotation, Visualization and Exchange)
            1. http://www.agavexml.org/
          2. BioML (BIOpolymer Markup Language)
            1. http://xml.coverpages.org/bioml.html
          3. BSML (Bioinformatic Sequence Markup Language)
            1. http://www.bsml.org/
          4. CML (Chemical Markup Language)
            1. http://www.xml-cml.org/
          5. MAGE-ML (Microarray Gene Expression - Markup Language)
            1. http://www.mged.org/Workgroups/MAGE/mage-ml.html
            2. Sample
              1. http://www.omg.org/cgi-bin/apps/doc?dtc/02-09-04.zip
          6. SBML (Systems Biology Markup Language)
            1. http://sbml.org/index.psp
        6. Controlled vocabulary
          1. Gene Ontology
            1. http://www.geneontology.org/index.shtml
            2. http://www.geneontology.org/ontology/gene_ontology.obo
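A controlled vocabulary such as the Gene Ontology is distributed as an OBO flat file made of stanzas like `[Term]`, each holding `tag: value` lines. The sketch below reads only `[Term]` stanzas from a short hand-written sample and extracts the `is_a` parent links. The function name `parse_obo_terms` is an assumption, and the parser ignores most of the format (escapes, qualifiers, other stanza types); it is not a full OBO reader.

```python
def parse_obo_terms(text):
    """Collect [Term] stanzas as dicts mapping each tag to a list of values."""
    terms, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are kept.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            value = value.split(" ! ")[0]  # drop the trailing "! comment"
            current.setdefault(tag, []).append(value)
    return terms


# Hand-written sample in OBO style, for illustration only.
sample = """format-version: 1.2

[Term]
id: GO:0008150
name: biological_process

[Term]
id: GO:0009987
name: cellular process
is_a: GO:0008150 ! biological_process

[Typedef]
id: part_of
name: part of
"""
terms = parse_obo_terms(sample)

# Map each term ID to the IDs of its is_a parents.
parents = {t["id"][0]: t.get("is_a", []) for t in terms}
```

Values are stored as lists because tags such as `is_a` may appear more than once within a stanza.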
  3. Visualization
    1. Cn3D from NCBI
      1. About
        1. http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml
      2. Tutorial
        1. http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3dtut.shtml
      3. Download
        1. ftp://ftp.ncbi.nih.gov/cn3d/
      4. Entrez Search Engine
        1. http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi
        2. Select Structure
        3. Search Structure for PTEN or hemoglobin
        4. Select hit [1D5R]
        5. Sequence browser
        6. View 3D Structure of Best Model with Cn3D Display