Index of /literature_curation
Name Last modified Size Description
Parent Directory -
archive/ 21-Nov-2009 08:07 -
biochemical_pathways.tab 21-Nov-2009 20:05 112K
gene_association.sgd.gz 16-Nov-2009 07:50 1.2M GZIP compressed docume>
gene_literature.tab 21-Nov-2009 07:56 53M
go_protein_complex_slim.tab 21-Nov-2009 08:00 203K
go_slim_mapping.tab 21-Nov-2009 07:59 2.3M
go_terms.tab 21-Nov-2009 06:36 5.8M
interaction_data.tab 21-Nov-2009 08:07 40M
interactions_is_obsolete.tab 12-Jul-2008 07:03 24M
orf_geneontology_is_obsolete.tab 30-Oct-2007 01:30 476K
phenotype_data.tab 21-Nov-2009 07:53 6.7M
phenotypes_is_obsolete.tab 12-Jul-2008 06:54 1.8M
yeastcyc12.0.tar.200906.gz 08-Jun-2009 11:53 23M GZIP compressed docume>
The files in this directory contain information assembled by the SGD
staff. The schema and specifications for our Oracle tables are
available at:
http://db.yeastgenome.org/schema/SgdSchema.html
For more information on the Gene Ontology (GO) project, see:
http://www.geneontology.org/
Note that on 2001-06-07, file extensions were changed from .txt to
.tab to be more consistent with the rest of the files on the ftp site.
===========================================================================
Files from the Gene Ontology (GO) project to associate gene products
with the GO ontologies at SGD. All files in this section are TAB
delimited.
===========================================================================
gene_association.sgd.gz Contains all GO annotations for yeast genes (protein and RNA)
The gene_association.sgd.gz file uses the standard file format for
gene_association files of the Gene Ontology (GO) Consortium. A more
complete description of the file format is found here:
http://www.geneontology.org/doc/GO.annotation.html#file
Columns are: Contents:
1) DB - database contributing the file (always "SGD" for this file)
2) DB_Object_ID - SGDID
3) DB_Object_Symbol - see below
4) NOT (optional) - 'NOT', 'contributes_to', or 'colocalizes_with' qualifier for a GO annotation, when needed
5) GO ID - unique numeric identifier for the GO term
6) DB:Reference(|DB:Reference) - the reference associated with the GO annotation
7) Evidence - the evidence code for the GO annotation
8) With (or) From (optional) - any With or From qualifier for the GO annotation
9) Aspect - which ontology the GO term belongs in
10) DB_Object_Name(|Name) (optional) - a name for the gene product in words, e.g. 'acid phosphatase'
11) DB_Object_Synonym(|Synonym) (optional) - see below
12) DB_Object_Type - type of object annotated, e.g. gene, protein, etc.
13) taxon(|taxon) - taxonomic identifier of species encoding gene product
14) Date - date GO annotation was made
15) Assigned_by - source of the annotation (e.g. SGD, UniProtKB, YeastFunc, bioPIXIE_MEFIT)
Note on SGD nomenclature (pertaining to columns 3 and 11):
Column 3 - When a Standard Gene Name (e.g. CDC28, COX2) has been
conferred, it will be present in Column 3. When no Gene Name
has been conferred, the Systematic Name (e.g. YAL001C,
YGR116W, YAL034W-A) will be present in column 3.
Column 11 - The Systematic Name (e.g. YAL001C, YGR116W, YAL034W-A,
Q0010) will be the first name present in Column 11. Any other
names (except the Standard Name, which will be in Column 3 if
one exists), including Aliases used for the gene will also be
present in this column.
Please note that ORFs classified as 'dubious' are not included in this
file, as there is currently no experimental evidence that the gene
product is produced in S. cerevisiae.
This file is updated nightly.
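
The 15-column layout described above can be read with a few lines of
Python. The following is a minimal sketch, not an official SGD or GO
Consortium parser. The field names, the function names and the file path
in the usage note are our own choices, and the sketch assumes that
header or comment lines in the file begin with '!'.

```python
import gzip
from collections import namedtuple

# Python names for the 15 columns listed above (the identifiers are our own).
GAF_FIELDS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence", "with_or_from", "aspect", "db_object_name",
    "db_object_synonym", "db_object_type", "taxon", "date", "assigned_by",
]
Annotation = namedtuple("Annotation", GAF_FIELDS)


def parse_gene_association(lines):
    """Yield one Annotation per data line of a gene_association file."""
    for line in lines:
        line = line.rstrip("\r\n")
        # Assumption: blank lines and lines starting with '!' carry no data.
        if not line or line.startswith("!"):
            continue
        cols = line.split("\t")
        # Pad or trim so that exactly 15 fields are passed on.
        cols = (cols + [""] * len(GAF_FIELDS))[:len(GAF_FIELDS)]
        yield Annotation(*cols)


def read_gene_association(path):
    """Open a gzip-compressed file and yield its annotations."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        yield from parse_gene_association(handle)


def systematic_name(annotation):
    """Return the first name in column 11, which is the Systematic Name."""
    first = annotation.db_object_synonym.split("|")[0]
    return first or None
```

For example, list(read_gene_association("gene_association.sgd.gz")) would
load every annotation from a local copy of the file; the path is only a
placeholder.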
---
go_slim_mapping.tab Contains the mapping of all yeast gene products (protein or RNA)
to a GO-Slim term.
Columns:
1) ORF (mandatory) - Systematic name of the gene
2) Gene (optional) - Gene name, if one exists
3) SGDID (mandatory) - the SGDID, unique database identifier for the gene
4) GO_Aspect (mandatory) - which ontology: P=Process, F=Function, C=Component
5) GO Slim term (mandatory) - the name of the GO term that was selected as a GO-Slim term
6) GOID (optional) - the unique numerical identifier of the GO term
7) Feature type (mandatory) - a description of the sequence feature, such as ORF or tRNA
A GO-Slim is a subset of GO Terms that can be from the Biological
Process, Molecular Function, and Cellular Component ontologies. These
may be general, high-level GO terms that represent major branches in
each ontology, as they are in go_slim_mapping.tab, or they may be more
granular terms that are used for a specific purpose (as in
the go_protein_complex_slim.tab file below).
To determine the correct GO-Slim term in the go_slim_mapping.tab file,
all GO annotations for a gene product are traced to a GO-Slim term.
As of December 2007, please note that the go_slim_mapping.tab file, SGD's "GO-slim Mapper tool"
(http://db.yeastgenome.org/cgi-bin/GO/goTermMapper) and the "Genome
Snapshot" (http://www.yeastgenome.org/cache/genomeSnapshot.html)
handle parentage in the same way. Annotations are
mapped to all available GO-slim terms, regardless of parentage. For
example, if something is annotated to "meiosis" it will also be
annotated to "cell cycle".
Each line contains the selected GO-slim term from the stated ontology
(P, F, or C) for that gene product. Due to the structure of GO, the
GO annotations for a gene product may map to multiple GO-Slim terms
for a single ontology. Therefore, a gene product may be associated
with multiple GO-Slim terms for a single ontology.
For genes annotated to a term that is not a GO-Slim term, column 5
(GO Slim term) displays 'Other', and in these cases the GOID column
is blank.
Only annotations made by manually curated and high-throughput methods
are included in this file.
This file is updated weekly.
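
Because a gene product may have several GO-Slim terms within one
ontology, it can be convenient to collect the terms per ORF and per
aspect. The following Python sketch does this for the seven columns
described above. The function name is our own, and lines with fewer
than seven fields are simply skipped.

```python
from collections import defaultdict


def slim_terms_by_orf(lines):
    """Return {ORF: {aspect: set of GO-Slim term names}}."""
    result = defaultdict(lambda: defaultdict(set))
    for line in lines:
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) < 7:
            continue  # not a complete data line
        orf, gene, sgdid, aspect, term, goid, feature_type = cols[:7]
        # 'Other' rows (blank GOID) are kept like any other term name.
        result[orf][aspect].add(term)
    # Convert to plain dicts so that lookups of unknown keys raise KeyError.
    return {orf: dict(aspects) for orf, aspects in result.items()}
```

With a local copy of the file, slim_terms_by_orf(open("go_slim_mapping.tab"))
returns the whole mapping in memory.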
---
go_protein_complex_slim.tab
Contains the mapping of all yeast gene products (protein or RNA) to
the Macromolecular Complex GO-Slim term set.
Columns, separated by tabs:
1) Ontology: GO Term/GOID (mandatory)
2) /gene (optional)/ORF/SGDID/feature type/
The first column is the GO Aspect (Component) followed by the GO term
and its GOID.
The second column contains all the genes associated to the GO Term or
children of the GO Term. Multiple genes are separated by a pipe (|),
and multiple feature types are separated by a comma (,).
Note on GO Slim mapping files: A GO-Slim is a subset of GO Terms that
can be derived from the Biological Process, Molecular Function, and/or
Cellular Component ontologies. These may be general, high-level GO
terms that represent major branches in each ontology, as they are in
go_slim_mapping.tab described above, or they may be more granular
terms that are used for a specific purpose (as in the
go_protein_complex_slim.tab file).
As of December 2007, the "Macromolecular complex terms" GO-Slim set
used to generate go_protein_complex_slim.tab consists of terms that are
direct children of the GO component ontology term "macromolecular
complex" (GOID:32991) and that are used to annotate S. cerevisiae gene
products. The macromolecular complex terms can indicate a functional
relationship among gene products that are annotated to a particular
term.
For example, gene products that are co-localized to the ribosome are
likely to play a part in protein biosynthesis, so the term "ribosome"
is part of this set. However, the component term "nucleus" is not,
because it is too broad a term to be able to imply that genes
co-localized to the nucleus are likely to share the same cellular
role. Note that multiple parentage is not resolved, so that if both a
child and parent term are in the term set, genes directly annotated to
the child will be mapped to both the child and parent terms in this
file. Also note that negative (NOT) GO annotations are not included in
this file.
Only annotations made by manually curated and high-throughput methods
are included in this file.
This file is updated weekly.
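
The two-column layout above can be unpacked as follows. This Python
sketch rests on a literal reading of the column descriptions and should
be checked against the actual file: it assumes that column 1 looks like
"Component: term/GOID", and that each gene entry in column 2 looks like
"/gene/ORF/SGDID/feature type/" with an empty gene field when no gene
name exists. The function name and the dictionary keys are our own.

```python
def parse_complex_slim_line(line):
    """Split one line into (aspect, term, goid, list of gene records)."""
    term_col, genes_col = line.rstrip("\r\n").split("\t", 1)
    # Column 1: "<aspect>: <GO term>/<GOID>" (assumed layout).
    aspect, _, term_and_id = term_col.partition(": ")
    term, _, goid = term_and_id.rpartition("/")
    genes = []
    for entry in genes_col.split("|"):
        fields = entry.strip().split("/")
        # "/gene/ORF/SGDID/type/" splits into six pieces; the first and
        # last are the empty strings around the outer slashes.
        if len(fields) < 6:
            continue
        gene, orf, sgdid, feature_types = fields[1:5]
        genes.append({
            "gene": gene or None,
            "orf": orf,
            "sgdid": sgdid,
            "feature_types": feature_types.split(","),
        })
    return aspect, term, goid, genes
```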
---
orf_geneontology.all.tab
This file has been made obsolete.
For a file containing all GO annotations for genes (both for protein
and RNA) in SGD, please use the gene_association.sgd file described
above.
---
go_annotation.tab Contains the GO annotations for yeast genes.
This file has been made obsolete.
For a file containing all GO annotations for genes (both for protein
and RNA) in SGD, please use the gene_association.sgd file described
above.
============================================================================
Files representing data from the Phenotype-GO (Gene Ontology) section
of the SGD Oracle database. All files in this section are updated
weekly and are TAB delimited.
============================================================================
go_terms.tab Contains the GO terms and their definitions.
Columns are: Contents:
1) GOID (mandatory) - the unique numerical identifier of the GO term
2) GO_Term (mandatory) - the name of the GO term
3) GO_Aspect (mandatory) - which ontology: P=Process, F=Function, C=Component
4) GO_Term_Definition (optional) - the full definition of the GO term
---
phenotype_data.tab Contains curated phenotype data.
1) Feature Name (Mandatory) -The feature name of the gene
2) Feature Type (Mandatory) -The feature type of the gene
3) Gene Name (Optional) -The standard name of the gene
4) SGDID (Mandatory) -The SGDID of the gene
5) Reference (SGD_REF Required, PMID optional) -PMID: #### SGD_REF: #### (separated by pipe)(one reference per row)
6) Experiment Type (Mandatory) -The method used to detect and analyze the phenotype
7) Mutant Type (Mandatory) -Description of the impact of the mutation on activity of the gene product
8) Allele (Optional) -Allele name and description, if applicable
9) Strain Background (Optional) -Genetic background in which the phenotype was analyzed
10) Phenotype (Mandatory) -The feature observed and the direction of change relative to wild type
11) Chemical (Optional) -Any chemicals relevant to the phenotype
12) Condition (Optional) -Condition under which the phenotype was observed
13) Details (Optional) -Details about the phenotype
14) Reporter (Optional) -The protein(s) or RNA(s) used in an experiment to track a process
For further details about how phenotype information is recorded, please see:
http://www.yeastgenome.org/help/PhenoHelp.html
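
As an illustration of the 14 columns listed above, the following Python
sketch turns each line of phenotype_data.tab into a dictionary. The
field names and function names are our own. The reference parser
assumes that column 5 holds pipe-separated "PMID: ####" and
"SGD_REF: ####" items, as the column description suggests.

```python
# Python names for the 14 columns listed above (the identifiers are our own).
PHENOTYPE_FIELDS = [
    "feature_name", "feature_type", "gene_name", "sgdid", "reference",
    "experiment_type", "mutant_type", "allele", "strain_background",
    "phenotype", "chemical", "condition", "details", "reporter",
]


def parse_reference(text):
    """Turn 'PMID: 123|SGD_REF: S0001' into {'PMID': '123', 'SGD_REF': 'S0001'}."""
    refs = {}
    for part in text.split("|"):
        key, sep, value = part.partition(":")
        if sep:  # ignore pieces without a 'KEY:' prefix
            refs[key.strip()] = value.strip()
    return refs


def parse_phenotype_rows(lines):
    """Yield one dict per data line; optional columns may be empty strings."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        # Pad or trim so that every row has exactly 14 fields.
        cols = (line.split("\t") + [""] * len(PHENOTYPE_FIELDS))[:len(PHENOTYPE_FIELDS)]
        row = dict(zip(PHENOTYPE_FIELDS, cols))
        row["reference"] = parse_reference(row["reference"])
        yield row
```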
---
phenotypes_is_obsolete.tab (was phenotypes.tab)
# THIS FILE IS NO LONGER UPDATED AS OF 7/14/2008.
# THIS FILE WAS OBSOLETED ON OCTOBER 31, 2008.
# Please use the phenotype_data.tab file.
Contains phenotype data, the majority of
which is the data from the systematic deletion project. These data
also include results from the Genetic Footprinting study;
additional data from the footprinting study can be found at:
ftp://ftp.yeastgenome.org/pub/yeast/data_download/systematic_results/genetic_footprinting/
Columns are: Contents:
1) ORF (mandatory) - Systematic name of the ORF
2) Gene (optional) - Gene name, if one exists
3) SGDID (mandatory) - the SGDID, unique database identifier, for the ORF
4) Phenotype_type (mandatory) - classification of the phenotype
5) Phenotype (mandatory) - the phenotype
6) Description (optional) - a description of the type of experiment, if available
7) Reference (optional) - the unique PubMed identifier (PMID:) or
the unique SGD identifier (SGD_ref:) for a reference
---------
============================================================================
Information about manually curated papers from the SGD's Scientific
Curation staff.
============================================================================
gene_literature.tab :
Columns are : Contents:
1) PubMed ID (optional) - the unique PubMed identifier for a reference
2) citation (mandatory) - the citation for the publication, as stored in SGD
3) gene name (optional) - Gene name, if one exists
4) feature (optional) - Systematic name, if one exists
5) literature_topic (mandatory) - all associated Literature Topics of the SGD Literature Guide
relevant to this gene/feature within this paper
Multiple literature topics are separated by a '|' character.
6) SGDID (mandatory) - the SGDID, unique database identifier, for the gene/feature
Each line (row) contains a gene name (column 3), a feature name such as
a systematic ORF name (column 4), or both.
[Note on July 10, 2002 the filename was changed from
gene_reference.tab]
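
Since column 5 can hold several literature topics separated by '|', a
common first step is to split that column and group the topics by
SGDID. The following Python sketch shows one way to do this under the
six-column layout described above; the function name is our own.

```python
from collections import defaultdict


def topics_by_sgdid(lines):
    """Return {SGDID: set of literature topics} across all papers."""
    topics = defaultdict(set)
    for line in lines:
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) < 6:
            continue  # not a complete data line
        pubmed_id, citation, gene_name, feature, topic_field, sgdid = cols[:6]
        # Multiple literature topics are separated by a '|' character.
        for topic in topic_field.split("|"):
            topic = topic.strip()
            if topic:
                topics[sgdid].add(topic)
    return dict(topics)
```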
---------
=============================================================================
Information from the Yeast Biochemical Pathways
===========================================================================
biochemical_pathways.tab:
Columns are: Contents:
1) biochemical pathway common name (mandatory) - name of the biochemical pathway, as stored in SGD
2) enzyme name (optional) - name of a specific enzyme (may be single or multi subunit)
3) E.C number of reaction (optional) - Enzyme Commission identifier of the reaction, e.g. EC:1.1.1.1
4) gene name (optional) - Gene name, if one has been identified to catalyze the reaction
5) reference (optional) - if the pathway has been curated from the literature, the SGDID
of the reference (prefaced by SGD_REF:) or the Pubmed ID
of the reference (prefaced by PMID:) will be listed
yeastcyc12.0.tar.200809.gz
A compressed archive that contains all the files required to install the Yeast Biochemical Pathways using the Pathway Tools software.
These files are compatible with Pathway Tools version 12.0.
More information about installation of the Yeast Biochemical Pathways data can be found here:
http://bioinformatics.ai.sri.com/ptools/copy-patho-db.html#Method5
More information about downloading the Pathway Tools software can be found here:
http://BioCyc.org/download.shtml
More information about the Pathway Tools can be found here:
http://bioinformatics.ai.sri.com/ptools/
These files are updated monthly.
=============================================================================
Interaction data
===========================================================================
interaction_data.tab
Contains interaction data incorporated into SGD from BioGRID (http://www.thebiogrid.org/). Tab-separated columns are:
1) Feature Name (Bait) (Required) - The feature name of the gene used as the bait
2) Standard Gene Name (Bait) (Optional) - The standard gene name of the gene used as the bait
3) Feature Name (Hit) (Required) - The feature name of the gene that interacts with the bait
4) Standard Gene Name (Hit) (Optional) - The standard gene name of the gene that interacts with the bait
5) Experiment Type (Required) - A description of the experimental method used to identify the interaction
6) Genetic or Physical Interaction (Required) - Indicates whether the experimental method is a genetic or physical interaction
7) Source (Required) - Lists the database source for the interaction
8) Manually curated or High-throughput (Required) - Lists whether the interaction was manually curated from a publication or added as part of a high-throughput dataset
9) Notes (Optional) - Free text field that contains additional information about the interaction
10) Phenotype (Optional) - Contains the phenotype of the interaction
11) Reference (Required) - Lists the identifiers for the reference as an SGDID (SGD_REF:) or a PubMed ID (PMID:)
12) Citation (Required) - Lists the citation for the reference
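
The bait and hit columns above can be extracted with a short Python
sketch such as the following. The function name and the optional kind
filter are our own. The exact strings used in column 6 are not stated
here, so the filter only assumes that the column text contains the word
"genetic" or "physical" and matches it case-insensitively.

```python
def interaction_pairs(lines, kind=None):
    """Yield (bait, hit, experiment type, interaction type) tuples.

    kind, if given, is matched case-insensitively as a substring of
    column 6, e.g. kind="physical" (an assumption about its wording).
    """
    for line in lines:
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) < 6:
            continue  # need at least columns 1 to 6
        bait, hit = cols[0], cols[2]              # feature names
        experiment, interaction_type = cols[4], cols[5]
        if kind is not None and kind.lower() not in interaction_type.lower():
            continue
        yield bait, hit, experiment, interaction_type
```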
--
interactions_is_obsolete.tab (was interactions.tab)
# THIS FILE IS NO LONGER UPDATED AS OF 7/14/2008.
# THIS FILE WAS OBSOLETED ON OCTOBER 31, 2008.
# Please use the interaction_data.tab file instead.
Contains interaction data incorporated into SGD from BioGRID (http://www.thebiogrid.org/). Tab-separated columns are:
interaction_type (mandatory)
genes involved and their mutation type, in the format: ORF
(mutation_type, action), with multiples separated by a |
phenotype (optional, multiples separated by |)
description (optional)
citation (multiples separated by |)
PubMed ID (optional, multiples separated by |)
This file is updated weekly. Please note that the obsolete interactions.tab
file generated by SGD cannot be read directly in its current format by
Osprey or Cytoscape, though it can be re-formatted via a Perl script.
Alternatively, files provided for download at the BioGRID site can be read
by both programs after smaller modifications, such as re-ordering the
columns in a program like Excel. These interaction files can be downloaded
from BioGRID at the following URL:
http://www.thebiogrid.org/downloads.php
--
orf_geneontology_is_obsolete.tab Contains a select set of GO annotations for ORFs
#This file orf_geneontology.tab has been made OBSOLETE as of 10/11/2007.
#Please refer to the gene_association.sgd file to download a complete set of GO
#annotations for S. cerevisiae or the go_slim_mapping.tab file to download the
#mappings of the S. cerevisiae gene products to GO slim terms (higher level terms).
Columns: Contents:
1) ORF (mandatory) - Systematic name of the ORF
2) Gene (optional) - Gene name, if one exists
3) Length (mandatory) - length of the ORF, in nucleotides, including introns
4) Process (mandatory) - see below
5) Function (mandatory) - see below
6) Component (mandatory) - see below
7) SGDID (mandatory) - the SGDID, unique database identifier, for the ORF
Contains one line per ORF and one selected Gene Ontology (GO)
annotation for each ontology: Biological Process (column 4), Molecular
Function (column 5), and Cellular Component (column 6). An asterisk
(*) adjacent to the GO term in any of these three columns indicates
that the gene product has more than one associated GO term in that
particular ontology, and only the term most commonly used for
annotation is shown.
This file includes only protein coding ORFs (nuclear and mitochondrial).
Please note that ORFs classified as 'dubious' are
not included in this file, as there is currently no experimental evidence
that the gene product is produced in S. cerevisiae.
Note that the Process and Function columns in the orf_descriptions.txt
and orf_descriptions.tab files are now derived from this file.
This file is NOT updated.