OrthoXML Schema Documentation

Schema Document Properties

Target Namespace http://orthoXML.org/2011/
Version 0.3
Element and Attribute Namespaces
  • Global element and attribute declarations belong to this schema's target namespace.
  • By default, local element declarations belong to this schema's target namespace.
  • By default, local attribute declarations have no namespace.
Documentation This schema defines OrthoXML version 0.3. Author(s): Sanjit Roopra, Dave Messina, Fabian Schreiber, Thomas Schmitt, and Erik Sonnhammer. SBC - Stockholm Bioinformatics Centre. 2011. More info at http://orthoxml.org
Application Data OrthoXML Schema

Declared Namespaces

Prefix Namespace
xml http://www.w3.org/XML/1998/namespace
xs http://www.w3.org/2001/XMLSchema
ortho http://orthoXML.org/2011/
Schema Component Representation
<xs:schema version="0.3" targetNamespace="http://orthoXML.org/2011/" elementFormDefault="qualified">
...
</xs:schema>

Global Declarations

Element: orthoXML

Name orthoXML
Type Locally-defined complex type
Nillable no
Abstract no
Documentation The OrthoXML root element.
Attributes
origin The source program or database of the file, for instance OMA or InParanoid.
Type: xs:string Use: required
version The version number of the file.
Type: xs:decimal Use: required
originVersion The version or release number of the source program/database at the time the file was generated.
Type: xs:token Use: required
XML Instance Representation
<ortho:orthoXML
origin="xs:string [1] ?"
version="xs:decimal [1] ?"
originVersion="xs:token [1] ?">
<!--
Key Constraint - geneidKey
Selector - ortho:species/ortho:database/ortho:genes/ortho:gene
Field(s) - @id
-->
<!--
Key Constraint - scoreidKey
Selector - ortho:scores/ortho:scoreDef
Field(s) - @id
-->
<!--
Key Reference Constraint - geneidRef
Selector - .//ortho:geneRef
Field(s) - @id
Refers to - ortho:geneidKey
-->
<!--
Key Reference Constraint - scoreidRef
Selector - .//ortho:score
Field(s) - @id
Refers to - ortho:scoreidKey
-->
<!--
Uniqueness Constraint - uniqueGroupId
Selector - .//ortho:paralogGroup | .//ortho:orthologGroup
Field(s) - @id
-->

<ortho:notes> ortho:notes </ortho:notes> [0..1]
<ortho:species> ortho:species </ortho:species> [1..*]
<ortho:scores> ortho:scores </ortho:scores> [0..1]
<ortho:groups> ortho:groups </ortho:groups> [1]
</ortho:orthoXML>
Schema Component Representation
<xs:element name="orthoXML">
<xs:complexType>
<xs:sequence>
<xs:element name="notes" type="ortho:notes" minOccurs="0"/>
<xs:element name="species" type="ortho:species" minOccurs="1" maxOccurs="unbounded"/>
<xs:element name="scores" type="ortho:scores" minOccurs="0"/>
<xs:element name="groups" type="ortho:groups"/>
</xs:sequence>
<xs:attribute name="origin" type="xs:string" use="required"/>
<xs:attribute name="version" type="xs:decimal" use="required"/>
<xs:attribute name="originVersion" type="xs:token" use="required"/>
</xs:complexType>
<xs:key name="geneidKey">
<xs:selector xpath="ortho:species/ortho:database/ortho:genes/ortho:gene"/>
<xs:field xpath="@id"/>
</xs:key>
<xs:key name="scoreidKey">
<xs:selector xpath="ortho:scores/ortho:scoreDef"/>
<xs:field xpath="@id"/>
</xs:key>
<xs:keyref name="geneidRef" refer="ortho:geneidKey">
<xs:selector xpath=".//ortho:geneRef"/>
<xs:field xpath="@id"/>
</xs:keyref>
<xs:keyref name="scoreidRef" refer="ortho:scoreidKey">
<xs:selector xpath=".//ortho:score"/>
<xs:field xpath="@id"/>
</xs:keyref>
<xs:unique name="uniqueGroupId">
<xs:selector xpath=".//ortho:paralogGroup | .//ortho:orthologGroup"/>
<xs:field xpath="@id"/>
</xs:unique>
</xs:element>
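
The following is a minimal sketch of an instance document that should satisfy the declaration above. It assumes the default namespace is bound to the target namespace. The values exampleResource, exampleDB, geneA, geneB, group1 and the score identifier confidence are hypothetical placeholders, not names used by any real resource. The child elements appear in the order required by the sequence (species, scores, groups; the optional notes element is omitted).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<orthoXML xmlns="http://orthoXML.org/2011/"
          origin="exampleResource" version="0.3" originVersion="1.0">
  <!-- One species element per species; gene ids must be unique (geneidKey). -->
  <species name="Homo sapiens" NCBITaxId="9606">
    <database name="exampleDB" version="1">
      <genes>
        <gene id="1" geneId="geneA"/>
      </genes>
    </database>
  </species>
  <species name="Mus musculus" NCBITaxId="10090">
    <database name="exampleDB" version="1">
      <genes>
        <gene id="2" geneId="geneB"/>
      </genes>
    </database>
  </species>
  <!-- scoreDef ids must be unique (scoreidKey). -->
  <scores>
    <scoreDef id="confidence" desc="Hypothetical confidence score"/>
  </scores>
  <groups>
    <orthologGroup id="group1">
      <!-- score/@id must match a scoreDef/@id (scoreidRef). -->
      <score id="confidence" value="0.95"/>
      <!-- geneRef/@id must match a gene/@id (geneidRef). -->
      <geneRef id="1"/>
      <geneRef id="2"/>
    </orthologGroup>
  </groups>
</orthoXML>
```

Changing a geneRef id to a value with no matching gene, or reusing group1 as the id of a second group, would be expected to violate geneidRef and uniqueGroupId respectively.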

Global Definitions

Complex Type: database

Name database
Abstract no
Documentation A database element contains all genes from a single database/source.
Attributes
geneLink A Uniform Resource Identifier (URI) pointing to the gene. In the simplest case one could imagine a URL which in concatenation with the gene identifier links to the website of the gene in the source database. However, how this is used depends on the source of the orthoXML file.
Type: xs:anyURI Use: optional
name Name of the database.
Type: xs:string Use: required
protLink A Uniform Resource Identifier (URI) pointing to the protein.
Type: xs:anyURI Use: optional
transcriptLink A Uniform Resource Identifier (URI) pointing to the transcript.
Type: xs:anyURI Use: optional
version Version number of the database.
Type: anySimpleType Use: required
XML Instance Representation
<...
geneLink="xs:anyURI [0..1] ?"
name="xs:string [1] ?"
protLink="xs:anyURI [0..1] ?"
transcriptLink="xs:anyURI [0..1] ?"
version="anySimpleType [1] ?">
<ortho:genes> ortho:genes </ortho:genes> [1]
</...>
Schema Component Representation
<xs:complexType name="database">
<xs:sequence>
<xs:element name="genes" type="ortho:genes"/>
</xs:sequence>
<xs:attribute name="geneLink" type="xs:anyURI"/>
<xs:attribute name="name" type="xs:string" use="required"/>
<xs:attribute name="protLink" type="xs:anyURI"/>
<xs:attribute name="transcriptLink" type="xs:anyURI"/>
<xs:attribute name="version" use="required"/>
</xs:complexType>
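
As an illustration of the link attributes, here is a sketch of a database element in which geneLink and protLink are URL prefixes. The host example.org, the database name and the identifiers are hypothetical. Under the "simplest case" reading of the geneLink documentation, concatenating the prefix with geneId would give http://example.org/gene/GENE0001, but the schema itself does not enforce any such convention.

```xml
<database xmlns="http://orthoXML.org/2011/"
          name="exampleDB" version="release-1"
          geneLink="http://example.org/gene/"
          protLink="http://example.org/protein/">
  <!-- version has no declared type, so any simple value is accepted. -->
  <genes>
    <gene id="1" geneId="GENE0001" protId="PROT0001"/>
  </genes>
</database>
```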

Complex Type: gene

Name gene
Abstract no
Documentation The gene element represents a single gene, protein or transcript. It is in fact a set of identifiers: one internal identifier, which is used to link from geneRef elements in ortholog clusters, plus gene, transcript and protein identifiers that identify the molecule. A more accurate name for this element would therefore be molecule. However, as the general purpose of orthoXML is to represent orthology data for genes, the term gene is used instead. Gene, protein and transcript identifiers are optional, but at least one of the three should be given. The source database of the gene is defined by the database element that contains the gene element, and the identifiers should stem from this source.
Attributes
geneId Identifier of the gene in the source database. Multiple splice forms are possible by having the same geneId more than once.
Type: xs:string Use: optional
id Internal identifier to link to the gene via the geneRef elements.
Type: xs:integer Use: required
protId Identifier of the protein in the source database.
Type: xs:string Use: optional
transcriptId Identifier of the transcript in the source database.
Type: xs:string Use: optional
XML Instance Representation
<...
geneId="xs:string [0..1] ?"
id="xs:integer [1] ?"
protId="xs:string [0..1] ?"
transcriptId="xs:string [0..1] ?"/>
Schema Component Representation
<xs:complexType name="gene">
<xs:attribute name="geneId" type="xs:string"/>
<xs:attribute name="id" type="xs:integer" use="required"/>
<xs:attribute name="protId" type="xs:string"/>
<xs:attribute name="transcriptId" type="xs:string"/>
</xs:complexType>
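
A short sketch of how two splice forms of one gene might be listed: both entries share the same geneId but carry distinct internal id values, since id is subject to the geneidKey constraint on the root element. All identifiers shown are hypothetical.

```xml
<genes xmlns="http://orthoXML.org/2011/">
  <!-- Same geneId twice: two splice forms of one gene. -->
  <gene id="1" geneId="GENE0001" transcriptId="TRANS0001.1" protId="PROT0001.1"/>
  <gene id="2" geneId="GENE0001" transcriptId="TRANS0001.2" protId="PROT0001.2"/>
</genes>
```

Note that the rule that at least one of geneId, protId and transcriptId should be present is stated only in the documentation; all three attributes are optional in the schema.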

Complex Type: geneRef

Name geneRef
Abstract no
Documentation The geneRef element is a link to the gene definition under the species element. It defines the members of an ortholog or paralog group. The same gene can be referenced multiple times. The geneRef element can have multiple score elements and one notes element as children. The notes element can for instance be used for special, ortholog-database-specific information (with InParanoid, for example, we could use it to mark the seed orthologs).
Attributes
id Internal identifier for a gene element defined under the species element.
Type: xs:integer Use: required
XML Instance Representation
<...
id="xs:integer [1] ?">
<ortho:score> ortho:score </ortho:score> [0..*]
<ortho:notes> ortho:notes </ortho:notes> [0..1]
</...>
Schema Component Representation
<xs:complexType name="geneRef">
<xs:sequence>
<xs:element name="score" type="ortho:score" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="notes" type="ortho:notes" minOccurs="0"/>
</xs:sequence>
<xs:attribute name="id" type="xs:integer" use="required"/>
</xs:complexType>
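
A sketch of a geneRef carrying a member-level score and a note. It assumes that a gene with id 1 and a scoreDef with the hypothetical id bootstrap are defined elsewhere in the same document. The note text is only one possible way to mark a seed ortholog, since the content of notes is not prescribed.

```xml
<geneRef xmlns="http://orthoXML.org/2011/" id="1">
  <!-- score elements must come before notes (sequence order). -->
  <score id="bootstrap" value="0.98"/>
  <notes>seed ortholog</notes>
</geneRef>
```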

Complex Type: genes

Name genes
Abstract no
Documentation A genes element represents a list of genes.
Attributes
XML Instance Representation
<...>
<ortho:gene> ortho:gene </ortho:gene> [1..*]
</...>
Schema Component Representation
<xs:complexType name="genes">
<xs:sequence>
<xs:element name="gene" type="ortho:gene" minOccurs="1" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>

Complex Type: group

Name group
Abstract no
Documentation A group of genes or nested groups. In the case of an orthologGroup element, all genes in the group or in the nested groups are orthologs of each other, i.e. they stem from the same gene in the last common ancestor of the species. In the case of a paralogGroup element, the genes are paralogs of each other. Subgroups within the group allow the representation of phylogenetic trees. For more details and examples see http://orthoxml.org/orthoxml_doc.html.
Attributes
id Identifier for the group in the context of the resource. This attribute is not required, but if your resource provides identifiers for the ortholog groups we strongly recommend using it, at least for the top-level groups.
Type: xs:string Use: optional
XML Instance Representation
<...
id="xs:string [0..1] ?">
<ortho:score> ortho:score </ortho:score> [0..*]
<ortho:property> ortho:property </ortho:property> [0..*]
Start Choice [2..*] ?
<ortho:geneRef> ortho:geneRef </ortho:geneRef> [1]
<ortho:paralogGroup> ortho:group </ortho:paralogGroup> [1]
<ortho:orthologGroup> ortho:group </ortho:orthologGroup> [1]
End Choice
<ortho:notes> ortho:notes </ortho:notes> [0..1]
</...>
Schema Component Representation
<xs:complexType name="group">
<xs:sequence>
<xs:element name="score" type="ortho:score" minOccurs="0" maxOccurs="unbounded"/>
<xs:element name="property" type="ortho:property" minOccurs="0" maxOccurs="unbounded"/>
<xs:choice minOccurs="2" maxOccurs="unbounded">
<xs:element name="geneRef" type="ortho:geneRef"/>
<xs:element name="paralogGroup" type="ortho:group"/>
<xs:element name="orthologGroup" type="ortho:group"/>
</xs:choice>
<xs:element name="notes" type="ortho:notes" minOccurs="0"/>
</xs:sequence>
<xs:attribute name="id" type="xs:string"/>
</xs:complexType>
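
To illustrate nesting, the sketch below encodes a small tree: gene 1 is orthologous to genes 2 and 3, which are paralogs of each other. The gene ids are assumed to be defined under species elements elsewhere, and the group ids fam1 and fam1.dup1 are hypothetical. Each group contains at least two members, as required by the choice with minOccurs="2".

```xml
<orthologGroup xmlns="http://orthoXML.org/2011/" id="fam1">
  <geneRef id="1"/>
  <!-- A nested group counts as one member of the enclosing group. -->
  <paralogGroup id="fam1.dup1">
    <geneRef id="2"/>
    <geneRef id="3"/>
  </paralogGroup>
</orthologGroup>
```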

Complex Type: groups

Name groups
Abstract no
Documentation Represents the list of ortholog groups. Note that the purpose of OrthoXML is to store orthology assignments; hence, only ortholog groups are allowed at the top level.
Attributes
XML Instance Representation
<...>
<ortho:orthologGroup> ortho:group </ortho:orthologGroup> [1..*]
</...>
Schema Component Representation
<xs:complexType name="groups">
<xs:sequence>
<xs:element name="orthologGroup" type="ortho:group" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>

Complex Type: notes

Name notes
Abstract no
Documentation The notes element is a special element that allows adding information that is not general enough to be part of the standard, i.e. something specific to a particular ortholog database or algorithm. Notes elements are not validated, so any child elements are legal. Notes elements can be children of the root element orthoXML, the species element, the orthologGroup element, the paralogGroup element, or the geneRef element.
Attributes
XML Instance Representation
<...>
<!-- Mixed content -->
Allow any elements from any namespace (skip validation). [0..*]
</...>
Schema Component Representation
<xs:complexType name="notes" mixed="true">
<xs:sequence>
<xs:any processContents="skip" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
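
Because the type is mixed and uses a wildcard with processContents="skip", a notes element may combine free text with elements from any namespace. The sketch below uses a made-up namespace http://example.org/hypothetical and made-up element names (runInfo, parameter); none of them are defined by OrthoXML.

```xml
<notes xmlns="http://orthoXML.org/2011/">
  Free text is allowed because the type is mixed.
  <!-- Child elements are skipped by the validator, whatever their namespace. -->
  <ex:runInfo xmlns:ex="http://example.org/hypothetical" algorithm="exampleAlgorithm">
    <ex:parameter name="cutoff">0.5</ex:parameter>
  </ex:runInfo>
</notes>
```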

Complex Type: property

Name property
Abstract no
Documentation Key-value pair for group annotations, for instance statistics about the group members.
Attributes
name The key of the key-value annotation pair.
Type: xs:string Use: required
value The value of the key-value annotation pair. Optional, to allow flag-like annotations.
Type: anySimpleType Use: optional
XML Instance Representation
<...
name="xs:string [1] ?"
value="anySimpleType [0..1] ?"/>
Schema Component Representation
<xs:complexType name="property">
<xs:attribute name="name" type="xs:string" use="required"/>
<xs:attribute name="value"/>
</xs:complexType>
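
A sketch of both usages inside a group: one property with a value and one flag-like property without. The property names memberCount and manuallyCurated are hypothetical, and the referenced genes are assumed to be defined elsewhere in the document.

```xml
<orthologGroup xmlns="http://orthoXML.org/2011/" id="fam2">
  <!-- property elements follow any score elements and precede the members. -->
  <property name="memberCount" value="2"/>
  <property name="manuallyCurated"/>
  <geneRef id="1"/>
  <geneRef id="2"/>
</orthologGroup>
```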

Complex Type: score

Name score
Abstract no
Documentation The score element gives the value of a score and links it to the scoreDef element, which defines the score. It can be a child of a group or a geneRef element to allow scoring on different levels.
Attributes
id An identifier linking to the scoreDef element, which defines the score.
Type: xs:NCName Use: required
value The actual value of the score. For instance a confidence score of a group member.
Type: xs:decimal Use: required
XML Instance Representation
<...
id="xs:NCName [1] ?"
value="xs:decimal [1] ?"/>
Schema Component Representation
<xs:complexType name="score">
<xs:attribute name="id" type="xs:NCName" use="required"/>
<xs:attribute name="value" type="xs:decimal" use="required"/>
</xs:complexType>

Complex Type: scoreDef

Name scoreDef
Abstract no
Documentation The scoreDef element defines a score. One of the concepts of orthoXML is to be as flexible as possible but still uniformly parsable. Part of this is to allow every ortholog resource to give its own types of scores for groups or group members, which is done using score elements. Score elements can be defined to apply to either groups or geneRefs. It is possible to define multiple scores.
Attributes
id An internal identifier to link to the scoreDef from a score element.
Type: xs:NCName Use: required
desc Description of the score.
Type: anySimpleType Use: required
XML Instance Representation
<...
id="xs:NCName [1] ?"
desc="anySimpleType [1] ?"/>
Schema Component Representation
<xs:complexType name="scoreDef">
<xs:attribute name="id" type="xs:NCName" use="required"/>
<xs:attribute name="desc" use="required"/>
</xs:complexType>

Complex Type: scores

Name scores
Abstract no
Documentation A list of score definitions.
Attributes
XML Instance Representation
<...>
<ortho:scoreDef> ortho:scoreDef </ortho:scoreDef> [1..*]
</...>
Schema Component Representation
<xs:complexType name="scores">
<xs:sequence>
<xs:element name="scoreDef" type="ortho:scoreDef" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>

Complex Type: species

Name species
Abstract no
Documentation The species element contains all sequences of one species.
Attributes
NCBITaxId The NCBI Taxonomy identifier of the species to identify it unambiguously.
Type: xs:integer Use: required
name The name of the species.
Type: anySimpleType Use: required
XML Instance Representation
<...
NCBITaxId="xs:integer [1] ?"
name="anySimpleType [1] ?">
<ortho:database> ortho:database </ortho:database> [1..*]
<ortho:notes> ortho:notes </ortho:notes> [0..1]
</...>
Schema Component Representation
<xs:complexType name="species">
<xs:sequence>
<xs:element name="database" type="ortho:database" minOccurs="1" maxOccurs="unbounded"/>
<xs:element name="notes" type="ortho:notes" minOccurs="0"/>
</xs:sequence>
<xs:attribute name="NCBITaxId" type="xs:integer" use="required"/>
<xs:attribute name="name" use="required"/>
</xs:complexType>