From: David Bird

Hi Paul,
I wouldn't get too hung up over "jurisdiction". The reality is that the C. elegans community is much larger and makes much greater use of gene nomenclature, so in practice it drives usage. There is an established nomenclature (the paper that Don Riddle and I had in J. Nematol. in 1994) which attempts to codify nomenclature for non-C. elegans species, and that has worked reasonably well, with a number of influential labs using the general guidelines for parasitic species. SON also has an ad hoc committee on nomenclature (of which I am chair), but that committee has been fairly quiet (although I get an occasional e-mail asking for advice on nomenclature; maybe 5/year). I agree that having "parallel" systems in elegans and briggsae is nuts (did this come from Dave Baillie's lab, perchance?), and that it needs to be redressed.

In general, the SON nomenclature copies the C. elegans rules as much as possible, with the major difference being how alleles are handled, because we don't have a reference strain for each species (i.e., an N2 equivalent). We assign gene names based on phenotype as much as possible (e.g., sec for secreted, col for collagen, etc.). Because there is no easy way to assign orthology between different nematode species (obviously C. elegans and C. briggsae will be the closest exception, and others will join as they are done), we don't worry about that. So col-1 from a particular parasite has no relationship to col-1 from another (other than that they both should be collagen genes). On the other hand, ama-1 from another nematode species almost certainly is THE orthologue of ama-1 in C. elegans. I stress that one can't be too worried about orthology (and I actually don't understand your philosophical point "There can multiple orthologs for a gene"; that is not possible).

As you also propose, we use a species identifier. This has proven to be our most difficult and contentious point. Our 2-letter code only allows for 676 species, and obviously there are millions. But it is worse than that. There are more than 60 species of Meloidogyne, and we can only accommodate 26 (and egos have been bruised over M. arenaria and M. artiellia).
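To make the arithmetic concrete, here is a rough sketch. It assumes the 2-letter code is simply the genus initial followed by the initial of the species name, which is my reading of how the codes are built; the function name default_code is made up for the illustration, and 60 is just the lower bound quoted above.

```python
import string

LETTERS = string.ascii_lowercase  # 26 letters

# All possible two-letter codes: one letter for the genus, one for the species.
total_codes = len(LETTERS) ** 2

# Codes available to a single genus such as Meloidogyne: "Ma" .. "Mz".
m_codes = ["M" + second for second in LETTERS]

# "More than 60" Meloidogyne species, so at least this many get no code.
meloidogyne_species = 60
without_code = meloidogyne_species - len(m_codes)


def default_code(binomial):
    """Assumed rule: genus initial (upper case) + species initial (lower case)."""
    genus, species = binomial.split()
    return genus[0].upper() + species[0].lower()


# Two species competing for the same code.
clash = default_code("Meloidogyne arenaria") == default_code("Meloidogyne artiellia")

print(total_codes, len(m_codes), without_code, clash)
```

Under that assumption, both species map to Ma, and at least 34 Meloidogyne species are left with no code at all.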

[As an aside, my prediction is that there will be up to 10 new nematode genomes in progress this time next year, including a number of parasites].

Specific comments

The proposal:

1.  Orthologs will be given the same name but with a species prefix.  For example, cb-tra-1 is the C. briggsae ortholog of C. elegans tra-1.  In some cases, there will be paralogs and some confusion; we expect this to be minor compared to the convenience of having orthologs having the same names.  For paralogs, a "dot and number" can be appended to distinguish paralogs,  e.g., hsp-16.1, hsp-16.2.


2. When a gene is identified in another species that belongs to a gene class with a clear equivalent in C. elegans, it should be given the same gene class name, but with a unique symbol as a postfix to the number.  The symbol will include one or more letters followed by a number. For example, C. briggsae genes could be dpy-cb1 OR dpy-B1 OR dpy-CAENORHABDITISBRIGGSAE000000001 etc.  The organism's community should decide on the exact implementation; this choice will be tracked by the CGC or WormBase.  A species prefix could be added but will be redundant, e.g., cb-dpy-cb1 OR cb-dpy-B1.  OR ce-dpy-1.

Agree that each community should decide on precise details. Either of the elegans/briggsae options seems OK. The species identifiers need to be set in stone somehow within and across communities. This is a major issue. At some level (certainly in WormBase), the full species binomial needs to be linked to each gene. I would support having WormBase as the central repository for the binomials and the accepted 2-letter abbreviations. Some abbreviations should be forever assigned to some species, and that list could reasonably be those species with full genomes done, in progress, or likely to be done soon (C. elegans, C. briggsae, Brugia malayi, Meloidogyne hapla, Heterodera glycines, etc.). Thus, Ce would never be used for any other species. But Ma might be. The downside is that it would not be immediately apparent what species Ma really relates to (it may be Meloidogyne arenaria, but it might be something else). As long as one can click on it and see the full binomial, that won't matter. And the individual researcher working on Meloidogyne arenaria will know what Ma means in their own context.
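As a rough illustration of what I mean by reserved versus reusable abbreviations, something like the following would do. This is only a sketch: the table, the function names, and the codes Bm, Mh and Hg are my own assumptions for the example, not an agreed list and not anything WormBase currently provides.

```python
# Hypothetical central list: codes permanently tied to one species.
RESERVED = {
    "Ce": "Caenorhabditis elegans",
    "Cb": "Caenorhabditis briggsae",
    "Bm": "Brugia malayi",         # code assumed for illustration
    "Mh": "Meloidogyne hapla",     # code assumed for illustration
    "Hg": "Heterodera glycines",   # code assumed for illustration
}


def register_local(local_codes, code, binomial):
    """Let a community assign a non-reserved code in its own table."""
    if code in RESERVED:
        raise ValueError(f"{code} is reserved for {RESERVED[code]}")
    local_codes[code] = binomial


def resolve(code, local_codes=None):
    """Return the full binomial; reserved codes always take precedence."""
    if code in RESERVED:
        return RESERVED[code]
    if local_codes is not None and code in local_codes:
        return local_codes[code]
    raise KeyError(f"unknown species code: {code}")


# One community's own table: here Ma means Meloidogyne arenaria.
root_knot_codes = {}
register_local(root_knot_codes, "Ma", "Meloidogyne arenaria")

print(resolve("Ce"))
print(resolve("Ma", root_knot_codes))
```

The point of the design is that Ce resolves the same way everywhere, whereas Ma only has a meaning relative to a particular community's table, and the full binomial is always one lookup away.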

3.  Gene classes with no equivalent in C. elegans or other species will be given unique three-letter-number names.  


4.  For alleles, strains, polymorphisms, rearrangements, transgenes, and other variants, unique numbers (unique across all species) will be assigned by the relevant laboratory using the standard C. elegans nomenclature.  In all cases, a species prefix can be used, but is redundant.  For example, "syIs802" is an integrated transgene in C. briggsae from the Sternberg laboratory; it could be referred to as cb-syIs802.  syIs802 will never be used for something else, especially a C. elegans transgene.

Concur. The SON system follows this idea (but in parallel, which is probably not a good idea). Will this require that non-C. elegans labs be registered in the C. elegans system? I think this should be encouraged.
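A small sketch of why the prefix is redundant under point 4: if the lab-assigned name is unique across all species, stripping the prefix loses nothing. The pattern below is my approximation of the naming form (an optional two-letter species prefix, a lab code of 1-3 lowercase letters, an optional type tag such as Is, then a number); it is not the official grammar, and the function name canonical is made up.

```python
import re

# Approximate pattern, assumed for illustration only.
VARIANT = re.compile(
    r"^(?:(?P<species>[a-z]{2})-)?"   # optional species prefix, e.g. "cb-"
    r"(?P<lab>[a-z]{1,3})"            # laboratory code, e.g. "sy"
    r"(?P<kind>[A-Z][a-z]?)?"         # optional type tag, e.g. "Is"
    r"(?P<number>\d+)$"               # number assigned by the laboratory
)


def canonical(name):
    """Drop any species prefix and return the bare variant name."""
    match = VARIANT.match(name)
    if match is None:
        raise ValueError(f"not a recognised variant name: {name}")
    return match.group("lab") + (match.group("kind") or "") + match.group("number")


# Both spellings identify the same transgene.
same = canonical("cb-syIs802") == canonical("syIs802")
print(canonical("cb-syIs802"), same)
```

A database keyed on the bare name would then treat cb-syIs802 and syIs802 as one record, which is what makes a single cross-species lab registry attractive.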

Existing gene classes used in other species could be retained (but retired) since there are not too many of them (e.g., cby, mip).

I concur.

Responsibility for the numbering of a gene class will reside with the assigning laboratory, unless transferred by them to WormBase and the Caenorhabditis Genetics Center.  (As in the present practice, in some cases, if desirable, a small block of numbers can be assigned to another laboratory.)

Concur. Labs generating large numbers of genes (mainly through genome projects) should establish special relationships with WormBase.