Number of Proteins in TreeFam

Andreas Schüler

14 Jul, 2013 03:51 PM

The TreeFam help page states that TreeFam v9 is based on 2243919 protein sequences. In the alignments and trees I downloaded, however, I can find only 1088507 unique proteins. I realize that orphans and proteins without homologs in at least two other species are not included in these datasets, but 1155412 proteins being orphans or having a homolog in only one other species seems like a lot. My questions are: are these numbers correct, and are there other reasons why proteins might be left out of the alignments and trees?
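For reference, here is a minimal sketch of how a count like this could be reproduced. It assumes the downloaded alignments are FASTA files whose header lines start with `>` followed by the protein ID; the directory argument, the `*.fa` pattern, and the function names are hypothetical and not taken from the TreeFam documentation. The two totals are the figures quoted above.

```python
from pathlib import Path
from typing import Iterable, Set


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Collect sequence IDs from FASTA header lines (those starting with '>')."""
    ids: Set[str] = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip a bare '>' with no ID after it
                ids.add(fields[0])
    return ids


def unique_ids_in_dir(directory: str, pattern: str = "*.fa") -> Set[str]:
    """Union of IDs over all files matching `pattern` (assumed file layout)."""
    found: Set[str] = set()
    for path in Path(directory).glob(pattern):
        with path.open() as handle:
            found |= fasta_ids(handle)
    return found


TOTAL_IN_RELEASE = 2243919    # figure quoted from the help page
FOUND_IN_DOWNLOADS = 1088507  # unique proteins counted in the downloads

missing = TOTAL_IN_RELEASE - FOUND_IN_DOWNLOADS
print(missing)                              # 1155412
print(f"{missing / TOTAL_IN_RELEASE:.1%}")  # 51.5%
```

Using a set means a protein that appears in more than one file is counted once, which is what "unique proteins" refers to here.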
Best wishes,

  1. Posted by Fabian on 22 Jul, 2013 03:03 PM

    Dear Andy,
    thanks for your message and sorry for the late reply.
    You are right, the number of orphan genes is quite high.

    The answer is that some of those genes belong in a family
    but are not currently assigned to one.
    We are looking into ways of building many new families automatically.

    For now the number is high, but we expect to reduce it soon.

    Hope that answers your question.


  2. Fabian closed this discussion on 22 Jul, 2013 03:03 PM.
