Number of Proteins in TreeFam
Hi,
the TreeFam help page states that TreeFam v9 is based on 2243919 protein sequences. In the alignments and trees I downloaded, however, I can only find 1088507 unique proteins. I realize that orphans, and proteins without homologs in at least two other species, will not be included in these datasets, but 1155412 proteins (2243919 - 1088507) being orphans or having homologs in only one other species seems like quite a lot. My questions are: are these numbers correct, and are there other potential causes that could lead to proteins being left out of the alignments/trees?
Best wishes,
Andy
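For anyone wanting to reproduce this kind of count, here is a minimal sketch of how unique proteins could be tallied across alignment files. It assumes the alignments are FASTA-formatted and that the protein ID is the first whitespace-delimited token on each header line; the function names `fasta_ids` and `unique_ids` and the toy records are hypothetical and not part of any TreeFam tooling. Only the two totals at the end come from the post above.

```python
from typing import Iterable, Set


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Collect sequence IDs from FASTA header lines.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    """
    ids: Set[str] = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.add(tokens[0])
    return ids


def unique_ids(alignments: Iterable[Iterable[str]]) -> Set[str]:
    """Union of sequence IDs over several alignments (each an iterable of lines)."""
    seen: Set[str] = set()
    for lines in alignments:
        seen |= fasta_ids(lines)
    return seen


# Toy alignments (made-up records); P2 appears in both and is counted once.
aln_a = [">P1 human", "MKT-", ">P2 mouse", "MKTA"]
aln_b = [">P2 mouse", "MKTA", ">P3 fly", "M-TA"]
found = unique_ids([aln_a, aln_b])
print(len(found))

# Figures quoted in the post above.
total_reported = 2243919
found_in_files = 1088507
missing = total_reported - found_in_files
print(missing, f"{100 * missing / total_reported:.1f}% of the reported total")
```

With real data, each alignment could be passed as an open file handle, since iterating over a file yields its lines. If the same protein appears under different ID formats in different files, the IDs would need to be normalized before counting.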
1 Posted by Fabian on 22 Jul, 2013 03:03 PM
Dear Andy,
thanks for your message and sorry for the late reply.
You are right, the number of orphan genes is quite high.
The reason is that some of those genes belong in a family
but are not currently assigned to one.
We are looking into ways of building many new families in an automated way.
For now, though, this number is high, and we expect to reduce it soon.
Hope that answers your question.
Cheers,
Fabian
Fabian closed this discussion on 22 Jul, 2013 03:03 PM.