Request for full TreeFam protein family alignment data
Dear TreeFam Developers,
I hope this email finds you well. I am a researcher interested in conducting protein family analysis using the alignment data from TreeFam. However, when downloading the compressed alignment package from the TreeFam website, I found that the data is missing for many protein families. Additionally, the alignment files are sometimes unavailable from the download link on individual family pages (http://www.treefam.org/family/TF10634).
I would like to perform comprehensive analyses across all protein families in TreeFam. Is there a way for me to access or request the full set of alignment data for all families? I checked the documentation but did not see any mention of obtaining a complete dataset.
Your database has been extremely useful for my research so far. I would greatly appreciate it if you could point me in the right direction for accessing a full alignment dataset. Please let me know if there are any other steps I need to take or policies to follow.
Thank you for your consideration and for all the work you do in maintaining TreeFam. I look forward to hearing back from you.
Regards, Ian Chen
1 Posted by Thomas Walsh on 20 Jun, 2025 02:55 PM
Dear Ian,
Firstly, I'd like to apologise for the delay in our response. This app is no longer actively monitored, so for future TreeFam-related queries, please reach out to us via the EBI helpdesk ( https://www.ebi.ac.uk/about/contact/support/ ) with topic “TreeFam - database of animal gene trees”.
I will try to respond to different parts of your query separately.
> However, when downloading the compressed alignment package from the TreeFam website, I found that the data is missing for many protein families.
> Additionally, the alignment files are sometimes unavailable from the download link on individual family pages. (http://www.treefam.org/family/TF10634)
Could you share an example of a particular protein family for which the data is missing?
I cannot find a record of the family you specified, TF10634. Would you have any other information about this family, such as the identifier of one of its genes or proteins?
> I would like to perform comprehensive analyses across all protein families in TreeFam. Is there a way for me to access or request the full set of alignment data for all families? I checked the documentation but did not see any mention of obtaining a complete dataset.
The file treefam_family_data.tar.gz ( http://www.treefam.org/static/download/treefam_family_data.tar.gz ) contains alignment data for 15,321 of 15,736 TreeFam families.
The 415 remaining families are represented by supertrees; because their subtrees do not have individual TreeFam identifiers, they are not included in the treefam_family_data.tar.gz file.
I've attached a file (treefam_9_supertree_ids.txt) listing the accessions of the supertree families in TreeFam 9. These should be accessible via the TreeFam website (e.g. http://www.treefam.org/family/TF106341).
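To check which families are absent from a downloaded copy of the archive, and whether the supertree list accounts for them, something like the following Python sketch may help. It rests on assumptions that are not confirmed here: that family accessions appear somewhere in the archive's member names in the form "TF" followed by digits, and that treefam_9_supertree_ids.txt holds one accession per line. The function names and the list of expected families are hypothetical; you would supply your own list of the families you expected to find.

```python
import re
import tarfile

# Assumption: accessions look like "TF" followed by digits (e.g. TF106341)
# and appear in the member names of the archive.
FAMILY_ID = re.compile(r"TF\d+")


def family_ids_in_archive(archive_path):
    """Return the set of TreeFam-style accessions found in member names."""
    ids = set()
    with tarfile.open(archive_path, "r:gz") as archive:
        for name in archive.getnames():
            ids.update(FAMILY_ID.findall(name))
    return ids


def read_id_list(path):
    """Read one accession per line, skipping blank lines."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def unexplained_missing(expected_ids, archive_ids, supertree_ids):
    """Families absent from the archive that are not listed as supertrees."""
    return set(expected_ids) - set(archive_ids) - set(supertree_ids)
```

For example, calling family_ids_in_archive("treefam_family_data.tar.gz") and read_id_list("treefam_9_supertree_ids.txt"), then passing both results to unexplained_missing together with your own expected accessions, leaves only the families that neither the archive nor the supertree list covers. Those are the ones worth reporting back.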
We are working on updating access options for TreeFam data in the near future. Please let us know if you urgently need access to this data in the meantime.
Regards,
Thomas.