Request for full TreeFam protein family alignment data

Posted by 3170103839 on 02 Apr, 2024 02:23 AM

Dear TreeFam Developers,

I hope this email finds you well. I am a researcher interested in conducting protein family analysis using the alignment data from TreeFam. However, when downloading the compressed alignment package from the TreeFam website, I found that the data is missing for many protein families. Additionally, the alignment files are sometimes unavailable from the download link on individual family pages. (http://www.treefam.org/family/TF10634)

I would like to perform comprehensive analyses across all protein families in TreeFam. Is there a way for me to access or request the full set of alignment data for all families? I checked the documentation but did not see any mention of obtaining a complete dataset.

Your database has been extremely useful for my research so far. I would greatly appreciate it if you could point me in the right direction for accessing a full alignment dataset. Please let me know if there are any other steps I need to take or policies to follow.

Thank you for your consideration and for all the work you do in maintaining TreeFam. I look forward to hearing back from you.

Regards, Ian Chen

    Posted by Thomas Walsh on 20 Jun, 2025 02:55 PM

    Dear Ian,

    Firstly, I'd like to apologise for the delay in our response. This app is no longer actively monitored, so for future TreeFam-related queries, please reach out to us via the EBI helpdesk ( https://www.ebi.ac.uk/about/contact/support/ ) with topic “TreeFam - database of animal gene trees”.

    I will try to respond to different parts of your query separately.

    > However, when downloading the compressed alignment package from the TreeFam website, I found that the data is missing for many protein families.
    > Additionally, the alignment files are sometimes unavailable from the download link on individual family pages. (http://www.treefam.org/family/TF10634)

    Could you share an example of a particular protein family for which the data is missing?

    I cannot find a record of the family you specified, TF10634. Would you have any other information about this family, such as the identifier of one of its genes or proteins?

    > I would like to perform comprehensive analyses across all protein families in TreeFam. Is there a way for me to access or request the full set of alignment data for all families? I checked the documentation but did not see any mention of obtaining a complete dataset.

    The file treefam_family_data.tar.gz ( http://www.treefam.org/static/download/treefam_family_data.tar.gz ) contains alignment data for 15,321 of 15,736 TreeFam families.

    The 415 remaining families are represented by supertrees; because their subtrees do not have individual TreeFam identifiers, they are not included in the treefam_family_data.tar.gz file.

    I've attached a file (treefam_9_supertree_ids.txt) listing the accessions of the supertree families in TreeFam 9. These should be accessible via the TreeFam website (e.g. http://www.treefam.org/family/TF106341).
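To check locally that the archive and the supertree list together cover every family, a minimal Python sketch along these lines may help. It rests on assumptions not confirmed in this thread: that each family's TF accession appears somewhere in its member path inside treefam_family_data.tar.gz, and that treefam_9_supertree_ids.txt holds one accession per line. The helper names (accessions_in_names, archive_accessions, read_supertree_ids, coverage_report) are hypothetical.

```python
import re
import tarfile
from typing import Iterable

# Assumption: accessions look like "TF" followed by digits (e.g. TF106341)
# and appear in archive member paths; the real archive layout may differ.
ACCESSION_RE = re.compile(r"TF\d+")


def accessions_in_names(names: Iterable[str]) -> set[str]:
    """Collect every TF accession mentioned in a list of file/member names."""
    found: set[str] = set()
    for name in names:
        found.update(ACCESSION_RE.findall(name))
    return found


def archive_accessions(tar_path: str) -> set[str]:
    """List accessions present in a local copy of treefam_family_data.tar.gz."""
    with tarfile.open(tar_path, "r:gz") as tar:
        return accessions_in_names(tar.getnames())


def read_supertree_ids(path: str) -> set[str]:
    """Assumption: the supertree list file holds one accession per line."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def coverage_report(archive_ids: set[str], supertree_ids: set[str], total: int) -> dict[str, int]:
    """Count archive families, supertree families, their overlap, and any gap."""
    return {
        "in_archive": len(archive_ids),
        "supertrees": len(supertree_ids),
        "overlap": len(archive_ids & supertree_ids),
        "unaccounted": total - len(archive_ids | supertree_ids),
    }
```

If the figures above hold (15,321 families in the archive plus 415 supertree families, 15,736 in total), calling coverage_report(archive_accessions("treefam_family_data.tar.gz"), read_supertree_ids("treefam_9_supertree_ids.txt"), 15736) would be expected to report an overlap of 0 and 0 unaccounted families.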

    We are working on updating access options for TreeFam data in the near future. Please let us know if you urgently need access to this data in the meantime.

    Regards,

    Thomas.
