Issues with some TreeFam families (TFs) and the Perl API get_families.pl script
Hello,
I have found issues with the alignment files for some of the TreeFam families.
For example, if I use treefam_scan.pl to find the best-hit TF for the following sequence:
MQSRWWSCGIRLVTWTWIWSLAFLGAWCIPANEVNLLDSRSVMGDLGWVAYPKNGWEEIGEVDENYAPIH
TYQVCRVMEQNQNNWLHTNWILTEGAQRVFIELKFTLRDCNSLPGGVGTCKETFNMYYYETDGDEEEMEE
GEMRDGTGMTEEEDRAMKESRYIKIDTIAADESFTELDLGDRVMKLNTEVRDLGPLTRKGFYLAFQDLGA
CIALVSVRVFYKRCPFLVKSLAEFPDTIPGSEASQLVEVVGRCVNNSLPLYEPPRMHCSTEGEWLVPIGK
CVCQPGFEEINGSCQVCKVGFYRSLLESLACSKCPPHSVARQMGATACSCEDGYFKLDSDPSNMACTRPP
SAPRNAISNVNETSVFLEWSIPMDTGGRKDVRYNVICRQVLPDGRGLEECGPNVRFLPRRTGLSNTSVMV
ADLQSHTNYSFLLEAVNGVSDLAKGHAKQYVSLNVTTNQAAPSPVSVVRKGHTGKSSIALSWAEPDRPNG
IILEYEIKYFEKEQDSSYTIIKSKDTEMVVEGLKPSSAYIFQVRARTSAGYGAFSRRFEFQTSPYLTATS
ERAQASIVAVAITLALVLLAVVAGFLLSGRRCGYSKAKQDPEEEKMHFHNGHIKFPGVRTYIDPLTYEDP
NQAVHEFAQEIDVSYISIERIIGAGEFGEVCSGPLRLPGKREIQVAIKTLKAGYTEQQRRDFLWEASIMG
QFNHPNIIRLEGVVTKSKPVMIITEYMENGSLDTFLKKNDGQFTVIQLVGMLRGIASGMRYLSDMGYVHR
DLAARNILVNSNLVCKVSDFGLSRVLEDDPEAAYTTRGGKIPIRWTAPEAIAYRKFTSASDVWSYGIVMW
EVMSYGERPYWEMSNQDVIKAVEESYRLPGPMDCPEALYHLMMDCWQRERSNRPKFDEIVCLLDKFIRNP
SSLKKLVNSSHRVSNLLVEHASVEGNCSTQSQTVGEWLDSIKMGRYTELFMEGGYSSLETVAQMTSEDLR
RVGVNLAGHQKKIITSIQEMRVHMNSTNSTVNI*
I find TF314013 to be the best hit. However, when I search for the alignment and tree files in the bulk downloads (taken from here), I'm unable to find any of the relevant files for TF314013.
When I use the web API on that sequence, it again returns TF314013 as the best hit, but it is unable to add the sequence to the gene tree (consistent with a problem finding the alignment and/or gene tree for TF314013). Instead I get an error: "There was a Problem submitting the job. Try again"
I tried to use the Perl script get_families.pl to download the alignment and gene tree files for TF314013 and got an error:
DBD::mysql::st execute failed: Unknown column
'gtr.gene_align_id' in 'field list' at
/opt/local/ensembl-v70/ensembl-compara-release-70/modules/Bio/EnsEMBL/Compara/DBSQL/BaseAdaptor.pm
line 147.
If I use MySQL to show the columns of the gene_tree_root table in the treefam_production_9_69 database, there is no gene_align_id column.
Has anyone encountered similar issues, or can anyone recommend a solution? I am not very familiar with the Ensembl API, so I'm hopeful there is an easy fix, such as changing some of the parameters.
Thanks,
-Patrick
Support Staff 1 Posted by Matthieu Muffat... on 09 Jun, 2015 03:03 PM
Hi Patrick,
The TreeFam 9 database uses the database schema from Ensembl Compara 69, not 70. The e70 code expects a gene_align_id column, which was not present in e69.
Can you please select the "release/69" branch and try again?
Matthieu
2 Posted by Patrick McGrath on 09 Jun, 2015 04:05 PM
Thanks... I'll try that. When I originally ran the script using Ensembl 79,
I received an error:
For treefam_production_9_69 there is a difference in the software release
(79) and the database release (70). You should update one of these to
ensure that your script does not crash.
I assumed that meant I should use Ensembl Compara 70.
Support Staff 3 Posted by Matthieu Muffat... on 09 Jun, 2015 04:09 PM
That was a sensible decision :)
I've looked at the database and there is conflicting information in the table that reports the schema version. It both says 69 and 70.
v69 was used to produce the database, and should be used to read from it. I'll look into removing the bit that says 70. You can disregard the warning in the meantime.
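To illustrate the kind of conflict described here, the following is a minimal sketch and not the actual check performed on the database. It assumes the schema version lives in an Ensembl-style `meta` table as (meta_key, meta_value) pairs under the key `schema_version`; the rows below are hypothetical stand-ins, not copied from treefam_production_9_69.

```python
from typing import Iterable, Set, Tuple


def schema_versions(meta_rows: Iterable[Tuple[str, str]]) -> Set[int]:
    """Collect the distinct schema_version values from (meta_key, meta_value) rows."""
    return {int(value) for key, value in meta_rows if key == "schema_version"}


# Hypothetical rows mimicking the conflict described in this post.
rows = [
    ("schema_version", "69"),
    ("schema_version", "70"),
    ("schema_type", "compara"),
]

versions = schema_versions(rows)
if len(versions) > 1:
    # More than one value means the database reports conflicting versions.
    print("Conflicting schema versions:", sorted(versions))
```

On a real server the rows would come from a query against the `meta` table rather than a hard-coded list.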
Matthieu
4 Posted by Patrick McGrath on 27 Sep, 2017 01:19 AM
I was curious whether this issue was ever resolved: some of the TreeFam families identified by the treefam_scan.pl script (such as TF314013) do not have tree files available in the download section of this website.
I can use the MySQL database to get a Newick-format file for TF314013. However, it appears to lack support values for each node, unlike the tree files hosted on this web server. Is there a way to get phylogeny files that include node support values for TF314013 and the other missing families?
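One quick way to confirm whether an exported tree carries support values is to look for labels directly after a closing parenthesis. This is a rough sketch for plain Newick only: it assumes support values are written as internal node labels, and it does not handle NHX-style bracketed tags or quoted labels. The two example trees are made up for illustration.

```python
import re
from typing import List


def internal_node_labels(newick: str) -> List[str]:
    """Return the labels that directly follow a closing parenthesis (internal nodes)."""
    return re.findall(r"\)([^():,;\s\[\]]+)", newick)


# Made-up trees: the first labels its inner node with 95, the second does not.
with_support = "((A:0.1,B:0.2)95:0.05,C:0.3);"
without_support = "((A:0.1,B:0.2):0.05,C:0.3);"

print(internal_node_labels(with_support))
print(internal_node_labels(without_support))
```

An empty list for a tree with internal nodes suggests the export dropped the support values, or stored them in a form this simple pattern does not cover.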
Thanks....
Support Staff 5 Posted by Matthieu Muffat... on 02 Oct, 2017 01:19 PM
This is because of the way the data were generated for TF314013. The family (like some others) was too big to be built directly (alignment, trees, orthologues), so it had to be broken down into smaller pieces (sub-families). As a result, TF314013 doesn't have the same set of annotations as smaller families.
415 families are affected. They have the tree_type "supertree" in the database.
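For anyone wanting to list the affected families, here is a minimal sketch of the query, run against an in-memory SQLite stand-in so that it is self-contained. The real database is MySQL; the `tree_type` column and the `gene_tree_root` table are mentioned in this thread, but the `stable_id` column name, the "tree" value, and the second row are assumptions for illustration.

```python
import sqlite3

# In-memory stand-in for the gene_tree_root table (reduced to three columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gene_tree_root ("
    "root_id INTEGER PRIMARY KEY, tree_type TEXT, stable_id TEXT)"
)
conn.executemany(
    "INSERT INTO gene_tree_root (root_id, tree_type, stable_id) VALUES (?, ?, ?)",
    [
        (1, "supertree", "TF314013"),
        (2, "tree", "TF_SMALL_EXAMPLE"),  # hypothetical placeholder family
    ],
)

# Select the families that were split into sub-families.
QUERY = "SELECT stable_id FROM gene_tree_root WHERE tree_type = ? ORDER BY stable_id"
supertrees = [row[0] for row in conn.execute(QUERY, ("supertree",))]
print(supertrees)
```

Against the real database the same SELECT, with the literal 'supertree' in place of the placeholder, should return the affected families if the column names match.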
Matthieu
6 Posted by Patrick McGrath on 02 Oct, 2017 02:24 PM
Thanks.... So what should you do if hmmscan hits one of these trees?
How do you know which subfamily to use?
Support Staff 7 Posted by Matthieu Muffat... on 16 Oct, 2017 12:01 PM
With the Perl API, one can fetch a tree for TF314013 that contains all the subfamilies.
As it stands, the subfamilies are just a product of our own limitations when computing the alignments, trees, etc., and displaying them. The only family that ought to be used is TF314013.
Matthieu Muffato closed this discussion on 16 Jun, 2020 04:08 PM.