Compatibility with Ganon? #35

Closed
rjsorr opened this issue Feb 22, 2022 · 5 comments

@rjsorr
rjsorr commented Feb 22, 2022

Hi
I'm using Ganon (https://github.com/pirovc/ganon) to classify reads, and this program seems to be gaining popularity. As such, I'm wondering a) whether Recentrifuge will be updated to use Ganon's results directly, as it does for Centrifuge, CLARK, etc., and b) if not, which Ganon output would be most compatible as input to Recentrifuge?

@khyox khyox self-assigned this Feb 22, 2022
@khyox khyox added the question label Feb 22, 2022
@khyox
Owner
khyox commented Feb 22, 2022

Hi @rjsorr,
Thanks for the question about Ganon. I have reviewed the public information about Ganon and have been unable to find how it scores its classification assignments. One of the foundations of Recentrifuge is the score or confidence level attached to every result, which gives you a kind of uncertainty quantification. All the currently supported classifiers (Centrifuge, CLARK, Kraken...) provide a score for their classifications that Recentrifuge can use. I cannot find any classification score provided by Ganon, so Recentrifuge would not be able to support this classifier (or any other that does not provide a score for every classification).
I am closing the issue now, but please reopen it if you find or figure out how to get a classification score from Ganon's output. If there is none and you are a user of that software, you would probably want that feature in Ganon, and I'd suggest asking the authors to include it in a future version.

@khyox khyox closed this as completed Feb 22, 2022
@rjsorr
Author
rjsorr commented Feb 23, 2022

Hi @khyox,
I was in contact with the author of Ganon and got this reply:
Both the lca (--output-lca) and complete (--output-all) outputs from ganon classify report 3 fields: read id, target (taxid), and k-mer/minimizer count. The latter can be used as a score, but I'd have to match and add read length against id. I can send a copy of the output file if need be?

@khyox khyox reopened this Feb 23, 2022
@khyox khyox changed the title compatability with Ganon? Compatibility with Ganon? Feb 23, 2022
@khyox
Owner
khyox commented Feb 23, 2022

Thanks! As you have indicated in pirovc/ganon#198, the key is using Recentrifuge's generic classifier feature for parsing Ganon's output. Yes, please attach a file here (or the head of a file if it's too large) so that we can figure out the format string for the generic classifier.

@rjsorr
Author
rjsorr commented Feb 23, 2022

cheers @khyox,
here is the head of the file (the full file is 130 MB). This is 150PE data.

E00224:537:HK7G5CCXY:4:1101:1255:73035 104662 11
E00224:537:HK7G5CCXY:4:1101:1742:40091 307461 10
E00224:537:HK7G5CCXY:4:1101:1265:26642 1654883 9
E00224:537:HK7G5CCXY:4:1101:1742:47685 76775 34
E00224:537:HK7G5CCXY:4:1101:1265:30896 2032623 36
E00224:537:HK7G5CCXY:4:1101:1742:51131 1 7
E00224:537:HK7G5CCXY:4:1101:1265:42183 1 7
E00224:537:HK7G5CCXY:4:1101:1742:51342 1287275 8
E00224:537:HK7G5CCXY:4:1101:1265:48547 2657482 35
E00224:537:HK7G5CCXY:4:1101:1742:54295 1 7
E00224:537:HK7G5CCXY:4:1101:1265:59306 10256 12
E00224:537:HK7G5CCXY:4:1101:1742:56194 2032623 22
E00224:537:HK7G5CCXY:4:1101:1265:67287 1783272 7
E00224:537:HK7G5CCXY:4:1101:1742:59393 1 7
E00224:537:HK7G5CCXY:4:1101:1742:61890 249058 30
E00224:537:HK7G5CCXY:4:1101:1276:22686 1 7
E00224:537:HK7G5CCXY:4:1101:1753:10328 2053014 9
E00224:537:HK7G5CCXY:4:1101:1276:44943 2731360 10
E00224:537:HK7G5CCXY:4:1101:1753:25552 180230 11
E00224:537:HK7G5CCXY:4:1101:1276:49408 1 10

@khyox
Owner
khyox commented Mar 2, 2022

Thanks @rjsorr.

One of the required fields for Recentrifuge's generic parser is the read length, and Ganon's output does not provide it. You can still add a dummy value (or the real value if you know it, e.g. for short reads with a fixed length) using sed. For example, in bash, to add 200 nt as the length:

$ sed 's/$/ 200/' ganon.ssv.gen > ganon_rcf.ssv.gen

where ganon.ssv.gen would be the Ganon output (in my case, your example above) and ganon_rcf.ssv.gen the file for Recentrifuge to process:

E00224:537:HK7G5CCXY:4:1101:1255:73035 104662 11 200
E00224:537:HK7G5CCXY:4:1101:1742:40091 307461 10 200
E00224:537:HK7G5CCXY:4:1101:1265:26642 1654883 9 200
E00224:537:HK7G5CCXY:4:1101:1742:47685 76775 34 200
E00224:537:HK7G5CCXY:4:1101:1265:30896 2032623 36 200
E00224:537:HK7G5CCXY:4:1101:1742:51131 1 7 200
E00224:537:HK7G5CCXY:4:1101:1265:42183 1 7 200
E00224:537:HK7G5CCXY:4:1101:1742:51342 1287275 8 200
E00224:537:HK7G5CCXY:4:1101:1265:48547 2657482 35 200
E00224:537:HK7G5CCXY:4:1101:1742:54295 1 7 200
E00224:537:HK7G5CCXY:4:1101:1265:59306 10256 12 200
E00224:537:HK7G5CCXY:4:1101:1742:56194 2032623 22 200
E00224:537:HK7G5CCXY:4:1101:1265:67287 1783272 7 200
E00224:537:HK7G5CCXY:4:1101:1742:59393 1 7 200
E00224:537:HK7G5CCXY:4:1101:1742:61890 249058 30 200
E00224:537:HK7G5CCXY:4:1101:1276:22686 1 7 200
E00224:537:HK7G5CCXY:4:1101:1753:10328 2053014 9 200
E00224:537:HK7G5CCXY:4:1101:1276:44943 2731360 10 200
E00224:537:HK7G5CCXY:4:1101:1753:25552 180230 11 200
E00224:537:HK7G5CCXY:4:1101:1276:49408 1 10 200
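
If you would rather do this step in Python, or want to fill in real lengths when you have them, the following is a minimal sketch of the same idea. The function names and the optional `lengths` mapping (read id to length) are hypothetical helpers for illustration, not part of Recentrifuge or Ganon.

```python
from typing import Iterable, Iterator, Mapping, Optional


def add_length_column(
    lines: Iterable[str],
    default_length: int = 200,
    lengths: Optional[Mapping[str, int]] = None,
) -> Iterator[str]:
    """Yield each Ganon record with a read length appended as a 4th field.

    `lengths` is an optional (hypothetical) mapping of read id -> real length;
    reads missing from it fall back to `default_length`.
    """
    for line in lines:
        record = line.rstrip("\n")
        if not record.strip():
            continue  # skip blank lines
        read_id = record.split()[0]
        if lengths is not None:
            length = lengths.get(read_id, default_length)
        else:
            length = default_length
        yield f"{record} {length}"


def convert(src: str, dst: str, default_length: int = 200) -> None:
    """Write `dst` as a copy of `src` with the length column added.

    The destination is opened in "w" mode, so it is overwritten, not appended to.
    """
    with open(src) as fin, open(dst, "w") as fout:
        for record in add_length_column(fin, default_length):
            fout.write(record + "\n")
```

For instance, `convert("ganon.ssv.gen", "ganon_rcf.ssv.gen")` would produce a file like the one shown above, assuming the input is the three-column Ganon output.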

The other required field is the code/character that Ganon uses in the TID column when a read is unclassified. That case did not appear in your example. Maybe Ganon simply does not list reads that are not classified. I have used an * for this field, since that code is used by other known classifiers, but you can change it if/when needed.
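
One way to check whether your full file contains any such code is to look for values in the TID column that are not plain integers. This is only a sketch under the assumption that an unclassified marker, if Ganon writes one at all, would be non-numeric; `non_numeric_tids` is a hypothetical helper name.

```python
from collections import Counter
from typing import Iterable


def non_numeric_tids(lines: Iterable[str], tid_column: int = 2) -> Counter:
    """Count values in the (1-based) TID column that are not plain integers."""
    odd = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < tid_column:
            continue  # skip blank or truncated lines
        tid = fields[tid_column - 1]
        if not tid.isdigit():
            odd[tid] += 1
    return odd
```

Running it over the file, e.g. `non_numeric_tids(open("ganon_rcf.ssv.gen"))`, and getting an empty counter would suggest that every listed read has a numeric taxid, so the UNC value would never be matched for that file.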

With all the above, the format string for Ganon would be "TYP:ssv,TID:2,SCO:3,LEN:4,UNC:*", and an example of using it with the rcf script would be:

rcf -n /your/recentrifuge/taxdump/ -g ganon_rcf.ssv.gen --format "TYP:ssv,TID:2,SCO:3,LEN:4,UNC:*" -d -s GENERIC

and you should get the expected output after rcf's header:

INFO: Debugging mode activated
INFO: Active parameters:
	nodespath = /your/recentrifuge/taxdump/
	format = TYP:ssv,TID:2,SCO:3,LEN:4,UNC:*
	generic = ['ganon_rcf.ssv.gen']
	extra = FULL
	controls = 0
	scoring = GENERIC
	summary = add
	debug = True
INFO: Generic format = TYP:SSV, TID:2, LEN:4, SCO:3, UNC:*.
Loading NCBI nodes... OK! 
Loading NCBI names... OK! 
Building dict of parent to children taxa... OK! 

Please, wait, processing files in parallel...

Processing sample ganon_rcf.ssv.gen ...
Loading output file ganon_rcf.ssv.gen... OK!
  Seqs read: 20	[4.00 knt]
  Seqs clas: 20	(0.00% unclassified)
  Seqs pass: 20	(0.00% rejected)
  Scores: min = 7.0, max = 36.0, avr = 14.4
  Read length: min = 200 nt, max = 200 nt, avr = 200 nt
  TaxIds: by classifier = 14, by filter = 14
Building from raw data with mintaxa = 1 ... 
  Building ontology tree with all-in-1... OK! 
  Check for more seqs lost ([in/ex]clude affects)... OK!
  Checking taxid loss (orphans)... OK!
  Assess accumulation due to "folding the tree"...
  INFO: No migration! OK!
ganon_rcf.ssv sample OK!
Load elapsed time: 0.00185 sec


Building the taxonomy multiple tree... OK!
Generating final plot (ganon_rcf.ssv.gen.rcf.html)... OK!
INFO: Saving extra output as an Excel file.
Generating Excel full summary (ganon_rcf.ssv.gen.rcf.xlsx)... OK!
Total elapsed time: 00:00:09

Process finished with exit code 0
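
To make the format string less opaque, here is a small sketch of how such a string can be read and applied to the example records. It is not Recentrifuge's actual parser: it assumes that the column numbers are 1-based and that "ssv" means whitespace-separated values, and the function names are made up for illustration. The taxid/score pairs are columns 2 and 3 of the 20 example lines above; the read ids are replaced by placeholders.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

FORMAT = "TYP:ssv,TID:2,SCO:3,LEN:4,UNC:*"

# (taxid, score) pairs taken from columns 2 and 3 of the example above.
PAIRS = [
    (104662, 11), (307461, 10), (1654883, 9), (76775, 34), (2032623, 36),
    (1, 7), (1, 7), (1287275, 8), (2657482, 35), (1, 7),
    (10256, 12), (2032623, 22), (1783272, 7), (1, 7), (249058, 30),
    (1, 7), (2053014, 9), (2731360, 10), (180230, 11), (1, 10),
]
# Placeholder read ids (read0, read1, ...) and the dummy length of 200 nt.
LINES = [f"read{i} {tid} {sco} 200" for i, (tid, sco) in enumerate(PAIRS)]


def parse_format(fmt: str) -> Dict[str, str]:
    """Split 'KEY:value,KEY:value,...' into a dict of strings."""
    return dict(item.split(":", 1) for item in fmt.split(","))


def parse_line(line: str, spec: Dict[str, str]) -> Optional[Tuple[str, float, int]]:
    """Return (taxid, score, length), or None if the read is unclassified."""
    fields = line.split()  # assumption: ssv = whitespace-separated
    tid = fields[int(spec["TID"]) - 1]  # assumption: 1-based columns
    if tid == spec["UNC"]:
        return None
    score = float(fields[int(spec["SCO"]) - 1])
    length = int(fields[int(spec["LEN"]) - 1])
    return tid, score, length


def summarize(lines: List[str], fmt: str = FORMAT) -> Dict[str, float]:
    """Compute a few of the statistics that rcf prints for a sample."""
    spec = parse_format(fmt)
    records = [rec for rec in (parse_line(ln, spec) for ln in lines) if rec]
    scores = [sco for _, sco, _ in records]
    return {
        "seqs": len(records),
        "min": min(scores),
        "max": max(scores),
        "mean": mean(scores),
        "taxids": len({tid for tid, _, _ in records}),
        "total_nt": sum(length for _, _, length in records),
    }


if __name__ == "__main__":
    print(summarize(LINES))
```

For the 20 example records this gives 20 sequences, 4000 nt in total, scores between 7 and 36 with a mean of 14.45, and 14 distinct taxids, which is consistent with the figures in the rcf log above.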

A screenshot of the resulting html file:

image
