About geo.npy file

Posted: Mon Dec 04, 2023 3:06 pm
by sawcheet
After running DRfold on my sequence, the geo.npy output file contains arrays named pp, cc, nn, pcc, pnn, cnn, pccp, pnnp, and cnnc.
I don't know what these terms mean, and I could not find any online resource that explains them. Can anyone help me understand what they are, and what the long lists of scores in these arrays represent when plotted between nucleotides?

Re: About geo.npy file

Posted: Mon Dec 04, 2023 6:56 pm
by dr.jawairia
ATACAAGAGATGTGAGAAGCACCATAAAAGGCGTTGTGAGGAGTTGTGGGGGAGTGAGGGAGAGAAGAGG
TTGAAAAGCTTATTAGCTGCTGTACGGTAAAACTCCTTCTTTCTGCAACATGGGGAAGAACAAACTCCTT
CATCCAAGTCTGGTTCTTCTCCTCTTGGTCCTCCTGCCCACAGACGCCTCAGTCTCTGGAAAACCGCAGT
ATATGGTTCTGGTCCCCTCCCTGCTCCACACTGAGACCACTGAGAAGGGCTGTGTCCTTCTGAGCTACCT
GAATGAGACAGTGACTGTAAGTGCTTCCTTGGAGTCTGTCAGGGGAAACAGGAGCCTCTTCACTGACCTG
GAGGCGGAGAATGACGTACTCCACTGTGTCGCCTTCGCTGTCCCAAAGTCTTCATCCAATGAGGAGGTAA
TGTTCCTCACTGTCCAAGTGAAAGGACCAACCCAAGAATTTAAGAAGCGGACCACAGTGATGGTTAAGAA
CGAGGACAGTCTGGTCTTTGTCCAGACAGACAAATCAATCTACAAACCAGGGCAGACAGTGAAATTTCGT
GTTGTCTCCATGGATGAAAACTTTCACCCCCTGAATGAGTTGATTCCACTAGTATACATTCAGGATCCCA
AAGGAAATCGCATCGCACAATGGCAGAGTTTCCAGTTAGAGGGTGGCCTCAAGCAATTTTCTTTTCCCCT
CTCATCAGAGCCCTTCCAGGGCTCCTACAAGGTGGTGGTACAGAAGAAATCAGGTGGAAGGACAGAGCAC
CCTTTCACCGTGGAGGAATTTGTTCTTCCCAAGTTTGAAGTACAAGTAACAGTGCCAAAGATAATCACCA
TCTTGGAAGAAGAGATGAATGTATCAGTGTGTGGCCTATACACATATGGGAAGCCTGTCCCTGGACATGT
GACTGTGAGCATTTGCAGAAAGTATAGTGACGCTTCCGACTGCCACGGTGAAGATTCACAGGCTTTCTGT
GAGAAATTCAGTGGACAGCTAAACAGCCATGGCTGCTTCTATCAGCAAGTAAAAACCAAGGTCTTCCAGC
TGAAGAGGAAGGAGTATGAAATGAAACTTCACACTGAGGCCCAGATCCAAGAAGAAGGAACAGTGGTGGA
ATTGACTGGAAGGCAGTCCAGTGAAATCACAAGAACCATAACCAAACTCTCATTTGTGAAAGTGGACTCA
CACTTTCGACAGGGAATTCCCTTCTTTGGGCAGGTGCGCCTAGTAGATGGGAAAGGCGTCCCTATACCAA
ATAAAGTCATATTCATCAGAGGAAATGAAGCAAACTATTACTCCAATGCTACCACGGATGAGCATGGCCT
TGTACAGTTCTCTATCAACACCACCAATGTTATGGGTACCTCTCTTACTGTTAGGGTCAATTACAAGGAT
CGTAGTCCCTGTTACGGCTACCAGTGGGTGTCAGAAGAACACGAAGAGGCACATCACACTGCTTATCTTG
TGTTCTCCCCAAGCAAGAGCTTTGTCCACCTTGAGCCCATGTCTCATGAACTACCCTGTGGCCATACTCA
GACAGTCCAGGCACATTATATTCTGAATGGAGGCACCCTGCTGGGGCTGAAGAAGCTCTCCTTCTATTAT
CTGATAATGGCAAAGGGAGGCATTGTCCGAACTGGGACTCATGGACTGCTTGTGAAGCAGGAAGACATGA
AGGGCCATTTTTCCATCTCAATCCCTGTGAAGTCAGACATTGCTCCTGTCGCTCGGTTGCTCATCTATGC
TGTTTTACCTACCGGGGACGTGATTGGGGATTCTGCAAAATATGATGTTGAAAATTGTCTGGCCAACAAG
GTGGATTTGAGCTTCAGCCCATCACAAAGTCTCCCAGCCTCACACGCCCACCTGCGAGTCACAGCGGCTC
CTCAGTCCGTCTGCGCCCTCCGTGCTGTGGACCAAAGCGTGCTGCTCATGAAGCCTGATGCTGAGCTCTC
GGCGTCCTCGGTTTACAACCTGCTACCAGAAAAGGACCTCACTGGCTTCCCTGGGCCTTTGAATGACCAG
GACAATGAAGACTGCATCAATCGTCATAATGTCTATATTAATGGAATCACATATACTCCAGTATCAAGTA
CAAATGAAAAGGATATGTACAGCTTCCTAGAGGACATGGGCTTAAAGGCATTCACCAACTCAAAGATTCG
TAAACCCAAAATGTGTCCACAGCTTCAACAGTATGAAATGCATGGACCTGAAGGTCTACGTGTAGGTTTT
TATGAGTCAGATGTAATGGGAAGAGGCCATGCACGCCTGGTGCATGTTGAAGAGCCTCACACGGAGACCG
TACGAAAGTACTTCCCTGAGACATGGATCTGGGATTTGGTGGTGGTAAACTCAGCAGGTGTGGCTGAGGT
AGGAGTAACAGTCCCTGACACCATCACCGAGTGGAAGGCAGGGGCCTTCTGCCTGTCTGAAGATGCTGGA
CTTGGTATCTCTTCCACTGCCTCTCTCCGAGCCTTCCAGCCCTTCTTTGTGGAGCTCACAATGCCTTACT
CTGTGATTCGTGGAGAGGCCTTCACACTCAAGGCCACGGTCCTAAACTACCTTCCCAAATGCATCCGGGT
CAGTGTGCAGCTGGAAGCCTCTCCCGCCTTCCTAGCTGTCCCAGTGGAGAAGGAACAAGCGCCTCACTGC
ATCTGTGCAAACGGGCGGCAAACTGTGTCCTGGGCAGTAACCCCAAAGTCATTAGGAAATGTGAATTTCA
CTGTGAGCGCAGAGGCACTAGAGTCTCAAGAGCTGTGTGGGACTGAGGTGCCTTCAGTTCCTGAACACGG
AAGGAAAGACACAGTCATCAAGCCTCTGTTGGTTGAACCTGAAGGACTAGAGAAGGAAACAACATTCAAC
TCCCTACTTTGTCCATCAGGTGGTGAGGTTTCTGAAGAATTATCCCTGAAACTGCCACCAAATGTGGTAG
AAGAATCTGCCCGAGCTTCTGTCTCAGTTTTGGGAGACATATTAGGCTCTGCCATGCAAAACACACAAAA
TCTTCTCCAGATGCCCTATGGCTGTGGAGAGCAGAATATGGTCCTCTTTGCTCCTAACATCTATGTACTG
GATTATCTAAATGAAACACAGCAGCTTACTCCAGAGATCAAGTCCAAGGCCATTGGCTATCTCAACACTG
GTTACCAGAGACAGTTGAACTACAAACACTATGATGGCTCCTACAGCACCTTTGGGGAGCGATATGGCAG
GAACCAGGGCAACACCTGGCTCACAGCCTTTGTTCTGAAGACTTTTGCCCAAGCTCGAGCCTACATCTTC
ATCGATGAAGCACACATTACCCAAGCCCTCATATGGCTCTCCCAGAGGCAGAAGGACAATGGCTGTTTCA
GGAGCTCTGGGTCACTGCTCAACAATGCCATAAAGGGAGGAGTAGAAGATGAAGTGACCCTCTCCGCCTA
TATCACCATCGCCCTTCTGGAGATTCCTCTCACAGTCACTCACCCTGTTGTCCGCAATGCCCTGTTTTGC
CTGGAGTCAGCCTGGAAGACAGCACAAGAAGGGGACCATGGCAGCCATGTATATACCAAAGCACTGCTGG
CCTATGCTTTTGCCCTGGCAGGTAACCAGGACAAGAGGAAGGAAGTACTCAAGTCACTTAATGAGGAAGC
TGTGAAGAAAGACAACTCTGTCCATTGGGAGCGCCCTCAGAAACCCAAGGCACCAGTGGGGCATTTTTAC
GAACCCCAGGCTCCCTCTGCTGAGGTGGAGATGACATCCTATGTGCTCCTCGCTTATCTCACGGCCCAGC
CAGCCCCAACCTCGGAGGACCTGACCTCTGCAACCAACATCGTGAAGTGGATCACGAAGCAGCAGAATGC
CCAGGGCGGTTTCTCCTCCACCCAGGACACAGTGGTGGCTCTCCATGCTCTGTCCAAATATGGAGCAGCC
ACATTTACCAGGACTGGGAAGGCTGCACAGGTGACTATCCAGTCTTCAGGGACATTTTCCAGCAAATTCC
AAGTGGACAACAACAACCGCCTGTTACTGCAGCAGGTCTCATTGCCAGAGCTGCCTGGGGAATACAGCAT
GAAAGTGACAGGAGAAGGATGTGTCTACCTCCAGACATCCTTGAAATACAATATTCTCCCAGAAAAGGAA
GAGTTCCCCTTTGCTTTAGGAGTGCAGACTCTGCCTCAAACTTGTGATGAACCCAAAGCCCACACCAGCT
TCCAAATCTCCCTAAGTGTCAGTTACACAGGGAGCCGCTCTGCCTCCAACATGGCGATCGTTGATGTGAA
GATGGTCTCTGGCTTCATTCCCCTGAAGCCAACAGTGAAAATGCTTGAAAGATCTAACCATGTGAGCCGG
ACAGAAGTCAGCAGCAACCATGTCTTGATTTACCTTGATAAGGTGTCAAATCAGACACTGAGCTTGTTCT
TCACGGTTCTGCAAGATGTCCCAGTAAGAGATCTGAAACCAGCCATAGTGAAAGTCTATGATTACTACGA
GACGGATGAGTTTGCAATTGCTGAGTACAATGCTCCTTGCAGCAAAGATCTTGGAAATGCTTGAAGACCA
CAAGGCTGAAAAGTGCTTTGCTGGAGTCCTGTTCTCAGAGCTCCACAGAAGACACGTGTTTTTGTATCTT
TAAAGACTTGATGAATAAACACTTTTTCTGGTCAATGTC

Re: About geo.npy file

Posted: Tue Dec 05, 2023 11:09 am
by liyangum
sawcheet wrote: Mon Dec 04, 2023 3:06 pm After obtaining the output of the sequence using DRfold, in geo.npy file there are array with name: pp,cc, nn, pcc, pnn, cnn, pccp,pnnp,cnnc.
I don't know what these terms mean, and I could not find any online resource stating what these are. Can anyone help me on this and help me understand what are the long list of the scores that we get in these array while plotting between the nucleotides.
Hi,

Detailed descriptions of the terms in geo.npy can be found in the section "Prediction terms of geometry models" of the DRfold paper (https://www.nature.com/articles/s41467-023-41303-9).
Normally, for RNA structure prediction, you can simply ignore geo.npy, since our program has already incorporated these terms into the final prediction. However, if you want to analyze the distances or orientations, you can refer to the code from line 180 at https://github.com/leeyang/DRfold/blob/ ... ld/Fold.py to see how those geometry predictions are used.

Thanks,
Yang LI
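
For anyone who wants a first look at the file before reading Fold.py, here is a minimal sketch of how one might list the arrays and their shapes. It assumes geo.npy stores a pickled Python dict keyed by the names from the question (pp, cc, nn, ...); that layout is an assumption, not something confirmed in this thread, so check it against Fold.py. The helper summarize_geo and the toy data are hypothetical and only there so the example runs without DRfold output.

```python
import os
import tempfile

import numpy as np

# Array names listed in the question; their meaning is defined in the paper.
KEYS = ["pp", "cc", "nn", "pcc", "pnn", "cnn", "pccp", "pnnp", "cnnc"]


def summarize_geo(path):
    """Return {array name: shape} for a geo.npy-style file.

    Assumption: the file holds a pickled dict of arrays (hypothetical layout).
    """
    loaded = np.load(path, allow_pickle=True)
    # A dict written with np.save comes back as a 0-d object array;
    # .item() unwraps it into the original dict.
    if loaded.dtype == object and loaded.shape == ():
        geo = loaded.item()
    else:
        geo = loaded
    if not isinstance(geo, dict):
        raise TypeError(f"expected a dict of arrays, got {type(geo).__name__}")
    return {name: np.asarray(arr).shape for name, arr in geo.items()}


# Synthetic stand-in for real output: toy length and bin count, both made up.
SEQ_LEN, BINS = 5, 4
rng = np.random.default_rng(0)
fake_geo = {key: rng.random((SEQ_LEN, SEQ_LEN, BINS)) for key in KEYS}
demo_path = os.path.join(tempfile.mkdtemp(), "geo.npy")
np.save(demo_path, fake_geo)

shapes = summarize_geo(demo_path)
for name, shape in shapes.items():
    print(name, shape)
```

To try it on a real run, pass the path of your own geo.npy to summarize_geo. The real arrays may well have different shapes per term than the uniform toy shape used here, and only load files you trust, since allow_pickle=True can execute code stored in the file.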