Pipeline flow

CD-box snoRNAs

1. Split the input

First, split the unmapped input sequences into manageable chunks.

Split fasta file into batches

usage: rg_split_fasta [-h] [-v] [--input INPUT] [--output-dir OUTPUT_DIR]
                      [--batch-size BATCH_SIZE] [--prefix PREFIX]
                      [--suffix SUFFIX]
Options:
-v=False, --verbose=False
 Be loud!
--input=sys.stdin
 Input file in fasta format. Defaults to sys.stdin.
--output-dir=./
 Output directory for split files. Defaults to .
--batch-size=100
 Batch size to split, defaults to 100
--prefix=part_ Prefix to file name, defaults to part_
--suffix=inputfasta
 Suffix (extension) to the file name, defaults to inputfasta
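
The batching step can be pictured with a short sketch. It is not the rg_split_fasta implementation: the helper names and the file naming scheme (prefix, running number, dot, suffix) are assumptions made for illustration, and only the default values come from the options above.

```python
from itertools import islice


def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def batches(records, batch_size=100):
    """Group records into lists of at most batch_size items."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def batch_file_name(index, prefix="part_", suffix="inputfasta"):
    # Hypothetical naming scheme; the real script may number files differently.
    return "{}{}.{}".format(prefix, index, suffix)


example = [">r1", "ACGT", ">r2", "GG", "CC", ">r3", "TTTT"]
for i, batch in enumerate(batches(read_fasta(example), batch_size=2), start=1):
    print(batch_file_name(i), [name for name, _ in batch])
```

With a batch size of 2, the three example records end up in two batches, the second one holding the single remaining record.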

2. Generate various files from snoRNAs

i. Make FASTA

Generate fasta file from snoRNA input

usage: rg_generate_fasta [-h] [-v] --input INPUT [--output OUTPUT] --type
                         {CD,HACA} [--switch-boxes]
Options:
-v=False, --verbose=False
 Be loud!
--input Input file in tab format.
--output Output file in fasta format.
--type

Type of snoRNA

Possible choices: CD, HACA

--switch-boxes=False
 If the CD box is located wrongly it will try to relabel it

ii. Generate separate files

Generate fasta files for PLEXY from snoRNA input

usage: rg_generate_input_for_plexy_or_rnasnoop [-h] [-v] --input INPUT --type
                                               {CD,HACA} [--dir DIR]
                                               [--switch-boxes]
Options:
-v=False, --verbose=False
 Be loud!
--input Input file in tab format.
--type

Type of snoRNA. If CD is chosen an input for PLEXY will be generated. If HACA is chosen two stems for RNASnoop will be saved.

Possible choices: CD, HACA

--dir=Input Directory to put output , defaults to Plexy
--switch-boxes=False
 If the CD box is located wrongly it will try to relabel it

iii. Make BED

Generate BED file from snoRNA input

usage: rg_generate_snoRNA_bed [-h] [-v] --input INPUT [--output OUTPUT] --type
                              {CD,HACA} [--switch-boxes]
Options:
-v=False, --verbose=False
 Be loud!
--input Input file in tab format.
--output Output file in BED format.
--type

Type of snoRNA

Possible choices: CD, HACA

--switch-boxes=False
 If the CD box is located wrongly it will try to relabel it

3. Annotate with snoRNAs

Annotate input BED file used for generation of clusters with snoRNAs.

Annotate bed file with another bed file containing annotations

usage: rg_annotate_bed [-h] [-v] --input FILE [--output FILE] --annotations
                       FILE [--fraction FLOAT] [--placeholder STRING]
                       [--un_stranded] [--filter-by FILTER_BY]
Options:
-v=0, --verbose=0
 Print more verbose messages for each additional verbose level.
--input a bed file that you want to annotate
--output=output.tab
 an output table with annotations
--annotations a bed file with annotations
--fraction=0.25
 Fraction of read that must overlap the feature to be accepted
--placeholder=.
 A placeholder for empty annotations
--un_stranded=False
 Pass if your protocol is un-stranded
--filter-by Filter by this (comma-separated) list of annotation types

########################## FILE DESCRIPTION ###################################################

BED FILE WITH ANNOTATIONS EXAMPLE
1 24740163 24740215 miRNA:ENST00000003583 0 -
1 24727808 24727946 miRNA:ENST00000003583 0 -
1 24710391 24710493 miRNA:ENST00000003583 0 -

fields: chr start end annot_type:annot_name num strand

INPUT BED FILE EXAMPLE
1 24685109 24687340 ENST00000003583 0 -
1 24687531 24696163 ENST00000003583 0 -
1 24696329 24700191 ENST00000003583 0 -

########################## FILE DESCRIPTION ###################################################
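
The --fraction and --un_stranded options suggest an acceptance rule along the following lines. This is a minimal sketch under assumptions: the function names are hypothetical, intervals are treated as half-open BED coordinates, and a read exactly at the threshold is accepted, which the option text does not specify.

```python
def overlap_fraction(read_start, read_end, feat_start, feat_end):
    """Fraction of the read (half-open interval) covered by the feature."""
    overlap = min(read_end, feat_end) - max(read_start, feat_start)
    read_length = read_end - read_start
    if overlap <= 0 or read_length <= 0:
        return 0.0
    return overlap / read_length


def is_annotated(read, feature, fraction=0.25, stranded=True):
    """read and feature are (chrom, start, end, strand) tuples."""
    read_chrom, read_start, read_end, read_strand = read
    feat_chrom, feat_start, feat_end, feat_strand = feature
    if read_chrom != feat_chrom:
        return False
    # For a stranded protocol the read must lie on the feature's strand.
    if stranded and read_strand != feat_strand:
        return False
    covered = overlap_fraction(read_start, read_end, feat_start, feat_end)
    return covered >= fraction


# Made-up coordinates: 10 of the 40 read bases overlap the feature.
read = ("1", 100, 140, "-")
feature = ("1", 130, 200, "-")
print(overlap_fraction(100, 140, 130, 200), is_annotated(read, feature))
```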

4. Calculate snoRNA expression

Based on the annotations, calculate RPKM values for each snoRNA and filter out all snoRNAs that fall below the given quantile.

RPKM = (10^9 * C)/(N * L)

where:
C = Number of reads mapped to a gene
N = Total mapped reads in the experiment (library size)
L = Length of the feature (in this case snoRNA length)
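
As a worked example of the formula, the following sketch computes the RPKM for made-up numbers (500 reads, a library of 10 million mapped reads, a 100 nt snoRNA). The function name is chosen for illustration only.

```python
def rpkm(read_count, library_size, feature_length):
    """RPKM = (10^9 * C) / (N * L)."""
    return (1e9 * read_count) / (library_size * feature_length)


# C = 500 reads, N = 10,000,000 mapped reads, L = 100 nt
value = rpkm(read_count=500, library_size=10_000_000, feature_length=100)
print(value)  # 500.0
```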

usage: rg_calculate_snoRNA_RPKM [-h] [-v] --input INPUT [--output OUTPUT]
                                --library LIBRARY --snoRNAs SNORNAS
                                [--quantile QUANTILE] [--type {CD,HACA}]
Options:
-v=False, --verbose=False
 Be loud!
--input Part of the library that is annotated as snoRNA
--output Output file in tab format.
--library Library from which the annotations were generated (in bed format)
--snoRNAs BED file with snoRNAs
--quantile=0.25
 Quantile for the expression cut-off, defaults to 0.25
--type=CD

Type of snoRNA, defaults to CD

Possible choices: CD, HACA
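
The quantile cut-off can be sketched as follows. The exact quantile definition used by rg_calculate_snoRNA_RPKM is not documented here, so this sketch assumes linear interpolation between order statistics and keeps values equal to the cut-off; the snoRNA names and RPKM values are invented.

```python
def quantile(values, q):
    """Quantile with linear interpolation between neighbouring order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("cannot take a quantile of an empty list")
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def filter_by_quantile(rpkms, q=0.25):
    """Keep snoRNAs whose RPKM is at or above the q-th quantile."""
    cutoff = quantile(list(rpkms.values()), q)
    return {name: value for name, value in rpkms.items() if value >= cutoff}


rpkms = {"snoR_A": 1.0, "snoR_B": 2.0, "snoR_C": 3.0, "snoR_D": 4.0, "snoR_E": 5.0}
print(sorted(filter_by_quantile(rpkms)))
```

With the default quantile of 0.25 the cut-off for these five values is 2.0, so only snoR_A is dropped.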

5. Prepare anchors

Prepare anchor sequences from provided fasta

usage: rg_prepare_anchors [-h] [-v] [--fasta-to-anchor FASTA_TO_ANCHOR]
                          [--anchor-length ANCHOR_LENGTH] [--output OUTPUT]
                          --expressed-snoRNAs EXPRESSED_SNORNAS
Options:
-v=False, --verbose=False
 Be loud!
--fasta-to-anchor
 Fasta to anchor
--anchor-length=12
 Anchor length, defaults to 12
--output Output file name
--expressed-snoRNAs
 A list with expressed snoRNAs with RPKMs in form of: snoR_ID RPKM
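
How anchors are chosen is not described above. One plausible reading, given that the later search step looks up "associated sequences" for each anchor found in a read, is an index from every substring of length --anchor-length to the sequences that contain it. The sketch below illustrates that idea only; the function names, the data and the indexing strategy are assumptions, not the behaviour of rg_prepare_anchors.

```python
from collections import defaultdict


def build_anchor_index(sequences, anchor_length=12):
    """Map every substring of length anchor_length to the IDs containing it."""
    index = defaultdict(set)
    for seq_id, seq in sequences.items():
        for i in range(len(seq) - anchor_length + 1):
            index[seq[i:i + anchor_length]].add(seq_id)
    return index


def sequences_for_read(read, index, anchor_length=12):
    """Return the IDs of all sequences sharing at least one anchor with the read."""
    hits = set()
    for i in range(len(read) - anchor_length + 1):
        hits |= index.get(read[i:i + anchor_length], set())
    return hits


# Invented sequences; real input would be the expressed snoRNAs.
sequences = {"snoA": "ACGTACGTACGTAA", "snoB": "TTTTTTTTTTTTTT"}
index = build_anchor_index(sequences)
print(sequences_for_read("GGGACGTACGTACGTGG", index))
```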

6. Build Bowtie2 index

i. Cluster reads

Cluster reads into a more convenient bed file

usage: rg_cluster_reads [-h] [-v] --input INPUT [--bed]
                        [--cluster-size CLUSTER_SIZE] [--overlap OVERLAP]
                        [--expand-cluster EXPAND_CLUSTER]
                        [--expand-read EXPAND_READ] [--output OUTPUT]
                        [--asmbed] [--rRNAs RRNAS] [--tRNAs TRNAS]
                        [--snRNAs SNRNAS]
                        [--filter-by FILTER_BY | --filter-except FILTER_EXCEPT]
Options:
-v=False, --verbose=False
 Be loud!
--input Input file in special asmbed format or in bed format
--bed=False Specifies if the input file is in bed format
--cluster-size=1
 Number of reads necessary for a group to be considered a cluster. eg. 2 returns all groups with 2 or more overlapping reads, defaults to 1
--overlap=-1 Distance in base pairs for two reads to be in the same cluster. For instance, 20 would group all reads within 20 bp of each other. A negative number means overlap, e.g. -10 means reads must overlap by at least 10 bp; defaults to -1
--expand-cluster=0
 Expand cluster in both directions, defaults to 0
--expand-read=15
 Expand read in both directions (an alternative to expanding the cluster), defaults to 15
--output=output.bed
 Output file in bed format, defaults to output.bed
--asmbed=False Write in asmbed format for fasta extraction
--rRNAs rRNAs to add in the end of the clusters
--tRNAs tRNAs to add in the end of the clusters
--snRNAs snRNAs to add in the end of the clusters
--filter-by Keep only reads with these tags in read_ids. Input is a comma-separated list of tags
--filter-except
 Keep all reads except those with these tags in read_ids. Input is a comma-separated list of tags
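
The interplay of --overlap and --cluster-size can be sketched for reads on a single chromosome and strand. This is an illustration under assumptions (half-open coordinates, a hypothetical function name, clusters represented as dictionaries), not the code of rg_cluster_reads.

```python
def cluster_intervals(intervals, overlap=-1, cluster_size=1):
    """Group (start, end) intervals from one chromosome and strand.

    A read joins the current cluster when the gap between its start and
    the cluster end is at most `overlap`. A negative value therefore
    requires the read to overlap the cluster by at least that many bases.
    """
    clusters = []
    for start, end in sorted(intervals):
        if clusters and start - clusters[-1]["end"] <= overlap:
            cluster = clusters[-1]
            cluster["end"] = max(cluster["end"], end)
            cluster["reads"] += 1
        else:
            clusters.append({"start": start, "end": end, "reads": 1})
    # Drop groups that do not reach the required number of reads.
    return [c for c in clusters if c["reads"] >= cluster_size]


reads = [(100, 120), (115, 140), (300, 320)]
print(cluster_intervals(reads))
print(cluster_intervals(reads, cluster_size=2))
```

The first two reads overlap by 5 bases and form one cluster; the third stays alone and disappears once at least two reads per cluster are required.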

ii. Make FASTA

Prepare FASTA file from clustered reads

Given bed file extract sequences according to chromosome and strand and save it as additional column in input file or fasta

usage: rg_extract_sequences [-h] [-v] [--input INPUT] [--output OUTPUT]
                            [--format {bed,fasta}]
                            [--sequence-length SEQUENCE_LENGTH] --genome-dir
                            GENOME_DIR [--window-left WINDOW_LEFT]
                            [--window-right WINDOW_RIGHT]
                            [--adjust-coordinates]
Options:
-v=False, --verbose=False
 Be loud!
--input=sys.stdin
 Input file in Bed format. Defaults to stdin
--output=sys.stdout
 Output file in Bed format. Defaults to stdout
--format=bed

Output format, defaults to bed

Possible choices: bed, fasta

--sequence-length
 Final length of sequence to extract independently of coordinates.
--genome-dir Directory where the fasta sequences with all the chromosomes are stored
--window-left=0
 Add nucleotides to the left (upstream). This option does not work if sequence-length is specified, defaults to 0
--window-right=0
 Add nucleotides to the right (downstream). This option does not work if sequence-length is specified, defaults to 0
--adjust-coordinates=False
 Adjust coordinates to new values dictated by windows length, defaults to False
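
The effect of the window options and of the strand can be sketched as below. The chromosome is a plain string here and loading it from --genome-dir is left out. Treating "left" as upstream relative to the strand is an assumption based on the option text, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTUNacgtun", "TGCAANtgcaan")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract(chrom_seq, start, end, strand, window_left=0, window_right=0):
    """Extract a BED interval (0-based, half-open), widened by the windows.

    Assumption: window_left extends upstream and window_right downstream
    relative to the strand of the feature.
    """
    if strand == "+":
        new_start, new_end = start - window_left, end + window_right
    else:
        new_start, new_end = start - window_right, end + window_left
    # Clip to the chromosome boundaries.
    new_start = max(new_start, 0)
    new_end = min(new_end, len(chrom_seq))
    seq = chrom_seq[new_start:new_end]
    if strand == "-":
        seq = reverse_complement(seq)
    return new_start, new_end, seq


chrom = "AAACCCGGGTTT"
print(extract(chrom, 3, 6, "+", window_left=1))
print(extract(chrom, 3, 6, "-", window_left=1))
```

The returned start and end correspond to what --adjust-coordinates would report if coordinates are rewritten to match the widened interval.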

iii. Build index

The index is built with the following command:

bowtie2-build input.fa path/to/index/bowtie_index 2> /dev/null

7. Run analysis

An analysis is run for each part produced by the split in the first task.

i. Search anchors

For each read in the file, check whether it contains an anchor sequence; if so, make a local alignment (Smith-Waterman, SW) against each associated sequence. For each read, keep only the sequence with the best score.

usage: rg_search_anchor_and_make_alignments [-h] [-v] [--anchors ANCHORS]
                                            [--anchor-sequences ANCHOR_SEQUENCES]
                                            [--reads READS] [--match MATCH]
                                            [--mismatch MISMATCH]
                                            [--gap-open GAP_OPEN]
                                            [--gap-extend GAP_EXTEND]
                                            [--output OUTPUT] [--RNase-T1]
Options:
-v=False, --verbose=False
 Be loud!
--anchors File with anchors (tab-separated)
--anchor-sequences
 Sequences from which anchors were generated
--reads File with reads
--match=2 Match score, defaults to 2
--mismatch=-5 Mismatch penalty, defaults to -5
--gap-open=-6 Open gap penalty, defaults to -6
--gap-extend=-4
 Gap extension penalty, defaults to -4
--output Output table
--RNase-T1=False
 Indicates if in the experiment RNase T1 was used
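
To make the scoring parameters concrete, here is a minimal sketch of a Smith-Waterman score with affine gaps (Gotoh recurrences) that uses the defaults listed above. It returns only the best score, not the alignment, and it assumes that the first gapped position costs gap_open and every further one gap_extend; the convention used by rg_search_anchor_and_make_alignments may differ.

```python
def smith_waterman_score(a, b, match=2, mismatch=-5, gap_open=-6, gap_extend=-4):
    """Best local alignment score of a and b with affine gap penalties."""
    neg_inf = float("-inf")
    cols = len(b) + 1
    # h_*: best score of an alignment ending at (i, j), never below zero.
    # f_*: best score ending with a gap in b; e: ending with a gap in a.
    h_prev = [0] * cols
    f_prev = [neg_inf] * cols
    best = 0
    for i in range(1, len(a) + 1):
        h_curr = [0] * cols
        f_curr = [neg_inf] * cols
        e = neg_inf
        for j in range(1, cols):
            e = max(h_curr[j - 1] + gap_open, e + gap_extend)
            f_curr[j] = max(h_prev[j] + gap_open, f_prev[j] + gap_extend)
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = h_prev[j - 1] + substitution
            h_curr[j] = max(0, diagonal, e, f_curr[j])
            best = max(best, h_curr[j])
        h_prev, f_prev = h_curr, f_curr
    return best


# A 12 nt stretch embedded in a longer read aligns perfectly: 12 * 2 = 24.
print(smith_waterman_score("GGGACGTACGTACGTCCC", "ACGTACGTACGT"))
# One inserted base: 5 matches, one gap opening, 5 matches = 10 - 6 + 10 = 14.
print(smith_waterman_score("AAAAACCCCC", "AAAAAGCCCCC"))
```

With a mismatch penalty of -5 against a match score of 2, a single mismatch cancels more than two matches, so the alignments stay short and nearly exact.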

ii. Make statistics

This is a set of two tasks:
  1. Merging the files from the anchor search
  2. Making statistics with the following script:

Make statistic, prepare plots and evaluate thresholds

usage: rg_make_stats_for_search [-h] [-v] --input INPUT [--output OUTPUT]
                                [--dir DIR] [--length LENGTH] [--fpr FPR]
Options:
-v=False, --verbose=False
 Be loud!
--input Input file in tab format.
--output Output file in tab format.
--dir=Plots Directory to store the plots , defaults to Plots
--length=15 Threshold for length of the target site, defaults to 15
--fpr=0.05 False positive rate threshold, defaults to 0.05

iii. Convert to FASTA

Convert output table from alignment search into fasta

usage: rg_convert_tab_to_fasta [-h] [-v] [--input INPUT] [--output OUTPUT]
                               [--stats STATS] [--length LENGTH]
                               [--assign-score-threshold] [--filter-ambiguous]
                               [--five-prime-adapter FIVE_PRIME_ADAPTER]
                               [--three-prime-adapter THREE_PRIME_ADAPTER]
                               [--five-prime-adapter-threshold FIVE_PRIME_ADAPTER_THRESHOLD]
                               [--three-prime-adapter-threshold THREE_PRIME_ADAPTER_THRESHOLD]
Options:
-v=False, --verbose=False
 Be loud!
--input Input table
--output Output fasta file
--stats Undocumented
--length=15 Length of the target site to keep, defaults to 15
--assign-score-threshold=False
 Undocumented
--filter-ambiguous=False
 Filter reads that can be assigned to more than one snoRNA
--five-prime-adapter
 Five prime adapter sequence used in experiment - will be used to remove reads that are similar
--three-prime-adapter
 Three prime adapter sequence used in experiment - will be used to remove reads that are similar
--five-prime-adapter-threshold=0.8
 Threshold of the identity to the 5’ adapter, defaults to 0.8
--three-prime-adapter-threshold=0.8
 Threshold of the identity to the 3’ adapter, defaults to 0.8

iv. Map reads

Map the target parts to the clusters with the following command:

bowtie2 -x ./index/bowtie_index -f -D100 -L 13 -i C,1 --local -k 10 -U input.anchorfasta -S output.sam

v. Convert result to BED

Convert the mapping result into a BED file with the following command:

samtools view -S input.sam -b -u | bamToBed -tag AS | grep -P "\t\+" > output
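
The final grep keeps only lines that contain a tab followed by a plus sign, which in bamToBed output selects alignments on the plus strand. A rough Python equivalent is sketched below; it is slightly stricter than the grep because it checks the strand column (column 6) explicitly. The function name and the example lines are made up.

```python
def keep_plus_strand(bed_lines):
    """Yield BED lines whose strand column (column 6) is '+'."""
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 6 and fields[5] == "+":
            yield line


lines = [
    "cluster_1\t10\t30\tread_a\t40\t+\n",
    "cluster_1\t50\t70\tread_b\t38\t-\n",
]
print(list(keep_plus_strand(lines)))
```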

vi. Filter BED

Filter bed file based on the alignment score/number of reads in cluster/number of mutations

usage: rg_filter_bed [-h] [-v] --input INPUT --output OUTPUT
                     [--filter-multimappers]
Options:
-v=False, --verbose=False
 Be loud!
--input Input bed file with special fields
--output Output file
--filter-multimappers=False
 Filter chimeras that can be mapped to multiple places in the genome (with the exception of mapping to canonical targets)

vi. Reassign chromosome

From the BED file produced by the FilterBed step, get the positions of the found target sites in terms of real chromosomes rather than clusters.

usage: rg_get_true_chromosome_positions [-h] [-v] [--input INPUT]
                                        [--output OUTPUT]
Options:
-v=False, --verbose=False
 Be loud!
--input=sys.stdin
 Input file in special bed format. Defaults to sys.stdin.
--output=sys.stdout
 Output file in special bed format. Defaults to sys.stdout.

vii. Append sequence

This step uses the same script as the FASTA extraction for the Bowtie2 index (rg_extract_sequences).

viii. Calculate PLEXY

RNA5-8S5|NR_003285.2 15 30 SNORD16 1 +
RNA5-8S5|NR_003285.2 86 105 SNORD16 1 +
RNA28S5|NR_003287.2 1563 1582 SNORD56B 1 +

SNORD50A|chr7|+|57640816|57640830|20|20 SNORD50A TCATGCTTTGTGTTGTGAAGACCGCCTGGGACTACCGGGCAGGGTGTAGTAGGCA
SNORD50A|chr7|+|68527467|68527482|20|20 SNORD50A ACTGAAGAAATTCAGTGAAATGCGGGTAAACGGCGGGAGTAACTATGACTCTCTTA
SNORD50A|chr7|+|68527638|68527654|20|20 SNORD50A AATCAGCGGGGAAAGAAGACCCTGTTGAGTTTGACTCTAGTCTGGCATGGTGAAGAG

usage: rg_check_hybrids_with_plexy [-h] [-v] --input INPUT --output OUTPUT
                                   [--snoRNA-paths SNORNA_PATHS]
                                   [--plexy-tmp PLEXY_TMP] --plexy-bin
                                   PLEXY_BIN
Options:
-v=False, --verbose=False
 Be loud!
--input Input file in tab format.
--output Output file in tab format.
--snoRNA-paths=./Plexy/
 Path to snoRNAs with Plexy, defaults to ./Plexy/
--plexy-tmp=temp/
 Plexy temporary directory, defaults to temp/
--plexy-bin Path to PLEXY binary

ix. Calculate RNAduplex

RNA5-8S5|NR_003285.2 15 30 SNORD16 1 +
RNA5-8S5|NR_003285.2 86 105 SNORD16 1 +
RNA28S5|NR_003287.2 1563 1582 SNORD56B 1 +

SNORD50A|chr7|+|57640816|57640830|20|20 SNORD50A TCATGCTTTGTGTTGTGAAGACCGCCTGGGACTACCGGGCAGGGTGTAGTAGGCA
SNORD50A|chr7|+|68527467|68527482|20|20 SNORD50A ACTGAAGAAATTCAGTGAAATGCGGGTAAACGGCGGGAGTAACTATGACTCTCTTA
SNORD50A|chr7|+|68527638|68527654|20|20 SNORD50A AATCAGCGGGGAAAGAAGACCCTGTTGAGTTTGACTCTAGTCTGGCATGGTGAAGAG

usage: rg_check_hybrids_with_rnaduplex [-h] [-v] --input INPUT --output OUTPUT
                                       [--snoRNA-paths SNORNA_PATHS]
                                       [--RNAduplex-bin RNADUPLEX_BIN]
Options:
-v=False, --verbose=False
 Be loud!
--input Input file in tab format.
--output Output file in tab format.
--snoRNA-paths=./Plexy/
 Path to snoRNAs with Plexy, defaults to ./Plexy/
--RNAduplex-bin=RNAduplex
 Path to RNAduplex binary, defaults to RNAduplex

8. Analyse RNAduplex results

RNAduplex and PLEXY results undergo slightly different analyses.

i. Merge results

Nothing to add

ii. Cluster results

Cluster results according to the position of the hit and the miRNA. The input file looks like this:

chr6 99846856 99846871 2628039_1-Unique-1:hsa-miR-129-3p:8 30 -
chr3 30733346 30733368 2630171_1-Unique-1:hsa-miR-93:N 36 +
chr17 3627403 3627417 2632714_1-Unique-1:hsa-miR-186:N 28 +
chr17 3627403 3627417 2639898_1-Unique-1:hsa-miR-16:N 28 +

usage: rg_cluster_results [-h] [-v] --input INPUT [--output OUTPUT]
                          [--cluster-size CLUSTER_SIZE] [--overlap OVERLAP]
Options:
-v=False, --verbose=False
 Be loud!
--input Input table file in bed like format
--output=output.tab
 Output table, defaults to output.tab
--cluster-size=1
 Number of reads necessary for a group to be considered a cluster. eg. 2 returns all groups with 2 or more overlapping reads, defaults to 1
--overlap=-40 Distance in base pairs for two reads to be in the same cluster. For instance, 20 would group all reads within 20 bp of each other. A negative number means overlap, e.g. -10 means reads must overlap by at least 10 bp; defaults to -40

iii. Annotate results

Annotate found snoRNA target sites

usage: rg_annotate_positions [-h] [-v] --input INPUT [--output OUTPUT]
                             --regions REGIONS --genes GENES
                             [--snoRNAs SNORNAS] --repeats REPEATS
Options:
-v=False, --verbose=False
 Be loud!
--input Input file in tab format.
--output Output file in tab format.
--regions GFF file with annotations for different gene regions eg. UTRs
--genes Positions of all genes in GFF format
--snoRNAs GFF file with annotations for snoRNAs in the same format as genes file
--repeats GTF file with annotations for repeats in the format from rmsk table in UCSC

iv. Make statistics

Make some useful plots for RNAduplex results

usage: rg_make_plots_for_rnaduplex [-h] [-v] --input INPUT --snoRNAs SNORNAS
                                   --type {CD,HACA} [--dir DIR]
                                   [--threshold THRESHOLD]
Options:
-v=False, --verbose=False
 Be loud!
--input Input file in TAB
--snoRNAs Table with snoRNAs
--type

Type of snoRNA

Possible choices: CD, HACA

--dir=Plots Directory to store plots, defaults to Plots
--threshold=-25.0
 Threshold for RNAduplex energy, defaults to -25.0

9. Analyse PLEXY

i. Merge results

cat output/*.scorebed > results_with_score.tab

ii. Merge raw results

cat output/*.truechrombed > raw_reds_results.tab

iii. Append RPKM

Append RPKM values to the PLEXY predictions

usage: rg_add_rpkm_to_score [-h] [-v] --input INPUT [--output OUTPUT] --rpkm
                            RPKM --annotated-reads ANNOTATED_READS
                            [--type {CD,HACA}]
Options:
-v=False, --verbose=False
 Be loud!
--input Input file in tab format.
--output Output file in tab format.
--rpkm File with rpkms of snoRNAs
--annotated-reads
 Mapped reads annotated as snoRNAs
--type=CD

Type of snoRNAs, defaults to CD

Possible choices: CD, HACA

iv. Aggregate results by site

Divide PLEXY output into positive and negative sets

usage: rg_aggregate_scored_results [-h] [-v] --input INPUT [--output OUTPUT]
                                   [--threshold THRESHOLD] [--type {CD,HACA}]
Options:
-v=False, --verbose=False
 Be loud!
--input Input file in Tab format.
--output Output file in Tab format.
--threshold=-1.0
 Threshold for the site, defaults to -1.0
--type=CD

Type of snoRNA, defaults to CD

Possible choices: CD, HACA

v. Calculate features

For each site, calculate the features: accessibility and flank composition. The PLEXY score is already calculated.

vi. Calculate probability

Calculate probability of snoRNA methylation being functional

usage: rg_calculate_probability [-h] [-v] --input INPUT --output OUTPUT
                                --accessibility ACCESSIBILITY --flanks FLANKS
                                --model MODEL
Options:
-v=False, --verbose=False
 Be loud!
--input Input file in tab format. Defaults to sys.stdin.
--output Output file in tab format. Defaults to sys.stdout.
--accessibility
 File with calculated accessibility
--flanks File with calculated flanks composition
--model Statsmodel binary file with the model for snoRNA

vii. Make plots

Make some useful plots for results

usage: rg_make_stats_for_results [-h] [-v] --results-probability-complex
                                 RESULTS_PROBABILITY_COMPLEX --results-raw
                                 RESULTS_RAW --snoRNAs SNORNAS --type
                                 {CD,HACA} [--dir DIR] --genome-dir GENOME_DIR
Options:
-v=False, --verbose=False
 Be loud!
--results-probability-complex
 Main part of the results
--results-raw Raw results
--snoRNAs Table with snoRNAs
--type

Type of snoRNA

Possible choices: CD, HACA

--dir=Plots Directory to store plots, defaults to Plots
--genome-dir Path to genome directory where the chromosomes are stored

viii. Convert to BED

Convert Probability results into bed for annotations

usage: rg_convert_to_bed [-h] [-v] --input INPUT --output OUTPUT
Options:
-v=False, --verbose=False
 Be loud!
--input Input file
--output Output file

ix. Annotate results

Annotate found snoRNA target sites

usage: rg_annotate_positions [-h] [-v] --input INPUT [--output OUTPUT]
                             --regions REGIONS --genes GENES
                             [--snoRNAs SNORNAS] --repeats REPEATS
Options:
-v=False, --verbose=False
 Be loud!
--input Input file in tab format.
--output Output file in tab format.
--regions GFF file with annotations for different gene regions eg. UTRs
--genes Positions of all genes in GFF format
--snoRNAs GFF file with annotations for snoRNAs in the same format as genes file
--repeats GTF file with annotations for repeats in the format from rmsk table in UCSC

Miscellaneous

These scripts are not used (yet) or are used to calculate HACA-box snoRNA chimeras. They are listed here for the sake of documentation.

rg-annotate-bed.py
@Author: Rafal Gumienny (gumiennr@unibas.ch)
@Created: 12-Dec-12
@Description: Annotate bed file with another bed file containing annotations
@Usage: python rg-annotate-bed.py -h

usage: rg_annotate_results_bed [-h] [-v] --input FILE [--output FILE]
                               --annotations FILE [--fraction FLOAT]
                               [--placeholder STRING] [--un_stranded]
                               [--filter-by FILTER_BY]
Options:
-v=0, --verbose=0
 Print more verbose messages for each additional verbose level.
--input a bed file that you want to annotate
--output=output.tab
 an output table with annotations
--annotations a bed file with annotations
--fraction=0.1 Fraction of read that must overlap the feature to be accepted
--placeholder=.
 A placeholder for empty annotations
--un_stranded=False
 Pass if your protocol is un-stranded
--filter-by Filter by this (comma-separated) list of annotation types

########################## FILE DESCRIPTION ###################################################

BED FILE WITH ANNOTATIONS EXAMPLE
1 24740163 24740215 miRNA:ENST00000003583 0 -
1 24727808 24727946 miRNA:ENST00000003583 0 -
1 24710391 24710493 miRNA:ENST00000003583 0 -

fields: chr start end annot_type:annot_name num strand

INPUT BED FILE EXAMPLE
1 24685109 24687340 ENST00000003583 0 -
1 24687531 24696163 ENST00000003583 0 -
1 24696329 24700191 ENST00000003583 0 -

########################## FILE DESCRIPTION ###################################################

usage: rg_append_genes_and_names [-h] [-v] --input INPUT [--output OUTPUT]
                                 [--mapping MAPPING]
Options:
-v=False, --verbose=False
 Be loud!
--input Input file in tab format.
--output Output file in tab format.
--mapping=/import/bc2/home/zavolan/gumiennr/Pipelines/Pipelines/pipeline_snoRNASearch/data/Annotations/transcript_2_gene_mapping.txt.clean
 Mapping from ENSEMBL transcript to gene, defaults to /import/bc2/home/zavolan/gumiennr/Pipelines/Pipelines/pipeline_snoRNASearch/data/Annotations/transcript_2_gene_mapping.txt.clean

RNA5-8S5|NR_003285.2 15 30 SNORD16 1 +
RNA5-8S5|NR_003285.2 86 105 SNORD16 1 +
RNA28S5|NR_003287.2 1563 1582 SNORD56B 1 +

SNORD50A|chr7|+|57640816|57640830|20|20 SNORD50A TCATGCTTTGTGTTGTGAAGACCGCCTGGGACTACCGGGCAGGGTGTAGTAGGCA
SNORD50A|chr7|+|68527467|68527482|20|20 SNORD50A ACTGAAGAAATTCAGTGAAATGCGGGTAAACGGCGGGAGTAACTATGACTCTCTTA
SNORD50A|chr7|+|68527638|68527654|20|20 SNORD50A AATCAGCGGGGAAAGAAGACCCTGTTGAGTTTGACTCTAGTCTGGCATGGTGAAGAG

usage: rg_check_hybrids_with_rnasnoop [-h] [-v] --input INPUT --output OUTPUT
                                      [--rnasnoop RNASNOOP] --snoRNA-paths
                                      SNORNA_PATHS
Options:
-v=False, --verbose=False
 Be loud!
--input Input file in tab format.
--output Output file in tab format.
--rnasnoop=RNAsnoop
 Path to RNAsnoop binary, defaults to RNAsnoop
--snoRNA-paths Path to snoRNAs stems

Compare output results with original data

usage: rg_compare_results_to_original [-h] [-v] --input INPUT [--only-chrom]
Options:
-v=False, --verbose=False
 Be loud!
--input Bed file with special fields
--only-chrom=False
 If there is a bed file with only chromosome information use this flag

Convert result to asmbed and at the same time extend sequences to the desired length

usage: rg_convert_to_asmbed [-h] [-v] --input INPUT [--output OUTPUT]
                            [--length LENGTH]
Options:
-v=False, --verbose=False
 Be loud!
--input Input table
--output=output.asmbed
 Output asmbed file , defaults to output.asmbed
--length=50 Desired read length, defaults to 50

Convert result to coordinate file

usage: rg_convert_to_coords [-h] [-v] --input INPUT --sequences SEQUENCES
                            [--output OUTPUT]
Options:
-v=False, --verbose=False
 Be loud!
--input Input result file
--sequences File with sequences
--output=coords.tab
 Output coordinate file , defaults to coords.tab

Convert unmapped sequences to fasta

usage: rg_convert_unmapped_to_fasta [-h] [-v] --input INPUT --output OUTPUT
Options:
-v=False, --verbose=False
 Be loud!
--input Comma-separated list of files
--output Output name

Make some plots of the results

usage: rg_correlate_expression_with_hybrids [-h] [-v] --input INPUT
                                            [--clustered] --expressions
                                            EXPRESSIONS [--level LEVEL]
                                            [--top TOP]
Options:
-v=False, --verbose=False
 Be loud!
--input Input table
--clustered=False
 Is the result clustered?
--expressions File with miRNA expression
--level=0 Expression level (in log scale), defaults to 0
--top=20 Show top mirnas and number of hybrids found, defaults to 20

Filter reads based on annotation in the last column

usage: rg_filter_reads_for_clustering [-h] [-v] --input INPUT --output OUTPUT
                                      [--annotations ANNOTATIONS]
Options:
-v=False, --verbose=False
 Be loud!
--input Input table
--output Output table
--annotations=None
 Comma-separated list of annotations to consider, defaults to None

Generate fasta files for PLEXY from snoRNA input

usage: rg_generate_haca_stems_for_rnasnoop [-h] [-v] --input INPUT --type
                                           {CD,HACA} [--dir DIR]
                                           [--switch-boxes]
Options:
-v=False, --verbose=False
 Be loud!
--input Input file in tab format.
--type

Type of snoRNA

Possible choices: CD, HACA

--dir=Plexy Directory to put output , defaults to Plexy
--switch-boxes=False
 If the CD box is located wrongly it will try to relabel it

usage: rg_get_search_info [-h] [-v] --snoRNAs SNORNAS --input INPUT
                          [--output OUTPUT] --type {CD,HACA} [--window WINDOW]
                          [--smooth-window SMOOTH_WINDOW] [--dir DIR]
Options:
-v=False, --verbose=False
 Be loud!
--snoRNAs Table with snoRNAs
--input Input file in tab format.
--output Output file in tab format.
--type

Type of snoRNA

Possible choices: CD, HACA

--window=100 Window, defaults to 100
--smooth-window=1
 Smoothing window length, defaults to 1
--dir=Plots Directory for plots, defaults to Plots

Generate fasta file from snoRNA input

usage: rg_get_snoRNA_gff [-h] [-v] --input INPUT [--output OUTPUT]
Options:
-v=False, --verbose=False
 Be loud!
--input Input file in tab format.
--output Output file in fasta format.

Generate fasta file from snoRNA input

usage: rg_make_cd_snoRNAs_families [-h] [-v] --input INPUT [--output OUTPUT]
                                   --type {CD,HACA} [--switch-boxes]
                                   [--length LENGTH]
Options:
-v=False, --verbose=False
 Be loud!
--input Input file in tab format.
--output Output file in fasta format.
--type

Type of snoRNA

Possible choices: CD, HACA

--switch-boxes=False
 If the CD box is located wrongly it will try to relabel it
--length=20 Length of interaction element (seed) to be extracted, defaults to 20

Shuffle fasta sequences in the file

usage: rg_shuffle_fasta_sequences [-h] [-v] --input INPUT [--output OUTPUT]
                                  [--let-size LET_SIZE]
Options:
-v=False, --verbose=False
 Be loud!
--input Input fasta file
--output=output_shuffled.fa
 Output fasta file , defaults to output_shuffled.fa
--let-size=2 Let size to preserve, defaults to 2

Split text file into files with desired number of lines

usage: rg_split_file_into_chunks [-h] [-v] --input INPUT --lines LINES
                                 [--prefix PREFIX] [--dir DIR]
                                 [--suffix SUFFIX]
Options:
-v=False, --verbose=False
 Be loud!
--input Input file in txt format. Defaults to sys.stdin.
--lines Number of lines in each file
--prefix=file_ Prefix to the file, defaults to file_
--dir=./ Directory to put files, defaults to ./
--suffix=.part Suffix to the file, defaults to .part