Pipeline flow¶

CD-box snoRNAs¶

1. Split the input¶

First, split the unmapped input sequences into manageable chunks.

Split fasta file into batches

usage: rg_split_fasta [-h] [-v] [--input INPUT] [--output-dir OUTPUT_DIR]
                      [--batch-size BATCH_SIZE] [--prefix PREFIX]
                      [--suffix SUFFIX]

Options:

`-v=False, --verbose=False`
	Be loud!
`--input=<stdin>`
	Input file in fasta format. Defaults to sys.stdin.
`--output-dir=./`
	Output directory for split files. Defaults to .
`--batch-size=100`
	Batch size to split, defaults to 100
`--prefix=part_`	Prefix to file name, defaults to part_
`--suffix=inputfasta`
	Suffix (extension) to the file name, defaults to inputfasta
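
The splitting step can be pictured with the short Python sketch below. It is not the code of rg_split_fasta: the helper names (`read_fasta`, `batches`, `batch_file_name`) and the exact output file naming are assumptions made for illustration only.

```python
from itertools import islice


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def batches(records, batch_size=100):
    """Group records into lists of at most batch_size items."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def batch_file_name(index, prefix="part_", suffix="inputfasta"):
    # Hypothetical naming scheme: prefix + running number + "." + suffix.
    return f"{prefix}{index}.{suffix}"
```

Each batch would then be written to its own file in the output directory, so that the later analysis tasks can run once per part.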

2. Generate various files from snoRNAs¶

i. Make FASTA¶

Generate fasta file from snoRNA input

usage: rg_generate_fasta [-h] [-v] --input INPUT [--output OUTPUT] --type
                         {CD,HACA} [--switch-boxes]

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Input file in tab format.
`--output`	Output file in fasta format.
`--type`	Type of snoRNA Possible choices: CD, HACA
`--switch-boxes=False`
	If the CD box is located wrongly it will try to relabel it

ii. Generate separate files¶

Generate fasta files for PLEXY from snoRNA input

usage: rg_generate_input_for_plexy_or_rnasnoop [-h] [-v] --input INPUT --type
                                               {CD,HACA} [--dir DIR]
                                               [--switch-boxes]

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Input file in tab format.
`--type`	Type of snoRNA. If CD is chosen an input for PLEXY will be generated. If HACA is chosen two stems for RNASnoop will be saved. Possible choices: CD, HACA
`--dir=Input`	Directory to put output, defaults to Plexy
`--switch-boxes=False`
	If the CD box is located wrongly it will try to relabel it

iii. Make BED¶

Generate BED file from snoRNA input

usage: rg_generate_snoRNA_bed [-h] [-v] --input INPUT [--output OUTPUT] --type
                              {CD,HACA} [--switch-boxes]

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Input file in tab format.
`--output`	Output file in BED format.
`--type`	Type of snoRNA Possible choices: CD, HACA
`--switch-boxes=False`
	If the CD box is located wrongly it will try to relabel it

3. Annotate with snoRNAs¶

Annotate input BED file used for generation of clusters with snoRNAs.

Annotate bed file with another bed file containing annotations

usage: rg_annotate_bed [-h] [-v] --input FILE [--output FILE] --annotations
                       FILE [--fraction FLOAT] [--placeholder STRING]
                       [--un_stranded] [--filter-by FILTER_BY]

Options:

`-v=0, --verbose=0`
	Print more verbose messages for each additional verbose level.
`--input`	a bed file that you want to annotate
`--output=output.tab`
	an output table with annotations
`--annotations`	a bed file with annotations
`--fraction=0.25`
	Fraction of read that must overlap the feature to be accepted
`--placeholder=.`
	A placeholder for empty annotations
`--un_stranded=False`
	Pass if your protocol is un-stranded
`--filter-by`	Filter by this comma-separated list of annotation types

########################## FILE DESCRIPTION ###################################################

BED FILE WITH ANNOTATIONS EXAMPLE

1 24740163 24740215 miRNA:ENST00000003583 0 -
1 24727808 24727946 miRNA:ENST00000003583 0 -
1 24710391 24710493 miRNA:ENST00000003583 0 -

fields: chr start end annot_type:annot_name num strand

INPUT BED FILE EXAMPLE

1 24685109 24687340 ENST00000003583 0 -
1 24687531 24696163 ENST00000003583 0 -
1 24696329 24700191 ENST00000003583 0 -

########################## FILE DESCRIPTION ###################################################
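
The role of the `--fraction` and `--un_stranded` options can be illustrated with a minimal sketch. It assumes half-open BED intervals and a "first match wins" rule; the function names and the tuple layout are hypothetical and not taken from rg_annotate_bed.

```python
def overlap_fraction(read_start, read_end, feat_start, feat_end):
    """Fraction of the read (half-open BED interval) covered by the feature."""
    read_length = read_end - read_start
    if read_length <= 0:
        return 0.0
    overlap = min(read_end, feat_end) - max(read_start, feat_start)
    return max(overlap, 0) / read_length


def annotate(read, features, fraction=0.25, placeholder=".", stranded=True):
    """Return the first matching 'type:name' label or the placeholder.

    read and features are (chrom, start, end, name, score, strand) tuples.
    Taking the first match is an assumption of this sketch.
    """
    chrom, start, end, _name, _score, strand = read
    for f_chrom, f_start, f_end, f_name, _f_score, f_strand in features:
        if f_chrom != chrom:
            continue
        if stranded and f_strand != strand:
            continue
        if overlap_fraction(start, end, f_start, f_end) >= fraction:
            return f_name
    return placeholder
```

With the default fraction of 0.25, a 40 nt read needs at least 10 nt inside the feature to receive its annotation.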

4. Calculate snoRNA expression¶

Based on the annotations, calculate RPKM values for each snoRNA and filter out all snoRNAs that fall below the given quantile.

RPKM = (10^9 * C)/(N * L)

where:

C = Number of reads mapped to a gene
N = Total mapped reads in the experiment (library size)
L = Length of the feature (in this case snoRNA length)
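
As a worked example of the formula, the sketch below computes RPKM and applies a quantile cut-off. The cut-off uses a simple nearest-rank rule, which is an assumption; the actual script may compute the quantile differently, and the function names are illustrative.

```python
def rpkm(read_count, library_size, feature_length):
    """RPKM = (10^9 * C) / (N * L)."""
    return (1e9 * read_count) / (library_size * feature_length)


def filter_by_quantile(rpkms, quantile=0.25):
    """Keep snoRNAs whose RPKM is at or above the given quantile."""
    values = sorted(rpkms.values())
    if not values:
        return {}
    # Nearest-rank style cut-off; the real script may interpolate differently.
    index = int(quantile * (len(values) - 1))
    cutoff = values[index]
    return {name: value for name, value in rpkms.items() if value >= cutoff}
```

For instance, 10 reads on a 100 nt snoRNA in a library of 1,000,000 mapped reads give `rpkm(10, 1_000_000, 100) == 100.0`.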

usage: rg_calculate_snoRNA_RPKM [-h] [-v] --input INPUT [--output OUTPUT]
                                --library LIBRARY --snoRNAs SNORNAS
                                [--quantile QUANTILE] [--type {CD,HACA}]

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Part of the library that is annotated as snoRNA
`--output`	Output file in tab format.
`--library`	Library from which the annotations were generated (in bed format)
`--snoRNAs`	BED file with snoRNAs
`--quantile=0.25`
	Quantile for the expression cut-off, defaults to 0.25
`--type=CD`	Type of snoRNA, defaults to CD Possible choices: CD, HACA

5. Prepare anchors¶

Prepare anchor sequences from provided fasta

usage: rg_prepare_anchors [-h] [-v] [--fasta-to-anchor FASTA_TO_ANCHOR]
                          [--anchor-length ANCHOR_LENGTH] [--output OUTPUT]
                          --expressed-snoRNAs EXPRESSED_SNORNAS

Options:

`-v=False, --verbose=False`
	Be loud!
`--fasta-to-anchor`
	Fasta to anchor
`--anchor-length=12`
	Anchor length, defaults to 12
`--output`	Output file name
`--expressed-snoRNAs`
	A list with expressed snoRNAs with RPKMs in form of: snoR_ID RPKM
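
How the anchors are derived is not described here. One plausible reading, sketched below purely as an assumption, is that every substring of `--anchor-length` nucleotides from each expressed snoRNA becomes an anchor that points back to the snoRNAs containing it. The function name `build_anchor_table` is hypothetical.

```python
from collections import defaultdict


def build_anchor_table(sequences, expressed, anchor_length=12):
    """Map every anchor_length-mer to the set of expressed snoRNA IDs containing it.

    sequences: dict of snoRNA ID -> sequence
    expressed: collection of snoRNA IDs that passed the RPKM filter
    """
    table = defaultdict(set)
    for seq_id, sequence in sequences.items():
        if seq_id not in expressed:
            continue
        for start in range(len(sequence) - anchor_length + 1):
            table[sequence[start:start + anchor_length]].add(seq_id)
    return dict(table)
```

A table of this shape lets the later search step test each read for anchors and then align it only against the associated snoRNAs.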

6. Build Bowtie2 index¶

i. Cluster reads¶

Cluster reads into a more convenient bed file

usage: rg_cluster_reads [-h] [-v] --input INPUT [--bed]
                        [--cluster-size CLUSTER_SIZE] [--overlap OVERLAP]
                        [--expand-cluster EXPAND_CLUSTER]
                        [--expand-read EXPAND_READ] [--output OUTPUT]
                        [--asmbed] [--rRNAs RRNAS] [--tRNAs TRNAS]
                        [--snRNAs SNRNAS]
                        [--filter-by FILTER_BY | --filter-except FILTER_EXCEPT]

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Input file in special asmbed format or in bed format
`--bed=False`	Specifies if the input file is in bed format
`--cluster-size=1`
	Number of reads necessary for a group to be considered a cluster. eg. 2 returns all groups with 2 or more overlapping reads, defaults to 1
`--overlap=-1`	Distance in basepairs for two reads to be in the same cluster. For instance 20 would group all reads within 20bp of each other. Negative number means overlap eg. -10 - read must overlap at least 10 basepairs, defaults to -1
`--expand-cluster=0`
	Expand cluster in both directions, defaults to 0
`--expand-read=15`
	Expand read in both directions (some alternative to expand cluster), defaults to 15
`--output=output.bed`
	Output file in bed format, defaults to output.bed
`--asmbed=False`	Write in asmbed format for fasta extraction
`--rRNAs`	rRNAs to add in the end of the clusters
`--tRNAs`	tRNAs to add in the end of the clusters
`--snRNAs`	snRNAs to add in the end of the clusters
`--filter-by`	Keep only reads with these tags in read_ids. Input is a comma-separated list of tags
`--filter-except`
	Keep reads except those with these tags in read_ids. Input is a comma-separated list of tags
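
The interplay of `--overlap`, `--cluster-size` and `--expand-read` can be sketched as a single pass over sorted reads. This is an illustrative reading of the option descriptions, not the implementation of rg_cluster_reads, and the tuple layout is an assumption.

```python
def cluster_reads(reads, overlap=-1, cluster_size=1, expand_read=15):
    """Group (chrom, start, end, strand) reads into clusters.

    Returns a list of (chrom, start, end, strand, read_count) tuples.
    A read joins the current cluster when the gap between the cluster end and
    the read start is at most `overlap` (negative values demand real overlap).
    """
    expanded = sorted(
        (chrom, strand, max(start - expand_read, 0), end + expand_read)
        for chrom, start, end, strand in reads
    )
    clusters = []
    for chrom, strand, start, end in expanded:
        if clusters:
            c_chrom, c_start, c_end, c_strand, count = clusters[-1]
            same_group = (c_chrom, c_strand) == (chrom, strand)
            # The gap is negative when the read overlaps the current cluster.
            if same_group and start - c_end <= overlap:
                clusters[-1] = (c_chrom, c_start, max(c_end, end), c_strand, count + 1)
                continue
        clusters.append((chrom, start, end, strand, 1))
    return [c for c in clusters if c[4] >= cluster_size]
```

With `overlap=-1` two reads must share at least one base to be merged, while `overlap=20` also merges reads that lie up to 20 bp apart.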

ii. Make FASTA¶

Prepare FASTA file from clustered reads

Given a bed file, extract sequences according to chromosome and strand and save them as an additional column in the input file or as fasta

usage: rg_extract_sequences [-h] [-v] [--input INPUT] [--output OUTPUT]
                            [--format {bed,fasta}]
                            [--sequence-length SEQUENCE_LENGTH] --genome-dir
                            GENOME_DIR [--window-left WINDOW_LEFT]
                            [--window-right WINDOW_RIGHT]
                            [--adjust-coordinates]

Options:

`-v=False, --verbose=False`
	Be loud!
`--input=<stdin>`
	Input file in Bed format. Defaults to stdin
`--output=<stdout>`
	Output file in Bed format. Defaults to stdout
`--format=bed`	Output format, defaults to bed Possible choices: bed, fasta
`--sequence-length`
	Final length of sequence to extract independently of coordinates.
`--genome-dir`	Directory where the fasta sequences with all the chromosomes are stored
`--window-left=0`
	Add nucleotides to the left (upstream). This option does not work if sequence-length is specified, defaults to 0
`--window-right=0`
	Add nucleotides to the right (downstream). This option does not work if sequence-length is specified, defaults to 0
`--adjust-coordinates=False`
	Adjust coordinates to new values dictated by windows length, defaults to False
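
Strand-aware extraction can be sketched as follows. The sketch assumes 0-based, half-open BED coordinates and interprets the left and right windows relative to the strand (upstream and downstream), as the option descriptions suggest; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def extract_sequence(chromosome_seq, start, end, strand, window_left=0, window_right=0):
    """Return the strand-aware sequence of a BED interval (0-based, half-open)."""
    if strand == "+":
        lo, hi = start - window_left, end + window_right
    else:
        # On the minus strand upstream lies at higher genomic coordinates.
        lo, hi = start - window_right, end + window_left
    lo = max(lo, 0)
    fragment = chromosome_seq[lo:hi]
    return fragment if strand == "+" else reverse_complement(fragment)
```

On the toy chromosome `AAACCCGGGTTT`, the interval 3-6 yields `CCC` on the plus strand and `GGG` on the minus strand.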

iii. Build index¶

The index is built with the following command:

bowtie2-build input.fa path/to/index/bowtie_index 2> /dev/null

7. Run analysis¶

An analysis is run for each part produced by the split in the first task.

i. Search anchors¶

For each read in the file, check whether it contains an anchor sequence; if so, make a local alignment (Smith-Waterman, SW) against each associated sequence. For each read, keep only the sequence with the best score.

usage: rg_search_anchor_and_make_alignments [-h] [-v] [--anchors ANCHORS]
                                            [--anchor-sequences ANCHOR_SEQUENCES]
                                            [--reads READS] [--match MATCH]
                                            [--mismatch MISMATCH]
                                            [--gap-open GAP_OPEN]
                                            [--gap-extend GAP_EXTEND]
                                            [--output OUTPUT] [--RNase-T1]

Options:

`-v=False, --verbose=False`
	Be loud!
`--anchors`	File with anchors (tab-separated)
`--anchor-sequences`
	Sequences from which anchors were generated
`--reads`	File with reads
`--match=2`	Match score, defaults to 2
`--mismatch=-5`	Mismatch penalty, defaults to -5
`--gap-open=-6`	Open gap penalty, defaults to -6
`--gap-extend=-4`
	Gap extension penalty, defaults to -4
`--output`	Output table
`--RNase-T1=False`
	Indicates if in the experiment RNase T1 was used
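
The search can be sketched as an anchor lookup followed by a local alignment with affine gap penalties, using the default scores listed above. The sketch assumes that a gap of length k costs gap_open + (k - 1) * gap_extend, returns scores only (no traceback), and ignores the `--RNase-T1` option. All function names are illustrative, not the script's own.

```python
def local_alignment_score(a, b, match=2, mismatch=-5, gap_open=-6, gap_extend=-4):
    """Best Smith-Waterman score of a vs b with affine gaps (Gotoh recurrences)."""
    neg_inf = float("-inf")
    best = 0
    prev_h = [0] * (len(b) + 1)
    prev_f = [neg_inf] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur_h = [0] * (len(b) + 1)
        cur_f = [neg_inf] * (len(b) + 1)
        e = neg_inf  # best score ending with a gap in a (horizontal move)
        for j in range(1, len(b) + 1):
            e = max(cur_h[j - 1] + gap_open, e + gap_extend)
            cur_f[j] = max(prev_h[j] + gap_open, prev_f[j] + gap_extend)
            diag = prev_h[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur_h[j] = max(0, diag, e, cur_f[j])
            best = max(best, cur_h[j])
        prev_h, prev_f = cur_h, cur_f
    return best


def best_partner(read, anchors, sequences, **scoring):
    """Return (sequence_id, score) of the best-scoring anchored sequence, or None.

    anchors maps an anchor string to the IDs of the sequences it came from.
    """
    candidates = set()
    for anchor, seq_ids in anchors.items():
        if anchor in read:
            candidates.update(seq_ids)
    scored = [(local_alignment_score(read, sequences[s], **scoring), s)
              for s in sorted(candidates)]
    if not scored:
        return None
    score, seq_id = max(scored)
    return seq_id, score
```

Restricting the alignment to sequences that share an anchor with the read keeps the number of quadratic-time alignments small.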

ii. Make statistics¶

This is a set of two tasks:

Merging the files from the anchor search
Making statistics with the following script:

Make statistics, prepare plots and evaluate thresholds

usage: rg_make_stats_for_search [-h] [-v] --input INPUT [--output OUTPUT]
                                [--dir DIR] [--length LENGTH] [--fpr FPR]

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Input file in tab format.
`--output`	Output file in tab format.
`--dir=Plots`	Directory to store the plots, defaults to Plots
`--length=15`	Threshold for length of the target site, defaults to 15
`--fpr=0.05`	False positive rate threshold, defaults to 0.05

iii. Convert to FASTA¶

Convert output table from alignment search into fasta

usage: rg_convert_tab_to_fasta [-h] [-v] [--input INPUT] [--output OUTPUT]
                               [--stats STATS] [--length LENGTH]
                               [--assign-score-threshold] [--filter-ambiguous]
                               [--five-prime-adapter FIVE_PRIME_ADAPTER]
                               [--three-prime-adapter THREE_PRIME_ADAPTER]
                               [--five-prime-adapter-threshold FIVE_PRIME_ADAPTER_THRESHOLD]
                               [--three-prime-adapter-threshold THREE_PRIME_ADAPTER_THRESHOLD]

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Input table
`--output`	Output fasta file
`--stats`	Undocumented
`--length=15`	Length of the target site to keep, defaults to 15
`--assign-score-threshold=False`
	Undocumented
`--filter-ambiguous=False`
	Filter reads that can be assigned to more than one snoRNA
`--five-prime-adapter`
	Five prime adapter sequence used in experiment - will be used to remove reads that are similar
`--three-prime-adapter`
	Three prime adapter sequence used in experiment - will be used to remove reads that are similar
`--five-prime-adapter-threshold=0.8`
	Threshold of the identity to the 5’ adapter, defaults to 0.8
`--three-prime-adapter-threshold=0.8`
	Threshold of the identity to the 3’ adapter, defaults to 0.8

iv. Map reads¶

Map target parts to the clusters with the following command:

bowtie2 -x ./index/bowtie_index -f -D100 -L 13 -i C,1 --local -k 10 -U input.anchorfasta -S output.sam

v. Convert result to BED¶

Convert the mapping result into a BED file with the following command:

samtools view -S input.sam -b -u | bamToBed -tag AS | grep -P "\t\+" > output
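
The final `grep -P "\t\+"` keeps only lines that contain a tab followed by `+`, which for bamToBed output means the records on the plus strand of the cluster sequences. A rough Python equivalent, assuming standard six-column BED lines with the strand in the sixth column, could look like this (the function name is hypothetical):

```python
def keep_plus_strand(bed_lines):
    """Yield BED lines whose sixth column (strand) is '+'."""
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 6 and fields[5] == "+":
            yield line
```

Unlike the grep pattern, this checks the strand column explicitly, so a `+` at the start of any other column would not let a record through.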

vi. Filter BED¶

Filter bed file based on the alignment score/number of reads in cluster/number of mutations

usage: rg_filter_bed [-h] [-v] --input INPUT --output OUTPUT
                     [--filter-multimappers]

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Input bed file with special fields
`--output`	Output file
`--filter-multimappers=False`
	Filter chimeras that can be mapped to multiple places in the genome (with the exception of mapping to canonical targets)

vi. Reassign chromosome¶

From the bed file produced by the FilterBed step, get the positions of the found target sites in terms of real chromosomes rather than clusters.

usage: rg_get_true_chromosome_positions [-h] [-v] [--input INPUT]
                                        [--output OUTPUT]

Options:

`-v=False, --verbose=False`
	Be loud!
`--input=<stdin>`
	Input file in special bed format. Defaults to sys.stdin.
`--output=<stdout>`
	Output file in special bed format. Defaults to sys.stdout.

vii. Append sequence¶

This step uses the same script as the FASTA extraction in the Bowtie2 index step (rg_extract_sequences).

viii. Calculate PLEXY¶

RNA5-8S5|NR_003285.2 15 30 SNORD16 1 +
RNA5-8S5|NR_003285.2 86 105 SNORD16 1 +
RNA28S5|NR_003287.2 1563 1582 SNORD56B 1 +

SNORD50A|chr7|+|57640816|57640830|20|20 SNORD50A TCATGCTTTGTGTTGTGAAGACCGCCTGGGACTACCGGGCAGGGTGTAGTAGGCA
SNORD50A|chr7|+|68527467|68527482|20|20 SNORD50A ACTGAAGAAATTCAGTGAAATGCGGGTAAACGGCGGGAGTAACTATGACTCTCTTA
SNORD50A|chr7|+|68527638|68527654|20|20 SNORD50A AATCAGCGGGGAAAGAAGACCCTGTTGAGTTTGACTCTAGTCTGGCATGGTGAAGAG

usage: rg_check_hybrids_with_plexy [-h] [-v] --input INPUT --output OUTPUT
                                   [--snoRNA-paths SNORNA_PATHS]
                                   [--plexy-tmp PLEXY_TMP] --plexy-bin
                                   PLEXY_BIN

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Input file in tab format.
`--output`	Output file in tab format.
`--snoRNA-paths=./Plexy/`
	Path to snoRNAs with Plexy, defaults to ./Plexy/
`--plexy-tmp=temp/`
	Plexy temporary directory, defaults to temp/
`--plexy-bin`	Path to PLEXY binary

ix. Calculate RNAduplex¶

RNA5-8S5|NR_003285.2 15 30 SNORD16 1 +
RNA5-8S5|NR_003285.2 86 105 SNORD16 1 +
RNA28S5|NR_003287.2 1563 1582 SNORD56B 1 +

SNORD50A|chr7|+|57640816|57640830|20|20 SNORD50A TCATGCTTTGTGTTGTGAAGACCGCCTGGGACTACCGGGCAGGGTGTAGTAGGCA
SNORD50A|chr7|+|68527467|68527482|20|20 SNORD50A ACTGAAGAAATTCAGTGAAATGCGGGTAAACGGCGGGAGTAACTATGACTCTCTTA
SNORD50A|chr7|+|68527638|68527654|20|20 SNORD50A AATCAGCGGGGAAAGAAGACCCTGTTGAGTTTGACTCTAGTCTGGCATGGTGAAGAG

usage: rg_check_hybrids_with_rnaduplex [-h] [-v] --input INPUT --output OUTPUT
                                       [--snoRNA-paths SNORNA_PATHS]
                                       [--RNAduplex-bin RNADUPLEX_BIN]

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Input file in tab format.
`--output`	Output file in tab format.
`--snoRNA-paths=./Plexy/`
	Path to snoRNAs with Plexy, defaults to ./Plexy/
`--RNAduplex-bin=RNAduplex`
	Path to RNAduplex binary, defaults to RNAcofold

8. Analyse RNAduplex results¶

RNAduplex and PLEXY results undergo slightly different analyses.

i. Merge results¶

Nothing to add

ii. Cluster results¶

Cluster results according to the position of the hit and the miRNA. The input file looks like this:

chr6 99846856 99846871 2628039_1-Unique-1:hsa-miR-129-3p:8 30 -
chr3 30733346 30733368 2630171_1-Unique-1:hsa-miR-93:N 36 +
chr17 3627403 3627417 2632714_1-Unique-1:hsa-miR-186:N 28 +
chr17 3627403 3627417 2639898_1-Unique-1:hsa-miR-16:N 28 +

usage: rg_cluster_results [-h] [-v] --input INPUT [--output OUTPUT]
                          [--cluster-size CLUSTER_SIZE] [--overlap OVERLAP]

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Input table file in bed like format
`--output=output.tab`
	Output table, defaults to output.tab
`--cluster-size=1`
	Number of reads necessary for a group to be considered a cluster. eg. 2 returns all groups with 2 or more overlapping reads, defaults to 1
`--overlap=-40`	Distance in basepairs for two reads to be in the same cluster. For instance 20 would group all reads within 20bp of each other. Negative number means overlap eg. -10 - read must overlap at least 10 basepairs, defaults to -1

iii. Annotate results¶

Annotate found snoRNA target sites

usage: rg_annotate_positions [-h] [-v] --input INPUT [--output OUTPUT]
                             --regions REGIONS --genes GENES
                             [--snoRNAs SNORNAS] --repeats REPEATS

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Input file in tab format.
`--output`	Output file in tab format.
`--regions`	GFF file with annotations for different gene regions eg. UTRs
`--genes`	Positions of all genes in GFF format
`--snoRNAs`	GFF file with annotations for snoRNAs in the same format as genes file
`--repeats`	GTF file with annotations for repeats in the format from rmsk table in UCSC

iv. Make statistics¶

Make some useful plots for RNAduplex results

usage: rg_make_plots_for_rnaduplex [-h] [-v] --input INPUT --snoRNAs SNORNAS
                                   --type {CD,HACA} [--dir DIR]
                                   [--threshold THRESHOLD]

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Input file in TAB
`--snoRNAs`	Table with snoRNAs
`--type`	Type of snoRNA Possible choices: CD, HACA
`--dir=Plots`	Directory to store plots, defaults to Plots
`--threshold=-25.0`
	Threshold for RNAduplex energy, defaults to -25.0

9. Analyse PLEXY¶

i. Merge results¶

cat output/*.scorebed > results_with_score.tab

ii. Merge raw results¶

cat output/*.truechrombed > raw_reds_results.tab

iii. Append RPKM¶

Append rpkm values to the plexy predictions

usage: rg_add_rpkm_to_score [-h] [-v] --input INPUT [--output OUTPUT] --rpkm
                            RPKM --annotated-reads ANNOTATED_READS
                            [--type {CD,HACA}]

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Input file in tab format.
`--output`	Output file in tab format.
`--rpkm`	File with rpkms of snoRNAs
`--annotated-reads`
	Mapped reads annotated as snoRNAs
`--type=CD`	Type of snoRNAs, defaults to CD Possible choices: CD, HACA

iv. Aggregate results by site¶

Divide PLEXY output into positive and negative sets

usage: rg_aggregate_scored_results [-h] [-v] --input INPUT [--output OUTPUT]
                                   [--threshold THRESHOLD] [--type {CD,HACA}]

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Input file in Tab format.
`--output`	Output file in Tab format.
`--threshold=-1.0`
	Threshold for the site, defaults to -1.0
`--type=CD`	Type of snoRNA, defaults to CD Possible choices: CD, HACA

v. Calculate features¶

For each site, calculate the features: accessibility and flanks composition. The PLEXY feature is already calculated at this point.

vi. Calculate probability¶

Calculate probability of snoRNA methylation being functional

usage: rg_calculate_probability [-h] [-v] --input INPUT --output OUTPUT
                                --accessibility ACCESSIBILITY --flanks FLANKS
                                --model MODEL

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Input file in tab format. Defaults to sys.stdin.
`--output`	Output file in tab format. Defaults to sys.stdout.
`--accessibility`
	File with calculated accessibility
`--flanks`	File with calculated flanks composition
`--model`	Statsmodel binary file with the model for snoRNA

vii. Make plots¶

Make some useful plots for results

usage: rg_make_stats_for_results [-h] [-v] --results-probability-complex
                                 RESULTS_PROBABILITY_COMPLEX --results-raw
                                 RESULTS_RAW --snoRNAs SNORNAS --type
                                 {CD,HACA} [--dir DIR] --genome-dir GENOME_DIR

Options:

`-v=False, --verbose=False`
	Be loud!
`--results-probability-complex`
	Main part of the results
`--results-raw`	Raw results
`--snoRNAs`	Table with snoRNAs
`--type`	Type of snoRNA Possible choices: CD, HACA
`--dir=Plots`	Directory to store plots, defaults to Plots
`--genome-dir`	Path to genome directory where the chromosomes are stored

viii. Convert to BED¶

Convert Probability results into bed for annotations

usage: rg_convert_to_bed [-h] [-v] --input INPUT --output OUTPUT

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Input file
`--output`	Output file

ix. Annotate results¶

Annotate found snoRNA target sites

usage: rg_annotate_positions [-h] [-v] --input INPUT [--output OUTPUT]
                             --regions REGIONS --genes GENES
                             [--snoRNAs SNORNAS] --repeats REPEATS

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Input file in tab format.
`--output`	Output file in tab format.
`--regions`	GFF file with annotations for different gene regions eg. UTRs
`--genes`	Positions of all genes in GFF format
`--snoRNAs`	GFF file with annotations for snoRNAs in the same format as genes file
`--repeats`	GTF file with annotations for repeats in the format from rmsk table in UCSC

Miscellaneous¶

These scripts are either not used (yet) or are used to calculate HACA-box snoRNA chimeras. They are listed here for the sake of documentation.

rg-annotate-bed.py
@Author: Rafal Gumienny (gumiennr@unibas.ch)
@Created: 12-Dec-12
@Description: Annotate bed file with another bed file containing annotations
@Usage: python rg-annotate-bed.py -h

usage: rg_annotate_results_bed [-h] [-v] --input FILE [--output FILE]
                               --annotations FILE [--fraction FLOAT]
                               [--placeholder STRING] [--un_stranded]
                               [--filter-by FILTER_BY]

Options:

`-v=0, --verbose=0`
	Print more verbose messages for each additional verbose level.
`--input`	a bed file that you want to annotate
`--output=output.tab`
	an output table with annotations
`--annotations`	a bed file with annotations
`--fraction=0.1`	Fraction of read that must overlap the feature to be accepted
`--placeholder=.`
	A placeholder for empty annotations
`--un_stranded=False`
	Pass if your protocol is un-stranded
`--filter-by`	Filter by this comma-separated list of annotation types

########################## FILE DESCRIPTION ###################################################

BED FILE WITH ANNOTATIONS EXAMPLE

1 24740163 24740215 miRNA:ENST00000003583 0 -
1 24727808 24727946 miRNA:ENST00000003583 0 -
1 24710391 24710493 miRNA:ENST00000003583 0 -

fields: chr start end annot_type:annot_name num strand

INPUT BED FILE EXAMPLE

1 24685109 24687340 ENST00000003583 0 -
1 24687531 24696163 ENST00000003583 0 -
1 24696329 24700191 ENST00000003583 0 -

########################## FILE DESCRIPTION ###################################################

usage: rg_append_genes_and_names [-h] [-v] --input INPUT [--output OUTPUT]
                                 [--mapping MAPPING]

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Input file in tab format.
`--output`	Output file in tab format.
`--mapping=/import/bc2/home/zavolan/gumiennr/Pipelines/Pipelines/pipeline_snoRNASearch/data/Annotations/transcript_2_gene_mapping.txt.clean`
	Mapping from ENSEMBL transcript to gene, defaults to /import/bc2/home/zavolan/gumiennr/Pipelines/Pipelines/pipeline_snoRNASearch/data/Annotations/transcript_2_gene_mapping.txt.clean

RNA5-8S5|NR_003285.2 15 30 SNORD16 1 +
RNA5-8S5|NR_003285.2 86 105 SNORD16 1 +
RNA28S5|NR_003287.2 1563 1582 SNORD56B 1 +

SNORD50A|chr7|+|57640816|57640830|20|20 SNORD50A TCATGCTTTGTGTTGTGAAGACCGCCTGGGACTACCGGGCAGGGTGTAGTAGGCA
SNORD50A|chr7|+|68527467|68527482|20|20 SNORD50A ACTGAAGAAATTCAGTGAAATGCGGGTAAACGGCGGGAGTAACTATGACTCTCTTA
SNORD50A|chr7|+|68527638|68527654|20|20 SNORD50A AATCAGCGGGGAAAGAAGACCCTGTTGAGTTTGACTCTAGTCTGGCATGGTGAAGAG

usage: rg_check_hybrids_with_rnasnoop [-h] [-v] --input INPUT --output OUTPUT
                                      [--rnasnoop RNASNOOP] --snoRNA-paths
                                      SNORNA_PATHS

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Input file in tab format.
`--output`	Output file in tab format.
`--rnasnoop=RNAsnoop`
	Path to RNAsnoop binary, defaults to RNAsnoop
`--snoRNA-paths`	Path to snoRNAs stems

Compare output results with original data

usage: rg_compare_results_to_original [-h] [-v] --input INPUT [--only-chrom]

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Bed file with special fields
`--only-chrom=False`
	If there is a bed file with only chromosome information use this flag

Convert result to asmbed and at the same time extend sequences to the desired length

usage: rg_convert_to_asmbed [-h] [-v] --input INPUT [--output OUTPUT]
                            [--length LENGTH]

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Input table
`--output=output.asmbed`
	Output asmbed file, defaults to output.asmbed
`--length=50`	Desired read length, defaults to 50
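
Extending an interval to a fixed read length can be sketched as below. How the script distributes the extra nucleotides is not documented here; the symmetric extension, the clamping at position 0 and the function name are assumptions of this sketch.

```python
def extend_to_length(start, end, length=50):
    """Symmetrically widen [start, end) to the desired length; never shrink it."""
    missing = length - (end - start)
    if missing <= 0:
        return start, end
    left = missing // 2
    new_start = max(start - left, 0)
    # Anything that could not be added on the left is added on the right.
    new_end = new_start + length
    return new_start, new_end
```

For example, a 20 nt site at 100-120 becomes the 50 nt interval 85-135, while intervals that are already long enough are returned unchanged.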

Convert result to coordinate file

usage: rg_convert_to_coords [-h] [-v] --input INPUT --sequences SEQUENCES
                            [--output OUTPUT]

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Input result file
`--sequences`	File with sequences
`--output=coords.tab`
	Output coordinate file, defaults to coords.tab

Convert unmapped sequences to fasta

usage: rg_convert_unmapped_to_fasta [-h] [-v] --input INPUT --output OUTPUT

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Comma-separated list of files
`--output`	Output name

Make some plots of the results

usage: rg_correlate_expression_with_hybrids [-h] [-v] --input INPUT
                                            [--clustered] --expressions
                                            EXPRESSIONS [--level LEVEL]
                                            [--top TOP]

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Input table
`--clustered=False`
	Is the result clustered?
`--expressions`	File with miRNA expression
`--level=0`	Expression level (in log scale), defaults to 0
`--top=20`	Show top mirnas and number of hybrids found, defaults to 20

Filter reads based on annotation in the last column

usage: rg_filter_reads_for_clustering [-h] [-v] --input INPUT --output OUTPUT
                                      [--annotations ANNOTATIONS]

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Input table
`--output`	Output table
`--annotations=None`
	Comma-separated list of annotations to consider, defaults to None

Generate fasta files for PLEXY from snoRNA input

usage: rg_generate_haca_stems_for_rnasnoop [-h] [-v] --input INPUT --type
                                           {CD,HACA} [--dir DIR]
                                           [--switch-boxes]

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Input file in tab format.
`--type`	Type of snoRNA Possible choices: CD, HACA
`--dir=Plexy`	Directory to put output, defaults to Plexy
`--switch-boxes=False`
	If the CD box is located wrongly it will try to relabel it

usage: rg_get_search_info [-h] [-v] --snoRNAs SNORNAS --input INPUT
                          [--output OUTPUT] --type {CD,HACA} [--window WINDOW]
                          [--smooth-window SMOOTH_WINDOW] [--dir DIR]

Options:

`-v=False, --verbose=False`
	Be loud!
`--snoRNAs`	Table with snoRNAs
`--input`	Input file in tab format.
`--output`	Output file in tab format.
`--type`	Type of snoRNA Possible choices: CD, HACA
`--window=100`	Window, defaults to 100
`--smooth-window=1`
	Smoothing window length, defaults to 1
`--dir=Plots`	Directory for plots, defaults to Plots

Generate fasta file from snoRNA input

usage: rg_get_snoRNA_gff [-h] [-v] --input INPUT [--output OUTPUT]

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Input file in tab format.
`--output`	Output file in fasta format.

Generate fasta file from snoRNA input

usage: rg_make_cd_snoRNAs_families [-h] [-v] --input INPUT [--output OUTPUT]
                                   --type {CD,HACA} [--switch-boxes]
                                   [--length LENGTH]

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Input file in tab format.
`--output`	Output file in fasta format.
`--type`	Type of snoRNA Possible choices: CD, HACA
`--switch-boxes=False`
	If the CD box is located wrongly it will try to relabel it
`--length=20`	Length of interaction element (seed) to be extracted, defaults to 20

Shuffle fasta sequences in the file

usage: rg_shuffle_fasta_sequences [-h] [-v] --input INPUT [--output OUTPUT]
                                  [--let-size LET_SIZE]

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Input fasta file
`--output=output_shuffled.fa`
	Output fasta file, defaults to output_shuffled.fa
`--let-size=2`	Let size to preserve, defaults to 2

Split text file into files with desired number of lines

usage: rg_split_file_into_chunks [-h] [-v] --input INPUT --lines LINES
                                 [--prefix PREFIX] [--dir DIR]
                                 [--suffix SUFFIX]

Options:

`-v=False, --verbose=False`
	Be loud!
`--input`	Input file in txt format. Defaults to sys.stdin.
`--lines`	Number of lines in each file
`--prefix=file_`	Prefix to the file, defaults to file_
`--dir=./`	Directory to put files, defaults to ./
`--suffix=.part`	Suffix to the file, defaults to .part