Pipeline flow¶
CD-box snoRNAs¶
1. Split the input¶
First, split the input unmapped sequences into manageable chunks.
Split fasta file into batches
usage: rg_split_fasta [-h] [-v] [--input INPUT] [--output-dir OUTPUT_DIR]
[--batch-size BATCH_SIZE] [--prefix PREFIX]
[--suffix SUFFIX]
- Options:
-v=False, --verbose=False  Be loud!
--input=sys.stdin  Input file in fasta format. Defaults to sys.stdin.
--output-dir=./  Output directory for split files. Defaults to the current directory (./).
--batch-size=100  Batch size to split, defaults to 100
--prefix=part_  Prefix to file name, defaults to part_
--suffix=inputfasta  Suffix (extension) to the file name, defaults to inputfasta
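The batching idea can be illustrated with a minimal Python sketch. It is not the source of rg_split_fasta: the function names `read_fasta` and `split_fasta` and the output naming scheme `<prefix><n>.<suffix>` are assumptions made for illustration; only the default values are taken from the options above.

```python
import os
from itertools import islice


def read_fasta(handle):
    """Yield (header, sequence) tuples from an open FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(handle, output_dir="./", batch_size=100,
                prefix="part_", suffix="inputfasta"):
    """Write consecutive batches of records to separate files.

    The file naming (<prefix><n>.<suffix>) is an assumption, not
    necessarily what the real script produces.
    """
    os.makedirs(output_dir, exist_ok=True)
    records = read_fasta(handle)
    paths = []
    batch_number = 0
    while True:
        # islice keeps consuming the same generator, so each call
        # returns the next batch_size records.
        batch = list(islice(records, batch_size))
        if not batch:
            break
        batch_number += 1
        path = os.path.join(output_dir, f"{prefix}{batch_number}.{suffix}")
        with open(path, "w") as out:
            for header, seq in batch:
                out.write(f">{header}\n{seq}\n")
        paths.append(path)
    return paths
```

Each resulting file can then be processed independently in the "Run analysis" step.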
2. Generate various files from snoRNAs¶
i. Make FASTA¶
Generate fasta file from snoRNA input
usage: rg_generate_fasta [-h] [-v] --input INPUT [--output OUTPUT] --type {CD,HACA} [--switch-boxes]
- Options:
-v=False, --verbose=False  Be loud!
--input  Input file in tab format.
--output  Output file in fasta format.
--type  Type of snoRNA. Possible choices: CD, HACA
--switch-boxes=False  If the CD box is located wrongly it will try to relabel it
ii. Generate separate files¶
Generate fasta files for PLEXY from snoRNA input
usage: rg_generate_input_for_plexy_or_rnasnoop [-h] [-v] --input INPUT --type
{CD,HACA} [--dir DIR]
[--switch-boxes]
- Options:
-v=False, --verbose=False  Be loud!
--input  Input file in tab format.
--type  Type of snoRNA. If CD is chosen an input for PLEXY will be generated. If HACA is chosen two stems for RNASnoop will be saved. Possible choices: CD, HACA
--dir=Input  Directory to put output, defaults to Plexy
--switch-boxes=False  If the CD box is located wrongly it will try to relabel it
iii. Make BED¶
Generate BED file from snoRNA input
usage: rg_generate_snoRNA_bed [-h] [-v] --input INPUT [--output OUTPUT] --type
{CD,HACA} [--switch-boxes]
- Options:
-v=False, --verbose=False  Be loud!
--input  Input file in tab format.
--output  Output file in BED format.
--type  Type of snoRNA. Possible choices: CD, HACA
--switch-boxes=False  If the CD box is located wrongly it will try to relabel it
3. Annotate with snoRNAs¶
Annotate input BED file used for generation of clusters with snoRNAs.
Annotate bed file with another bed file containing annotations
usage: rg_annotate_bed [-h] [-v] --input FILE [--output FILE] --annotations
FILE [--fraction FLOAT] [--placeholder STRING]
[--un_stranded] [--filter-by FILTER_BY]
- Options:
-v=0, --verbose=0  Print more verbose messages for each additional verbose level.
--input  A bed file that you want to annotate
--output=output.tab  An output table with annotations
--annotations  A bed file with annotations
--fraction=0.25  Fraction of read that must overlap the feature to be accepted
--placeholder=.  A placeholder for empty annotations
--un_stranded=False  Pass if your protocol is un-stranded
--filter-by  Filter by this comma-separated list of annotation types
File description:
- Example BED file with annotations (fields: chr start end annot_type:annot_name num strand)
  1 24740163 24740215 miRNA:ENST00000003583 0 -
  1 24727808 24727946 miRNA:ENST00000003583 0 -
  1 24710391 24710493 miRNA:ENST00000003583 0 -
- Example input BED file
  1 24685109 24687340 ENST00000003583 0 -
  1 24687531 24696163 ENST00000003583 0 -
  1 24696329 24700191 ENST00000003583 0 -
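The role of --fraction, --placeholder and --un_stranded can be sketched in Python as follows. This is an illustration under assumptions, not the code of rg_annotate_bed: the helper names are hypothetical, the comparison is assumed to be "greater than or equal to", and intervals are treated as 0-based, half-open BED coordinates.

```python
def overlap_fraction(read, feature):
    """Fraction of the read interval covered by the feature interval.

    Both arguments are (start, end) tuples in half-open BED coordinates.
    """
    r_start, r_end = read
    f_start, f_end = feature
    overlap = min(r_end, f_end) - max(r_start, f_start)
    return max(0, overlap) / (r_end - r_start)


def annotate(reads, annotations, fraction=0.25, placeholder=".", stranded=True):
    """Append the names of sufficiently overlapping annotations to each read.

    reads and annotations are lists of BED6-like tuples:
    (chrom, start, end, name, score, strand).
    """
    annotated = []
    for chrom, start, end, name, score, strand in reads:
        hits = [
            a_name
            for a_chrom, a_start, a_end, a_name, _, a_strand in annotations
            if a_chrom == chrom
            and (not stranded or a_strand == strand)
            and overlap_fraction((start, end), (a_start, a_end)) >= fraction
        ]
        # Reads without any accepted annotation get the placeholder.
        label = ",".join(hits) or placeholder
        annotated.append((chrom, start, end, name, score, strand, label))
    return annotated
```

With the default fraction of 0.25, a 100 nt read needs at least 25 nt inside a feature on the same strand to receive that feature's annotation.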
4. Calculate snoRNA expression¶
Based on the annotations, calculate RPKM values for each snoRNA and filter out all snoRNAs that fall below the given quantile.
RPKM = (10^9 * C)/(N * L)
- where:
- C = Number of reads mapped to a gene
- N = Total mapped reads in the experiment (library size)
- L = Length of the feature (in this case snoRNA length)
usage: rg_calculate_snoRNA_RPKM [-h] [-v] --input INPUT [--output OUTPUT]
--library LIBRARY --snoRNAs SNORNAS
[--quantile QUANTILE] [--type {CD,HACA}]
- Options:
-v=False, --verbose=False  Be loud!
--input  Part of the library that is annotated as snoRNA
--output  Output file in tab format.
--library  Library from which the annotations were generated (in bed format)
--snoRNAs  BED file with snoRNAs
--quantile=0.25  Quantile for the expression cut-off, defaults to 0.25
--type=CD  Type of snoRNA, defaults to CD. Possible choices: CD, HACA
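As a worked illustration of the RPKM formula and the quantile cut-off, here is a small Python sketch. The function names are hypothetical, and the way the quantile is turned into a cut-off value (a simple rank-based pick without interpolation) is an assumption; the real script may compute it differently.

```python
def rpkm(read_count, library_size, feature_length):
    """RPKM = (10^9 * C) / (N * L), as defined above."""
    return (1e9 * read_count) / (library_size * feature_length)


def filter_by_quantile(rpkms, quantile=0.25):
    """Keep entries whose RPKM is at or above the given quantile.

    rpkms maps snoRNA IDs to RPKM values. The cut-off is picked by
    rank without interpolation (an assumption for this sketch).
    """
    values = sorted(rpkms.values())
    cutoff = values[int(quantile * (len(values) - 1))]
    return {name: value for name, value in rpkms.items() if value >= cutoff}


# 50 reads on a 100 nt snoRNA in a library of one million mapped reads:
# (10^9 * 50) / (10^6 * 100) = 500 RPKM
example = rpkm(50, 1_000_000, 100)
```

The read counts and library size in the example are made-up numbers chosen to keep the arithmetic easy to follow.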
5. Prepare anchors¶
Prepare anchor sequences from provided fasta
usage: rg_prepare_anchors [-h] [-v] [--fasta-to-anchor FASTA_TO_ANCHOR]
[--anchor-length ANCHOR_LENGTH] [--output OUTPUT]
--expressed-snoRNAs EXPRESSED_SNORNAS
- Options:
-v=False, --verbose=False  Be loud!
--fasta-to-anchor  Fasta to anchor
--anchor-length=12  Anchor length, defaults to 12
--output  Output file name
--expressed-snoRNAs  A list with expressed snoRNAs with RPKMs in form of: snoR_ID RPKM
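The documentation does not spell out how anchors are derived. One plausible reading, sketched below in Python, is that every substring of --anchor-length nucleotides of each expressed snoRNA becomes an anchor pointing back to the sequences that contain it. Both this interpretation and the function name `prepare_anchors` are assumptions.

```python
from collections import defaultdict


def prepare_anchors(sequences, anchor_length=12):
    """Map every k-mer of length anchor_length to the IDs containing it.

    sequences maps sequence IDs to nucleotide strings. Sequences shorter
    than anchor_length contribute no anchors.
    """
    anchors = defaultdict(set)
    for seq_id, seq in sequences.items():
        for i in range(len(seq) - anchor_length + 1):
            anchors[seq[i:i + anchor_length]].add(seq_id)
    return anchors
```

A lookup table of this kind makes it cheap to decide whether a read shares an exact stretch with any snoRNA before running a more expensive alignment.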
6. Build Bowtie2 index¶
i. Cluster reads¶
Cluster reads into a more convenient bed file
usage: rg_cluster_reads [-h] [-v] --input INPUT [--bed]
[--cluster-size CLUSTER_SIZE] [--overlap OVERLAP]
[--expand-cluster EXPAND_CLUSTER]
[--expand-read EXPAND_READ] [--output OUTPUT]
[--asmbed] [--rRNAs RRNAS] [--tRNAs TRNAS]
[--snRNAs SNRNAS]
[--filter-by FILTER_BY | --filter-except FILTER_EXCEPT]
- Options:
-v=False, --verbose=False  Be loud!
--input  Input file in special asmbed format or in bed format
--bed=False  Specifies if the input file is in bed format
--cluster-size=1  Number of reads necessary for a group to be considered a cluster, e.g. 2 returns all groups with 2 or more overlapping reads; defaults to 1
--overlap=-1  Distance in base pairs for two reads to be in the same cluster. For instance, 20 would group all reads within 20 bp of each other. A negative number means overlap, e.g. -10 means reads must overlap by at least 10 base pairs; defaults to -1
--expand-cluster=0  Expand cluster in both directions, defaults to 0
--expand-read=15  Expand read in both directions (an alternative to expanding the cluster), defaults to 15
--output=output.bed  Output file in bed format, defaults to output.bed
--asmbed=False  Write in asmbed format for fasta extraction
--rRNAs  rRNAs to add at the end of the clusters
--tRNAs  tRNAs to add at the end of the clusters
--snRNAs  snRNAs to add at the end of the clusters
--filter-by  Keep only reads with these tags in read_ids. Input is a comma-separated list of tags
--filter-except  Keep all reads except those with these tags in read_ids. Input is a comma-separated list of tags
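The interplay of --cluster-size and --overlap can be illustrated with a short Python sketch. It is a simplified stand-in for rg_cluster_reads, not its implementation: the function name is hypothetical, reads are plain tuples, and read expansion and the extra RNA classes are left out.

```python
def cluster_reads(reads, cluster_size=1, overlap=-1):
    """Group reads on the same chromosome and strand into clusters.

    reads is an iterable of (chrom, start, end, strand) tuples. A read
    joins the current cluster when the distance between its start and
    the cluster end is at most `overlap`; a negative value therefore
    demands an actual overlap of that many base pairs.
    Returns (chrom, start, end, strand, n_reads) tuples.
    """
    clusters = []
    ordered = sorted(reads, key=lambda r: (r[0], r[3], r[1], r[2]))
    for chrom, start, end, strand in ordered:
        last = clusters[-1] if clusters else None
        if (last and last[0] == chrom and last[3] == strand
                and start - last[2] <= overlap):
            last[2] = max(last[2], end)  # extend the cluster
            last[4] += 1                 # count the read
        else:
            clusters.append([chrom, start, end, strand, 1])
    # Drop groups with fewer reads than requested.
    return [tuple(c) for c in clusters if c[4] >= cluster_size]
```

With the defaults, two reads are merged only if they share at least one base, and every read ends up in some cluster because a single read already counts as one.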
ii. Make FASTA¶
Prepare FASTA file from clustered reads
Given a bed file, extract sequences according to chromosome and strand and save them either as an additional column in the input file or as fasta
usage: rg_extract_sequences [-h] [-v] [--input INPUT] [--output OUTPUT]
[--format {bed,fasta}]
[--sequence-length SEQUENCE_LENGTH] --genome-dir
GENOME_DIR [--window-left WINDOW_LEFT]
[--window-right WINDOW_RIGHT]
[--adjust-coordinates]
- Options:
-v=False, --verbose=False  Be loud!
--input=sys.stdin  Input file in Bed format. Defaults to stdin
--output=sys.stdout  Output file in Bed format. Defaults to stdout
--format=bed  Output format, defaults to bed. Possible choices: bed, fasta
--sequence-length  Final length of sequence to extract independently of coordinates.
--genome-dir  Directory where the fasta sequences with all the chromosomes are stored
--window-left=0  Add nucleotides to the left (upstream). This option does not work if sequence-length is specified, defaults to 0
--window-right=0  Add nucleotides to the right (downstream). This option does not work if sequence-length is specified, defaults to 0
--adjust-coordinates=False  Adjust coordinates to new values dictated by windows length, defaults to False
iii. Build index¶
The index is built with the following command:
bowtie2-build input.fa path/to/index/bowtie_index 2> /dev/null
7. Run analysis¶
An analysis is run for each part produced by the split in the first task.
i. Search anchors¶
For each read in the file, check whether it contains an anchor sequence; if it does, compute a local alignment (Smith-Waterman, SW) against each sequence associated with that anchor. Of the resulting alignments, only the one with the best score is kept for the read.
usage: rg_search_anchor_and_make_alignments [-h] [-v] [--anchors ANCHORS]
[--anchor-sequences ANCHOR_SEQUENCES]
[--reads READS] [--match MATCH]
[--mismatch MISMATCH]
[--gap-open GAP_OPEN]
[--gap-extend GAP_EXTEND]
[--output OUTPUT] [--RNase-T1]
- Options:
-v=False, --verbose=False  Be loud!
--anchors  File with anchors (tab-separated)
--anchor-sequences  Sequences from which anchors were generated
--reads  File with reads
--match=2  Match score, defaults to 2
--mismatch=-5  Mismatch penalty, defaults to -5
--gap-open=-6  Open gap penalty, defaults to -6
--gap-extend=-4  Gap extension penalty, defaults to -4
--output  Output table
--RNase-T1=False  Indicates if in the experiment RNase T1 was used
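The search described above can be sketched in Python as a k-mer lookup followed by a Smith-Waterman score with affine gaps (Gotoh recurrences). This is an illustrative sketch, not the script's code: the function names and data structures are hypothetical, the gap convention (first gapped position costs gap_open, each further one gap_extend) is an assumption, and only the score is returned, without traceback or RNase T1 handling.

```python
def smith_waterman(a, b, match=2, mismatch=-5, gap_open=-6, gap_extend=-4):
    """Best local alignment score between a and b with affine gaps."""
    neg_inf = float("-inf")
    cols = len(b) + 1
    prev_h = [0] * cols        # best scores ending in the previous row
    prev_f = [neg_inf] * cols  # previous row, alignment ending in a vertical gap
    best = 0
    for i in range(1, len(a) + 1):
        cur_h = [0] * cols
        cur_f = [neg_inf] * cols
        e = neg_inf            # current row, alignment ending in a horizontal gap
        for j in range(1, cols):
            e = max(cur_h[j - 1] + gap_open, e + gap_extend)
            cur_f[j] = max(prev_h[j] + gap_open, prev_f[j] + gap_extend)
            score = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: never let a cell drop below zero.
            cur_h[j] = max(0, prev_h[j - 1] + score, e, cur_f[j])
            best = max(best, cur_h[j])
        prev_h, prev_f = cur_h, cur_f
    return best


def best_hit(read, anchors, sequences, anchor_length=12, **scoring):
    """Return (score, sequence_id) of the best-scoring anchored sequence.

    anchors maps anchor strings to iterables of sequence IDs, sequences
    maps IDs to sequences. Returns None if the read contains no anchor.
    """
    candidates = set()
    for i in range(len(read) - anchor_length + 1):
        candidates.update(anchors.get(read[i:i + anchor_length], ()))
    scored = [(smith_waterman(read, sequences[c], **scoring), c)
              for c in candidates]
    return max(scored) if scored else None
```

Restricting the alignment to sequences that share an exact anchor with the read avoids aligning every read against every snoRNA.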
ii. Make statistics¶
- This is a set of two tasks:
- Merging the files from the anchor search
- Making statistics with the following script:
Make statistic, prepare plots and evaluate thresholds
usage: rg_make_stats_for_search [-h] [-v] --input INPUT [--output OUTPUT]
[--dir DIR] [--length LENGTH] [--fpr FPR]
- Options:
-v=False, --verbose=False  Be loud!
--input  Input file in tab format.
--output  Output file in tab format.
--dir=Plots  Directory to store the plots, defaults to Plots
--length=15  Threshold for length of the target site, defaults to 15
--fpr=0.05  False positive rate threshold, defaults to 0.05
iii. Convert to FASTA¶
Convert output table from alignment search into fasta
usage: rg_convert_tab_to_fasta [-h] [-v] [--input INPUT] [--output OUTPUT]
[--stats STATS] [--length LENGTH]
[--assign-score-threshold] [--filter-ambiguous]
[--five-prime-adapter FIVE_PRIME_ADAPTER]
[--three-prime-adapter THREE_PRIME_ADAPTER]
[--five-prime-adapter-threshold FIVE_PRIME_ADAPTER_THRESHOLD]
[--three-prime-adapter-threshold THREE_PRIME_ADAPTER_THRESHOLD]
- Options:
-v=False, --verbose=False  Be loud!
--input  Input table
--output  Output fasta file
--stats  Undocumented
--length=15  Length of the target site to keep, defaults to 15
--assign-score-threshold=False  Undocumented
--filter-ambiguous=False  Filter reads that can be assigned to more than one snoRNA
--five-prime-adapter  Five prime adapter sequence used in experiment; it will be used to remove reads that are similar
--three-prime-adapter  Three prime adapter sequence used in experiment; it will be used to remove reads that are similar
--five-prime-adapter-threshold=0.8  Threshold of the identity to the 5’ adapter, defaults to 0.8
--three-prime-adapter-threshold=0.8  Threshold of the identity to the 3’ adapter, defaults to 0.8
iv. Map reads¶
Map target parts to the clusters with the following command:
bowtie2 -x ./index/bowtie_index -f -D100 -L 13 -i C,1 --local -k 10 -U input.anchorfasta -S output.sam
v. Convert result to BED¶
Convert the mapping result into a BED file with the following command:
samtools view -S input.sam -b -u | bamToBed -tag AS | grep -P "\t\+" > output
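In the command above, bamToBed -tag AS writes the alignment score (the AS tag) into the BED score column, and the final grep keeps lines that contain a tab followed by "+", which in practice selects hits on the plus strand of the cluster sequences. A Python equivalent of that last filter could look like the sketch below; the function name is hypothetical, and it checks the strand column explicitly instead of pattern matching on the whole line.

```python
def keep_plus_strand(bed_lines):
    """Yield BED lines whose strand column (6th field) is '+'."""
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 6 and fields[5] == "+":
            yield line
```

Checking the sixth field is slightly stricter than the grep pattern, which would also match a "+" at the start of any other tab-separated column.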
vi. Filter BED¶
Filter bed file based on the alignment score/number of reads in cluster/number of mutations
usage: rg_filter_bed [-h] [-v] --input INPUT --output OUTPUT
[--filter-multimappers]
- Options:
-v=False, --verbose=False  Be loud!
--input  Input bed file with special fields
--output  Output file
--filter-multimappers=False  Filter chimeras that can be mapped to multiple places in the genome (with the exception of mapping to canonical targets)
vi. Reassign chromosome¶
Convert the positions of the target sites in the BED file from the FilterBed step from cluster coordinates to real chromosome coordinates.
usage: rg_get_true_chromosome_positions [-h] [-v] [--input INPUT]
[--output OUTPUT]
- Options:
-v=False, --verbose=False  Be loud!
--input=sys.stdin  Input file in special bed format. Defaults to sys.stdin.
--output=sys.stdout  Output file in special bed format. Defaults to sys.stdout.
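The coordinate conversion can be illustrated with the following Python sketch. The cluster identifier format used here (chrom|strand|cluster_start|cluster_end) is purely hypothetical, as is the function name; the "special bed format" of the real script is not documented on this page. The sketch also assumes that minus-strand cluster sequences were reverse-complemented before indexing.

```python
def to_chromosome_coords(cluster_id, hit_start, hit_end):
    """Translate a hit within a cluster into chromosome coordinates.

    cluster_id is assumed to look like 'chr7|+|1000|1100' (hypothetical
    format). hit_start and hit_end are 0-based, half-open offsets within
    the cluster sequence.
    """
    chrom, strand, c_start, c_end = cluster_id.split("|")
    c_start, c_end = int(c_start), int(c_end)
    if strand == "+":
        return chrom, c_start + hit_start, c_start + hit_end, strand
    # The cluster sequence is the reverse complement, so offsets
    # count backwards from the cluster end.
    return chrom, c_end - hit_end, c_end - hit_start, strand
```

For example, under these assumptions a hit at offsets 10 to 25 in a plus-strand cluster starting at 1000 lies at chromosome positions 1010 to 1025.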
vii. Append sequence¶
This step uses the same script as the FASTA extraction in the Bowtie2 index step (rg_extract_sequences).
viii. Calculate PLEXY¶
RNA5-8S5|NR_003285.2 15 30 SNORD16 1 +
RNA5-8S5|NR_003285.2 86 105 SNORD16 1 +
RNA28S5|NR_003287.2 1563 1582 SNORD56B 1 +

SNORD50A|chr7|+|57640816|57640830|20|20 SNORD50A TCATGCTTTGTGTTGTGAAGACCGCCTGGGACTACCGGGCAGGGTGTAGTAGGCA
SNORD50A|chr7|+|68527467|68527482|20|20 SNORD50A ACTGAAGAAATTCAGTGAAATGCGGGTAAACGGCGGGAGTAACTATGACTCTCTTA
SNORD50A|chr7|+|68527638|68527654|20|20 SNORD50A AATCAGCGGGGAAAGAAGACCCTGTTGAGTTTGACTCTAGTCTGGCATGGTGAAGAG
usage: rg_check_hybrids_with_plexy [-h] [-v] --input INPUT --output OUTPUT
[--snoRNA-paths SNORNA_PATHS]
[--plexy-tmp PLEXY_TMP] --plexy-bin
PLEXY_BIN
- Options:
-v=False, --verbose=False  Be loud!
--input  Input file in tab format.
--output  Output file in tab format.
--snoRNA-paths=./Plexy/  Path to snoRNAs with Plexy, defaults to ./Plexy/
--plexy-tmp=temp/  Plexy temporary directory, defaults to temp/
--plexy-bin  Path to PLEXY binary
ix. Calculate RNAduplex¶
RNA5-8S5|NR_003285.2 15 30 SNORD16 1 +
RNA5-8S5|NR_003285.2 86 105 SNORD16 1 +
RNA28S5|NR_003287.2 1563 1582 SNORD56B 1 +

SNORD50A|chr7|+|57640816|57640830|20|20 SNORD50A TCATGCTTTGTGTTGTGAAGACCGCCTGGGACTACCGGGCAGGGTGTAGTAGGCA
SNORD50A|chr7|+|68527467|68527482|20|20 SNORD50A ACTGAAGAAATTCAGTGAAATGCGGGTAAACGGCGGGAGTAACTATGACTCTCTTA
SNORD50A|chr7|+|68527638|68527654|20|20 SNORD50A AATCAGCGGGGAAAGAAGACCCTGTTGAGTTTGACTCTAGTCTGGCATGGTGAAGAG
usage: rg_check_hybrids_with_rnaduplex [-h] [-v] --input INPUT --output OUTPUT
[--snoRNA-paths SNORNA_PATHS]
[--RNAduplex-bin RNADUPLEX_BIN]
- Options:
-v=False, --verbose=False  Be loud!
--input  Input file in tab format.
--output  Output file in tab format.
--snoRNA-paths=./Plexy/  Path to snoRNAs with Plexy, defaults to ./Plexy/
--RNAduplex-bin=RNAduplex  Path to RNAduplex binary, defaults to RNAduplex
8. Analyse RNAduplex results¶
RNAduplex and PLEXY results go through slightly different analyses.
i. Merge results¶
Nothing to add
ii. Cluster results¶
Cluster results according to the position of the hit and the miRNA. The input file looks like this:
chr6 99846856 99846871 2628039_1-Unique-1:hsa-miR-129-3p:8 30 -
chr3 30733346 30733368 2630171_1-Unique-1:hsa-miR-93:N 36 +
chr17 3627403 3627417 2632714_1-Unique-1:hsa-miR-186:N 28 +
chr17 3627403 3627417 2639898_1-Unique-1:hsa-miR-16:N 28 +
usage: rg_cluster_results [-h] [-v] --input INPUT [--output OUTPUT]
[--cluster-size CLUSTER_SIZE] [--overlap OVERLAP]
- Options:
-v=False, --verbose=False  Be loud!
--input  Input table file in bed-like format
--output=output.tab  Output table, defaults to output.tab
--cluster-size=1  Number of reads necessary for a group to be considered a cluster, e.g. 2 returns all groups with 2 or more overlapping reads; defaults to 1
--overlap=-40  Distance in base pairs for two reads to be in the same cluster. For instance, 20 would group all reads within 20 bp of each other. A negative number means overlap, e.g. -10 means reads must overlap by at least 10 base pairs; defaults to -40
iii. Annotate results¶
Annotate found snoRNA target sites
usage: rg_annotate_positions [-h] [-v] --input INPUT [--output OUTPUT]
--regions REGIONS --genes GENES
[--snoRNAs SNORNAS] --repeats REPEATS
- Options:
-v=False, --verbose=False  Be loud!
--input  Input file in tab format.
--output  Output file in tab format.
--regions  GFF file with annotations for different gene regions, e.g. UTRs
--genes  Positions of all genes in GFF format
--snoRNAs  GFF file with annotations for snoRNAs in the same format as genes file
--repeats  GTF file with annotations for repeats in the format from rmsk table in UCSC
iv. Make statistics¶
Make some useful plots for RNAduplex results
usage: rg_make_plots_for_rnaduplex [-h] [-v] --input INPUT --snoRNAs SNORNAS
--type {CD,HACA} [--dir DIR]
[--threshold THRESHOLD]
- Options:
-v=False, --verbose=False  Be loud!
--input  Input file in TAB
--snoRNAs  Table with snoRNAs
--type  Type of snoRNA. Possible choices: CD, HACA
--dir=Plots  Directory to store plots, defaults to Plots
--threshold=-25.0  Threshold for RNAduplex energy, defaults to -25.0
9. Analyse PLEXY¶
i. Merge results¶
cat output/*.scorebed > results_with_score.tab
ii. Merge raw results¶
cat output/*.truechrombed > raw_reds_results.tab
iii. Append RPKM¶
Append rpkm values to the plexy predictions
usage: rg_add_rpkm_to_score [-h] [-v] --input INPUT [--output OUTPUT] --rpkm
RPKM --annotated-reads ANNOTATED_READS
[--type {CD,HACA}]
- Options:
-v=False, --verbose=False  Be loud!
--input  Input file in tab format.
--output  Output file in tab format.
--rpkm  File with RPKMs of snoRNAs
--annotated-reads  Mapped reads annotated as snoRNAs
--type=CD  Type of snoRNAs, defaults to CD. Possible choices: CD, HACA
iv. Aggregate results by site¶
Divide PLEXY output into positive and negative sets
usage: rg_aggregate_scored_results [-h] [-v] --input INPUT [--output OUTPUT]
[--threshold THRESHOLD] [--type {CD,HACA}]
- Options:
-v=False, --verbose=False  Be loud!
--input  Input file in Tab format.
--output  Output file in Tab format.
--threshold=-1.0  Threshold for the site, defaults to -1.0
--type=CD  Type of snoRNA, defaults to CD. Possible choices: CD, HACA
v. Calculate features¶
For each site, calculate two features: accessibility and flank composition. The PLEXY score is already calculated.
vi. Calculate probability¶
Calculate probability of snoRNA methylation being functional
usage: rg_calculate_probability [-h] [-v] --input INPUT --output OUTPUT
--accessibility ACCESSIBILITY --flanks FLANKS
--model MODEL
- Options:
-v=False, --verbose=False  Be loud!
--input  Input file in tab format. Defaults to sys.stdin.
--output  Output file in tab format. Defaults to sys.stdout.
--accessibility  File with calculated accessibility
--flanks  File with calculated flanks composition
--model  Statsmodel binary file with the model for snoRNA
vii. Make plots¶
Make some useful plots for results
usage: rg_make_stats_for_results [-h] [-v] --results-probability-complex
RESULTS_PROBABILITY_COMPLEX --results-raw
RESULTS_RAW --snoRNAs SNORNAS --type
{CD,HACA} [--dir DIR] --genome-dir GENOME_DIR
- Options:
-v=False, --verbose=False  Be loud!
--results-probability-complex  Main part of the results
--results-raw  Raw results
--snoRNAs  Table with snoRNAs
--type  Type of snoRNA. Possible choices: CD, HACA
--dir=Plots  Directory to store plots, defaults to Plots
--genome-dir  Path to genome directory where the chromosomes are stored
viii. Convert to BED¶
Convert Probability results into bed for annotations
usage: rg_convert_to_bed [-h] [-v] --input INPUT --output OUTPUT
- Options:
-v=False, --verbose=False  Be loud!
--input  Input file
--output  Output file
ix. Annotate results¶
Annotate found snoRNA target sites
usage: rg_annotate_positions [-h] [-v] --input INPUT [--output OUTPUT]
--regions REGIONS --genes GENES
[--snoRNAs SNORNAS] --repeats REPEATS
- Options:
-v=False, --verbose=False  Be loud!
--input  Input file in tab format.
--output  Output file in tab format.
--regions  GFF file with annotations for different gene regions, e.g. UTRs
--genes  Positions of all genes in GFF format
--snoRNAs  GFF file with annotations for snoRNAs in the same format as genes file
--repeats  GTF file with annotations for repeats in the format from rmsk table in UCSC
Miscellaneous¶
These scripts are either not used (yet) or are used to calculate HACA-box snoRNA chimeras. They are listed here for the sake of documentation.
rg-annotate-bed.py
@Author: Rafal Gumienny (gumiennr@unibas.ch)
@Created: 12-Dec-12
@Description: Annotate bed file with another bed file containing annotations
@Usage: python rg-annotate-bed.py -h
usage: rg_annotate_results_bed [-h] [-v] --input FILE [--output FILE]
--annotations FILE [--fraction FLOAT]
[--placeholder STRING] [--un_stranded]
[--filter-by FILTER_BY]
- Options:
-v=0, --verbose=0  Print more verbose messages for each additional verbose level.
--input  A bed file that you want to annotate
--output=output.tab  An output table with annotations
--annotations  A bed file with annotations
--fraction=0.1  Fraction of read that must overlap the feature to be accepted
--placeholder=.  A placeholder for empty annotations
--un_stranded=False  Pass if your protocol is un-stranded
--filter-by  Filter by this comma-separated list of annotation types
File description:
- Example BED file with annotations (fields: chr start end annot_type:annot_name num strand)
  1 24740163 24740215 miRNA:ENST00000003583 0 -
  1 24727808 24727946 miRNA:ENST00000003583 0 -
  1 24710391 24710493 miRNA:ENST00000003583 0 -
- Example input BED file
  1 24685109 24687340 ENST00000003583 0 -
  1 24687531 24696163 ENST00000003583 0 -
  1 24696329 24700191 ENST00000003583 0 -
usage: rg_append_genes_and_names [-h] [-v] --input INPUT [--output OUTPUT]
[--mapping MAPPING]
- Options:
-v=False, --verbose=False  Be loud!
--input  Input file in tab format.
--output  Output file in tab format.
--mapping=/import/bc2/home/zavolan/gumiennr/Pipelines/Pipelines/pipeline_snoRNASearch/data/Annotations/transcript_2_gene_mapping.txt.clean  Mapping from ENSEMBL transcript to gene, defaults to that path
RNA5-8S5|NR_003285.2 15 30 SNORD16 1 +
RNA5-8S5|NR_003285.2 86 105 SNORD16 1 +
RNA28S5|NR_003287.2 1563 1582 SNORD56B 1 +

SNORD50A|chr7|+|57640816|57640830|20|20 SNORD50A TCATGCTTTGTGTTGTGAAGACCGCCTGGGACTACCGGGCAGGGTGTAGTAGGCA
SNORD50A|chr7|+|68527467|68527482|20|20 SNORD50A ACTGAAGAAATTCAGTGAAATGCGGGTAAACGGCGGGAGTAACTATGACTCTCTTA
SNORD50A|chr7|+|68527638|68527654|20|20 SNORD50A AATCAGCGGGGAAAGAAGACCCTGTTGAGTTTGACTCTAGTCTGGCATGGTGAAGAG
usage: rg_check_hybrids_with_rnasnoop [-h] [-v] --input INPUT --output OUTPUT
[--rnasnoop RNASNOOP] --snoRNA-paths
SNORNA_PATHS
- Options:
-v=False, --verbose=False  Be loud!
--input  Input file in tab format.
--output  Output file in tab format.
--rnasnoop=RNAsnoop  Path to RNAsnoop binary, defaults to RNAsnoop
--snoRNA-paths  Path to snoRNAs stems
Compare output results with original data
usage: rg_compare_results_to_original [-h] [-v] --input INPUT [--only-chrom]
- Options:
-v=False, --verbose=False  Be loud!
--input  Bed file with special fields
--only-chrom=False  If there is a bed file with only chromosome information use this flag
Convert result to asmbed and at the same time extend sequences to the desired length
usage: rg_convert_to_asmbed [-h] [-v] --input INPUT [--output OUTPUT]
[--length LENGTH]
- Options:
-v=False, --verbose=False  Be loud!
--input  Input table
--output=output.asmbed  Output asmbed file, defaults to output.asmbed
--length=50  Desired read length, defaults to 50
Convert result to coordinate file
usage: rg_convert_to_coords [-h] [-v] --input INPUT --sequences SEQUENCES
[--output OUTPUT]
- Options:
-v=False, --verbose=False  Be loud!
--input  Input result file
--sequences  File with sequences
--output=coords.tab  Output coordinate file, defaults to coords.tab
Convert unmapped sequences to fasta
usage: rg_convert_unmapped_to_fasta [-h] [-v] --input INPUT --output OUTPUT
- Options:
-v=False, --verbose=False  Be loud!
--input  Comma-separated list of files
--output  Output name
Make some plots of the results
usage: rg_correlate_expression_with_hybrids [-h] [-v] --input INPUT
[--clustered] --expressions
EXPRESSIONS [--level LEVEL]
[--top TOP]
- Options:
-v=False, --verbose=False  Be loud!
--input  Input table
--clustered=False  Is the result clustered?
--expressions  File with miRNA expression
--level=0  Expression level (in log scale), defaults to 0
--top=20  Show top miRNAs and number of hybrids found, defaults to 20
Filter reads based on annotation in the last column
usage: rg_filter_reads_for_clustering [-h] [-v] --input INPUT --output OUTPUT
[--annotations ANNOTATIONS]
- Options:
-v=False, --verbose=False  Be loud!
--input  Input table
--output  Output table
--annotations=None  Comma-separated list of annotations to consider, defaults to None
Generate fasta files for PLEXY from snoRNA input
usage: rg_generate_haca_stems_for_rnasnoop [-h] [-v] --input INPUT --type
{CD,HACA} [--dir DIR]
[--switch-boxes]
- Options:
-v=False, --verbose=False  Be loud!
--input  Input file in tab format.
--type  Type of snoRNA. Possible choices: CD, HACA
--dir=Plexy  Directory to put output, defaults to Plexy
--switch-boxes=False  If the CD box is located wrongly it will try to relabel it
usage: rg_get_search_info [-h] [-v] --snoRNAs SNORNAS --input INPUT
[--output OUTPUT] --type {CD,HACA} [--window WINDOW]
[--smooth-window SMOOTH_WINDOW] [--dir DIR]
- Options:
-v=False, --verbose=False  Be loud!
--snoRNAs  Table with snoRNAs
--input  Input file in tab format.
--output  Output file in tab format.
--type  Type of snoRNA. Possible choices: CD, HACA
--window=100  Window, defaults to 100
--smooth-window=1  Smoothing window length, defaults to 1
--dir=Plots  Directory for plots, defaults to Plots
Generate fasta file from snoRNA input
usage: rg_get_snoRNA_gff [-h] [-v] --input INPUT [--output OUTPUT]
- Options:
-v=False, --verbose=False  Be loud!
--input  Input file in tab format.
--output  Output file in fasta format.
Generate fasta file from snoRNA input
usage: rg_make_cd_snoRNAs_families [-h] [-v] --input INPUT [--output OUTPUT]
--type {CD,HACA} [--switch-boxes]
[--length LENGTH]
- Options:
-v=False, --verbose=False  Be loud!
--input  Input file in tab format.
--output  Output file in fasta format.
--type  Type of snoRNA. Possible choices: CD, HACA
--switch-boxes=False  If the CD box is located wrongly it will try to relabel it
--length=20  Length of interaction element (seed) to be extracted, defaults to 20
Shuffle fasta sequences in the file
usage: rg_shuffle_fasta_sequences [-h] [-v] --input INPUT [--output OUTPUT]
[--let-size LET_SIZE]
- Options:
-v=False, --verbose=False  Be loud!
--input  Input fasta file
--output=output_shuffled.fa  Output fasta file, defaults to output_shuffled.fa
--let-size=2  Let size to preserve, defaults to 2
Split text file into files with desired number of lines
usage: rg_split_file_into_chunks [-h] [-v] --input INPUT --lines LINES
[--prefix PREFIX] [--dir DIR]
[--suffix SUFFIX]
- Options:
-v=False, --verbose=False  Be loud!
--input  Input file in txt format. Defaults to sys.stdin.
--lines  Number of lines in each file
--prefix=file_  Prefix to the file, defaults to file_
--dir=./  Directory to put files, defaults to ./
--suffix=.part  Suffix to the file, defaults to .part