>>>>>>>>>>>>>>>>>>
Reference = [/project/shefflab/genomes/hg38/salmon_partial_sa_index/default/reference.masked.genome.fa]
Query = [/project/shefflab/genomes/hg38/fasta_txome/default/hg38.fa]
Kmer size = 16
Window size = 5
Segment length = 500 (read split allowed)
Alphabet = DNA
Percentage identity threshold = 80%
Mapping output file = /project/shefflab/genomes/hg38/salmon_partial_sa_index/default/mashmap.out
Filter mode = 1 (1 = map, 2 = one-to-one, 3 = none)
Execution threads = 8
>>>>>>>>>>>>>>>>>>
INFO, skch::Sketch::build, minimizers picked from reference = 978213050
INFO, skch::Sketch::index, unique minimizers = 295967991
INFO, skch::Sketch::computeFreqHist, Frequency histogram of minimizers = (1, 143502113) ... (1069195, 1)
INFO, skch::Sketch::computeFreqHist, With threshold 0.001%, ignore minimizers occurring >= 6681 times during lookup.
INFO, skch::main, Time spent computing the reference index: 341.494 sec
INFO, skch::Map::mapQuery, [count of mapped reads, reads qualified for mapping, total input reads] = [163086, 163086, 189154]
INFO, skch::main, Time spent mapping the query : 10633.2 sec
INFO, skch::main, mapping results saved in : /project/shefflab/genomes/hg38/salmon_partial_sa_index/default/mashmap.out
Command completed. Elapsed time: 3:04:31. Running peak memory: 69.572GB.
PID: 1068; Command: mashmap; Return code: 0; Memory used: 69.572GB

> `awk -v OFS=' ' '{print $6,$8,$9}' /project/shefflab/genomes/hg38/salmon_partial_sa_index/default/mashmap.out | sort -k1,1 -k2,2n - > /project/shefflab/genomes/hg38/salmon_partial_sa_index/default/genome_found.sorted.bed` (19552,19557)
Command completed. Elapsed time: 0:00:05. Running peak memory: 69.572GB.
PID: 19552; Command: awk; Return code: 0; Memory used: 0.002GB
PID: 19557; Command: sort; Return code: 0; Memory used: 0.011GB

> `bedtools merge -i /project/shefflab/genomes/hg38/salmon_partial_sa_index/default/genome_found.sorted.bed > /project/shefflab/genomes/hg38/salmon_partial_sa_index/default/genome_found_merged.bed` (19567)
Command completed. Elapsed time: 0:00:00. Running peak memory: 69.572GB.
PID: 19567; Command: bedtools; Return code: 0; Memory used: 0.003GB

> `bedtools getfasta -fi /project/shefflab/genomes/hg38/salmon_partial_sa_index/default/reference.masked.genome.fa -bed /project/shefflab/genomes/hg38/salmon_partial_sa_index/default/genome_found_merged.bed -fo /project/shefflab/genomes/hg38/salmon_partial_sa_index/default/genome_found.fa` (19575)
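The `awk`, `sort` and `bedtools merge` steps logged above turn MashMap hits into merged genomic intervals. The following is a minimal Python sketch of the same logic. It assumes, as `awk '{print $6,$8,$9}'` implies, that each whitespace-separated MashMap line holds the reference sequence name in column 6 and the start and end coordinates in columns 8 and 9. The function names and the sample rows are hypothetical. The merge rule follows the `bedtools merge` default, which joins overlapping and book-ended intervals.

```python
from typing import Iterable, List, Tuple

# (sequence name, start, end), as in a three-column BED line
Interval = Tuple[str, int, int]


def mashmap_to_intervals(lines: Iterable[str]) -> List[Interval]:
    """Keep columns 6, 8 and 9 of each MashMap line (like `awk '{print $6,$8,$9}'`),
    then sort by name and numeric start (like `sort -k1,1 -k2,2n`)."""
    intervals: List[Interval] = []
    for line in lines:
        fields = line.split()
        if len(fields) < 9:
            continue  # skip blank or truncated lines
        intervals.append((fields[5], int(fields[7]), int(fields[8])))
    # Python compares names by code point; GNU sort uses the locale's collation.
    intervals.sort(key=lambda iv: (iv[0], iv[1]))
    return intervals


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or book-ended intervals on the same sequence,
    mirroring the default behaviour of `bedtools merge`. Input must be sorted."""
    merged: List[Interval] = []
    for name, start, end in intervals:
        if merged and merged[-1][0] == name and start <= merged[-1][2]:
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (name, prev_start, max(prev_end, end))
        else:
            merged.append((name, start, end))
    return merged


# Hypothetical MashMap lines, for illustration only.
rows = [
    "tx1 1000 0 500 + chr2 242193529 5000 5500 95.1",
    "tx2 800 0 500 + chr1 248956422 100 600 90.0",
    "tx3 900 0 500 + chr1 248956422 600 900 88.2",
    "tx4 700 0 500 + chr1 248956422 2000 2400 85.0",
]
print(merge_intervals(mashmap_to_intervals(rows)))
# [('chr1', 100, 900), ('chr1', 2000, 2400), ('chr2', 5000, 5500)]
```

The `chr1` hits at 100-600 and 600-900 touch end to start, so they collapse into a single interval. The hit at 2000-2400 stays separate.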

index file /project/shefflab/genomes/hg38/salmon_partial_sa_index/default/reference.masked.genome.fa.fai not found, generating...
Command completed. Elapsed time: 0:00:04. Running peak memory: 69.572GB.
PID: 19575; Command: bedtools; Return code: 0; Memory used: 0.004GB

> `awk '{a=$0; getline;split(a, b, ":"); r[b[1]] = r[b[1]]""$0} END { for (k in r) { print k"\n"r[k] } }' /project/shefflab/genomes/hg38/salmon_partial_sa_index/default/genome_found.fa > /project/shefflab/genomes/hg38/salmon_partial_sa_index/default/decoy.fa` (19589)
Command completed. Elapsed time: 0:00:54. Running peak memory: 69.572GB.
PID: 19589; Command: awk; Return code: 0; Memory used: 0.132GB

> `cat /project/shefflab/genomes/hg38/fasta_txome/default/hg38.fa /project/shefflab/genomes/hg38/salmon_partial_sa_index/default/decoy.fa > /project/shefflab/genomes/hg38/salmon_partial_sa_index/default/gentrome.fa` (19637)
Command completed. Elapsed time: 0:00:02. Running peak memory: 69.572GB.
PID: 19637; Command: cat; Return code: 0; Memory used: 0.003GB

> `grep '>' /project/shefflab/genomes/hg38/salmon_partial_sa_index/default/decoy.fa | awk '{print substr($1,2); }' > /project/shefflab/genomes/hg38/salmon_partial_sa_index/default/decoys.txt` (19641,19642)
Command completed. Elapsed time: 0:00:01. Running peak memory: 69.572GB.
PID: 19641; Command: grep; Return code: 0; Memory used: 0.017GB
PID: 19642; Command: awk; Return code: 0; Memory used: 0.003GB

> `rm /project/shefflab/genomes/hg38/salmon_partial_sa_index/default/exons.bed /project/shefflab/genomes/hg38/salmon_partial_sa_index/default/reference.masked.genome.fa /project/shefflab/genomes/hg38/salmon_partial_sa_index/default/mashmap.out /project/shefflab/genomes/hg38/salmon_partial_sa_index/default/genome_found.sorted.bed /project/shefflab/genomes/hg38/salmon_partial_sa_index/default/genome_found_merged.bed /project/shefflab/genomes/hg38/salmon_partial_sa_index/default/genome_found.fa /project/shefflab/genomes/hg38/salmon_partial_sa_index/default/decoy.fa /project/shefflab/genomes/hg38/salmon_partial_sa_index/default/reference.masked.genome.fa.fai` (19644)
Command completed. Elapsed time: 0:00:01. Running peak memory: 69.572GB.
PID: 19644; Command: rm; Return code: 0; Memory used: 0.001GB

> `salmon index -t /project/shefflab/genomes/hg38/salmon_partial_sa_index/default/gentrome.fa -d /project/shefflab/genomes/hg38/salmon_partial_sa_index/default/decoys.txt -i /project/shefflab/genomes/hg38/salmon_partial_sa_index/default -p 8` (19646)
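The `awk` one-liner logged above builds the decoys from the `bedtools getfasta` output. It concatenates every sequence whose header (for example `>chr1:100-900`) shares the text before the first `:`, producing one decoy record per reference sequence. The `grep`/`awk` step then drops the leading `>` to write `decoys.txt`. The following is a rough Python sketch of both steps. It makes the same assumption the `awk` does: each header is followed by exactly one sequence line. The function names and sample records are hypothetical. Like `split(a, b, ":")`, the sketch truncates a sequence name that itself contains `:` at its first colon.

```python
from typing import Dict, Iterable, List


def group_getfasta_records(lines: Iterable[str]) -> Dict[str, str]:
    """Concatenate sequences whose headers share the text before the first ':'.

    Each header is assumed to be followed by exactly one sequence line, and the
    key keeps the leading '>', just as awk's b[1] does.
    """
    it = iter(lines)
    grouped: Dict[str, List[str]] = {}
    for header in it:
        seq = next(it, "")  # like awk's `getline`: the next line is the sequence
        key = header.rstrip("\n").split(":", 1)[0]
        grouped.setdefault(key, []).append(seq.rstrip("\n"))
    # dicts keep insertion order; awk's `for (k in r)` order is unspecified
    return {key: "".join(parts) for key, parts in grouped.items()}


def decoy_names(decoys: Dict[str, str]) -> List[str]:
    """Names without the leading '>', like `awk '{print substr($1,2)}'`."""
    return [key[1:] for key in decoys]


def to_fasta(decoys: Dict[str, str]) -> str:
    """Two lines per record, as the awk END block prints them."""
    return "".join(f"{key}\n{seq}\n" for key, seq in decoys.items())


# Hypothetical getfasta output, for illustration only.
getfasta_lines = [
    ">chr1:100-900\n", "ACGT\n",
    ">chr2:5000-5500\n", "GGCC\n",
    ">chr1:2000-2400\n", "TTAA\n",
]
decoys = group_getfasta_records(getfasta_lines)
print(to_fasta(decoys), end="")
print(decoy_names(decoys))
```

Here the two `chr1` intervals end up as the single decoy `>chr1` with sequence `ACGTTTAA`, and `decoys.txt` would list `chr1` and `chr2`.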

Version Info: Could not resolve upgrade information in the alotted time.
[2020-01-05 19:54:29.333] [puff::index::jointLog] [warning] If you wish to retain duplicate transcripts, please use the `--keepDuplicates` flag
[2020-01-05 19:54:29.350] [puff::index::jointLog] [info] Replaced 5 non-ATCG nucleotides
[2020-01-05 19:54:29.350] [puff::index::jointLog] [info] Clipped poly-A tails from 1439 transcripts
wrote 177459 cleaned references
seqHash 256 : c1756fb43024b82a5b8e36e241ca4a80c929a709068f34268abe455a7ce60fa4
seqHash 512 : 70845f3ebeddff820cec258679907bc2cd2e2e9d19546ef6da1284dc3d82a634a2bf4c544e4ede2b5ba54524fcb798d4f5b01cc53866adbd58322e96e7e78621
nameHash 256 : 3323ffa57919053350df6fed6cbf403b27a4a1c75da3b98adba96bf2e4c371cf
nameHash 512 : 848c52d665a10be5aec32a09becca6160d7873c9f061d4e76d3593dc914d447590a086416067699eabab76d0eb24c1b793d23e9d77b9e855180191fc97d35d88
[2020-01-05 19:54:31.512] [puff::index::jointLog] [info] Filter size not provided; estimating from number of distinct k-mers
[2020-01-05 19:54:35.189] [puff::index::jointLog] [info] ntHll estimated 159634130 distinct k-mers, setting filter size to 2^32
Threads = 8
Vertex length = 31
Hash functions = 5
Filter size = 4294967296
Capacity = 2
Files:
/project/shefflab/genomes/hg38/salmon_partial_sa_index/default/ref_k31_fixed.fa
--------------------------------------------------------------------------------
Round 0, 0:4294967296
Pass    Filling    Filtering
1       15         628
2       17         0
True junctions count = 1200901
False junctions count = 214015
Hash table size = 1414916
Candidate marks count = 11887531
--------------------------------------------------------------------------------
Reallocating bifurcations time: 1
True marks count: 11470861
Edges construction time: 383
--------------------------------------------------------------------------------
Distinct junctions = 1200901
approximateContigTotalLength: 131492280
counters: 84233 906 898 75
contig count: 1790981 element count: 215123398 complex nodes: 86112
size: 215123398
# of ones in rank vector: 1790980
size: 215123398
[2020-01-05 20:12:26.388] [puff::index::jointLog] [info] Setting the index/BinaryGfa directory /project/shefflab/genomes/hg38/salmon_partial_sa_index/default
size = 215123398
-----------------------------------------
| Loading contigs | Time = 22.58 ms
-----------------------------------------
size = 215123398
-----------------------------------------
| Loading contig boundaries | Time = 11.776 ms
-----------------------------------------
Number of ones: 1790980
Number of ones per inventory item: 512
Inventory entries filled: 3499
[2020-01-05 20:12:26.769] [puff::index::jointLog] [info] Done wrapping the rank vector with a rank9sel structure.
[2020-01-05 20:12:27.184] [puff::index::jointLog] [info] contig count for validation: 1790980
[2020-01-05 20:12:28.212] [puff::index::jointLog] [info] Total # of Contigs : 1790980
[2020-01-05 20:12:28.212] [puff::index::jointLog] [info] Total # of numerical Contigs : 1790980
[2020-01-05 20:12:31.263] [puff::index::jointLog] [info] Total # of segments we have position for : 1790980
[2020-01-05 20:12:31.374] [puff::index::jointLog] [info] total contig vec entries 11692510
[2020-01-05 20:12:31.374] [puff::index::jointLog] [info] bits per offset entry 24
[2020-01-05 20:12:32.726] [puff::index::jointLog] [info] there were 935225 equivalence classes
[2020-01-05 20:12:37.431] [puff::index::jointLog] [info] # segments = 1790980
[2020-01-05 20:12:37.431] [puff::index::jointLog] [info] total length = 215123398
[2020-01-05 20:12:37.508] [puff::index::jointLog] [info] Reading the reference files ...
[puff::index::jointLog] [info] finished populating pos vector
[2020-01-05 20:12:50.592] [puff::index::jointLog] [info] writing index components
[2020-01-05 20:12:53.036] [puff::index::jointLog] [info] finished writing dense pufferfish index
[2020-01-05 20:12:53.121] [jLog] [info] done building index
for info, total work write each : 2.331 total work inram from level 3 : 4.322 total work raw : 25.000
Bitarray 845658496 bits (100.00 %) (array + ranks )
final hash 0 bits (0.00 %) (nb in final hash 0)
Command completed. Elapsed time: 0:18:33. Running peak memory: 69.572GB.
PID: 19646; Command: salmon; Return code: 0; Memory used: 0.589GB

> `touch /project/shefflab/genomes/hg38/salmon_partial_sa_index/default/_refgenie_build/hg38_salmon_partial_sa_index__default.flag` (21732)
Command completed. Elapsed time: 0:00:00. Running peak memory: 69.572GB.
PID: 21732; Command: touch; Return code: 0; Memory used: 0.001GB

> `cd /project/shefflab/genomes/hg38/salmon_partial_sa_index/default; find . -type f -not -path './_refgenie_build*' -exec md5sum {} \; | sort -k 2 | awk '{print $1}' | md5sum`
Asset digest: 2e6ba9e7daff61d2ba05420967eae02b
Default tag for 'hg38/salmon_partial_sa_index' set to: default

### Pipeline completed. Epilogue
* Elapsed time (this run): 3:24:48
* Total elapsed time (all runs): 3:24:47
* Peak memory (this run): 69.5719 GB
* Pipeline completed time: 2020-01-05 20:12:58

Finished building 'salmon_partial_sa_index' asset
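The `find ... | md5sum` command near the end of the log derives the asset digest from per-file MD5 sums. The following is a rough Python sketch of that pipeline. It hashes every regular file outside `_refgenie_build`, orders the results by relative path, and takes the MD5 of the newline-terminated list of hex digests, which is what `awk '{print $1}' | md5sum` receives. The function names are hypothetical. GNU `sort -k 2` orders paths using the current locale's collation, while Python compares code points, so the two orderings can differ for some file names. For that reason the sketch is not guaranteed to reproduce the logged digest.

```python
import hashlib
import os
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 of one file, read in chunks (the first field `md5sum FILE` prints)."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def asset_digest(asset_dir: str) -> str:
    """Approximate the logged `find | md5sum | sort -k 2 | awk | md5sum` pipeline."""
    root = Path(asset_dir)
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            full = Path(dirpath) / fname
            if full.is_symlink() or not full.is_file():
                continue  # `find -type f` matches regular files only
            rel = "./" + full.relative_to(root).as_posix()
            if rel.startswith("./_refgenie_build"):
                continue  # same exclusion as -not -path './_refgenie_build*'
            entries.append((rel, file_md5(full)))
    entries.sort()  # by path (code point order; GNU sort would use locale collation)
    listing = "".join(digest + "\n" for _rel, digest in entries)  # awk '{print $1}'
    return hashlib.md5(listing.encode("ascii")).hexdigest()
```

Calling `asset_digest()` on an asset directory returns a 32-character hex string in the same form as the `Asset digest:` line above.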