Can I visualize RNA-seq data on the Repeat Browser?
It is possible to map coverage tracks or other aspects of your RNA-seq data to the Repeat Browser, but it is not clear how the result should be interpreted. The consensuses we provide have unequal coverage from their genomic instances (see the “Mapping Coverage” track in “Mapping to the Human Genome”). In simpler terms, if you look at all L1PA elements, there are many more fragments of the 3′ end scattered throughout the genome than full-length elements, so some sort of normalization would be needed to meaningfully interpret results mapped to the consensus.
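To make the normalization idea concrete, here is a minimal sketch that divides a per-position signal on a consensus by the number of genomic instances covering that position. This is only an illustration and not a method the Repeat Browser provides: the function name, the `min_coverage` parameter, and the choice of a simple ratio are all assumptions.

```python
def normalize_by_mapping_coverage(signal, mapping_coverage, min_coverage=1):
    """Divide per-position signal by per-position mapping coverage.

    signal           -- hypothetical list of RNA-seq coverage values, one per
                        consensus position
    mapping_coverage -- number of genomic instances covering each consensus
                        position (the kind of values a mapping coverage track
                        would hold)
    min_coverage     -- positions covered by fewer instances than this are
                        reported as None instead of being divided
    """
    if len(signal) != len(mapping_coverage):
        raise ValueError("signal and mapping_coverage must have the same length")

    normalized = []
    for value, coverage in zip(signal, mapping_coverage):
        if coverage >= min_coverage:
            normalized.append(value / coverage)
        else:
            # Too few instances to say anything meaningful at this position.
            normalized.append(None)
    return normalized
```

For example, a 3′ position with raw signal 100 that is covered by 50 instances ends up lower (2.0) than a 5′ position with raw signal 10 covered by a single instance (10.0). Whether a ratio like this is the right normalization for your question is a separate matter.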
If you are interested in quantifying the expression of TE families, we recommend using existing tools designed for that purpose. TEtranscripts, REdiscoverTE, and SalmonTE all do this in slightly different ways and can tell you if TEs are expressed in your dataset.
Can I visualize non-reference TE insertions on the Repeat Browser?
It depends on what you mean. liftOver files only exist for hg19 and hg38, so non-reference TEs don’t have pre-computed coordinates on our consensuses and therefore can’t be lifted directly. You can generate your own liftOver file for another assembly; instructions on how to do that are coming soon!
You can also BLAT your TEs against the consensus sequences and visualize them, similar to what we did for non-human primate TEs in our paper.
1) Download BLAT.
2) Then download the consensus sequence you wish to map to (you can View DNA for the element of interest in the browser, download the sequences from the Table Browser, or download this file). You can also map to all the consensus sequences. Note that mapping to all consensus sequences is slightly different from what we do when lifting: we only ever lift from an annotated L1PA2 to the L1PA2 consensus. If you BLAT a list of L1PA2 sequences against all consensuses (hg38reps), you will almost certainly get mappings to other L1PAs. Be sure that’s what you want to do.
3) BLAT your TE sequences against the consensus to create a psl file.
4) Load the psl as a custom track.
You can see map_primates.sh as an example.
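As a rough illustration of steps 3 and 4, the sketch below builds a BLAT command and turns the resulting psl lines into custom track text. It is not a copy of map_primates.sh. The helper names, file names, and track name are hypothetical; the `blat database query output.psl` argument order and the `-noHead` option come from BLAT's usage message, and the `track name=...` line follows the general UCSC custom track convention. The optional `keep_target` filter addresses the caveat in step 2 by keeping only hits to one consensus.

```python
import subprocess  # only needed if you actually run the command (see note below)


def build_blat_command(consensus_fasta, query_fasta, out_psl):
    """Return the argument list for: blat <database> <query> -noHead <output.psl>.

    The consensus FASTA is the database (target) and your TE sequences are
    the query. -noHead suppresses the psl header so that the output contains
    alignment lines only.
    """
    return ["blat", consensus_fasta, query_fasta, "-noHead", out_psl]


def psl_to_custom_track(psl_lines, track_name, keep_target=None):
    """Prepend a track line to psl alignments, optionally filtering by target.

    psl_lines   -- iterable of tab-separated psl lines (21 columns each)
    track_name  -- hypothetical name shown for the custom track
    keep_target -- if given, keep only alignments whose target name
                   (psl column 14, tName) equals this consensus name
    """
    out = ['track name="{0}" description="{0}"'.format(track_name)]
    for raw in psl_lines:
        line = raw.rstrip("\n")
        fields = line.split("\t")
        if len(fields) < 21:
            # Skip blank lines or anything that is not a full psl record.
            continue
        if keep_target is not None and fields[13] != keep_target:
            continue
        out.append(line)
    return out
```

To run the whole thing, something like `subprocess.run(build_blat_command("L1PA2_consensus.fa", "my_TEs.fa", "my_TEs.psl"), check=True)` would produce the psl file, assuming `blat` is on your PATH and the (made-up) file names exist. The lines returned by `psl_to_custom_track` can then be written to a text file and loaded as a custom track.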
Are your references the RepBase consensuses?
No. Our consensuses are generated from RepeatMasker output. RepBase consensuses are available from GIRI with a subscription. The Repeat Browser uses the Dfam consensuses as much as possible, but generates its own when necessary. See the sequences and lifting pages for more.
Where is <my favorite repeat>?
Your repeat probably isn’t found in the RepeatMasker output for hg38 or hg19, or it may have been filtered out for other reasons (e.g. we discarded all simple repeats). If you believe your favorite repeat has been unfairly excluded, let us know, and we can add the consensus and liftOver in a later update.
My <factor of interest> definitely binds a repeat type or instance, but the coverage on the Repeat Browser is underwhelming. Why could this be?
There are a few possibilities. The first is that, although the factor may bind many repeats of a certain family/class/type, it may not do so in a localized or sequence-specific manner, in which case the Repeat Browser signal could be quite “diffuse”. Another possibility is that your factor may bind a certain subset of repeats that are not represented well by the consensus. In this case it is possible that they might not be lifted to the consensus, or they may be lifted in such a way that insertions and deletions relative to the consensus produce a strange signal.
Does this exist for mouse (mm10) or <favorite species>?
No. Would you find a mouse browser useful? Let us know! The code used to generate the browser requires only the genome assembly, a RepeatMasker track, and a key of mappings between Dfam and RepeatMasker output. The last of the three requires a bit of thought and is therefore the most difficult to arrive at. If you have all three and would like to run the code, or work with us to build a browser, you can find the code on GitHub or email us (see below).
I have a question! I guess it’s not frequently asked.
jferna10 at ucsc.edu
max at soe.ucsc.edu