Dynamic Transcriptional and Chromatin Accessibility Landscape of Medaka Embryogenesis

Yingshu Li, Yongjie Liu, Hang Yang, Ting Zhang, Kiyoshi Naruse, Qiang Tu

Genome Research


Medaka (Oryzias latipes) has become an important vertebrate model widely used in genetics, developmental biology, environmental sciences, and many other fields. A high-quality genome sequence and a variety of genetic tools are available for this model organism. However, the existing genome annotation is still rudimentary, as it was mainly based on computational prediction and short-read RNA-seq data. Here we report a dynamic transcriptome landscape of medaka embryogenesis profiled by long-read RNA-seq, short-read RNA-seq, and ATAC-seq. Integrating these datasets, we constructed a much-improved gene model set that includes about 17,000 novel isoforms, and identified 1600 transcription factors, 1100 long non-coding RNAs, and 150,000 potential cis-regulatory elements. The time-series datasets provided another dimension of information. Using the expression dynamics of genes and the accessibility dynamics of cis-regulatory elements, we investigated isoform switching and the regulatory logic between accessible elements and genes during embryogenesis. We built a user-friendly medaka omics data portal to present these datasets. This resource provides the first comprehensive omics datasets of medaka embryogenesis. Finally, we term these three assays the minimum ENCODE toolbox and propose their use as the initial and essential genomic profiling assays for model organisms that have limited data available. This work will be of great value for the research community using medaka as a model organism, and for many others as well.