Yes, we can generate synthetic DNA sequence motif datasets in the following way: draw an i.i.d. background, define a profile that represents the motif as a product multinomial (i.e., a position weight matrix, PWM), and then plant a realization of that profile at a randomly chosen position in each generated background sequence. But this motif-finding problem is far too easy. How about simulating the motif as a mixture of profiles, where profiles may share identical patterns (i.e., overlaps)? Moreover, what if a motif has a block structure, with variable spacings (i.e., gaps) between each pair of adjacent blocks? Then we can simulate a mixture of block-structured profiles as the ground-truth motif. This package generates datasets with such patterns.
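The package's own interface is not documented here yet, so the following is only a hypothetical, standard-library sketch of the generation scheme described above, not this package's API. It assumes a uniform i.i.d. background, represents each block as a PWM (a list of columns giving P(A), P(C), P(G), P(T)), fills the gaps between blocks with background letters, and picks one mixture component per sequence by weight. All names (`plant_sequence`, `TATA`, `MIXTURE`, etc.) and the example block values are illustrative assumptions.

```python
import random

ALPHABET = "ACGT"
UNIFORM = (0.25, 0.25, 0.25, 0.25)


def sample_background(length, rng, probs=UNIFORM):
    """i.i.d. background: each position drawn independently from probs (A, C, G, T)."""
    return "".join(rng.choices(ALPHABET, weights=probs, k=length))


def sample_from_pwm(pwm, rng):
    """One realization of a product multinomial: each PWM column is an independent draw."""
    return "".join(rng.choices(ALPHABET, weights=col, k=1)[0] for col in pwm)


def sample_gapped_motif(blocks, gap_range, rng):
    """Realize each block in order, inserting a background gap whose length is
    drawn uniformly from [gap_range[0], gap_range[1]] between adjacent blocks.
    Assumption: gap positions are filled with background letters."""
    parts = []
    for i, block in enumerate(blocks):
        if i > 0:
            parts.append(sample_background(rng.randint(*gap_range), rng))
        parts.append(sample_from_pwm(block, rng))
    return "".join(parts)


def plant_sequence(seq_len, mixture, rng):
    """Pick one mixture component by weight, realize it, and overwrite a randomly
    placed window of an i.i.d. background sequence with that realization.

    mixture is a list of (weight, blocks, gap_range) tuples.
    Returns (sequence, component_index, start, motif) as ground truth."""
    weights = [w for w, _, _ in mixture]
    comp = rng.choices(range(len(mixture)), weights=weights, k=1)[0]
    _, blocks, gap_range = mixture[comp]
    motif = sample_gapped_motif(blocks, gap_range, rng)
    if len(motif) > seq_len:
        raise ValueError("motif realization is longer than the sequence")
    start = rng.randint(0, seq_len - len(motif))
    background = sample_background(seq_len, rng)
    seq = background[:start] + motif + background[start + len(motif):]
    return seq, comp, start, motif


# Hypothetical blocks; each column is P(A), P(C), P(G), P(T).
TATA = [[.05, .05, .05, .85], [.85, .05, .05, .05], [.05, .05, .05, .85], [.85, .05, .05, .05]]
GCBOX = [[.05, .05, .85, .05], [.05, .05, .85, .05], [.05, .85, .05, .05]]
CAAT = [[.05, .85, .05, .05], [.85, .05, .05, .05], [.85, .05, .05, .05], [.05, .05, .05, .85]]

# Two block-structured profiles that share the TATA block (an overlap),
# each with its own range of gap lengths between blocks.
MIXTURE = [
    (0.6, [TATA, GCBOX], (2, 5)),
    (0.4, [TATA, CAAT], (3, 8)),
]

rng = random.Random(0)
dataset = [plant_sequence(60, MIXTURE, rng) for _ in range(100)]
```

Returning the component index and start position alongside each sequence keeps the ground truth available, so a motif finder's output can later be scored against which profile was planted and where.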
Coming soon