Concat sequences in the same file, and output BED file with concat regions #508

Open
fgvieira opened this issue Feb 7, 2025 · 1 comment


fgvieira commented Feb 7, 2025

seqkit concat concatenates sequences with the same ID, but only between files. Also, after concatenating, it is not possible to recover the coordinates of the original sequences. Would it be possible to also allow concatenation of sequences within the same file, and to output (e.g.) a BED file with the coordinates?

For example:

$ seqkit concat --all-seqs a.fa b.fa 
>A 1|x|2|x
a1-ax-a2-ax-

And with the option:

$ seqkit concat --all-seqs --out-bed regions.bed a.fa b.fa 
>A 1|x|2|x
a1-ax-a2-ax-

would also output the file:

$ cat regions.bed
A    0    3    1
A    3    6    x
A    6    9    2
A    9    12   x
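
For illustration only, here is a minimal standalone Python sketch of the requested behaviour. It is not part of seqkit, and `--out-bed` above is the proposed option, not an existing flag. The helper names `parse_fasta` and `concat_with_bed` are hypothetical, and the four-record input is an assumption chosen to reproduce the example output. Records sharing an ID are joined in file order, descriptions are joined with `|`, and each original record gets a 0-based, half-open BED interval, so the last interval ends at the total length of the concatenated sequence (12 here).

```python
def _record(header, chunks):
    # Split ">ID description" into ID and description; join wrapped sequence lines.
    seq_id, _, desc = header.partition(" ")
    return seq_id, desc.strip(), "".join(chunks)


def parse_fasta(lines):
    """Yield (seq_id, description, sequence) for each FASTA record."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield _record(header, chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield _record(header, chunks)


def concat_with_bed(records):
    """Concatenate records sharing an ID and track where each one lands.

    Returns (merged, bed):
      merged: dict of seq_id -> (joined descriptions, concatenated sequence)
      bed:    list of (seq_id, start, end, description), 0-based half-open
    """
    parts = {}    # seq_id -> list of (description, sequence), in file order
    offsets = {}  # seq_id -> length concatenated so far
    bed = []
    for seq_id, desc, seq in records:
        start = offsets.get(seq_id, 0)
        end = start + len(seq)
        parts.setdefault(seq_id, []).append((desc, seq))
        offsets[seq_id] = end
        bed.append((seq_id, start, end, desc))
    merged = {
        seq_id: ("|".join(d for d, _ in p), "".join(s for _, s in p))
        for seq_id, p in parts.items()
    }
    return merged, bed


if __name__ == "__main__":
    # Hypothetical single-file input with four records that all have ID "A".
    fasta = ">A 1\na1-\n>A x\nax-\n>A 2\na2-\n>A x\nax-\n"
    merged, bed = concat_with_bed(parse_fasta(fasta.splitlines()))
    for seq_id, (desc, seq) in merged.items():
        print(f">{seq_id} {desc}")
        print(seq)
    for row in bed:
        print("\t".join(str(field) for field in row))
```

Run on that input, the sketch prints the header `>A 1|x|2|x`, the sequence `a1-ax-a2-ax-`, and four tab-separated BED rows ending with `A 9 12 x`.
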
@shenwei356
Owner

> seqkit concat concatenates sequences with the same ID, but only between files.

You can first split the records into individual files.

left

That's quite tricky; these are highly custom operations. Sorry.
