Download Symbol via EFetch #59
You can use NCBI Datasets for this.
Since the gene DocSum does not have transcript accessions, a bash for loop can be used to map acc.ver to gene symbols:
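Below is a minimal sketch of such a loop. It is not necessarily the loop the maintainer posted, because that one was not preserved. It assumes EDirect (`epost`, `elink`, `esummary`, `xtract`) is installed and on `PATH`, and it reuses the `epost | elink | esummary` pipeline suggested in this thread. It then keeps only the gene DocSum's `Name` element, which holds the official symbol. The function names `lookup_symbol` and `map_symbols` are hypothetical. The lookup runs with `< /dev/null` as a precaution, because EDirect commands can read piped standard input and would otherwise consume the rest of the accession file.

```shell
#!/usr/bin/env bash
# Sketch: map each accession.version in a file to its gene symbol,
# one lookup per accession, printing results in input order.
# Assumes EDirect is installed; function names are hypothetical.

# Print the official symbol for one nuccore accession.version.
# The gene DocSum stores the symbol in its <Name> element.
lookup_symbol() {
  epost -db nuccore -id "$1" |
    elink -target gene |
    esummary |
    xtract -pattern DocumentSummary -element Name
}

# Read accessions (one per line) and print "accession<TAB>symbol".
# Blank lines are skipped. NA is printed when no symbol comes back;
# if several genes are linked, only the first symbol is kept.
map_symbols() {
  local acc sym
  while IFS= read -r acc || [ -n "$acc" ]; do
    [ -z "$acc" ] && continue
    # stdin from /dev/null so EDirect cannot swallow the remaining lines;
    # stderr is discarded to keep the TSV output clean
    sym=$(lookup_symbol "$acc" < /dev/null 2>/dev/null | head -n 1)
    printf '%s\t%s\n' "$acc" "${sym:-NA}"
  done < "$1"
}

# Usage (hypothetical file names):
#   map_symbols accessions.txt > symbols.tsv
```

Because every accession is echoed next to its result, the output stays aligned with the input even when a lookup fails.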
Thank you. This is extremely helpful. I'm working with a list of 177,816 accession.versions (attached), which was the number of reference transcripts in the GRCh38 release on the NCBI Human Genome Resources page when I downloaded it. NCBI Datasets worked fine for a small test file of ten accession.versions, but when I let the complete file run overnight, it never finished. Was I just not patient enough? I would really like this to work.

Thanks for the scripts. I'm currently running the second one. When I tested it, it worked, but slowly: I seemed to get only about one result per second, which means my full list will take over two days to complete. Is there any way to speed this up?
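The two-day figure checks out, assuming a steady rate of about one result per second:

$$
\frac{177{,}816\ \text{accessions}}{1\ \text{accession/s}} = 177{,}816\ \text{s} = \frac{177{,}816}{3600}\ \text{h} \approx 49.4\ \text{h} \approx 2.06\ \text{days}
$$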
Using esummary may be faster: `epost -db nuccore -id NM_001318896.2 | elink -target gene | esummary -format text`. The DocSum is HUGE, though, so this is still going to be slow. And if you need to preserve 1:1 linkages between input and output, I believe you unfortunately have to send each request one at a time.
Not with EntrezDirect. As I mentioned earlier, NCBI Datasets is a good choice for this. For example, I was able to download the data for the entire list in 6 minutes using the following command:
Thanks for the information. I downloaded the …
You are not doing anything wrong. Currently … As a workaround, you can "download" the package without any sequence data and use …
|
Again, thanks. Unfortunately, this doesn't appear to work, because the result contains extraneous information. I need the official symbol for each accession number in the file, in the same order. The results appear to include every record for each gene that matches an accession.version in my list. For example, XM_017000093.3 is the first accession number, and it is a transcript for gene AP4B1. However, in the result provided by …
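Whatever tool produces the accession-to-symbol table, extra rows and a different ordering can be fixed locally by joining the table back onto the original list. The sketch below uses `awk` and assumes a tab-separated table with the accession.version in column 1 and the symbol in column 2. That layout, the file names, and the function name `reorder_by_input` are all hypothetical, so adjust the column numbers to the report you actually have.

```shell
#!/usr/bin/env bash
# Sketch: keep only the mapping rows that match the input list and print
# them in input order, one line per input accession (NA if unmatched).
# Assumed (hypothetical) table layout: col 1 = accession.version, col 2 = symbol.

reorder_by_input() {
  # $1 = input accession list (one per line), $2 = mapping table (TSV)
  awk -F '\t' -v OFS='\t' '
    { sub(/\r$/, "") }                 # tolerate CRLF line endings
    FILENAME == ARGV[1] {              # first file read: the mapping table
      if (!($1 in sym)) sym[$1] = $2   # keep the first symbol per accession
      next
    }
    $1 != "" {                         # second file: the input list, in order
      print $1, (($1 in sym) ? sym[$1] : "NA")
    }
  ' "$2" "$1"
}

# Usage (hypothetical file names):
#   reorder_by_input accessions.txt mapping.tsv > symbols_in_order.tsv
```

Rows for accessions that are not in the input list are simply never printed, so the output has exactly one line per non-blank input accession.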
Ah, the details! Yes, …
Thanks again for all your help.
BTW, when I ran …
Given a list of accession.version numbers, is there a way to download only the official gene symbol for each corresponding gene using one of the EDirect utilities? If not, any thoughts on how best to accomplish this?