How to count pombe IDs in PMC using the ePMC web service

Kim Rutherford edited this page Mar 30, 2023 · 1 revision

In Chado (from a psql session), write the gene identifiers to /tmp/pom_ids.txt:

\pset tuples_only
\o /tmp/pom_ids.txt
select uniquename from feature where organism_id = 1 and type_id in (select cvterm_id from cvterm where name = 'gene');
\o
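
Before splitting the file it can be worth checking the export. The sketch below assumes the output landed in /tmp/pom_ids.txt as above; `count_ids` is a hypothetical helper name, not part of the original recipe. It counts only non-blank lines, since the tuples-only output is indented and may contain a blank line.

```shell
# Hypothetical helper: count non-blank lines (one gene ID per line) in a file.
count_ids() {
    grep -c '[^[:space:]]' "$1"
}

# Assumes the psql output file from the step above exists.
if [ -e /tmp/pom_ids.txt ]; then
    echo "IDs exported: $(count_ids /tmp/pom_ids.txt)"
fi
```

Dividing the count by 500 gives the number of chunk files to expect from `split`.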

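Then in the shell (bash), split the IDs into chunks of 500, build one ePMC query per year and chunk, run the queries, and sum the hit counts per year: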

(cd /tmp/; split -l 500 pom_ids.txt pom_ids.txt)
(cd /tmp/
for year in 2019 2020 2021 2022 2023; do for i in {a..z}
do
# split names the chunks pom_ids.txtaa, pom_ids.txtab, ...; skip letters with no chunk
[ -e pom_ids.txta$i ] || continue
perl -ne 's/^\s+//; chomp; next unless length; print qq| OR "$_"|; END { print "\n" }' pom_ids.txta$i  | perl -ne 's/^ OR //; chomp; print "format=json&resultType=idlist&query=PUB_YEAR:'$year' AND (pombe OR \"fission yeast\") AND ($_)\n"' > query_$year.$i.txt
done
done)
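
To make the shape of the generated query files easier to see, here is a minimal bash-only sketch of the same construction. `build_query` is a hypothetical name, and the two IDs are sample values used only for illustration; the query text itself is copied from the commands above.

```shell
# Hypothetical helper: print the POST body for one year and one chunk of IDs.
# Reads IDs (one per line, possibly indented) from the file given as $2.
build_query() {
    local year=$1 file=$2 ids="" id
    while read -r id || [ -n "$id" ]; do
        [ -n "$id" ] || continue          # skip blank lines
        ids="${ids:+$ids OR }\"$id\""     # join quoted IDs with " OR "
    done < "$file"
    printf 'format=json&resultType=idlist&query=PUB_YEAR:%s AND (pombe OR "fission yeast") AND (%s)\n' "$year" "$ids"
}

# Usage with two sample IDs:
chunk=$(mktemp)
printf ' SPBC11B10.09\n SPAC24H6.05\n' > "$chunk"
build_query 2023 "$chunk"
# prints: format=json&resultType=idlist&query=PUB_YEAR:2023 AND (pombe OR "fission yeast") AND ("SPBC11B10.09" OR "SPAC24H6.05")
rm -f "$chunk"
```

Each query file therefore holds a single line: the form parameters followed by one boolean query covering up to 500 IDs.
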
for year in 2019 2020 2021 2022 2023; do
   for i in {a..z}
   do
   # skip letters for which no query file was generated
   [ -e /tmp/query_$year.$i.txt ] || continue
   sleep 3
   echo "query: $year $i"
   curl -s --data-binary '@/tmp/'query_$year.$i.txt -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9' -H 'Accept-Language: en-GB,en-US;q=0.9,en;q=0.8'  -H 'Cache-Control: no-cache'  -H 'Connection: keep-alive'  -H 'Content-Type: application/x-www-form-urlencoded' https://www.ebi.ac.uk/europepmc/webservices/rest/searchPOST | jq . | grep hitCount
   done > /tmp/year_$year.txt
done
for i in 2019 2020 2021 2022 2023
do
echo -n "$i: "; perl -ne '$sum += $1 if /"hitCount":\s*(\d+)/; END { printf "%d\n", $sum; }' /tmp/year_$i.txt
done
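
The same per-year totals can be computed with awk instead of Perl. This is a sketch that assumes the /tmp/year_YYYY.txt files contain a "query:" line followed by a "hitCount" line per chunk; `sum_hits` is a hypothetical name. Note that a paper matching IDs from more than one chunk is counted once per chunk, so each total is an upper bound on the number of distinct papers.

```shell
# Hypothetical helper: add up every "hitCount" value found in one file.
# Expects lines such as:   "hitCount": 123,
sum_hits() {
    awk -F: '/"hitCount"/ { gsub(/[^0-9]/, "", $2); sum += $2 } END { print sum + 0 }' "$1"
}

for year in 2019 2020 2021 2022 2023; do
    f=/tmp/year_$year.txt
    if [ -e "$f" ]; then
        echo "$year: $(sum_hits "$f")"
    fi
done
```

A year whose file has no "hitCount" lines is reported as 0 rather than as an empty value.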