This is an automatically generated ranked list of open source software from pharmaceutical companies and cross-organization initiatives, biotechnology companies, research institutes, open source communities and individuals, plus some life-science software from technology companies.
It is built from a curated list of GitHub accounts and is periodically refreshed from those accounts' repositories.
You can also see what they have updated lately and which topics this software covers.
Note
stars - number of people who especially appreciated the repository
forks - number of people who have forked (copied) the repository in order to modify it
watchers - number of people who are monitoring changes in the repository
main programming language
license
last update date & time
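Some rank numbers in the table repeat (for example 67, 87 and 89), which is consistent with tied entries sharing a rank and the next entry taking the following number. A minimal sketch of that kind of dense ranking is shown here. The source does not say which metric drives the ranking, so the `score` field, the `Repo` type and the sample repository names are hypothetical.

```python
from typing import NamedTuple


class Repo(NamedTuple):
    name: str
    score: int  # hypothetical ranking metric (assumption, e.g. a star count)


def dense_rank(repos: list[Repo]) -> list[tuple[int, Repo]]:
    """Sort by score, highest first; equal scores share a rank."""
    ordered = sorted(repos, key=lambda r: r.score, reverse=True)
    ranked: list[tuple[int, Repo]] = []
    rank = 0
    previous_score = None
    for repo in ordered:
        # Only advance the rank when the score actually changes.
        if repo.score != previous_score:
            rank += 1
            previous_score = repo.score
        ranked.append((rank, repo))
    return ranked


if __name__ == "__main__":
    # Made-up sample data, not taken from the list.
    sample = [
        Repo("org-a/tool", 120),
        Repo("org-b/lib", 95),
        Repo("org-c/kit", 95),
        Repo("org-d/app", 80),
    ]
    for rank, repo in dense_rank(sample):
        print(f"{rank} | {repo.name}")
```

With this scheme two repositories with the same score receive the same rank, and no rank number is skipped afterwards.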
Rank | Software |
---|---|
1 | google-deepmind/alphafold Open source code for AlphaFold. ![]() ![]() ![]() ![]() ![]() ![]() |
2 | deepchem/deepchem Democratizing Deep-Learning for Drug Discovery, Quantum Chemistry, Materials Science and Biology biology , deep-learning , drug-discovery , hacktoberfest , materials-science , quantum-chemistry ![]() ![]() ![]() ![]() ![]() |
3 | biopython/biopython Official git repository for Biopython (originally converted from CVS) bioinformatics , biopython , dna , genomics , phylogenetics , protein , protein-structure , python , sequence-alignment ![]() ![]() ![]() ![]() ![]() |
4 | google/deepvariant DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data. bioinformatics , deep-learning , deep-neural-network , deepvariant , dna , genome , genomics , machine-learning , ngs , science , sequencing , tensorflow ![]() ![]() ![]() ![]() ![]() ![]() |
5 | facebookresearch/esm Evolutionary Scale Modeling (esm): Pretrained language models for proteins ![]() ![]() ![]() ![]() ![]() ![]() |
6 | aqlaboratory/openfold Trainable, memory-efficient, and GPU-friendly PyTorch reproduction of AlphaFold 2 alphafold2 , protein-structure , pytorch ![]() ![]() ![]() ![]() ![]() |
7 | rdkit/rdkit The official sources for the RDKit library c-plus-plus , cheminformatics , python , rdkit ![]() ![]() ![]() ![]() ![]() |
8 | AstraZeneca/awesome-explainable-graph-reasoning A collection of research papers and software related to explainability in graph machine learning. awesome-list , deep-learning , explainable-ai , explainable-ml , graph , graph-algorithms , graphml ![]() ![]() ![]() ![]() |
9 | OpenGene/fastp An ultra-fast all-in-one FASTQ preprocessor (QC/adapters/trimming/filtering/splitting/merging...) adapter , bioinformatics , duplication , fastq , filter , filtering , illumina , merging , ngs , overlap , polyg , preprocessing , qc , quality , quality-control , sequencing , splitting , trimming , umi ![]() ![]() ![]() ![]() ![]() |
10 | scverse/scanpy Single-cell analysis in Python. Scales to >1M cells. anndata , bioinformatics , data-science , machine-learning , python , scanpy , scverse , transcriptomics , visualize-data ![]() ![]() ![]() ![]() ![]() |
11 | lh3/minimap2 A versatile pairwise aligner for genomic and spliced nucleotide sequences bioinformatics , genomics , sequence-alignment , spliced-alignment ![]() ![]() ![]() ![]() ![]() |
12 | allenai/scispacy A full spaCy pipeline and models for scientific/biomedical documents. bioinformatics , biomedical , custom-pipes , nlp , scientific-documents , spacy ![]() ![]() ![]() ![]() ![]() ![]() |
13 | broadinstitute/gatk Official code repository for GATK versions 4 and up bioinformatics , dna , gatk , genome , genomics , ngs , science , sequencing , spark ![]() ![]() ![]() ![]() ![]() ![]() |
14 | bioconda/bioconda-recipes Conda recipes for the bioconda channel. bioinformatics , conda , hacktoberfest , package-management ![]() ![]() ![]() ![]() ![]() |
15 | samtools/samtools Tools (written in C using htslib) for manipulating next-generation sequencing data ![]() ![]() ![]() ![]() ![]() |
16 | Slicer/Slicer Multi-platform, free open source software for visualization and image computing. 3d-printing , 3d-slicer , c-plus-plus , computed-tomography , image-guided-therapy , image-processing , itk , kitware , medical-image-computing , medical-imaging , national-institutes-of-health , neuroimaging , nih , python , qt , registration , segmentation , tcia-dac , tractography , vtk ![]() ![]() ![]() ![]() ![]() |
17 | lh3/bwa Burrows-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment) bioinformatics , fm-index , genomics , sequence-alignment ![]() ![]() ![]() ![]() ![]() |
18 | DeepGraphLearning/torchdrug A powerful and flexible machine learning platform for drug discovery deep-learning , drug-discovery , graph-neural-networks , pytorch ![]() ![]() ![]() ![]() ![]() ![]() |
19 | lh3/seqtk Toolkit for processing sequences in FASTA/Q formats bioinformatics , sequence-analysis ![]() ![]() ![]() ![]() ![]() |
20 | galaxyproject/galaxy Data intensive science for everyone. bioinformatics , dna , genomics , hacktoberfest , ngs , pipeline , science , sequencing , usegalaxy , workflow , workflow-engine ![]() ![]() ![]() ![]() ![]() ![]() |
21 | schrodinger/fixed-data-table-2 A React table component designed to allow presenting millions of rows of data. ![]() ![]() ![]() ![]() ![]() |
22 | soedinglab/MMseqs2 MMseqs2: ultra fast and sensitive search and clustering suite alignment , bioinformatics , blast , linclust , metagenomics , mmseqs , profile-search , sequence-clustering , sequence-search , taxonomy ![]() ![]() ![]() ![]() ![]() |
23 | facebookresearch/fastMRI A large-scale dataset of both raw MRI measurements and clinical MRI images. convolutional-neural-networks , deep-learning , fastmri , fastmri-challenge , fastmri-dataset , medical-imaging , mri , mri-reconstruction , pytorch ![]() ![]() ![]() ![]() ![]() ![]() |
24 | greenelab/deep-review A collaboratively written review paper on deep learning, genomics, and precision medicine deep-learning , genomics , manubot , manuscript , neural-networks , review ![]() ![]() ![]() ![]() ![]() ![]() |
25 | shenwei356/seqkit A cross-platform and ultrafast toolkit for FASTA/Q file manipulation bioinformatics , cross-platform , fasta , fastq , golang , manipulation , sequence , tool , toolkit ![]() ![]() ![]() ![]() ![]() ![]() |
26 | MultiQC/MultiQC Aggregate results from bioinformatics analyses across many samples into a single report. analysis , bioconda , bioinformatics , data-visualization , multiqc , pypi , python , quality-control , reporting , seqera , vizualisation ![]() ![]() ![]() ![]() ![]() ![]() |
27 | dcm4che/dcm4che DICOM Implementation in JAVA ![]() ![]() ![]() ![]() ![]() ![]() |
28 | scverse/scvi-tools Deep probabilistic analysis of single-cell and spatial omics data cite-seq , deep-generative-model , deep-learning , human-cell-atlas , scrna-seq , scverse , single-cell-genomics , single-cell-rna-seq , variational-autoencoder , variational-bayes ![]() ![]() ![]() ![]() ![]() |
29 | vgteam/vg tools for working with genome variation graphs dna , genome-graph , genomics , graph , variation-graph ![]() ![]() ![]() ![]() ![]() ![]() |
30 | schrodinger/pymol-open-source Open-source foundation of the user-sponsored PyMOL molecular visualization system. ![]() ![]() ![]() ![]() ![]() |
31 | scipipe/scipipe Robust, flexible and resource-efficient pipelines using Go and the commandline bioinformatics , bioinformatics-pipeline , cheminformatics , dataflow , fbp , go , golang , pipeline , scientific-workflows , scipipe , workflow , workflow-engine ![]() ![]() ![]() ![]() ![]() ![]() |
32 | shenwei356/csvtk A cross-platform, efficient and practical CSV/TSV toolkit in Golang bioinformatics , command-line , cross-platform , csv , golang , tool , toolkit , tsv ![]() ![]() ![]() ![]() ![]() ![]() |
33 | bigdatagenomics/adam ADAM is a genomics analysis platform with specialized file formats built using Apache Avro, Apache Spark, and Apache Parquet. Apache 2 licensed. avro , big-data , bioinformatics , genomics , java , parquet , python , r , scala , spark ![]() ![]() ![]() ![]() ![]() |
34 | broadinstitute/cromwell Scientific workflow engine designed for simplicity & scalability. Trivially transition between one off use cases to massive scale production environments application , bioinformatics , cloud , containers , docker , executor , ga4gh , hpc , scala , wdl , workflow , workflow-description-language , workflow-execution ![]() ![]() ![]() ![]() ![]() ![]() |
35 | hail-is/hail Cloud-native genomic dataframes and batch computing bioinformatics , genetics , genomics , gwas , hail , python , software , vcf ![]() ![]() ![]() ![]() ![]() ![]() |
36 | broadinstitute/picard A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF. ![]() ![]() ![]() ![]() ![]() ![]() |
37 | aqlaboratory/proteinnet Standardized data set for machine learning of protein structure dataset , deep-learning , machine-learning , protein-sequence , protein-structure , proteins ![]() ![]() ![]() ![]() ![]() |
38 | shenwei356/rush A cross-platform command-line tool for executing jobs in parallel bioinformatics , command , cross-platform , execute , golang , parallel , pipeline , shell , windows ![]() ![]() ![]() ![]() ![]() ![]() |
39 | evo-design/evo DNA foundation modeling from molecular to genome scale ![]() ![]() ![]() ![]() ![]() |
40 | PaddlePaddle/PaddleHelix Bio-Computing Platform Featuring Large-Scale Representation Learning and Multi-Task Deep Learning (“PaddleHelix” bio-computing toolkit) biocomputing , ddi , deeplearning , dti , graph-networks , machine-learning , molecule-design , ppi , protein-design , protein-docking , protein-folding , protein-structure-prediction , representation-learning , rna-structure-prediction , self-supervised-learning ![]() ![]() ![]() ![]() ![]() ![]() |
41 | samtools/htslib C library for high-throughput sequencing data formats bam , bcf , bioinformatics , cram , htslib , ngs , sam , vcf ![]() ![]() ![]() ![]() ![]() |
42 | google/nucleus Python and C++ code for reading and writing genomics data. bioinformatics , dna , genomics , tensorflow ![]() ![]() ![]() ![]() ![]() ![]() |
43 | nroduit/Weasis Weasis is a DICOM viewer available as a desktop application or as a web-based application. dicom , dicom-image , dicom-image-viewer , dicom-images , dicom-pr , dicom-rt , dicom-seg , dicom-viewer , dicom-web-viewer , dicomweb , ecg , export-dicom , medical , medical-imaging , multiplanar-reconstruction , viewer , volume-rendering , weasis ![]() ![]() ![]() ![]() ![]() ![]() |
44 | baidu-research/NCRF Cancer metastasis detection with neural conditional random field (NCRF) camelyon16 , conditional-random-fields , deep-learning , pathology , whole-slide-imaging ![]() ![]() ![]() ![]() ![]() ![]() |
45 | AstraZeneca/chemicalx A PyTorch and TorchDrug based deep learning library for drug pair scoring. (KDD 2022) biology , chemistry , deep-chemistry , deep-learning , drug , drug-discovery , drug-interaction , drug-pair , geometric-deep-learning , geometry , graph-neural-network , machine-learning , pharma , polypharmacy , pytorch , smiles , smiles-strings , torch , torchdrug ![]() ![]() ![]() ![]() ![]() |
46 | samtools/hts-specs Specifications of SAM/BAM and related high-throughput sequencing file formats ![]() ![]() ![]() ![]() |
47 | samtools/bcftools This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html ![]() ![]() ![]() ![]() ![]() |
48 | insilicomedicine/GENTRL Generative Tensorial Reinforcement Learning (GENTRL) model ![]() ![]() ![]() ![]() |
49 | shenwei356/awesome Awesome resources on Bioinformatics, data science, machine learning, programming language (Python, Golang, R, Perl) and miscellaneous stuff. awesome , data-science , git , golang , linux , perl , programing-language , python ![]() ![]() ![]() ![]() ![]() ![]() |
50 | chanzuckerberg/cellxgene An interactive explorer for single-cell transcriptomics data dataviz , scientific , scrna-seq , transcriptomics , visualization ![]() ![]() ![]() ![]() ![]() ![]() |
51 | invesalius/invesalius3 3D medical imaging reconstruction software ![]() ![]() ![]() ![]() ![]() ![]() |
52 | lh3/bioawk BWK awk modified for biological data bioinformatics , sequence-analysis ![]() ![]() ![]() ![]() |
53 | MolecularAI/aizynthfinder A tool for retrosynthetic planning astrazeneca , chemical-reactions , cheminformatics , monte-carlo-tree-search , neural-networks , reaction-informatics ![]() ![]() ![]() ![]() ![]() |
54 | owkin/PyDESeq2 A Python implementation of the DESeq2 pipeline for bulk RNA-seq DEA. bioinformatics , differential-expression , python , rna-seq , transcriptomics ![]() ![]() ![]() ![]() ![]() |
55 | broadinstitute/infercnv Inferring CNV from Single-Cell RNA-Seq ![]() ![]() ![]() ![]() ![]() ![]() |
56 | scverse/anndata Annotated data. anndata , bioinformatics , data-science , machine-learning , scanpy , scverse , transcriptomics ![]() ![]() ![]() ![]() ![]() |
57 | soedinglab/hh-suite Remote protein homology detection suite. alignment , bioinformatics , cpp , hh-suite , hhblits , hhpred , hhsearch , opensource , profile-profile-search , profile-search , protein-structure , sequence-search , simd , viterbi ![]() ![]() ![]() ![]() ![]() |
58 | chhylp123/hifiasm Hifiasm: a haplotype-resolved assembler for accurate Hifi reads bioinformatics , denovo-assembly , genomics , hifi-read , pacbio ![]() ![]() ![]() ![]() ![]() ![]() |
59 | insitro/redun Yet another redundant workflow engine aws , bioinformatics , data-engineering , data-science , docker , etl , gcp , ml , python , workflow-engine ![]() ![]() ![]() ![]() ![]() |
60 | biosustain/potion Flask-Potion is a RESTful API framework for Flask and SQLAlchemy, Peewee or MongoEngine flask , flask-extensions , mongoengine , peewee , sqlalchemy ![]() ![]() ![]() ![]() ![]() |
61 | google-deepmind/alphamissense ![]() ![]() ![]() ![]() ![]() |
62 | scverse/squidpy Spatial Single Cell Analysis in Python data-visualization , image-analysis , single-cell-genomics , single-cell-rna-seq , spatial-analysis , spatial-transcriptomics , squidpy ![]() ![]() ![]() ![]() ![]() |
63 | lh3/minigraph Sequence-to-graph mapper and graph generator bioinformatics , genome-graph , genomics , pan-genome , sequence-alignment ![]() ![]() ![]() ![]() ![]() |
64 | benevolentAI/guacamol Benchmarks for generative chemistry ![]() ![]() ![]() ![]() ![]() |
65 | calico/basenji Sequential regulatory activity predictions with deep convolutional neural networks. ![]() ![]() ![]() ![]() ![]() |
66 | ome/bioformats Bio-Formats is a Java library for reading and writing data in life sciences image file formats. It is developed by the Open Microscopy Environment. Bio-Formats is released under the GNU General Public License (GPL); commercial licenses are available from Glencoe Software. bio-formats , format-converter , format-reader , image , java , life-sciences-image , lightsheet , metadata , whole-slide-imaging , wsi ![]() ![]() ![]() ![]() ![]() |
67 | MolecularAI/GraphINVENT Graph neural networks for molecular design. ![]() ![]() ![]() ![]() ![]() |
67 | chembl/chembl_webresource_client Official Python client for accessing ChEMBL API chembl , cheminformatics , chemistry , chemoinformatics , python , rest , rest-client ![]() ![]() ![]() ![]() ![]() |
68 | shenwei356/taxonkit A Practical and Efficient NCBI Taxonomy Toolkit, also supports creating NCBI-style taxdump files for custom taxonomies like GTDB/ICTV bioinformatics , cross-platform , lca , lineage , taxdump , taxid , taxonkit , taxonomy ![]() ![]() ![]() ![]() ![]() ![]() |
69 | deepchem/DeepLearningLifeSciences Example code from the book "Deep Learning for the Life Sciences" ![]() ![]() ![]() ![]() ![]() |
70 | MolecularAI/Reinvent astrazeneca , cheminformatics , denovo-design , neural-networks , reinforcement-learning , transfer-learning ![]() ![]() ![]() ![]() ![]() |
71 | aqlaboratory/rgn Recurrent Geometric Networks for end-to-end differentiable learning of protein structure deep-learning , deep-neural-networks , protein-structure , protein-structure-prediction ![]() ![]() ![]() ![]() ![]() |
72 | tencent-ailab/grover This is a Pytorch implementation of the paper: Self-Supervised Graph Transformer on Large-Scale Molecular Data ![]() ![]() ![]() ![]() ![]() ![]() |
73 | lh3/miniprot Align proteins to genomes with splicing and frameshift bioinformatics , sequence-alignment ![]() ![]() ![]() ![]() ![]() |
74 | Roche/pyreadstat Python package to read sas, spss and stata files into pandas data frames. It is a wrapper for the C library readstat. conversion , pandas-dataframe , python , readstat , sas7bdat , spss , stata-files ![]() ![]() ![]() ![]() ![]() |
75 | lh3/miniasm Ultrafast de novo assembly for long noisy reads (though having no consensus step) bioinformatics , denovo-assembly , genomics ![]() ![]() ![]() ![]() ![]() |
76 | chanzuckerberg/MedMentions A corpus of Biomedical papers annotated with mentions of UMLS entities. ![]() ![]() ![]() ![]() |
77 | AstraZeneca/rexmex A general purpose recommender metrics library for fair evaluation. coverage , deep-learning , evaluation , machine-learning , metric , metrics , mrr , personalization , precision , rank , ranking , recall , recommender , recommender-system , recsys , rsquared ![]() ![]() ![]() ![]() |
78 | samtools/htsjdk A Java API for high-throughput sequencing data (HTS) formats. bam , cram , dna , fasta , genomics , java , java-api , ngs , sam , sequencing , vcf ![]() ![]() ![]() ![]() |
79 | shenwei356/brename A practical cross-platform command-line tool for safely batch renaming files/directories via regular expression batch , batch-rename , batch-rename-files , batch-renamer , go , golang , rename , safe , windows ![]() ![]() ![]() ![]() ![]() ![]() |
80 | lh3/wgsim Reads simulator bioinformatics , genomics ![]() ![]() ![]() ![]() |
81 | Acellera/htmd HTMD: Programming Environment for Molecular Discovery automate , drug-discovery , htmd , molecular-simulations ![]() ![]() ![]() ![]() ![]() |
82 | DeepGraphLearning/GearNet GearNet and Geometric Pretraining Methods for Protein Structure Representation Learning, ICLR'2023 (https://arxiv.org/abs/2203.06125) graph-neural-networks , pre-training , protein-representation-learning ![]() ![]() ![]() ![]() ![]() |
83 | MolecularAI/REINVENT4 AI molecular design tool for de novo design, scaffold hopping, R-group replacement, linker design and molecule optimization. ai , astrazeneca , cheminformatics , chemistry , deep-learning , denovo-design , drug-design , drug-discovery , generative-ai , ml , molecule-generation , neural-networks , reinforcement-learning , transfer-learning ![]() ![]() ![]() ![]() ![]() |
84 | rdkit/rdkit-tutorials Tutorials to learn how to work with the RDKit ![]() ![]() ![]() ![]() ![]() |
85 | insightsengineering/rtables Reporting tables with R pharmaceuticals , r , tables ![]() ![]() ![]() ![]() ![]() |
86 | Bayer-Group/cloudformation-template-generator A type-safe Scala DSL for generating CloudFormation templates ![]() ![]() ![]() ![]() ![]() |
87 | pharmaverse/admiral ADaM in R Asset Library cdisc , clinical-trials , open-source , r ![]() ![]() ![]() ![]() ![]() |
87 | OpenGene/awesome-bio-datasets awesome-bio-datasets ![]() ![]() ![]() ![]() |
88 | OpenGene/AfterQC Automatic Filtering, Trimming, Error Removing and Quality Control for fastq data adapter-trimming , bioinformatics , error , fastq , filtering , ngs , overlap , qc , quality-control , sequencing , trimming ![]() ![]() ![]() ![]() ![]() |
89 | Bayer-Group/etcd-aws-cluster A container to assist in managing an etcd2 cluster from an Amazon auto scaling group ![]() ![]() ![]() ![]() ![]() |
89 | modernatx/seqlike Unified biological sequence manipulation in Python biological-sequences , biopython , machine-learning , sequence ![]() ![]() ![]() ![]() ![]() |
89 | scverse/scirpy A scanpy extension to analyse single-cell TCR and BCR data. ![]() ![]() ![]() ![]() ![]() |
90 | lh3/gfatools Tools for manipulating sequence graphs in the GFA and rGFA formats bioinformatics , genome-graph , genomics ![]() ![]() ![]() ![]() |
90 | scverse/muon muon is a multimodal omics Python framework anndata , cite-seq , mudata , multi-omics , multimodal-data , multimodal-omics-analysis , muon , scanpy , scatac-seq , scrna-seq , scverse ![]() ![]() ![]() ![]() ![]() |
91 | aws-samples/aws-batch-genomics Software sets up and runs a genome sequencing analysis workflow using AWS Batch and AWS Step Functions. ![]() ![]() ![]() ![]() ![]() ![]() |
92 | rdkit/mmpdb A package to identify matched molecular pairs and use them to predict property changes. ![]() ![]() ![]() ![]() ![]() |
93 | Acellera/moleculekit MoleculeKit: Your favorite molecule manipulation kit drug-discovery , machine-learning , molecular-modeling , molecular-simulation , molecule , proteins ![]() ![]() ![]() ![]() ![]() |
94 | bioinform/somaticseq An ensemble approach to accurately detect somatic mutations using SomaticSeq cancer-genomics , somatic-variants ![]() ![]() ![]() ![]() ![]() |
95 | MolecularAI/Chemformer ![]() ![]() ![]() ![]() ![]() |
96 | owkin/FLamby Cross-silo Federated Learning playground in Python. Discover 7 real-world federated datasets to test your new FL strategies and try to beat the leaderboard. dataset , deep-learning , differential-privacy , federated-learning , healthcare , machine-learning , python ![]() ![]() ![]() ![]() ![]() |
96 | ome/openmicroscopy OME (Open Microscopy Environment) develops open-source software and data format standards for the storage and manipulation of biological light microscopy data. A joint project between universities, research establishments and industry in Europe and the USA, OME has over 20 active researchers with strong links to the microscopy community. Funded … database , image , java , omero , python , server ![]() ![]() ![]() ![]() ![]() |
97 | AstraZeneca-NGS/VarDict VarDict ![]() ![]() ![]() ![]() ![]() |
97 | scverse/spatialdata An open and interoperable data framework for spatial omics data ![]() ![]() ![]() ![]() ![]() |
98 | haowenz/chromap Fast alignment and preprocessing of chromatin profiles bioinformatics , chromatin-profiles , genomics , sequence-analysis ![]() ![]() ![]() ![]() ![]() ![]() |
99 | chao1224/MoleculeSTM Multi-modal Molecule Structure-text Model for Text-based Editing and Retrieval, Nat Mach Intell 2023 (https://www.nature.com/articles/s42256-023-00759-6) clip , computation-chemistry , drug-discovery , editing , foundation-model , molecule-editing , moleculeclip , moleculestm , pretraining , retrieval ![]() ![]() ![]() ![]() ![]() ![]() |
100 | openpharma/visR A package to wrap functionality for plots, tables and diagrams adhering to graphical principles. ![]() ![]() ![]() ![]() ![]() |
100 | chembl/ChEMBL_Structure_Pipeline ChEMBL database structure pipelines ![]() ![]() ![]() ![]() ![]() |
101 | AstraZeneca/awesome-drug-discovery-knowledge-graphs A collection of research papers, datasets and software related to knowledge graphs for drug discovery. Accompanies the paper "A review of biomedical datasets relating to drug discovery: a knowledge graph perspective" (Briefings in Bioinformatics, 2022) awesome-list , drug-discovery , drug-discovery-knowledge-graph , knowledge-graph ![]() ![]() ![]() ![]() |
102 | lh3/biofast Benchmarking programming languages/implementations for common tasks in Bioinformatics bioinformatics ![]() ![]() ![]() ![]() |
103 | shenwei356/kmcp Accurate metagenomic profiling && Fast large-scale sequence/genome searching bigsi , cobs , fracminhash , kmer , metagenomics , scaled-minhash , searching , sketch , sketching , syncmers , taxonomic-classification , taxonomic-profiling , virome ![]() ![]() ![]() ![]() ![]() ![]() |
104 | rgcgithub/regenie regenie is a C++ program for whole genome regression modelling of large genome-wide association studies. ![]() ![]() ![]() ![]() ![]() |
105 | soedinglab/metaeuk MetaEuk - sensitive, high-throughput gene discovery and annotation for large-scale eukaryotic metagenomics bioinformatics , eukaryotes , gene-discovery , gene-prediction , metagenomics ![]() ![]() ![]() ![]() ![]() |
106 | recursionpharma/gflownet GFlowNet library specialized for graph & molecular data deep-learning , gflownet , graph-neural-network , pytorch ![]() ![]() ![]() ![]() ![]() |
106 | scverse/scanpy-tutorials Scanpy Tutorials. ![]() ![]() ![]() ![]() |
107 | bioinform/neusomatic NeuSomatic: Deep convolutional neural networks for accurate somatic mutation detection convolutional-neural-networks , deep-learning , genomics , somatic-variants ![]() ![]() ![]() ![]() ![]() |
108 | lh3/readfq Fast multi-line FASTA/Q reader in several programming languages bioinformatics , sequence-analysis ![]() ![]() ![]() ![]() |
109 | insightsengineering/teal Exploratory Web Apps for Analyzing Clinical Trial Data clinical-trials , nest , r , shiny , webapp ![]() ![]() ![]() ![]() ![]() |
110 | lh3/cgranges A C/C++ library for fast interval overlap queries (with a "bedtools coverage" example) algorithm , bioinformatics , genomics ![]() ![]() ![]() ![]() ![]() |
110 | lh3/kmer-cnt Code examples of fast and simple k-mer counters for tutorial purposes bioinformatics , genomics , k-mer-counting ![]() ![]() ![]() ![]() ![]() |
111 | greenelab/tybalt Training and evaluating a variational autoencoder for pan-cancer gene expression data analysis , autoencoder , cancer , cancer-genomics , deep-learning , gene-expression , script , tool , unsupervised-learning , variational-autoencoder , variational-autoencoders ![]() ![]() ![]() ![]() ![]() ![]() |
112 | aqlaboratory/genie De Novo Protein Design by Equivariantly Diffusing Oriented Residue Clouds diffusion-models , protein-design ![]() ![]() ![]() ![]() ![]() |
113 | DeepGraphLearning/ConfGF Implementation of Learning Gradient Fields for Molecular Conformation Generation (ICML 2021). ![]() ![]() ![]() ![]() ![]() |
114 | benevolentAI/DeeplyTough DeeplyTough: Learning Structural Comparison of Protein Binding Sites 3d-models , deep-learning , drug-discovery , metric-learning , protein-structure ![]() ![]() ![]() ![]() ![]() |
115 | chao1224/GraphMVP Pre-training Molecular Graph Representation with 3D Geometry, ICLR'22 (https://openreview.net/forum?id=xQUe1pOKPam) contrastive-learning , generative-model , geometry , graph , molecule , pretraining , self-supervised , self-supervised-learning ![]() ![]() ![]() ![]() ![]() ![]() |
116 | OpenGene/MutScan Detect and visualize target mutations by scanning FastQ files directly bioinformatics , cancer , detection , fastq , mutation , ngs , somatic , validation , variant , visualization ![]() ![]() ![]() ![]() ![]() |
117 | MolecularAI/ReinventCommunity astrazeneca , cheminformatics , denovo-design , jupyter-notebook , neural-networks , reinforcement-learning , transfer-learning ![]() ![]() ![]() ![]() ![]() |
117 | lh3/psmc Implementation of the Pairwise Sequentially Markovian Coalescent (PSMC) model bioinformatics , genomics , population-genetics ![]() ![]() ![]() ![]() ![]() |
117 | tencent-ailab/DrugOOD OOD Dataset Curator and Benchmark for AI-aided Drug Discovery ![]() ![]() ![]() ![]() ![]() |
118 | ome/ome-zarr-py Implementation of next-generation file format (NGFF) specifications for storing bioimaging data in the cloud. ngff , ome , ome-zarr , zarr ![]() ![]() ![]() ![]() ![]() |
119 | Novartis/tidymodules An Object-Oriented approach to Shiny modules communication , inheritance , oop , r , shiny , shiny-modules , tidy-operators ![]() ![]() ![]() ![]() ![]() |
120 | aws-samples/aws-genomics-workflows Genomics Workflows on AWS aws , batch , genomics , step-functions , workflows ![]() ![]() ![]() ![]() ![]() ![]() |
121 | MolecularAI/deep-molecular-optimization Molecular optimization by capturing chemist’s intuition using the Seq2Seq with attention and the Transformer molecular-optimization , multi-property-optimization , seq2seq , transformer ![]() ![]() ![]() ![]() ![]() |
122 | AstraZeneca/SubTab The official implementation of the paper, "SubTab: Subsetting Features of Tabular Data for Self-Supervised Representation Learning" contrastive-learning , multi-view-learning , representation-learning , self-supervised-learning , tabular-data ![]() ![]() ![]() ![]() ![]() |
122 | johnsonandjohnson/Bodiless-JS Framework for building editable websites on the JAMStack ![]() ![]() ![]() ![]() ![]() |
123 | Benson-Genomics-Lab/TRF Tandem Repeats Finder: a program to analyze DNA sequences ![]() ![]() ![]() ![]() ![]() |
124 | lh3/pangene Constructing a pangenome gene graph bioinformatics , pangenome ![]() ![]() ![]() ![]() |
125 | owkin/HistoSSLscaling Code associated to the publication: Scaling self-supervised learning for histopathology with masked image modeling, A. Filiot et al., MedRxiv (2023). We publicly release Phikon 🚀 computational-pathology ![]() ![]() ![]() ![]() ![]() |
126 | AstraZeneca/awesome-shapley-value Reading list for "The Shapley Value in Machine Learning" (IJCAI 2022) artificial-intelligence , data-science , deep-learning , explainability , explainable , explainable-ai , explainable-artificial-intelligence , explainable-ml , lime , machine-learning , owen-value , shap , shapley , shapley-additive-explanations , shapley-decomposition , shapley-q-value , shapley-value , xai ![]() ![]() ![]() ![]() |
127 | lh3/bedtk A simple toolset for BED files (warning: CLI may change before bedtk becomes stable) bioinformatics ![]() ![]() ![]() ![]() ![]() |
128 | Bioconductor/Contributions Contribute Packages to Bioconductor bioconductor ![]() ![]() ![]() |
129 | Merck/BioPhi BioPhi is an open-source antibody design platform. It features methods for automated antibody humanization (Sapiens), humanness evaluation (OASis) and an interface for computer-assisted antibody sequence design. antibody , humanization , humanness , oasis , sapiens ![]() ![]() ![]() ![]() ![]() |
129 | soedinglab/plass sensitive and precise assembly of short sequencing reads bioinformatics , metagenomics , metatranscriptomics , opensource , proteins , proteomics , sequence-assembler ![]() ![]() ![]() ![]() ![]() |
130 | benevolentAI/guacamol_baselines Baseline models for GuacaMol benchmarks ![]() ![]() ![]() ![]() ![]() |
131 | AstraZeneca-NGS/VarDictJava VarDict Java port ![]() ![]() ![]() ![]() ![]() |
132 | lh3/ksw2 Global alignment and alignment extension bioinformatics , sequence-alignment ![]() ![]() ![]() ![]() ![]() |
132 | chao1224/ChatDrug LLM for Drug Editing, ICLR 2024 chatgpt , chatgpt3 , conversation , domain-feedback , drug , drug-discovery , drug-editing , editing , llm , molecule , motif , peptide , protein , retrieval , secondary-structure , small-molecule , structure ![]() ![]() ![]() ![]() ![]() |
133 | rdkit/rdkit-js A powerful cheminformatics and molecule rendering toolbelt for JavaScript, powered by RDKit . cheminformatics , drug-discovery , javascript , molecule , molecule-viewer , molecule-visualization , node-js , npm , rdkit , react , wasm ![]() ![]() ![]() ![]() ![]() |
133 | blazerye/DrugAssist DrugAssist: A Large Language Model for Molecule Optimization ai-for-science , drug-discovery , instruction-datasets , instruction-tuning , large-language-models , molecule-generation , molecule-optimization ![]() ![]() ![]() ![]() |
134 | bigdatagenomics/mango A scalable genome browser. Apache 2 licensed. ![]() ![]() ![]() ![]() ![]() |
135 | OpenGene/repaq A fast lossless FASTQ compressor with ultra-high compression ratio ![]() ![]() ![]() ![]() ![]() |
136 | Bioconductor/BiocStickers Stickers for some Bioconductor packages - feel free to contribute and/or modify. bioconductor , stickers ![]() ![]() ![]() ![]() ![]() |
136 | greenelab/pancancer Building classifiers using cancer transcriptomes across 33 different cancer-types analysis , cancer , classifier , gene-expression , machine-learning , methodology , pancancer , tcga , tool , transcriptome ![]() ![]() ![]() ![]() ![]() ![]() |
137 | Roche/BalancedLossNLP ![]() ![]() ![]() ![]() ![]() |
138 | Merck/deepbgc BGC Detection and Classification Using Deep Learning bidirectional-lstm , biosynthetic-gene-clusters , deep-learning , deepbgc , natural-products , pfam2vec , python , synthetic-biology ![]() ![]() ![]() ![]() ![]() |
138 | benevolentAI/MolBERT ![]() ![]() ![]() ![]() ![]() |
139 | genentech/equifold Official code repository for EquiFold: Protein Structure Prediction with a Novel Coarse-Grained Structure Representation machine-learning , proteins , structural-biology , structure-prediction ![]() ![]() ![]() ![]() ![]() |
140 | OpenGene/GeneFuse Gene fusion detection and visualization alk , bioinformatics , cancer , cosmic , eml4 , fusion , gene , ret , ros1 ![]() ![]() ![]() ![]() ![]() |
141 | biosustain/cameo cameo - computer aided metabolic engineering & optimization ![]() ![]() ![]() ![]() ![]() |
142 | EBI-Metagenomics/emg-viral-pipeline VIRify: detection of phages and eukaryotic viruses from metagenomic and metatranscriptomic assemblies cwl , nextflow , pipeline , viruses , workflow ![]() ![]() ![]() ![]() ![]() |
142 | OpenGene/gencore Generate duplex/single consensus reads to reduce sequencing noises and remove duplications bioinformatics , consensus , deduplication , deep-sequencing , duplex , duplex-sequencing , duplication , ngs , sequencing , sequencing-error , sequencing-noise , somatic ![]() ![]() ![]() ![]() ![]() |
142 | OpenGene/fastv An ultra-fast tool for identification of SARS-CoV-2 and other microbes from sequencing data. This tool can be used to detect viral infectious diseases, like COVID-19. 2019-ncov , bioinformatics , coronavirus , covid , covid-19 , hcov , meta-genomics , microbial-sequences , mngs , ngs , sars-cov-2 , sequencing , viral , viral-infectious-diseases , virus , visualization ![]() ![]() ![]() ![]() ![]() |
143 | lh3/yak Yet another k-mer analyzer bioinformatics , k-mer ![]() ![]() ![]() ![]() ![]() |
143 | lh3/fermikit De novo assembly based variant calling pipeline for Illumina short reads bioinformatics , denovo-assembly , genomics , variant-calling ![]() ![]() ![]() ![]() ![]() |
144 | Merck/Halyard Halyard is an extremely horizontally scalable Triplestore with support for Named Graphs, designed for integration of extremely large Semantic Data Models, and for storage and SPARQL 1.1 querying of the whole Linked Data universe snapshots. ![]() ![]() ![]() ![]() ![]() |
144 | ome/ngff Next-generation file format (NGFF) specifications for storing bioimaging data in the cloud. bioimaging , cloud , data-science , file-formats , spec ![]() ![]() ![]() ![]() ![]() |
144 | soedinglab/CCMpred Protein Residue-Residue Contacts from Correlated Mutations predicted quickly and accurately. ![]() ![]() ![]() ![]() ![]() |
145 | lh3/minimap This repo is DEPRECATED. Please use minimap2, the successor of minimap. ![]() ![]() ![]() ![]() ![]() |
146 | chao1224/Geom3D Geom3D: Geometric Modeling on 3D Structures, NeurIPS 2023 3d , 3d-structures , ai4science , biology , chemistry , crystals , drugs , equivariance , geometry , group , invariance , material , molecules , physics , proteins , symmetry ![]() ![]() ![]() ![]() ![]() ![]() |
147 | phuse-org/phuse-scripts Deliver standard industry analyses, built upon CDISC standards for analysis data ![]() ![]() ![]() ![]() ![]() |
147 | chembl/FPSim2 Simple package for fast molecular similarity searches cheminformatics , chemistry , gpu , python , similarity-search ![]() ![]() ![]() ![]() ![]() |
148 | bayer-science-for-a-better-life/Img2Mol ![]() ![]() ![]() ![]() ![]() |
149 | Biogen-Inc/tidyCDISC Demo the app here: https://bit.ly/tidyCDISC_app pharma , r , rinpharma , rstats ![]() ![]() ![]() ![]() ![]() |
150 | openpharma/mmrm Mixed Models for Repeated Measures (MMRM) in R. ![]() ![]() ![]() ![]() ![]() |
150 | MolecularAI/DockStream DockStream: A Docking Wrapper to Enhance De Novo Molecular Design astrazeneca , chemoinformatics , denovo-design , jupyter-notebook , molecular-docking , reinforcement-learning ![]() ![]() ![]() ![]() ![]() |
150 | Bayer-Group/paquo PAthological QUpath Obsession - QuPath and Python conversations digital-pathology , python , qupath ![]() ![]() ![]() ![]() ![]() |
151 | genentech/gReLU gReLU is a python library to train, interpret, and apply deep learning models to DNA sequences. ![]() ![]() ![]() ![]() ![]() |
152 | lh3/hickit TAD calling, phase imputation, 3D modeling and more for diploid single-cell Hi-C (Dip-C) and general Hi-C bioinformatics , genomics , hi-c ![]() ![]() ![]() ![]() |
153 | aqlaboratory/rgn2 ![]() ![]() ![]() ![]() |
154 | lh3/bgt Flexible genotype query among 30,000+ samples whole-genome bioinformatics , genomics ![]() ![]() ![]() ![]() ![]() |
154 | scverse/rapids_singlecell Rapids_singlecell: A GPU-accelerated tool for scRNA analysis. Offers seamless scverse compatibility for efficient single-cell data processing and analysis. anndata , bioinformatics , gpu , scverse , single-cell ![]() ![]() ![]() ![]() ![]() |
154 | shenwei356/bio_scripts Practical, reusable scripts for bioinformatics bioinformatics , perl , python , reusable , script ![]() ![]() ![]() ![]() ![]() |
155 | EBISPOT/OLS Ontology Lookup Service from SPOT at EBI java , neo4j , obofoundry , owl , owl-api ![]() ![]() ![]() ![]() ![]() |
156 | Sanofi-Public/CodonBERT Repository for mRNA Paper and CodonBERT publication. ![]() ![]() ![]() ![]() ![]() |
156 | OpenGene/scrnapip A Systematic and Dynamic Pipeline for Single-Cell RNA Sequencing Analysis ![]() ![]() ![]() ![]() |
157 | EBI-Metagenomics/genomes-catalogue-pipeline MGnify genome analysis pipeline ![]() ![]() ![]() ![]() ![]() |
158 | samtools/tabix Note: tabix and bgzip binaries are now part of the HTSlib project. ![]() ![]() ![]() ![]() |
158 | shenwei356/BlackheartedHospital (forked from: open-power-workgroup/Hospital) A list of Putian-affiliated hospitals circulating online; updates welcome ![]() ![]() ![]() |
159 | AbSciBio/unlocking-de-novo-antibody-design ![]() ![]() ![]() ![]() |
159 | schrodinger/gpusimilarity A Cuda/Thrust implementation of fingerprint similarity searching cheminformatics , chemistry , gpu , similarity-analysis ![]() ![]() ![]() ![]() ![]() |
159 | lh3/dipcall Reference-based variant calling pipeline for a pair of phased haplotype assemblies ![]() ![]() ![]() ![]() ![]() |
160 | Bioconductor/CSAMA Course material for CSAMA: Statistical Data Analysis for Genome Scale Biology ![]() ![]() ![]() ![]() |
160 | AstraZeneca/onto_merger OntoMerger is an ontology alignment library for deduplicating knowledge graph nodes that represent the same domain. algorithm , alignment , biological-networks , biology , graph , kg , knowledge , knowledge-graph , mapping , ontology , ontology-alignment ![]() ![]() ![]() ![]() ![]() |
160 | hoelzer-lab/rnaflow A simple RNA-Seq differential gene expression pipeline using Nextflow ![]() ![]() ![]() ![]() ![]() |
160 | shenwei356/perfect-bioinformatic-tools What should perfect bioinformatic tools be like? bioinformatics , cli , usability ![]() ![]() ![]() ![]() |
161 | Sanofi-IADC/whispr Open source event, comment and alert processing hub created by Sanofi IADC ![]() ![]() ![]() ![]() ![]() |
161 | calico/scBasset Sequence-based Modeling of single-cell ATAC-seq using Convolutional Neural Networks. ![]() ![]() ![]() ![]() ![]() |
161 | shenwei356/bio A lightweight and high-performance bioinformatics package in Golang bioinformatics , golang , minimizer , package , scaled-minhash , sequence , syncmer , taxdump , taxonomy ![]() ![]() ![]() ![]() ![]() ![]() |
162 | owkin/HE2RNA_code Train a model to predict gene expression from histology slides. ![]() ![]() ![]() ![]() ![]() |
162 | scverse/pertpy Perturbation Analysis in the scverse ecosystem. perturbation , scverse , single-cell ![]() ![]() ![]() ![]() ![]() |