Update README.md
Francan authored Oct 26, 2024
1 parent cd07b3a commit 348757f
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions README.md
@@ -1,13 +1,17 @@
## [BLAH9](https://blah9.linkedannotation.org/)
Project proposal for the Biomedical Linked Annotation Hackathon 9

# Dataset extraction from Graph & Vector DB with automatic relation generation using LLM

## Abstract
Biomedical datasets for AI training are becoming increasingly available, yet each still requires extensive analysis to assess intra-dataset connections, column relevance, etc., which forces repetitive, per-project manual work to find and select the relevant data.
We propose instead an automatic system that loads arbitrary datasets into a central graph database, forming a relationship-based collection that supports unstructured user queries. The system automatically generates the relations between entities by identifying connection points that can relate the imported data. The user can then write a natural-language query about the task at hand, which returns the subset of the available data relevant to that task.
The objective is to expedite the data selection process, enabling a data-lake-like workflow, relying on the automatic system for the task-specific dataset selection. Additionally, the automatically generated entity relations may bring out relevant connections that could have otherwise gone unused.
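
As a rough illustration of what "identifying connection points" could mean, the sketch below flags column pairs from different datasets whose values overlap strongly. It is a plain value-overlap heuristic standing in for the LLM-based relation generation the proposal describes, not the project's method; the function name `find_connection_points`, the `min_overlap` threshold, and the sample data are all hypothetical.

```python
from itertools import combinations


def find_connection_points(datasets, min_overlap=0.5):
    """Return candidate links between columns of different datasets.

    `datasets` maps a dataset name to a list of row dicts (an assumed,
    simplified in-memory layout). Two columns are linked when the share
    of common values, relative to the smaller column, reaches
    `min_overlap`.
    """
    # Collect the distinct values seen in every (dataset, column) pair.
    columns = {}
    for name, rows in datasets.items():
        for row in rows:
            for col, value in row.items():
                columns.setdefault((name, col), set()).add(value)

    candidates = []
    for (a, vals_a), (b, vals_b) in combinations(columns.items(), 2):
        if a[0] == b[0]:
            continue  # only relate columns across different datasets
        overlap = len(vals_a & vals_b) / min(len(vals_a), len(vals_b))
        if overlap >= min_overlap:
            candidates.append((a, b, overlap))
    # Strongest candidates first.
    return sorted(candidates, key=lambda c: -c[2])


# Hypothetical sample data: two small tables sharing gene identifiers.
sample = {
    "genes": [
        {"gene_id": "G1", "symbol": "TP53"},
        {"gene_id": "G2", "symbol": "BRCA1"},
    ],
    "expression": [
        {"gene": "G1", "tpm": 12.5},
        {"gene": "G2", "tpm": 3.1},
        {"gene": "G3", "tpm": 0.4},
    ],
}
links = find_connection_points(sample)
print(links)
```

In this sample, `genes.gene_id` and `expression.gene` are the only pair that passes the threshold. In the proposed system, an LLM would presumably take over (or refine) this step, for instance by judging whether two columns describe the same kind of entity even when their values do not match literally.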

## Project Goals:
1. Develop a system capable of extracting subsets of data from a larger collection that fit a user's unstructured query.
2. Explore and report on the ability of the chosen LLM to generate entity relationships automatically and to select data in response to a user query.

## Procedures:
1. Dataset pre-processing: A pipeline for pre-processing operations such as loading, normalization, and structuring of the datasets.
2. Database import: A pipeline responsible for optimal mapping of the source data to Neo4j representations.
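
The first two procedures might look roughly like the sketch below: a row is normalized, then mapped to a parameterized Cypher `MERGE` statement. This is only a minimal sketch under stated assumptions. The helper names `normalize_row` and `to_merge_statement`, the node label `Gene`, and the sample row are hypothetical, and the code only builds the query text and parameters instead of connecting to a Neo4j instance.

```python
import re

# Labels and property keys are interpolated into the query text,
# so they are restricted to plain identifiers.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def normalize_row(row):
    """Snake-case column names, trim string values, drop empty fields."""
    clean = {}
    for col, value in row.items():
        key = re.sub(r"\W+", "_", col.strip().lower()).strip("_")
        if isinstance(value, str):
            value = value.strip()
        if value in ("", None):
            continue  # skip missing values instead of storing them
        clean[key] = value
    return clean


def to_merge_statement(label, key_column, row):
    """Build a Cypher MERGE query and its parameters for one row.

    Values are passed as parameters; only the validated label and key
    column name are placed into the query string itself.
    """
    for name in (label, key_column):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    props = normalize_row(row)
    query = f"MERGE (n:{label} {{{key_column}: $key}}) SET n += $props"
    return query, {"key": props[key_column], "props": props}


# Hypothetical raw row with untidy headers and an empty field.
raw = {" Gene ID ": " G1 ", "Symbol": "TP53", "Notes": ""}
query, params = to_merge_statement("Gene", "gene_id", raw)
print(query)
print(params)
```

Using `MERGE` on a key property keeps repeated imports of the same row from creating duplicate nodes, which seems a reasonable default when arbitrary datasets are loaded into one shared graph.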