We aim to develop open-source Large Language Models that serve Australia
through collaboration across universities, governments and business sectors.
Announcement (7 Feb): A big welcome to all newcomers and our OG crew for joining us (again) on this journey. Below is our 2024 schedule, to give you a little taste of who we are and what is actually going on. If you have any ideas or questions, don't hesitate to connect with us via Discord or send an email to our team members. We hope you all have a pleasant experience over the coming months!
2024 Schedule
Join our exciting 12-week (5 Aug - 7 Oct) Meetup Events held every Monday:
- 🏫 Come and visit us in person at the ANU School of Computing
- 💬 Hop on our Discord Server for a chat
Latest Update (8 Oct): Sadly, our 12-week journey has come to an end. A heartfelt ❤️ thank you to all our community members who joined us over the past few months. It’s been an amazing journey with lovely people like you! Stay tuned to our Meetup Events and we’ll see you next semester.
- 🏃‍♀️ Speed run some basic knowledge
  - Play with and visualise LLMs using LLM Visualization, created by Brendan Bycroft.
  - Enjoy transformer videos made by 3Blue1Brown:
  - Read these awesome articles from real human intelligence 📜
- 🛠️ Build one from scratch
  - Follow one of the tutorial videos from Andrej Karpathy (former OpenAI research scientist):
- 📜 Read some simple yet functional repos
  - minGPT: A small, clean, interpretable and educational GPT re-implementation in PyTorch.
  - nanoGPT: The simplest, fastest repository for training/finetuning medium-sized GPTs. A rewrite of minGPT.
  - build-nanogpt: A step-by-step walkthrough, with clean GitHub commits, that slowly builds up nanoGPT.
  - nano-llama31: A minimal, dependency-free implementation of the Llama 3.1 architecture.
- ⚔️ Compare performance of the latest LLMs
- 🎮 Good visualisation is all you need
  - WizMap from Polo Club of Data Science @ Georgia Tech for visualising large-scale token embeddings.
  - Dodrio from Polo Club of Data Science @ Georgia Tech for summarising attention heads and exploring the semantic and syntactic knowledge in transformer models.
- 📦 Interesting topics and other stuff
  - ChatGPT: 30 Year History | How AI Learned to Talk from Art of the Problem on YouTube.
  - The moment we stopped understanding AI [AlexNet] from Welch Labs on YouTube.
  - CNN Explainer from Polo Club of Data Science @ Georgia Tech for helping non-experts learn about Convolutional Neural Networks (CNNs).
  - NeuroCartography and Summit from Polo Club of Data Science @ Georgia Tech for visualising image embeddings from ImageNet.
- Data Source Contributor 🕵️‍♀️
  - Identify and provide access to Australia-related data sources.
  - Collaborate with other contributors to ensure data quality and relevance.
- Data Collecting, Crawling and Scraping 👩‍🌾
  - Develop scripts and tools to collect data from various sources.
  - (Optional) Have experience with web scraping tools (e.g., BeautifulSoup, Scrapy).
- Data Cleaning 👩‍⚕️
  - Clean and preprocess datasets to ensure they are ready for analysis and modeling.
  - (Optional) Have experience with data manipulation libraries (e.g., Pandas, NumPy).
- Model Building, Training and Tuning 👩‍💻
  - Develop and train LLMs on our datasets.
  - Have experience with machine learning frameworks (e.g., TensorFlow, PyTorch).
- GitHub Organising 👩‍🔧
  - Manage the GitHub repository by organizing files, documentation, and issues.
  - (Optional) Have proficiency in using Git and GitHub.
- Hugging Face Organising 👩‍🏭
  - Manage and organize model versions and datasets.
  - Ensure proper documentation and metadata for each model and dataset.
- Social Media Organising 👩‍💼
  - Promote the project and its updates on social media platforms (e.g., Discord, Meetup).
  - Engage with the community to increase project visibility and collaboration.
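
To give a feel for the Data Cleaning role above, here is a minimal sketch using only the Python standard library. It is an illustration, not code from our repositories: the function names `clean_text` and `deduplicate` and the sample strings are hypothetical, and a real pipeline would likely use Pandas or similar tools instead.

```python
import html
import re
import unicodedata


def clean_text(raw: str) -> str:
    """Normalise one scraped text snippet (hypothetical helper)."""
    text = html.unescape(raw)                    # decode entities, e.g. "&amp;" -> "&"
    text = re.sub(r"<[^>]+>", " ", text)         # drop simple HTML tags
    text = unicodedata.normalize("NFKC", text)   # unify Unicode forms (e.g. non-breaking spaces)
    return re.sub(r"\s+", " ", text).strip()     # collapse runs of whitespace


def deduplicate(snippets):
    """Clean each snippet, then drop empties and case-insensitive duplicates.

    The first occurrence of each snippet is kept, in its original order.
    """
    seen = set()
    result = []
    for raw in snippets:
        cleaned = clean_text(raw)
        key = cleaned.casefold()
        if cleaned and key not in seen:
            seen.add(key)
            result.append(cleaned)
    return result


if __name__ == "__main__":
    samples = [
        "<p>Canberra &amp; the  ACT</p>",
        "canberra & the ACT",
        "   ",
        "Uluru\u00a0is in the NT",
    ]
    for line in deduplicate(samples):
        print(line)
```

The regex-based tag stripping is deliberately naive; for real web pages a proper parser such as BeautifulSoup (mentioned in the scraping role) would be a safer choice.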
Can't wait to join us? Send a message to our lovely team members:
- Mattew: [email protected]
- Mohan: [email protected]
- Roshan: [email protected]