This repository is the implementation of GenSC-6G: A Prototype Testbed for Integrated Generative AI, Quantum, and Semantic Communication. It contains the first semantic communication dataset and playground, designed to be scalable, reproducible, and adaptable to a wide range of applications. The dataset and framework target semantic decoding, classification, and localization tasks in 6G applications, integrating generative AI and semantic communication.
A flexible prototype that supports modifications to baseline models, communication modules, and decoders, enabling customization for diverse communication needs.
The integration of generative AI for synthetic data generation, enriching the Knowledge Base (KB) and leveraging large language model (LLM) capabilities for enhanced semantic tasks.
A labeled dataset with injected noise, specifically optimized for semantic tasks such as target recognition, localization, and recovery. The dataset comprises 4,829 training and 1,320 testing instances across 15 classes of military and civilian vehicle types. It incorporates Additive White Gaussian Noise (AWGN) and Radio Frequency (RF) interference at varying Signal-to-Noise Ratios (SNRs) to evaluate model robustness under realistic channel conditions.
Download the main dataset here
Download the segmentation dataset here
A detailed case study that evaluates baseline models across various semantic tasks, assessing performance and adaptability under different noise conditions to validate the GenSC-6G framework.
Classification Models | Segmentation Models | Upsampling Models | EdgeLLM Models |
---|---|---|---|
ClassicalViT-L-32 | UNet | ResNet-50 | Llama-3 |
ViT-L-32 | EfficientNet | DINO-V2 | BLIP-2 |
ResNet-50 | SAM | ViT-L-32 | GPT-4 |
VGG-16 | Qwen2-VL | | |
Inception-V3 | Phi3-Vision | | |
EfficientNet-B1 | | | |
MobileNet-V3 | | | |
- Install Anaconda.
- Create an environment using:
conda env create -f environment.yml # For classical-only setup
conda env create -f environment-quantum.yml # For HQC setup
conda activate gensc
- Download the dataset from HuggingFace 🤗:
cd GenSC-Testbed
git clone https://huggingface.co/datasets/CQILAB/GenSC-6G
cd ..
- For classification tasks:
python train.py # For quantum-based training
python train-nonquantum.py # For classical training
- For upsampling tasks:
python train_super_res.py
- Train and test a specific classification model (one of inceptionv3, resnet, qresnet, vit, swin, mobilenet, efficientnet, vgg, qcnn) using:
python train.py --model_name resnet --batch_size 32 --snr 10
python train_nonquantum.py --model_name resnet --batch_size 32 --snr 10
- Train and test a specific upsampling model with:
python train_super_res.py --model_name resnet --batch_size 32 --snr 10
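The three flags used in the commands above could be parsed along the following lines. This is only a minimal sketch, not the repository's actual argument handling: the helper names `build_parser` and `parse_cli`, the default values, and the use of `choices` are assumptions made for illustration, and the model list is copied from this README.

```python
import argparse

# Model names listed in this README for classification.
# Assumption: the real scripts may accept a different set.
CLASSIFICATION_MODELS = [
    "inceptionv3", "resnet", "qresnet", "vit", "swin",
    "mobilenet", "efficientnet", "vgg", "qcnn",
]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring --model_name, --batch_size and --snr."""
    parser = argparse.ArgumentParser(
        description="Train and test a model at a given channel SNR"
    )
    # Defaults below are illustrative assumptions, not documented values.
    parser.add_argument("--model_name", choices=CLASSIFICATION_MODELS, default="resnet")
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--snr", type=float, default=10.0, help="channel SNR in dB")
    return parser


def parse_cli(argv=None) -> argparse.Namespace:
    """Parse an explicit argument list (or sys.argv when argv is None)."""
    return build_parser().parse_args(argv)


# Same flags as the example command above, passed as an explicit list.
args = parse_cli(["--model_name", "resnet", "--batch_size", "32", "--snr", "10"])
print(f"model={args.model_name} batch_size={args.batch_size} snr={args.snr} dB")
```

Restricting `--model_name` with `choices` makes a misspelled model name fail immediately instead of partway through a training run.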
playground-training-classification.ipynb
playground-training-upsampling-and-edgellm.ipynb
Set the kernel to gensc.
- Encoders and Decoders:
  - Modify `allmodels.py` to use custom encoder networks.
  - Pair with decoders defined in `decoder.py`.
- Edge LLM Models:
  - Replace the edge-based large language model (LLM) with alternatives in `playground-training-upsampling-and-edgellm.ipynb`.
Labeled dataset with ground-truth data, noise features, and extracted semantic features. Uploaded to HuggingFace 🤗.
- image: Raw image data used for training and evaluation.
- image_path: Path to the corresponding image file.
- classification_class: Integer label corresponding to the classification category (0-15).
- classification_{basemodel}_features: Extracted feature embeddings from `{basemodel}`'s encoder, consisting of 1000 float32 tensors.
- classification_awgn10dB_{basemodel}_features: Feature embeddings extracted from `{basemodel}` encoder with Additive White Gaussian Noise (AWGN) at 10dB SNR.
- classification_awgn30dB_{basemodel}_features: Feature embeddings extracted from `{basemodel}` encoder with AWGN at 30dB SNR.
- upsampling_{basemodel}_features: Extracted feature embeddings for upsampling tasks using `{basemodel}` encoder, consisting of 1000 float32 tensors.
- upsampling_awgn10dB_{basemodel}_features: Upsampling features with AWGN at 10dB SNR for `{basemodel}`.
- upsampling_awgn30dB_{basemodel}_features: Upsampling features with AWGN at 30dB SNR for `{basemodel}`.
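To make the column naming and the noise levels above concrete, here is a rough sketch of how a noisy feature column could be produced from a clean one. It assumes the noise is added directly to the feature vector and that the noise power is scaled from the measured signal power; the dataset's actual generation procedure may differ. The helpers `feature_column`, `add_awgn`, and `measured_snr_db` are hypothetical, and `resnet` is only a placeholder for `{basemodel}`.

```python
import math
import random


def feature_column(task, basemodel, snr_db=None):
    """Build a column name following the naming pattern listed above."""
    if snr_db is None:
        return f"{task}_{basemodel}_features"
    return f"{task}_awgn{snr_db}dB_{basemodel}_features"


def add_awgn(features, snr_db, rng):
    """Add zero-mean Gaussian noise so that signal power / noise power matches snr_db.

    Assumption: noise is applied to the feature vector itself.
    """
    signal_power = sum(x * x for x in features) / len(features)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in features]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean vector and its noisy version."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


rng = random.Random(0)
# Stand-in for a 1000-value embedding; real features come from the dataset.
clean = [rng.uniform(-1.0, 1.0) for _ in range(1000)]

row = {feature_column("classification", "resnet"): clean}
for snr_db in (10, 30):
    row[feature_column("classification", "resnet", snr_db)] = add_awgn(clean, snr_db, rng)

for name, values in row.items():
    print(name, len(values))
```

At 10 dB the noise power is one tenth of the signal power, while at 30 dB it is one thousandth, so the 30 dB columns stay much closer to the clean embeddings.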
To experiment with real-world semantic communication, you can use GNU Radio and a HackRF.
- Install Dependencies:
  - Install GNU Radio
  - Install HackRF tools:
    sudo apt install hackrf
- Configure Transceiver:
  - Transmitter config: `GNURadio/transmitter.grc`
  - Outputs a streaming binary file
- Run Transmitter:
  - Open `GNURadio/transmitter.grc` in GNU Radio Companion
  - Set SDR parameters (frequency, gain, bandwidth)
  - Execute to start transmission
- Run Receiver:
  - Modify `GNURadio/receiver.grc` settings
  - Run to capture and process signals

By following these steps, you can replicate real-world transmission experiments using the testbed and analyze its performance.
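The layout of the binary files exchanged with the flowgraphs is not specified here. As a rough illustration only, the sketch below assumes a headerless stream of little-endian float32 values and shows how a feature vector could be written to and read back from such a file. The helpers `pack_features` and `unpack_features` and the file name `features.bin` are hypothetical; check the File Source and File Sink settings in the `.grc` files for the item type actually used.

```python
import os
import struct
import tempfile


def pack_features(features):
    """Serialize floats as little-endian float32 with no header (assumed layout)."""
    return struct.pack(f"<{len(features)}f", *features)


def unpack_features(payload):
    """Decode a raw float32 stream, ignoring any trailing partial sample."""
    count = len(payload) // 4  # 4 bytes per float32
    return list(struct.unpack(f"<{count}f", payload[: count * 4]))


# Values chosen to be exactly representable in float32.
features = [0.5, -1.25, 3.0, 0.0]

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "features.bin")  # hypothetical file name
    with open(path, "wb") as handle:
        handle.write(pack_features(features))
    with open(path, "rb") as handle:
        recovered = unpack_features(handle.read())

print(recovered)
```

A captured stream may end mid-sample if the receiver is stopped abruptly, which is why `unpack_features` drops trailing bytes that do not form a full float32.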
Modular structure for customization, including:
- Base models
- Communication modules
- Decoders
- Base models and decoders that are task-aware and can potentially support additional downstream AI tasks.
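The modular split into base models, communication modules, and decoders can be pictured as three swappable callables. The sketch below is a toy illustration under that assumption and does not reflect the classes in this repository: `SemanticPipeline`, `chunk_mean_encoder`, `make_awgn_channel`, `identity_channel`, and `argmax_decoder` are all hypothetical names.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]


@dataclass
class SemanticPipeline:
    """Hypothetical container wiring the three swappable stages together."""
    encoder: Callable[[Sequence[float]], Vector]  # base model: input -> features
    channel: Callable[[Vector], Vector]           # communication module
    decoder: Callable[[Vector], int]              # task head: features -> label

    def run(self, sample: Sequence[float]) -> int:
        return self.decoder(self.channel(self.encoder(sample)))


def chunk_mean_encoder(sample, width=4):
    """Toy encoder: average consecutive chunks of the input."""
    chunks = [sample[i:i + width] for i in range(0, len(sample), width)]
    return [sum(chunk) / len(chunk) for chunk in chunks]


def identity_channel(features):
    """Noise-free channel, useful as a baseline."""
    return list(features)


def make_awgn_channel(snr_db, seed=0):
    """Return a channel that adds Gaussian noise at the requested SNR (dB)."""
    rng = random.Random(seed)

    def channel(features):
        power = sum(x * x for x in features) / len(features)
        sigma = math.sqrt(power / 10 ** (snr_db / 10))
        return [x + rng.gauss(0.0, sigma) for x in features]

    return channel


def argmax_decoder(features):
    """Toy decoder: index of the largest feature."""
    return max(range(len(features)), key=features.__getitem__)


sample = [0.0] * 4 + [1.0] * 4 + [0.2] * 4
clean = SemanticPipeline(chunk_mean_encoder, identity_channel, argmax_decoder)
noisy = SemanticPipeline(chunk_mean_encoder, make_awgn_channel(30), argmax_decoder)
print(clean.run(sample), noisy.run(sample))
```

Because each stage is just a callable, swapping the channel or the decoder does not require touching the encoder, which is the kind of customization the list above describes.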
- The model outputs logs in the `logs/` directory.
- Metrics include:
  - Accuracy
  - F1 Score
  - PSNR (Peak Signal-to-Noise Ratio)
  - SSIM (Structural Similarity Index Measure)
  - LPIPS (Learned Perceptual Image Patch Similarity)
  - CLIP-S (Contrastive Language-Image Pretraining Score for LLMs)
  - BERT Score
  - BLEU Score
  - Word Error Rate (WER)
These metrics provide insight into the robustness of models under different noise conditions and evaluate text-image alignment for semantic tasks. For additional evaluation metrics, refer to TorchMetrics.
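For intuition, two of the listed metrics are simple enough to write out by hand. The sketch below gives textbook definitions of PSNR (for flattened pixel values in a known range) and WER (word-level edit distance divided by the reference length). The function names are hypothetical and the code is illustrative only, not the evaluation code used by this repository.

```python
import math


def psnr(reference, reconstructed, max_value=1.0):
    """Peak Signal-to-Noise Ratio in dB for two equal-length value sequences."""
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10 * math.log10(max_value ** 2 / mse)


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


print(psnr([0.0, 0.0, 0.0, 0.0], [0.1, 0.1, 0.1, 0.1]))
print(word_error_rate("a tank on the road", "a truck on road"))
```

Higher PSNR means a reconstruction closer to the reference, while lower WER means generated text closer to the reference caption.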
- Trained models are saved in the `logs/` directory (the logs can be opened with TensorBoard).
- Checkpoints can be loaded for continued training or evaluation.
The paper can be found at arXiv.
If you use this dataset or framework in your research, please cite:
@article{gensc6g,
title={GenSC-6G: A Prototype Testbed for Integrated Generative AI, Quantum, and Semantic Communication},
author={Brian E. Arfeto and Shehbaz Tariq and Uman Khalid and Trung Q. Duong and Hyundong Shin},
year={2025},
eprint={2501.09918},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2501.09918},
}
- Official Repository: CQILAB/GenSC-6G
- Dataset: HuggingFace
MIT License
- Brian Estadimas
- Shehbaz Tariq
You can contribute by adding your own model or modifying the code. Make a pull request (PR), and once it is merged, your name will be listed here.
- Dataset uploaded to HuggingFace and repository initiated