GenSC-6G - Scalable Semantic Communication Framework and Dataset

This repository contains the first semantic communication dataset and playground designed to be scalable, reproducible, and adaptable to a wide range of applications. The dataset and framework target semantic decoding, classification, and localization tasks in 6G applications, integrating generative AI with semantic communication. It is the implementation of GenSC-6G: A Prototype Testbed for Integrated Generative AI, Quantum, and Semantic Communication.

Features of the GenSC-6G Dataset

🔧 Adaptable SC Framework

A flexible prototype that supports modifications to baseline models, communication modules, and decoders, enabling customization for diverse communication needs.

🤖 Generative AI-Driven SC

Generative AI is integrated for synthetic data generation, enriching the Knowledge Base (KB) and leveraging large language model (LLM) capabilities for enhanced semantic tasks.

📊 Noise-Augmented Dataset

A labeled dataset with injected noise, specifically optimized for semantic tasks such as target recognition, localization, and recovery. The dataset comprises 4,829 training and 1,320 testing instances across 15 classes of military and civilian vehicle types. It incorporates Additive White Gaussian Noise (AWGN) and Radio Frequency (RF) interference at varying Signal-to-Noise Ratios (SNRs) to evaluate model robustness under realistic channel conditions.
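The README does not spell out how the noise is injected. A common way to add AWGN at a target SNR is to scale the noise variance to the measured average signal power. The sketch below assumes that definition for a real-valued signal (such as a feature vector); add_awgn and measured_snr_db are hypothetical helper names, and this is not necessarily the procedure used to build the dataset.

```python
import math
import random


def add_awgn(signal, snr_db, rng=None):
    """Return a copy of `signal` with additive white Gaussian noise at `snr_db`.

    Assumption: SNR = mean signal power / noise power, with power = mean(x^2).
    """
    if rng is None:
        rng = random.Random()
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))  # convert dB to a linear ratio
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    rng = random.Random(0)
    features = [rng.uniform(-1.0, 1.0) for _ in range(1000)]  # stand-in for a 1000-value embedding
    for snr in (10, 30):
        noisy = add_awgn(features, snr, rng)
        print(snr, round(measured_snr_db(features, noisy), 1))
```

The measured SNR fluctuates slightly around the target because the noise is random; longer signals give estimates closer to the requested value.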

📥 Dataset Download and Overview

Main Dataset

Download the main dataset here

Segmentation Dataset

Download the segmentation dataset here

📝 Case Study on Semantic Tasks

A detailed case study that evaluates baseline models across various semantic tasks, assessing performance and adaptability under different noise conditions to validate the GenSC-6G framework.

Table of Supported Case Studies

| Classification Models | Segmentation Models | Upsampling Models | EdgeLLM Models |
|---|---|---|---|
| Classical ViT-L-32 | UNet | ResNet-50 | Llama-3 |
| ViT-L-32 | EfficientNet | DINO-V2 | BLIP-2 |
| ResNet-50 | SAM | ViT-L-32 | GPT-4 |
| VGG-16 | | | Qwen2-VL |
| Inception-V3 | | | Phi3-Vision |
| EfficientNet-B1 | | | |
| MobileNet-V3 | | | |

Setup Instructions

1. Environment Setup

Install Anaconda.

Create the environment with one of the following, then activate it:

```
conda env create -f environment.yml  # For classical-only setup
conda env create -f environment-quantum.yml  # For hybrid quantum-classical (HQC) setup
conda activate gensc
```

2. Dataset Setup

Download the dataset from HuggingFace🤗:

```
git lfs install  # Hugging Face stores large dataset files with Git LFS
cd GenSC-Testbed
git clone https://huggingface.co/datasets/CQILAB/GenSC-6G
cd ..
```

3. Training Scripts

For classification tasks:

```
python train.py  # For quantum-based training
python train_nonquantum.py  # For classical training
```

For upsampling tasks:
```
python train_super_res.py
```

Example of Running Experiments

To train and test a classification model, pass one of inceptionv3, resnet, qresnet, vit, swin, mobilenet, efficientnet, vgg, or qcnn to --model_name:

```
python train.py --model_name resnet --batch_size 32 --snr 10  # quantum-based training
python train_nonquantum.py --model_name resnet --batch_size 32 --snr 10  # classical training
```

To train and test an upsampling model:

```
python train_super_res.py --model_name resnet --batch_size 32 --snr 10
```
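To compare models across noise levels, the same command can be repeated over several models and SNR values. The helper below is a hypothetical convenience script, not part of the repository. It reuses only the --model_name, --batch_size and --snr flags shown in the commands above, and the 10 dB and 30 dB values in the usage are an assumption chosen to match the noisy feature columns listed in the dataset description; whether other SNR values are accepted depends on the training scripts.

```python
import itertools
import subprocess
import sys


def build_commands(script, models, snrs, batch_size=32):
    """Return one argv list per (model, SNR) pair for the given training script."""
    return [
        [sys.executable, script,
         "--model_name", model,
         "--batch_size", str(batch_size),
         "--snr", str(snr)]
        for model, snr in itertools.product(models, snrs)
    ]


def run_sweep(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop the sweep on the first failing run


if __name__ == "__main__":
    cmds = build_commands("train_nonquantum.py", ["resnet", "vit", "mobilenet"], [10, 30])
    run_sweep(cmds, dry_run=True)  # set dry_run=False to actually launch the runs
```

Keeping dry_run=True by default makes it easy to inspect the full sweep before committing GPU time to it.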

Interactive Examples

Open one of the notebooks:
- playground-training-classification.ipynb
- playground-training-upsampling-and-edgellm.ipynb

Set the notebook kernel to gensc.

Code Customization

Encoders and Decoders:
- Modify allmodels.py to use custom encoder networks.
- Pair with decoders defined in decoder.py.
Edge LLM Models:
- Replace the edge-based large language model (LLM) with alternatives in playground-training-upsampling-and-edgellm.ipynb.

Reproducibility

🗃️ Dataset

A labeled dataset containing ground-truth data, noise features, and extracted semantic features, hosted on HuggingFace🤗.

Dataset Columns and Descriptions

image: Raw image data used for training and evaluation.
image_path: Path to the corresponding image file.
classification_class: Integer label of the classification category, one per class (15 classes, indexed 0-14).
classification_{basemodel}_features: Feature embedding extracted by {basemodel}'s encoder, a vector of 1000 float32 values.
classification_awgn10dB_{basemodel}_features: Feature embeddings extracted from {basemodel} encoder with Additive White Gaussian Noise (AWGN) at 10dB SNR.
classification_awgn30dB_{basemodel}_features: Feature embeddings extracted from {basemodel} encoder with AWGN at 30dB SNR.
upsampling_{basemodel}_features: Feature embedding for upsampling tasks extracted by the {basemodel} encoder, a vector of 1000 float32 values.
upsampling_awgn10dB_{basemodel}_features: Upsampling features with AWGN at 10dB SNR for {basemodel}.
upsampling_awgn30dB_{basemodel}_features: Upsampling features with AWGN at 30dB SNR for {basemodel}.
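Because the column names follow a fixed pattern, they can be built programmatically instead of typed by hand. The sketch below is a hypothetical helper (feature_column and get_features are not part of the repository). The README does not list the exact strings substituted for {basemodel}, so "resnet" and "vit" in the usage are assumptions. It also assumes each row is dict-like, which is what indexing a Hugging Face datasets.Dataset returns.

```python
FEATURE_DIM = 1000  # embedding length stated in the column descriptions


def feature_column(task, basemodel, snr_db=None):
    """Build a feature column name from the naming pattern listed above.

    task: "classification" or "upsampling"
    snr_db: None for clean features, otherwise the AWGN SNR in dB (10 or 30).
    """
    if task not in ("classification", "upsampling"):
        raise ValueError(f"unknown task: {task!r}")
    if snr_db is None:
        return f"{task}_{basemodel}_features"
    if snr_db not in (10, 30):
        raise ValueError("only 10 dB and 30 dB noisy features are listed")
    return f"{task}_awgn{snr_db}dB_{basemodel}_features"


def get_features(row, task, basemodel, snr_db=None):
    """Fetch one embedding from a dict-like dataset row and check its length."""
    vec = list(row[feature_column(task, basemodel, snr_db)])
    if len(vec) != FEATURE_DIM:
        raise ValueError(f"expected {FEATURE_DIM} values, got {len(vec)}")
    return vec


if __name__ == "__main__":
    print(feature_column("classification", "resnet", 10))
    # classification_awgn10dB_resnet_features
```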

🏗️ Testbed

To experiment with real-world semantic communication, you can use GNU Radio with a HackRF software-defined radio (SDR).

Install Dependencies:
- Install GNU Radio
- Install HackRF tools: sudo apt install hackrf
Configure Transceiver:
- Transmitter config: GNURadio/transmitter.grc
- Outputs a streaming binary file
Run Transmitter:
- Open GNURadio/transmitter.grc in GNU Radio Companion
- Set SDR parameters (frequency, gain, bandwidth)
- Execute to start transmission
Run Receiver:
- Modify the settings in GNURadio/receiver.grc
- Run it to capture and process the received signals

By following these steps, you can replicate real-world transmission experiments with the testbed and analyze its performance.
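The README only says that the transceiver configuration outputs a streaming binary file. If that file is written by a GNU Radio File Sink of type complex, it holds raw interleaved 32-bit float I/Q samples with no header, in the host's byte order. The sketch below assumes exactly that layout with little-endian byte order (typical of x86 hosts); read_complex64 and mean_power_db are hypothetical helpers, not part of the repository. A sink of a different type (for example float or byte) would need a different unpacking format.

```python
import math
import os
import struct
import tempfile


def read_complex64(path, max_samples=None):
    """Read raw interleaved float32 I/Q pairs as Python complex numbers.

    Assumes a headerless, little-endian file (8 bytes per complex sample).
    """
    with open(path, "rb") as f:
        data = f.read() if max_samples is None else f.read(8 * max_samples)
    usable = len(data) - len(data) % 8  # drop a trailing partial sample, if any
    return [complex(i, q) for i, q in struct.iter_unpack("<ff", data[:usable])]


def mean_power_db(samples):
    """Average sample power |x|^2 in dB, relative to a power of 1.0."""
    power = sum(abs(s) ** 2 for s in samples) / len(samples)
    return 10 * math.log10(power)


if __name__ == "__main__":
    # Self-check with a synthetic capture instead of a real HackRF recording.
    with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as tmp:
        for k in range(4):
            tmp.write(struct.pack("<ff", math.cos(k), math.sin(k)))  # unit-magnitude samples
    samples = read_complex64(tmp.name)
    os.remove(tmp.name)
    print(len(samples), mean_power_db(samples))
```

Comparing the mean power of captures taken with and without the transmitter running gives a rough estimate of the received SNR.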

💻 Flexible Code

Modular structure for customization, including:

- Base models
- Communication modules
- Decoders

The base models and decoders are task-aware and can potentially be extended to additional downstream AI tasks.
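One way to picture this modularity is a pipeline of three interchangeable functions: encoder, channel, and task decoder. The sketch below is a hypothetical illustration in plain Python (SemanticPipeline, identity_channel and argmax_decoder are made-up names). In the repository itself, encoders live in allmodels.py and decoders in decoder.py, and their actual interfaces may differ.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]


@dataclass
class SemanticPipeline:
    """Hypothetical composition of swappable stages: encoder -> channel -> decoder."""
    encoder: Callable[[Sequence[float]], Vector]
    channel: Callable[[Vector], Vector]
    decoder: Callable[[Vector], object]

    def __call__(self, x):
        return self.decoder(self.channel(self.encoder(x)))


def identity_channel(z):
    """Noiseless channel; swap in a noisy function to simulate impairments."""
    return list(z)


def argmax_decoder(z):
    """Toy classification head: predict the index of the largest feature."""
    return max(range(len(z)), key=z.__getitem__)


if __name__ == "__main__":
    pipe = SemanticPipeline(encoder=lambda x: [2.0 * v for v in x],
                            channel=identity_channel,
                            decoder=argmax_decoder)
    print(pipe([0.1, 0.7, 0.2]))  # prints 1
```

Changing the task then means replacing only the decoder, and changing channel conditions means replacing only the channel, while the encoder stays fixed.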

📊 Performance Metrics

Models write their logs to the logs/ directory.
Metrics include:
- Accuracy
- F1 Score
- PSNR (Peak Signal-to-Noise Ratio)
- SSIM (Structural Similarity Index Measure)
- LPIPS (Learned Perceptual Image Patch Similarity)
- CLIP-S (CLIPScore, image-text alignment measured with Contrastive Language-Image Pre-training embeddings, used to evaluate LLM outputs)
- BERTScore
- BLEU Score
- Word Error Rate (WER)
These metrics provide insight into the robustness of models under different noise conditions and evaluate text-image alignment for semantic tasks. For additional evaluation metrics, refer to TorchMetrics.
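Two of the listed metrics have short textbook definitions. The sketch below shows them in plain Python: psnr for 8-bit images flattened to pixel sequences, and word_error_rate computed from word-level Levenshtein distance. These are illustrative reference implementations, not the repository's own code; for actual evaluation, a tested library such as TorchMetrics is the safer choice.

```python
import math


def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] is the edit distance between the current reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution, or match at no cost
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(psnr([10, 10], [11, 9]))  # MSE = 1, so 10*log10(255^2), about 48.13 dB
    print(word_error_rate("the tank moves north", "the truck moves"))  # prints 0.5
```

word_error_rate assumes a non-empty reference; WER is undefined when the reference has no words.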

🔄 Model Checkpoints

Trained models are saved in the logs/ directory, which can also be opened with TensorBoard.
Checkpoints can be loaded for continued training or evaluation.

Citation

The paper is available on arXiv (arXiv:2501.09918).

If you use this dataset or framework in your research, please cite:

```
@article{gensc6g,
      title={GenSC-6G: A Prototype Testbed for Integrated Generative AI, Quantum, and Semantic Communication},
      author={Brian E. Arfeto and Shehbaz Tariq and Uman Khalid and Trung Q. Duong and Hyundong Shin},
      year={2025},
      eprint={2501.09918},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2501.09918},
}
```

Others

Official Repository: CQILAB/GenSC-6G
Dataset: HuggingFace

License

MIT License

Contributors

Brian Estadimas
Shehbaz Tariq

You can contribute by adding your own model or modifying the code. Make a pull request (PR), and once it is merged, your name will be listed here.

Changelogs

Dataset uploaded to HuggingFace and repository initiated.
