
Can LLMs Convert Graphs to Text-Attributed Graphs?


The official implementation of TANS, a dataset synthesis method for graphs.

Authored by Zehong Wang, Sidney Liu, Zheyuan Zhang, Tianyi (Billy) Ma, Chuxu Zhang, and Yanfang Ye.

Overview

Motivation

Graph neural networks (GNNs) have become a popular tool for learning node embeddings through message passing over graph structures. However, a significant challenge arises when applying GNNs to multiple graphs with different feature spaces, as existing GNN architectures are not designed for cross-graph feature alignment. To address this, recent approaches introduce text-attributed graphs, where each node is associated with a textual description, enabling a shared textual encoder to project nodes from different graphs into a unified feature space. While promising, this approach relies heavily on the availability of text-attributed data, which can be difficult to obtain in practice.

Method

To bridge this gap, we propose a novel method named Topology-Aware Node description Synthesis (TANS), which leverages large language models (LLMs) to automatically convert existing graphs into text-attributed graphs. The key idea is to integrate topological information with each node's properties, enhancing the LLMs' ability to explain how graph topology influences node semantics.
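As an illustration of this idea, the sketch below builds a topology-aware prompt for a single node. This is a hypothetical example, not the authors' actual template: the property set (degree, clustering coefficient, PageRank) and the prompt wording are assumptions for demonstration.

```python
# Hypothetical sketch of TANS-style prompt construction; the actual
# templates and topological properties used by the authors may differ.
import networkx as nx

def topology_prompt(G: nx.Graph, node, text: str = "") -> str:
    """Combine a node's topological properties with its (optional) text."""
    degree = G.degree[node]
    clustering = nx.clustering(G, node)      # local connectivity
    pagerank = nx.pagerank(G)[node]          # global importance
    prompt = (
        f"The node has degree {degree}, clustering coefficient "
        f"{clustering:.3f}, and PageRank {pagerank:.4f}. "
    )
    if text:
        prompt += f"Its original attributes are: {text}. "
    prompt += ("Explain how the graph topology influences this node's "
               "semantics, and write a short description of the node.")
    return prompt

G = nx.karate_club_graph()
print(topology_prompt(G, 0))
```

In the text-free setting (no original node text), the prompt would rely on the topological properties alone, which is exactly where encoding structure into natural language matters most.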

Installation

You can set up the environment with conda by running:

conda env create -f environment.yml
conda activate TANS

Quick Start

We use cora, pubmed, usa, europe, and brazil in our experiments. The last three are downloaded automatically when running the model. The first two can be found here and should be placed in data/dataset.

For a quick start, you can download our preprocessed node embeddings here and put them under the folder data/text_emb.

You can test the quality of the embeddings by training from scratch:

cd TANS/train
python main.py --use_params --data cora --label_setting ratio/number --emb_method TANS --backbone gcn/gat/mlp --node_text title abstract --seed 1 2 3 4 5

Or by domain adaptation:

python da.py --use_params --pt_data usa --data europe --emb_method TANS --backbone gcn --seed 1 2 3 4 5

Or by transfer learning:

python transfer.py --use_params --pt_data pubmed --data cora --emb_method TANS --backbone gcn

Processing the data yourself

Generating node descriptions

The folder TANS/preprocess contains the code for preprocessing the data, i.e., using LLMs to generate node descriptions. Run the following to generate the prompts:

cd TANS/generate_text
sh preprocessing.sh

The preprocessing.sh script performs two steps: (1) computing node topological properties and (2) generating prompts. The first step is time-consuming, so we provide pre-computed properties at this link, which can be placed under data/property. After generating the prompts, use the query_llms.py script to generate node descriptions via LLMs. We use GPT-4o-mini by default.

python query_llms.py --model gpt-4o-mini --dataset cora --setting text_rich

For cora and pubmed, we support the text_rich and text_limit settings. For usa, brazil, and europe, we support the text_free setting.
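Since the property-computation step is noted as time-consuming, a rough sketch of what such a step might look like is shown below. This is an assumption for illustration: the actual properties computed by preprocessing.sh may differ, and centrality measures like betweenness are the usual bottleneck on large graphs.

```python
# Hypothetical sketch of computing per-node topological properties;
# the actual property set used by preprocessing.sh may differ.
import networkx as nx

def node_properties(G: nx.Graph) -> dict:
    pagerank = nx.pagerank(G)                    # global importance
    clustering = nx.clustering(G)                # local connectivity
    betweenness = nx.betweenness_centrality(G)   # slow on large graphs
    return {
        n: {
            "degree": G.degree[n],
            "pagerank": pagerank[n],
            "clustering": clustering[n],
            "betweenness": betweenness[n],
        }
        for n in G.nodes
    }

props = node_properties(nx.karate_club_graph())
```

The resulting per-node dictionaries can then be serialized (e.g., to data/property) and reused across prompt-generation runs instead of being recomputed each time.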

We also provide our generated questions and responses here; they should be placed inside data/response.

Encoding node descriptions

Once you have generated the descriptions, you can encode them with language models.

Basically, you can run the script under the folder TANS/preprocess:

sh encode_text.sh

where we use MiniLM as the default encoder.

You can also specify the parameters explicitly, for example:

python encode_text.py --data_name cora --enc_model gpt-4o-mini --llm_model minilm --node_text title abstract

Note that the node_text argument controls the setting: title abstract indicates the text-rich setting, title or abstract alone indicates the text-limit setting, and none indicates the text-free setting.

Contact Us

Please contact [email protected] or open an issue if you have any questions.

Citation

If you find this repo useful for your research, please cite the paper:

@inproceedings{wang2025tans,
  title={Can LLMs Convert Graphs to Text-Attributed Graphs?},
  author={Wang, Zehong and Liu, Sidney and Zhang, Zheyuan and Ma, Tianyi and Zhang, Chuxu and Ye, Yanfang},
  booktitle={2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics},
  year={2025}
}

@article{wang2025tans,
  title={Can LLMs Convert Graphs to Text-Attributed Graphs?},
  author={Wang, Zehong and Liu, Sidney and Zhang, Zheyuan and Ma, Tianyi and Zhang, Chuxu and Ye, Yanfang},
  journal={arXiv preprint arXiv:2412.10136},
  year={2024}
}
