Enhancing text classification with BERT-based graph representations using novel contextual metrics

Document Type

Article

Source of Publication

Progress in Artificial Intelligence

Publication Date

1-1-2025

Abstract

Text classification plays a fundamental role in Natural Language Processing (NLP) and is essential for applications such as sentiment analysis, topic labeling, and language detection. While representing text as graphs shows promise for capturing complex relationships between words and documents, current methods often fall short in encoding semantic information and context, reducing their accuracy. In this paper, we introduce a new approach to graph-based text representation that leverages BERT's contextualized embeddings. We propose two BERT-based metrics, the Word Importance Score (WIS) and Contextualized PMI (CPMI), which exploit the fine-grained contextual information in BERT embeddings and replace traditional measures such as TF-IDF and PMI. Our method builds graphs whose nodes represent words and documents and whose edges are weighted with these metrics, allowing us to better capture semantic relationships and contextual nuances. Experiments on standard datasets show that our approach significantly improves text classification accuracy over traditional methods, with our CTextGCN model outperforming the classical TextGCN architecture by a large margin on key benchmarks (e.g., +5% on the MR dataset). These results underscore the potential of BERT-based metrics for enhancing graph-based text representations across NLP tasks.
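As a rough illustration of the graph construction described above, the sketch below builds a heterogeneous word-document graph from BERT embeddings, with document-word edges weighted by a WIS-style score (in place of TF-IDF) and word-word edges by a CPMI-style score (in place of PMI). The abstract does not give the exact WIS and CPMI formulas, so `word_importance_score` and `contextualized_pmi` here are hypothetical cosine-similarity stand-ins, and the use of `bert-base-uncased` and the [CLS] vector as the document representation are likewise assumptions, not the paper's method.

```python
# Minimal sketch of a word-document graph weighted with BERT-based scores.
# The scoring functions are illustrative stand-ins, not the paper's WIS/CPMI.
import itertools

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()


def contextual_embeddings(doc: str):
    """Return (tokens, per-token contextual BERT embeddings) for one document."""
    inputs = tokenizer(doc, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return tokens, hidden


def word_importance_score(tok_emb: torch.Tensor, doc_emb: torch.Tensor) -> float:
    # Stand-in WIS: cosine similarity between a token's contextual embedding
    # and the document embedding, playing the role of a TF-IDF-style weight.
    return torch.cosine_similarity(tok_emb, doc_emb, dim=0).item()


def contextualized_pmi(emb_i: torch.Tensor, emb_j: torch.Tensor) -> float:
    # Stand-in CPMI: cosine similarity between two tokens' contextual
    # embeddings, playing the role of a count-based PMI word-word weight.
    return torch.cosine_similarity(emb_i, emb_j, dim=0).item()


docs = ["the film was a quiet triumph", "a dull and lifeless film"]
nodes, edges = set(), {}  # word and document nodes; weighted edges

for d, doc in enumerate(docs):
    tokens, hidden = contextual_embeddings(doc)
    doc_emb = hidden[0]  # [CLS] embedding as the document representation
    doc_node = f"doc_{d}"
    nodes.add(doc_node)
    content = [(t, hidden[i]) for i, t in enumerate(tokens)
               if t not in ("[CLS]", "[SEP]")]
    for tok, emb in content:  # document-word edges (WIS-style)
        nodes.add(tok)
        edges[(doc_node, tok)] = word_importance_score(emb, doc_emb)
    for (ti, ei), (tj, ej) in itertools.combinations(content, 2):
        edges[(ti, tj)] = contextualized_pmi(ei, ej)  # word-word edges

print(f"{len(nodes)} nodes, {len(edges)} weighted edges")
```

The resulting weighted adjacency structure is what a TextGCN-style graph convolutional network would then consume for classification; the key difference from classical TextGCN is that the edge weights come from contextual embeddings rather than corpus-level counts.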

ISSN

2192-6352

Disciplines

Computer Sciences

Keywords

BERT embeddings, BERT-based metrics, Graph convolution neural networks, Graph-based representation, Text classification

Scopus ID

05006798518

Indexed in Scopus

yes

Open Access

no
