Adding my NLP post by f-allian · Pull Request #997 · RSE-Sheffield/RSE-Sheffield.github.io

f-allian · 2026-02-17T19:06:14Z

Adding my NLP blog post from https://casestudiesrcg.blogspot.com/2026/02/a-novel-deep-learning-architecture-for.html
Any feedback/suggestions are welcome, TIA

ns-rse

Reads well @f-allian 👍

A couple of lines haven't been wrapped to 120 characters, and some minor questions/queries are in-line.

ns-rse · 2026-02-24T10:13:39Z

_posts/2026-02-18-innovation-project.md

+artificially upsampling of rare labels in our approach. Instead, we relied on strict stratified sampling across our
+training, validation, and test splits that mimics the raw dataset's proportions and reduces the model's bias. This
+guarantees that rare technological domains are preserved and adequately represented across all phases of model
+development. A summary of the data splits is shown in Table 1.


Could have embedded links to Table 1 here, and for other tables/figures, even though this table is adjacent and the document isn't too complicated (it's a habit I have from using LaTeX to link internally to figures/tables).

ns-rse · 2026-02-24T10:22:01Z

_posts/2026-02-18-innovation-project.md

+architecture of our multi-label text classification pipeline involves the following three main steps:
+
+1. **Preprocessing:** Raw abstracts are tokenised (up to a maximum sequence length of 512 tokens). Each token is mapped to
+a 768-dimensional embedding vector, and the final hidden state of the classification token ([CLS]) is pooled to create a


Why is ([CLS]) in both parentheses and square brackets? Is it meant to be a hyperlink to something?
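
For context, `[CLS]` is the literal name of the classification token that BERT-style tokenisers prepend to each sequence, so the square brackets are part of the token name rather than Markdown link syntax. The toy sketch below illustrates what "pooling the final hidden state of [CLS]" amounts to: taking the vector at position 0. It is not the post's actual pipeline; the helper `cls_pool`, the example tokens, and the 4-dimensional vectors (standing in for the 768-dimensional ones) are all assumptions made for illustration.

```python
from typing import List


def cls_pool(tokens: List[str], hidden_states: List[List[float]]) -> List[float]:
    """Return the hidden state at the [CLS] position (index 0).

    Hypothetical helper: a real model would return one vector per token,
    and the classifier would read only the first one.
    """
    if not tokens or tokens[0] != "[CLS]":
        raise ValueError("expected the sequence to start with the [CLS] token")
    if len(tokens) != len(hidden_states):
        raise ValueError("need exactly one hidden state per token")
    return hidden_states[0]


# Made-up tokens and 4-dimensional hidden states, for illustration only.
tokens = ["[CLS]", "graph", "neural", "networks", "[SEP]"]
hidden = [
    [0.1, 0.2, 0.3, 0.4],  # [CLS]: used as the summary of the whole abstract
    [0.5, 0.1, 0.0, 0.2],
    [0.3, 0.3, 0.1, 0.9],
    [0.7, 0.2, 0.4, 0.1],
    [0.0, 0.6, 0.2, 0.3],  # [SEP]
]

pooled = cls_pool(tokens, hidden)
print(pooled)
```

If the brackets are a concern in the rendered page, wrapping the token in backticks would make it unambiguous in the source.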

ns-rse · 2026-02-24T10:22:55Z

_posts/2026-02-18-innovation-project.md

+a 768-dimensional embedding vector, and the final hidden state of the classification token ([CLS]) is pooled to create a
+single, dense semantic representation of the entire abstract.
+
+2. **Fine-tuning:** The token embeddings are passed into the pre-trained SciBERT layer to perform fine-tuning. [CLS] token


As above for [CLS]: is it meant to be a hyperlink with the target URL in parentheses missing?

ns-rse · 2026-02-24T10:24:53Z

_posts/2026-02-18-innovation-project.md

+During validation, we treated the task as a multi-label classification problem, looking at both micro metrics (e.g.
+F1-micro, which favour frequent classes) and macro metrics (e.g. F1-macro, which treat rare niche classes equally) to


I've not heard of F1-[micro|macro] before. Would it be worth linking to or citing references that explain these for readers?
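
For readers in the same position, here is a minimal sketch of the difference, written in plain Python rather than with whatever library the post actually used. Micro-averaging pools true positives, false positives and false negatives over all labels before computing F1, so frequent labels dominate; macro-averaging computes F1 per label and takes the unweighted mean, so a rare label counts as much as a common one. The function names and the toy data (one frequent label, one rare label) are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def _f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_micro_macro(
    y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]
) -> Tuple[float, float]:
    """Compute (micro-F1, macro-F1) for multi-label 0/1 indicator rows."""
    n_labels = len(y_true[0])
    tp: List[int] = [0] * n_labels
    fp: List[int] = [0] * n_labels
    fn: List[int] = [0] * n_labels
    for true_row, pred_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(true_row, pred_row)):
            if t and p:
                tp[j] += 1
            elif p and not t:
                fp[j] += 1
            elif t and not p:
                fn[j] += 1
    # Micro: pool the counts across labels, then compute a single F1.
    micro = _f1(sum(tp), sum(fp), sum(fn))
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(_f1(tp[j], fp[j], fn[j]) for j in range(n_labels)) / n_labels
    return micro, macro


# Toy data: label 0 is frequent and always predicted correctly;
# label 1 is rare and missed entirely.
y_true = [[1, 0], [1, 0], [1, 0], [1, 0], [0, 1]]
y_pred = [[1, 0], [1, 0], [1, 0], [1, 0], [0, 0]]

micro, macro = f1_micro_macro(y_true, y_pred)
print(f"micro={micro:.3f} macro={macro:.3f}")
```

In this toy case the missed rare label barely dents the micro score but halves the macro score, which is the contrast the post describes.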

ns-rse · 2026-02-24T11:45:37Z

_posts/2026-02-18-innovation-project.md

+
+![Training and evaluation performances](/assets/images/2026-02-18-innovation-project/figure3.png)
+{: style="text-align: center;"}
+***Figure 3**: Training and evaluation performances of the hierarchical classifier across 20 epochs. (a) Training loss


One other thing I just remembered...

Whilst *italics* italicises text and **bold** emboldens it, using the two adjacent in this manner could lead to confusion (I had to think a little about it and it only really clicked when I rendered the page).

A solution to make it clearer is to use _italics_ which give the same effect and make the source easier to read.

This is more a matter of personal style, but if you used something like markdownlint-cli2 then, depending on the configuration for rule MD049, it might throw some errors. (Other Markdown linters are available; this is the one I commonly use as a pre-commit hook. I may switch to a Rust-based one, rumdl, in the future.)
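
As a rough illustration of the pattern in question, the sketch below flags lines where an italic delimiter is fused with a bold one as exactly three asterisks. It is a simplified stand-in written for this comment, not how markdownlint-cli2 or rule MD049 is implemented; the function name and the sample captions are assumptions.

```python
import re
from typing import List, Tuple

# Exactly three consecutive asterisks: an italic delimiter fused with a bold one.
TRIPLE_ASTERISK = re.compile(r"(?<!\*)\*{3}(?!\*)")


def find_fused_emphasis(markdown: str) -> List[Tuple[int, str]]:
    """Return (line number, line) pairs that contain a run of exactly three asterisks."""
    return [
        (number, line)
        for number, line in enumerate(markdown.splitlines(), start=1)
        if TRIPLE_ASTERISK.search(line)
    ]


# Line 1 uses asterisks for both italics and bold; line 2 uses underscores for italics.
sample = "\n".join(
    [
        "***Figure 3**: Training and evaluation performances.*",
        "_**Figure 3**: Training and evaluation performances._",
    ]
)

flagged = find_fused_emphasis(sample)
print(flagged)
```

Only the first caption is flagged; the underscore form renders the same way but keeps the two kinds of emphasis visually distinct in the source.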

f-allian · 2026-02-24T13:29:43Z

@ns-rse Thanks for your feedback Neil, I've addressed the changes in a new commit.

f-allian added 2 commits February 17, 2026 18:51

add: NLP innovation blog post

5bc1689

fix: a few sentences/formatting

2b0a5e9

f-allian self-assigned this Feb 17, 2026

f-allian added the "new post" label Feb 17, 2026

ns-rse requested changes Feb 24, 2026


ns-rse reviewed Feb 24, 2026


fix: feedback suggestions

4d13e71

f-allian wants to merge 3 commits into RSE-Sheffield:master from f-allian:master
