Add an option to force chain IDs to be read as segments when reading a PDB file / set groups by IDs

## Is your feature request related to a problem? ##

I ran into an issue when reading a PDB file.

I downloaded the PDB file from RCSB using the `biopandas` package:
```python
# download the files
from biopandas.pdb import PandasPdb

ppdb = PandasPdb().fetch_pdb("5N69")
ppdb.to_pdb(path='./dataset/examples/5N69.pdb', records=['ATOM', 'HETATM'])
```
When I read it with MDAnalysis, I found unexpected segments and segids. This makes it difficult to select atoms by chain (segid) and to work with the Universe at the SegmentGroup level.

![Image](https://github.com/user-attachments/assets/7d2682e4-3cb9-49e2-ae62-6871baed5572)

I first checked my `MDAnalysis.Universe`. It seems that MDAnalysis detected segids in the file and therefore used them to build the segments. However, the PDB file I downloaded should not contain any segid information, so I expected MDAnalysis to use the chain IDs as segids in the `MDAnalysis.Universe` (see the [documentation](https://userguide.mdanalysis.org/1.1.1/formats/reference/pdb.html#reading-in)).

Next, I opened the PDB file and found that the issue stems from the tempFactor column: some values have more digits than the column allows (see lines 12-14 in the picture below).
![Image](https://github.com/user-attachments/assets/cff462e7-41b3-4667-a610-eccbe5e85af9)
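To make this failure mode concrete, here is a small pure-Python sketch that slices one ATOM record by fixed columns. The record, the overlong tempFactor value `120.500`, and the helper names are hypothetical. It also assumes a lenient reader that takes the segID from everything in columns 67-76 (the PDB format itself reserves columns 73-76 for segID). With a single extra character, the last tempFactor digit lands in the segID region and the element symbol is pushed out of columns 77-78.

```python
# Fixed PDB columns (1-based, inclusive) as 0-based Python slices.
# Assumption: segID is read leniently from columns 67-76, not only 73-76.
FIELDS = {
    "chainID": slice(21, 22),     # column 22
    "tempFactor": slice(60, 66),  # columns 61-66
    "segID": slice(66, 76),       # columns 67-76 (lenient reading, assumed)
    "element": slice(76, 78),     # columns 77-78
}

# Columns 1-60 of a hypothetical ATOM record, built piece by piece.
PREFIX = (
    "ATOM  "      # 1-6   record name
    "    1"       # 7-11  serial
    " "           # 12
    " N  "        # 13-16 atom name
    " "           # 17    altLoc
    "MET"         # 18-20 resName
    " "           # 21
    "A"           # 22    chainID
    "   1"        # 23-26 resSeq
    " "           # 27    iCode
    "   "         # 28-30
    "  11.104"    # 31-38 x
    "   6.134"    # 39-46 y
    "  -6.504"    # 47-54 z
    "  1.00"      # 55-60 occupancy
)


def make_record(bfactor_text: str) -> str:
    """Hypothetical helper: append tempFactor text, 10 blanks, and element N."""
    return PREFIX + bfactor_text + " " * 10 + " N"


def parse_fields(line: str) -> dict:
    """Read selected fields purely by column position, as a PDB reader does."""
    return {name: line[cols].strip() for name, cols in FIELDS.items()}


good = make_record(" 20.00")   # tempFactor fits in 6 columns
bad = make_record("120.500")   # one character too many

print(parse_fields(good))
# {'chainID': 'A', 'tempFactor': '20.00', 'segID': '', 'element': 'N'}
print(parse_fields(bad))
# {'chainID': 'A', 'tempFactor': '120.50', 'segID': '0', 'element': ''}
```

Note that the chain ID in column 22 is unaffected, because it sits before the overflowing field; this is why reading the chain ID is still reliable.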

However, the current version of MDAnalysis offers no direct way to work around this formatting issue in the segid field. Such formatting issues can occur whenever other software writes or processes a PDB file.

I suggest adding this feature to the main codebase so that users can decide which information is used to define segments when reading PDB files.

## Describe the solution you'd like ##

A solution is currently available in a fork of the MDAnalysis repository; see the changes [here](https://github.com/MDAnalysis/mdanalysis/commit/48a20d42ae3e626c3b55ee1a802503cdc7f1c4c7).

Since the current PDBParser already reads chain IDs correctly, the idea is simply to add a keyword argument called `force_chainids_to_segids`, which forces the PDBParser to use the chain ID as the segid. Users can decide whether to enable it. If `force_chainids_to_segids=True`, the segments in the Universe are based on chain IDs.
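```python
import MDAnalysis as mda

pdb_path = './dataset/examples/5N69.pdb'

# proposed usage: this keyword does not exist in released MDAnalysis yet
u = mda.Universe(pdb_path, force_chainids_to_segids=True)
```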
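The selection rule behind the proposed keyword can be sketched in plain Python. This is a hypothetical illustration, not the code in the linked commit: `choose_segids` and `segments` are made-up names, and the fallback (use chain IDs only when no segid was read at all) is an assumption about the current parser's behaviour, based on the documentation linked above.

```python
from typing import List, Sequence


def choose_segids(
    segids: Sequence[str],
    chainids: Sequence[str],
    force_chainids_to_segids: bool = False,
) -> List[str]:
    """Hypothetical sketch: pick the per-atom segment identifiers."""
    if len(segids) != len(chainids):
        raise ValueError("need exactly one segid and one chainID per atom")
    if force_chainids_to_segids:
        # Proposed option: ignore whatever was parsed from the segID columns.
        return list(chainids)
    if not any(s.strip() for s in segids):
        # Assumed fallback: no segids in the file, so chain IDs define segments.
        return list(chainids)
    return list(segids)


def segments(per_atom_segids: Sequence[str]) -> List[str]:
    """Unique segment identifiers in order of first appearance."""
    return list(dict.fromkeys(per_atom_segids))


# Spilled tempFactor digits (hypothetical values) produce bogus segids.
parsed_segids = ["0", "", "5", ""]
chainids = ["A", "A", "B", "B"]

print(segments(choose_segids(parsed_segids, chainids)))
# ['0', '', '5']
print(segments(choose_segids(parsed_segids, chainids, force_chainids_to_segids=True)))
# ['A', 'B']
```

Because the flag is checked first, a single spilled digit can no longer override the chain-based segmentation, while files with genuine segids keep their current behaviour when the flag is left at its default.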

## Describe alternatives you've considered ##

In the current version of MDAnalysis, the only way I found to select by chain is the `chainID` selection keyword (but this does not seem to let us work with the SegmentGroup properly):
```python
import MDAnalysis as mda

u = mda.Universe('./dataset/examples/5N69.pdb')

# for instance, to select chain A
u_chainA = u.select_atoms('chainID A')

# to operate on a chain, we need to start from this AtomGroup
# rather than using the SegmentGroup directly
u_chainA.residues
```
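Another possible stopgap with the current release is to repair the file before loading it. The sketch below is a hypothetical pre-processing step (`repair_record` and `repair_file` are not MDAnalysis functions). It assumes that columns 1-60 of each ATOM/HETATM record are intact, that only a possibly overlong tempFactor and the element symbol follow, and that tempFactors are below 1000. It rewrites the tempFactor into columns 61-66, blanks columns 67-72, and copies the chain ID into the segID columns 73-76; any original segid or formal charge is dropped.

```python
def repair_record(line: str) -> str:
    """Hypothetical fix-up for one line of a PDB file (see assumptions above)."""
    if not line.startswith(("ATOM  ", "HETATM")):
        return line                      # leave REMARK, TER, END, ... untouched
    newline = "\n" if line.endswith("\n") else ""
    head = line[:60]                     # record name ... occupancy, assumed intact
    tokens = line[60:].split()           # e.g. ["120.500", "N"]
    bfactor = float(tokens[0]) if tokens else 0.0
    element = tokens[1] if len(tokens) > 1 else ""
    chain = line[21]                     # chainID, column 22
    # tempFactor in 61-66, blanks in 67-72, chain ID as segID in 73-76,
    # element right-justified in 77-78.
    return f"{head}{bfactor:6.2f}{'':6}{chain:<4}{element:>2}{newline}"


def repair_file(src_path: str, dst_path: str) -> None:
    """Write a repaired copy of a PDB file (hypothetical helper)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(repair_record(line) for line in src)
```

For example, `repair_file('./dataset/examples/5N69.pdb', './dataset/examples/5N69_fixed.pdb')` writes a copy whose segID columns hold the chain IDs, which could then be loaded with `mda.Universe`. This only patches one symptom of the formatting problem, which is why a reader-level option seems preferable.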

## Additional context ##
