docx_summary removes hyphens from text #573

Closed
daviddiviny opened this issue May 16, 2024 · 2 comments

@daviddiviny

I've noticed that when extracting tables using docx_summary, any hyphens within words are removed, e.g. "Inspector-General" becomes "InspectorGeneral". Please let me know if you need a reprex.

This occurred when extracting tables from the Australian Government Budget at this link.

Thanks to @elipousson for pointing out that this is probably caused by docx_summary rather than the {officerExtras} package.

@trekonom
Contributor

The issue is that the hyphens in your table are non-breaking hyphens, which are stored in the document XML as a separate <w:noBreakHyphen/> element rather than as text. As a result, these hyphens get dropped by docx_summary, as it only extracts text. I just opened a PR with a possible fix; see #575.
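To illustrate the mechanism described above, here is a minimal sketch in Python using only the standard library. It is not the {officer} code and not the fix from #575. The XML fragment is hand-written for illustration (an assumption about how such a run might look), and the function names `text_only` and `text_with_hyphens` are hypothetical.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by .docx document XML
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical paragraph, hand-written to mimic "Inspector-General"
# stored with a non-breaking hyphen element between two text elements.
PARAGRAPH = f"""
<w:p xmlns:w="{W}">
  <w:r>
    <w:t>Inspector</w:t>
    <w:noBreakHyphen/>
    <w:t>General</w:t>
  </w:r>
</w:p>
""".strip()


def text_only(p):
    """Concatenate only <w:t> text nodes; the hyphen element is skipped."""
    return "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))


def text_with_hyphens(p):
    """Walk nodes in document order and emit '-' for <w:noBreakHyphen/>."""
    parts = []
    for node in p.iter():
        if node.tag == f"{{{W}}}t":
            parts.append(node.text or "")
        elif node.tag == f"{{{W}}}noBreakHyphen":
            # Assumption: map to a plain ASCII hyphen; "\u2011" would
            # preserve the non-breaking character instead.
            parts.append("-")
    return "".join(parts)


paragraph = ET.fromstring(PARAGRAPH)
print(text_only(paragraph))          # InspectorGeneral
print(text_with_hyphens(paragraph))  # Inspector-General
```

Whether to emit a plain hyphen or the Unicode non-breaking hyphen (U+2011) is a design choice; a plain hyphen is usually easier to match and search for downstream.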


This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 21, 2024