`` encloses a role. There is a default role, else :<role>:`text`
_ in front, is the special target role. For one word the backtick can be dropped.
_`__init__` should produce a target named "__init__".
But instead the produced target is "init".
The backtick avoids ambiguity. There is no need for this behavior.
--- old
+++ new
@@ -1,7 +1,8 @@
-`` encloses a role. There is a default role, else :<role>:`text`
+ `` encloses a role. There is a default role, else :<role>:`text`
-_ in front, is the special target role. For one word the backtick can be dropped.
+ _ in front, is the special target role. For one word the backtick can be dropped.
-_`__init__` should produce a target named "__init__".
+ _`__init__` should produce a target named "__init__".
+
But instead the produced target is "init".
The backtick avoids ambiguity. There is no need for this behavior.
Please be careful with using raw markup in a web form like this. SourceForge expects MarkDown, which has enough similarities to reStructuredText that the markup will be interpreted/misinterpreted. Use MarkDown to quote any markup, and check that the result makes sense when rendered (use the preview function).
When you say, "There is no need for this behavior", what behavior do you mean, exactly?
It works fine for me. This input:
$ rst2pseudoxml.py<<'EOF'
a target _`__init__` in a paragraph
EOF
Produces this output:
<document source="<stdin>">
    <paragraph>
        a target
        <target ids="init" names="__init__">
            __init__
        in a paragraph
The target name is __init__. The ID drops the underscores, for the reasons explained in docutils.nodes.Element and docutils.nodes.make_id, e.g.:
Docutils identifiers will conform to the regular expression [a-z](-?[a-z0-9]+)*. For CSS compatibility, identifiers (the "class"
and "id" attributes) should have no underscores, colons, or periods.
Hyphens may be used.
The generated HTML uses the id, so to link to it from outside, use
"name-of-the-document#init"
In the rST source of the same document, you link to it via the name, e.g., __init___
Docutils resolves the "reference name" and the above becomes a reference to the matching target:
Docutils XML: <reference name="__init__" refid="init">,
HTML: <a class="reference internal" href="#init">__init__</a>.
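The name-to-id conversion described above can be approximated in a few lines. This is only a simplified sketch written for this discussion, not the code of docutils.nodes.make_id; the function name simplified_make_id is made up, and the real function handles more cases (for example a fallback when nothing is left of the name).

```python
import re
import unicodedata


def simplified_make_id(name: str) -> str:
    """Approximate the documented rule [a-z](-?[a-z0-9]+)* (assumption: simplified)."""
    # Strip accents: decompose, then drop everything that is not ASCII.
    ascii_only = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lowercase and turn every run of other characters into one hyphen.
    ident = re.sub(r"[^a-z0-9]+", "-", ascii_only.lower())
    # An id must start with a letter and must not end with a hyphen.
    return re.sub(r"^[-0-9]+|-+$", "", ident)


print(simplified_make_id("__init__"))
```

Under these assumptions the underscores of `__init__` are first turned into hyphens and then stripped from both ends, which leaves `init` as in the pseudo-XML output above.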
HTML5 id is specified to not contain spaces, but some browsers do support spaces nevertheless.
HTML5 does not specify why it disallows spaces. It should therefore allow spaces.
The same content item should have the same id independent of format (rst, html, pdf, ...)
How should a user target his content item, if every formatter chooses to modify his chosen id?
The id should not be changed.
Docutils should even keep spaces despite HTML5 disallowing them.
If the user runs into a problem with a browser, he will change the id himself and know about it.
Maybe he converts to just pdf, anyway.
To summarize:
RST is not html and does not need restrictions from HTML (or CSS) altogether.
Docutils should develop in that direction.
Relaxing rules does not produce backward incompatibility, either.
Docutils doctree elements may have multiple ids and names.
In the reStructuredText source, only reference names_ are used for naming
elements as well as referring to them. IDs_ are only used in generated
documents.
Reference names_ may be auto-derived from the content (e.g. section
titles) or specified by the author via rST syntax (:name: option of
directives, content of hyperlink targets, label of footnotes or citations).
IDs_ are generated by Docutils (sometimes using names as base) when
parsing rST or in transformations.
How should a user target his content item, if every formatter chooses
to modify his chosen id?
Internal (rST source and included files/parent documents):
use the reference name. This works independent of the output format.
External:
HTML: Use the generated id (when unsure about the transformation of a
given name to id, look it up in the output).
LaTeX: Use the id as label (e.g. in \ref{}). This works only if the
external LaTeX source is combined with the Docutils-generated
LaTeX source (i.e. one must include the other or both included in a
common parent).
PDF: named destinations__ are currently not supported in PDFs
generated from Docutils-generated LaTeX.
The id is (currently) generated once and used unchanged by the writers.
Docutils should even keep spaces despite HTML5 disallowing them.
Docutils policy is to create valid output. Until this restriction is
lifted in the HTML5 standard, Docutils will not use spaces in HTML-IDs.
Spaces are allowed in reference names_.
If the user runs into a problem with a browser, he will change the id
himself and know about it.
The author cannot change IDs or implicit reference names directly. If we
would keep spaces, any document with a section title containing whitespace
would also contain spaces in the id of the corresponding section element.
Maybe he converts to just pdf, anyway.
Even worse: Accented characters, Umlauts, Greek, Cyrillic, etc. in section
titles would lead to compilation errors with pdflatex.
To summarize:
RST is not html and does not need restrictions from HTML (or CSS) altogether.
This is why the internal identifiers (reference names_) don't
have these limitations. The rules for reference names (whitespace
normalization and downcasing) are solely based on practicability for rST.
Identifiers in the generated documents must comply with the restrictions of
the output document format.
Docutils should develop in that direction.
There are two alternatives:
a) Keep ids identical across output formats. This would allow only the
intersection of valid element identifiers.
We could lift the restrictions of CSS1, as generated documents would
still be valid XHTML1 and CSS selectors may use escaping or [argument] syntax.
This would relax the requirement to compliance with the regexp [A-Za-z][-_:.A-Za-z0-9]* (i.e. also allow underscore, colon,
and full stop).
b) Allow less restrictive identifiers in some formats:
HTML is the format most probably linked to.
The "html5" writer could use the name as ID, just replacing spaces.
This would allow external links like http://example.com/parrot.html#1.Ιανουάριος.
Or the restriction on the first character may be dropped with an exception
for "html4css1".
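To make the difference between the current rule and the relaxed rule of alternative a) concrete, here is a small check with both regular expressions as quoted in this thread. The helper is_valid and the sample names are just for illustration.

```python
import re

# Current Docutils identifier rule, as quoted from the make_id documentation.
CURRENT_RULE = re.compile(r"[a-z](-?[a-z0-9]+)*")
# Relaxed rule from alternative a): underscore, colon and full stop allowed.
RELAXED_RULE = re.compile(r"[A-Za-z][-_:.A-Za-z0-9]*")


def is_valid(rule: re.Pattern, identifier: str) -> bool:
    """True if the whole identifier matches the given rule."""
    return rule.fullmatch(identifier) is not None


for candidate in ("init", "some_title_id", "__init__"):
    print(candidate, is_valid(CURRENT_RULE, candidate), is_valid(RELAXED_RULE, candidate))
```

Note that `__init__` fails both rules, because even the relaxed rule requires a letter as the first character. Only dropping the first-character restriction, as mentioned under b), would let it through unchanged.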
Relaxing rules does not produce backward incompatibility, either.
No problem for internal links (unless we also change the rules for reference names_).
However, external links adapted to the current rules may break.
Example: a document, parrot.rst contains::
  Schöner Titel: warum nicht?
  ===========================
and I link to this section from somewhere on the net with the URL http://example.org/parrot.html#schoner-titel-warum-nicht, this link will
be broken after re-processing the unchanged source with a Docutils
version with relaxed id-rules.
Therefore, I would only change the rules after careful consideration and an
advance warning. Possibly with an opt-in setting.
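The breakage can be illustrated with a sketch. Both functions are assumptions made for this example: legacy_id approximates the current conversion, and relaxed_id is one possible reading of "use the name as ID, just replacing spaces" (the replacement character "-" is a guess). Neither is Docutils code.

```python
import re
import unicodedata


def legacy_id(title: str) -> str:
    """Rough approximation of the current name-to-id conversion."""
    ascii_only = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    ident = re.sub(r"[^a-z0-9]+", "-", ascii_only.lower())
    return re.sub(r"^[-0-9]+|-+$", "", ident)


def relaxed_id(title: str) -> str:
    """Hypothetical relaxed rule: lowercase, normalize whitespace, replace spaces."""
    return "-".join(title.lower().split())


title = "Schöner Titel: warum nicht?"
old_url = "http://example.org/parrot.html#" + legacy_id(title)
print(old_url)
print(relaxed_id(title))
```

The fragment in the existing URL no longer matches the id produced by the relaxed rule, so the external link stops resolving even though the source is unchanged.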
I've abbreviated the general concept of identifier with ID.
In this general meaning a reference name is an ID,
because you reference something by uniquely identifying it.
If in docutils there are more reference names and ids
then there are more ways to reference an item.
That is OK.
I was referring only to user chosen reference name_.
Let's keep out IDs generated from headers or from :name:.
I personally never rely on these generated IDs,
because I don't know them.
Instead I put .. _`some_title_id`: in front of a header.
User chosen target IDs (reference name_ in rst) should not be changed.
How are more reference names translated to html,
e.g. for the above additional some_title_id?
More IDs would allow to keep the legacy ID and add
the unchanged user reference name as additional ID.
Else one could add a docutils.conf setting to tell docutils which method to use.
User chosen target IDs (reference name_ in rst) should not be changed.
If the user confines herself to valid names, no change is done.
If the user uses invalid names, the output would be buggy in some output formats. If we want
consistent identifiers, the same rules must apply to all output formats.
Anchors with an unchecked, user-specified ID value could be created using raw input, but this is not recommended.
How are more reference names translated to html?
Try yourself:
.. _first explicit target:
.. _other explicit target:
.. note::
   :name: refname from directive option

   the object
If you export to Docutils-XML or ~pseudoxml, you will see the three names and ids of the note element. In the HTML, spans are used as anchors for the additional identifiers.
Docutils has versions.
A new version is allowed to behave differently, according to semantic versioning.
Everyone knows that.
If someone uses a new version of docutils,
it is that one's responsibility to integrate it into its context.
Docutils should develop with the associated standards.
HTML has standard 5 now.
IDs should be modified only according to standard 5.
This means that only spaces can be replaced
when deriving HTML IDs.
If someone uses a new version of docutils,
it is that one's responsibility to integrate it into its context.
There is one problem, though: "Cool URIs don't change"
(https://www.w3.org/Provider/Style/URI.html).
When a new Docutils version produces different URIs for the same input, we
should offer users a way to keep the old URIs.
Docutils should develop with the associated standards.
HTML has standard 5 now.
HTML comes in many different versions. Docutils supports HTML5 with the
"html5_polyglot" writer and XHTML1.1/transitional with the default writer
"html4css1". The default may change in future.
IDs should be modified only according to standard 5.
This means that only spaces can be replaced
when deriving HTML IDs.
Identifier keys must be valid in all supported output formats.
Therefore, they must comply with restrictions in the
respective output formats (HTML4.1__, HTML5__, polyglot HTML,
LaTeX, ODT__, troff (manpage), XML__).
We may want to keep the "one ID format for all output formats". Then only
the underscore ("_") may be allowed in addition to the current
transformation.
+1 one rule is easier to remember than a set of different rules.
-1 IDs must keep to a restrictive rule even in more relaxed output formats.
Alternatively, we may allow different identifier transformations for each
output format:
+1 ID-transformation follows (almost) the relaxed rules of the output format.
-1 More complex setup.
-1 ID value used in the output is even harder to predict.
A possible implementation would be via a new "identifier_restrictions"
configuration setting that takes a list of rule sets (CSS1, HTML4, HTML5,
XML, LaTeX, XeTeX/LuaTeX, ODT, troff) and combines them to form the required
transition.
Examples:
The current transition would be identifier_restrictions: HTML4,CSS1.
The "html5polyglot" section could use identifier_restrictions: XML, as
polyglot HTML requires valid XML identifiers.
A user may override this in a config file or with rst2html5 --identifier-restrictions=HTML5.
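How a list of rule sets could be combined is sketched below. Everything here is hypothetical: the setting does not exist in Docutils, the names RULE_SETS, allowed_chars and restrict are invented, and the character sets are heavily simplified (ASCII only, no first-character rule).

```python
import string

ALNUM = set(string.ascii_letters + string.digits)

# Hypothetical, simplified rule sets: characters allowed inside an identifier.
RULE_SETS = {
    "HTML4": ALNUM | set("-_:."),
    "CSS1": ALNUM | set("-"),
}


def allowed_chars(restrictions):
    """Intersect the character sets of all requested rule sets."""
    allowed = None  # None means "no restriction at all"
    for rule in restrictions:
        chars = RULE_SETS[rule]
        allowed = chars if allowed is None else allowed & chars
    return allowed


def restrict(name, restrictions, replacement="-"):
    """Replace every character that is not allowed by all rule sets."""
    allowed = allowed_chars(restrictions)
    if allowed is None:
        return name
    return "".join(c if c in allowed else replacement for c in name)


print(restrict("some_title_id", ["HTML4"]))
print(restrict("some_title_id", ["HTML4", "CSS1"]))
```

With HTML4 alone the underscores survive; adding CSS1 removes them from the intersection, which corresponds to the current behaviour. An empty list leaves the name untouched.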
The has_prefix shouldn't be needed because it is determined by the ID format data id_start and id_char.
In the command line interface I would also default to legacy,
because of "Cool URIs don't change" and to avoid the necessity to change people's scripts.
I did not compare the ID language data in your py file with the documentation of the corresponding formats.
author: rpuntaie
created: 2019-09-30 17:30:55.106000
assigned: goodger
SF_url: https://sourceforge.net/p/docutils/feature-requests/66
commenter: goodger
posted: 2019-09-30 18:32:30.338000
title: #66 .. _`__init__`: becomes instead of
Diff:
commenter: goodger
posted: 2019-09-30 18:54:01.082000
commenter: goodger
posted: 2019-09-30 18:54:38.778000
commenter: milde
posted: 2019-09-30 19:24:29.406000
Do you mean the "hyperlink name" in explicit hyperlink targets?
Here, the backticks can always be dropped
(http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#hyperlink-targets).
In "inline internal targets", the backticks are mandatory (http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#inline-internal-targets).
It is a bit more complicated:
The generated target has the name __init__ and the id init.
The id is generated from the name (but may also be something like "id35" if the name-derived id is not unique). The rules and motivation for the conversion are described in
http://docutils.sourceforge.net/docs/ref/rst/directives.html#rationale.
commenter: rpuntaie
posted: 2019-10-01 09:23:06.896000
According to https://www.w3.org/TR/CSS21/syndata.html#characters,
an identifier can start with two underscores in CSS.
HTML5 allows the id value to start with two underscores (https://html.spec.whatwg.org/multipage/dom.html#the-id-attribute).
HTML5 id is specified to not contain spaces, but some browsers do support spaces nevertheless.
HTML5 does not specify why it disallows spaces. It should therefore allow spaces.
I made a related post about docutils changing IDs in 11/2018: https://sourceforge.net/p/docutils/mailman/message/36453416/
My position is this:
The id should not be changed.
Docutils should even keep spaces despite HTML5 disallowing them.
If the user runs into a problem with a browser, he will change the id himself and know about it.
Maybe he converts to just pdf, anyway.
To summarize:
RST is not html and does not need restrictions from HTML (or CSS) altogether.
Docutils should develop in that direction.
Relaxing rules does not produce backward incompatibility, either.
commenter: milde
posted: 2019-10-01 16:16:46.659000
Ticket moved from /p/docutils/bugs/379/
commenter: milde
posted: 2019-10-12 21:35:57.074000
In rST/Docutils, it is a bit more complicated:
Docutils doctree elements may have multiple ids and names.
In the reStructuredText source, only `reference names`_ are used for naming
elements as well as referring to them. IDs_ are only used in generated
documents.
`Reference names`_ may be auto-derived from the content (e.g. section
titles) or specified by the author via rST syntax (:name: option of
directives, content of hyperlink targets, label of footnotes or citations).
IDs_ are generated by Docutils (sometimes using names as base) when
parsing rST or in transformations.
.. _reference names:
   http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#reference-names
.. _ids: http://docutils.sourceforge.net/docs/ref/doctree.html#ids
To achieve this, the id must be valid in all output formats supported by
Docutils (HTML4.1/XHTML1, HTML5, LaTeX, troff (manpage), XML, ODF/ODT).
HTML4.1: IDs must begin with a letter ([A-Za-z]) and may be followed by
any number of letters, digits ([0-9]), or any of the characters "-_:.".
HTML5: no whitespace
LaTeX: only ASCII characters (32-127) except "%~#{}"
(https://tex.stackexchange.com/questions/18311/what-are-the-valid-names-as-labels)
ODT/ODF:
troff:
__ https://tex.stackexchange.com/questions/213860/how-to-generate-a-named-destination-in-pdf
commenter: rpuntaie
posted: 2019-10-13 13:36:43.038000
About multiple IDs in html:
https://stackoverflow.com/questions/192048/can-an-html-element-have-multiple-ids
See comment by BoltClock or the answer by tvanfosson.
commenter: milde
posted: 2019-10-13 20:54:11.563000
commenter: rpuntaie
posted: 2019-10-15 07:58:22.844000
commenter: milde
posted: 2019-10-30 21:21:32.616000
__ http://www.w3.org/TR/html401/types.html#type-name
__ https://www.w3.org/TR/html50/dom.html#the-id-attribute
__ https://www.w3.org/TR/html-polyglot/#id-attribute
__ https://tex.stackexchange.com/questions/18311/what-are-the-valid-names-as-labels
__ https://help.libreoffice.org/6.3/en-US/text/swriter/01/04040000.html?DbPAR=WRITER#bm_id4974211
__ https://www.w3.org/TR/REC-xml/#id
commenter: rpuntaie
posted: 2019-10-31 13:35:58.843000
This is a nice solution. I would also have a special --identifier-restrictions=none to turn off all ID mappings.
commenter: milde
posted: 2020-03-25 08:30:56.869000
attachments:
I attach an experimental implementation draft and tests for exploration.
commenter: rpuntaie
posted: 2020-03-26 10:46:44.391000
I like this:
It allows using the same ID for all output formats that support it,
which are many, considering HTML5, ODT, XeTeX and XML.
It also means that the generated documents of these formats all have the same ID for the same content,
including the RST source.
It stores the ID language restrictions of different target formats within docutils.
Regarding API, I would make your trim_name() the new make_id():
The has_prefix shouldn't be needed because it is determined by the ID format data id_start and id_char.
In the command line interface I would also default to legacy,
because of "Cool URIs don't change" and to avoid the necessity to change people's scripts.
I did not compare the ID language data in your py file with the documentation of the corresponding formats.