Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Cookiecut v0.1.0: modify .github and .pre-commit-config.yaml. Follow new practices for writing commit/issue and news.rst file #320

Closed
bobleesj opened this issue Jan 2, 2025 · 14 comments

Comments

@bobleesj
Copy link
Contributor

bobleesj commented Jan 2, 2025

Please re-cookiecut but you would only need to modify the .github folder and .pre-commit-config.yaml. We’ve also added two group practices: For commit messages and issue titles, please use the guideline in Billingegroup/scikit-package#212. For writing news.rst, please use Billingegroup/scikit-package#227. Lastly, we are in the process of writing official documentation for bg-cookiecutter, which has also been cookiecuttered. So, please look out for that!

@ycexiao
Copy link
Contributor

ycexiao commented Feb 1, 2025

In the transition to 79 line length limit, there are three kinds of errors can't be handled by code formatting tools.

  1. long string variable
  2. long comments
  3. long doc string (the most one)

Writing other scripts to handle these three cases is plausible but troublesome. For it should identify context and line length at the same time.

About the fix them manually, the last one is obvious. Just like adjust line length in other documents. There are some options about split strings and comments and I'd like to discuss here.

For strings, there is a good reference How to write very long string that conforms with PEP8 and prevent E501. The patterns I summerized here

s1 = ("a very"
      "long line")

s2 = ("a very" +
      "long line"s3 = "a very" \
     "long line"

s4 = "a very" + \
     "long line"

Also I want to check the ways we handle long comments.

[
    (
        # a very very long comments
        a = b

My idea is change it into the pattern like this while retain the original indentation

[
    (
        # a very very 
        # long comments
        a = b

@ycexiao
Copy link
Contributor

ycexiao commented Feb 1, 2025

Here are some examples of long lines.

long strings variable:
.src/diffpy/utils/diffraction_objects: 33

x_values_not_equal_emsg = (
    "The two objects have different values in x arrays (my_do.all_arrays[:, [1, 2, 3]]). "
    "Please ensure the x values of the two objects are identical by re-instantiating "
    "the DiffractionObject with the correct x value inputs."
)

long doc string:
.src/diffpy/utils/diffraction_objects: 72

'''
    scat_quantity : str
        The type of scattering experiment (e.g., "x-ray", "neutron"). Default is an empty string "".
'''

.src/diffpy/utils/diffraction_objects: 128

'''
        ...
        >>> x = np.array([0.12, 0.24, 0.31, 0.4])  # independent variable (e.g., q)
'''

comments:
./src/diffpy/utils/parsers/serialization.py: 82

    # performed second to prioritize overwriting hdata entries with data_table column entries

./tests/test_diffraction_objects.py: 203

    [
        # Test whether the original y-array is scaled as expected
        (  # C1: none of q, tth, d, provided, expect to scale on the maximal intensity from each object
            {

@ycexiao
Copy link
Contributor

ycexiao commented Feb 1, 2025

My question here is to (1) pick one standards of shortening long string variables (2) ensure this way of handling long comments.

@ycexiao
Copy link
Contributor

ycexiao commented Feb 1, 2025

Another topics may rise here is that whether I should leave some marks where these modification are done to facilitate possible future rephrasing. If there is a need to rephrase the long sentences in the future, some work done here can save much trouble.

@ycexiao
Copy link
Contributor

ycexiao commented Feb 1, 2025

@bobleesj @sbillinge Please see my questions about applying 79 length limits to current projects.

@ycexiao
Copy link
Contributor

ycexiao commented Feb 1, 2025

Another topics may rise here is that whether I should leave some marks where these modification are done to facilitate possible future rephrasing. If there is a need to rephrase the long sentences in the future, some work done here can save much trouble.

e.g.

    # performed second to prioritize overwriting hdata entries with data_table column 
    # entries  <need-rephrase>

I think for string variables, since it concerns with output of a program, it will not be rephrased at all. But for some ugly splitted comments, rephrasing would be good.

@sbillinge
Copy link
Contributor

My question here is to (1) pick one standards of shortening long string variables (2) ensure this way of handling long comments.

Thanks @ycexiao these are great and clearly laid out questions!

we haven't a codified group standard (maybe this could be the start of one), but the defacto standard we use is (1)

s1 = ("a very "
         "long line")

though note the space at the end of the first line which wasn't in your example. If it is an f-string the f needs also to be propagated. My PyCharm handles this automatically if I hit a carriage return in the right place so it is not so bad to do it (though clearly tedious). we only have to do it once though.....paying for our technical debt....

and as per your suggestion

[
    (
        # a very very
        # long comment
        a = b

though note, in this case we don't put a white-space at the end of the line (and note that comment is singular, not comments)

I am afraid that these do need to be handled manually. It is because a human is better at understanding the logic and
making the edits in a human-readable way. If it was easy to automate it would have been automated in black or flake8 autolinters already, believe me! So we probably don't want to go down that rabbit hole....

It is always ok to propose a rephrasing of comments and docstrings if you think it could be more readable! This is welcomed!

@ycexiao
Copy link
Contributor

ycexiao commented Feb 1, 2025

My question here is to (1) pick one standards of shortening long string variables (2) ensure this way of handling long comments.

Thanks @ycexiao these are great and clearly laid out questions!

we haven't a codified group standard (maybe this could be the start of one), but the defacto standard we use is (1)

s1 = ("a very "
         "long line")

though note the space at the end of the first line which wasn't in your example. If it is an f-string the f needs also to be propagated. My PyCharm handles this automatically if I hit a carriage return in the right place so it is not so bad to do it (though clearly tedious). we only have to do it once though.....paying for our technical debt....

and as per your suggestion

[
    (
        # a very very
        # long comment
        a = b

though note, in this case we don't put a white-space at the end of the line (and note that comment is singular, not comments)

I am afraid that these do need to be handled manually. It is because a human is better at understanding the logic and making the edits in a human-readable way. If it was easy to automate it would have been automated in black or flake8 autolinters already, believe me! So we probably don't want to go down that rabbit hole....

It is always ok to propose a rephrasing of comments and docstrings if you think it could be more readable! This is welcomed!

Thank you @sbillinge . Thanks for pointing out the space at the end of line here. In the following PR I will

  1. split the long string using the s1 patter.
  2. try to split the comment in a human-readable way.
  3. adjust the line length for doc string like what we do to normal text.
  4. rephrase some long comment when necessary.

@sbillinge
Copy link
Contributor

btw, something went wrong with the indenting in my cut-paste.....the split long-string lines should have the same indent as each other.

@ycexiao
Copy link
Contributor

ycexiao commented Feb 1, 2025

btw, something went wrong with the indenting in my cut-paste.....the split long-string lines should have the same indent as each other.

I see, I will use this pattern

s1 = ("a very"
      "long line")

@sbillinge
Copy link
Contributor

yes, but with a space after very

@ycexiao
Copy link
Contributor

ycexiao commented Feb 2, 2025

from ./src/diffpy/utils/tools.py funtion get_user_info().

'''
    A template for the config file is below.  Create a text file called '
    diffpyconfig.json' in your home directory and copy-paste the template into
    it, editing it with your real information.
    {
      "owner_name": "<your name as you would like it stored with your data>>",
      "owner_email": "<your_associated_email>>@email.com",
      "owner_orcid": "<your_associated_orcid if you would like this stored with your data>>"
    }
'''

Apparently, a template for .json file is provided in the doc string. But the previous rules to shorten string variables don't work here. Worsely, according to the highest answer from Are multi-line strings allowed in JSON?, there is no way to write a long string variable under the line limit because json doesn't understand string concatenation.

Thus, there is no way of preventing E501 while keeping the .json template syntax correct in .json. In this case, I will rewrite the last line as

      "owner_orcid": "<your_associated_orcid to store this with your data>>"

Thus, in future practice, .json templates in python comment should be as concise as possible. When a longer description is needed, there is no way of keeping the length limit, and it should be held in other places.

This is just a notice here as we are transitioning to the 79-line limit. Thinking that this kind of detail might be helpful.

@sbillinge
Copy link
Contributor

Apparently, a template for .json file is provided in the doc string. But the previous rules to shorten string variables don't work here. Worsely, according to the highest answer from Are multi-line strings allowed in JSON?, there is no way to write a long string variable under the line limit because json doesn't understand string concatenation.

this is then an appropriate time to escape this by leaving the long line and putting at the end # noqa: E510. Please double-check that syntax, but it is something like that.

@bobleesj
Copy link
Contributor Author

bobleesj commented Feb 3, 2025

Closed by @ycexiao in #326

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants