Suggestions for code organization #16

Closed
ian-ross wants to merge 1 commit into MIT-LAE:main from ian-ross:tidying-suggestions

@ian-ross

Member

I noticed some things that I would normally set up differently in a Python project. This PR shows some suggestions:

  1. I usually find that it's better not to put data files in the source tree. I like to have a separate data directory, and to use an environment variable (AEIC_DATA_DIR or something) to provide a default location for input files. Then you can put a little wrapper around every place where you open files, so that it looks first in the current working directory and then in the given data directory. This makes it easy to have job-specific data files and fall back on defaults. I've moved things around in this PR and added a utility function to do the file lookup. See the example workflow below to see how it works.
  2. If you want to use a top-level src directory (which is a good idea), you need to do it properly. If you find yourself writing import src..., something is not right. The point of a src directory is that it forces you to install your code in the same way that users of your package will have to. That means installing your package as an editable install using pip. (See the workflow suggestion and the references below.)
  3. The dependencies section of the pyproject.toml file is for dependencies that are needed to use your code. Dependencies needed only for development activities, like building documentation, go into a separate dev dependencies group. You should also list the actual runtime dependencies in your pyproject.toml (in this case, just numpy and scipy, but it's a good habit to get into).
  4. Don't do from something import *. It causes all sorts of problems in the end, for a few seconds of convenience at the start.
  5. Use fussier tools. In the BADA package, for example, there are lots of cases where Optional[float] values are multiplied by a FloatOrNDArray value. The linting tool that I use (either pyright or ruff, not sure which one this message comes from) reports these as Operation "*" not supported for types "FloatOrNDArray" and "float | None". It also catches things like using dictionary access on dataclass values, for example self.aircraft_parameters['c_tc4'] instead of self.aircraft_parameters.c_tc4. Using fussy, annoying tools saves you time in the end: if you get rid of all the irritating red squiggles these tools show you before you try running things, you will avoid a lot of potential errors. (I've more or less settled on using pyright and ruff, annoying as they are.)
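
To illustrate the two problems mentioned in item 5, here is a minimal sketch of one way to satisfy a type checker. It is not the BADA code: the AircraftParameters class below is a hypothetical stand-in with a single field, the scale_by_c_tc4 function is invented for illustration, and treating c_tc4 as possibly unset is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AircraftParameters:
    # Hypothetical stand-in for the real parameter class. Only the
    # field name c_tc4 comes from the discussion above; making it
    # Optional is an assumption for this example.
    c_tc4: Optional[float] = None


def scale_by_c_tc4(params: AircraftParameters, value: float) -> float:
    # Attribute access, not params['c_tc4']: a plain dataclass instance
    # is not subscriptable, so the dictionary form raises TypeError.
    coeff = params.c_tc4
    # Narrow Optional[float] to float before multiplying, so that the
    # "*" is well typed and a missing value fails with a clear message.
    if coeff is None:
        raise ValueError("c_tc4 is not set")
    return coeff * value
```

The explicit None check is only one option. If a value can never legitimately be missing, declaring the field as a plain float removes the need for the check altogether.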

Example workflow

# Start with a fresh clone.
git clone ...
cd AEIC

# Here, set up a virtual environment however you normally do that. Either using:
#  - Conda, or
#  - python -m venv or
#  - my preferred option these days, uv.

# Set the data directory. All files will come out of here by default.
export AEIC_DATA_DIR=$(pwd)/data

# Install the AEIC code as a local editable install.
pip install -e .

# Make a separate jobs directory to show how the default file access works.
mkdir jobs
cd jobs

# Start an interactive Python session; the remaining lines are entered at its prompt.
python

# No src. in imports!
from AEIC.performance_model import PerformanceModel

# The files here will come from the top-level data directory, unless
# you create job-specific files in the current directory.
pm = PerformanceModel('IO/default_config.toml')

# Reads model data from the data directory in this case.
pm.read_performance_data()
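
The fallback used in the workflow above (current working directory first, then AEIC_DATA_DIR) could look roughly like this. It is a minimal sketch and not the utility function added in this PR: the name find_data_file and the error handling are assumptions made for illustration.

```python
import os
from pathlib import Path


def find_data_file(relative_path: str) -> Path:
    """Return the first existing file matching relative_path.

    Looks in the current working directory first, then in the directory
    named by the AEIC_DATA_DIR environment variable (if it is set).
    """
    candidates = [Path.cwd() / relative_path]
    data_dir = os.environ.get("AEIC_DATA_DIR")
    if data_dir:
        candidates.append(Path(data_dir) / relative_path)
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    # Report every location that was tried to make failures easy to debug.
    searched = ", ".join(str(c) for c in candidates)
    raise FileNotFoundError(f"{relative_path} not found in: {searched}")
```

With a helper like this, a job directory containing its own IO/default_config.toml would take precedence, and any other job would pick up the copy under the data directory.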

References

@ian-ross ian-ross deleted the tidying-suggestions branch July 11, 2025 05:50