This repository serves as a template for new python projects and a way to express best practices
-
Designs are often very fluid while exploring a new problem. Expect to throw away many parts of a first version to focus on the things that matter.
-
After becoming familiar with the project, spend some time in pseudo-code to focus on identifying the basic building blocks and call graph
- Start implementing with the most deeply nested methods with accompanying tests
- Expand to the methods that call those methods
- End with user interface
-
With the building blocks identified in pseudo-code, carefully consider the data access patterns to inform the data design
-
Choose a data design that is suitable for your most deeply nested methods and modify it sparingly and only with good cause as you add layers
-
Object-oriented design often seems like overkill for new small projects but a. Can avoid the need to pass around lots of data objects b. Can be a framework for helping to formalize the design process
- Always use
argparse
for command-line arguments - Assume the use of
yaml
for structured input files unless there are compelling reasons for something else
- Always use
logger
for status output - Carefully choose an output format for standard formats, considering the following in order of priority:
- All runnable scripts should include a block like:
if __name__ == "__main__":
do_main_task()
- Always use
pytest
for testing - Introduce a simple continuous integration (CI) action ASAP
- Generate pull requests (PRs) with as little code change as possible
- Include tests in all PRs
- Do not merge your own PR; there should always be at least one review by a non-author, and a non-author should merge
- Introduce a
pyproject.toml
file ASAP
- Follow PEP8 style guide, ideally with a tools like
black
to help enforce it, especially via a plugin to your editor
- Generally, choose nouns for variables and verbs for methods
- The most common approach to multi-word variables is "snake case": variables are in lower case with words separated by underscore
- Clear variable and method names can reduce the need for comments
- Carefully determine whether temporary variables are helpful. If you only use it once, consider if that line of code can be combined with the place it is used.
- Avoid Magic Numbers - numerical constants without a clear purpose
- provide numerical constants with a variable to describe their purpose
- these can be physical constants (
AVOGADRO = 6.02e23
), unit conversions (EV2MEV = 1e-6
), or vector indices (z = 2
), among others - they are not generally needed for things like squaring a quantity or dividing by some integer that arises from algebra
- Take advantage of python's rich data structures and related methods
dictionaries
are a preferred way to bind data together in a way that is clear, rather than parallel lists that are indexed in parallelnumpy
arrays are frequently better choices than python lists for numerical data- when looping over iterables (e.g. lists, dictionaries, numpy arrays, etc),
avoid an indexing variable if possible
zip()
may be useful for iterating through multiple iterables with parallel indexingenumerate()
allows you to autogenerate an indexing variable, but make sure you need that index- iterating over a
range()
is probably a last resort, try iterating directly over the iterable or usingzip
- consider list comprehensions for simple operations
- when using a loop variable, consider the same general guidance on variable naming;
avoid overly simple loop variables, e.g.
(i,j,k)
, that have no semantic meaning
- Include a docstring in every method
- Rely on clear variable and method names and add comments sparingly where the intent/approach is non-intuitive
- If you have cut & paste code in two different places, it probably should be a method/function (or in some cases a loop)
- Even very short methods can be valuable if the method name makes the code more readable
- Ideally, methods should be no longer than one screen worth of lines
- Practice Separation of Concerns:
- a single method should have a single purpose that is clear from the name
- avoid any combination of reading, using and writing data in the same method
Before submitting a PR, ask yourself: "Have I...."
- modularized my code into methods that each have a clear and singular purpose?
- included a docstring for every method?
- replaced all magic numbers with variables?
- used method names (verbs) and variable names (nouns) that make the code clearly readable?
- removed instances of copy/paste code or nearly identical sections of code?
- made good data structure design choices?