Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-45456 Move from Flask to FastAPI +424-488 #32

Merged
merged 13 commits into from
Aug 2, 2024
Merged

Conversation

bbrondel
Copy link
Collaborator

Small changes in API behavior:

  • structure of errors, especially parse errors, influenced by FastAPI
  • some HTTP error codes changed
  • instrument names and observation type names returned as lowercase (e.g., "LATISS" -> "latiss")

Astropy is added as a dependency for validation of UCD and units.

@bbrondel bbrondel requested review from timj and ktlim July 31, 2024 23:56
Copy link

codecov bot commented Aug 1, 2024

Codecov Report

Attention: Patch coverage is 85.65401% with 34 lines in your changes missing coverage. Please review.

Project coverage is 84.90%. Comparing base (5822be9) to head (fb4015c).

Files Patch % Lines
python/lsst/consdb/pqserver.py 83.65% 34 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #32      +/-   ##
==========================================
+ Coverage   81.90%   84.90%   +3.00%     
==========================================
  Files           4        4              
  Lines         503      550      +47     
==========================================
+ Hits          412      467      +55     
+ Misses         91       83       -8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@bbrondel bbrondel marked this pull request as ready for review August 1, 2024 03:09
@bbrondel bbrondel requested a review from dhirving August 1, 2024 15:54
Copy link

@dhirving dhirving left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks alright -- the only issue that is functionally incorrect is that all of those async def route handlers should be def.

Some resources that might help you out as you start your FastAPI journey:

  1. Russ's recommended FastAPI pattern technote: https://sqr-072.lsst.io/ . None of this is mandatory and lots of apps vary from it for different reasons. But the closer you stay to this, the more people will be able to quickly jump in and help with the app.
  2. SQRE just put on a boot camp that covered a lot of about FastAPI and how it is deployed in Rubin. Recordings of all the talks are here: https://confluence.lsstcorp.org/display/DM/SQuaRE+Bootcamp+-+May+6-10+2024
  3. You should probably stop in to #dm-square on Slack and use the link at the top to sign up for an "Ask SQuaRE" slot to stop by their weekly Q&A, just to introduce yourself. SQRE has sort of paved the way for doing this kind of work and will be helpful folks to talk to if you get stuck. (This is Frossie's team. Frossie, Adam, Angelo, and Stelios are in Tucson -- have you met them yet? If not you should come to lunch with us tomorrow. A couple of the key players are remote -- Russ and Jonathan.)
  4. I'm in room M-15 all day every day if you need anything.


Returns
-------
json_dict: `dict` [ `str`, `Any` ]
JSON response with a list of instruments, observation types, and
data types.
"""
logger.info("GET /")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FastAPI by default logs the paths that are accessed, so you don't need to manually. (Or it might be one of the layers under it -- asgi or uvicorn -- but something puts it in the logs anyway.)

"/",
description="Metadata and health check endpoint.",
include_in_schema=False,
response_model=IndexResponseModel,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to manually declare the response_model here -- FastAPI already knows what it is from the type annotation on the function.


message: str = Field(..., title="Human-readable response message")
key: str = Field(..., title="The name of the added key")
instrument: str = (Depends(validate_instrument_name),)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think FastAPI will do anything with a Depends if it is anywhere other than a function parameter to a route handler.

Generally speaking Depends isn't used for validation logic -- it is for pulling in things like a database connection, user authentication info, etc. Generally things that are external resources that are part of your system, not data from the user. See the docs at https://fastapi.tiangolo.com/tutorial/dependencies/. (They're also really helpful for unit testing because it gives you a way to inject mocks easily.)

Since it looks like you want this instrument name validation logic in a few places, one good way to write this might be something like:

# Declare a reusable, validated string data type
InstrumentName = Annotated[str, AfterValidator(validate_instrument_name)]

class AddKeyResponseModel(BaseModel):
    ...
    instrument : InstrumentName

...

async def add_flexible_metadata_key(
    instrument: InstrumentName
...

Note that you will need to modify validate_instrument_name so it raises ValueError instead of HttpException.

If it's a one-off, you can declare validation logic directly on the field in a few different ways as well, see the docs at https://docs.pydantic.dev/latest/concepts/validators/



class AddKeyRequestModel(BaseModel):
key: str = Field(..., title="The name of the added key")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just omit the default value instead of explicitly writing ... for required fields.

key: str = Field(..., title="The name of the added key")
dtype: AllowedFlexTypeEnum = Field(..., title="Data type for the added key")
doc: Optional[str] = Field("", title="Documentation string for the new key")
unit: Optional[str | None] = Field(None, title="Unit for value")
Copy link

@dhirving dhirving Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some type-related nitpicks here. Optional[T] is exactly equivalent to T | None, so saying Optional[str | None] is redundant -- you can just say str | None or Optional[str].

For cases like doc where you have provided a default value and don't want to let it be None, you can just declare it as

doc: str = Field("", title=...)

From the end-user perspective it is still "optional" because if they don't provide it, it will default to the empty string. If you had declared it as Optional[str] they could pass in a null in the JSON and it would pass validation, turning into None in your model.

response_model=AddKeyResponseModel,
description="Add a flexible metadata key for the specified instrument and obs_type.",
)
async def add_flexible_metadata_key(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to be EXTREMELY careful about the difference between async def and def in FastAPI.

If you use a blocking, synchronous function inside an async def handler, no other requests will be handled until the function returns. Unless there is an await statement in your function to return control back to the event loop, the function will prevent everything else in FastAPI from running. (async/await is basically cooperative multi-tasking.)

Generally speaking, the two places it is correct to use async def in FastAPI handlers (or dependency functions, exception handlers, etc.) are:

  1. When your function body contains an await statement to do background I/O
  2. Short, fast functions that do no I/O or long-running computations.

In this function you are doing a synchronous call to sqlalchemy (there is no await), so this will block. (sqlalchemy does provide an async/await version of its API if its configured with the right driver -- you might consider switching to that so you can use async throughout.)

If you instead declare the function with plain def, FastAPI will run this on a thread pool for you so it does not block other requests.

See https://fastapi.tiangolo.com/async/ for more info.

description="Flex schema for the given instrument and observation type.",
)
async def get_flexible_metadata_keys(
instrument: Annotated[str, Depends(validate_instrument_name)] = Path(..., title="Instrument name"),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the discussion above, it doesn't make sense to use Depends and Path in the same declaration -- either the data is coming from the path, or it's coming from the dependency function -- it can't be both.

(This looks like it is working because validate_instrument_name is separately declaring a Path for pulling in the instrument name. The value is coming entirely from validate_instrument_name, not the Path declaration here.)

table: str,
data: InsertMultipleRequestModel = Body(..., title="Data to insert or update"),
u: Optional[int] = Query(0, title="Update if data already exist"),
) -> dict[str, Any] | tuple[dict[str, str], int]:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This return value doesn't appear to be correct (it's actually returning a GenericResponse).

Mypy flags this in my editor -- it looks like this repo might not be running typechecks in the GitHub actions.

return m[0](v)


class ObsIdColname(str, Enum):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it do when you subclass from both str and Enum? I've never seen that before and I don't see anything about it in the docs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't know what I was thinking there, but I was going for StrEnum.

@dhirving
Copy link

dhirving commented Aug 1, 2024

One other thing is that you may want to preserve the case of the instrument names and observation types unless you have a good reason to do otherwise. These show up in many different contexts and they typically have a very specific casing. (Ask Tim about this maybe.)

# Environment variables that must be set:
# DB_HOST DB_PASS DB_USER DB_NAME or POSTGRES_URL

# Expose the port.
EXPOSE 8080

ENTRYPOINT [ "gunicorn", "-b", "0.0.0.0:8080", "-w", "2", "consdb-pq.pqserver:app" ]
ENTRYPOINT [ "gunicorn", "-b", "0.0.0.0:8080", "-w", "2", "-k", "uvicorn.workers.UvicornWorker", "consdb_pq.pqserver:app" ]
Copy link

@dhirving dhirving Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually people just invoke uvicorn directly e.g.

uvicorn consdb_pq.pqserver:app --host 0.0.0.0 --port 8080

and I think that would let you drop the dependency on gunicorn as well.

@bbrondel bbrondel merged commit 19b5224 into main Aug 2, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants