Feat/sagemaker llms #234
Draft
isobel-daley-6point6 wants to merge 101 commits into main from feat/sagemaker-llms
fix: tighten IAM permissions for sagemaker
Fix/sagemaker iam adjustment
fix: adjust IAM policies to enable access to sagemaker bucket
Modularisation complete; also fixed some minor bugs that left the CPU utilisation alarm defunct for utilisation over 70%. We can start considering moving all models to this approach going forward.
Making the models easier to redeploy with simpler blocks. Removed redundant code configs.
Feat/sagemaker llms modular
Updated to ensure longer uptime and a longer time to scale down, to improve usability
Updated Alarm params
Chore/remove duplication
Added SNS topic for Lambda subscriptions to composite alarms in separate logic. N.B. We NEED to refactor this logic and abstract it into another module or something similar, as it's overused and not nice like this; it makes issues harder to diagnose.
Added gpu composite to all models
…e/tf-56-composite-alarms
Feature/tf 56 composite alarms - composite alarms
Instead of keeping files on the local filesystem (which is tricky to store securely), copying them to S3, and having GitLab pull them from S3 on launch, this stores GitLab secrets in Secrets Manager, which GitLab pulls from on launch. This is part 1 of (probably) 2: it does not remove existing objects, permissions or any associated config, so environments can keep accessing the secrets as they were and we don't have to migrate them all at once. Later parts will likely remove the permissions and config. This is part of our move away from having to keep any secrets locally on the filesystem.
This follows up on #223 by making it possible to apply the Terraform with GitLab enabled without having GitLab secrets on the local filesystem.
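For readers unfamiliar with the pattern described in the two commits above, here is a minimal sketch of reading a secret from AWS Secrets Manager at launch instead of copying a file from S3. It is an illustration only and is not taken from this PR: the function name `load_gitlab_secrets`, the secret name `gitlab/example`, and the assumption that the secret is stored as a JSON string are all hypothetical. The client is passed in so the function can be exercised without AWS access; with boto3 it would be `boto3.client("secretsmanager")`, whose `get_secret_value` call returns the payload under `SecretString`.

```python
import json
from typing import Any, Dict


def load_gitlab_secrets(client: Any, secret_id: str) -> Dict[str, Any]:
    """Fetch a secret and decode it as JSON.

    `client` is assumed to behave like boto3's Secrets Manager client,
    i.e. it exposes get_secret_value(SecretId=...). The JSON layout of
    the secret is an assumption made for this sketch.
    """
    response = client.get_secret_value(SecretId=secret_id)

    # Secrets stored as binary have no "SecretString" key; this sketch
    # only handles string secrets and fails loudly otherwise.
    if "SecretString" not in response:
        raise ValueError(f"secret {secret_id!r} is not a string secret")

    return json.loads(response["SecretString"])


# Hypothetical usage on the instance at launch (requires boto3 and an
# IAM role permitted to call secretsmanager:GetSecretValue):
#
#   import boto3
#   secrets = load_gitlab_secrets(
#       boto3.client("secretsmanager"), "gitlab/example"
#   )
```

Because nothing is written to disk before launch, the instance role only needs read access to the one secret, which matches the stated goal of keeping secrets off the local filesystem.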
Feat/sagemaker llms main updates
Feat/lengthen scaledown for phi
* update for all endpoints
* tweak so all at 5 minutes

* WIP: new experimental version numbers and formatting Makefile
* modifications for a readme
* update readme
* remove github workflow
* modify the way uv install works
* latest
* update path

* 900 seconds uptime for all
* extend alarms for scale down

* 900 seconds uptime for all
* extend alarms for scale down
* correct error
* feat/create new sagemaker vpc and switch sagemaker resources to run inside vpc
* feat: add security group rules for notebook endpoints in new sagemaker vpc
* fix: add new routes for sagemaker vpc to enable access to endpoints from Theia
* fix: add new security group rules to open up access to sagemaker endpoints from Theia
* fix: add new routes for sagemaker vpc to enable access to endpoints from Theia
* fix: adjust subnets and security groups to reflect new sagemaker vpc
* fix: remove duplicate lifecycle policies
* fix: adjust changes to test one model in new sagemaker vpc
* fix: move domain back into notebooks vpc to avoid unnecessary changes
* fix: modifications to security groups
* fix: removing sagemaker endpoints in main
* fix: modifications to VPC settings to address ongoing endpoint issues
* fix: add route 53 private DNS record to enable sagemaker endpoint to be called from Theia
* fix: remove unnecessary peering
* fix: add SageMaker API DNS record
* fix: move VPC endpoints to notebooks VPC
* fix: adjust subnets/security groups to switch models to Sagemaker vpc
* fix: adjustments to get SNS endpoint working
* fix: address SNS notification issue
* fix: sns endpoint
* fix: add route table association for S3 endpoint
* fix: move all endpoints to new sagemaker vpc
* fix: enable connection from SageMaker s3 endpoint to Notebooks bucket
* fix: add security group rules to enable access to s3 endpoint
* fix: modifications to security group naming
* fix: remove unnecessary route53 resources
* fix: reinstate alarms on gpt neo 125m
* fix: resolve merge conflict - remove falcon
No description provided.