Changes from previous PR on Running automated tests via CICD #67
The previous PR commit history got messed up due to rebasing to handle the commits that had missing signoffs. Created a new PR to bring over the final changes from there. PR link: EVerest#62 Signed-off-by: Mahadik, Mukul Chandrakant <[email protected]>
What if matrix job B depends on matrix job A, but A hasn't finished building its image yet? Then B might use the older image it had pulled instead of the locally built, updated image.
In this case, the tests would be run on an older image and could falsely pass.
I don't understand this. The tests cannot run on an older image because every image has a new tag, and you specify the new tag while testing the image. The whole point of having unique tags is to ensure that every image is unique and the behavior of the image is reproducible. You need to ensure that the new image tag is actually passed to docker compose properly.
So, the matrix job for the mqtt service uses the image layers pulled from the cache for the manager image (they were pulled in the first place because the manager image is used as part of the docker compose setup), and this pulled, cached image did not have the test script file.
I am not sure what you mean by this. Since the manager image with the new tag has not been built, it will not exist to pull.
Do you mean to say that the existing workflow yaml file produces a new tag for each image? There is a step in the
Or should I be manually updating the TAG in the .env file?
Right now, the
So, as seen in this workflow run, it starts pulling the matching tagged image from GHCR if that image isn't available locally.
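For context, here is a minimal sketch of how a TAG defined in .env usually reaches the images. Docker Compose reads a .env file in the project directory and substitutes ${TAG} in the compose file; the service names and the ghcr.io/everest/everest-demo/... image paths below are assumptions for illustration, not copied from this repository's compose file.

```yaml
# Hypothetical docker-compose.yml excerpt.
# Compose substitutes ${TAG} from a .env file in the project directory
# (a line of the form TAG=<version>).
services:
  mqtt-server:
    image: ghcr.io/everest/everest-demo/mqtt-server:${TAG}
  manager:
    image: ghcr.io/everest/everest-demo/manager:${TAG}
    depends_on:
      - mqtt-server
```

With the default pull policy, `docker compose up` pulls an image only when no image with that exact reference exists in the local Docker daemon, which matches the fallback to GHCR seen in the run above.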
I looked at the commit history specifically for the
So, looking at the commit history, I thought that perhaps whenever there were changes to the Dockerfiles, or to any contents of the service directory (e.g. This can be seen in this commit: 33767cc But that is not always the case; there are commits where the TAG in
This. We build on each commit, but we only push when there is a new release, aka the
I did a test workflow run by updating the TAG in
Both runs resorted to pulling the image with the TAG but did not find it.
Error in workflow run:
Error in workflow run:
In this run, the
I found some discussions on tags; going through them, I found that they touch on some of the points I've mentioned before:
It seems like there's a fairly trivial fix for this - have you tried changing the order of the jobs in the matrixed workflow so that the manager is built last?
I don't think this would work: the documentation states that the order of the matrix entries determines the order in which jobs are created, not the order in which they finish, and the jobs still run in parallel. I did test this out, and the workflow still failed as expected, with the same results as above.
I also tried setting the concurrency to 1 using
I also tried deleting the cached data in my forked repo, hoping that the workflow would then be able to detect the freshly cached data.
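The exact concurrency setting tried above is not shown. As an illustration only, `strategy.max-parallel` is one way to limit a matrix to one job at a time; the sketch below reuses the `image_name` matrix key from cicd.yaml, while the values and the rest of the job are assumed. It also shows why serializing alone cannot help: on GitHub-hosted runners each matrix job gets a fresh runner, so an image built or loaded in one job's Docker daemon is never visible to another.

```yaml
# Hypothetical excerpt. Serializing the matrix does not make one job's
# images visible to another: each matrix job runs on a fresh runner.
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      max-parallel: 1                       # at most one matrix job runs at a time
      matrix:
        image_name: [manager, mqtt-server]  # order affects job creation only
    steps:
      - uses: actions/checkout@v4
      # Anything built or loaded here lives only in this job's Docker daemon
      # and is discarded when the job ends.
```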
What I’m trying now:
1. Separating the Build -> Test -> Push steps into separate jobs. I had hoped that the images tagged with the new TAG that I manually updated in .env would be found by the automated tests job. The workflow run failed. My assumption is that docker compose is unable to pull images from the cache.
Confirmed: It’s docker compose that’s unable to use the latest built images from the GitHub Actions cache!
To confirm this, I’m going to continue on error or skip the automated tests job for now, and see if the last Push job runs and is able to pull images with the latest tag from the cache. This will also confirm whether the 1st job uses the cached layers from previous workflow runs or not.
Yes, the Build and the Push jobs worked without the automated tests job included. Additionally, the initial Build job was able to use cached image layers from the previous workflow run.
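A minimal sketch of the Build -> Test -> Push split described above, assuming three jobs named build, test and push (hypothetical names). `needs` orders the jobs, job-level `continue-on-error` keeps a failing test job from failing the whole run, and the `always()` condition lets push run even when the tests fail, as long as build succeeded. This only fixes ordering: because each job runs on a fresh runner, the test job still has to obtain the images somehow, which is exactly the failure observed above.

```yaml
# Hypothetical job layout for Build -> Test -> Push.
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # ... build images and export their layers to the GitHub Actions cache ...

  test:
    needs: build                  # waits for the build job to succeed
    runs-on: ubuntu-latest
    continue-on-error: true       # a test failure does not fail the workflow run
    steps:
      - uses: actions/checkout@v4
      # ... docker compose up for the automated tests ...

  push:
    needs: [build, test]
    # Run even if the test job failed, but only if the build job succeeded.
    if: ${{ always() && needs.build.result == 'success' }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # ... rebuild from the cache and push to GHCR ...
```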
Finally got docker compose to work without making any changes to the docker compose file. Here's the workflow run.
Problem
The problem, as identified in the output of the 2nd workflow run in this comment above, was that the mqtt image was not being found with the latest tag. This was surprising, since the job before it had just built the images and cached the image layers to the GitHub Actions cache. So why was docker compose still pulling images from GitHub Container Registry?
GitHub Actions cache vs GitHub Container Registry
The answer lies in the two GitHub locations just mentioned: the GitHub Actions cache and GitHub Container Registry. The build jobs import from and export to the GitHub Actions cache, which is 'local' storage, local in the sense that it is specific to the user's or organization's repository (documentation). In our case, the cache does not hold complete images; instead, image layers are cached as blob objects. GitHub Container Registry is a different container service altogether that stores the final images. This is where docker compose pulls images from by default when it does not find them locally.
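One way to confirm this diagnosis directly is a step that fails fast when a tagged image is missing from the runner's local Docker daemon, rather than letting docker compose silently fall back to GHCR. This is a hypothetical sketch; it assumes .env sits in the repository root, contains only simple KEY=VALUE lines including TAG, and that the images are named ghcr.io/everest/everest-demo/<service>:<TAG>.

```yaml
# Hypothetical diagnostic step, run just before docker compose up.
- name: Check images are available locally
  shell: bash
  run: |
    set -euo pipefail
    # Export every KEY=VALUE entry from .env (including TAG) into the shell.
    set -a
    source .env
    set +a
    for name in manager mqtt-server; do
      image="ghcr.io/everest/everest-demo/${name}:${TAG}"
      if docker image inspect "$image" > /dev/null 2>&1; then
        echo "Found locally: $image"
      else
        echo "Missing locally (compose would try to pull it): $image"
        exit 1
      fi
    done
```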
Solution: How did I get the image to be fetched from the GitHub Actions cache?
For the image with the latest tag to be available in the current job's runner instance, I re-used the docker build push action to load both the mqtt-server and manager service images that were built in the previous build job:
This looks nearly identical to the way we build images in the 1st job with the docker build push action. But a key difference is that the earlier job was actually a set of matrix jobs. So, for each individual service This explains why, in the mqtt server matrix job, the manager image wasn't found and vice versa, as observed in this comment.
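The reload step itself is not reproduced above. A minimal sketch of what such a step might look like, assuming a hypothetical ./mqtt-server build context and a TAG value already exported to the job environment (for example through $GITHUB_ENV): `load: true` puts the result into the local image list that docker compose sees, and `cache-from` reuses the layers that the mqtt-server matrix job exported under the matching scope. The gha cache backend needs a Buildx builder, such as the one created by docker/setup-buildx-action.

```yaml
# Hypothetical step for the manager matrix job.
- name: Load mqtt-server image from the GitHub Actions cache
  if: ${{ matrix.image_name == 'manager' }}
  uses: docker/build-push-action@v6
  with:
    context: ./mqtt-server            # assumed build context
    load: true                        # make the image visible to `docker images` / compose
    push: false
    tags: ghcr.io/everest/everest-demo/mqtt-server:${{ env.TAG }}
    # Same scope that the mqtt-server matrix job used in cache-to.
    cache-from: type=gha,scope=mqtt-server
```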
I got this working by separating the Now that we know this, we can also get this to work in a single job by using conditional statements to run steps only for certain matrix jobs (mostly manager, for the automated tests for now). Doing this now.
Note that your issue is that the matrixed workflow creates multiple jobs, one for each input.
I did take a look at initially and mentioned it here as an approach. But having narrowed the core issue down, I now understand I would only need the So, I will finalize on that approach now.
…x job Found an issue where the latest tagged image was not being fetched by the docker compose command. The reason was that each matrix job only loaded its respective image from the cache, but the docker compose command for running automated tests uses both the manager and mqtt server images. To solve this, I initially used the docker build push action to reload the mqtt image in the matrix job for manager, but finally went with the approach of using artifacts to share the mqtt image between jobs - uploaded in the mqtt matrix job, downloaded in the manager matrix job. Signed-off-by: Mahadik, Mukul Chandrakant <[email protected]>
The latest commit now uses artifacts to share the mqtt image and make it accessible in the matrix job for manager. This is based on the documentation's example for sharing images between jobs. Four new steps were introduced:
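The four steps are not listed here. A minimal sketch of what they might look like, modeled on Docker's documented example for sharing a built image between jobs: the `if` conditions, `id: save-mqtt-image` and `shell: bash` mirror the cicd.yaml excerpt in the review below, while the image name, artifact name and /tmp paths are assumptions, and TAG is assumed to be exported to the job environment. One caveat: actions/download-artifact only finds artifacts already uploaded in the current run, and matrix jobs run in parallel, which is why Docker's own example places the consumer in a separate job with `needs:` to guarantee ordering.

```yaml
# Hypothetical sketch of the four artifact steps inside the matrix job.
- name: Save mqtt-server image to a tarball
  if: ${{ matrix.image_name == 'mqtt-server' }}
  id: save-mqtt-image
  shell: bash
  run: |
    docker save --output /tmp/mqtt-server.tar \
      "ghcr.io/everest/everest-demo/mqtt-server:${TAG}"

- name: Upload mqtt-server image artifact
  if: ${{ matrix.image_name == 'mqtt-server' }}
  uses: actions/upload-artifact@v4
  with:
    name: mqtt-server-image
    path: /tmp/mqtt-server.tar

- name: Download mqtt-server image artifact
  if: ${{ matrix.image_name == 'manager' }}
  uses: actions/download-artifact@v4
  with:
    name: mqtt-server-image
    path: /tmp

- name: Load mqtt-server image into the local Docker daemon
  if: ${{ matrix.image_name == 'manager' }}
  shell: bash
  run: |
    docker load --input /tmp/mqtt-server.tar
    docker images
```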
@MukuFlash03 I am fine with this PR with the changes below. Can you please apply them? I will then merge. We can move on to polishing this, notably thinking about whether we actually need the MQTT server to be built every time.
.github/workflows/cicd.yaml
cache-from: type=gha,scope=${{ matrix.image_name }}
cache-to: type=gha,mode=max,scope=${{ matrix.image_name }}
# Following fours steps are specifically for running automated steps which includes loading the
# Following fours steps are specifically for running automated steps which includes loading the
# Following four steps are specifically for running automated steps which includes loading the
Changed.
if: ${{ matrix.image_name == 'mqtt-server' }}
id: save-mqtt-image
shell: bash
run: |
Do we not have to save the node-red container as well? I guess since this is an automated test, maybe not. Can you please add a comment here to clarify?
Added comment.
.github/workflows/cicd.yaml
run: |
docker images
echo "Running docker compose up..."
docker compose --project-name everest-ac-demo \
docker compose --project-name everest-ac-demo \
docker compose --project-name everest-ac-automated-testing \
Modified.
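For reference, a minimal sketch of the test step after this rename. Compose prefixes container and network names with the project name, so everest-ac-automated-testing keeps the test run's containers and networks separate from a regular everest-ac-demo run on the same host. The compose file name below is hypothetical; `--exit-code-from manager` (which implies `--abort-on-container-exit`) makes the step's exit status follow the manager container, assuming that is where the tests run.

```yaml
# Hypothetical test step in the manager matrix job.
- name: Run automated tests
  if: ${{ matrix.image_name == 'manager' }}
  shell: bash
  run: |
    docker images
    echo "Running docker compose up..."
    docker compose --project-name everest-ac-automated-testing \
      --file docker-compose.automated-tests.yml \
      up --abort-on-container-exit --exit-code-from manager
```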
Signed-off-by: Mahadik, Mukul Chandrakant <[email protected]>
I've committed the suggested changes. As for the CI/CD failing, I haven't encountered this error before:
I went ahead and tested it on my forked repo with the following configurations and still it failed. Different testing configurations used:
Found some related public issues / discussions: I suspect the issue is with either:
I also checked out the main branch after syncing my fork with the parent repo and pulling in the latest changes from the remote origin.
Investigated this more to narrow down root cause.
Next, within the RUN command, I commented out some layers, and the next level was the entrypoint.sh script which is present in Now in this install.sh script, I added some log statements:
On running the workflow again, only one echo statement was printed - Workflow run [search for "CI/CD Test: Running" in the GitHub Actions workflow logs]. It's during this command for configuring and setting up the build that the CI/CD fails.
@MukuFlash03 I don't actually see that you have identified the root cause. I understand that the CI/CD fails while configuring and setting up the build, but what is the cause of the failure in the configuration? We can't actually merge this PR while the workflow is broken.
Right, I took a deeper look at the GitHub Actions workflow logs. The erroneous files are present in everest-core/cmake
I built the
CMakeError.log
I am not sure whether any of these are related to the EVerest project or to any code from the repo. Some of the error types seen are: Type 1:
Type 2:
Type 3:
Type 4:
Type 5:
Type 6:
Type 7:
Type 8:
Since I didn't find much info in the log files generated, coming back to the error message:
The EV_CLI name not being found could be the next thing to investigate:
These are the two files mentioned in the error logs: cmake/ev-cli.cmake
Changes to cmake/ev-cli.cmake were last made on June 24th, 2024 - history
Changes to cmake/ev-project-bootstrap.cmake were last made on July 10th, 2024 - history
The last successful CI/CD run for Changes in However, changes in
To investigate further, I changed the source repo in the RUN command to use a different branch in my forked repo for everest-core:
This, however, failed with a message saying that no version spec was found. The I took a look at the The latest TAG 2024.7.1 was released in August 2024.
So I began testing locally first by just building the
As mentioned in this comment, I had added log statements to the
I tested with these
FAILED - 2024.7.0
FAILED - 2024.7.1
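The versions above were tested locally one at a time. As a hypothetical alternative, a matrix could build the manager image against each EVEREST_VERSION in parallel. This assumes the manager Dockerfile declares `ARG EVEREST_VERSION` and lives in ./manager; neither is confirmed by this thread. `fail-fast: false` keeps one failing version from cancelling the others.

```yaml
# Hypothetical matrix for checking which EVerest releases build successfully.
jobs:
  build-manager:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false              # let every version finish even if one fails
      matrix:
        everest_version: ["2024.3.0", "2024.7.0", "2024.7.1"]
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - name: Build manager with EVEREST_VERSION=${{ matrix.everest_version }}
        uses: docker/build-push-action@v6
        with:
          context: ./manager        # assumed build context
          push: false
          build-args: |
            EVEREST_VERSION=${{ matrix.everest_version }}
```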
Seeing that the newer release versions of
1. Testing with a copy of the current main branch of everest-demo - PASSED
First, I made a copy of the current main branch of everest-demo, changed the As noted in the above comment, with the currently defined But, in the workflow run that I triggered with The workflow PASSED!
2. Testing with my forked repo everest-demo - FAILED
Next, I set the The manager docker image was also built successfully. It failed in the import / export docker image step, which is a part of the
No space left? Possibly disk or memory issues in the GitHub Actions runner OS image.
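To tell a disk-space problem apart from a memory problem, a hypothetical pair of diagnostic steps like the following could bracket the image export step. `failure()` makes the second step run only when an earlier step has failed. These steps only report usage; they do not free any space, and the /tmp location of exported tarballs is an assumption.

```yaml
# Hypothetical diagnostic steps around the image export.
- name: Report disk usage before exporting images
  shell: bash
  run: |
    df -h             # free space on the runner's filesystems
    docker system df  # space used by images, containers, volumes and build cache

# ... docker save / upload-artifact steps ...

- name: Report disk usage after a failure
  if: ${{ failure() }}
  shell: bash
  run: |
    df -h
    docker system df
    ls -lh /tmp       # e.g. size of any exported image tarballs
```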
Next immediate goal before implementing anything:
Focus on these fundamental questions; not everything always has to be implemented / executed. Now that I've tried so much stuff out, I should be able to answer these.
Hey @MukuFlash03;
Hardcoded the build-kit alpine image, with EVEREST_VERSION = 2024.3.0, to the SHA256 digest of the version previously tagged latest, which has now been moved to untagged (7494bd6624aee3f882b4f1edbc589879e1d6d0ccc2c58f3f5c87ac1838ccd1de). Found this image by going through the logs of the last successful run (in my forked repo) and looking at the FROM layer in the Docker build step. fa60246 Image link: https://github.com/everest/everest-ci/pkgs/container/build-kit-alpine/161694439 --- Expected: Should pass, since it matches the configuration of the last successful run, with a matching EVEREST_VERSION and SHA256 digest for the build-kit alpine base image. ---- Will add more details in PR comments. ---- Signed-off-by: Mahadik, Mukul Chandrakant <[email protected]>
e09e7d3 to e80a625
@shankari identified the likely cause of the issue where the build is failing for the The problem might be the
Now, the latest version could be compatible with the latest Finding the correct tag was slightly tricky, since there isn’t a way to know which versions had been tagged as “latest”.
A. Going through the list of tags here
The current latest points to a version released on Aug 8th, but my changes were from the third week of June and mid-July. I tried testing out different versions around that time. I also saw that versions have download counts listed here, so maybe the one with the highest downloads would work. Result of workflow run: Failed as well with the same
B. Inspecting the workflow logs
I inspected the last successful workflow runs and looked at the Docker build step, hoping to find more info other than “latest” for the build-kit alpine image.
I did find the specific image finally:
The This image, with the matching SHA256 digest 7494bd6, has a high number of downloads (4727) and was released 9 months ago. Result of workflow run: the run passed successfully.
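Pinning by digest makes the base image immutable: a Dockerfile line of the form FROM <image>@sha256:<digest> keeps resolving to the same image even after `latest` moves. A hypothetical workflow step that checks the pinned digest still resolves is sketched below; it assumes the package is published as ghcr.io/everest/build-kit-alpine, which is inferred from the package URL above and may differ. If the package owner ever deletes this untagged version, the pinned reference stops resolving, and this step would fail before the image build starts.

```yaml
# Hypothetical pre-build check for the pinned base image.
- name: Verify pinned build-kit-alpine digest resolves
  shell: bash
  run: |
    docker buildx imagetools inspect \
      ghcr.io/everest/build-kit-alpine@sha256:7494bd6624aee3f882b4f1edbc589879e1d6d0ccc2c58f3f5c87ac1838ccd1de
```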
The goal was to have automated tests run as a part of the CI/CD pipeline with GitHub Actions.
The commit history and detailed discussion can be seen in the previous PR #62