Load attachments into the search index #3321

Closed
2 tasks
chouinar opened this issue Dec 19, 2024 · 0 comments · Fixed by #3467

chouinar commented Dec 19, 2024

Summary

This work should be behind an environment-variable feature flag that defaults to off, so nothing changes unless the flag is enabled (we can enable it locally).
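A default-off flag check could look like the minimal Python sketch below. The flag name `ENABLE_OPPORTUNITY_ATTACHMENT_PIPELINE` comes from the PR that closed this issue; the function name `attachment_pipeline_enabled` and the set of accepted "truthy" spellings are assumptions, not the project's actual config parsing.

```python
import os
from typing import Mapping, Optional

# Accepted "on" spellings are an assumption; anything else counts as off.
_TRUTHY = {"1", "true", "yes", "on"}


def attachment_pipeline_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only if the flag is explicitly set to a truthy value.

    Unset, empty, or unrecognized values leave the feature disabled (the default).
    """
    env = os.environ if env is None else env
    raw = env.get("ENABLE_OPPORTUNITY_ATTACHMENT_PIPELINE", "")
    return raw.strip().lower() in _TRUTHY
```

Treating every unrecognized value as "off" keeps deployed environments safe if the variable is missing or mistyped; locally, setting `ENABLE_OPPORTUNITY_ATTACHMENT_PIPELINE=true` turns the behavior on.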

After we've set up the attachment pipeline in the prior ticket (#3320), we want to load attachments into the index.

We'll need to do the following:

  • For every attachment, load it from S3.
  • Base64-encode the attachment and add it to an `attachments` list on the opportunity JSON, like so:

```
{
    "opportunity_id": 1,
    "opportunity_title": "my title",
    "summary": {...},
    ... a bunch of other fields not included here for brevity,
    "attachments": [
        {
            "filename": "ipsum.txt",
            "data": "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo="
        },
        {
            "filename": "test.txt",
            "data": "VGhpcyBpcyBhIHRlc3QK"
        }
    ]
}
```
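The two steps above can be sketched in Python as follows. This is a minimal sketch, not the project's implementation: the attachment rows are assumed to carry hypothetical `file_name` and `file_location` keys, and reading from S3 is injected as a `read_bytes` callable so the example runs without AWS access.

```python
import base64
from typing import Any, Callable, Iterable


def encode_attachment(filename: str, raw: bytes) -> dict[str, str]:
    """Build one entry of the `attachments` list from a file's raw bytes."""
    # b64encode returns ASCII bytes; decode to str so the value is JSON-serializable.
    return {"filename": filename, "data": base64.b64encode(raw).decode("ascii")}


def add_attachments(
    opportunity: dict[str, Any],
    attachments: Iterable[dict[str, str]],
    read_bytes: Callable[[str], bytes],
) -> dict[str, Any]:
    """Return a copy of `opportunity` with an `attachments` list added.

    `attachments` rows use hypothetical `file_name` / `file_location` keys;
    `read_bytes` loads one file's contents (e.g. from S3) given its location.
    """
    doc = dict(opportunity)  # shallow copy: the caller's dict is left untouched
    doc["attachments"] = [
        encode_attachment(row["file_name"], read_bytes(row["file_location"]))
        for row in attachments
    ]
    return doc
```

In production, `read_bytes` could wrap boto3's `get_object(Bucket=..., Key=...)["Body"].read()`. An opportunity with no attachments gets an empty list, which keeps the document shape consistent.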

To have the search cluster run a pipeline when a record is uploaded, specify `pipeline="whatever_we_called_the_pipeline"` when calling the `self._client.bulk` method inside our search client (make `pipeline` an optional parameter of our own bulk method).
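A minimal sketch of that optional parameter follows. The class and method names (`SearchClient`, `bulk_upsert`) are hypothetical; `self._client` is assumed to be an opensearch-py `OpenSearch` instance, whose `bulk` accepts `pipeline` as a keyword query parameter. The wrapper only forwards `pipeline` when a name is given, so the default (flag-off) call is unchanged.

```python
import json
from typing import Any, Iterable, Optional


class SearchClient:
    """Hypothetical wrapper around an opensearch-py style client."""

    def __init__(self, client: Any) -> None:
        self._client = client

    def bulk_upsert(
        self,
        index_name: str,
        records: Iterable[dict[str, Any]],
        primary_key_field: str,
        pipeline: Optional[str] = None,
    ) -> Any:
        # The bulk API takes newline-delimited JSON: an action line, then the document.
        lines = []
        for record in records:
            action = {"index": {"_index": index_name, "_id": record[primary_key_field]}}
            lines.append(json.dumps(action))
            lines.append(json.dumps(record))
        body = "\n".join(lines) + "\n"  # the bulk body must end with a newline

        params: dict[str, str] = {}
        if pipeline is not None:
            # Only send `pipeline` when asked, so existing callers behave as before.
            params["pipeline"] = pipeline
        return self._client.bulk(body=body, **params)
```

A caller would pass the pipeline name only when the feature flag is on, e.g. `bulk_upsert(index, records, "opportunity_id", pipeline="whatever_we_called_the_pipeline")`.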

NOTE: We will likely need infra to make the search cluster larger for this. The index will need much more disk (not CPU; possibly more memory), because the attachments total about 55 GB, so the data size will grow from ~1 GB to 55 GB+.
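The "+" in that estimate is worth quantifying for request payloads: standard base64 turns every 3 input bytes into 4 output characters, so the encoded data sent to the cluster is about a third larger than the raw files. This sketch only computes that encoding overhead; whether the encoded `data` field is also kept in the index depends on how the pipeline and index mapping handle it, which is not settled here.

```python
def base64_encoded_size(raw_bytes: int) -> int:
    """Size in bytes of the standard (padded) base64 encoding of `raw_bytes` bytes."""
    # Each 3-byte group becomes 4 characters; a partial final group is padded to 4.
    # Integer arithmetic avoids float rounding for very large sizes.
    return 4 * ((raw_bytes + 2) // 3)
```

For example, `base64_encoded_size(55 * 10**9)` is 73,333,333,336 bytes, roughly 73 GB of encoded attachment data for ~55 GB of files.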

Acceptance criteria

  • Attachments loaded into search for each opportunity
  • Thorough testing
chouinar moved this from Icebox to Todo in Simpler.Grants.gov Product Backlog Dec 19, 2024
babebe self-assigned this Jan 7, 2025
babebe moved this from Todo to In Progress in Simpler.Grants.gov Product Backlog Jan 8, 2025
babebe moved this from In Progress to In Review in Simpler.Grants.gov Product Backlog Jan 14, 2025
babebe added a commit that referenced this issue Jan 16, 2025
## Summary
Fixes #3321

### Time to review: __10 mins__

## Changes proposed
- Add `ENABLE_OPPORTUNITY_ATTACHMENT_PIPELINE` feature flag
- Use the multi-attachment pipeline to bulk-update opportunities into OpenSearch
- Add/update tests: check that attachment data was encoded
- Ensure attachments are indexed properly

---------

Co-authored-by: nava-platform-bot