Discussion: MD-collected data vs community-collected data #68

YoniSchirris · 2020-04-06T17:14:24Z

An issue that popped up since our repo is public, that anyone with the APK can upload data to the DB while putting in fake ground-truth values.

Currently, we have no way of controlling for this.

How can we have a sense of "authentication" without actual authentication?

Ideas:

Create a private APK for the version that we send to MDs
This private APK can write to a separate DB, or...
This private APK can add a flag in one of the JSON files that states that this is a "verified" measurement, meaning that when downloading / processing we can filter bij this flag

MalcolmMielle · 2020-04-06T18:16:54Z

I think it will be important later to be able to separate the data collected from the MD from other data if we end up releasing th ecollection app to more people.
I'm not sure which strategy would be best for this.

haggy · 2020-04-06T20:00:03Z

I've been thinking about this for an hour or so and I have a couple of questions.

How does the access pattern differ between MD and community sourced data? For example Are you going to want to do things like consume the two datasets completely independent of each other (to train independent models, for example)?
What does the onboarding process for clinical professionals look like?

I think the harder question to answer is "How to we restrict app packages based on who's using them?". If we're on-boarding clinical professionals manually, we can just have an obfuscated link for them (a link with ID/Hash values that are hard or impossible to just guess) and a "regular" link for anyone else (testers, random users that stumble on it, etc). We would then only share the obfuscated link with the pros.

The actual layout of the data in S3 is dependent on the answer to question 1 but we have a lot of flexibility. Without knowing the answer to question 1 I see two possible ways to isolate the incoming data streams:

A multi-bucket strategy:

covital-videos-pro-staging/year=<year>/month=<month>/day=<day>/survey-id=<id>/...
covital-videos-patient-staging/year=<year>/month=<month>/day=<day>/survey-id=<id>/...

Or a single bucket strategy:

covital-videos-staging/pro/year=<year>/month=<month>/day=<day>/survey-id=<id>/...
covital-videos-staging/patient/year=<year>/month=<month>/day=<day>/survey-id=<id>/...

I personally think a single bucket strategy is going to reduce complexity and increase flexibility so again without knowing the answer to question 1 I'd suggest that approach.

MalcolmMielle · 2020-04-06T23:49:40Z

We might want to train independent models. I think what we might end up doing, for example, is train on mixed dataset but then run the test only against the "pro" dataset to be sure we get our results on valid data.
We will distribute the apk to the MD by making available through a download link (the apk will be on a google drive).

Can we not simply request a signedURL for PRO and a different signedURL for others? for example in the openapi, we could simply send a bool true/false depending on who is using it and then use the single bucket strategy?

haggy · 2020-04-07T00:01:35Z

Can we not simply request a signedURL for PRO and a different signedURL for others

Yes at the API level that's fine but you still need to know at the app level what flag(s) to send on the API call to let the backend know what kind of user is inputting the data. I can easily add a role property to the API that defines something like clinical and patient but you're going to need an APK per "role" so that the app the MDs install know to pass a role of clinical instead of patient. Does that make sense?

kaeawc · 2020-04-07T01:51:02Z

We can definitely create and distribute multiple APKs with different versions and signatures. That's straightforward.

Another option could be that MDs must authenticate and other users are not required to. From talking with CJ it sounded like we were going with Auth0 to handle authentication - at that point we can ask MDs to contact us for validation before they're approved.

One of the questions that was raised was about API secrets - we can hide credentials, secrets, etc in things like Vault while still being completely open source. That's just one option, and maybe should be a separate discussion from this thread.

haggy · 2020-04-07T01:59:48Z

@kaeawc Authn is part of the "phase 2" (user-facing app) initiatives. There is additional work involved to get the authn flow right between all the apps and backend so it was decided that the data collection app would be "auth-less". Speed to launch (for the collection app) is critical because product is worried that doctors could soon be too overwhelmed to collect data using the app (which is understandable).

I think that having multiple APK's, one specifically for clinical pros and another for everyone else is good enough for the phase 1 collection app. If that sounds good then I can add a new property to the signed-upload API that allows the app to direct the signed uploads appropriately. Here is what Im thinking:

{
  "role": "clinical | patient"
}

Thoughts?

kaeawc · 2020-04-07T02:19:57Z

Ahk, gotcha. In that case the variants idea seems like the best option.

MalcolmMielle · 2020-04-07T07:19:44Z

Yeah the auth version sounds great but the variant is much more straight forward so let's go with that.

If we see a real use in having built that dataset and that the MD are really keen to help maybe we can have a v2 where the MDs have to auth and verify with us so that they can participate in building the dataset. But that would be for later and if the project worked.

Question: is it possible to "kill" the signed url upload for the app later? Let's say we want to collect data only for a certain time using this app, can we make it so it won't upload anymore when we decide we have enough data?

haggy · 2020-04-07T20:30:23Z

@MalcolmMielle we can disable the API(s) in various ways so that's not a problem.

OK so to summarize:

We will build/package multiple APKs and distribute them based on who is using them (one for clinical pros, another for regular users/patients)
The signed upload API will include a new field called role that the app will set depending on which build it is. Valid values are clinical and patient.

Do we all agree on that method?

@YoniSchirris Does this answer your question?

MalcolmMielle · 2020-04-07T20:42:14Z

Yes that sounds good! Although maye the role values should be "clinical" and "community" to disctinguish the two cases ?

haggy · 2020-04-07T21:04:59Z

Sure works for me 😄

YoniSchirris · 2020-04-08T06:48:51Z

Just to restate Malcolm's comment: Patients won't use this app. This is all for the data collection application. However, the data collected can be done by professional MDs, or by community members :)

The above sounds good. Thanks @haggy

MalcolmMielle · 2020-04-10T17:30:53Z

@haggy Is the role in the API already :)? I've pulled the latest version but I may be missing it from the doc.

haggy · 2020-04-10T23:25:25Z

@MalcolmMielle Not yet. Also Im going to change the property name from role to source. It fits its purpose much better. role also could conflict with future authorization functionality.

Property name: source
Values: community | clinical

I'm going to add that in tonight and I'll let you know when it's done.

haggy · 2020-04-10T23:56:52Z

New key schema ^^. Local tests are going well so Im going to get this pushed to the sandbox.

haggy · 2020-04-11T12:15:28Z

FYI @MalcolmMielle Im having some weird issues with the build pipeline. Working through those now.

MalcolmMielle · 2020-04-11T12:43:52Z

Ok, should I hold off updating the app yet? I'll wait until you tell me it's ok to update the upload func :). Thanks for the great job

haggy · 2020-04-11T14:06:27Z

Yea hold off for now. Waiting for GH to fix things on their side. Will post here when released.

I'm assuming the app is built against staging?

MalcolmMielle · 2020-04-11T14:12:47Z

I usually pull master, compile the dart library and push it to this repo to use in the app.

haggy · 2020-04-11T16:22:10Z

@MalcolmMielle GH has fixed the issues and CI is back online. Im testing on Sandbox and will deploy to staging soon.

haggy · 2020-04-11T16:28:04Z

@MalcolmMielle @YoniSchirris Released to staging!
https://guarded-crag-28391.herokuapp.com/api-docs/#/default/batch-signed-upload-req

Please test and let me know if you run into any issues :D

MalcolmMielle · 2020-04-11T16:32:21Z

Will do either tonight or tomorrow! :D

YoniSchirris assigned haggy and MalcolmMielle Apr 6, 2020

haggy mentioned this issue Apr 11, 2020

Add source to signed upload API #70

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Discussion: MD-collected data vs community-collected data #68

Discussion: MD-collected data vs community-collected data #68

YoniSchirris commented Apr 6, 2020

MalcolmMielle commented Apr 6, 2020 •

edited

Loading

haggy commented Apr 6, 2020 •

edited

Loading

MalcolmMielle commented Apr 6, 2020

haggy commented Apr 7, 2020

kaeawc commented Apr 7, 2020

haggy commented Apr 7, 2020

kaeawc commented Apr 7, 2020

MalcolmMielle commented Apr 7, 2020

haggy commented Apr 7, 2020

MalcolmMielle commented Apr 7, 2020

haggy commented Apr 7, 2020

YoniSchirris commented Apr 8, 2020

MalcolmMielle commented Apr 10, 2020

haggy commented Apr 10, 2020

haggy commented Apr 10, 2020

haggy commented Apr 11, 2020

MalcolmMielle commented Apr 11, 2020

haggy commented Apr 11, 2020 •

edited

Loading

MalcolmMielle commented Apr 11, 2020

haggy commented Apr 11, 2020

haggy commented Apr 11, 2020

MalcolmMielle commented Apr 11, 2020

Discussion: MD-collected data vs community-collected data #68

Discussion: MD-collected data vs community-collected data #68

Comments

YoniSchirris commented Apr 6, 2020

MalcolmMielle commented Apr 6, 2020 • edited Loading

haggy commented Apr 6, 2020 • edited Loading

MalcolmMielle commented Apr 6, 2020

haggy commented Apr 7, 2020

kaeawc commented Apr 7, 2020

haggy commented Apr 7, 2020

kaeawc commented Apr 7, 2020

MalcolmMielle commented Apr 7, 2020

haggy commented Apr 7, 2020

MalcolmMielle commented Apr 7, 2020

haggy commented Apr 7, 2020

YoniSchirris commented Apr 8, 2020

MalcolmMielle commented Apr 10, 2020

haggy commented Apr 10, 2020

haggy commented Apr 10, 2020

haggy commented Apr 11, 2020

MalcolmMielle commented Apr 11, 2020

haggy commented Apr 11, 2020 • edited Loading

MalcolmMielle commented Apr 11, 2020

haggy commented Apr 11, 2020

haggy commented Apr 11, 2020

MalcolmMielle commented Apr 11, 2020

MalcolmMielle commented Apr 6, 2020 •

edited

Loading

haggy commented Apr 6, 2020 •

edited

Loading

haggy commented Apr 11, 2020 •

edited

Loading