-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow null values to represent "unknown" or missing data #88
Comments
This will make it very hard to validate a model. E.g. how would the parser validate a deme that has a null start_time? |
Yeah, we'd need to choose the supported fields with care. The main point is that there can be useful semantics we might want to hang on |
I'll note one other thing for future us while I think of it: the Builder API in demes-python is probably not setup to permit useful semantics for |
I've been thinking about this a bit more today, and I agree with Jerome that it's a lot more helpful to be able to place As for which parameters in a model we allow to be unknown, I think we can make a reasonable choice along the lines of allowing Maybe we only allow this in the YAML? Or should we allow, e.g., an epoch population size to be Just to give a concrete example, this would be a valid model for Tracts (cc: @sgravel, do you agree?), which doesn't use population sizes: description: Tracts model with no population sizes set. A, B, and C contribute to admixed deme D.
time_units: generations
demes:
- name: A
epochs:
- start_size: null
end_time: 0
- name: B
epochs:
- start_size: null
end_time: 0
- name: C
epochs:
- start_size: null
end_time: 0
- name: D
ancestors: [A, B]
proportions: [0.8, 0.2]
start_time: 10
epochs:
- start_size: null
end_time: 0
pulses:
- time: 5
source: C
dest: D
proportion: 0.1 |
Yes, that is pretty much how I had coded the model, but with dummy placeholder values rather than null. So I can do without a null in the spec, but of course the explicit "null" is preferable for my purposes if only so that people don't confusedly go about assuming that the dummy variables are good enough for msprime simulations, of fussing over which value they should use. |
I'm definitely -1 on doing this for Demes 1.0. But if you want to do something Demes-like now, without waiting for the spec, you can use the import demes
def load_with_null_sizes(file):
data = demes.load_asdict(file)
demes = data.get("demes")
if demes is None:
raise ValueError("must specify one or more demes")
for deme in demes:
for k in ("start_size", "end_size", "size_function"):
if deme.get(k) is not None:
raise ValueError("start_size, end_size, and size_function must be omitted (or `null`)")
demes["start_size"] = 1
return demes.Graph.fromdict(data) When writing such a model out to a file, you'd have to bypass the demes API again. E.g. import demes
def dump_with_null_sizes(graph, file):
data = graph.asdict_simplified()
for deme in data["demes"]:
for k in ("start_size", "end_size", "size_function"):
if k in deme:
del deme[k]
with open(file) as f:
_dump_yaml_fromdict(data, f)
# This is the inverse of demes.load_asdict(), but it's not part of the public API.
# We could easily make it part of the API if it proves useful.
# https://github.com/popsim-consortium/demes-python/blob/8ec47962acb5d023b169dac0affa13f03122b8f0/demes/load_dump.py#L44
import ruamel.yaml
def _dump_yaml_fromdict(data, fp, multidoc=False) -> None:
"""
Dump data dict to a YAML file-like object.
:param bool multidoc: If True, output the YAML document start line ``---``,
and document end line ``...``, which indicate the beginning and end of
a YAML document respectively. The start indicator is needed when
outputing multiple YAML documents to a single file (or file stream).
The end indicator is not strictly needed, but may be desirable
depending on the underlying communication channel.
"""
with ruamel.yaml.YAML(typ="safe", output=fp) as yaml:
# Output flow style, but only for collections that consist only
# of scalars (i.e. the leaves in the document tree).
yaml.default_flow_style = None
# Don't emit obscure unicode, output "\Uxxxxxxxx" instead.
# Needed for string equality after round-tripping.
yaml.allow_unicode = False
# Keep dict insertion order, thank you very much!
yaml.sort_base_mapping_type_on_output = False
if multidoc:
yaml.explicit_start = True
yaml.explicit_end = True
yaml.dump(data) |
I'd also prefer to kick this one down the road a bit, as it's a change that we can incorporate in 1.1 pretty smoothly, and we really do want to ship the spec and paper ASAP. |
I'm not suggesting that we do this for version 1.0, but here's a counter-proposal for the semantics of null values to #76.
There will be situations in which key parameters are unknown, or unknowable by particular inference methods. Forcing them to specify a value would be artificial, and unhelpful. It would be better if there was an agreed-upon way of stating this, and using "null" values seems like as good a way as any.
This has some desirable properties, as simulators that don't support unknown values (if we add it in to a later version of the spec) SHOULD raise an error when an unexpected type (null) is provided for a value.
The text was updated successfully, but these errors were encountered: