Filling missing time series values from records of the same day #608
Replies: 5 comments 10 replies
-
If you have this input
and run
|
Beta Was this translation helpful? Give feedback.
-
You could export a csv that contains all records with
And then you could run a join between And you can do the same for It's a fast example, it should work. I hope I was able to explain myself. I have renamed |
Beta Was this translation helpful? Give feedback.
-
Idea for an alternative solution: currently The steps would be:
With
and pseudo-code like
or equally with
would give
where as
would give
and
would give
@aborruso @johnkerl What do you think? ps- I understand the --up and --bidirectional might be a killer and need to read all data before doing anything... ? |
Beta Was this translation helpful? Give feedback.
-
@NikosAlexandris @aborruso @derekmahar see also #173 |
Beta Was this translation helpful? Give feedback.
-
@NikosAlexandris this is awesome!!! :D |
Beta Was this translation helpful? Give feedback.
-
From input to output
Input
Wanted output
Questions
This is about a large time series.
NULL
s (either*
or any other string like-
orNA
). When merging multiple files along with theunsparsify
I decided to go on with the default--fill-with
which is the empty string. Is there a replace-it-all method to convert allNA
s to empty strings?NULL
s im large/massige datasets?Details
Input data structure
Following a CSV structure composed by 86 columns
x
andy
coordinatesx
andy
coordinatesdate
,time
, day-of-year (doy
) and a day or night flag (daynight
)(x, y)
-- there are multiple recordsdate
+time
)NULL
s are either empty or represented withNA
Sample CSV
To exemplify, the following
sample.csv
holds 5 or less records for each(x, y)
location (columns1
and2
aftercut
ting). Though, it can be more.Final CSV
A large CSV file
final.csv
(~3.4GB) produced after merging multiple smaller suchsample_??.csv
files like so:(without removing any column) looks like
and
What to do?
Some variables do not expectedly change during the day. Therefore I would like to fill-in missing values of a variable's records for the same day by re-using values (copy or average) from records of other (physically identical) variables observed at any time during the same day.
To exemplify what I think makes sense to do, here is the tail from
sample_??.csv file
with only a few relevant variables and custom names:The above records:
(99, 7)
,(99, 8)
and(99, 9)
time
s:00:00:00
,01:30:00
,10:30:00
,10:55:50
,13:30:00
)e29
,e31
ande32
will never match in a record withle
le
observations over a location are expectedly the same during a dayHow to do?
When at least one
le
obsevation exists for a day (at any time), it should be reused to fill in the rest of the records during the same day, yet at different times. The results would then look like:Applying the same for the
ln
variable as well, the output will look like:Beta Was this translation helpful? Give feedback.
All reactions