forked from ltscomputingllc/faersdbstats
Data Wrangling Notes
wolfderby edited this page Sep 23, 2022

- '04 to '05Q2 files are missing the REPORTER_COUNTRY column
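Below is a minimal sketch for telling the two layouts apart before loading. It makes three assumptions. First, quarterly file names follow the DEMOyyQn.txt pattern seen in the sample lines further down. Second, a two-digit year means 20yy. Third, the function names are hypothetical. Checking the header directly is the more robust test because it does not depend on the file name.

```python
import re

# Hypothetical helper names. Assumes file names like "DEMO05Q2.txt":
# a two-digit year followed by Q and a quarter from 1 to 4.
_DEMO_NAME = re.compile(r"DEMO(\d{2})Q([1-4])", re.IGNORECASE)


def lacks_reporter_country_by_name(filename: str) -> bool:
    """True for the '04 to '05Q2 quarters, which have no REPORTER_COUNTRY column."""
    match = _DEMO_NAME.search(filename)
    if match is None:
        raise ValueError(f"not a DEMOyyQn file name: {filename!r}")
    year = 2000 + int(match.group(1))  # assumption: two-digit years mean 20yy
    quarter = int(match.group(2))
    return (year, quarter) <= (2005, 2)


def lacks_reporter_country_by_header(header: str) -> bool:
    """True if the $-delimited header line has no REPORTER_COUNTRY column."""
    columns = [name.strip().upper() for name in header.rstrip("\r\n").split("$")]
    return "REPORTER_COUNTRY" not in columns


if __name__ == "__main__":
    print(lacks_reporter_country_by_name("DEMO05Q2.txt"))  # True
    print(lacks_reporter_country_by_name("DEMO08Q3.txt"))  # False
```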

- '05Q3 to the current day: the data region of the file has an extra $ delimiter
  - to add a trailing data (not header) column:
    - replace the 21st $ with $$$
    - then replace $$REPORTER_COUNTRY with $REPORTER_COUNTRY (the header already had the trailing $; only the data section lacked it)

- Null characters need to be replaced with spaces (data_5 step)
  - found these lines had issues:
    - 5882361 6757695 I 5882361-X 20080801 20080913 PER US-ABBOTT-08P-163-0465 EXP US-JNJFOC-20080900824 CENTOCOR PHARMACOVIGILANCE F Y 20080915 CN UNITED STATES DEMO08Q3.txt
    - 8129732 8401177 I 8129732-9 20120126 20120206 20120210 EXP JP-CUBIST- E2B0000000182 CUBIST PHARMACEUTICALS, INC. 85 YR M Y 20120210 12 DEMO12Q1.txt
    - 5887630 6762437 I 5887630-5 20080817 20080818 20080918 PER US-BAYER-200831521NA BAYER HEALTHCARE PHARMACEUTICALS INC. F Y 20080918 MD UNITED STATES 08 6633114 35796472
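The two line-level fixes in these notes can be sketched in Python. The first is the '05Q3-onward delimiter repair and the second is the null-character replacement. This is a sketch, not the repository's actual step. It applies both delimiter replacements literally to every line, header and data rows alike, and relies on the header being the only line that contains the text REPORTER_COUNTRY. The latin-1 encoding, the order of the two fixes, and the function names and paths in clean_file are assumptions.

```python
def replace_nth(line: str, old: str, n: int, new: str) -> str:
    """Replace the n-th (1-based) occurrence of `old`; return the line unchanged if there are fewer."""
    pos = -1
    for _ in range(n):
        pos = line.find(old, pos + 1)
        if pos == -1:
            return line
    return line[:pos] + new + line[pos + len(old):]


def fix_delimiters(line: str) -> str:
    """'05Q3 onward: the two replacements from the notes, applied in order."""
    # Step 1: replace the 21st "$" with "$$$".
    line = replace_nth(line, "$", 21, "$$$")
    # Step 2: replace "$$REPORTER_COUNTRY" with "$REPORTER_COUNTRY" (first match only).
    # Only the header contains the column name, so data rows pass through unchanged.
    return line.replace("$$REPORTER_COUNTRY", "$REPORTER_COUNTRY", 1)


def replace_nulls(line: str) -> str:
    """Replace NUL characters with spaces."""
    return line.replace("\x00", " ")


def clean_file(in_path: str, out_path: str, fix_delims: bool) -> None:
    """Hypothetical driver. latin-1 is an assumed encoding; newline="" keeps line endings as-is."""
    with open(in_path, encoding="latin-1", newline="") as src, \
            open(out_path, "w", encoding="latin-1", newline="") as dst:
        for line in src:
            line = replace_nulls(line)
            if fix_delims:
                line = fix_delimiters(line)
            dst.write(line)
```

Passing fix_delims=False for the '04 to '05Q2 files reflects the assumption that the delimiter note applies only from '05Q3 on.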

- Record ISR 5025765 is missing many columns
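Rows like ISR 5025765 can be found by counting $-separated fields. The sketch below flags data rows that have fewer fields than the most common count in the same file. Comparing against that count avoids depending on whether the header carries a trailing $. It assumes the ISR is the first field of each row, and the function name is hypothetical.

```python
from collections import Counter


def find_short_rows(rows: list[str]) -> list[tuple[str, int]]:
    """Return (first field, field count) for rows with fewer $-separated
    fields than the most common count among `rows`.
    Pass data rows only, without the header line."""
    split_rows = [row.rstrip("\r\n").split("$") for row in rows]
    if not split_rows:
        return []
    typical_count, _ = Counter(len(fields) for fields in split_rows).most_common(1)[0]
    return [(fields[0], len(fields)) for fields in split_rows if len(fields) < typical_count]


if __name__ == "__main__":
    sample = ["1$a$b$c\n", "2$a$b$c\n", "5025765$a\n"]
    print(find_short_rows(sample))  # [('5025765', 2)]
```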