Data Wrangling Notes

wolfderby edited this page Sep 23, 2022 · 1 revision

LAERS (04-12Q3)

DEMO (legacy)

  • '04 to '05Q2 is missing REPORTER COUNTRY

  • '05Q3 onward has an extra $ delimiter in the data region of the file

  • To add a trailing column to the data (non-header) rows:

    • replace the 21st $ with $$$
    • then replace $$REPORTER_COUNTRY with $REPORTER_COUNTRY (the header already had a trailing $; only the data section did not)
    • null characters need to be replaced with a space (data_5 step)
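
The null-character cleanup can be sketched roughly as follows. This is a minimal illustration, not the actual data_5 step: the function name `replace_nulls` and the sample row are hypothetical, and it assumes the file contents are handled as raw bytes.

```python
def replace_nulls(raw: bytes) -> bytes:
    """Return a copy of raw with every NUL byte (0x00) replaced by a space."""
    return raw.replace(b"\x00", b" ")


# Hypothetical sample row, not taken from a real DEMO file.
sample = b"1234567$I$\x00$US\r\n"
cleaned = replace_nulls(sample)
```

Working on bytes avoids guessing the file encoding, and a one-for-one swap keeps the row length and the $ delimiter count unchanged.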

Imported yr and qtr as varchar

  • these lines had issues:
    • 5882361 6757695 I 5882361-X 20080801 20080913 PER US-ABBOTT-08P-163-0465 EXP US-JNJFOC-20080900824 CENTOCOR PHARMACOVIGILANCE F Y 20080915 CN UNITED STATES DEMO08Q3.txt
    • 8129732 8401177 I 8129732-9 20120126 20120206 20120210 EXP JP-CUBIST- E2B0000000182 CUBIST PHARMACEUTICALS, INC. 85 YR M Y 20120210 12 DEMO12Q1.txt
    • 5887630 6762437 I 5887630-5 20080817 20080818 20080918 PER US-BAYER-200831521NA BAYER HEALTHCARE PHARMACEUTICALS INC. F Y 20080918 MD UNITED STATES 08 6633114 35796472

06 Q2 DEMO Import File issue

Record ISR 5025765 is missing many columns
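
One way to locate records like this before import is to count $ delimiters per line and flag any line that differs from the most common count. The sketch below is only an illustration: `find_odd_rows` and the sample lines are hypothetical, and it assumes no field contains a literal $.

```python
from collections import Counter


def find_odd_rows(lines, delimiter="$"):
    """Return (expected_count, [(line_no, count), ...]) for lines whose
    delimiter count differs from the most common count in the file."""
    if not lines:
        return 0, []
    counts = [line.count(delimiter) for line in lines]
    # Treat the most frequent delimiter count as the expected one.
    expected = Counter(counts).most_common(1)[0][0]
    odd = [(i, n) for i, n in enumerate(counts, start=1) if n != expected]
    return expected, odd


# Hypothetical lines: a header with a trailing $, and one short record.
sample_lines = [
    "ISR$CASE$I_F_COD$",
    "1$10$I",
    "2$20$F",
    "3$30",
    "4$40$I",
]
expected, odd = find_odd_rows(sample_lines)
```

Here the header is flagged as well, because its trailing $ gives it one more delimiter than the data rows, which mirrors the header/data mismatch noted for the '05Q3 and later files.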

FAERS (12Q4-current day)