Feature: Support ORIV Strategy to Handle Measurement Error in Covariates and Dependent Variables #755

Open
s3alfisc opened this issue Dec 19, 2024 · 3 comments
Labels
feature A new feature for PyFixest

Comments

@s3alfisc
Member

@dsliwka pointed me to this paper by Gillen et al. on handling measurement error via IV: link. An ungated version of the paper can be found here.

To support the method via pyfixest, we would have to define a user-friendly API to run the following stacked instrumental variables regression:

[image: equation of the stacked instrumental variables regression]

where $Y_a$ and $Y_b$, and likewise $X_a$ and $X_b$, are two measurements of the same underlying construct.
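
The equation image does not survive in this text. As a hedged reconstruction, the stacked system in the ORIV approach of Gillen et al. has roughly the following form when both the outcome and the covariate are measured twice. The intercepts $\alpha_k$, the slope $\beta$, the error term $\eta$ and the instrument matrix $Z$ are notation assumed here, and the original image may use different symbols or layout:

$$
\begin{pmatrix} Y_a \\ Y_a \\ Y_b \\ Y_b \end{pmatrix}
=
\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \end{pmatrix}
+ \beta
\begin{pmatrix} X_a \\ X_b \\ X_a \\ X_b \end{pmatrix}
+ \eta,
\qquad
Z =
\begin{pmatrix}
X_b & 0 & 0 & 0 \\
0 & X_a & 0 & 0 \\
0 & 0 & X_b & 0 \\
0 & 0 & 0 & X_a
\end{pmatrix}
$$

In each block the covariate measurement is instrumented by the other measurement of the same construct. Because every subject appears once per block, standard errors would typically be clustered by subject.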

One option would be to define an API as follows:

import pyfixest as pf

def oriv(*fmls, data, vcov=None, **options):
    # process data to get data_stacked and fml_stacked (not yet implemented)
    # then call pf.feols() to support all post-estimation procedures
    fit = pf.feols(fml=fml_stacked, data=data_stacked, vcov=vcov)
    return fit
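
To make the stacking idea concrete, here is a minimal standard-library sketch for the simplest case: one outcome measured without replication and one covariate measured twice. It is not the proposed pyfixest implementation. The function names (`cov`, `ols_slope`, `oriv_slope`) and the simulated data are assumptions made for illustration. The sketch computes the 2SLS slope of the stacked system with block-specific intercepts and block-specific first stages, and compares it with the attenuated OLS slope.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(u, v):
    """Population covariance of two equally long sequences."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def ols_slope(x, y):
    return cov(x, y) / cov(x, x)


def oriv_slope(y, x_a, x_b):
    """2SLS slope of the stacked system (hypothetical helper).

    Block 1 uses x_a as regressor and x_b as instrument; block 2 swaps
    the roles. Each block has its own intercept and first stage.
    """
    num = den = 0.0
    for x, z in [(x_a, x_b), (x_b, x_a)]:
        pi = cov(z, x) / cov(z, z)   # first-stage slope within the block
        num += pi * cov(z, y)        # demeaned fitted values times outcome
        den += pi * pi * cov(z, z)   # squared demeaned fitted values
    return num / den


# Simulated data (assumption): true slope 2, classical measurement error.
random.seed(0)
n = 20_000
x_true = [random.gauss(0, 1) for _ in range(n)]
y = [1 + 2 * x + random.gauss(0, 0.5) for x in x_true]
x_a = [x + random.gauss(0, 1) for x in x_true]
x_b = [x + random.gauss(0, 1) for x in x_true]

beta_ols = ols_slope(x_a, y)          # attenuated towards zero
beta_oriv = oriv_slope(y, x_a, x_b)   # should be close to the true slope
print(f"OLS: {beta_ols:.3f}  ORIV: {beta_oriv:.3f}")
```

In a pyfixest-based version, the same point estimate would come from stacking the data frame and running an IV regression, which also gives access to clustered standard errors and the usual post-estimation tools.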

For more than one covariate with error, we would have to support multiple endogenous variables via GMM estimation, for which we should of course use gmm =) In this case, a larger update of the Feiv class would be required:

  • The FixestFormula class would have to be reworked to support multiple endogenous variables
  • The Feiv.fit() method would have to be adjusted to support GMM estimation.

Overall, supporting multiple endogenous variables is likely not a massive amount of work (?)
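
For reference, the textbook estimators that such an extension would compute can be written as follows. The notation is assumed here: $X$ is the $n \times k$ regressor matrix including all endogenous variables, $Z$ is the $n \times l$ instrument matrix with $l \geq k$, $y$ is the outcome, and $A$ is a positive definite weighting matrix.

$$
\hat\beta_{GMM} = \left( X' Z A Z' X \right)^{-1} X' Z A Z' y,
\qquad
\hat\beta_{2SLS} = \left( X' P_Z X \right)^{-1} X' P_Z y,
\quad
P_Z = Z (Z'Z)^{-1} Z'
$$

Setting $A = (Z'Z)^{-1}$ in the GMM formula gives the 2SLS estimator, so supporting several endogenous variables mostly means letting $X$ carry more than one endogenous column.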

s3alfisc added the feature label (A new feature for PyFixest) on Dec 19, 2024
@dsliwka
Contributor

dsliwka commented Jan 17, 2025

Hi @s3alfisc, thanks for taking this up! I tried a simple implementation using pyfixest, simply porting the Stata code from Gillen et al. (which also only covers the case where one variable is measured with error). This is illustrated in the following notebook: https://github.com/dsliwka/oriv/blob/main/ORIV.ipynb

@s3alfisc
Member Author

Very cool Dirk, thank you! I will take a look later tonight, or on Sunday at the latest. Thanks! =)

@s3alfisc
Member Author

s3alfisc commented Jan 17, 2025

At a first quick glance, this looks really good! Would you open a PR to add the functionality?

I think that we could organize the PR in the following way:

  • we move the oriv function defined in the Python notebook into a standalone folder pyfixest.estimation.oriv and name it _oriv
  • we create a user-facing oriv() function as a wrapper around _oriv in pyfixest.estimation.estimation
  • by adjusting estimation.__init__ and __init__, we make oriv available to users by calling pf.oriv()
  • the notebook is very nice and it would be great to use it as a vignette. To do so, you'd simply have to move it to the docs folder and add it to the "learn more" section in _quarto.yml
  • last, it would be fantastic to have some basic unit tests against Stata. You could use the ccv tests against Stata as an example: basically, the Stata output is hard-coded in the script
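
The hard-coded reference pattern mentioned in the last bullet can be sketched as follows. This is an illustration only. `fit_line` stands in for the estimator under test, and the values in `REFERENCE` come from a toy case with a known answer, not from Stata. In an actual test they would be pasted from the Stata log.

```python
import math

# Stand-in reference values (assumption). A real test would hard-code
# the coefficients printed by Stata for the same data set.
REFERENCE = {"slope": 2.0, "intercept": 1.0}


def fit_line(x, y):
    """Plain OLS fit, standing in for the estimator under test."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def test_matches_reference():
    x = [0.0, 1.0, 2.0, 3.0]
    y = [1.0, 3.0, 5.0, 7.0]  # exactly y = 1 + 2x
    slope, intercept = fit_line(x, y)
    # Compare against the hard-coded numbers with a tight tolerance.
    assert math.isclose(slope, REFERENCE["slope"], rel_tol=1e-8)
    assert math.isclose(intercept, REFERENCE["intercept"], rel_tol=1e-8)


test_matches_reference()
```

Under pytest the final call is unnecessary, since functions named `test_*` are collected automatically.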

Two nice-to-haves would be to update the GitHub README and the changelog.qmd. And of course I'm happy to help out with everything along the way! =)
