the relationship between housing costs and homelessness has important implications for the way that city and county governments respond to increasing populations. though many analyses in the public policy literature have examined inter-community variation in homelessness rates to identify causal mechanisms of homelessness (byrne et al., 2013; lee et al., 2003; fargo et al., 2013), few studies have examined time-varying counts within the same community (mccandless et al., 2016). to examine trends in population counts in the 25 largest u.s. metropolitan areas, we develop a dynamic bayesian hierarchical model for time-varying count data. particular care is given to modeling uncertainty in the count generating and measurement processes, and a critical distinction is made between the counted number of homeless and the true size of the homeless population. for each metro under study, we investigate the relationship between increases in the zillow rent index and increases in the population. sensitivity of inference to potential improvements in the accuracy of point-in-time counts is explored, and evidence is presented that the inferred increase in the rate of homelessness from 2011-2016 depends on prior beliefs about the accuracy of counts. a main finding of the study is that the relationship between homelessness and rental costs is strongest in new york, los angeles, washington, d.c., and seattle.
over the past two decades, a variety of methods have been used to count the homeless in large metropolitan areas. in this paper, we report on an effort to count the homeless in los angeles county, one that employed the sampling of census tracts. a number of complications are discussed, including the need to impute counts to areas of the county not sampled. we conclude that, despite their imperfections, estimated counts provided useful and credible information to the stakeholders involved.
new york city faces the challenge of an ever-increasing homeless population with almost 60,000 people currently living in city shelters. in 2015, approximately 25% of families stayed longer than 9 months in a shelter, and 17% of families with children that exited a shelter returned to the shelter system within 30 days of leaving. this suggests that "long-term" shelter residents and those that re-enter shelters contribute significantly to the rise of the population living in city shelters and indicates systemic challenges to finding adequate permanent housing. women in need (win) is a non-profit agency that provides shelter to almost 10,000 women and children (10% of all homeless families of nyc), and is the largest shelter provider in the city. this paper focuses on our preliminary work with win to understand the factors that affect the rate of readmission of families at win shelters, and to predict the likelihood of re-entry into the shelter system on exit. these insights will enable improved service delivery and operational efficiencies at these shelters. this paper describes our recent efforts to integrate win datasets with city records to create a unified, comprehensive database of the population being served by win shelters. a preliminary classification model is developed to predict the odds of readmission and length of shelter stay based on the demographic and socioeconomic characteristics of the population served by win. this work is intended to form the basis for establishing a network of "smart shelters" through the use of data science and data technologies.
this paper presents healer, a software agent that recommends sequential intervention plans for use by shelters, who organize these interventions to raise awareness about hiv among youth. healer's sequential plans (built using knowledge of social networks of youth) choose intervention participants strategically to maximize influence spread, while reasoning about uncertainties in the network. while previous work presents influence maximizing techniques to choose intervention participants, they do not address three real-world issues: (i) they completely fail to scale up to real-world sizes; (ii) they do not handle deviations in execution of intervention plans; (iii) constructing real-world social networks is an expensive process. healer handles these issues via four major contributions: (i) healer casts this influence maximization problem as a pomdp and solves it using a novel planner which scales up to previously unsolvable real-world sizes; (ii) healer allows shelter officials to modify its recommendations, and updates its future plans in a deviation-tolerant manner; (iii) healer constructs social networks of youth at low cost, using a facebook application. finally, (iv) we show hardness results for the problem that healer solves. healer will be deployed in the real world in early spring 2016 and is currently undergoing testing at a shelter.
in many metropolitan areas efforts are made to count the homeless to ensure proper provision of social services. some areas are very large, which makes spatial sampling a viable alternative to an enumeration of the entire terrain. counts are observed in sampled regions but must be imputed in unvisited areas. along with the imputation process, the costs of underestimating and overestimating may be different. for example, if precise estimation in areas with large counts is critical, then underestimation should be penalized more than overestimation in the loss function. we analyze data from the 2004-2005 los angeles county study using an augmentation of l_1 stochastic gradient boosting that can weight overestimates and underestimates asymmetrically. we discuss our choice to utilize stochastic gradient boosting over other function estimation procedures. in-sample fitted and out-of-sample imputed values, as well as relationships between the response and predictors, are analyzed for various cost functions. practical usage and policy implications of these results are discussed briefly.
objective. to pilot test an artificial intelligence (ai) algorithm that selects peer change agents (pca) to disseminate hiv testing messaging in a population of youth. methods. we recruited and assessed 62 youth at baseline, 1 month (n = 48), and 3 months (n = 38). a facebook app collected preliminary social network data. eleven pcas selected by ai attended a 1-day training and 7 weekly booster sessions. mixed-effects models with random effects were used to assess change over time. results. significant change over time was observed in past 6-month hiv testing (57.9%, 82.4%, 76.3%; p < .05) but not condom use (63.9%, 65.7%, 65.8%). most youth reported speaking to a pca about hiv prevention (72.0% at 1 month, 61.5% at 3 months). conclusions. ai is a promising avenue for implementing pca models for youth. increasing rates of regular hiv testing is critical to hiv prevention and linking youth to treatment.
the sampling frame in most social science surveys excludes members of certain groups, known as hard-to-reach groups. these groups, or subpopulations, may be difficult to access (the homeless, e.g.), camouflaged by stigma (individuals with hiv/aids), or both (commercial sex workers). even basic demographic information about these groups is typically unknown, especially in many developing nations. we present statistical models which leverage social network structure to estimate demographic characteristics of these subpopulations using aggregated relational data (ard), or questions of the form "how many x's do you know?" unlike other network-based techniques for reaching these groups, ard require no special sampling strategy and are easily incorporated into standard surveys. ard also do not require respondents to reveal their own group membership. we propose a bayesian hierarchical model for estimating the demographic characteristics of hard-to-reach groups, or latent demographic profiles, using ard. we propose two estimation techniques. first, we propose a markov-chain monte carlo algorithm for existing data or cases where the full posterior distribution is of interest. for cases when new data can be collected, we propose guidelines and, based on these guidelines, propose a simple estimate motivated by a missing data approach. using data from mccarty et al. [human organization 60 (2001) 28-39], we estimate the age and gender profiles of six hard-to-reach groups, such as individuals who have hiv, women who were raped, and homeless persons. we also evaluate our simple estimates using simulation studies.
social and behavioral interventions are a critical tool for governments and communities to tackle deep-rooted societal challenges such as homelessness, disease, and poverty. however, real-world interventions are almost always plagued by limited resources and limited data, which creates a computational challenge: how can we use algorithmic techniques to enhance the targeting and delivery of social and behavioral interventions? the goal of my thesis is to provide a unified study of such questions, collectively considered under the name "algorithmic social intervention". this proposal introduces algorithmic social intervention as a distinct area with characteristic technical challenges, presents my published research in the context of these challenges, and outlines open problems for future work. a common technical theme is decision making under uncertainty: how can we find actions which will impact a social system in desirable ways under limitations of knowledge and resources? the primary application area for my work thus far is public health, e.g. hiv or tuberculosis prevention. for instance, i have developed a series of algorithms which optimize social network interventions for hiv prevention. two of these algorithms have been pilot-tested in collaboration with la-area service providers for youth, with preliminary results showing substantial improvement over status-quo approaches. my work also spans other topics in infectious disease prevention and underlying algorithmic questions in robust and risk-aware submodular optimization.
respondent-driven sampling (rds) is a chain-referral method for sampling members of a hidden or hard-to-reach population such as sex workers, homeless people, or drug users via their social network. most methodological work on rds has focused on inference of population means under the assumption that subjects' network degree determines their probability of being sampled. criticism of existing estimators is usually focused on missing data: the underlying network is only partially observed, so it is difficult to determine correct sampling probabilities. in this paper, we show that data collected in ordinary rds studies contain information about the structure of the respondents' social network. we construct a continuous-time model of rds recruitment that incorporates the time series of recruitment events, the pattern of coupon use, and the network degrees of sampled subjects. together, the observed data and the recruitment model place a well-defined probability distribution on the recruitment-induced subgraph of respondents. we show that this distribution can be interpreted as an exponential random graph model and develop a computationally efficient method for estimating the hidden graph. we validate the method using simulated data and apply the technique to an rds study of injection drug users in st. petersburg, russia.
statistics are bleak for youth aging out of the united states foster care system. they are often left with few resources, are likely to experience homelessness, and are at increased risk of incarceration and exploitation. the think of us platform is a service for foster youth and their advocates to create personalized goals and access curated content specific to aging out of the foster care system. in this paper, we propose the use of a machine learning algorithm within the think of us platform to better serve youth transitioning to life outside of foster care. the algorithm collects and collates publicly available figures and data to inform caseworkers and other mentors chosen by the youth on how to best assist foster youth. it can then provide valuable resources for the youth and their advocates targeted directly towards their specific needs. finally, we examine machine learning as a support system and aid for caseworkers to buttress and protect vulnerable young adults during their transition to adulthood.
in organizational and commercial settings, people often have clear roles and workflows against which functional and non-functional requirements can be extracted. however, in more social settings, such as platforms for enhancing social interaction, successful applications are driven more by using emotional engagement than functionality, and the drivers of user engagement are difficult to identify. a key challenge is to understand people's emotional goals so that they can be incorporated into the design. this paper proposes a novel framework called the emotional attachment framework, which is based on existing models and theories of emotional attachment. its aim is to facilitate the process of getting a deeper insight into emotional goals in software engineering. to demonstrate the framework in use, emotional goals are elicited for a software application that aims to provide help for people. to measure the effectiveness and efficiency of the proposed technique in this study, a series of evaluations are undertaken: a semi-controlled experiment, a comparison analysis, and domain expert and end-user evaluation. the results indicate that the emotional attachment framework has the potential to give better insight during analysis of emotional goals.
this research model uses an emancipatory approach to address challenges of equity in the science, technology, engineering, and math (stem) workforce. serious concerns about low minority participation call for a rigorous evaluation of new pedagogical methods that effectively prepare underrepresented groups for the increasingly digital world. the inability to achieve stem workforce diversity goals is attributed to the failure of the academic pipeline to maintain a steady flow of underrepresented minority students. formal curriculum frequently results in under-preparedness and a professional practices gap. exacerbating lower performance are fragile communities where issues such as poverty, single-parent homes, incarceration, abuse, and homelessness disengage residents. since data shows that more minorities have computing and engineering degrees than work in the field, this discussion explores how educational institutions can critically examine social and political realities that impede stem diversity while capturing cultural cues that identify personal barriers amongst underrepresented groups.
félix-medina and thompson (2004) proposed a variant of link-tracing sampling to estimate the size of a hidden population such as drug users, sexual workers or homeless people. in their variant a sampling frame of sites where the members of the population tend to gather is constructed. the frame is not assumed to cover the whole population, but only a portion of it. a simple random sample of sites is selected; the people in the sampled sites are identified and are asked to name other members of the population, who are added to the sample. those authors proposed maximum likelihood estimators of the population size derived from a multinomial model for the numbers of people found in the sampled sites and a model that assumes that the probability that a person is named by any element in a particular sampled site (link-probability) does not depend on the named person, that is, that the probabilities are homogeneous. later, félix-medina et al. (2015) proposed unconditional and conditional maximum likelihood estimators of the population size derived from a model that takes into account the heterogeneity of the link-probabilities. in this work we consider this sampling design and set conditions for a general model for the link-probabilities that guarantee the consistency and asymptotic normality of the estimators of the population size and of the estimators of the parameters of the model for the link-probabilities. in particular we show that both the unconditional and conditional maximum likelihood estimators of the population size are consistent and have asymptotic normal distributions which are different from each other.
we propose a general semi-supervised inference framework focused on the estimation of the population mean. as usual in semi-supervised settings, there exists an unlabeled sample of covariate vectors and a labeled sample consisting of covariate vectors along with real-valued responses ("labels"). otherwise, the formulation is "assumption-lean" in that no major conditions are imposed on the statistical or functional form of the data. we consider both the ideal semi-supervised setting where infinitely many unlabeled samples are available, as well as the ordinary semi-supervised setting in which only a finite number of unlabeled samples is available. estimators are proposed along with corresponding confidence intervals for the population mean. theoretical analyses of both the asymptotic distribution and \ell_2-risk for the proposed procedures are given. surprisingly, the proposed estimators, based on a simple form of the least squares method, outperform the ordinary sample mean. the simple, transparent form of the estimator lends confidence to the perception that its asymptotic improvement over the ordinary sample mean also nearly holds even for moderate size samples. the method is further extended to a nonparametric setting, in which the oracle rate can be achieved asymptotically. the proposed estimators are further illustrated by simulation studies and a real data example involving estimation of the homeless population.
we study a regression problem where for some part of the data we observe both the label variable (y) and the predictors ({\bf x}), while for the other part of the data only the predictors are given. such a problem arises, for example, when observations of the label variable are costly and may require a skilled human agent. if the conditional expectation e[y | {\bf x}] is exactly linear in {\bf x} then typically the additional observations of the {\bf x}'s do not contain useful information, but otherwise the unlabeled data can be informative. in this case, our aim is to construct the best linear predictor. we suggest improved alternative estimates to the naive standard procedures that depend only on the labeled data. our estimation method can be easily implemented and has simply described asymptotic properties. the new estimates asymptotically dominate the usual standard procedures under a certain non-linearity condition on e[y | {\bf x}]; otherwise, they are asymptotically equivalent. the performance of the new estimator for small sample size is investigated in an extensive simulation study. a real data example of inferring the homeless population is used to illustrate the new methodology.
while reigning models of diffusion have privileged the structure of a given social network as the key to informational exchange, real human interactions do not appear to take place on a single graph of connections. using data collected from a pilot study of the spread of hiv awareness in social networks of youth, we show that health information did not diffuse in the field according to the processes outlined by dominant models. since physical network diffusion scenarios often diverge from their more well-studied counterparts on digital networks, we propose an alternative activation jump model (ajm) that describes information diffusion on physical networks from a multi-agent team perspective. our model exhibits two main differentiating features from leading cascade and threshold models of influence spread: 1) the structural composition of a seed set team impacts each individual node's influencing behavior, and 2) an influencing node may spread information to non-neighbors. we show that the ajm significantly outperforms existing models in its fit to the observed node-level influence data on the youth networks. we then prove theoretical results, showing that the ajm exhibits many well-behaved properties shared by dominant models. our results suggest that the ajm presents a flexible and more accurate model of network diffusion that may better inform influence maximization in the field.