Here I document my proposed project for the Data Incubator program. The plots below were generated using the notebook here.
The number of newly founded establishments has been shown to correlate somewhat with the trends in the market indices.
Fig 1. Rate of new business creation by year

The figure above shows that the number of young companies takes a noticeable downturn before and during both the 'Dot-com crash' in 2001 and the recession in late 2007. However, the number of new businesses appears to keep declining even after the economy begins to recover.
A potentially better way to predict the recovery of the market after a crash is to use the rate of trademark filings. This has been discussed in the past (see this 2004 article in The New York Times). The rationale is as follows:
- As the economy worsens, the expectation that a new business will be successful is much lower. Thus:
  - Investors invest in fewer new businesses
  - Individuals don't start new businesses and the number of trademarks filed declines
- As the economy improves, entrepreneurs and investors are more optimistic about starting new businesses
  - Trademark filings increase in anticipation of the creation of said businesses (i.e. trademarks precede new businesses)
In the following figures, the rate of trademark filing in the U.S. (yellow dashed) appears to correlate more strongly with market prices than the rate of new businesses does (blue dashed). For comparison, the number of private sector jobs (green dashed) is also shown, as it is typically used to measure the health of the economy.
Fig 2. Normalized filing rates, market prices, and newly established businesses between 1994-2015 (data from: data/preliminary_data.csv)

Fig 3. Correlation plots between various markets (x-axis) and test measures (y-axis)

Both 'trademark filings' and 'private sector jobs' correlate strongly with the market. This suggests that the rate of trademark filings could be an indicator of optimism/pessimism about the markets and thus predictive of market changes. However, monitoring trademarks has two distinct advantages over monitoring job numbers:
- Trademark data is released on a daily basis, while job numbers are only available every month.
- Trademark numbers are more robust, whereas job numbers are often revised in subsequent months.
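The comparison behind Figs 2 and 3 can be sketched in a few lines. This is a minimal illustration, not the notebook's actual code: it assumes min-max scaling for the "normalized" series (the normalization used in the figures is not stated here) and uses made-up numbers rather than values from data/preliminary_data.csv. The function names are hypothetical, and only the standard library is used.

```python
from math import sqrt


def min_max_normalize(values):
    """Rescale a sequence to the range [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant series")
    return [(v - lo) / (hi - lo) for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Made-up illustrative values, NOT taken from the project's data.
market_index = [100.0, 110.0, 90.0, 95.0, 120.0]
trademark_filings = [2000, 2300, 1700, 1900, 2600]

r = pearson(min_max_normalize(market_index), min_max_normalize(trademark_filings))
print(f"Pearson r = {r:.3f}")
```

Pearson correlation is unchanged by linear rescaling, so the normalization step matters for plotting the series on a shared axis rather than for the correlation value itself.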
What's more, trademarks are typically assigned one or more classifications based on their intended uses. This allows a more fine-grained approach to making industry-specific predictions. For instance, the following plot shows that the construction industry appears to have peaked more strongly before, and recovered slightly more slowly after, the recession in the late 2000s than other industries did. This likely reflects the housing bubble and its subsequent burst around this time.
Fig 4. Normalized count of trademarks filed by industry, data available since 2004

Additionally, the agricultural industry has been slowing somewhat in recent years, potentially reflecting the recent trade tensions between the U.S. and China.
This has practical applications such as:
- identifying emerging industry changes (e.g. new technologies)
- predicting economic downturns/upturns by industry
- identifying saturation in a given industry (e.g. by measuring the rate of rejected trademarks)
The goal of this project is to see if we can use the rate at which trademarks are filed as a market indicator.
Initial data pipeline
- Daily data downloader
- Read/parse the data into a database containing only the useful components
- Define and extract informative metrics based on the data
- Filing rate by industry
- Weighting by industry sub-classes
Determine the best approach to modeling industry changes
- Medium difficulty: Simple ML model
Dashboard for viewing predictions
- Change in the above metrics over time
- Present predictions from above modeling
Improvements, should time permit
- Identify potentially connected industries (e.g. shared keywords in descriptions)
- Incorporate trademark rejection rates to identify industry saturation
- Showcase filing by region (city, state, country)
While the market data and the trademark filing rate appear to be correlated, it would be unrealistic to assume that filing data alone can perfectly predict major market fluctuations, let alone micro-fluctuations. Because the filing data is updated only once a day, using it to predict changes on timescales shorter than a day would be unreasonable.
Still, it is expected that some degree of predictability will be achievable using 'trademark filing rate'. This will be assessed by comparing how well the market fluctuations can be predicted under two conditions:
- Baseline: The changes over time in the market prices themselves can predict major market fluctuations.
- Hypothesis: Incorporating data on the rate of related trademark filings can improve the prediction of market changes by industry.
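One way the two conditions above could be compared is sketched below. It is only an illustration under stated assumptions: the model is a plain linear regression predicting the next market value (the actual model choice is still open), the split is chronological, and the data is synthetic. All function names are hypothetical, and the least-squares fit is written with the standard library so the sketch has no dependencies.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; features may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_least_squares(rows, targets):
    """Ordinary least squares with an intercept, via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def holdout_mse(rows, targets, n_train):
    """Fit on the first n_train samples; return mean squared error on the rest."""
    coefs = fit_least_squares(rows[:n_train], targets[:n_train])
    errors = [
        (coefs[0] + sum(c * v for c, v in zip(coefs[1:], row)) - target) ** 2
        for row, target in zip(rows[n_train:], targets[n_train:])
    ]
    return sum(errors) / len(errors)


def compare_models(market, filings, n_train):
    """Return (baseline_mse, hypothesis_mse) for one-step-ahead prediction."""
    targets = market[1:]
    baseline_rows = [[m] for m in market[:-1]]  # previous market value only
    hypothesis_rows = [[m, f] for m, f in zip(market[:-1], filings[:-1])]
    return (
        holdout_mse(baseline_rows, targets, n_train),
        holdout_mse(hypothesis_rows, targets, n_train),
    )


# Synthetic demo data (NOT real market or USPTO numbers). The filings are
# constructed to drive the next market value, so the hypothesis should win here.
filings = [float((i * 7) % 5 + 1) for i in range(40)]
market = [10.0]
for f in filings[:-1]:
    market.append(0.5 * market[-1] + 2.0 * f)

baseline_mse, hypothesis_mse = compare_models(market, filings, n_train=30)
print(f"baseline MSE:   {baseline_mse:.4f}")
print(f"hypothesis MSE: {hypothesis_mse:.4f}")
```

The hold-out set is taken from the end of the series rather than sampled at random, so the model is never evaluated on dates earlier than the ones it was fitted on.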
It should be noted that the raw data on trademark filings contains information on individuals involved with the filing. Care will be taken to reasonably ensure the personal details of individuals remain anonymous.
Data on trademark filing in the United States is made public by the United States Patent and Trademark Office (USPTO). Specifically, the USPTO provides access to the text, Nice classifications, and other filing data for trademark applications (historical and daily). A breakdown of the XML file structure and the pieces that are likely to be used in this project is shown here.
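As a rough sketch of the parsing step, the snippet below counts filings per filing date and Nice class using Python's standard `xml.etree.ElementTree`. The element names (`case-file`, `filing-date`, `international-code`) and the embedded sample are assumptions made for illustration; they should be checked against the actual USPTO file structure before use. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample document. Tag names and values are assumptions for
# illustration and may not match the real USPTO schema.
SAMPLE_XML = """
<trademark-applications-daily>
  <case-file>
    <serial-number>00000001</serial-number>
    <filing-date>20190701</filing-date>
    <classification><international-code>025</international-code></classification>
  </case-file>
  <case-file>
    <serial-number>00000002</serial-number>
    <filing-date>20190701</filing-date>
    <classification><international-code>025</international-code></classification>
    <classification><international-code>037</international-code></classification>
  </case-file>
  <case-file>
    <serial-number>00000003</serial-number>
    <filing-date>20190702</filing-date>
    <classification><international-code>037</international-code></classification>
  </case-file>
</trademark-applications-daily>
"""


def count_filings_by_class(xml_text):
    """Count filings per (filing date, Nice class) pair."""
    root = ET.fromstring(xml_text.strip())
    counts = Counter()
    for case in root.iter("case-file"):
        filing_date = case.findtext(".//filing-date")
        if filing_date is None:
            continue  # skip records without a filing date
        # A set avoids double counting a class listed twice in one filing.
        classes = {
            el.text.strip() for el in case.iter("international-code") if el.text
        }
        for nice_class in classes:
            counts[(filing_date.strip(), nice_class)] += 1
    return counts


for (filing_date, nice_class), n in sorted(count_filings_by_class(SAMPLE_XML).items()):
    print(filing_date, nice_class, n)
```

Only the filing date and class codes are read, so applicant details never leave the raw file in this sketch.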
Because of the nature of this project, real-time market data is unnecessary. Daily market values should suffice to achieve the end goal. These can be obtained (freely) through the Alpha Vantage API. This API allows accessing open, close, high, and low price data as well as trading volume for a host of market symbols. A demonstration of how I plan to get this data is shown here.
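A minimal sketch of retrieving daily prices is shown below, using only the standard library. It assumes the `TIME_SERIES_DAILY` function and the JSON layout (`"Time Series (Daily)"`, `"1. open"`, and so on) as I understand them from the Alpha Vantage documentation; both should be verified against the current docs. The function names are hypothetical, and parsing is kept separate from the network call so it can be checked offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://www.alphavantage.co/query"


def build_daily_url(symbol, api_key):
    """Build the request URL for daily prices of one symbol."""
    params = {"function": "TIME_SERIES_DAILY", "symbol": symbol, "apikey": api_key}
    return f"{API_URL}?{urlencode(params)}"


def parse_daily_payload(payload):
    """Convert the decoded JSON response into a date-sorted list of dicts."""
    series = payload.get("Time Series (Daily)")  # assumed response key
    if series is None:
        # Error and rate-limit responses use other top-level keys.
        raise ValueError(f"unexpected response keys: {sorted(payload)}")
    rows = []
    for date, fields in sorted(series.items()):
        rows.append(
            {
                "date": date,
                "open": float(fields["1. open"]),
                "high": float(fields["2. high"]),
                "low": float(fields["3. low"]),
                "close": float(fields["4. close"]),
                "volume": int(fields["5. volume"]),
            }
        )
    return rows


def fetch_daily(symbol, api_key):
    """Download and parse daily prices (requires network access and an API key)."""
    with urlopen(build_daily_url(symbol, api_key), timeout=30) as response:
        return parse_daily_payload(json.load(response))
```

Calling `fetch_daily("SPY", api_key)` with a valid key would return one dictionary per trading day, oldest first, ready to be joined with the daily trademark counts by date.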
One particularly interesting period for trademark filing rates occurred in July 2019. During this time there was a rapid increase in the number of filings from foreign sources (particularly from China). This is most likely due to a policy change at the USPTO: starting August 3, 2019, all trademarks filed from abroad would have to be filed through an attorney licensed in the US. Because this would imply a non-trivial increase in filing cost for foreign businesses, a surge of filings appeared before the rule went into effect, presumably from applicants trying to get ahead of it. https://www.uspto.gov/trademark/laws-regulations/trademark-rule-requires-foreign-applicants-and-registrants-have-us