Economic Base Analysis using Python
Planning economist Richard Klosterman once said, "Information about an area's future population is incomplete without a parallel understanding of the local economy that largely shapes its future."
A few basic methods are widely taught and used in planning and economic geography, and economic base analysis is the oldest, simplest, and most widely used technique for regional economic analysis. It rests on the assumption that a region's economy can only grow by exporting goods and services. The local economy is therefore divided into the basic (export-oriented) and non-basic (locally oriented) sectors.
Economic base analysis then calculates the location quotient (LQ), which compares a local economy's employment share in a specific sector with that of a larger reference region (e.g., county vs. state, region vs. state, state vs. nation). For sector i, LQ = (local employment in i / total local employment) ÷ (reference employment in i / total reference employment). When LQ ≤ 1, there is no excess production in the local economy. When LQ > 1, the employment above the reference region's share is attributed to exports and counted as basic-sector employment.
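As a minimal sketch of the arithmetic (with made-up employment numbers, not real CBP data): suppose a county has 500 shipbuilding jobs out of 100,000 total jobs, while the nation has 150,000 shipbuilding jobs out of 150,000,000 total.

```python
# toy numbers (hypothetical, not drawn from any real dataset)
e_i, e = 500, 100_000          # county: shipbuilding jobs, total jobs
E_i, E = 150_000, 150_000_000  # nation: shipbuilding jobs, total jobs

# location quotient: local share of the sector over the national share
lq = (e_i / e) / (E_i / E)
print(round(lq, 6))  # 5.0

# with LQ > 1, employment above the national share is treated as basic
basic = e_i - (E_i / E) * e
print(round(basic, 6))  # 400.0
```

The county's shipbuilding share is five times the national share, so 400 of its 500 shipbuilding jobs are estimated to serve export demand.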
We will now build a small project to automate this calculation and find the top basic sectors (industries) in each county in the United States.
import pandas as pd

def fips_prep():
    """Get the county FIPS codes ready."""
    # a master CSV sheet of every county's FIPS code
    fips_url = 'https://raw.githubusercontent.com/kjhealy/fips-codes/master/county_fips_master.csv'
    # make sure to use the correct encoding to avoid a decode error here
    df = pd.read_csv(fips_url, encoding='ISO-8859-1')
    # pad with leading zeros; for instance, California's state FIPS code is 06
    df['fips'] = df['fips'].apply(lambda x: str(x).zfill(5))
    df = df[['fips', 'county_name', 'state_name']]
    df['full_name'] = df['county_name'] + ', ' + df['state_name']
    return df
Next, get the employment data for all industrial sectors. We have two options from the Census Bureau: the economic census and the County Business Patterns (CBP) data. The economic census provides more detailed and accurate information, but it is conducted only every five years, while CBP data are published annually. We will use the CBP data here.
def eba_data_prep():
    """Prepare the county and national CBP employment tables."""
    # first, get the CBP county data as the target geography
    cbp20url = 'https://www2.census.gov/programs-surveys/cbp/datasets/2020/cbp20co.zip'
    df = pd.read_csv(cbp20url)
    # some data cleaning and relabeling work
    df['fipstate'] = df['fipstate'].apply(lambda x: str(x).zfill(2))
    df['fipscty'] = df['fipscty'].apply(lambda x: str(x).zfill(3))
    df['fips'] = df['fipstate'] + df['fipscty']
    df = df[['fips', 'naics', 'emp']]
    df = df.rename(columns={'emp': 'emp_co'})
    # strip the '-' and '/' padding from the NAICS codes
    df['naics'] = df['naics'].str.replace('-', '')
    df['naics'] = df['naics'].str.replace('/', '')
    # record how many digits each NAICS code has
    df['digit'] = df['naics'].str.len()
    # the all-industry total ('------') is now empty; label it '0'
    df.loc[df['digit'] == 0, 'naics'] = '0'

    # next, get the CBP national data as the reference geography
    cbp20us_url = 'https://www2.census.gov/programs-surveys/cbp/datasets/2020/cbp20us.zip'
    df_us = pd.read_csv(cbp20us_url)
    df_us = df_us.rename(columns={'emp': 'emp_us'})
    # keep only the rows covering all legal forms of organization
    df_us = df_us[df_us['lfo'] == '-']
    df_us = df_us[['naics', 'emp_us']]
    # get the NAICS code descriptions to make industry codes more readable
    naics_desc_url = 'https://www2.census.gov/programs-surveys/cbp/technical-documentation/reference/naics-descriptions/naics2017.txt'
    naics_desc = pd.read_csv(naics_desc_url, sep=',', encoding='cp1252')
    naics_desc.columns = ['naics', 'description']
    # join naics_desc into df_us, using naics as the key
    df_us = df_us.merge(naics_desc, on='naics')
    # strip the '-' and '/' padding from the NAICS codes
    df_us['naics'] = df_us['naics'].str.replace('-', '')
    df_us['naics'] = df_us['naics'].str.replace('/', '')
    # record how many digits each NAICS code has
    df_us['digit'] = df_us['naics'].str.len()
    df_us.loc[df_us['digit'] == 0, 'naics'] = '0'
    return df, df_us
For background on the NAICS code system, see the Census Bureau's NAICS documentation. Finally, let's calculate the location quotient.
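One detail worth remembering before filtering by digit length: NAICS codes nest by prefix, with the first two digits giving the sector and each additional digit refining it. A quick illustration using 336611, Ship Building and Repairing (descriptions per the 2017 NAICS):

```python
# a NAICS code nests inside each of its shorter prefixes
code = '336611'
levels = {
    2: 'Manufacturing (sector range 31-33)',
    3: 'Transportation Equipment Manufacturing',
    4: 'Ship and Boat Building',
    6: 'Ship Building and Repairing',
}
for n, label in levels.items():
    print(f'{code[:n]:<6} {label}')
```

This is why the code below can treat "digit level" as a single parameter: choosing 2, 4, or 6 digits simply picks a coarser or finer slice of the same hierarchy.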
def calc_lq(county_fips, naics_digits, items_to_show):
    """Calculate location quotients for one county at a given NAICS digit level."""
    df, df_us = eba_data_prep()
    # naics_digits may be passed as a string; cast it to an int
    naics_digits = int(naics_digits)
    # keep only the county of interest
    df_co_wip = df[df['fips'] == county_fips]
    # keep only the all-industry total (digit 0) and the requested digit level
    df_co_wip = df_co_wip[(df_co_wip['digit'] == 0) | (df_co_wip['digit'] == naics_digits)]
    df_us_wip = df_us[(df_us['digit'] == 0) | (df_us['digit'] == naics_digits)]
    # drop the county's digit column so the merge keeps a single copy
    df_co_wip = df_co_wip.drop(columns=['digit'])
    df_us_wip = df_us_wip.merge(df_co_wip, on='naics')
    # total employment in the county and in the nation
    et_co = df_us_wip[df_us_wip['digit'] == 0]['emp_co'].sum()
    et_us = df_us_wip[df_us_wip['digit'] == 0]['emp_us'].sum()
    # each sector's share of total employment, locally and nationally
    df_us_wip['perc_co'] = df_us_wip['emp_co'] / et_co
    df_us_wip['perc_us'] = df_us_wip['emp_us'] / et_us
    df_us_wip['lq'] = df_us_wip['perc_co'] / df_us_wip['perc_us']
    df = df_us_wip[['naics', 'description', 'emp_co', 'lq']]
    df = df.rename(columns={'emp_co': 'employment', 'lq': 'location_quotient'})
    # keep rows with LQ > 1, sort descending, and show only the top items
    df = df[df['location_quotient'] > 1].sort_values(by='location_quotient', ascending=False).head(items_to_show)
    return df
Let's test it out. Using San Diego County (FIPS code 06073) as an example, we run the economic base analysis at the most granular, 6-digit NAICS level and show the top 10 industries:
if __name__ == '__main__':
    print(calc_lq('06073', '6', 10))
naics description employment location_quotient
172 333611 Turbine and Turbine Generator Set Units Manufa... 4146 13.991899
529 541714 Research and Development in Biotechnology (exc... 13892 11.813104
113 325413 In-Vitro Diagnostic Substance Manufacturing 2832 8.787377
307 424930 Flower, Nursery Stock, and Florists' Supplies ... 3791 7.933639
658 712130 Zoos and Botanical Gardens 3312 7.910214
219 336611 Ship Building and Repairing 7783 7.876611
111 325411 Medicinal and Botanical Manufacturing 2195 6.764516
182 334220 Radio and Television Broadcasting and Wireless... 3700 6.554643
216 336411 Aircraft Manufacturing 10162 5.744526
188 334417 Electronic Connector Manufacturing 1121 5.576313