Economic Base Analysis using Python
Planning economist Richard Klosterman once said, "Information about an area's future population is incomplete without a parallel understanding of the local economy that largely shapes its future."
A few basic methods are widely taught and used in planning and economic geography, and economic base analysis is the oldest, simplest, and most widely used technique for regional economic analysis. It rests on the assumption that a region's economy can only grow by exporting goods and services. The local economy is therefore divided into the basic (export-oriented) and non-basic (locally oriented) sectors.
Economic base analysis then calculates the location quotient (LQ), which compares a local economy's employment share in a specific sector with that of a larger reference region (e.g., county vs. state, region vs. state, state vs. nation). For sector i, LQ = (local employment in i / total local employment) ÷ (reference employment in i / total reference employment). When LQ ≤ 1, there is no excess production in the local economy. When LQ > 1, the employment above the reference region's share is attributed to exports and counted as basic-sector employment.
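As a minimal sketch of the arithmetic (with made-up employment numbers, not real CBP data): suppose a county has 500 shipbuilding jobs out of 100,000 total jobs, while the nation has 150,000 shipbuilding jobs out of 150,000,000 total.

```python
# toy numbers (hypothetical, not drawn from any real dataset)
e_i, e = 500, 100_000          # county: shipbuilding jobs, total jobs
E_i, E = 150_000, 150_000_000  # nation: shipbuilding jobs, total jobs

# location quotient: local share of the sector over the national share
lq = (e_i / e) / (E_i / E)
print(round(lq, 6))  # 5.0

# with LQ > 1, employment above the national share is treated as basic
basic = e_i - (E_i / E) * e
print(round(basic, 6))  # 400.0
```

The county's shipbuilding share is five times the national share, so 400 of its 500 shipbuilding jobs are estimated to serve export demand.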
We will now build a small project to automate this calculation and find the top basic sectors (industries) in each county in the United States.
import pandas as pd

def fips_prep():
    """Get the county FIPS codes ready."""
    # a master CSV sheet of every county's FIPS code
    fips_url = 'https://raw.githubusercontent.com/kjhealy/fips-codes/master/county_fips_master.csv'
    # make sure to use the correct encoding to avoid a decode error here
    df = pd.read_csv(fips_url, encoding='ISO-8859-1')
    # pad with leading zeros; for instance, California's state FIPS code is 06
    df['fips'] = df['fips'].apply(lambda x: str(x).zfill(5))
    df = df[['fips', 'county_name', 'state_name']]
    df['full_name'] = df['county_name'] + ', ' + df['state_name']
    return df
Next, get the employment data for all industrial sectors. We have two options from the Census Bureau: the economic census and the County Business Patterns (CBP) data. The economic census provides more detailed and accurate information, but it is conducted only every five years, while CBP data are published annually. We will use the CBP data here.
def eba_data_prep():
    """Prepare the county and national CBP employment tables."""
    # first, get the CBP county data as the target geography
    cbp20url = 'https://www2.census.gov/programs-surveys/cbp/datasets/2020/cbp20co.zip'
    df = pd.read_csv(cbp20url)
    # some data cleaning and relabeling work
    df['fipstate'] = df['fipstate'].apply(lambda x: str(x).zfill(2))
    df['fipscty'] = df['fipscty'].apply(lambda x: str(x).zfill(3))
    df['fips'] = df['fipstate'] + df['fipscty']
    df = df[['fips', 'naics', 'emp']]
    df = df.rename(columns={'emp': 'emp_co'})
    # strip the '-' and '/' padding from the NAICS codes
    df['naics'] = df['naics'].str.replace('-', '')
    df['naics'] = df['naics'].str.replace('/', '')
    # record how many digits each NAICS code has
    df['digit'] = df['naics'].str.len()
    # the all-industry total ('------') is now empty; label it '0'
    df.loc[df['digit'] == 0, 'naics'] = '0'

    # next, get the CBP national data as the reference geography
    cbp20us_url = 'https://www2.census.gov/programs-surveys/cbp/datasets/2020/cbp20us.zip'
    df_us = pd.read_csv(cbp20us_url)
    df_us = df_us.rename(columns={'emp': 'emp_us'})
    # keep only the rows covering all legal forms of organization
    df_us = df_us[df_us['lfo'] == '-']
    df_us = df_us[['naics', 'emp_us']]
    # get the NAICS code descriptions to make industry codes more readable
    naics_desc_url = 'https://www2.census.gov/programs-surveys/cbp/technical-documentation/reference/naics-descriptions/naics2017.txt'
    naics_desc = pd.read_csv(naics_desc_url, sep=',', encoding='cp1252')
    naics_desc.columns = ['naics', 'description']
    # join naics_desc into df_us, using naics as the key
    df_us = df_us.merge(naics_desc, on='naics')
    # strip the '-' and '/' padding from the NAICS codes
    df_us['naics'] = df_us['naics'].str.replace('-', '')
    df_us['naics'] = df_us['naics'].str.replace('/', '')
    # record how many digits each NAICS code has
    df_us['digit'] = df_us['naics'].str.len()
    df_us.loc[df_us['digit'] == 0, 'naics'] = '0'
    return df, df_us
For background on the NAICS code system, see the Census Bureau's NAICS documentation. Finally, let's calculate the location quotient.
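One detail worth remembering before filtering by digit length: NAICS codes nest by prefix, with the first two digits giving the sector and each additional digit refining it. A quick illustration using 336611, Ship Building and Repairing (descriptions per the 2017 NAICS):

```python
# a NAICS code nests inside each of its shorter prefixes
code = '336611'
levels = {
    2: 'Manufacturing (sector range 31-33)',
    3: 'Transportation Equipment Manufacturing',
    4: 'Ship and Boat Building',
    6: 'Ship Building and Repairing',
}
for n, label in levels.items():
    print(f'{code[:n]:<6} {label}')
```

This is why the code below can treat "digit level" as a single parameter: choosing 2, 4, or 6 digits simply picks a coarser or finer slice of the same hierarchy.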
def calc_lq(county_fips, naics_digits, items_to_show):
    """Calculate location quotients for one county at a given NAICS digit level."""
    df, df_us = eba_data_prep()
    # naics_digits may be passed as a string; cast it to an int
    naics_digits = int(naics_digits)
    # keep only the county of interest
    df_co_wip = df[df['fips'] == county_fips]
    # keep only the all-industry total (digit 0) and the requested digit level
    df_co_wip = df_co_wip[(df_co_wip['digit'] == 0) | (df_co_wip['digit'] == naics_digits)]
    df_us_wip = df_us[(df_us['digit'] == 0) | (df_us['digit'] == naics_digits)]
    # drop the county's digit column so the merge keeps a single copy
    df_co_wip = df_co_wip.drop(columns=['digit'])
    df_us_wip = df_us_wip.merge(df_co_wip, on='naics')
    # total employment in the county and in the nation
    et_co = df_us_wip[df_us_wip['digit'] == 0]['emp_co'].sum()
    et_us = df_us_wip[df_us_wip['digit'] == 0]['emp_us'].sum()
    # each sector's share of total employment, locally and nationally
    df_us_wip['perc_co'] = df_us_wip['emp_co'] / et_co
    df_us_wip['perc_us'] = df_us_wip['emp_us'] / et_us
    df_us_wip['lq'] = df_us_wip['perc_co'] / df_us_wip['perc_us']
    df = df_us_wip[['naics', 'description', 'emp_co', 'lq']]
    df = df.rename(columns={'emp_co': 'employment', 'lq': 'location_quotient'})
    # keep rows with LQ > 1, sort descending, and show only the top items
    df = df[df['location_quotient'] > 1].sort_values(by='location_quotient', ascending=False).head(items_to_show)
    return df
Let's test it out. Using San Diego County (FIPS code 06073) as an example, we run the economic base analysis at the most granular, 6-digit NAICS level and show the top 10 industries:
if __name__ == '__main__':
    print(calc_lq('06073', '6', 10))
naics description employment location_quotient
172 333611 Turbine and Turbine Generator Set Units Manufa... 4146 13.991899
529 541714 Research and Development in Biotechnology (exc... 13892 11.813104
113 325413 In-Vitro Diagnostic Substance Manufacturing 2832 8.787377
307 424930 Flower, Nursery Stock, and Florists' Supplies ... 3791 7.933639
658 712130 Zoos and Botanical Gardens 3312 7.910214
219 336611 Ship Building and Repairing 7783 7.876611
111 325411 Medicinal and Botanical Manufacturing 2195 6.764516
182 334220 Radio and Television Broadcasting and Wireless... 3700 6.554643
216 336411 Aircraft Manufacturing 10162 5.744526
188 334417 Electronic Connector Manufacturing 1121 5.576313