Transforming 3-Dimensional Array Table from GTAP into a Matrix using Python

Peter Puszko
3 min readOct 25, 2020

Data is rarely in the format we need it in. It often requires considerable transformations — or as some scientists call it “massaging”. It is no mystery that data preparation represents a large portion of any project involving data. The following steps should at least simplify one type of operation: transforming a 3-dimensional array table into a matrix.

For the following exercise, we will be using data from GTAP9, a database gathered by the Global Trade Analysis Project at Purdue University. It consists of National input-output, trade, macroeconomic, and protection data tables assembled from the World Bank and IMF macroeconomic and Balance of Payments statistics, United Nations Commodity Trade Statistics (Comtrade) Database, and national statistical agencies.

The first step in any data transformation is always understanding the data we are working with. For this example, we will be using a table named VDFM, which consists of the domestic input-outputs for 140 regions. The table consists of 3 dimensions; dim1: the output sector, dim2: the input sector, dim3: the output/input country, and their associated value.

*Values displayed have been amended

When looking at the values, we notice that there are no “0” values. A closer look tells us that null relationships are not entered in the VDFM table. Depending on the computations we hope to do with this dataset, a non-square matrix could cause issues. We also notice that there are sectors in dim1 and dim2 that have the same name and font case as countries in dim3 — which will also need to be changed when building out the domestic input-output matrix.

Jumping into the transformation, we begin by importing our dataset as a pandas DataFrame:

VDFM = pd.read_excel(path + 'VDFM.xlsx')

Next, we ensure that all sector names are lowercase:

VDFM['dim1'] = VDFM['dim1'].str.lower()
VDFM['dim2'] = VDFM['dim2'].str.lower()

And all country names are uppercase:

VDFM['dim3'] = VDFM['dim3'].str.upper()

The following step consists in creating all combinations of “outputCountry_outputSector” and “inputCountry_inputSector” to be used as indexes and column names to guarantee we have a square matrix. To do so, we take the unique values from dim1, dim2, and dim3 and set them in a list:

VDFM_dim1 = VDFM['dim1'].unique()
VDFM_dim1.sort()
output_sector = VDFM_dim1.tolist()
# ---
VDFM_dim2 = VDFM['dim2'].unique()
VDFM_dim2.sort()
input_sector = VDFM_dim2.tolist()
# ---
VDFM_dim3 = VDFM['dim3'].unique()
VDFM_dim3.sort()
output_input_country = VDFM_dim3.tolist()

Once we print the results of our new lists, we notice that the input_sector has 3 additional sectors: ‘c’, ‘g’, and ‘i’. These additional sectors represent final demand, including consumption, government purchases, and investments. For the purpose of this example, we will be removing all three sources of final demand:

final_demand = {'c', 'g', 'i'}input_sector = [sector for sector in input_sector if sector not in final_demand]

Next, we create a list of “outputCountry_outputSector” and “inputCountry_inputSector”:

outputCountry_outputSector = []
for i in output_input_country:
for g in output_sector:
x = i + '-'+ g
outputCountry_outputSector.append(x)
#---
inputCountry_inputSector = []
for j in output_input_country:
for h in input_sector:
x = j + '-'+ h
inputCountry_inputSector.append(x)

In the following step, we reshape our 3-dimensional array table into a 2-dimensional array table to match the indexes and column names for the new DataFrame:

VDFM['ig'] = VDFM['dim3'] + "-" + VDFM['dim1']
VDFM['jh'] = VDFM['dim3'] + "-" + VDFM['dim2']
VDFM = VDFM[['ig', 'jh', 'Value']]VDFM = VDFM.sort_values(by=['ig', 'jh'])

We then create the blank Dataframe that will intake values from the 2-dimensional array table:

blank_VDFM_igjh = np.zeros((len(outputCountry_outputSector), len(inputCountry_inputSector)))VDFM_igjh = pd.DataFrame(data=blank_VDFM_igjh, index = outputCountry_outputSector, columns = inputCountry_inputSector)VDFM_igjh.apply(pd.to_numeric)

Once we have the DataFrame set, we can populate it with values from the array table:

for index, row in VDFM.iterrows():
VDFM_igjh.loc[row['ig'], row['jh']] = row['Value']

The last step consists of transforming the pandas DataFrame into a numpy matrix:

VDFM_mx = VDFM_igjh.to_numpy()

These simple steps will get your GTAP tables in a format that is easy to manipulate for further computations, including the Proportionality Assumption in Factor Content of Trade. Also, these steps can be modified to fit a plethora of different datasets that have similar characteristics as GTAP.

Note: When working with certain GTAP tables, you may need to remove final demand for certain computations, but this step is optional.

--

--