Join Tabular Data (CSV format)
|
QGIS joins are temporary relationships between layers, or between tabular data (in .dbf or .csv formats) and spatial data layers. In this exercise we discuss preparation of the tabular data and the join process.
JOIN IDs
Before considering any table joins, the most vital aspect to consider is the FIELD that contains values that will be able to associate one table to another. For example, if you have a spatial data layer of administrative areas and want to join to each area some numeric values that are measurements of certain variables (like population, income, percentage of unemployment, etc), it is essential to have a join field in both the source table and the target layer. Many users consider the placename to be an obvious field to join on, so that the row in the spatial data with the value = "Springfield" in the Placename field can be joined on the tabular data row that has a matching value "Springfield" in its NAME field. However, please be aware that spelling errors, duplicate use of the same name, capitalization, blank spaces, added unit types, or other variations in the exact spelling used in either table will cause the join to fail. Ideally, you want to use UNIQUE IDENTIFIERS rather than placenames, and you want to AVIOD any other values that are either (a) not unique, or (b) may contain variations of usage or mis-spellings. For geospatial data, there are many standardized encoding systems that we can rely on, such as FIPS Codes, or Chinese GBCODES, Japanese JCODES, and various other administrative coding systems.
If you find that existing geographic codes do not work for your data, create your own set of UNIQUE IDs and don't alter them. Persistence of IDENTIFIERS help to keep your data organized and consistent.
Sample Tabular Data
- Open the sample file /join_tables/TW_POP_TEST_DATA.csv in QGIS (with Encoding = BIG5)
- Right-click on the layer to Open Attribute Table
- Inspect the table contents to see if they look correct (readable characters, and no mixed data types in columns)
- Check our intended JOIN FIELD (COUN_CODE) to make sure it does NOT have NULL values or blank cells
- Close the Attribute Table and go to the menu PLUGINS | TABLE | TABLE MANAGER
- Make sure you know what the data TYPE is for the JOIN FIELD (in this case = String)
You may be tempted to open and view the .CSV table in EXCEL. Note, EXCEL may wreck the encoding. An all-around excellent tabular data & spreadsheet application is OpenOffice, which is freeware and runs on Windows, Linux and Mac. Other freeware branches of the same codebase, such as LibreOffice and NeoOffice are also excellent. Like QGIS, the OpenOffice programs enable the selection of a codepage, both at the time of OPENING the file, and at the time of SAVING the file.
Prepare Spatial Data Layer for Join
- Open the sample file /china/TW_POP_GEOM in QGIS (with Encoding = BIG5)
- Right-click on the layer to Open Attribute Table
- Inspect the table contents (in this case, there is only one field containing a UNIQUE ID)
- As with the Source Data table, browse the target spatial layer JOIN FIELD (COUN_CODE) to make sure it does NOT have NULL values or blank cells
- Close the Attribute Table and go to the menu PLUGINS | TABLE | TABLE MANAGER
- Make sure you know what the data TYPE is for the JOIN FIELD (in this case = String)
- If the Field Types look okay, close your Table Manager and Attribute Table windows
Join Operation
- The TARGET layer is the TW_POP_GEOM which currently only has the COUN_CODE field
- The SOURCE DATA file is the TW_POP_DATA_TEST.csv layer, which contains attributes that we want to join to the spatial objects
- Inspect the table contents to see if they look correct (readable characters, and no mixed data types in columns)
- Click on the TARGET Layer (TW_POP_GEOM) to make it active, then right-click on that layer and select PROPERTIES
- In the Properties window, click on the JOIN tab, then the PLUS Sign button to ADD a JOIN
- Since we only have one other possible layer open, the drop-down will show our desired Join Layer: TW_POP_DATA_TEST
- QGIS will make a guess about what the JOIN FIELD is (though in some cases you may need to select the correct JOIN FIELD name from the Source Data or the TARGET FIELD name from the Target Layer.
- Click OK to accept the settings for the join, then OK to create the JOIN
Check Join Results
- Right-click on the Target Layer (TW_POP_GEOM) and Open Attribute Table
- The table should now contain all the joined attributes
- Close the Attribute table and right-click on the Target Layer and SAVE AS
- Select Format = ESRI Shapefile, Save As = TW_JOIN, Encoding = BIG5, CRS = Original CRS, then click OK
- Click OK when you see export to vector has been completed
- Click on ADD VECTOR DATA and add the new layer called TW_JOIN, with encoding = BIG5
- Right-click on the TW_JOIN layer to view the Attribute Table
- You should be able to see the joined tables in your new layer
- You will notice that the COUN_CODE Field is duplicated, one of them with NULL values. If you want to delete that FIELD called COUN_COD_1, go to the menu PLUGINS | TABLE | TABLE MANAGER and DELETE it. Accept the changes and you should have a cleanly joined table!
|