- TMDT: Taiwan Indigenous Peoples’ Multidimensional Table Data
- By Ji-Ping Lin (林季平)
- Database URL: https://osf.io/a23ec/
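- Acknowledgments: This project, "Taiwan Indigenous Peoples Open Research Data" (TIPD, https://osf.io/e4rvz/), is funded mainly by the Council of Indigenous Peoples. The National Science and Technology Council also supports projects related to this one under grant numbers MOST 106-2420-H-001-008-MY2; MOST 106-2420-H-156-001-MY2; MOST 109-2420-H-001-003; MOST 109-2420-H-156-001; NSTC 112-2410-H-001-075-MY2. We also thank Academia Sinica for supporting part of the research funding.
Relevant projects: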
- Lin, Ji-Ping, Ming-Cheng Lee, Hiu Ha Chong, Li-Chuan Liu, Kui Kasirisir, and Hsin-Chung Wang. 2024. “TIPD : Taiwan Indigenous Peoples Open Research Data 台灣原住民基礎開放研究資料庫.” OSF https://osf.io/e4rvz/. May 8. doi:10.17605/OSF.IO/E4RVZ.
- Lin, Ji-Ping. 2024. “TPDD : Taiwan Indigenous Peoples Population Dynamics Open Data 台灣原住民族人口動態開放資料.” OSF https://osf.io/ukjgs/. May 7. doi:10.17605/OSF.IO/UKJGS.
- Lin, Ji-Ping. 2024. “TMDT: Taiwan Indigenous Peoples’ Multidimensional Table Data 台灣原住民族多維表資料庫.” OSF https://osf.io/a23ec/. May 7. doi:10.17605/OSF.IO/A23EC.
- Lin, Ji-Ping. 2024. “TICT: Taiwan Indigenous Peoples’ Contingency Tables 台灣原住民族基礎列聯表.” OSF https://osf.io/9nrwd/. May 7. doi:10.17605/OSF.IO/9NRWD.
- Lin, Ji-Ping. 2024. “TIHD: Taiwan Indigenous Peoples Household Structure Open Data 台灣原住民族家戶結構基礎開放資料庫.” OSF https://osf.io/zbwek. May 7. doi:10.17605/OSF.IO/ZBWEK.
- Lin, Ji-Ping. 2024. “TICD : Taiwan Indigenous Communities Open Data 台灣原住民族部落開放資料庫.” OSF https://osf.io/esw67/. May 7. doi:10.17605/OSF.IO/ESW67.
- Lin, Ji-Ping. 2024. “iqTICD : Integrated Query System of Taiwan Indigenous Community Open Data 台灣原住民族部落開放資料庫綜合查詢系統.” OSF https://osf.io/rfe6p/. May 7. doi:10.17605/OSF.IO/RFE6P.
- Lin, Ji-Ping. 2024. “TIHV : Taiwan Indigenous People’s High-Resolution Visualization of Population Distribution, Migration Dynamics, Traditional Communities by Ethnic Groups 當代台灣原住民高解析度視覺化圖形資料庫:人口分布、遷徙、傳統部落.” OSF https://osf.io/v8zk3/. May 8. doi:10.17605/OSF.IO/V8ZK3.
- Lin, Ji-Ping. 2024. “TIMD : Taiwan Indigenous Peoples Migration Dynamics 台灣原住民各族群遷徙動態.” OSF https://osf.io/6rpz9/. May 7. doi:10.17605/OSF.IO/6RPZ9.
Introduction:
Legal and ethical issues are a top priority in academic research. Building open data was not a goal of the joint research program at its initial stage. Why, then, did the program go to the trouble of building TIPD? The answer lies in concerns about privacy, confidentiality, and legal and ethical issues. Only the PI is allowed to access the individual-level micro data sets, and not all research team members are specialized in handling complex raw data or in scientific computing. It therefore became urgent to design a way to process massive raw data sets and transform them into derived data sets in a systematic and automated way. To address these challenges, I designed a method that reorganizes the information in complex source data into simple multi-dimensional tables, which resolves the ethical and legal issues and protects privacy. All computing tasks are conducted in a closed, supervised data lab of the government funding agency in order (1) to extract valuable information embedded in the confidential micro data of the household registration system, and (2) to enrich the extracted information by cleaning, crunching, reorganizing, and reshaping the source data into a number of data sets that contain no individual information and can therefore be opened to the public to promote the study of open administrative data analytics. To build the TIPD big archival data, the research relies on automated data processing procedures.
The transformed data sets must meet three criteria: first, they must preserve the main features of, and most of the information embedded in, the raw data; second, they must resolve privacy, confidentiality, and thus legal issues; third, they must comply with the ethical requirements of academic research. With these issues resolved, the resulting data sets meet the criteria of open data and can be used directly by the research team members. Because the open TIPD data sets proved effective in improving the efficiency of the joint research program, the author decided to open TIPD to the public, as a ray of hope for promoting efficiency, collaboration, mutual trust, and transparency in Taiwan Indigenous Peoples studies. That is why “CopyLeft(L)” is highlighted as a main feature of TIPD. To create open data, the author decided to adopt “old-school” multi-dimensional tables (MDTs), as illustrated in Figure 1, as a simple but effective way to protect privacy while keeping the information embedded in the source data sets nearly intact. The foundation of MDTs essentially resembles that of modern distributed storage/processing systems such as those of Google and Apache Hadoop.
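The idea of collapsing individual records into a multi-dimensional table can be sketched as follows. This is only a minimal illustration, not the author's actual procedure: the field names (`township`, `ethnic_group`, `sex`, `age`), the five-year age grouping, and the sample records are all hypothetical and are not taken from TIPD.

```python
from collections import Counter

# Hypothetical micro records; field names and values are illustrative only.
micro_records = [
    {"person_id": 1, "township": "A", "ethnic_group": "Amis", "sex": "F", "age": 34},
    {"person_id": 2, "township": "A", "ethnic_group": "Amis", "sex": "F", "age": 31},
    {"person_id": 3, "township": "B", "ethnic_group": "Atayal", "sex": "M", "age": 58},
    {"person_id": 4, "township": "A", "ethnic_group": "Amis", "sex": "M", "age": 12},
    {"person_id": 5, "township": "B", "ethnic_group": "Atayal", "sex": "M", "age": 55},
]

# Dimensions of the table (an assumption made for this sketch).
DIMENSIONS = ("township", "ethnic_group", "sex", "age_group")


def age_group(age, width=5):
    """Map an exact age to a grouped label such as '30-34'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def build_mdt(records, dimensions=DIMENSIONS):
    """Count records per combination of dimension values.

    The result holds one count per cell and carries no person-level
    identifier such as person_id.
    """
    table = Counter()
    for rec in records:
        row = dict(rec, age_group=age_group(rec["age"]))
        key = tuple(row[d] for d in dimensions)
        table[key] += 1
    return table


mdt = build_mdt(micro_records)
for key, count in sorted(mdt.items()):
    print(dict(zip(DIMENSIONS, key)), count)
```

Each cell of the resulting table is a count for one combination of dimension values, so the total population is preserved while individual rows are no longer present.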
The reasons for adopting the MDT method are twofold. First, the learning curve of modern distributed file/storage/computing systems is too steep for team members. Second, building TIPD does not require such a complex system. Fortunately, by reviewing the foundation of these contemporary distributed file/storage/computing systems, the author found that it resembles that of the classical “old-school” multi-dimensional tables that have long been used by, e.g., Statistics Canada and the US Census Bureau. As a result, the research adopts conventional multi-dimensional tables as a means of “distributed data storage” and “centralized data integration”.
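One way to picture “distributed data storage” and “centralized data integration” with such tables is sketched below. It assumes, purely for illustration, that tables are stored as separate parts sharing the same dimensions and are later summed and marginalized; the dimension names, counts, and the helper names `integrate` and `marginal` are hypothetical and do not describe the actual TIPD file layout.

```python
from collections import Counter

# Shared dimensions of every stored part (an assumption made for this sketch).
DIMENSIONS = ("township", "ethnic_group", "sex")

# Two hypothetical table parts, stored separately; counts are made up.
part_a = Counter({
    ("A", "Amis", "F"): 120,
    ("A", "Amis", "M"): 115,
})
part_b = Counter({
    ("B", "Atayal", "F"): 80,
    ("B", "Atayal", "M"): 90,
    ("B", "Amis", "F"): 10,
})


def integrate(*tables):
    """Sum cell counts across separately stored tables."""
    total = Counter()
    for table in tables:
        total.update(table)  # Counter.update adds counts for matching keys
    return total


def marginal(table, keep, dimensions=DIMENSIONS):
    """Collapse the table onto the dimensions listed in `keep`."""
    idx = [dimensions.index(d) for d in keep]
    out = Counter()
    for key, count in table.items():
        out[tuple(key[i] for i in idx)] += count
    return out


combined = integrate(part_a, part_b)
by_group = marginal(combined, keep=("ethnic_group",))
by_sex = marginal(combined, keep=("sex",))
print(dict(by_group))
print(dict(by_sex))
```

Because every part uses the same cell keys, integration reduces to adding counts, and lower-dimensional summaries can be derived from the combined table without returning to the source data.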
Related links:
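*Taiwan Indigenous Peoples Open Research Data (TIPD)
*Taiwan Indigenous Communities Open Data (TICD)
*Taiwan Indigenous Peoples Migration Dynamics (TIPsMigDynamics)
*Taiwan Indigenous People’s High-Resolution Visualization of Population Distribution, Migration Dynamics, Traditional Communities by Ethnic Groups
*Taiwan Indigenous Peoples Population Dynamics Open Data (TPDD)
*Integrated Query System of Taiwan Indigenous Community Open Data
*Taiwan Indigenous Peoples’ Contingency Tables (TICT)
*Taiwan Indigenous Peoples Household Structure Open Data (TIHD)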