Tidyr (in the tidyverse) provides the functions unnest_wider and unnest_longer to quickly turn nested lists, such as XML data read into R, into a data frame, using the same idea as its pivot_wider and pivot_longer functions. Sometimes, however, the government discloses open data in XML format only.

Currently I need to find data about the restaurant licenses in Hong Kong, and FEHD provides an open dataset. Unluckily, the dataset is available in XML format only.

The XML data file is organised as a tree. DEPARTMENT, GENERATION_DATE and LINK are metadata, while TYPE_CODE, DIST_CODE and INFO_CODE are data dictionaries that store the IDs used to encode the data. Therefore, only the parts wrapped inside LPS are the “actual” data I need.

The data then has to be unnested two times, though I do not quite understand why this is required:

```r
lp_df <- lp_wider %>%
  # 1st unnest: release the 2-dimensional list?
  unnest(cols = names(.)) %>%
  # 2nd unnest: release the single-element list in each cell?
  unnest(cols = names(.)) %>%
  # convert data types
  readr::type_convert()
```

After lp_wider is passed through the first unnest, each column is still a list column whose cells hold lists of length 1. Maybe this is because the values in the cells are in the form of a “list in list” (nested list)? The data are exposed only after the second unnest.

And finally, the XML is converted to a tidy tabular format for further analysis, and we can use write_csv to export the tibble to a CSV file.

Other approaches: manual editing and conversion.
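To reproduce the double unnest of the tidyr route described above on a toy file, here is a minimal, self-contained sketch. It assumes the XML is parsed with xml2::read_xml and converted with xml2::as_list (the post does not show how lp_wider was built), and the root name DATA and the fields inside each LP record (TYPE, DIST, LICNO) are hypothetical placeholders, not the real FEHD schema. Under that assumption, as_list turns every leaf element such as <TYPE>RL</TYPE> into list("RL"), and unnest_wider does not simplify a column of lists of lists, so each cell ends up as a list wrapping a length-1 list. That may be why two unnest calls are needed.

```r
library(xml2)
library(tibble)
library(tidyr)
library(readr)

# Hypothetical, heavily trimmed XML; element names are placeholders,
# not the real FEHD schema.
xml_txt <- '<DATA>
  <DEPARTMENT>FEHD</DEPARTMENT>
  <GENERATION_DATE>2020-01-01</GENERATION_DATE>
  <LPS>
    <LP><TYPE>RL</TYPE><DIST>11</DIST><LICNO>1234</LICNO></LP>
    <LP><TYPE>RL</TYPE><DIST>12</DIST><LICNO>5678</LICNO></LP>
  </LPS>
</DATA>'

doc <- read_xml(xml_txt)

# as_list() keeps the root name, so records sit at $DATA$LPS;
# every leaf becomes a length-1 list, e.g. TYPE = list("RL").
lps <- as_list(doc)$DATA$LPS

# One row per <LP>, one list column per child element.
lp_wider <- tibble(LP = unname(lps)) %>%
  unnest_wider(LP)

lp_df <- lp_wider %>%
  unnest(cols = everything()) %>%  # list of list("RL") -> list of "RL"
  unnest(cols = everything()) %>%  # list of "RL" -> character vector
  type_convert()                   # guess numeric columns from text

# Export the tidy table (hypothetical file name in a temp directory).
out_path <- file.path(tempdir(), "restaurant_licences.csv")
write_csv(lp_df, out_path)
```

Loading only the LPS branch mirrors the point made above that the metadata and the code dictionaries are not part of the records themselves; the dictionaries could be unnested the same way and joined back on their IDs if needed.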