Improve Import from ODK #9792
Replies: 3 comments
I contacted Sebastian, who is leaving and has been working on this facility in R, though not, of course, in R-Instat. Here is his reply. It mentions the relatively new expss package, which might also be useful for us more generally: for labelled data, producing tables and multiple responses.
Hi Roger,
Sure, I just uploaded the script to the Stats4SD github. Have a look at the comments and documentation at the beginning of the odkFormat function.
The main function is odkFormat(). I've tested and improved odkFormat and its helper functions over the last two years and think it's fairly robust by now, e.g. it handles ODK exports with and without group names attached, and multiple choice variables.
The variable and value labels come from the expss package and are attributes of the variables. I use the function ft(), also in the script, to get the value labels when running analysis. The labelled output data frame should generally be worked on with tidyverse functions as opposed to basic R functions, since the latter don't handle the attributes too well. For example, use dplyr::filter(data, age > 30) instead of data[data$age>30,].
The nice thing about having the value labels as attributes is that you can use the codes to filter, dplyr::filter(data, gender == 1), but the labels to display: table(ft(data$gender)) will display the table with names "Male", "Female" instead of the codes. If the label attributes cause trouble in any bits of your code further on, you can always remove them with an unlab() call.
The main piece of functionality that is lacking is to pull data and form directly from kobo or ona. At the moment you have to download the form and data file and read them in locally. I've tried this a few times but failed to use the platform APIs in the right way. If you plan to do this, it might then make sense to use the json or xml representation as opposed to the csv/excel files.
Another nice-to-have would be to re-script it all into tidyverse syntax, although it might be easier to just leave it as it is. It's not the most elegant code (a fairly clunky loop in the revarnames helper function), but then again I've tested it on at least a dozen forms and data sets, so it works quite well and I believe it does what it's supposed to do.
Let me know in case you have any questions. Happy to explain in a call or in the office.
Cheers, Sebastian
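For anyone following along, here is a minimal base R sketch of the "codes to filter, labels to display" idea described in the message above. It is not Sebastian's script and does not use expss: the "labels" attribute layout and the helper codes_to_factor() are assumptions made up for illustration, standing in for what ft() is described as doing.

```r
# Codes are stored in the data; labels are stored as an attribute.
# (Hypothetical layout, chosen for illustration only.)
gender <- c(1, 2, 2, 1, 2)
attr(gender, "labels") <- c(Male = 1, Female = 2)

# Hypothetical helper: convert codes to a factor using the "labels" attribute.
codes_to_factor <- function(x) {
  labs <- attr(x, "labels")
  factor(as.vector(x), levels = unname(labs), labels = names(labs))
}

data <- data.frame(age = c(25, 41, 37, 19, 52))
data$gender <- gender  # the attribute travels with the column

# Filter on the codes ...
women <- data[data$gender == 2, ]

# ... but display with the labels.
counts <- table(codes_to_factor(data$gender))
print(counts)

# Base subsetting of a plain vector drops custom attributes,
# which is one reason base R indexing can lose the labels.
subset_gender <- data$gender[data$age > 30]
labels_lost <- is.null(attr(subset_gender, "labels"))
```

The last two lines show why plain `[` indexing can be awkward with attribute-based labels: the subset keeps the codes but not the attribute attached to them.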
Links to a conversation in #8564
I checked the message above and note that the Stats4SD github site link still works. This is a long way from the topics where I can make a useful contribution, except to note that we continue to plough on - very slowly - to improve the survey processing aspects in R-Instat. I presume ODK remains a popular way of importing (maybe complex?) survey-type data? If so, then maybe we should re-open this topic? I am not sure of the urgency, given the other tasks that are being raised.
Moved from here: #1728
I have assigned this to Version 0.4.1, but it could move to 0.4.2 once we add that milestone.
Here is information from Sebastian on possible improvements.
He will be with me at a (climatic) workshop in mid-September. But I have already been asked for help on ODK in R-Instat by Lesotho Met. So it could be good to include some enhancements by then if possible. It relates to Stats4SD interests.
Here is his message, sent on 24 July 2017:
"Roger just told me you are working on adding functions to bring ODK collected data into R-Instat.
I sent the attached functions to read and label ODK data into R to Dani around a year ago, but the attached version is more robust.
The main function is odkFormat. It sets the variable format based on the XLSForm type and creates a labeled version of the data set and a list with all codes and labels.
Help functions: revarnames (brings the XLSForm into a format odkFormat can use) and readonadata (simply reads in the data).
At the moment, you need the XLSForm and the data in your working folder. The obvious next steps if you decide to use these functions would be:
• Pull the data and form from the ODK server (I understand this functionality is already there in R-Instat?)
• Use the codebook to label the data in R-Instat
These functions are probably not written or documented the way a programmer would do it, so feel welcome to ask me if you need any guidance using them.
Big advantage is, I have used and tested them on a number of data sets, so they run pretty robust."
https://github.com/africanmathsinitiative/R-Instat/files/1193510/readodk.zip
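As a rough illustration of the approach described in the message (set each variable's format from the XLSForm type, label the data, and return a list of codes and labels), here is a small base R sketch. It is not the attached odkFormat code: the function label_from_form() and the tiny in-memory survey, choices and raw tables are hypothetical, and only the standard XLSForm columns (type, name, label, list_name) are assumed.

```r
# Tiny stand-ins for the XLSForm "survey" and "choices" sheets.
survey <- data.frame(
  type  = c("integer", "select_one gender"),
  name  = c("age", "gender"),
  label = c("Age in years", "Gender of respondent"),
  stringsAsFactors = FALSE
)
choices <- data.frame(
  list_name = c("gender", "gender"),
  name      = c("1", "2"),
  label     = c("Male", "Female"),
  stringsAsFactors = FALSE
)
# Raw export: everything arrives as text.
raw <- data.frame(age = c("25", "41"), gender = c("1", "2"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: format and label columns based on the form definition.
label_from_form <- function(raw, survey, choices) {
  out <- raw
  codebook <- list()
  for (i in seq_len(nrow(survey))) {
    var  <- survey$name[i]
    type <- survey$type[i]
    if (!var %in% names(out)) next
    if (type %in% c("integer", "decimal")) {
      out[[var]] <- as.numeric(out[[var]])
    } else if (startsWith(type, "select_one ")) {
      list_name <- sub("^select_one\\s+", "", type)
      ch <- choices[choices$list_name == list_name, ]
      codebook[[var]] <- setNames(ch$name, ch$label)  # label -> code
      out[[var]] <- factor(out[[var]], levels = ch$name, labels = ch$label)
    }
    attr(out[[var]], "label") <- survey$label[i]      # variable label
  }
  list(data = out, codebook = codebook)
}

result <- label_from_form(raw, survey, choices)
str(result$data)
```

The sketch only covers integer, decimal and select_one questions; groups, select_multiple and other types that the message says odkFormat handles are left out.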