Assume that you have some new data that you want to explore. The new CRAN version of the ‘ExPanDaR’ package helps by providing, with a few clicks, a (customized) R notebook containing all the building blocks of an exploratory data analysis.
Install the Package and Start ExPanD
First, you need to install the package. I recommend installing the GitHub development version of the package, as it fixes some small bugs that I discovered directly after submitting to CRAN (sigh…).
The ‘ExPanDaR’ package offers a toolbox for interactive exploratory data analysis (EDA). Closing down for the year, I finally wrapped up a new version that now allows exploring all sorts of data interactively and generates notebooks containing the analysis, all with more or less one line of code and a click. So here comes my little Christmas present for the wonderful RStats community!
As CRAN is taking its well-deserved Christmas break, you will need to install the new version from GitHub.
Online appendices detailing the robustness of empirical analyses are paramount, but they never let readers explore all reasonable researcher degrees of freedom. Simonsohn, Simmons and Nelson suggest a ‘specification curve’ that allows readers to eyeball how a main coefficient of interest varies across a wide range of specifications. I build on this idea by making it interactive: a Shiny-based web app enables readers to explore the robustness of findings in detail along the whole curve.
Last week, we started a new course titled “Statistical Programming and Open Science Methods”. It is being offered under the research program of TRR 266 “Accounting for Transparency” and enables students to conduct data-based research so that others can contribute and collaborate. This involves making research data and methods FAIR (findable, accessible, interoperable and reusable) and results reproducible. All the materials of the course are available on GitHub together with some notes in the README on how to use them for self-guided learning.
The ‘ExPanDaR’ package offers a toolbox for interactive exploratory data analysis (EDA). You can read more about it here. The ‘ExPanD’ shiny app allows you to customize your analysis to some extent but often you might want to continue and extend your analysis with additional models and visualizations that are not part of the ‘ExPanDaR’ package.
Thus, I am currently developing an option to export the ‘ExPanD’ data and analysis to an R Notebook.
Following up on a recent blog article that discussed how to use R to explore your researcher degrees of freedom, this post introduces a specification curve plot as suggested in Simonsohn, Simmons and Nelson. With this plot, you can eyeball how various researcher degrees of freedom affect your main outcome of interest.
In a recent post, I introduced my in-development R package ‘rdfanalysis’ that provides a coding environment allowing researchers to specify their researcher degrees of freedom ex ante and to systematically explore their effects on their findings ex post.
Interactive EDA is nice, but customized interactive EDA is even nicer. To celebrate the new CRAN version of my ‘ExPanDaR’ package, I prepare a customized variant of ‘ExPanD’ to explore the U.S. EPA data on fuel economy. Our objective is to develop an interactive display that guides the reader in exploring the fuel economy data in an intuitive way.
First, let’s load the packages and the data from EPA’s web page.
Overview
Today, I used a Shiny app to run a classroom experiment in the first class of my introductory cost accounting course. I uploaded code, data and materials to GitHub so that everybody can reuse them to construct similar experiments and, of course, to replicate the results from our experiment.
The experiment tests whether cost allocation (variable cost or full cost) affects pricing decisions in a simple one-product pricing setting.
I am an applied economist working in the area of accounting and corporate transparency. I work with observational data a lot, meaning data that is already available and not under my control. Whenever I set sail to design a test, there are a lot of decisions to make: Which sample should I use? What is the appropriate time frame? How do I define my dependent and independent variables? What is the functional relation that I expect between dependent and independent variables?
Last week, the German NGO Open Knowledge Foundation Deutschland e.V. made German Trade Register data available via the project OffeneRegister.de, together with the British NGO opencorporates. In my last blog post, I checked the general accessibility of the data. In this quick follow-up post, I follow an idea inspired by a tweet by Johannes Filter to map the gender balance of German corporate officers.
Here is the code for generating the necessary data.