Interweaving resources and competencies
In my last post about the #DALMOOC I described the course structure, its challenges, and reflected on the content. It was a neat structure, but it did not contribute to readability. This time I will try a different approach: by pointing towards the three exclamation marks (in the figure below) as the main themes of this post, I will interweave different resources, competencies and assignments.
The data cycle in data analytics

The data cycle is the main theme of week 2’s content in #DALMOOC. This data cycle (or data loop) consists of seven successive steps that are necessary to make learning data meaningful. Data cleaning and integration can only take place after a potential data source has been identified and the data has been stored. The cleaning and integration steps are especially important when it comes to combining different data sets, as both need to be able to “communicate” with each other. The figure visualizes that the actual analysis of the data is only the fifth step in the cycle, highlighting the importance of planning the analytics process. The action taken based on data relies mainly on its representation and visualization: by representing and visualizing the data (or better: our analysis results), they are “socialized” to ease the understanding of the results. [related to Competency 2.1: Describe the learning analytics data cycle]
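As a rough illustration of my own (the step and function names here are hypothetical, not from the course materials), the cycle can be sketched as a pipeline where each step feeds the next:

```python
# Hypothetical sketch of the data cycle as a pipeline of placeholder steps.
# All function names are illustrative only, not from the course materials.

def identify(source):              # 1. identify a potential data source
    return source

def store(data):                   # 2. store the raw records
    return list(data)

def clean_and_integrate(data):     # 3./4. clean and integrate the data sets
    return [d for d in data if d is not None]

def analyze(data):                 # 5. the actual analysis (only step five!)
    return {"count": len(data)}

def represent(results):            # 6. representation & visualization
    return f"n = {results['count']}"

def act(view):                     # 7. action taken based on the results
    return f"decision based on {view}"

def run_data_cycle(source):
    return act(represent(analyze(clean_and_integrate(store(identify(source))))))

print(run_data_cycle([1, None, 2, 3]))  # decision based on n = 3
```

The point of the sketch is only the ordering: analysis sits in the middle of the chain, and the final action consumes the visualization, not the raw data.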
No word on learning so far, but before diving deeper into the data cycle context I want to come back to my definition of learning analytics: “The field of Learning Analytics offers methods to analyse learners’ behavior in a learning environment and by this provides groundwork and significant potential for the improvement of learning environments, individual learner feedback and learning outcomes.” The bold parts have been added with regard to Shum, S. B., & Ferguson, R. (2012). Social Learning Analytics. Educational Technology & Society, 15(3), 3-26. I adapted my definition because the term “potential” points towards the possible future development of (S)LA, and “learning outcomes” adds a significant part to what needs to be improved: the result of a learning process.
Shum & Ferguson also introduce the interesting concept of Social Learning Analytics (SLA). It is based on learning theories and pinpoints learning elements that are significant in a participatory online culture. Primarily, they acknowledge that learners are not learning alone but engaging in a social environment, where they can interact directly or their actions can be traced by others.
While this is a new context for thinking about analytics, we need to understand the possibilities offered by SLA, which are either inherently social (e.g. social network analysis, discourse analytics) or can be socialized (e.g. social learning disposition analytics, social learning content analytics). Moreover, the challenge of implementing these analytics is still present. Shum & Ferguson emphasize the impact of authentic learning from real-world contexts through the use of practical tools. Important features of a participatory online culture are:
- the need for a complementary open learning platform (“digital infrastructure”),
- an understanding of how open source tools can enable people to use the potential of these tools (“free and open”),
- the importance of SLA as part of individual identity and the credibility of skills (“aspirations across cultures have been shifting”),
- SLA as an integral part of an employee’s toolkit (“innovation in complex, turbulent environments”), and
- analytics as a new form of trusted evidence (“role of educational institutions is changing”).
Social learning is of particular interest in the non-academic context.
What remains are the challenges of powerful analytics: which measures do we use? Who defines them? What do we want to measure? How do we define access rights? Are we focusing too much on by-products of online activity? How do we balance power?
Data cleaning
Earlier I claimed that usability is not always what we need. Shum & Ferguson state: “User-centered is not the same as learner-centered: what I want is not necessarily what I need, because my grasp of the material, and of myself as a learner, is incomplete”. When starting to work with a data set, it is highly important that it is as complex and complete as possible. If the data set itself does not embody the complexity of the problem, no other element in the cycle has the potential to do so. It is the visualization that simplifies and “socialises” the data, not the data cleaning, integration or analysis.
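To make this concrete, here is a minimal cleaning sketch of my own (using pandas; the column names and values are invented for illustration). It normalizes types and flags problems instead of silently dropping records, so the data set keeps its complexity:

```python
import pandas as pd

# Hypothetical learner data; column names and values are illustrative only.
raw = pd.DataFrame({
    "learner_id": ["a1", "a2", "a2", "a3"],
    "score": ["85", "90", "90", None],  # stored as strings, one value missing
})

# Clean: remove exact duplicates, fix types, and flag missing values
# rather than discarding the affected rows outright.
clean = raw.drop_duplicates().copy()
clean["score"] = pd.to_numeric(clean["score"], errors="coerce")
clean["score_missing"] = clean["score"].isna()

print(clean)
```

The design choice here mirrors the argument above: the row with the missing score survives cleaning with a flag attached, so the later visualization step, not the cleaning step, decides how to simplify.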
That automatically calls for “quality data” [Siemens, G. (2013). Learning analytics: The emergence of a discipline. American Behavioral Scientist.]. Siemens introduces this idea (among others) with a quote from P. W. Anderson: “more is different”. In my opinion, this is one of the key elements of learning analytics, and of analytics in general: we are afraid of complex data because beyond a certain degree we are no longer able to process them. Instead of refusing data sets based on their complexity, good analytics can help us (as a “cognitive aid”, according to Siemens) to process them and make sense of them. Data will not stop existing; by refusing to handle them we make our lives easier, but we might ignore potential sources for understanding our lives better. We failed with classification systems, so now the time has come for big data and algorithms.
In addition to techniques of LA (as in Baker and Yacef 2009, adapted in my last post), Siemens mentions categories of LA applications. These are used for modeling user knowledge, behavior and experience. Thus, they can create profiles of users, model knowledge domains, perform trend analysis, and support personalization and adaptation. Siemens illustrates his point with the quote: “A statistician may be more interested in creating probability models to identify student performance (technique), whereas a sociologist may be more interested in evaluating how social networks form based on technologies used in a course application.”
Data representation & visualization
There was a time when I had to work with SAP, data sets and visualizations of these sets every day. And I am thankful for one lesson I learned during this time: never change your data set to make it more illustrative. Keep the data set and base your visualization on it. That is the reason I like pivot tables so much: they have the potential to illustrate your analysis results, other users can adapt them, and the data set stays the same.
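In pandas terms (my analogy, not SAP; the course data below is invented), a pivot table is a derived view: you can change the aggregation or filters freely while the underlying records stay untouched:

```python
import pandas as pd

# Hypothetical forum-activity records; the raw data is never modified.
records = pd.DataFrame({
    "course": ["A", "A", "B", "B", "B"],
    "week":   [1, 2, 1, 1, 2],
    "posts":  [4, 6, 3, 5, 2],
})

# The pivot table summarizes the records without altering them:
# one row per course, one column per week, summed post counts.
summary = records.pivot_table(index="course", columns="week",
                              values="posts", aggfunc="sum")
print(summary)
# week    1  2
# course
# A       4  6
# B       8  2
```

Swapping `aggfunc="sum"` for `"mean"` or filtering `records` first produces a different view, but `records` itself is unchanged, which is exactly the lesson above.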
However, one has to keep in mind that analytic tools are programmed by others, and to understand the way they work it is important to be familiar with the methods in use and how they are applied within such tools. During the DALMOOC we will work with different data analysis tools. One of them is Tableau.
Tableau
[related to Competency 2.2: Download, install, and conduct basic analytics using Tableau software]
Tableau is a software package that helps you visualize your data (based on business intelligence approaches). It has an impressive range of options for this purpose, which is (from my point of view) easy to understand and apply. Data can be imported in different formats, and besides common bar charts one is able to create “interactive” dashboards: arrangements of visualizations that other users can adapt via filters and that can show more details of the data set on demand.
However, making data fit into Tableau can be challenging. The data sets have to be formatted in a certain way so the software can work with them. That is what I faced when going a little beyond #Assignment56 from the DALMOOC assignment bank, “Finding Data”. [As the title says, find data sources that can be used in Tableau (you will need to watch the Tableau videos for this week in order to become familiar with this). Look for educational data on government, university, or NGO sites. Create a blog post (or post to the edX forum) on these data sources.]
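A common reshaping step (my own example, not from the course): published statistics are often “wide”, with one column per year, while Tableau generally works best with “long” data, one measurement per row. In pandas, `melt` does this reshape; the region names and numbers below are invented:

```python
import pandas as pd

# Hypothetical wide-format table, as often published by statistics offices:
wide = pd.DataFrame({
    "region": ["Berlin", "Mainz"],
    "2012":   [120, 80],
    "2013":   [130, 85],
})

# Long format for a visualization tool: one row per (region, year, value).
long = wide.melt(id_vars="region", var_name="year", value_name="students")
print(long)
#    region  year  students
# 0  Berlin  2012       120
# 1   Mainz  2012        80
# 2  Berlin  2013       130
# 3   Mainz  2013        85
```

Once the data is long, “year” can be dropped onto an axis or a filter as a single field instead of juggling one column per year.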
Coming from Germany, I did some research on the Open Data movement there. The Bundeszentrale für politische Bildung BPB (Federal Agency for Civic Education) offers an Open Data dossier, where they describe the idea of Open Data, case studies and currently running projects. In their words, the idea of Open Data is to make data freely available and usable. In this respect it is important to know the potential of data and data journalism for sustainable democratic development. Yet this platform does not offer data sets itself, but refers to a general framework for such a platform and to local pilot projects.
In this context they refer to the term “Hyperlokaljournalismus” (hyperlocal journalism), which can be seen as the opposite of the classic idea of “Lokaljournalismus” (local journalism), offering very specific and detailed news in a dynamic way. Such news can be adapted to the location of the user and thus concentrate on their immediate surroundings.
Three examples of Open Data platforms are daten.berlin.de, offenedaten.de and the “Statistisches Landesamt Rheinland-Pfalz“. Formats and range of data differ on each platform, but the idea matches the statements of the BPB: offer data for free, available to everyone. Nevertheless, when browsing bigger institutions for data sets, it was mostly the visualization and not the data set itself that was available. For the data set, you sometimes had to download a form, sign it, describe the purpose you want to use the data for, and then send it via fax or e-mail. Why should I do this when I am just looking for a data set for an individual analysis? I see the point that data collection can be very demanding and that a researcher wants to protect his/her work. But when will we finally accept that Open Data can contribute to a collective research community that works on data together? How do you enable easy data check-ups if I have to send you an e-mail beforehand? And regarding the different data set formats and integrating them into a certain tool: how long will we still need to clean data so they fit into our analysis tools? Will there be a future without the need to edit data formats? I do hope so.
Action
Although action was not a major theme of the course, I find it a very important part of the data cycle. In particular, action depends on the data visualization, so it is crucial to know whom we are visualizing our analysis results for and what their needs are. This can be seen at organizational levels as well (“micro-, meso- and macroanalytics layers”, according to Siemens), where we can have top-down and bottom-up approaches and the application of different questions and viewpoints. This emphasizes the directions LA can take and how they need to adapt to the interest group they are serving. However, the main interest group, the learner and his/her needs, must not be forgotten in this context. [related to Competency 2.3: Evaluate the impact of policy and strategic planning on systems-level deployment of learning]
Coming back to Siemens, he describes the challenges of LA (besides data quality) as privacy and the need to stay centered on human and social processes. I had the chance to touch upon the privacy topic during the Bazaar assignment, where by chance I was in a group with him. This again shows me the value of interactive tools when they are used in a suitable way.
Privacy is also one major topic Duval writes about in Duval, E. (2011, February). Attention please!: Learning analytics for visualization and recommendation. In Proceedings of the 1st International Conference on Learning Analytics and Knowledge (pp. 9-17). ACM. He describes machine-readable traces of attention and raises the question of what should be measured to understand how learning takes place. With regard to privacy he touches upon which elements should be tracked and whether the user should be aware of this. He refers to the “AttentionTrust” principles of property, mobility, economy and transparency. The term “filter bubble” is introduced, connected to the idea that filtering per se can be a threat and can anticipate user choices and ideas. This relates to Shum & Ferguson’s user-/learner-centeredness, as it is always a question of who sets the filter and how the filters work.
What is missing?
I would love to spend more time working in Tableau and other tools in this course, but I fear I cannot cover this within the given timeframe. So I will focus on the tool matrix and complete it while using the tools for basic explorations, doing the readings and dealing with other course resources.