UTArlingtonX: LINK5.10x Data, Analytics, and Learning or #DALMOOC (Week 7) – Part I

If I were a Text Mining Expert …

I would work on a model that could not only detect spelling and grammar mistakes but also help investigate why they were made. I would also ban the words „mistake“ and „error“ from this model, because labeling something that way has drastic consequences: it assumes there is only a (predefined) right or wrong. (Admittedly, I am still looking for an adequate substitute.) It also makes asking why much harder, since simply choosing the correct answer with a right-click in the spell-check software obviously saves a lot of time.

While this might seem an unsolvable quest at first sight, I think there are numerous patterns that could help develop such a model. This text is written in English, but I am not a native speaker. I speak 3.5 languages, and starting a new one is always a special challenge, because the more languages you learn, the more context you have to frame the new one. This can be good: you know which words to learn first for daily conversations and which grammar structures are particularly useful. You know where to start.

But one thing always stays the same: I try to find out WHY I make certain mistakes. This goes beyond simply knowing THAT I made a mistake. It means finding out WHERE I got the structure or word that I am not applying adequately. It means understanding that the mistake might be correct in another language, connecting it to the current language, and storing that connection.

(This is how it works for me, and I think others can benefit from it as well. I am not a professional linguist, so this is not grounded in any well-established theory. I guess there is research that either undermines or supports this idea.)

Bist du das? ≠ / = Are you it?

Literal translation is one example. An English native speaker might at least understand what „Are you it?“ is supposed to mean („Is it you?“), but does that make the translation wrong? Yes, says the English teacher. No, I say. I speak both languages, so I can make sense of it and know what the other person wanted to ask. Isn’t that weird? At first it seemed so plain that the question „Are you it?“ is wrong, so why would some people understand it? Because they share a similar linguistic background and context. Because they understand WHY someone translated the question this way.

We have so many data sources to choose from for translations. Why can’t we make the spell-check process more individual? One could add settings for choosing your native language and the other languages you have learned so far. When a mistake is detected, a model could determine whether it is a simple typo or a systematic error that can or cannot be connected to another language. Imagine the potential of evaluating writing patterns (as Android messaging already does when it guesses your next word): you could get a summary of your most frequent mistakes and work on them in the future.
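To make the idea from the paragraph above a little more concrete, here is a minimal sketch in Python. It is not an existing tool, and everything in it is a hypothetical assumption for illustration: the tiny TRANSFER_PATTERNS table of „false friends“, the functions classify_mistake and summarize, and the simple rules (first check whether the word can be traced to a language the learner already knows, then treat an edit distance of at most 1 as a typo, and leave everything else unexplained).

```python
from collections import Counter

# Hypothetical transfer table: for each language a learner already knows,
# English words that learners with that background tend to misuse, mapped to
# the English word they actually meant (classic "false friends").
# This tiny table is an illustrative assumption, not a real resource.
TRANSFER_PATTERNS = {
    "German": {"become": "get", "also": "so", "chef": "boss"},  # bekommen, also, Chef
    "French": {"actually": "currently", "library": "bookshop"},  # actuellement, librairie
}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character edits."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def classify_mistake(written: str, expected: str, learner_languages: list[str]) -> str:
    """Label one mistake as a cross-language transfer, a typo, or unexplained."""
    # 1. Can the word be traced back to a language the learner already knows?
    for language in learner_languages:
        if TRANSFER_PATTERNS.get(language, {}).get(written) == expected:
            return f"transfer from {language}"
    # 2. Otherwise, a single-character slip is treated as a simple typo.
    if edit_distance(written, expected) <= 1:
        return "typo"
    # 3. Everything else still needs a human (or a better model) to ask WHY.
    return "unexplained"


def summarize(mistakes: list[tuple[str, str]], learner_languages: list[str]) -> Counter:
    """Count how often each kind of mistake occurs for one learner."""
    return Counter(classify_mistake(w, e, learner_languages) for w, e in mistakes)


# Example: a learner whose settings list German and French as known languages.
example_mistakes = [
    ("become", "get"),
    ("adequatly", "adequately"),
    ("actually", "currently"),
    ("grammer", "grammar"),
    ("also", "so"),
    ("their", "there"),
]
summary = summarize(example_mistakes, ["German", "French"])
print(summary.most_common())
```

With German and French in the settings, this sketch traces „become“ and „also“ back to German and „actually“ back to French, flags the two one-letter slips as typos, and leaves „their“/„there“ unexplained. A word-level lookup like this could not catch a transfer in word order such as „Are you it?“; that would need a model that looks at whole sentences.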

Until my breakthrough with my „why-you-made-this-mistake“ model …

Could I ask you a favor? If someone around you is learning a new language and asks you for a word, please don’t just translate it into a language that is easier for him/her! Try to explain the word in the language he/she is learning, and give as much context as possible. It helps a lot. I know it’s not always easier, because the faster way is a simple translation (like the right-click in your spell-check). But in the long run, putting language in context is worth the extra time.