My goal is to create a dictionary app that works well for American and African indigenous languages, since the existing dictionary programs are mainly designed for Indo-European languages. The languages I work with are agglutinative, so the root can sit in the middle of the word, with prefixes (in the case of Guarani) and suffixes (in the case of Quechua, Aymara and Guarani) attached to it, and you need the ability to search in the middle of words or at the end of words. Because languages like Aymara have vowel elision and languages like Guarani have roots that change depending on person and number, it is often necessary to search with regular expressions. That is not easy to implement as an afterthought, so it is best to create a new app based on a database that supports regular expressions.
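A minimal sketch of that kind of search, assuming SQLite purely for illustration (it has a REGEXP operator but leaves the implementation to the application). The table name `entries`, the helper `search` and the sample headwords are hypothetical.

```python
import re
import sqlite3

def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X): pattern first, then value.
    if value is None:
        return 0
    return 1 if re.search(pattern, value) else 0

conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)  # register the Python function with SQLite
conn.execute("CREATE TABLE entries (headword TEXT)")
conn.executemany(
    "INSERT INTO entries VALUES (?)",
    [("qillqay",), ("yachachiq",), ("yachachiqkuna",), ("yachachiqkunalla",)],
)

def search(pattern):
    """Return headwords matching the regular expression, sorted alphabetically."""
    rows = conn.execute(
        "SELECT headword FROM entries WHERE headword REGEXP ? ORDER BY headword",
        (pattern,),
    )
    return [row[0] for row in rows]

print(search("chi"))               # match anywhere in the word
print(search("kuna$"))             # match only at the end of the word
print(search("^qh?[ie]ll?qay$"))   # tolerate several spellings of one word
```

An unanchored pattern matches in the middle of a word, and `$` restricts the match to the end, which covers the two cases that plain prefix lookups miss.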
Another major problem with indigenous languages is that there often isn’t a universally accepted alphabet and there are competing writing systems. In Quechua, they can’t agree whether to write with 3 vowels or 5 vowels. In Aymara, they can’t agree whether to write with vowel elision or not. In Guarani, the Bolivians are using a totally different alphabet than the Paraguayans. On top of that, most people only know how to write in Spanish, so they sound out the native language based on the Spanish writing system. Finally, there is a lot of regional variation in pronunciation, and there haven’t been hundreds of years of standardization as with most Indo-European languages, where everyone agrees how to write a word even if they pronounce it differently in their region. To illustrate the problem, take the Quechua word qillqay (to write). Someone who knows the Spanish writing system will write quelcay, because Spanish doesn’t have a postvelar K sound. People who believe in the 5 vowel alphabet will write it as qellqay. In some regions, the initial Q is aspirated, giving qhillqay. Many people debate whether it should be written with a single or double L (and some linguists believe that the single L comes from Spanish and didn’t exist in the original Quechua), so some write it as qillqay and others as qilqay. To deal with this madness, I want to be able to define a customizable sounds-like search function for each dictionary, where the user’s search string and all the content in the dictionaries are converted to a more basic set of sounds for matching. For example, it doesn’t matter if someone writes k (velar), kh (velar aspirated), k’ (velar glottalized), q (postvelar), qh (postvelar aspirated), q’ (postvelar glottalized), qu(ei) (Spanish velar) or c(aou) (Spanish velar): the dictionary app will reduce all of these to a basic k sound and search for that.
It is sort of like a simplified soundex() function, but I want the transformation to the more basic representation to be definable for each dictionary, because each language has different confusable letters and dictionaries can use different writing systems.
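Here is a rough sketch of how a per-dictionary reduction could be defined, as an ordered list of regular-expression rewrite rules. The names `make_reducer` and `QUECHUA_RULES` are made up for illustration, and the rule set is only an example covering the qillqay spellings above, not a complete profile for Quechua.

```python
import re

def make_reducer(rules):
    """Build a sounds-like function from ordered (pattern, replacement) rules."""
    compiled = [(re.compile(pattern), repl) for pattern, repl in rules]

    def reduce_sounds(text):
        key = text.lower()
        for pattern, repl in compiled:  # rules are applied in the order given
            key = pattern.sub(repl, key)
        return key

    return reduce_sounds

# Hypothetical example rules for one Quechua dictionary.
QUECHUA_RULES = [
    (r"qu(?=[ei])", "k"),    # Spanish-style qu before e or i
    (r"c(?=[aou])", "k"),    # Spanish-style c before a, o or u
    (r"[kq][h'’]?", "k"),    # k, kh, k', q, qh, q' all become k
    (r"e", "i"),             # 5-vowel spelling reduced to 3 vowels
    (r"o", "u"),
    (r"ll", "l"),            # single vs. double L
]

reduce_quechua = make_reducer(QUECHUA_RULES)

for spelling in ["qillqay", "quelcay", "qellqay", "qhillqay", "qilqay"]:
    print(spelling, "->", reduce_quechua(spelling))
```

All five spellings reduce to the same key, kilkay. The same function would be applied to the dictionary content when it is indexed and to the search string at query time, and another dictionary would simply supply a different rule list.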
I also want the app to run on all the major operating systems. Realistically, most people are going to run it on Windows and Android, but I want it to work on Linux too, so I chose Qt, because it supports all of those platforms.
Another thing that I want to include is a morphological analyzer for Quechua, Aymara and Guarani. If you take a root like yacha in Quechua or yati in Aymara (meaning “to know”), you can create half a million words in each of those languages from that one root, but if you search for a word like yachachiqkunalla (yacha-chi-q-kuna-lla) in Quechua or yatichirinakaki (yati-chi-ri-naka-ki) in Aymara (meaning “only teachers”, from the morphemes: to know + to make + person who + plural + limitative), you will never find the word in the dictionary. We have created basic finite state machines that can break words into their morphemes for the three languages, and I want to include them so people can enter a word and then touch/click the root and each morpheme to find their definitions. In other words, if the user enters yachachiqkunalla (or the badly spelled yachacheccunala), the dictionary will return “yacha + chi + q + kuna + lla”, and the user can not only discover the proper spelling but also understand the meaning of each morpheme in the language. You don’t understand why this is important until you watch an ordinary person try to look up a word that they heard. They can’t find words in the dictionary because they don’t know the writing system and they don’t know the morphology of the language, so my hope is to create a dictionary app that can solve these problems.
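To give an idea of what the segmentation step produces, here is a toy sketch that splits a word into a known root followed by known suffixes. It is a simple backtracking matcher, not our finite state machines, and it ignores suffix ordering rules, vowel elision and spelling variation. The function `segment` and the tiny morpheme lists are hypothetical.

```python
def segment(word, roots, suffixes):
    """Return [root, suffix, ...] if the word parses completely, else None."""

    def split_suffixes(rest):
        if not rest:
            return []  # nothing left: the parse succeeded
        for suffix in suffixes:
            if rest.startswith(suffix):
                tail = split_suffixes(rest[len(suffix):])
                if tail is not None:
                    return [suffix] + tail
        return None  # no suffix fits here: backtrack

    for root in roots:
        if word.startswith(root):
            tail = split_suffixes(word[len(root):])
            if tail is not None:
                return [root] + tail
    return None

# Morphemes taken from the two example words above.
quechua = segment("yachachiqkunalla", ["yacha"], ["chi", "q", "kuna", "lla"])
aymara = segment("yatichirinakaki", ["yati"], ["chi", "ri", "naka", "ki"])

print(" + ".join(quechua))
print(" + ".join(aymara))
```

Each item in the returned list can then be looked up on its own, which is what makes the touch/click-a-morpheme interface possible.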
That’s probably more info than you wanted.