Who owns the data?

That’s a key question in preserving endangered Indigenous languages, says Te Taka Keegan, a computer scientist and Māori language revivalist.

Keegan was the keynote speaker at HELISET TŦE SḰÁL (Let the Languages Live), a Victoria conference on preserving and protecting languages across the planet that attracted more than 1,000 people.

Keegan and other presenters noted that data — on definitions, grammar, usage — is critical to preserving languages and encouraging their use.

But it’s also critical that Indigenous people own and control that data, said Keegan.

“I’ve heard it said that data colonization is the final colonization,” he said. “If it’s true, that’s the significance of data. We can be colonized through data. We need to be aware of that, and we need to take steps to make sure we’re not.”

Keegan then showed the audience a slide of him and his wife visiting Google’s headquarters.

“We went about 10 years ago,” he said. “I worked for them for six months. We worked on another tool, the Google Translator kit, but after that tool, Google Translate came out.”

Keegan said the goal was to develop a translation service based on statistical analysis and user responses. When he left the Google gig, the translation service wasn’t working “too bad,” he said.

Then he showed a slide with a Māori sentence on it. “Tenei au ka mihi atu ki a koutou katoa.”

Keegan followed that with two more slides. One from five years ago translated the Māori phrase as “Today I greet you all.” The next, from a few weeks ago, offered “I would like to thank you all” as a translation.

“So, the translation has changed,” Keegan said. “Most of the time I think not for the better. We haven’t had any input into that change. The way the system is set up it automatically gathers data, it automatically makes the change.”

The people he worked with at Google were great, he said.

“But to be honest, no one in Google really cared about the Māori language. The people that care about the Māori language are the people that speak the Māori language. If we want to create technologies for our own language, we have to do it ourselves.”

Keegan, whose mother is Māori, said his focus has been Māori language revitalization. But the issue of data sovereignty applies to all Indigenous people preserving languages, he said.

There are positive models.

Keegan talked about working with Microsoft on a translation project that involved the Māori community. The Microsoft translation hub relies on input and fact checking from Māori speakers, making it community-driven, he said.

Tracey Herbert, CEO of the First Peoples' Cultural Council, which put on the conference in partnership with UNESCO, also highlighted the importance of data management.

The council, a First Nations-run provincial Crown corporation, has an online program called FirstVoices that documents Indigenous languages in B.C.

It has developed strategies to manage and care for the data that has been collected and contributed by elders and people who speak the languages. Others can sign in and learn about the languages.

Daniel Yona, who manages the program, joked that running it was so demanding that he really shouldn’t even have been at the conference — he should have been at home updating the online server.

“Digital technology is one of the important strategies to ensure that B.C.’s endangered First Nations languages have an opportunity to survive and thrive into the future,” said Herbert. “When used together with language learning and cultural immersion, digital technology can strongly support the development of new fluent Indigenous language speakers.”

Herbert said that in the past, researchers have gone into Indigenous communities and taken the language “data” they wanted.

Indigenous people interested in reviving a language have been “forced to buy back their own knowledge,” she said.

“FirstVoices is about acknowledging Indigenous peoples as experts in our languages and as having the capacity to develop our own data and technology to support language learning and revitalization. It’s about taking control of our languages; the languages of the land,” said Herbert.

Keegan said data is at once a threat to Indigenous language sovereignty and an opportunity.

“Imagine if you could ask the phone in your Indigenous language, what is the meaning of a certain word or a certain place? And have your phone respond, not from an English, not from a colonized perspective, but have it respond in a perspective from your language... from your culture... from your world viewpoint,” he said.

“I don’t think that’s impossible. I don’t think that’ll happen for some of our languages in the immediate future. But I think it’ll happen eventually. So to do that, we must protect our language data.”