Coin Nup Forum



#1 2020-09-01 10:16:52

From: Denmark
Registered: 2020-09-01
Posts: 1

BotXO Releases the First-Ever Norwegian Bert Model

BotXO Releases the First-Ever Norwegian Bert Model, Improves the Danish Model, and Starts a Model Zoo Initiative.

BotXO’s open-source Danish BERT model has sparked quite a bit of interest: the Danish newspaper Børsen wrote an article about it, and many Danish data scientists have participated in discussions about it on GitHub. Many of our customers here at BotXO are also running experiments and using the models in different projects.

Today, the BotXO data science team is releasing the first-ever BERT model trained on Norwegian data.
Most importantly, we hope that the model will help data scientists in Norway build state-of-the-art Natural Language Processing solutions. We encourage Norwegian data scientists and managers to reach out to us, just as the Danish community did.

Today, we are also releasing an improved version of the Danish model.

You can find both the updated Danish BERT model and the new Norwegian BERT model in the same GitHub repository.
Why Release a Norwegian Model?

The Norwegian language is used only in Norway, where there are approximately 4.6 million native speakers. As with Danish, this means that the language is often overlooked by Natural Language Processing tools. By open-sourcing a Norwegian BERT model, we hope to help the community build their own Natural Language Processing solutions.

Our chatbots at BotXO support Norwegian out of the box, and with our prebuilt Norwegian intents it is easy to get started setting up a state-of-the-art chatbot.
How Are the Models Trained?

We train our BERT models on a new kind of computer chip called a TPU, short for Tensor Processing Unit. As the name suggests, the chip is excellent at “tensor” operations, exactly the kind of operations needed to train deep neural networks. Renting Google’s TPUs, which is the only way to access them, costs a lot of money.

In the same way that “vector” means a list of numbers and “matrix” means a rectangle of numbers, “tensor” is just a fancy word for a box of numbers. A 1-dimensional tensor is a vector, a 2-dimensional tensor is a matrix, and anything with more dimensions (such as a box) is simply called a tensor.
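To make the vector/matrix/tensor distinction concrete, here is a small illustrative sketch in plain Python, with no TPU or deep-learning library involved. The helpers `shape` and `matvec` are hypothetical names introduced only for this example: `shape` reports the dimensions of a nested list, and `matvec` multiplies a matrix by a vector, the kind of tensor operation at the core of a dense neural-network layer and that chips like the TPU are designed to accelerate.

```python
from typing import Any, List, Tuple


def shape(tensor: Any) -> Tuple[int, ...]:
    """Return the dimensions of a regular (non-ragged) nested list of numbers."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0] if tensor else None  # descend one level
    return tuple(dims)


def matvec(matrix: List[List[float]], vector: List[float]) -> List[float]:
    """Multiply a matrix (a list of rows) by a vector: the core of a dense layer."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


vector = [1.0, 2.0, 3.0]                      # 1-D: a list of numbers
matrix = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]                    # 2-D: a rectangle of numbers
box = [[[0.0] * 4 for _ in range(3)] for _ in range(2)]  # 3-D: a box of numbers

print(shape(vector), shape(matrix), shape(box))  # (3,) (2, 3) (2, 3, 4)
print(matvec(matrix, vector))                     # [14.0, 32.0]
```

A single number has shape `()`, so in this view it is a 0-dimensional tensor. Real frameworks store tensors as flat, contiguous arrays rather than nested lists, which is what makes hardware acceleration practical.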
Because TPUs are expensive to use, it is important to make the algorithms run as fast as possible to keep costs down.

Where Do the Training Data Come From?

We use text fetched from the internet to train our BERT models. The non-profit organization Common Crawl periodically gathers huge amounts of data from the internet, and by automatically detecting the language of each text, we can create a data set of, for example, Norwegian text. Because it takes a long time to read through such vast amounts of data, we run our algorithms on multiple computers at once and make sure the algorithms themselves are as fast as possible.
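The article does not say which language detector or distributed setup BotXO uses. As a minimal sketch under those unknowns, the Python below swaps in a toy marker-word detector (the word lists `MARKERS` and the helpers `detect_language`, `keep_if_norwegian` and `filter_norwegian` are all hypothetical) and uses the standard library's `multiprocessing.Pool` on one machine as a small-scale stand-in for running on several computers.

```python
import re
from multiprocessing import Pool
from typing import Iterable, List, Optional

# Hypothetical toy marker words that differ between Norwegian (Bokmål),
# Danish and English. A real pipeline would use a trained language
# identifier; these short lists are for illustration only.
MARKERS = {
    "no": {"av", "hva", "nå", "noen", "mye", "kanskje"},
    "da": {"af", "hvad", "nu", "nogen", "meget", "måske"},
    "en": {"the", "and", "of", "is", "what", "some"},
}


def detect_language(text: str, min_hits: int = 2) -> Optional[str]:
    """Guess the language of `text` by counting marker words.

    Returns the language code with the most hits, or None when no language
    reaches `min_hits` (too little evidence to decide).
    """
    tokens = re.findall(r"\w+", text.lower())  # \w is Unicode-aware, so "nå" stays whole
    scores = {lang: sum(tok in words for tok in tokens) for lang, words in MARKERS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_hits else None


def keep_if_norwegian(doc: str) -> Optional[str]:
    """Return the document if it looks Norwegian, otherwise None."""
    return doc if detect_language(doc) == "no" else None


def filter_norwegian(docs: Iterable[str], processes: int = 4) -> List[str]:
    """Filter documents in parallel worker processes on a single machine."""
    with Pool(processes) as pool:
        results = pool.map(keep_if_norwegian, docs, chunksize=64)
    return [doc for doc in results if doc is not None]


if __name__ == "__main__":  # guard required when worker processes are spawned
    sample = [
        "Hva er det du vil ha av meg nå?",
        "Hvad vil du have af mig nu?",
        "What is the name of the model?",
    ]
    print(filter_norwegian(sample, processes=2))
```

Separating closely related languages such as Norwegian and Danish is exactly where a handful of marker words breaks down, so a trained language identifier would be needed in practice; likewise, a real Common Crawl pipeline would stream documents from the crawl's archive files rather than an in-memory list.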
What Are We Going to Do Next?

Now that we have released a Norwegian model, we are going to target the remaining Nordic languages, which are Swedish and Finnish. Afterwards, we had originally planned to start working on the remaining European languages.

However, since Natural Language Processing research is progressing so rapidly, it is increasingly challenging to maintain a repository of models that keep up with state-of-the-art research. That is why we have decided on a different strategy: rather than releasing more European models, we are going to release our data sets, formatted for training new BERT models, in many different languages. Above all, we hope that the European NLP community will help us train models that keep up with state-of-the-art general-purpose language models.

Please share this article, and remember to check the blog regularly for updates on our new Model Zoo initiative!

Article written by Jens Dahl Møllerhøj

The post BotXO Releases the First-Ever Norwegian Bert Model appeared first on BotXO.

