Swift - Natural language recognizer

Hi!

With iOS 12, Apple released a new framework for language recognition and other interesting stuff. It's called NaturalLanguage, and NLLanguageRecognizer is one of its classes.

Use the Natural Language framework to perform tasks like language and script identification, tokenization, lemmatization, parts-of-speech tagging, and named entity recognition. You can also use this framework with Create ML to train and deploy custom natural language models.

This framework provides a high-level API for lots of language detection features using text.

Let’s see an example:

こんにちは、私はアルベルトです、私はイタリアに住んでいて、私は不明な言語で書いています。

Which language is this?
How many words does this phrase contain?
Are there names inside? Places? Company names?

Who knows. Not me.

Answers:

JA, or better JAPANESE
23 words in this phrase
The English translation is “Hi, I’m Alberto, I live in Italy and I write in an unknown language.”, so yes, there is a name and a place inside.

Let’s try it with the new framework shared by iOS and macOS, NaturalLanguage, to see how it works.

Examine this phrase:

let string = "Ciao, sono Alberto, vivo a Bergamo e scrivo in an unknown language. Mi piace la CocaCola."

It’s mixed Italian and English, so we can use it as a good example.

DETECTING LANGUAGE(s)

import NaturalLanguage

let string = "Ciao, sono Alberto, vivo a Bergamo e scrivo in an unknown language. Mi piace la CocaCola."

// create a new recognizer
let languageRecognizer = NLLanguageRecognizer()
// that should read your string
languageRecognizer.processString(string)

// get up to 2 language hypotheses, each with its confidence
let hypotheses = languageRecognizer.languageHypotheses(withMaximum: 2)

// get the dominant language of the phrase
// (force-unwrapped for brevity: dominantLanguage is nil when no language can be detected)
let language = languageRecognizer.dominantLanguage!.rawValue

print("First language is  : \(language)")
print("Other languages are: \(hypotheses)")

output in console:

First language is  : it
Other languages are: [__C.NLLanguage(_rawValue: it): 0.9752411842346191, __C.NLLanguage(_rawValue: en): 0.009950380772352219]

We receive the languages and the confidence for each one. Good. Italian scores about 0.975, that is roughly 97.5%, so we can trust the algorithm.
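The confidence values are probabilities between 0 and 1, not percentages. Here is a minimal sketch that prints them as percentages, highest first. The helper describeHypotheses is hypothetical (it is not part of NaturalLanguage) and works on plain language codes, which you can get from each NLLanguage key through rawValue.

```swift
import Foundation

// Hypothetical helper, not part of NaturalLanguage:
// turns confidences in 0...1 into readable percentages, highest first.
func describeHypotheses(_ hypotheses: [String: Double]) -> [String] {
    return hypotheses
        .sorted { $0.value > $1.value }
        .map { entry in
            let percent = String(format: "%.1f", entry.value * 100)
            return "\(entry.key): \(percent)%"
        }
}

// Values copied from the console output of the article.
let sampleHypotheses = ["it": 0.9752411842346191, "en": 0.009950380772352219]

for line in describeHypotheses(sampleHypotheses) {
    print(line)
}
// it: 97.5%
// en: 1.0%
```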

TOKENIZE A TEXT:

Let’s count the words (or the sentences, or the paragraphs, or the whole document…) using NLTokenizer:

// create a new tokenizer
// choose your unit (word, sentence, paragraph, document)
let tokenizer = NLTokenizer(unit: .word)

// set your language (or use the discovered one: NLLanguage(rawValue: language))
tokenizer.setLanguage(.italian)

// link your string
tokenizer.string = string

// get the tokens (an array of ranges into the string)
let tokens = tokenizer.tokens(for: string.startIndex..<string.endIndex)

print("Words: \(tokens.count)")
// Words: 12
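The tokens are only ranges into the string. If you want the words themselves, a small sketch with enumerateTokens(in:using:) can collect them. The names text, wordTokenizer and words are mine, chosen so they don't clash with the code above.

```swift
import NaturalLanguage

let text = "Ciao, sono Alberto, vivo a Bergamo e scrivo in an unknown language. Mi piace la CocaCola."

let wordTokenizer = NLTokenizer(unit: .word)
wordTokenizer.string = text

// Collect each word as a String instead of only counting ranges.
var words: [String] = []
wordTokenizer.enumerateTokens(in: text.startIndex..<text.endIndex) { range, _ in
    words.append(String(text[range]))
    return true // keep enumerating
}

print(words)
```

Returning false from the closure stops the enumeration early, which is handy when you only need the first few words.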

EXTRACT pieces of information:

Another cool feature is tagging: with NLTagger you can extract tagged information such as people's names, places and organization names.

Let’s see how:

// create a tagger
let tagger = NLTagger(tagSchemes: [.nameType])

// set the text
tagger.string = string

// select the options
let options: NLTagger.Options = [
  .omitPunctuation,
  .omitWhitespace,
  .omitOther,
  .joinNames ]

// and the tags to extract
let tags: [NLTag] = [
  .personalName,
  .placeName,
  .organizationName ]

// enumerate all the tags and print only the ones we asked for
tagger.enumerateTags(
  in: string.startIndex..<string.endIndex,
  unit: .word,
  scheme: .nameType,
  options: options) { tag, tokenRange in
    if let tag = tag, tags.contains(tag) {
        print("\(tag.rawValue) -> \(string[tokenRange])")
    }
    return true // continue with the next token
}

The result is nice. With mixed languages something strange happens (the English fragment is tagged as an organization), but it’s OK.

PersonalName -> Alberto
PlaceName -> Bergamo
OrganizationName -> an unknown language
OrganizationName -> CocaCola
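If you need the results in code rather than in the console, the same enumeration can be wrapped in a function. This is only a sketch: extractNames(from:) is a hypothetical helper of mine, and what it returns depends on the system language models, so I show no expected output.

```swift
import NaturalLanguage

// Hypothetical helper: returns (tag, value) pairs for the names found in `text`.
func extractNames(from text: String) -> [(tag: String, value: String)] {
    let tagger = NLTagger(tagSchemes: [.nameType])
    tagger.string = text

    let wanted: [NLTag] = [.personalName, .placeName, .organizationName]
    let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace, .omitOther, .joinNames]

    var found: [(tag: String, value: String)] = []
    tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                         unit: .word,
                         scheme: .nameType,
                         options: options) { tag, range in
        // Keep only the three name types, skip everything else.
        if let tag = tag, wanted.contains(tag) {
            found.append((tag: tag.rawValue, value: String(text[range])))
        }
        return true // continue with the next token
    }
    return found
}

for entry in extractNames(from: "Ciao, sono Alberto, vivo a Bergamo.") {
    print("\(entry.tag) -> \(entry.value)")
}
```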

EXTRA

Since you know the language of the phrase, you can easily have the text spoken in the correct language using AVSpeechSynthesizer!

import AVFoundation

// keep a reference to the synthesizer while it is speaking
let speechSynthesizer = AVSpeechSynthesizer()

let speechUtterance = AVSpeechUtterance(string: string)

//speechUtterance.rate = 0.7
speechUtterance.volume = 1.0

// set your discovered language
speechUtterance.voice = AVSpeechSynthesisVoice(language: language)

speechSynthesizer.speak(speechUtterance)

Instead of using these old techniques that make me laugh now… 😉

[ObjectiveC] Text to Speech with Google Translate

[Objective-C] Use Google speech on iPhone

ObjC – Tesla speech for OSX

And that’s all for now.

Go deeper into this framework, because it is very interesting.

