
Behind the Scenes: Learn About the NLP Engine of Radarr (Part 1)

  April 21, 2022
  Radarr Admin
  Behind The Scenes

You may know Radarr as one of the top tools for social listening, monitoring and analytics. What makes us stand out is our technology, and in this post we're sharing a behind-the-scenes look at the NLP engine that powers the insights you get from our dashboard.

Radarr leverages powerful Natural Language Processing (NLP) algorithms to uncover quick and actionable insights from billions of online conversations. The NLP Engine can process data in more than 140 languages. Apart from English, it specializes in Asian languages such as Indonesian, Chinese, Japanese, Vietnamese and Thai, and it provides analytics at both a micro (per-post) level and a macro level.

This article has two parts. The first part walks you through the key steps Radarr's NLP Engine takes to transform unstructured, messy data into a structured, machine-understandable format using data preprocessing techniques.


Steps to organizing your data in the NLP engine

Step 1: Language identification 

The first and most important step is Language Identification: automatically detecting the language(s) present in each conversation and in the queries built using the Query Engine.

Radarr uses an ensemble of models and techniques to infer the language of a text. 

Language models built on a training dataset that mixes social media and formal language data provide the basis for our language detection.

For short-form text, where predicting the language is a challenge, we resort to statistical approaches based on the language-specific vocabularies that we have built in-house over time.

In cases where the language has still not been identified, we use the country and locale to give us hints about the language.
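The fallback order described above can be sketched as a simple cascade. This is a minimal illustration, not Radarr's implementation: the vocabularies, the country mapping, the `model` callable and the `min_conf` threshold are all hypothetical placeholders.

```python
from typing import Callable, Optional, Tuple

# Hypothetical, tiny vocabularies; a production system would use far larger lists.
VOCAB = {
    "en": {"the", "is", "and", "good", "love"},
    "id": {"yang", "dan", "tidak", "bagus", "saya"},
    "vi": {"không", "và", "của", "tôi", "rất"},
}

# Hypothetical mapping from a country code to a likely language.
COUNTRY_HINT = {"ID": "id", "VN": "vi", "US": "en"}


def vocab_guess(text: str) -> Optional[str]:
    """Pick the language whose vocabulary matches the most tokens, if any match."""
    tokens = text.lower().split()
    scores = {lang: sum(t in words for t in tokens) for lang, words in VOCAB.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def identify_language(
    text: str,
    country: Optional[str] = None,
    model: Optional[Callable[[str], Tuple[str, float]]] = None,
    min_conf: float = 0.8,
) -> Optional[str]:
    # Stage 1: a trained model (assumed to return a language and a confidence).
    if model is not None:
        lang, conf = model(text)
        if conf >= min_conf:
            return lang
    # Stage 2: statistical fallback based on language-specific vocabularies.
    guess = vocab_guess(text)
    if guess is not None:
        return guess
    # Stage 3: use the country or locale as a hint.
    return COUNTRY_HINT.get(country) if country else None
```

For example, `identify_language("saya tidak tahu")` resolves through the vocabulary stage, while a text with no known words falls through to the country hint.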


Once the language has been identified, the data stream is distributed into multiple preprocessing pipelines for groups of languages. 

This step is essential as certain groups of languages have their own unique vocabulary, tokenization methodology and direction of writing among other differences. 

For example, languages like Japanese and Chinese do not have spaces between their words, and languages like Arabic and Urdu are written from right to left.
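As a rough sketch of this routing step, posts can be grouped by the pipeline that suits their language. The group names and the language codes assigned to each group are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical language groups, based on the examples in the text.
PIPELINE_GROUPS = {
    "no_space": {"zh", "ja"},  # no spaces between words
    "rtl": {"ar", "ur"},       # written right to left
}


def pipeline_for(lang: str) -> str:
    """Return the name of the preprocessing pipeline for a language code."""
    for group, langs in PIPELINE_GROUPS.items():
        if lang in langs:
            return group
    return "whitespace"  # default group for space-delimited languages


def route(posts: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Distribute (language, text) pairs into per-pipeline queues."""
    queues: Dict[str, List[str]] = defaultdict(list)
    for lang, text in posts:
        queues[pipeline_for(lang)].append(text)
    return dict(queues)
```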

Step 2: Tokenization 

The next step is to break down streams of text into words, terms, sentences, or other meaningful elements called tokens. The Tokenization step has a very important effect on the rest of the pipeline because tokens form the basis for chunking the text into meaningful pieces for further lexical analysis.

The simplest form of tokenization is whitespace tokenization, which splits text into words wherever a space occurs. This works for languages that separate words with spaces, such as those written in Latin script. To tokenize text meaningfully in every language, Radarr uses multiple models pre-trained on social data across industries to learn the vocabulary of each language.
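To make the contrast concrete, here is a minimal sketch of whitespace tokenization next to a dictionary-based alternative for text without spaces. The greedy longest-match function is a simple stand-in chosen for illustration; it is not the pre-trained models mentioned above, and the vocabulary passed to it is hypothetical.

```python
from typing import List, Set


def whitespace_tokenize(text: str) -> List[str]:
    """Split on runs of whitespace."""
    return text.split()


def max_match_tokenize(text: str, vocab: Set[str], max_len: int = 4) -> List[str]:
    """Greedy longest-match segmentation against a known vocabulary."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens
```

Whitespace splitting leaves a Chinese sentence such as `我喜欢寿司` as a single token, whereas the dictionary-based approach can recover the individual words if they are in its vocabulary.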

Step 3: Normalization 

Once the posts have been segmented into meaningful tokens, they are converted into their normalized form. 

Normalization is the process of converting tokens to a standard base form so that semantic comparison across conversations becomes easier. An ensemble of methods, such as lemmatization and stemming, is used to normalize words.
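A toy example shows the idea: a lookup table handles irregular forms (lemmatization) and a crude suffix stripper handles the rest (stemming). The lemma table and suffix list here are hypothetical and far simpler than anything used in production.

```python
# Hypothetical lemma lookup for irregular forms.
LEMMAS = {"went": "go", "better": "good", "mice": "mouse"}

# Hypothetical suffix list for a crude stemmer.
SUFFIXES = ("ing", "ed", "s")


def stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[:-len(suffix)]
    return token


def normalize(token: str) -> str:
    """Lowercase, then prefer a known lemma and fall back to stemming."""
    token = token.lower()
    return LEMMAS.get(token) or stem(token)
```

With this sketch, "Playing", "played" and "plays" all map to the same form, which is what makes comparison across posts easier.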

Step 4: Vectorization 

Finally, before we glean insights from these billions of conversations, we convert all the text data into a machine-understandable vector format, known as embeddings. This lets us apply advanced NLP techniques such as making word and sentence predictions, finding word and sentence similarities, and understanding text semantics. This step of Vectorization, or text representation, forms the foundation for all of Radarr's Advanced NLP models.
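The simplest way to see why vectors are useful is to compare two posts numerically. The sketch below uses plain word counts rather than learned embeddings, so it only illustrates the principle; the vocabulary and the example tokens are made up.

```python
import math
from collections import Counter
from typing import List


def vectorize(tokens: List[str], vocabulary: List[str]) -> List[int]:
    """Represent a post as counts of each vocabulary term."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def cosine(a: List[int], b: List[int]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


vocabulary = ["battery", "great", "phone", "slow"]
post_a = vectorize(["great", "phone"], vocabulary)
post_b = vectorize(["great", "battery"], vocabulary)
post_c = vectorize(["slow"], vocabulary)
```

Here `cosine(post_a, post_b)` is higher than `cosine(post_a, post_c)` because the first two posts share a term. Learned embeddings go further by placing related words close together even when the exact terms differ.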


So when it comes to being able to listen to and monitor online conversations in multiple languages, our NLP engine is one of the most robust out there. 

But that’s not all.

In Part 2 of this article, we will explain the insights that we extract using Advanced NLP techniques after the initial data preprocessing.

To be notified, don’t forget to subscribe to our blog or try Radarr.
