Behind the Scenes: Learn About the NLP Engine of Radarr (Part 1)

  April 21, 2022
  Radarr Admin
  Behind The Scenes

You all know Radarr as one of the top tools for social listening, monitoring and analytics. What makes us stand out is our technology, and in this blog we’re sharing a behind-the-scenes look at the NLP engine that powers the insights you derive from the Radarr dashboard.

Radarr leverages powerful Natural Language Processing (NLP) algorithms to uncover quick, actionable insights from billions of online conversations. The NLP Engine can process data in more than 140 languages. Apart from English, it specializes in Asian languages such as Indonesian, Chinese, Japanese, Vietnamese and Thai, and it provides analytics both at a micro, per-post level and at a macro level.

This article comes in two parts. This first part walks you through the key data preprocessing steps that Radarr’s NLP Engine uses to transform unstructured, messy data into a structured, machine-understandable format.


Steps to organizing your data in the NLP engine

Step 1: Language identification 

The first and most important step is Language Identification: automatically detecting the language(s) present in each conversation and in the queries built using the Query Engine.

Radarr uses an ensemble of models and techniques to infer the language of a text. 

Language models trained on a dataset that mixes social media and formal language data provide the basis for our language detection.
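To make the idea of a language model for detection concrete, here is a minimal Python sketch of one classic approach: a character-trigram model with add-one smoothing, which picks the language under which a text is most likely. The post does not say which models Radarr actually uses, so treat this as an illustrative stand-in; the function names and the tiny training samples are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Split lowercased text, padded with spaces, into overlapping character n-grams."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train_profiles(samples: dict[str, list[str]], n: int = 3) -> dict[str, Counter]:
    """Count n-gram frequencies per language from labelled example texts."""
    return {
        lang: Counter(g for text in texts for g in char_ngrams(text, n))
        for lang, texts in samples.items()
    }


def detect_language(text: str, profiles: dict[str, Counter], n: int = 3) -> str:
    """Return the language whose smoothed n-gram model gives the text the highest log-likelihood."""
    vocab_size = len(set().union(*profiles.values()))

    def log_likelihood(profile: Counter) -> float:
        total = sum(profile.values())
        # Add-one (Laplace) smoothing so unseen n-grams don't zero out the score.
        return sum(
            math.log((profile[g] + 1) / (total + vocab_size))
            for g in char_ngrams(text, n)
        )

    return max(profiles, key=lambda lang: log_likelihood(profiles[lang]))


# Hypothetical toy training data; a real system would train on far larger corpora.
SAMPLES = {
    "en": ["the weather is great today", "i love this new phone", "what a great game this was"],
    "id": ["saya suka sekali dengan produk ini", "cuaca hari ini sangat bagus",
           "terima kasih banyak untuk semuanya"],
}
PROFILES = train_profiles(SAMPLES)
print(detect_language("this phone is great", PROFILES))      # en
print(detect_language("produk ini sangat bagus", PROFILES))  # id
```

Character n-grams are a common choice at this stage because they capture spelling patterns without a tokenizer, which matters when the language (and therefore the right tokenizer) is not yet known.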

For short-form text, where the language is harder to predict, we fall back on statistical approaches based on language-specific vocabularies that we have built in-house over time.

If the language still cannot be identified, we use the post’s country and locale as hints about the language.
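The three-stage fallback described above (model, then vocabulary, then locale) can be sketched as a simple cascade. This is an illustration under stated assumptions, not Radarr’s code: `identify_language`, the confidence and length thresholds, the toy vocabularies and the locale table are all hypothetical, and `model` stands for any callable that returns a `(language, confidence)` pair.

```python
from typing import Callable, Optional

# Hypothetical in-house vocabularies of distinctive (often informal) words per language.
VOCABULARIES: dict[str, set[str]] = {
    "id": {"gak", "banget", "dong", "aja", "yg"},
    "en": {"lol", "omg", "gonna", "the"},
}
# Hypothetical country-code -> most likely language table, used as a last resort.
LOCALE_HINTS: dict[str, str] = {"ID": "id", "VN": "vi", "TH": "th", "JP": "ja", "US": "en"}

Model = Callable[[str], tuple[str, float]]


def vocabulary_vote(text: str) -> Optional[str]:
    """Pick the language whose vocabulary matches the most tokens, if any token matches."""
    tokens = text.lower().split()
    hits = {lang: sum(tok in vocab for tok in tokens) for lang, vocab in VOCABULARIES.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


def identify_language(text: str, model: Model, country: Optional[str] = None,
                      min_confidence: float = 0.8, min_model_chars: int = 20) -> Optional[str]:
    # Stage 1: trust the statistical model only on longer texts with a confident prediction.
    if len(text) >= min_model_chars:
        lang, confidence = model(text)
        if confidence >= min_confidence:
            return lang
    # Stage 2: short or ambiguous text -> in-house vocabulary lookup.
    lang = vocabulary_vote(text)
    if lang is not None:
        return lang
    # Stage 3: fall back to the country/locale hint, if there is one.
    if country:
        return LOCALE_HINTS.get(country.upper())
    return None


def confident_model(text: str) -> tuple[str, float]:
    return ("en", 0.95)  # hypothetical stub standing in for a trained model


print(identify_language("what a fantastic match last night", confident_model))  # en
print(identify_language("gak apa2 banget", confident_model))                    # id
print(identify_language("mantap 👍", confident_model, country="ID"))            # id
```

Ordering the stages from most to least informative means the locale, which says nothing about the text itself, is consulted only when nothing better is available.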


Once the language has been identified, the data stream is distributed into multiple preprocessing pipelines for groups of languages. 

This step is essential because groups of languages differ in vocabulary, tokenization methodology and writing direction, among other things.

For example, languages like Japanese and Chinese do not put spaces between words, and languages like Arabic and Urdu are written from right to left.
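Routing identified posts into per-group pipelines could look roughly like the Python sketch below. The grouping, the `PipelineGroup` fields and the choice to send unknown languages to the whitespace-based pipeline are all assumptions made for illustration; the post does not describe Radarr’s actual groups.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineGroup:
    name: str
    needs_segmentation: bool  # words are not separated by spaces
    right_to_left: bool       # script is written right to left


# Hypothetical grouping for illustration only.
GROUPS = {
    "latin": PipelineGroup("latin", needs_segmentation=False, right_to_left=False),
    "cjk": PipelineGroup("cjk", needs_segmentation=True, right_to_left=False),
    "thai": PipelineGroup("thai", needs_segmentation=True, right_to_left=False),
    "rtl": PipelineGroup("rtl", needs_segmentation=False, right_to_left=True),
}
LANGUAGE_TO_GROUP = {
    "en": "latin", "id": "latin", "vi": "latin",
    "zh": "cjk", "ja": "cjk",
    "th": "thai",
    "ar": "rtl", "ur": "rtl",
}


def group_for(lang: str) -> PipelineGroup:
    # Assumption: unknown languages default to the whitespace-based "latin" pipeline.
    return GROUPS[LANGUAGE_TO_GROUP.get(lang, "latin")]


def route(posts: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Distribute (language, text) pairs into per-group queues."""
    queues: dict[str, list[str]] = defaultdict(list)
    for lang, text in posts:
        queues[group_for(lang).name].append(text)
    return dict(queues)


print(route([("en", "great service"), ("ja", "素晴らしい"), ("ur", "بہت اچھا"), ("xx", "???")]))
```

Keeping per-group properties such as `needs_segmentation` and `right_to_left` in one place lets each downstream step (tokenization, normalization) branch on a flag instead of on long lists of language codes.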

Step 2: Tokenization 

The next step is to break streams of text down into words, terms, sentences or other meaningful elements called tokens. Tokenization has a major effect on the rest of the pipeline, because tokens are the units into which text is chunked for further lexical analysis.

The simplest form of tokenization is whitespace tokenization, which splits text on whitespace alone; this works reasonably well for Latin-script languages but not for languages that don’t separate words with spaces. Radarr uses multiple models pre-trained on social data across industries to learn the vocabulary of each language and tokenize text meaningfully.
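The contrast between whitespace tokenization and vocabulary-aware tokenization can be shown in a few lines of Python. The sketch below swaps in greedy forward maximum matching, a simple dictionary-based segmentation technique, purely for illustration; Radarr uses pre-trained models, which this does not reproduce, and the tiny Chinese dictionary is hypothetical.

```python
def whitespace_tokenize(text: str) -> list[str]:
    """Split on runs of whitespace; adequate for space-delimited scripts."""
    return text.split()


def max_match(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Greedy forward maximum matching: at each position, take the longest dictionary word.

    Characters not covered by any dictionary word fall back to single-character tokens.
    """
    tokens: list[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy dictionary; a production system would learn its vocabulary from data.
ZH_DICT = {"我", "喜欢", "这个", "手机"}

print(whitespace_tokenize("I love this phone"))  # ['I', 'love', 'this', 'phone']
print(whitespace_tokenize("我喜欢这个手机"))       # ['我喜欢这个手机'] (nothing to split on)
print(max_match("我喜欢这个手机", ZH_DICT))         # ['我', '喜欢', '这个', '手机']
```

The second call shows why a single whitespace tokenizer is not enough: the Chinese sentence ("I like this phone") comes back as one token, while the dictionary-based segmenter recovers the words.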

Step 3: Normalization 

Once the posts have been segmented into meaningful tokens, they are converted into their normalized form. 

Normalization is the process of converting tokens to a standard base form so that conversations are easier to compare semantically. An ensemble of methods, including lemmatization and stemming, is used to normalize words.
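A minimal Python sketch of combining the two methods: look up irregular forms in a lemma table first, and fall back to suffix stripping otherwise. The lemma table and the suffix rules are hypothetical and deliberately crude; real stemmers such as Porter or Snowball apply many more rules, and this is not Radarr’s implementation.

```python
# Hypothetical lemma table for irregular forms that suffix stripping can't handle.
LEMMAS = {"is": "be", "was": "be", "were": "be", "better": "good", "mice": "mouse"}
# Deliberately crude suffix list, checked in order.
SUFFIXES = ("ing", "ed", "s")


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(tokens: list[str]) -> list[str]:
    """Lowercase each token, then prefer a dictionary lemma and fall back to stemming."""
    normalized = []
    for token in tokens:
        word = token.lower()
        normalized.append(LEMMAS.get(word) or stem(word))
    return normalized


print(normalize(["Playing", "played", "PLAYS"]))  # ['play', 'play', 'play']
print(normalize(["was", "Better", "mice"]))       # ['be', 'good', 'mouse']
```

After normalization, posts that say "playing", "played" or "plays" all contribute to the same term, which is what makes comparison across conversations easier.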

Step 4: Vectorization 

Finally, before we glean insights from these billions of conversations, we convert all the text data into a machine-understandable vector format, known as embeddings. Embeddings make it possible to apply advanced NLP techniques such as predicting words or sentences, measuring word and sentence similarity, and understanding text semantics. This vectorization (text representation) step forms the foundation for all of Radarr’s advanced NLP models.
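To show what "text as vectors" means in practice, here is a small Python sketch that turns tokenized posts into TF-IDF vectors and compares them with cosine similarity. TF-IDF is a sparse, count-based representation and is not the same as the learned embeddings described above; it is used here only because it fits in a few self-contained lines. The example documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    """Map each tokenized document to a TF-IDF vector over a shared, sorted vocabulary."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))  # document frequency
    # Smoothed inverse document frequency: rarer terms get higher weight.
    idf = {tok: math.log((1 + n) / (1 + df[tok])) + 1 for tok in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc) or 1  # avoid dividing by zero for empty documents
        vectors.append([tf[tok] / length * idf[tok] for tok in vocab])
    return vocab, vectors


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


docs = [
    ["love", "phone", "camera"],
    ["love", "phone", "battery"],
    ["weather", "rain", "today"],
]
vocab, vectors = tfidf_vectors(docs)
print(cosine(vectors[0], vectors[1]))  # positive: the posts share "love" and "phone"
print(cosine(vectors[0], vectors[2]))  # 0.0: no shared terms
```

Learned embeddings improve on this by placing related words (say, "phone" and "smartphone") close together even when the exact strings differ, which a count-based vector cannot do.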


So when it comes to listening to and monitoring online conversations in multiple languages, our NLP engine is one of the most robust out there.

But that’s not all.

In Part 2 of this article, we will explain the insights we extract using advanced NLP techniques once this initial preprocessing is done.

To be notified when it’s published, subscribe to our blog or try Radarr.
