Python Natural Language Processing Cookbook book announcement

I am happy to announce that my book, Python Natural Language Processing Cookbook, is now available in print and as a Kindle book. In this blog post I will tell you more about the book so that you can decide whether it is for you. The first important thing about the book is …

Ensuring data quality for LaborEdge: a case study

In August I did a data quality project with LaborEdge, automating part of their data standardization process using Natural Language Processing (NLP) techniques. Data standardization is part of a larger host of tasks ensuring data quality. In this post I am going to tell you about the project, data quality and how NLP can help …

How Natural Language Processing (NLP) can reduce costs, improve productivity, and raise profits in business

Almost all businesses deal with texts: invoices, proposals, research papers, reports, resumes, job descriptions, emails, news, and other documents. Some of these text documents require processing, such as sorting, extracting and entering information into a database, evaluating for sentiment, and so on. According to Forbes Magazine, 84% of businesses still rely on some sort of …

Social media and customer review analysis can unlock insights present in the data. Reviews are split into snippets, and the snippets are classified into topics/trends that provide feedback on all aspects of a business.

Analyzing Social Media Posts and Customer Reviews by Topics: an Important Data-driven Marketing Tool that Helps Reveal Trends in Different Business Aspects

Social media posts and reviews from review sites like Yelp are an important and powerful marketing and feedback tool for businesses. They are readily available on the Internet and provide customers’ opinions on different aspects of restaurants, car mechanics, cinemas, IT consulting firms, mobile apps, Internet hosting companies and every other kind of business out …

The Latest on the Chatbot Craze: What Are They, How They Can Help Your Business and Why Using Them Is Smart

So what are chatbots all about? They are a general category of programs that interact with people via messaging or speech. Speech bots include Alexa and Siri. In this post I will talk about messaging chatbots, which use your favorite messaging platform: Facebook, WhatsApp, Slack, Telegram, Kik, etc. The program interacts …

LTE: Extracting relevant text and features from HTML

If you have lots of HTML files that you collected for a project, chances are you can’t really use those files as is. Usually, you are looking to extract some information from these files, for example, an article (like in my project), a product description, or user reviews. Depending on the data you want to …

LTE: Urban, suburban and rural discourse

After I labeled all the data by topic, I took a quick look at the topic tallies, both overall and by newspaper, keeping in mind the newspapers’ locations. The overall percentages are shown in the graph below: Some of these stats were expected, others surprising. I expected most of the letters to be about politics …

LTE: Labeling data for machine learning

For the project of automatically assigning topics to the letters to the editor, I needed labeled data. Sometimes blog posts or newspaper articles have assigned labels (for example, this post is tagged with “machine learning” and “natural language processing”). However, none of the newspapers I got my data from did that. Thus, …

LTE: Scraping web-sites to collect data

In this post, I detail how I collected the data for the letters to the editor corpus analysis project. First, I picked several web-sites where I could access letters to the editor archives: Chicago Tribune from Illinois; The Citizen from Georgia; Daily Herald from Illinois; Dubois County Free Press from Indiana; Ellsworth American from Maine …
