Linguistics of Twitter

Summary

Dialectal changes in America are influencing expression online. This talk will discuss a current project that uses the Natural Language Toolkit to develop up-to-date reference materials for measuring and monitoring natural language online.

Description

Contrary to expectations, the prevalence of television did not lead every American to speak a common standard dialect. Rather, smaller sub-regional dialects are merging into stronger regional dialects, and the Northern Cities Vowel Shift represents the largest change in spoken English since the 1750s.

Social media is widely considered a conversational medium, and users often lean on their own dialect to express themselves.

Take a recent tweet, for example:

'_andBeautyKills: – after tonight, don’t leave your boy roun’ me, umma #true playa fareal.'

This tweet presents a problem for developers working within the traditional natural language processing paradigm:

  • Do they build out an extensive regular expression to handle it?
  • Even worse, do they reject it because it is not Standard English?
  • How do they respond so that communication is effective?
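
One middle path between a sprawling regular expression and outright rejection is a small token pattern paired with a dialect lexicon. The following is a minimal sketch using only Python's standard library, not the project's actual tooling: the DIALECT_LEXICON entries, their glosses, and the helper names tokenize and annotate are all hypothetical and chosen purely for illustration.

```python
import re

# Hypothetical, hand-built lexicon. These glosses are illustrative guesses,
# not entries from any published dialect reference.
DIALECT_LEXICON = {
    "roun'": "around",
    "umma": "I'm a",
    "playa": "player",
    "fareal": "for real",
}

# One compact token pattern instead of one regex per spelling variant:
# optional @ or # prefix, a word with internal or trailing apostrophes,
# or else any single non-space character (punctuation).
TOKEN_RE = re.compile(r"[@#]?\w+(?:'\w+)*'?|\S")


def tokenize(text):
    """Split a tweet into tokens after normalizing curly apostrophes."""
    return TOKEN_RE.findall(text.replace("\u2019", "'"))


def annotate(tokens, lexicon=DIALECT_LEXICON):
    """Pair each token with a standard gloss, or None if the lexicon has none."""
    return [(tok, lexicon.get(tok.lower())) for tok in tokens]


# Body of the tweet quoted above, without the user handle.
tweet = ("after tonight, don\u2019t leave your boy roun\u2019 me, "
         "umma #true playa fareal.")
tokens = tokenize(tweet)
glossed = annotate(tokens)

for tok, gloss in glossed:
    if gloss is not None:
        print(f"{tok} -> {gloss}")
```

Tokens without a lexicon entry come back paired with None, so unfamiliar forms are kept and can be flagged for review rather than discarded. In practice the regular expression would likely give way to a dedicated tokenizer such as NLTK's TweetTokenizer, and the hand-built dictionary to curated reference material.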

Tools and methodologies to process, understand, and respond to communication that falls outside Standard American English are currently under development in Python using the Natural Language Toolkit. This talk will focus on the status of existing tools, where development stands, the challenges facing traditional tools, and potential opportunities for exploration.

Although the talk is limited to American English, anyone studying natural language processing in any language is welcome and should find it useful. The techniques could be applied to any language around the world in which a motivated programmer is knowledgeable.