Emoji usage on reddit (Oct 2016 dataset)

December 04, 2016

NB: I'm not a statistician, just bored and wanted to learn some basic C#. This is not meant to be a scientifically rigorous study or analysis. If you have suggestions for improvements or anything else, feel free to send a message. Most of my code just throws numbers into Redis using specific key names for quick O(1) lookups.

No effort was made to filter out bots, so in some smaller subs bots might account for the vast majority of emojis.

Here is the percentage of comments that contain emojis, by subreddit, sorted from highest to lowest. A 'comment containing emoji' is a comment with one or more emojis in its body. For example, a single comment consisting of "🔥🔥🔥🔥🔥" adds 1 to the count of comments that contain emojis.
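To make the counting rule concrete, here is a minimal Python sketch (not the original C#/Redis code). The emoji test is an assumption: a rough check against a few Unicode code point ranges, since the detection rule actually used is not stated here. The function name percent_comments_with_emoji and the sample data are made up for illustration; "subreddit" and "body" are the field names used in the reddit comment dumps.

```python
from collections import defaultdict

# Assumption: a rough emoji test by code point range. The detection rule
# used for the real analysis is not stated, so this is only an approximation.
EMOJI_RANGES = [(0x1F300, 0x1FAFF), (0x2600, 0x27BF), (0x1F1E6, 0x1F1FF)]


def is_emoji(ch):
    """Return True if the character falls in one of the rough emoji ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EMOJI_RANGES)


def percent_comments_with_emoji(comments):
    """Percent of comments per subreddit that contain at least one emoji."""
    total = defaultdict(int)
    with_emoji = defaultdict(int)
    for comment in comments:
        sub = comment["subreddit"]
        total[sub] += 1
        # A comment counts once, no matter how many emojis it contains.
        if any(is_emoji(ch) for ch in comment["body"]):
            with_emoji[sub] += 1
    pct = {sub: 100.0 * with_emoji[sub] / total[sub] for sub in total}
    # Highest percentage first.
    return sorted(pct.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical sample data, not taken from the dataset.
sample = [
    {"subreddit": "furry", "body": "🔥🔥🔥🔥🔥"},
    {"subreddit": "furry", "body": "no emoji here"},
    {"subreddit": "csharp", "body": "plain text"},
]
result = percent_comments_with_emoji(sample)
```

With this sample, the five-fire comment adds 1 (not 5) to the count for its subreddit, so one of the two /r/furry comments counts as containing emoji.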


Here are emojis by use, relative to their own subreddit. These are not deduplicated: a single comment body containing "🔥🔥🔥🔥🔥" will count as 5 added to the total emoji use count for that subreddit, and 5 added to the total emoji use count for the dimensions (subreddit, 🔥).

For example, "% of emoji that is 🔥 in /r/furry" is computed as (number of 🔥 used in /r/furry) / (total number of emojis used in /r/furry) × 100.
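The per-use counting and the per-subreddit percentage can be sketched in Python like this (again, not the original C#/Redis code). Two counters stand in for the two totals described above: one keyed by subreddit, one keyed by (subreddit, emoji). The emoji test is an assumed rough code point range check, and the function names and sample comments are hypothetical.

```python
from collections import Counter

# Assumption: a rough emoji test by code point range. It counts code points,
# so multi-code-point emojis (skin tones, ZWJ sequences) are counted per part.
EMOJI_RANGES = [(0x1F300, 0x1FAFF), (0x2600, 0x27BF), (0x1F1E6, 0x1F1FF)]


def is_emoji(ch):
    """Return True if the character falls in one of the rough emoji ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EMOJI_RANGES)


def emoji_use_counts(comments):
    """Count every emoji occurrence, with no deduplication per comment."""
    sub_totals = Counter()  # subreddit -> total emoji uses
    sub_emoji = Counter()   # (subreddit, emoji) -> uses of that emoji
    for comment in comments:
        sub = comment["subreddit"]
        for ch in comment["body"]:
            if is_emoji(ch):
                sub_totals[sub] += 1
                sub_emoji[(sub, ch)] += 1
    return sub_totals, sub_emoji


def percent_of_sub(sub_totals, sub_emoji, subreddit, emoji):
    """Percent of a subreddit's emoji uses that are the given emoji."""
    total = sub_totals[subreddit]
    if total == 0:
        return 0.0
    return 100.0 * sub_emoji[(subreddit, emoji)] / total


# Hypothetical sample data, not taken from the dataset.
sample = [
    {"subreddit": "furry", "body": "🔥🔥🔥🔥🔥"},
    {"subreddit": "furry", "body": "nice 😂😂😂"},
    {"subreddit": "furry", "body": "ok 😂😂"},
]
sub_totals, sub_emoji = emoji_use_counts(sample)
fire_pct = percent_of_sub(sub_totals, sub_emoji, "furry", "🔥")
```

Here the first comment alone adds 5 to both the /r/furry total and the (furry, 🔥) count, and 🔥 ends up as 5 of the 10 emoji uses in the sample.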


Data set: Jason Baumgartner's collection of October 2016 reddit comments. Raw JSON files before chart/HTML processing: emoji_total_raw, emoji_sub_aggregation