Datamining Flamewars… err… Forums
Timothy Burke’s remarks regarding the value of forums got me thinking about the LACK of tools to expedite forum datamining. I’m more familiar with phpBB and other open, well-supported forum tools, but I couldn’t find many tangible tools while browsing the sites for Lithium (used by SOE) and UBB.threads (common with NCSoft).
Some things can be done without specialized tools.
- Identify a few good community members with reasonable communication skills and make bookmarks to the player search for each one. Follow them. They can be your “pulse” on the community.
- Most boards have the capability of tracking “hot topics.” See where the action is.
Neither solution is optimal. You could have tools that help you identify these community leaders- or rate users privately… something to make identifying key members less static and less arbitrary. Hot topics are often cluttered with faddish forum games or quirky sidenotes. Heuristic analysis is nonexistent.
Even when a hot topic is identified, determining its relevance is a bit tough. Posters can be convinced rather quickly that an issue is big enough to merit a response- and that their point of view is the dominant one. Even devs can fall for this false sense of “urgency.”
City of Heroes had a rather decent flareup a few months back, and I took some time to tinker with a text parser to get a more detailed look. My results follow the break.
The Issue
Some players were unaware of the full mechanics that take place when a character “exemplars” (de-levels) to team with another player. Back when the system was implemented, a forum post detailed it, but the general lack of solid numbers available to the player made the base unaware that a bug made players more powerful- and play much less challenging- than intended.
They noticed the effects of the bugfix, though.
The forum flareup looked impressive enough. Over 600 posts in two weeks, with a vocal playerbase calling for dev action.
Method:
I made a basic text parser to cache all the posts of a thread, sort by user, and present me with one post from each user. If that post definitively identified the user as for or against the developer action, they were marked as such. If it didn’t, I was served another post from the same user. I assumed that people would post more than once, but not change their position, allowing me to get a feel for the issue without reading all 600 posts. As I supported the dev position, I gave “benefit of the doubt” to my opponents, in part to offset any personal bias in reading the more ambiguous positions.
I rated all posts as:
- “this is an issue” - people who, in at least one post, said this was bad. They didn’t have to violently oppose it, but they were NOT satisfied with the status quo and may have just provided better alternatives.
- “NBD” - no big deal- people who didn’t see the issue or expressed that they did not consider it worthy of developer time.
- “dev” - nuff said
- “ambiguous” - people who spoke about the topic, but didn’t take a side. They often asked questions without deciding.
- “tangent” - people that were totally off topic.
It wasn’t perfect, but it should give me some idea of where the players stand.
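The serving loop described above can be sketched roughly as follows. This is a minimal sketch and not the original parser: the `rate` callback stands in for the human reader, the label strings are shorthand for the categories listed above, and the fallback rule (a user with no decisive post counts as "ambiguous" if any post was on topic, otherwise "tangent") is an assumption.

```python
from collections import defaultdict

# Shorthand labels for the five categories; the names are illustrative.
LABELS = {"issue", "nbd", "dev", "ambiguous", "tangent"}
DECISIVE = {"issue", "nbd", "dev"}


def group_by_user(posts):
    """Group (user, text) pairs by user, keeping thread order per user."""
    by_user = defaultdict(list)
    for user, text in posts:
        by_user[user].append(text)
    return by_user


def triage(posts, rate):
    """Label each user while reading as few of their posts as possible.

    rate(user, text) is a hypothetical callback (a human reader in the
    experiment described here) that returns one of LABELS.
    """
    result = {}
    for user, texts in group_by_user(posts).items():
        label = "tangent"  # assumed default when nothing is on topic
        for text in texts:
            verdict = rate(user, text)
            if verdict not in LABELS:
                raise ValueError(f"unknown label: {verdict!r}")
            if verdict in DECISIVE:
                label = verdict
                break  # position found, skip this user's other posts
            if verdict == "ambiguous":
                label = "ambiguous"  # on topic, but no side taken yet
        result[user] = label
    return result
```

The early `break` is where the time saving comes from: once a user has clearly taken a side, the rest of their posts never need to be read.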
Findings:
- There were 84 unique posters and 628 unique posts.
- 27 users posted 35 tangents unrelated to the topic- impossible to rate pro or con.
- 6 users remained ambivalent, posting 25 times.
- 28 people opposed the dev solution, writing 407 posts.
- 21 people saw no issue, with 152 posts. Most just posted once or twice.
- 2 were devs
Anyone casually browsing the topic would have assumed overwhelming community opposition. A plurality of the posters- and a majority of the posts- did oppose the dev solution, but with a much smaller margin than a casual review of the thread would suggest. While I didn’t do a line count, many of the “nbd” posts were extremely short (with a few noteworthy exceptions), and most of the opposition gave vivid testimonials, so the actual thread appeared even more skewed toward “this is an issue.”
Still, as a representative sample of a 162,000-strong subscriber base, this sampling was rather insignificant.
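For anyone who wants the proportions behind those statements, here is a small sketch that recomputes them from the counts listed above. The dev posts are not broken out in the list, so the devs are left out of the per-category table; the variable and function names are illustrative only.

```python
# (users, posts) per category, copied from the findings list.
# The two devs are omitted because their post count is not given.
FINDINGS = {
    "issue": (28, 407),
    "nbd": (21, 152),
    "ambiguous": (6, 25),
    "tangent": (27, 35),
}
TOTAL_USERS = 84
TOTAL_POSTS = 628
SUBSCRIBERS = 162_000  # the "iirc" figure from the post


def shares(findings, total_users, total_posts):
    """Fraction of all posters and of all posts for each category."""
    return {
        label: (users / total_users, posts / total_posts)
        for label, (users, posts) in findings.items()
    }


def head_to_head(findings, a="issue", b="nbd"):
    """Share held by side `a` among only those who picked a side."""
    users_a, posts_a = findings[a]
    users_b, posts_b = findings[b]
    return users_a / (users_a + users_b), posts_a / (posts_a + posts_b)


if __name__ == "__main__":
    for label, (u, p) in shares(FINDINGS, TOTAL_USERS, TOTAL_POSTS).items():
        print(f"{label:10s} {u:6.1%} of posters  {p:6.1%} of posts")
    u, p = head_to_head(FINDINGS)
    print(f"issue vs nbd: {u:.1%} of posters, {p:.1%} of posts")
    print(f"thread posters as share of subscribers: {TOTAL_USERS / SUBSCRIBERS:.3%}")
```

On these counts, the opposition is about a third of all posters and a little under two thirds of all posts. Among only those who took a side, it is roughly 57% of posters and 73% of posts. The 84 posters are about 0.05% of the subscriber figure.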
It Could Have Been Better
Some things I could have done:
- How many unique forum dwellers were there in those 2 weeks of activity? It would be nice to know how many of the subscribers (162,000, iirc) posted in the forum in that timeframe. That would have given us a feel for how representative both the forum and the topic were. If only a few hundred forum dwellers were active, this was, indeed, a very hot topic.
- Many of those posting (for and against) only posted once. Knowing the post activity (what % of the posts did the 25% most active posters generate) would be helpful in identifying the ‘leaders’ of the issues. Tailoring a response specifically to these members will do more to quell an uprising than catering to the “one post and forgotten” crowd.
- Player Relationships: looking back at the thread, a notable few .sig files indicated mains on the same server. Some in the same SuperGroup. It’s possible that this was a more organized response from a much smaller minority. If a typical “hot topic” only gets 50 active participants, and a supergroup has 70 members, a little organized effort can really skew the balance.
- Post Counts: while not a dead giveaway, an infrequent poster that’s only active in “hot issues” may be a sign of an organized petition movement that will imbalance the “for/against” ratio even more. Those supporting the status-quo did not post as frequently- so they may be less inclined to “organize” a similar response.
- Heuristics. Keywords. Automation. Heck, I don’t think I could automate rating this (sarcasm, rampant quoting, etc) but I could *possibly* add a prerating with a reasonable degree of accuracy to speed the whole process up.
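Two of the ideas in this list are easy to sketch: the share of posts written by the most active quarter of posters, and a crude keyword prerating. Both are hypothetical illustrations. The keyword sets in particular are invented for the example, and, as noted above, sarcasm and quoting will fool anything this simple, so the output is only a first guess for a human to confirm.

```python
import math
import re


def top_share(post_counts, fraction=0.25):
    """Share of all posts written by the most active `fraction` of posters.

    post_counts maps user name -> number of posts in the thread.
    """
    counts = sorted(post_counts.values(), reverse=True)
    if not counts:
        return 0.0
    k = max(1, math.ceil(len(counts) * fraction))  # at least one poster
    return sum(counts[:k]) / sum(counts)


# Hypothetical keyword sets; a real list would have to come from reading
# the thread in question.
ISSUE_WORDS = {"nerf", "broken", "unfair", "revert"}
NBD_WORDS = {"fine", "intended", "overreacting"}


def prerate(text):
    """Guess a label from keyword hits; a human still makes the final call."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    issue_hits = len(words & ISSUE_WORDS)
    nbd_hits = len(words & NBD_WORDS)
    if issue_hits > nbd_hits:
        return "issue"
    if nbd_hits > issue_hits:
        return "nbd"
    return "ambiguous"  # no hits or a tie: leave it to the reader
```

A high `top_share` value would suggest that a handful of posters drive the thread, which is the ‘leaders’ signal described above.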
One thing’s for sure: any real solution wouldn’t work like my system did… I hate to think of the burden I put on that poor message board, rapidly requesting a few hundred posts. It would have to be integrated into the community tools, available to the community manager, and A LOT more robust than anything available today.
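Short of real integration, an outside scraper can at least be gentler on the board. A minimal sketch, assuming a caller-supplied `fetch` function (hypothetical, not a real library call) and an arbitrary two-second pause: cache every page and wait between requests.

```python
import time


def fetch_politely(urls, fetch, delay_seconds=2.0, sleep=time.sleep):
    """Fetch each distinct URL once, pausing between requests.

    fetch(url) is a hypothetical caller-supplied function that returns
    the page body. `sleep` is injectable so the pause can be skipped
    in tests.
    """
    pages = {}
    for url in urls:
        if url in pages:
            continue  # already cached, no second request
        if pages:
            sleep(delay_seconds)  # pause before every request but the first
        pages[url] = fetch(url)
    return pages
```

The delay value is a guess; the right number depends on the board and on what its operators consider acceptable.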
October 17th, 2006 at 8:09 pm
The problem with trying to automate anything is that the personal touch is removed. I am the admin on my company’s community forums and there are a few topics that I would have missed, if I had used your method. Not saying that it doesn’t make things easier, because it certainly would, however nothing beats scanning the forums.
I keep our forums up on one of my monitors all day and between tasks, I will jump in and look over the new posts. After a month, it’s easy to know who is reasoned, who’s a whiner, who is petty and who is emotional. I don’t have to read every post to know the contents of a thread. Most posts are simply reiterations of what was said before.
Get enough talented developers together and you can come up with any kind of tool that will assist (look at the twisting feature added to EQ1), but it will always overlook the best filter: our own minds and perception.
If you admin a high-traffic board, it actually becomes very easy to filter out the good info from the bad info. Also, sometimes within that bad info is a good point that just needs to be tweaked a little bit.
I honestly see the well-meaning behind automation, but I think that in the long run it would actually cause more harm than good. The community might not see it, but it would still be there.
October 17th, 2006 at 9:47 pm
Good to see the human element IS there… and that the companies do see enough value in the community to really invest time into monitoring them.
And… to be honest, I share your concern.
(Time for an Army Flashback) As combat engineers, we had trip flares- perimeter tripwire pyrotechnics that would alert us to someone sneaking up. A good commander understood that it was an added level of security that didn’t take away from the need to provide a properly-equipped watch team.
That’s just what they often did, though. They’d post the trip flares and cut back the watch… or consider the flares an adequate compensation for no NVGs (night vision goggles). Flares didn’t compensate for active eyes.
(end flashback)
Just like automation tools don’t compensate for active moderators.
If any real data-mining tool is developed, it’s got to be used to supplement what the community manager already does, not supplant it. It should make you aware of key events earlier and give you the information you need to formulate a response faster.
As an academic, I can see it work, but from experience, I can see it doing just the damage you mention.