Selecting conference session proposals: popular vote? selection committee?
I was on the "Ecosystem" track session selection team for Drupalcon London, which motivated me to finally do some more analysis on the traditional pre-selection session voting. Specifically, I wanted to compare the votes a session receives against the evaluations submitted after the conference.
By the way, if you have the opportunity, I highly suggest going to a Drupalcon; they are always great events.
Here are some conclusions based on analysis of the evaluation and voting data from DrupalCon Chicago:
- Voting was not a useful predictor of high-quality sessions!
- The pre-selected sessions did not fare better in terms of evaluation than the other sessions (though they may have served a secondary goal of getting attendees to sign up earlier).
- We should re-evaluate how we do panels. They tend to get lower scores in the evaluation.
- The number of evaluations submitted increased 10% compared to San Francisco, which is encouraging. (Larry Garfield theorizes that it is related to the mobile app; I think there are many factors involved.)
Is voting a good way to judge conference session submissions?
Drupalcon has historically used a voting and committee system for session selection that is pretty common. This is also the default workflow for sites based on the Conference Organizing Distribution.
- Users register on the site
- They propose sessions (and usually there is a session submission cutoff date before voting)
- Voting begins: people (sometimes registered users, sometimes limited to attendees) can vote on their favorite sessions
- During steps 2 and 3, a session selection committee is encouraging submissions and contacting the session proposers to improve their session descriptions
- Selection begins: Voting closes and the session selection committee does their best to choose the right sessions based on factors like appropriateness of content to the audience, the number of votes, their knowledge of the presenter's skill, diversity of ideas
Drupalcon Chicago (the event this analysis is based on) made a few changes to that model: some sessions were pre-selected from people the organizers knew would submit sessions and be accepted (see their blog post and FAQ on the subject). This lets us check whether pre-selection actually brought in sessions that were more valuable to attendees, which seems like a decent proxy for whether the committee's choices were right.
The pre-conference voting had 5 stars with the following labels:
- I have no interest in this session
- I would probably not attend this session
- I might attend this session
- I would probably attend this session
- I totally want to see this session
The post-session evaluations had 5 stars with the following criteria:
- Overall evaluation of this session
- Speaker's ability to control discussions and keep session moving
- Speaker's knowledge of topic
- Speaker's presentation skills
- Content of speaker's slides/visual aids
I've previously looked at the percentage of the attendee population that actually votes and at the distribution of votes (1 to 5) to see whether the scale was used in a meaningful way in Chicago (that analysis is on groups.drupal.org). Given that Chicago's votes were spread across the entire 1-to-5 spectrum, I believe a 5-star system is useful as a rating on a session. However, I don't think the resulting value is directly useful to the session selection committee when choosing individual sessions (more on that later).
My analysis method was to create a spreadsheet with the average and count of pre-conference votes on sessions, from the period when votes were used to help determine which sessions to include. Then I added in the post-conference evaluation scores (1 to 5 stars) across the criteria listed above.
I graphed the pre-conference votes against the post-conference evaluations and used the CORREL function to measure how correlated the data is. I expected a straight-line relationship: the higher the average vote, the higher the post-conference evaluation score. In fact, there was basically no correlation between the pre-conference voting and the post-session evaluations. Here is a table that shows each axis (i.e. one of the five evaluation criteria above) and the correlation between that axis and pre-conference session votes.
As a graph, the overall data looks like:
I graphed it along with a random line that has a correlation value of .95 for comparison. As you can see, the pre-conference votes are not at all correlated with the post-conference evaluations.
It isn't surprising that votes don't correlate to session quality. Voting tends to be done by a minority of event attendees who are "insiders" to the event. They are likely to be swayed by friendships, employers, and social media campaigns.
Comparing pre-selected sessions to regular sessions
I also averaged the evaluation scores for the pre-selected sessions and for the rest. The average overall evaluation score was 80.9 for non-pre-selected sessions vs. 80.7 for pre-selected sessions. The other axes show similar results, except for knowledge and visuals, though it's not clear whether those differences are statistically significant.
| Axis | Pre-selected average evaluation score | Non-pre-selected average evaluation score |
|---|---|---|
So, we can see that regularly selected sessions got very similar scores to the pre-selected ones. I'm not suggesting that pre-selecting is flawed (it didn't produce lower results, anyway), but I do think we should carefully consider who we pre-select.
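One way to answer the "is this statistically significant?" question without assuming a distribution is a simple permutation test: shuffle the pre-selected/regular labels many times and see how often a gap at least as large as the observed one appears by chance. The scores below are made-up illustrative values, not the real DrupalCon data:

```python
import random
from statistics import mean

# Hypothetical overall evaluation scores (0-100 scale, as in the averages above).
pre_selected = [82, 79, 81, 80, 83, 78]
regular = [81, 80, 82, 79, 84, 77, 80, 82]

observed_diff = abs(mean(pre_selected) - mean(regular))

# Permutation test: repeatedly reassign the group labels at random and count
# how often the random split produces a difference >= the observed one.
random.seed(42)
pooled = pre_selected + regular
n = len(pre_selected)
trials = 10_000
extreme = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = abs(mean(pooled[:n]) - mean(pooled[n:]))
    if diff >= observed_diff:
        extreme += 1

p_value = extreme / trials
print(f"observed diff: {observed_diff:.2f}, p-value: {p_value:.3f}")
```

A large p-value (as these toy numbers produce) means the 80.9 vs. 80.7 sort of gap is entirely consistent with chance.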
The third bit of analysis was to compare each session's overall score with its number of presenters. Here's the average per decile, where decile 1 is the 9 highest-ranked sessions. There's a pretty clear trend, from nearly 1 presenter for the top-rated sessions to 2.5 for the bottom-rated ones.
| Average # of presenters | Decile |
|---|---|
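The decile breakdown works like this: rank sessions by overall score, split the ranking into ten equal chunks, and average the presenter count within each chunk. A sketch with hypothetical (score, presenter count) pairs:

```python
from statistics import mean

# Hypothetical (overall_score, presenter_count) pairs for 20 sessions.
sessions = [
    (92, 1), (90, 1), (88, 2), (87, 1), (85, 1),
    (84, 2), (82, 1), (81, 3), (80, 2), (78, 1),
    (77, 2), (75, 2), (74, 1), (72, 3), (70, 2),
    (69, 3), (67, 2), (65, 3), (63, 2), (60, 3),
]

# Rank by overall score, highest first, then split into 10 deciles.
ranked = sorted(sessions, key=lambda s: s[0], reverse=True)
size = len(ranked) // 10
for decile in range(10):
    chunk = ranked[decile * size:(decile + 1) * size]
    avg_presenters = mean(count for _, count in chunk)
    print(f"decile {decile + 1}: avg presenters {avg_presenters:.1f}")
```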
I believe there are two big reasons for this. First, panel presentations are rarely well coordinated, and panel members usually don't take the time to practice as a group (our distributed community makes that hard). Second, Drupalcon session selection committees often suggest that similar topics get merged into one panel. I think we should stop merging independent presenters: the result is often that people who don't have the same story to tell end up squeezing 45 minutes of material into one-half or one-third of the time.
What can we do to improve session quality and session selection?
One of the great tools for session selection committee members at Drupalcon London was the availability of evaluation data from previous conferences. If a proposed session got a lot of votes (perhaps due to a campaign on Twitter or within a large company) but the presenter had horrible evaluations from a previous conference, then the committee member has an easy job: just say "no thanks".
The one problem with using previous conference evaluations to judge sessions is that it can lead to stagnation among presenters; part of the value of a conference is hearing new ideas. Free-for-all BOF sessions mitigate this, but I think in the Drupal world part of the solution is to use Drupalcamps as a ramp into Drupalcon: presenters should give their session at a camp and mention that (along with any evaluations and video from the camp) in their session proposal. With approval from presenters, Drupalcamp Colorado published our evaluations; we hope this helps other camps and that they will do the same. It's no surprise that some feature requests for COD would make gathering this information and getting it to the right people much easier.
See also a great discussion on groups.drupal.org: On popular voting and merit-based selection of sessions.
What else can improve session quality?
So far I've talked about identifying good sessions, but I think the problem is more complex: it's also about encouraging and inspiring presenters to do great work on their sessions. We can tell them "please practice it 10 times," but nobody will do that without motivation. Sending reminder e-mails to presenters along the lines of "we expect 3,000 attendees, including key decision makers from companies like Humongo Inc." could help. There's also the possibility of compensating presenters: Drupalcon Chicago gave a mix of cash and non-cash benefits (massage chair, faster check-in line).
Scott Berkun gives some tips on how to improve the presenter experience at a conference in An open letter to conference organizers. He recommends many things, including sharing the results of the evaluation data. I'm in favor of that as well (and of providing default terms of attendance).
Extra note: Want to see your evaluations from Chicago? It just needs more code
There were evaluations in Chicago, but the speakers have not seen that data. I got access to it through my role on the London session selection team and my work on the infrastructure team and the Chicago sites.
However, the fact that presenters can't see it is due to a bug in software that you can help fix. The organizers of Drupalcon want to share that information, but the code to do it isn't fully working. If you can help make it work, all session presenters will be able to see their evaluations.