One woman's path through doula training, childrearing, and a computer science Ph.D. program

Friday, March 2, 2012

A rant on single-blind peer review

Let's say you write something novel and clever and technical, and you submit it to a conference for consideration. If the paper is accepted, you will go to present the paper, and (depending on the conference) it can be published in some digital or paper proceedings. To determine if your paper (or poster, or whatever) is accepted, it undergoes what is called peer review. The program organizers electronically corral experts (or budding experts) in the field to read the submissions and provide critical feedback. These are your peers because they are in the same general field as you are. Some reviewers have more experience than you do, and others have less.

There are usually three or more reviewers per paper, and the reviews are usually in the following format.

  • There is a score that indicates how strongly the reviewer thinks the paper should be accepted. For example:
    • -3: Strong reject
    • -2: Reject
    • -1: Weak reject 
    • 0: Neutral
    • 1: Weak accept
    • 2: Accept
    • 3: Strong accept
  • There is a narrative describing the paper's strengths and weaknesses, offering suggestions for improvement, commenting on layout and organization, and critiquing the bibliography. Reviewers also answer the question of whether the paper makes a significant impact on, or contribution to, the field it represents.

Sometimes there is a meta-review, provided by another peer, which synthesizes the other reviews into a short blurb.

Selecting papers for publication goes something like this. The reviewers' scores are tallied for each paper, and the papers are ordered from highest to lowest score. The program organizers decide how many papers to accept, and that many of the top-scoring papers are selected. If the reviewers provided any notes to the program organizers, or made specific recommendations for accepting or rejecting a paper, those submissions are handled on a case-by-case basis.
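
To make the tallying concrete, here is a minimal sketch in Python, using made-up paper IDs and review scores on the -3 to +3 scale above; actual conference review systems are, of course, more elaborate than this.

    # A hedged sketch of score tallying, with hypothetical papers and reviews.
    # Each review score ranges from -3 (strong reject) to +3 (strong accept).
    reviews = {
        "paper-08": [-2, 0, 1],
        "paper-17": [2, 1, -1],
        "paper-42": [3, 2, 2],
    }

    # Tally each paper's scores, then order papers from highest to lowest total.
    totals = {paper: sum(scores) for paper, scores in reviews.items()}
    ranked = sorted(totals, key=totals.get, reverse=True)

    num_to_accept = 2          # chosen by the program organizers
    accepted = ranked[:num_to_accept]
    print(accepted)            # ['paper-42', 'paper-17']
    # Papers flagged in reviewer notes are still handled case by case.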

There are two types of peer review that are used most frequently: single-blind and double-blind.

For single-blind peer review, you submit your paper as it would be published, with your name and affiliation at the top. Reviewers can see this. But when you receive your paper reviews, you do not know who wrote which review because the reviewers' names are not provided. It is single-blind because it is blind to you, the author.

In double-blind peer review, you remove your name and affiliation from the paper, and try to anonymize it as much as possible. For example, if you write somewhere, "Our previous work at University of Waterbucket, our home institution," you would take out the reference to your institution's name. Reviewers are discouraged from trying to infer the authors of the paper, and thus should (in theory) not be biased based on your identity and the identities and affiliations of your co-authors. The reviews are also anonymous. It is double-blind because it is blind to the reviewers, and also to you, the author.

There are two reasons that I dislike single-blind peer review.



Because you are remembered


Pretend for a second that a particular reviewer has a chip on his (or her) shoulder about your research area, or about specific methods you might have used. For example, the reviewer hates video games and thinks that people who play games are worthless to society, and your paper is about a game to examine social interactions, such as Prom Week (which is a fun and lauded new Facebook game; if you have not tried it yet, do). The review will say something like this.
Rating: -2 (Reject) -- The authors present a video game that lets the player make and break friendships in the week before the high school prom. The authors failed to cite relevant literature regarding the misuse of gaming technology and associated aggressiveness in players. The proposed game promotes poor behavior and sensationalizes high school relationships which are fundamentally flawed. Arguing for not accepting this paper.

Now, you can get a review like this even with an anonymous paper. But what can happen next is that the reviewer remembers your name or affiliation -- consciously or otherwise. And when your name comes up again, whether in another conference for which he (or she) is peer-reviewing, or dropped in conversation when networking, or announced as that of a Nobel Peace Prize recipient, the reviewer will remember you for the paper he (or she) rejected. Your name has been associated with That Thing He (or She) Hates (With a Passion). Even if you write on another topic entirely, you are remembered for the paper that was a flop.


Because of the discourse


The second reason is that when you receive this angry or scathing or unfair review following the single-blind process, there is nothing you can do. When you do not know who he (or she) is, you cannot engage the reviewer in conversation; you cannot further the discourse of your disagreement. And when you do not know where he (or she) lives, you certainly cannot mail the reviewer a box of flaming poop.

Thursday, March 1, 2012

Teaching HCI and Jeopardy

This quarter, like most Winter quarters, I am a teaching assistant for the human-computer interaction (HCI) class on our campus. It is a mixed undergraduate and graduate class, and is cross-listed in two or three departments (this year: two). There is always a group project, and my job as a TA is to advise the groups on their projects. This was a slow week, in terms of project deliverables, so I thought we would spice up the discussion sections with a friendly game of Jeopardy.

A night or two before, I made a game using the software on Jeopardy Labs incorporating the topics in the first or second slide deck that the instructor provided on the course website. I took the questions -- err, the answers -- directly from the class notes, verbatim. One interesting thing to note is that we do not have regular assessments of rote memorization -- that is, there are no quizzes, no multiple-choice tests, and there is no final exam in the class. Instead, every assignment is project-based. It is an engineering course, and as such, we expect students to incorporate elements of theory and coursework into their engineering (or reverse-engineering) as required by the assignment.

So when I pulled out the first month's content in the Jeopardy game (which you can play online for free), I was unsurprised at the number of wrong answers... though I did wish there were more correct answers. What I found surprising was how each of the three discussion sections reacted to the game.

In Section A, at 11am, four of the five groups actively participated in the game. Group sizes ranged from two to five students per group, with the two-person group leaving the game with 0 points (likely indicating that they did not answer any questions).


Group Group Size Score
A.1 4 300
A.2 3 -400
A.3 3 -500
A.4 5 -1900
A.5 2 0


Negative points indicated groups that would volunteer to answer a particular question (or provide the question for a particular answer) but would get it wrong -- thus subtracting rather than adding the points. Group A.1 won the game with 300 points; Group A.4 had the lowest score at -1900. Several members of Group A.4, the largest group in the section, would attempt the most difficult questions -- frequently getting the answers wrong -- but engaged the class in merriment, commiserating over their loss (after loss, after loss) of points.

The total points awarded in Section A -- the sum of the absolute values of the groups' scores -- was 3100.
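
For the curious, that "total points awarded" number is just a sum of absolute values; here is the same arithmetic as a quick Python sketch, using the Section A scores from the table above.

    # Total points awarded in a section: the sum of the absolute value of
    # each group's final score (negative scores come from wrong answers).
    section_a = {"A.1": 300, "A.2": -400, "A.3": -500, "A.4": -1900, "A.5": 0}
    total_awarded = sum(abs(score) for score in section_a.values())
    print(total_awarded)  # 3100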

In Section A, I did not allow other groups to answer the question after one group provided an incorrect answer. I did, however, provide hints when the answers were not given quickly. For example, I read: "This technique is used to test a system or complicated components of a system that do not exist."


One student was rubbing his head, and another was softly muttering under his breath: "Oh, oh, I remember this, oh!" -- and then, "I can even visualize the diagram, with the one guy in a different room with the curtain drawn."

I said, "That's right, it's like he is a man behind the curtain."

I waited a little longer.

"Dorothy would use this technique."

"Ding ding! What is Wizard of Oz?"

"That's right!" I exclaimed.


Section A played the game with a great, positive attitude. One student said, "This is fun! We should do this again!" to which I replied, with a wink, that next week, another game awaits.

Section B, at 12:30pm, had three groups. In this section, the largest group (B.2) finished with the most points, and the smallest group (B.1) finished with the fewest. There were 1700 points distributed in Section B, which suggests that groups had the chance to make a comeback -- the point total does not capture whether a team had a string of bad luck followed by a string of good luck, or otherwise mixed correct and incorrect answers. Each of the three groups actively participated in the game, and, when I threatened another game next week, a student responded that it's high time to study. Right answer!


Group Group Size Score
B.1 2 -500
B.2 5 800
B.3 3 400


Students in both Section A and Section B avoided the Grounded Theory category like the plague. With it the last category standing, one student in Section B asked, "Can you give us a hint on what Grounded Theory is? Before I select it as a category?"

I thought for a moment, about whether to facepalm or giggle. Instead I just stared blankly at the student until he said, "Uh, never mind -- I'll take Grounded Theory for 100."

In Section B, I provided more hints. "These can be administered to large populations and can include open or closed items," I read from the screen. I waited a few moments. "It starts with a Q." I waited a few more moments. "The second letter is a U."

"Ding ding!" a student called.

"Ding?" I asked.

"What is a questionnaire?" the student answered.

"Correct!" I said, bouncing a little. "Good job!"

Section C was the smallest of the three sections, with just two student groups. There were 3000 points awarded in this section. But what struck me most was one student's feeling that the game was unfair. I mentioned earlier that we do not have regular quizzes or other assessment techniques to test memorization and rote learning. However, a huge amount of content is presented -- content that somehow needs to be learned, mastered, and applied to the course project and other design activities.

Group Group Size Score
C.1 5 2300
C.2 2 700


The student argued that this activity, playing Jeopardy with HCI concepts and terms from lecture, was testing just that -- memorization. Further, he said, designing a system using HCI concepts, and calling out the concepts by name, are two different things. You can look up the names. But you should be able to describe the concepts.

Further, he said, the class size made it unfair. Assuming that each student can answer five percent of the questions correctly (I raised my eyebrows -- hoping he had said 95% and I had misheard), the student argued that there were simply not enough students in the section to reach the critical mass of knowledge necessary to produce correct answers.

I argued that part of the course is learning how to convey your ideas to an audience, and how to persuade others in the HCI field that your methods are consistent and well-grounded. And the only way you can do that is to know the terms, to speak the language. What's more, I said, it only takes one person who knows 100% of the content to produce correct answers. The number of students in the class should not matter. You should each know all of the content.

Right?

If an HCI student cannot tell me, the TA, the difference between performance measurement and retrospective testing, or the difference between latent and manifest content, does that mean he or she does not remember the terms, or does it mean that he or she does not understand them? Can you make an affinity diagram if you cannot remember it from lecture? Can you apply Grounded Theory when you do not select it as a category in Jeopardy because the entire concept draws a blank for you?

I have TAed classes with weekly quizzes, and classes without. My opinion is that (short) weekly quizzes help the instructor and teaching staff in two ways:
  1. Weekly quizzes clue me in on each student's progress and performance.
  2. Weekly quizzes give the students a list of solid topics to study each week.
Maybe HCI should bring back the weekly quiz, so that a little bit of repetition and memorization makes its way into the curriculum. Or maybe we need to reconsider the course project, and see how we can better incorporate the terms and concepts from lecture into the project. That is the challenge of writing good project requirements: leaving the requirements too open allows students to disregard the formalism; closing them too much stifles creativity.

What do you think?