A question they would typically receive from a student (humanities)
A structured query they would use to search a database
The database they would use to search for that question
This is what things looked like after we got all the information back from the librarians
Then we took that information and used it in 2 ways.
The first was to actually run the search query in the suggested database.
We put the first 30 citations into a bibliographic citation manager and saved the actual full text of each.
We chose 30 because usability studies (Jakob Nielsen) tell us that less than 1% of users ever go beyond the third page of results, and very few people ever change the defaults (i.e., once they run a search they stick with it, success or failure).
Most of our DBs present 10 results per page, so 30 results should be a large enough sample to represent the actual result set the majority of our users are ever going to see after performing a search.
We ran the same query in Google Scholar and again saved the results in a bibliographic manager.
We used Zotero to quickly export all of the results.
We also saved the full text of each citation for later use in our study.
So, the first searches we ran in the native DBs and GS were for the query given to us by the librarian.
The second set of searches checked whether the citations we found in the DB were available in GS, and vice versa.
Here is the same screenshot we saw just a minute ago.
We took the bibliographic information for each citation and searched for the citation within Google Scholar.
We then did the same thing in reverse.
We took the 30 results from GS and searched for each citation within the database
This allowed us to later calculate something we called “exclusivity”
We put the citations into 1 of 3 possible “exclusivity” categories
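The three-way sort described above can be sketched in a few lines. This is an illustrative sketch, not the study's actual code; the citation IDs and category labels are made up, and set operations stand in for the manual cross-searching:

```python
# Minimal sketch of the "exclusivity" categorization, assuming each source's
# results are represented as a set of citation identifiers (hypothetical data).

def categorize_exclusivity(db_citations, gs_citations):
    """Assign each citation to one of three exclusivity categories."""
    return {
        "DB only": db_citations - gs_citations,   # found only in the library database
        "GS only": gs_citations - db_citations,   # found only in Google Scholar
        "Both": db_citations & gs_citations,      # found in both sources
    }

# Example with made-up citation IDs:
db = {"c1", "c2", "c3"}
gs = {"c2", "c3", "c4", "c5"}
cats = categorize_exclusivity(db, gs)
```

With these toy sets, GS has more results overall and more exclusive citations, mirroring the pattern the study reports.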
This chart shows the proportion of citations within our study that overlap. As you can see, on average GS had both a larger result set overall and more exclusive citations than the databases.
Now that we had the citations from the database and from Google Scholar, we used the bibliographic manager to generate a list of references, which we input into an Excel spreadsheet. Then, using a random number table, we completely randomized the order of the citations for each subject specialist.
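The randomization step can be sketched as follows. This is an assumption-laden illustration: `random.shuffle` stands in for the random number table, and the citation strings are invented:

```python
# Sketch of randomizing the citation order for one subject specialist.
# random.Random(seed) stands in for the study's random number table.
import random

def randomize_citations(citations, seed=None):
    rng = random.Random(seed)
    shuffled = citations[:]  # copy so the master list is untouched
    rng.shuffle(shuffled)
    # Assign each citation its randomly determined citation number (1-based)
    return {number: cite for number, cite in enumerate(shuffled, start=1)}

# Hypothetical citations:
order = randomize_citations(["Smith 2003", "Jones 2001", "Lee 2004"], seed=42)
```

Each subject specialist then sees citations only by their randomly assigned numbers, blinding them to which source each citation came from.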
Finally, to deliver the content to the librarians in the way that would be easiest for them to evaluate, we saved the full text of each citation according to its randomly assigned citation number. Then we used Excel to create hyperlinks to the full text of each citation and delivered this list, along with the full text, on a CD to the subject librarians. We asked them to evaluate each citation using a rubric, which we provided in hard copy. As you can see, the subject librarians could only see the citation number and the bibliographic information. Clicking the hyperlinked citation number brought up the full text of that citation, so the subject librarians could easily rate the citation on the rubric.
[Presenter note: have the full text appear on this page after a click, to simulate linking from the provided document.]
This screen shows the rubric that we used. It is based on a rubric that has been widely used to evaluate print resources (Alexander, 1999).
Alexander, J. E. (1999). Web wisdom: How to evaluate and create information quality on the Web.
We asked each subject librarian to assign each citation a score between 1 and 3 in six different categories (1 was below average, 2 average, and 3 above average).
These six categories were:
Accuracy – which looks at whether the information is reliable and free of errors
Authority – specifically the credentials of the author and publisher
Objectivity – looking for bias or a hidden agenda
Currency – is the information up to date?
Coverage – how deeply does the citation cover the topic?
And finally Relevancy – how well does the citation relate to the research question
This resulted in a total possible score of 18 for each citation; we called this the "scholarliness" score.
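The scoring rule above is simple enough to sketch directly. The category names come from the rubric; the function and the example ratings are illustrative, not the study's actual tooling:

```python
# Sketch of the scholarliness score: six rubric categories, each rated 1-3,
# summed for a maximum possible score of 18.

CATEGORIES = ["accuracy", "authority", "objectivity", "currency", "coverage", "relevancy"]

def scholarliness(ratings):
    """Sum one librarian's six category ratings for a single citation."""
    assert set(ratings) == set(CATEGORIES), "all six categories required"
    assert all(1 <= r <= 3 for r in ratings.values()), "each rating must be 1-3"
    return sum(ratings.values())

# Hypothetical ratings for one citation:
score = scholarliness({"accuracy": 3, "authority": 2, "objectivity": 3,
                       "currency": 2, "coverage": 1, "relevancy": 3})
```

A citation rated 3 in every category would earn the maximum score of 18.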
We used this statistical model to evaluate the data. Essentially, the formula says two important things about the way we used the data:
We controlled for the differences between the way librarians grade
We controlled for the differences in how exclusively the citation was available
This allowed us to pinpoint and measure any differences there may have been between disciplines in our data as well as any differences that can be attributed to the source of the citations
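The model itself appears on the slide rather than in these notes. As a hedged illustration only (the actual specification may differ), a linear model with the controls and effects described above might take this general shape:

```latex
% Illustrative form only -- librarian and exclusivity as controls,
% discipline and source as the effects of interest.
\mathrm{score}_{ijkl} = \mu
  + \beta_i^{\text{librarian}}
  + \beta_j^{\text{exclusivity}}
  + \beta_k^{\text{discipline}}
  + \beta_l^{\text{source}}
  + \varepsilon_{ijkl}
```

Here the librarian and exclusivity terms absorb grading and availability differences, so the discipline and source terms isolate the comparisons reported next.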
Citations found only in GS had, on average, a 17.6% higher scholarliness score than citations found only in the DB
Citations found in both GS and the DB scored even higher, on average, than citations found only in GS
We found no statistically significant difference in the scholarliness scores between disciplines (i.e., humanities citations in GS are just as scholarly as science citations found in GS)
This study can only be extrapolated statistically to the specific topics and subject specialists used in this study
A more robust statistical methodology would need to be employed to make these results generally applicable
We are encouraged by the results and feel they would probably hold up, but we cannot say so until another study is done
If we had to do it over again, we would have increased the Likert scale on our rubric from 1-3 to 1-7 or 1-10
This would have allowed for a more nuanced statistical analysis and made it easier to spot significant differences, if any, between GS and databases
Our scholarliness calculation, ultimately, was based on the subjective opinions of librarians with subject expertise.
There are lots of ways to create a scholarliness score (citation counts, impact factors, etc.); which is best is still debatable
Our study compared GS to individual library databases. A more appropriate comparison may be GS to federated search tools.