Title: ScienceDirect
Publisher: Elsevier
URL: http://sciencedirect.com
Cost: Subscription fee to be negotiated, abstracts and bibliographic records are open access
Tested: 25-27 August, 2006
Elsevier chose a good time to announce the new release of ScienceDirect (SD): the opening of the World Library and Information Congress in Seoul, attended by about 5,000 participants, including many from the low-income countries of Africa, Asia and Europe (but few from Latin America and South America). Although the primary purpose of ScienceDirect is to make full-text journal articles (and reference works) available to subscribers, the open access services for non-subscribers are also of great importance, especially in the low-income countries where even indexing and abstracting services are beyond the reach of most researchers. While most scholarly publishers now provide open access to their bibliographic records and abstracts, ScienceDirect is by far the largest of the single-publisher archives.
The next largest publisher’s archive is that of Springer (which now includes the journals of Kluwer, acquired earlier by Springer). However, its digital archive has fewer than half as many articles as the ScienceDirect archive. This is somewhat surprising because the ratio between the number of journals published by the two companies is much closer. Springer claims to publish 1,551 journals, and Elsevier about 2,000 (quite an increase from the 1,700 that appeared on its home page until recently, but in the era of fervent swaps and acquisitions of journals it is not surprising). Presumably, Springer has not yet digitized its journals to the same extent as Elsevier, and may not have as many complete runs with digitizing rights as Elsevier. The other large scholarly publishers’ archives, such as Blackwell-Synergy, Oxford University Press, Taylor and Francis, etc., have fewer than 1 million articles each.
Elsevier is without doubt the largest publisher of scholarly journals. No wonder that it prominently points out on the home page of ScienceDirect that “[it] offers more than a quarter of the world’s scientific, medical and technical information online”. I don’t doubt the scale of that claim, especially if you consider the articles published by Elsevier (and its imprints like Academic Press, Prentice Hall) in the social sciences, law, and even art & humanities journals. Still, I would estimate the share to be closer to 20% than 25%, as the total number of SMT articles is more likely to hover around 40-42 million. For example, the Century of Science edition of Web of Science alone has records for about 38 million articles, and ISI does not pick science, social science, and art & humanities journals lightly for processing, and was particularly selective in choosing articles for indexing for the 1900-1944 period.
There are 7.8 million articles in ScienceDirect, according to the home page. I could not directly verify this number, as I like to do with most claims, because my very atypical test operations (not surprisingly) timed out, but I could find more than 6.6 million records through the Scirus subset of ScienceDirect. Tests for specific words from a variety of disciplines (such as toxoplasmosis, PTSD, solar energy, genocide, glass ceiling, second language) showed that 70%-88% of the articles in ScienceDirect (varying from discipline to discipline) are available in its Scirus version, so the figure of 7.8 million records for articles alone seems realistic, and it is a huge one for full-text searching. That in turn offers efficient resource discovery even for the most obscure topics. The well-tagged cited references have great potential for powerful subject searching.
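The arithmetic behind these estimates can be checked with a few lines. This is only a sketch using the figures quoted in the text; the variable names are illustrative, and the Scirus figure is a lower bound because the text says "more than 6.6 million".

```python
# Figures quoted in the review; variable names are illustrative only.
sd_articles = 7.8e6        # articles claimed on the ScienceDirect home page
scirus_subset = 6.6e6      # records found through the Scirus subset (lower bound)
smt_total_low = 40e6       # reviewer's estimate of all SMT articles, low end
smt_total_high = 42e6      # reviewer's estimate of all SMT articles, high end

# Share of the world's SMT articles held by ScienceDirect
share_max = sd_articles / smt_total_low    # smaller total gives the larger share
share_min = sd_articles / smt_total_high

# Portion of ScienceDirect reachable through Scirus
scirus_coverage = scirus_subset / sd_articles

print(f"ScienceDirect share of SMT articles: {share_min:.1%} to {share_max:.1%}")
print(f"Scirus coverage of ScienceDirect: at least {scirus_coverage:.1%}")
```

Under these figures the share comes out between roughly 18.6% and 19.5%, and the overall Scirus coverage is about 85%, which falls inside the 70%-88% range found for the individual test words.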
The content of the items depends on the status of the user. Guest users receive only the bibliographic citation and the abstract for free. Abstracts are available for about 75% of the records. This percentage varies from a low of 51% for my humanities test word to 92% for my physics test word. The average ratio is a good one, as book reviews, letters to the editor, yearly acknowledgements of referees (a decent bow to those who spend lots of time anonymously reviewing submitted articles), calls for papers for special issues, and other editorial materials usually don’t have abstracts, and certainly not substantial and informative ones, in any database.
Guest users must pay for the full-text article (which is in PDF format, reproducing the article as it was published). They don’t receive the variety of additional information that subscribers receive. This is common practice in all the publishers’ archives. It is also common practice that the full-text option is limited to PDF (as it is much easier to produce than a links-enhanced HTML file). Only some of the full-text articles are available in both formats, and the span of dual-format coverage varies from journal to journal. In most cases the HTML format seemed to be available from the 1990s onward, along with the many additional bonus options (for subscribers) described below.
The additional information may include the Summary Plus format, which offers (beyond the bibliographic data and abstract) the outline of the paper, as well as the tables, graphs and references. The Summary Plus format serves as a jumping off point to get directly to a specific section of the article, or to scan the illustrations and/or the references.
The appendixes do not appear in the Summary Plus format, but only in the PDF and the Full Text and Links format. The latter displays the list of cited references somewhat differently, adding a link to the author name and year qualifier, which in turn allows users to jump to the place in the text where the citation is invoked, and from the text back to the citation. This is one of the many great advantages of the HTML version of the articles.
The cited references section includes a number of link types. Where they take the user and what they show depend on what the user’s library subscribes to, just as when the search results are displayed. Obviously, the result list format is much simpler, as all the items must be from the ScienceDirect archive. In the cited references section, however, the links also show a variety of external resources, and the labeling of those links varies, displaying the richness of ScienceDirect. It does not reflect the richness of the searcher’s library, as the links may lead only to a bibliographic record of the cited article, or to its open access abstract, or to its open access full text, or to its subscribers-only version.
The links labeled Full Text+Links take the user to the article in an Elsevier journal that was cited by the article at hand. If the subscription of the user’s library covers that year of the journal, then both the abstract and the full text will be offered; if the library’s subscription does not extend back to that year, only the abstract will be displayed. Logically, this is the same principle that is used when displaying the direct search results.
In the case of links labeled Full Text via CrossRef the same applies, with some variations depending on the publisher of the target (cited) journal. For example, the first cited reference in our sample refers to an article in the IEEE Digital Library. If the user’s library has a subscription covering that year, the user will be shown, in addition to the abstract and bibliographic data, the PDF document, the list of controlled and uncontrolled indexing terms, the list of cited references and the list of citing references.
If there is no subscription, or it does not cover the year the cited article was published, then only the abstract is displayed, and the user is offered the option to buy the article and is reminded that the index terms and the lists of cited and citing references are available only to subscribers.
A few publishers are generous, and show not only the abstract and the bibliographic data elements, but also the classification codes, subject headings, descriptors, and even more generous ones also the list of cited and citing references. Wiley shows the index terms, and ACM shows also the list of cited and citing references along with many other value added pieces of information for the cited paper published in an ACM publication. This is an outstanding and rare feature of ACM.
The two types of links to Scopus work only if the user’s library has a subscription to Scopus. While there is no bonus service offered by Scopus, the links to cited and citing references can be very useful, leading users to many other topically related articles. In the sample link there are no cited references, but there are 13 citing references. Cited references are available in Scopus from 1996 onward.
The final content element is labeled in the sidebar of the record simply as Cited by. This displays a list of articles published in journals covered by ScienceDirect which cite the article at hand. Although in the detailed list there is a clear note about the domain of citing journals it would be more obvious when looking at the article level record if it were labeled as Cited by in ScienceDirect. This takes me to the software issues.
The PR material for the new edition emphasized the newly introduced software features and the makeover of the interface. Of course, there are also content enhancements, but these would not come through so swiftly without the new software. I liked the previous version, too, but the new one is, for lack of a better word, breezier, and makes users more comfortable. It is like the difference between my black suit (which I hurriedly took to the conference) and my tropical suit (which I should have taken, considering the humidity of the city, the exhibit area, and the conference room where I sweated while delivering my 3-hour seminar, not so much because of the content, State of the art in citation-based searching, as because of the under-air-conditioned venue).
The new interface makes navigation and browsing easier. I especially like that a list of the recent search actions (up to 100 queries) shows up not only when I get back to the home page but also when I log in again, i.e. the list is saved beyond the session. This might not seem a big deal, as you can save your query in several other online systems, but it is, because it is done automatically and can be turned off by the click of a button. Not only the queries are saved but also other recent actions, such as the display of articles. How often do we have to repeat the same query over and over again to get back to an article which we looked at cursorily, and which later in the search turns out to be one of the most promising articles to be retrieved and fully read? The same is true for checking the query strategy after sleeping on it. The new software makes these tasks very easy.
There are other software “accessories” that make it quick to navigate and jump back and forth among the many kinds of links and services, including the Quick Search box, which remains permanently displayed during the session. Many of these are demonstrated in the new Flash tutorials and the new interactive, multilingual help files, which even users of the iPod Generation may find cool enough to use, so let me focus on a few issues which are closest to my heart and are of great importance.
It would be nice to allow non-subscribers to search the full text of the archive. This would not mean giving away the family jewels, for two reasons. One is that about 85% of the ScienceDirect archive is already full-text searchable by anyone through the legitimate backdoor of Scirus. It is another question that in the ever-broadening target base of Scirus, ScienceDirect does not show up by default (although it is searched by default), but only when the list is extended. The second reason is that subscribers have plenty of other reserved features, so offering full-text searching to all ScienceDirect users would not make subscribers feel like business class passengers when coach class passengers are bumped up for free.
It would be of great help if the asterisk symbol were used for unlimited truncation (as it is in most Web-born systems), instead of the exclamation mark. The query librar* will retrieve only library (a single character after the root term), but not libraries, librarian, librarians or librarianship. From my teaching experience it is clear that most users don’t realize this limitation.
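The difference between the two truncation symbols can be illustrated with a small sketch. It is not ScienceDirect's actual query parser; it simply assumes the semantics described above, where the asterisk stands for exactly one character and the exclamation mark for any number of characters. The function names are hypothetical.

```python
import re

def truncation_to_regex(query: str) -> "re.Pattern[str]":
    """Translate a one-word query with truncation symbols into a regex.

    Assumed semantics (taken from the behavior described in the review):
      '*' stands for exactly one character,
      '!' stands for zero or more characters (unlimited truncation).
    """
    parts = []
    for ch in query:
        if ch == "*":
            parts.append(r"\w")      # exactly one word character
        elif ch == "!":
            parts.append(r"\w*")     # any number of word characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

def matching_words(query: str, words: list) -> list:
    """Return the words that the truncated query matches in full."""
    pattern = truncation_to_regex(query)
    return [w for w in words if pattern.fullmatch(w)]

words = ["library", "libraries", "librarian", "librarians", "librarianship"]
print(matching_words("librar*", words))  # ['library']
print(matching_words("librar!", words))  # all five words
```

A user who types librar* expecting Web-style behavior silently loses four of the five word forms in this example.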
The number of times an article has been cited in ScienceDirect should be displayed in the short result list itself (and without any action by the user). This is the way Scopus, WoS and CSA do it. It would immediately give a sense of the potential importance of the article among the dozens or hundreds of articles retrieved by the query. At present, users must click the Cited By link from the detailed record to see that informative (or potentially informative) value. Moreover, the counting currently follows a restrictive approach and suffers from a potentially serious (although not necessarily prevalent) glitch that test-users apparently did not notice before the release. Identifying and counting the citing references is indeed a complex issue, but the count is a very precious piece of information (which until recently was provided only by the ISI Citation Index databases), so it deserves some explanation.
In ScienceDirect Elsevier seems to use a restrictive approach in counting the citing references. Many times the endnotes in an article are split between References and Bibliography items. Intellectually and academically I understand the differences, but it adds further confusion that most (but not all) of the items appear both under the References and the Bibliography section. If an item appears only in the Bibliography, ScienceDirect does not increase its citedness count, presumably not considering it to be a cited reference in the strict sense.
The Scopus database of Elsevier, however, uses the item list in the Bibliography (but not the one listed under References) and counts such appearance for its citedness score, and so does WoS. I prefer this approach even though I mourn the loss of those items which are cited in the strictest sense, and are included in the References section but do not appear in the Bibliography, therefore their citedness count is not increased in this case. Before turning to the software problem let me make a small detour.
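The two counting policies described above can be contrasted in a toy sketch. The data structure and the function are hypothetical and do not reflect how either service stores or processes its records; the sketch only shows how the same citing articles yield different citedness scores depending on which list is counted.

```python
# Each citing article carries two lists of cited item identifiers.
# The articles and the identifier "A" are made up for illustration.
citing_articles = [
    {"references": {"A", "B"}, "bibliography": {"A", "B"}},  # A in both lists
    {"references": set(),      "bibliography": {"A"}},       # A in Bibliography only
    {"references": set(),      "bibliography": {"A", "C"}},  # A in Bibliography only
    {"references": {"A"},      "bibliography": set()},       # A in References only
]

def citedness(target: str, articles: list, counted_section: str) -> int:
    """Count the articles whose chosen section lists the target item."""
    return sum(1 for article in articles if target in article[counted_section])

# Policy attributed to ScienceDirect in the text: count the References list only.
references_only = citedness("A", citing_articles, "references")

# Policy attributed to Scopus in the text: count the Bibliography list only.
bibliography_only = citedness("A", citing_articles, "bibliography")

print(references_only, bibliography_only)  # 2 3
```

In this made-up sample neither policy reaches all four citing articles: each one misses the items that appear only in the other list.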
The hundreds of different citation styles, contents and formats have been the bane of scholarly publishing for a long time. The shibboleth of peppering the reference list with op. cit. and ibid. entries in many citation styles did nothing to reduce the appalling inaccuracies and inconsistencies in author names, journal titles, and chronological and numerical designations as listed by us, the authors, who were too busy following the formatting requirements of the journal’s citation style, especially when the citations had to be re-formatted for re-submission to a second or third journal. Now there are the busy-bodying qualifiers about the date and the server through which the web versions of the cited articles were accessed, all the while the author’s name is missing from the reference, and reference entries are missing from the Bibliography without rhyme or reason. These extra pieces of information are not conducive to verification by the readers, but crowd the already enigmatic references.
Faculty members up for tenure and/or promotion have endured (had to endure) the “insolence of (editorial) office and the spurns” and followed the style at any cost. They will have no choice but to include the time of the day (+/- GMT) and note the daylight saving time in effect during the access, if so required by a journal’s new citation style. All these make citation matching more and more difficult.
Computer programs (and programmers), however, are driven crazy by the Tower of Babel in citation languages, because it cripples even the good matching algorithms, which use field tags, subfield identifiers, and various marker codes to tell a 4-digit page number from a publication year and to avoid mistaking section headings for authors’ last names (unlike Google Scholar, which has been profusely producing absurd citedness scores and phantom citations from day one, and has never bothered to correct its matching algorithm). Unfortunately, not only shallow journalists but also librarians and scientists believe the citedness scores of Google Scholar and dispense misleading comparison statistics.
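Why field tags matter can be shown with a deliberately simple sketch. The reference string, the field names and both functions are invented for illustration and are not taken from any real matching system: an untagged reference forces the program to guess which 4-digit number is the year, while a tagged record lets it compare like with like.

```python
import re

# A made-up reference whose page range looks like a pair of years.
raw_reference = "Doe J. An example article. Journal of Examples 12, 1984-1999 (2001)"

def naive_year(reference: str):
    """Guess the publication year as the first plausible 4-digit number."""
    match = re.search(r"\b(1[89]\d{2}|20\d{2})\b", reference)
    return int(match.group(1)) if match else None

# The same reference with hypothetical field tags: no guessing is needed.
tagged_reference = {
    "author": "Doe J",
    "journal": "Journal of Examples",
    "volume": "12",
    "first_page": "1984",
    "year": "2001",
}

def same_work(a: dict, b: dict,
              keys=("author", "year", "volume", "first_page")) -> bool:
    """Treat two tagged records as the same work if all key fields agree."""
    return all(a.get(k) is not None and a.get(k) == b.get(k) for k in keys)

print(naive_year(raw_reference))          # 1984, which is the first page, not the year
print(int(tagged_reference["year"]))      # 2001
```

The naive guess takes the page number 1984 for the year, exactly the kind of mix-up that tagged fields are meant to prevent.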
I brought up this issue because it gave me a Maalox moment when I saw a few phantom citations in ScienceDirect which otherwise seems to have the conservative, “better to be safe than sorry” policy in matching citing/cited items.
Just as I was spot-checking the citing references for some items in very limited subject areas where potential cuckoo’s eggs can raise my suspicion by sight, I found this list of citing references for an article about “Subject indexing and citation indexing”. The first citing item, about “Thermal and mechanical properties of in situ polymerized PS/EPDM blends”, seemed very unlikely to cite the article at hand, and indeed it turned out to be a phantom citation. The other three were accurate ones.
I would not have lost sleep over one phantom citation attributed to an in-press article, but for the second part of that article, which was purportedly cited by 16 papers, the first four citing papers turned out to be phantom citations, then the next two, and then the 12th item. Apart from their having cited references to articles written by authors with the same rather common name (Shaw), I found no clue as to how these could be mistaken for references citing the article at hand, and I saw no telling patterns.
I was already late with submitting my review, so I could neither explore this syndrome more systematically nor ask my contacts at Elsevier to investigate it, but it is a serious issue, likely to be a problem in the algorithm and not merely caused by a typo here and there. Fixing it must get top priority before any other enhancements, and before neophytes in citation analysis start publishing their findings without first checking.
I trust that the problem will be fixed because Elsevier has always taken my criticism seriously (if not necessarily with pleasure). It has proved through the redesign and functional enhancements of ScienceDirect that it strives to make the search process swifter and smarter, and through the superb citation matching algorithm of Scopus that it knows how to recognize the correctly as well as the incompletely and/or sloppily cited articles without missing citing references or producing phantom ones.
Péter Jacsó