Create a dictionary of authors and author attributes and values for a journal article using the Scopus API and Python

As an exercise to brush up my Python skills, I decided to tinker around with the Scopus API. Scopus is an online database maintained by Elsevier that records and provides access to information about peer reviewed publications. Not only does Scopus let users search for journal articles based on key words and various other criteria, but the web services also allows users to explore these articles as networks of articles, authors, institutions, and so forth. If you’re interested in risk factors that lead to scholarly publications, publication citations, or impact factors, this is a place to start.

The following code yields a dictionary of author information by requesting content through the abstract retrieval API. This request is made using the Python package requests and parsed using the package BeautifulSoup. Enjoy!

#### Import python packages
import requests
from bs4 import BeautifulSoup


#### Set API key
my_api_key = 'YoUr_ApI_kEy'


#### Abstract retrieval API
# API documentation at http://api.elsevier.com/documentation/AbstractRetrievalAPI.wadl
# Get article info using unique article ID
eid = '2-s2.0-84899659621'
url = 'http://api.elsevier.com/content/abstract/eid/' + eid

header = {'Accept' : 'application/xml',
          'X-ELS-APIKey' : my_api_key}

resp = requests.get(url, headers=header)

print 'API Response code:', resp.status_code # resp.status_code != 200 i.e. API response error

# Write response to file
#with open(eid, 'w') as f:
#    f.write(resp.text.encode('utf-8'))

soup = BeautifulSoup(resp.content.decode('utf-8','ignore'), 'lxml')

soup_author_groups = soup.find_all('author-group')

print 'Number author groups:', len(soup_author_groups)

author_dict = {}

# Traverse author groups
for i in soup_author_groups:

    # Traverse authors within author groups
    for j in i.find_all('author'):

        author_dict.update({j.attrs['auid']:j.attrs}) # Return dictionary of attributes
      
        j.contents.pop(-1) # Pop dicitonary of attributes
 
        # Traverse author contents within author
        for k in j.contents:

            author_dict[j.attrs['auid']].update({k.name : k.contents[0]})
            
print author_list
Advertisements