
I am unable to extract a field from a web page. This is not a typical web scraping problem, because the value is embedded in JavaScript. I also tried python-requests, but that did not solve it.

I am trying to extract the DOI from the web page. The DOI sits inside the page's JavaScript. I can read the page, and the code works up to print(soup). The step that fails is extracting the DOI value: for the example page it appears as "doi":"10.1109/LAWP.2014.2364296", and I want to print 10.1109/LAWP.2014.2364296.

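import urllib.request  # a bare "import urllib" does not reliably load the request submodule
from bs4 import BeautifulSoup

web_page = 'https://ieeexplore.ieee.org/document/6933872'
page = urllib.request.urlopen(web_page)
soup = BeautifulSoup(page, 'html.parser')
print(soup)
# This is the failing step: text='doi' only matches text nodes that are exactly 'doi'
soup.body.findAll(text='doi')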

For the page https://ieeexplore.ieee.org/document/6933872 the expected output is 10.1109/LAWP.2014.2364296. How can I do this?
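
Since the value appears in the page source as a "doi":"..." pair rather than inside an HTML tag, one option is to search the raw text with a regular expression instead of a tag search. The following is only a minimal sketch: it assumes the HTML returned to the script really contains that pair, and the function name extract_doi, the variable name metadata, and the sample string are hypothetical, not taken from the real page.

```python
import re

# Matches a JSON-style pair such as "doi":"10.1109/LAWP.2014.2364296"
# and captures the value between the second pair of quotes.
DOI_PATTERN = re.compile(r'"doi"\s*:\s*"([^"]+)"')


def extract_doi(html):
    """Return the first DOI value found in the page source, or None."""
    match = DOI_PATTERN.search(html)
    return match.group(1) if match else None


# Hand-written stand-in for the page source (an assumption, not real output).
sample = (
    '<script>var metadata = {"title":"Example",'
    '"doi":"10.1109/LAWP.2014.2364296"};</script>'
)
print(extract_doi(sample))
```

With the code from the question, the same function could be called on str(soup) or on the decoded bytes from page.read(). If the server only fills in the metadata after running JavaScript in a browser, the pattern will not match and the function returns None.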

  • Check out html.python-requests.org; it has full JavaScript support. Commented Feb 9, 2019 at 0:34
  • Possible duplicate of Web-scraping JavaScript page with Python Commented Feb 9, 2019 at 0:35
  • I went through the [link](stackoverflow.com/questions/8049520/…), but my case is different: the DOI is different for each paper, and I only want to extract that value. Commented Feb 9, 2019 at 0:38
  • I will check html.python-requests.org. Commented Feb 9, 2019 at 0:39
  • Executing the line r.html.render() raises an error. Is there any other way? Commented Feb 9, 2019 at 0:52

1 Answer


A possible solution that sidesteps the JavaScript scraping issue altogether is to use the IEEE API (https://developer.ieee.org/). It requires registration and approval to get an API key, but once you have one, it is much easier to send in a batch of IEEE article numbers and get back their DOIs and other metadata in a structured form.
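
As a rough illustration of the API route, the sketch below builds a query URL for one article number and reads the DOI out of a JSON reply. Everything specific here is an assumption to be checked against the documentation on the developer portal: the endpoint URL, the parameter names apikey, format and article_number, and the articles/doi layout of the response. The helper names and the sample reply are hypothetical, and no request is actually sent.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint of the IEEE Xplore metadata API; verify on developer.ieee.org.
API_BASE = "https://ieeexploreapi.ieee.org/api/v1/search/articles"


def build_query_url(article_number, api_key):
    """Build a query URL for one article (parameter names are assumptions)."""
    params = {
        "apikey": api_key,
        "format": "json",
        "article_number": article_number,
    }
    return API_BASE + "?" + urlencode(params)


def doi_from_response(body):
    """Return the DOI of the first article in a JSON reply, or None."""
    data = json.loads(body)
    articles = data.get("articles", [])
    return articles[0].get("doi") if articles else None


# Hand-written stand-in for a reply; the field layout is assumed, not real output.
fake_body = (
    '{"total_records": 1, "articles": '
    '[{"article_number": "6933872", "doi": "10.1109/LAWP.2014.2364296"}]}'
)
print(build_query_url("6933872", "YOUR_API_KEY"))
print(doi_from_response(fake_body))
```

In real use, the URL would be fetched with urllib.request.urlopen or requests.get, and the body of the reply passed to doi_from_response.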
