
I'm creating a Python project whose goal is to extract some data from a real estate portal. I use the selenium package, and I locate elements with XPath.

Generally everything works fine, but when I try to extract the text of a span I run into a problem.

The span's HTML:

<span class="some-class">
    <svg width="1em" height="1em" viewBox="0 0 24 24" xmlns="http://www.ty.org/1000/svg"  class="other-some-class">
        <path d="some-path" fill="currentColor" fill-rule="evenodd">
        </path>
    </svg> 
text to scrap
</span>

I locate this span using XPath:

my_obj = i.find_element(By.XPATH, './div/div/div[2]/div[3]/div/span')

I think it is correct because it returns a Selenium object, and when I try to get the class attribute using:

print('my_obj',my_obj.get_attribute('class'))

it returns the correct class, some-class.

My problem is that I cannot extract the text of this span, i.e. text to scrap.

I think I have tried everything:

my_obj.text
my_obj.get_attribute('innetText')
my_obj.get_attribute('textContent')
my_obj.get_attribute('innerHTML')

None of the above work.

Any idea what I'm doing wrong?

1 Answer

Given the HTML:

<span class="some-class">
    <svg width="1em" height="1em" viewBox="0 0 24 24" xmlns="http://www.ty.org/1000/svg"  class="other-some-class">
        <path d="some-path" fill="currentColor" fill-rule="evenodd">
        </path>
    </svg> 
    text to scrap
</span>

The text, i.e. text to scrap, is within a text node that is the lastChild of its parent <span>. So to extract the desired text you can use either of the following locator strategies:

  • Using xpath, execute_script() and textContent:

    print(driver.execute_script('return arguments[0].lastChild.textContent;', driver.find_element(By.XPATH, "//span[@class='some-class']")).strip())
    
  • Using css_selector, get_attribute() and splitlines():

    print(driver.find_element(By.CSS_SELECTOR, "span.some-class").get_attribute("innerHTML").splitlines()[-1].strip())
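
Both one-liners above depend on where the text sits among the span's children. A slightly more defensive sketch, assuming the same driver as above, joins every direct text node of the element, so the result does not rely on the text being the last child or on a particular line of innerHTML. The names own_text and OWN_TEXT_JS are hypothetical helpers, not part of Selenium; the only Selenium call used is execute_script().

```python
# JavaScript run in the page: keep only the element's direct text nodes
# and join their contents, skipping child elements such as <svg>.
OWN_TEXT_JS = """
return Array.from(arguments[0].childNodes)
    .filter(node => node.nodeType === Node.TEXT_NODE)
    .map(node => node.textContent)
    .join('');
"""


def own_text(driver, element):
    """Return the text directly inside `element`, ignoring child elements.

    Hypothetical helper: `driver` is assumed to be a Selenium WebDriver and
    `element` a WebElement returned by find_element().
    """
    raw = driver.execute_script(OWN_TEXT_JS, element)
    # Collapse the newlines and indentation that surround the text in the markup.
    return " ".join((raw or "").split())
```

It could be called as own_text(driver, driver.find_element(By.CSS_SELECTOR, "span.some-class")).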
    

Alternative

As an alternative you can also use Beautiful Soup as follows:

Code Block:

from bs4 import BeautifulSoup

html_text = '''
<span class="some-class">
    <svg width="1em" height="1em" viewBox="0 0 24 24" xmlns="http://www.ty.org/1000/svg"  class="other-some-class">
        <path d="some-path" fill="currentColor" fill-rule="evenodd">
        </path>
    </svg> 
    text to scrap
</span>
'''

soup = BeautifulSoup(html_text, 'html.parser')
last_text = soup.find("span", {"class": "some-class"}).contents[2]
print(last_text.strip())

Console Output:

text to scrap

Another Alternative

As another alternative you can also use lxml.etree as follows:

Code Block:

from lxml import etree

html_text = '''
<span class="some-class">
    <svg width="1em" height="1em" viewBox="0 0 24 24" xmlns="http://www.ty.org/1000/svg"  class="other-some-class">
        <path d="some-path" fill="currentColor" fill-rule="evenodd">
        </path>
    </svg> 
    text to scrap
</span>
'''
x = etree.HTML(html_text)
result = x.xpath('//span[@class="some-class"]/text()[2]') # second text node of the span, i.e. the text after <svg>
print(result[0].strip()) # xpath() returns a list, so take its first item

Console Output:

text to scrap
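
Both alternatives above pick the text by a fixed position (contents[2] and text()[2]). That position follows from the span having three child nodes: leading whitespace, the <svg> element, and the trailing text. As a rough check of that layout using only the standard library, and assuming the snippet is well-formed XML (real pages often are not), the following sketch collects the span's direct text nodes without hard-coding an index. The names direct_text and span_text are illustrative.

```python
from xml.dom import minidom

html_text = '''
<span class="some-class">
    <svg width="1em" height="1em" viewBox="0 0 24 24" xmlns="http://www.ty.org/1000/svg"  class="other-some-class">
        <path d="some-path" fill="currentColor" fill-rule="evenodd">
        </path>
    </svg> 
    text to scrap
</span>
'''

# minidom is an XML parser, so this only works because the snippet is well-formed.
span = minidom.parseString(html_text.strip()).documentElement

# Keep the span's direct text nodes and drop child elements such as <svg>.
direct_text = [
    node.data for node in span.childNodes if node.nodeType == node.TEXT_NODE
]

# Join the pieces and collapse the surrounding newlines and indentation.
span_text = " ".join("".join(direct_text).split())
print(span_text)
```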

