I have HTML that looks like this:

    <tr><td align="center" class="listas" colspan="0">
<div id="b8b7c9523733e026bf89de9b3cf8f73811ddb579" style="display: none;">
<table border="0" cellpadding="2" cellspacing="2" width="100%">
<tbody><tr><td align="left" class="lista" width="100%"><table align="left" border="0" cellpadding="0" cellspacing="0" width="90%"><tbody><tr><td style="background-image: url(./style/classicx/line_a.gif); background-repeat: no-repeat; background-position:top right; width=60px; height:4px;" width="60"></td><td style="background-image:url(./style/classicx/line_b.gif); background-repeat: repeat-x; height:4px;" width="75%"></td><td style="background-image: url(./style/classicx/line_c.gif); background-repeat: no-repeat; height:4px;" width="60"></td></tr></tbody></table></td></tr>
<tr><td align="left" class="lista"><table align="left" border="0" cellpadding="0" cellspacing="0" width="70%"><tbody><tr><td align="left" class="lista">
<b>Options: </b></td>
<td align="left" class="lista" title="Download: The Wolverine 2013 Theatrical Cut 1080p Blu-ray AVC DTS-HD MA 7.1-o0o"><table align="left" border="0" cellpadding="0" cellspacing="0" onclick="window.open('download.php?id=b8b7c9523733e026bf89de9b3cf8f73811ddb579&amp;f=The+Wolverine+2013+Theatrical+Cut+1080p+Blu-ray+AVC+DTS-HD+MA+7.1-o0o.torrent','_self')" style="cursor:pointer; cursor:hand;"><tbody><tr><td align="center" style="background-image: url(images/download.gif); background-repeat: no-repeat; width:17px; height:17px;"></td><td> Download</td></tr></tbody></table></td>
<td align="left" class="lista" title="Details for: The Wolverine 2013 Theatrical Cut 1080p Blu-ray AVC DTS-HD MA 7.1-o0o"><table align="left" border="0" cellpadding="0" cellspacing="0" onclick="window.open('details.php?id=b8b7c9523733e026bf89de9b3cf8f73811ddb579&amp;hit=1','_self')" style="cursor:pointer; cursor:hand;"><tbody><tr><td align="center" style="background-image: url(images/torrent_name.gif); background-repeat: no-repeat; width:17px; height:17px;"></td><td> Details</td></tr></tbody></table></td>
<td align="left" class="lista" title="Add to WishList: The Wolverine 2013 Theatrical Cut 1080p Blu-ray AVC DTS-HD MA 7.1-o0o"><table align="left" border="0" cellpadding="0" cellspacing="0" onclick="window.open('wishlist.php?do=add&amp;torrent_id=b8b7c9523733e026bf89de9b3cf8f73811ddb579','_self')" style="cursor:pointer; cursor:hand;"><tbody><tr><td align="center" style="background-image: url(images/wishlist.gif); background-repeat: no-repeat; width:17px; height:17px;"></td><td> Add to WishList</td></tr></tbody></table></td>
<td align="left" class="lista" title="Report: The Wolverine 2013 Theatrical Cut 1080p Blu-ray AVC DTS-HD MA 7.1-o0o"><table align="left" border="0" cellpadding="0" cellspacing="0" onclick="window.open('report.php?torrent=b8b7c9523733e026bf89de9b3cf8f73811ddb579','_self')" style="cursor:pointer; cursor:hand;"><tbody><tr><td align="center" style="background-image: url(images/report.gif); background-repeat: no-repeat; width:16px; height:17px;"></td><td> Report</td></tr></tbody></table></td>
    <td align="left" class="lista" style="white-space:nowrap"><table align="left" border="0" cellpadding="0" cellspacing="0"><tbody><tr><td align="center" style="background-image: url(images/torrent_comments.gif); background-repeat: no-repeat; width:17px; height:17px;"></td><td> Comments (<b><span style="color:#006699">0</span></b>)</td></tr></tbody></table></td>
</tr></tbody></table></td>
</tr><tr><td align="left" class="lista" width="100%"><table align="left" border="0" cellpadding="0" cellspacing="0" width="60%"><tbody><tr><td style="background-image: url(./style/classicx/line_a.gif); background-repeat: no-repeat; background-position:top right; width=60px; height:4px;" width="60"></td><td style="background-image:url(./style/classicx/line_b.gif); background-repeat: repeat-x; height:4px;" width="75%"></td><td style="background-image: url(./style/classicx/line_c.gif); background-repeat: no-repeat; height:4px;" width="60"></td></tr></tbody></table></td></tr>
<tr><td align="left" class="lista"><b>Technical Info:</b></td></tr>
<tr><td align="left" class="lista"><table border="0" cellpadding="0" cellspacing="0" width="60%"><tbody><tr><td></td></tr></tbody></table></td></tr>
<tr><td align="left" class="lista" width="100%"><table align="left" border="0" cellpadding="0" cellspacing="0" width="90%"><tbody><tr><td style="background-image: url(./style/classicx/line_a.gif); background-repeat: no-repeat; background-position:top right; width=60px; height:4px;" width="60"></td><td style="background-image:url(./style/classicx/line_b.gif); background-repeat: repeat-x; height:4px;" width="75%"></td><td style="background-image: url(./style/classicx/line_c.gif); background-repeat: no-repeat; height:4px;" width="60"></td></tr></tbody></table></td></tr>
</tbody></table></div></td></tr>
<tr>
    <td align="center" class="mainblockcontent" max-width="25px"><a href="torrents.php?category=5"><img alt="Movie/1080p/i" border="0" src="images/categories/MOVIES-1080PI.png"/></a></td> <td align="left" class="mainblockcontent"><b><a href="details.php?id=948499d24362fc5d1da0bc83cc0afe2dd2d5bf55" onmouseout="return nd();" onmouseover="if(popup_mode){ overlib(' &lt;img src=\'cache/imdb/images/1430132.jpg\' width=\'200\' ', CAPTION, '');}">The Wolverine 2013 EXTENDED 1080p BluRay DTS-ES x264-PublicHD</a></b>                                        <br/><span style="color: #999999 ">Action, Adventure, Fantasy, Sci-Fi</span>  <span style="color:DarkSlateGray "> <a href="http://www.imdb.com/title/tt1430132/" target="_blank"><u>IMDB: 6.9</u></a></span></td>  <td align="center" class="mainblockcontent" title="Comments"><a href="details.php?id=948499d24362fc5d1da0bc83cc0afe2dd2d5bf55#comments" title="View details: The Wolverine 2013 EXTENDED 1080p BluRay DTS-ES x264-PublicHD">5</a></td>  <td align="center" class="mainblockcontent" title="Download"><a href="download.php?id=948499d24362fc5d1da0bc83cc0afe2dd2d5bf55&amp;f=The+Wolverine+2013+EXTENDED+1080p+BluRay+DTS-ES+x264-PublicHD.torrent"><img alt="torrent" border="0" src="images/download.gif"/></a></td>
    <td align="center" class="mainblockcontent" title="Add to Wishlist"><a href="wishlist.php?do=add&amp;torrent_id=948499d24362fc5d1da0bc83cc0afe2dd2d5bf55"><img alt="torrent" border="0" src="images/add_wishlist_star.png"/></a></td>
    <td align="center" class="mainblockcontent">09:03:59 <bthu, +0200="" 07="" 09:03:59="" 2013="" nov=""><bthu, +0200="" 07="" 09:03:59="" 2013="" nov=""> 07/11/2013</bthu,></bthu,></td>
    <td align="center" class="mainblockcontent">12.53 GB</td>
    <td align="center" class="mainblockcontent">Anonymous</td>
    <td align="center" class="green"><a href="peers.php?id=948499d24362fc5d1da0bc83cc0afe2dd2d5bf55" title="Click here to view peers details"><b>416</b></a></td>
    <td align="center" class="green"><a href="peers.php?id=948499d24362fc5d1da0bc83cc0afe2dd2d5bf55" title="Click here to view peers details"><b>3</b></a></td>
    <td align="center" class="mainblockcontent"><a href="torrent_history.php?id=948499d24362fc5d1da0bc83cc0afe2dd2d5bf55" title="History - The Wolverine 2013 EXTENDED 1080p BluRay DTS-ES x264-PublicHD">1,251</a></td></tr>

If more than one item is displayed, this structure repeats until the last item. From what I understand, I need something like:

from bs4 import BeautifulSoup

result_table = BeautifulSoup(data)
entries = result_table.find_all('td', attrs={'align': 'center', 'class': 'listas'})

for result in entries:
    ...  # process each matched block here

This works, but it only gets the first block. How can I adjust the code so that it also gets the second block?

  • I think you left out some tags; please paste a working example that we can test. Commented Nov 17, 2013 at 13:22
  • find_all should do what you want. Perhaps the second one is not identical to the first as you think. Secondly - your loop might have a problem, are you sure you loop over the entries correctly? Try to include the code you've written so far, example input, the expected output, and the output you actually get (console output, stack traces, compiler errors - whatever is applicable). Commented Nov 17, 2013 at 13:24
  • The td tag ends before the second block, so it only captures the first block. The next block is a tr block, so I thought of something like entries = result_table.find_all('td', attrs = {'align' : 'center', 'class' : 'listas'}) + result_table.find('tr') Commented Nov 17, 2013 at 14:00
  • So, do you want to extract the content of <td align="center" class="listas"> inside a <tr> and the next sibling of that <tr>? Commented Nov 17, 2013 at 15:17
  • Yes. Then the 'for result in entries' loop should leave result containing all the markup I need to process. Commented Nov 17, 2013 at 15:44

1 Answer

Your code for finding the first block was correct, but since only one <td> element exists with those attributes, a find is enough:

block1 = soup.find('td', attrs={'align' : 'center', 'class' : 'listas'})

To find the second block, start from the first one, go up to its <tr> parent, and then take that row's next sibling:

block2 = block1.find_parent('tr').find_next_sibling('tr')
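
As a rough, self-contained illustration of the two lookups above, here is a minimal sketch. The markup is a hypothetical, heavily trimmed stand-in for the HTML in the question (one hidden "listas" row followed by one data row), and the choice of the built-in "html.parser" is an assumption, not something taken from the question:

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup: one "listas" row followed by its data row.
html = """
<table>
  <tr><td align="center" class="listas"><div>Options</div></td></tr>
  <tr><td class="mainblockcontent">The Wolverine 2013 EXTENDED</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# First block: the <td> carrying both attributes.
block1 = soup.find("td", attrs={"align": "center", "class": "listas"})

# Second block: climb to the enclosing <tr>, then step to the next <tr> sibling.
block2 = block1.find_parent("tr").find_next_sibling("tr")

# Pull the text out of the data row to confirm it is the expected row.
title = block2.find("td", class_="mainblockcontent").get_text(strip=True)
print(title)
```

find_next_sibling('tr') skips the whitespace text nodes between rows, so it lands on the following <tr> rather than on a newline string.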

EDIT to find all items (not tested):

entries = result_table.find_all('td', attrs={'align' : 'center', 'class' : 'listas'})
for result in entries:
    block2 = result.find_parent('tr').find_next_sibling('tr')

3 Comments

Thanks for that code. It does some of what I need, but it appears to only grab the first item. As I said above, there could be multiple items like the HTML above, which is why I thought I needed a find_all.
@user1620852: Then use a find_all and iterate over the results.
It turned out that the first item was formatted differently than the rest; once that item was processed, your code worked great on the remainder. I had to add 'if not block2:' because block2 was empty after the last item. Thanks again for your help.
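
The guard mentioned in the last comment could look roughly like the sketch below. The sample markup, the sizes, and the helper name paired_rows are all hypothetical and only there to make the example runnable. It tests for None explicitly, since find_next_sibling returns None when there is no following row:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: "listas" rows alternate with data rows, and the
# final "listas" row has no data row after it.
html = """
<table>
  <tr><td align="center" class="listas">hidden options A</td></tr>
  <tr><td class="mainblockcontent">Item A</td><td class="mainblockcontent">12.53 GB</td></tr>
  <tr><td align="center" class="listas">hidden options B</td></tr>
  <tr><td class="mainblockcontent">Item B</td><td class="mainblockcontent">4.37 GB</td></tr>
  <tr><td align="center" class="listas">trailing row</td></tr>
</table>
"""


def paired_rows(markup):
    """Return the cell texts of the data row that follows each 'listas' cell."""
    soup = BeautifulSoup(markup, "html.parser")
    pairs = []
    for header in soup.find_all("td", attrs={"align": "center", "class": "listas"}):
        data_row = header.find_parent("tr").find_next_sibling("tr")
        if data_row is None:
            # No row after this one (the last item), so skip it.
            continue
        cells = [td.get_text(strip=True)
                 for td in data_row.find_all("td", class_="mainblockcontent")]
        pairs.append(cells)
    return pairs


rows = paired_rows(html)
print(rows)
```

If the first item on the real page is formatted differently, as described above, it would need its own handling before or inside this loop.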
