Splitting BeautifulSoup4's findAll() of HTML into list of lines in Python

- March 15, 2015

I've just begun learning Python, and there is a project I had in mind that I suspected would be possible using web crawling through Python.

I have been using a tutorial series, and I am aware that findAll() appears to be looked down upon as being primitive (I'm unsure why, and I'll gladly learn better alternatives, as long as they are simple at the same time, since I only started Python yesterday).

Right now, I have an extremely simple project that visits a specified website and grabs its code. However, I want to implement an if statement that finds out whether a given line is present.

I am using soup.findAll('a', {'href': '/login'}), where soup = BeautifulSoup(requests.get(url).text). My attempts, trying things like if '/login' in soup, have failed, and I am not sure how to implement an if statement that finds a single word, or line, in the HTML that was found.

If anyone is aware of simpler methods to use here I'd be grateful, but the solution I have identified is to have the HTML split into lines and/or stored in an array, and then use if <the entire line> in soup:.

I think what you want is

if soup.findAll('a', {'href': '/login'}): ...

findAll returns an empty list ([]) if the element isn't found, and Python evaluates empty lists as false in if statements.
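The check above can be sketched as a small self-contained script. This is only an illustration under some assumptions: the two sample pages and the helper name has_login_link are made up for the example and stand in for requests.get(url).text, the parser is named explicitly as "html.parser", and the method is spelled find_all, which is the BeautifulSoup 4 name for the same method as findAll.

```python
from bs4 import BeautifulSoup

# Hypothetical sample pages standing in for requests.get(url).text
PAGE_WITH_LOGIN = '<html><body><a href="/login">Log in</a></body></html>'
PAGE_WITHOUT_LOGIN = '<html><body><a href="/about">About</a></body></html>'


def has_login_link(html):
    """Return True if the HTML has an <a> tag whose href is exactly '/login'."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list-like result; an empty result is falsy
    return bool(soup.find_all("a", {"href": "/login"}))


if has_login_link(PAGE_WITH_LOGIN):
    print("login link found")

if not has_login_link(PAGE_WITHOUT_LOGIN):
    print("no login link on the second page")
```

Wrapping the result in bool() is optional inside an if statement; it is used here only so the helper returns a plain True or False.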

Here is a trivial example:

>>> from bs4 import BeautifulSoup as bs
>>> soup = bs('<html></html>')
>>> assert soup.findAll('html')
>>> assert soup.findAll('login')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError
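For completeness, the line-splitting idea from the question can also be sketched with plain string methods, without BeautifulSoup. The helper name lines_containing and the sample text are hypothetical; in the question the text would come from requests.get(url).text. Note that this matches raw text, so it would also match '/login' inside a comment or a script, which is why the findAll check is usually the more precise option.

```python
def lines_containing(html_text, needle):
    """Return the lines of html_text that contain needle as a substring."""
    return [line for line in html_text.splitlines() if needle in line]


# Hypothetical sample; stands in for the text of a downloaded page
SAMPLE = '<html>\n<body>\n<a href="/login">Log in</a>\n</body>\n</html>'

matches = lines_containing(SAMPLE, "/login")
if matches:
    # An empty list is falsy, so this branch runs only when a line matched
    print(matches[0])
```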
