By: Noushid Khan

Use of BeautifulSoup in Python


In Python, BeautifulSoup (imported from the bs4 package) is used to parse HTML and XML documents and to query them by their tags. Tags can be looked up by name, by id, or by class. Each match is returned as an object on which we can perform several operations.

* To parse a document, pass it to BeautifulSoup either as an open file or as a string, together with the name of the parser to use

#Code

from bs4 import BeautifulSoup

# Parse from an open file
with open("index.html") as fp:
    soup = BeautifulSoup(fp, "html.parser")

# Parse from a string
soup = BeautifulSoup("<html>data</html>", "html.parser")
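To check that the file and string routes really produce the same tree, here is a minimal self-contained sketch. It assumes bs4 is installed and uses the built-in "html.parser" (naming the parser explicitly avoids the "no parser was explicitly specified" warning). The temporary file stands in for the index.html used above, and the markup is a made-up example.

```python
import os
import tempfile

from bs4 import BeautifulSoup

markup = "<html><body><p>data</p></body></html>"

# Write the markup to a temporary file so the example needs no existing file.
with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False, encoding="utf-8") as tmp:
    tmp.write(markup)
    path = tmp.name

# Parse from an open file handle, naming the parser explicitly.
with open(path, encoding="utf-8") as fp:
    soup_from_file = BeautifulSoup(fp, "html.parser")

# Parse the same markup from a string.
soup_from_string = BeautifulSoup(markup, "html.parser")

os.remove(path)

# Both approaches serialize to the same document.
print(str(soup_from_file) == str(soup_from_string))  # True
print(soup_from_string.p.string)                     # data
```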

Operations:


To find a tag in an HTML or XML document, use find():

#Code

from bs4 import BeautifulSoup
soup = BeautifulSoup(your_html_code, 'html.parser')
print(soup.find('tag_to_find'))

#Example
html_cont = """<div>
<p>HTML FILE</p>
<img src="image.png"/>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, 'html.parser')
print(soup.find('p'))

#Output

	<p>HTML FILE</p>

find() returns only the first matching tag in the document.
To get every matching tag, call find_all() instead.
It returns a list of the matched tags.

#Example
html_cont = """<div>
<p>HTML FILE</p>
<img src="image.png"/>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, 'html.parser')
print(soup.find_all('p'))

#Output

	 [<p>HTML FILE</p>, <p>END</p>]
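Two details of find() and find_all() are easy to miss: find() returns None rather than raising when nothing matches, and find_all() accepts a limit and a list of tag names. A minimal sketch, assuming the same sample markup as above (the "table" lookup is just an example of a tag that is not present):

```python
from bs4 import BeautifulSoup

html_cont = """<div>
<p>HTML FILE</p>
<img src="image.png"/>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, "html.parser")

# find() returns None (not an exception) when nothing matches,
# so check the result before using it.
missing = soup.find("table")
print(missing is None)  # True

# limit caps how many matches find_all() collects.
first_only = soup.find_all("p", limit=1)
print(first_only)  # [<p>HTML FILE</p>]

# A list of names matches any of them, in document order.
mixed = soup.find_all(["p", "img"])
print([tag.name for tag in mixed])  # ['p', 'img', 'p']
```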

A tag may have any number of attributes; we can access them by treating the tag as a dictionary.

#Code

from bs4 import BeautifulSoup
soup = BeautifulSoup(your_html_code, 'html.parser')
tag = soup.find('tag_to_find')
print(tag['id'])

#Example
html_cont = """<div>
<p id="boldest">HTML FILE</p>
<img src="image.png"/>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, 'html.parser')
tag = soup.find('p')
print(tag['id'])

#Output

	boldest
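Because the lookup behaves like a dictionary, tag['id'] raises KeyError when the attribute is missing. A minimal sketch of the safer .get() alternative, using a small hypothetical document where only the first paragraph has an id:

```python
from bs4 import BeautifulSoup

html_cont = """<div>
<p id="boldest">HTML FILE</p>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, "html.parser")

first, second = soup.find_all("p")

# Square brackets behave like a dict lookup: a missing key raises KeyError.
try:
    second["id"]
except KeyError:
    print("second <p> has no id")

# .get() returns None (or a default you supply) instead of raising.
print(first.get("id"))            # boldest
print(second.get("id", "no-id"))  # no-id
```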

To get all the attributes of a tag at once, read its attrs dictionary.

#Code

from bs4 import BeautifulSoup
soup = BeautifulSoup(your_html_code, 'html.parser')
tag = soup.find('tag_to_find')
print(tag.attrs)

#Example
html_cont = """<div>
<p id="boldest" class="bold-class">HTML FILE</p>
<img src="image.png"/>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, 'html.parser')
tag = soup.find('p')
print(tag.attrs)

#Output

	{'id': 'boldest', 'class': ['bold-class']}
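Note that class comes back as a list in the output above: BeautifulSoup treats class as a multi-valued attribute and splits it on whitespace, while id stays a plain string. A minimal sketch with a hypothetical second class name to make the split visible:

```python
from bs4 import BeautifulSoup

html_cont = '<p id="boldest" class="bold-class big">HTML FILE</p>'
soup = BeautifulSoup(html_cont, "html.parser")
tag = soup.find("p")

# class is multi-valued, so it becomes a list; id stays a string.
print(tag.attrs)  # {'id': 'boldest', 'class': ['bold-class', 'big']}

# attrs is an ordinary dict, so the usual dict methods work on it.
for name, value in tag.attrs.items():
    print(name, value)
```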

To add a new attribute to a tag, assign a value to it as you would to a dictionary key.

#Code

from bs4 import BeautifulSoup
soup = BeautifulSoup(your_html_code, 'html.parser')
tag = soup.find('tag_to_find')
tag['your_attribute'] = 'attribute_value'

#Example
html_cont = """<div>
<p id="boldest" class="bold-class">HTML FILE</p>
<img src="image.png"/>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, 'html.parser')
tag = soup.find('p')
tag['new_attribute'] = '1'
print(tag)

#Output

	<p class="bold-class" id="boldest" new_attribute="1">HTML FILE</p>
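The same assignment works inside a loop, which is handy when every matching tag needs the attribute. A minimal sketch, assuming a hypothetical data-index attribute that numbers each paragraph:

```python
from bs4 import BeautifulSoup

html_cont = """<div>
<p class="bold-class">HTML FILE</p>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, "html.parser")

# Assigning a key adds the attribute, or overwrites it if it already exists.
for index, tag in enumerate(soup.find_all("p")):
    tag["data-index"] = str(index)

print([tag["data-index"] for tag in soup.find_all("p")])  # ['0', '1']
```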

To remove an attribute from a tag, use del on that attribute, just as you would delete a dictionary key.

#Code

from bs4 import BeautifulSoup
soup = BeautifulSoup(your_html_code, 'html.parser')
tag = soup.find('tag_to_find')
del tag['your_attribute']

#Example
html_cont = """<div>
<p id="boldest" class="bold-class">HTML FILE</p>
<img src="image.png"/>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, 'html.parser')
tag = soup.find('p')
del tag['id']
print(tag)

#Output

	<p class="bold-class">HTML FILE</p>
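Like a dictionary, del raises KeyError if the attribute is not there. When you are not sure the attribute exists, popping it from the plain dict in tag.attrs with a default is a tolerant alternative. A minimal sketch:

```python
from bs4 import BeautifulSoup

html_cont = '<p id="boldest" class="bold-class">HTML FILE</p>'
soup = BeautifulSoup(html_cont, "html.parser")
tag = soup.find("p")

# pop() with a default removes the key when present and does nothing otherwise.
removed = tag.attrs.pop("id", None)
already_gone = tag.attrs.pop("id", None)

print(removed)       # boldest
print(already_gone)  # None
print(tag)           # <p class="bold-class">HTML FILE</p>
```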

Tags can be found not only by tag name but also by id and class. Because class is a reserved word in Python, the keyword argument is spelled class_.

#Code

from bs4 import BeautifulSoup
soup = BeautifulSoup(your_html_code, 'html.parser')
print(soup.find(id='id_to_find', class_='class_to_find'))

#Example
html_cont = """<div>
<p class="p_class">HTML FILE</p>
<img id="img" src="image.png"/>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, 'html.parser')
print(soup.find(id='img'))
print(soup.find(class_='p_class'))

#Output

<img id="img" src="image.png"/>
<p class="p_class">HTML FILE</p>
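Two related options, sketched below under the assumption of a recent bs4 (4.7 or later, which installs the soupsieve package that select() relies on): an attrs dictionary matches attributes whose names are not valid Python identifiers, such as the hypothetical data-role used here, and CSS selectors express the same id and class lookups.

```python
from bs4 import BeautifulSoup

html_cont = """<div>
<p class="p_class">HTML FILE</p>
<img id="img" src="image.png"/>
<p data-role="footer">END</p>
</div>"""
soup = BeautifulSoup(html_cont, "html.parser")

# data-role cannot be a keyword argument, so pass it in an attrs dict.
footer = soup.find("p", attrs={"data-role": "footer"})
print(footer.string)  # END

# select_one() returns the first CSS match, select() returns all of them.
img = soup.select_one("#img")
paragraphs = soup.select("p.p_class")
print(img["src"])   # image.png
print(paragraphs)   # [<p class="p_class">HTML FILE</p>]
```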

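Sometimes we need all the text in a document without the markup. get_text() collects it into a single string. By default it keeps the whitespace between tags, so the example below passes a newline separator and strip=True to print one clean line per piece of text.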

#Code

from bs4 import BeautifulSoup
soup = BeautifulSoup(your_html_code,  'html.parser')
print(soup.get_text())
#Example
html_cont = """<div>
<p>HTML FILE</p>
<p></p>
<img src="image.png"/>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, 'html.parser')
print(soup.get_text('\n', strip=True))

#Output

HTML FILE
END
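If you want the individual pieces of text rather than one joined string, the stripped_strings generator yields each string with surrounding whitespace removed and skips whitespace-only strings. A minimal sketch on the same kind of markup:

```python
from bs4 import BeautifulSoup

html_cont = """<div>
<p>HTML FILE</p>
<p></p>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, "html.parser")

# Plain get_text() keeps the newlines that sit between the tags.
print(repr(soup.get_text()))

# stripped_strings drops the whitespace-only pieces entirely.
pieces = list(soup.stripped_strings)
print(pieces)  # ['HTML FILE', 'END']
```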

To change a tag's name in an HTML document, assign a new value to its name attribute.

#Code

from bs4 import BeautifulSoup
soup = BeautifulSoup(your_html_code, 'html.parser')
tag = soup.p
tag.name = 'h1'

#Example
html_cont = """<div>
<p>HTML FILE</p>
<p></p>
<img src="image.png"/>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, 'html.parser')
tag = soup.p
tag.name = 'h1'
print(tag)

#Output
	<h1>HTML FILE</h1>
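soup.p only reaches the first paragraph. To rename every matching tag, loop over find_all(). A minimal sketch:

```python
from bs4 import BeautifulSoup

html_cont = """<div>
<p>HTML FILE</p>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, "html.parser")

# Rename each <p> in turn; the text inside each tag is left untouched.
for tag in soup.find_all("p"):
    tag.name = "h1"

print(soup.div)
```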

To get the text inside a single tag, use its .string attribute.

#Code

from bs4 import BeautifulSoup
soup = BeautifulSoup(your_html_code, 'html.parser')
print(soup.find('p').string)

#Example
html_cont = """<div>
<p>HTML FILE</p>
<p></p>
<img src="image.png"/>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, 'html.parser')
print(soup.find('p').string)

#Output
	HTML FILE
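.string only works when the tag holds a single string (directly, or through a single nested tag). With mixed content, such as text plus a nested tag, it returns None, and get_text() is the better choice. A minimal sketch with a hypothetical <b> inside the paragraph:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>HTML <b>FILE</b> here</p>", "html.parser")
tag = soup.find("p")

# Three children (text, <b>, text), so .string cannot pick one.
print(tag.string)      # None

# get_text() joins every string inside the tag instead.
print(tag.get_text())  # HTML FILE here
```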
 
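A tag's string cannot be edited in place, but it can be replaced with another string using replace_with(). Call replace_with() on the tag's .string, not on the tag itself; calling it on the tag replaces the whole tag with plain text.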

#Code

from bs4 import BeautifulSoup
soup = BeautifulSoup(your_html_code, 'html.parser')
tag = soup.find('p')
tag.string.replace_with('new_string')
print(tag)

#Example
html_cont = """<div>
<p>HTML FILE</p>
<p></p>
<img src="image.png"/>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, 'html.parser')
soup.find('p').string.replace_with('edited file')
print(soup.find('p'))

#Output

	<p>edited file</p>
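Assigning to a tag's .string is another way to swap the text: it clears the tag's contents and inserts the new string. A minimal sketch that lowercases every paragraph (the lowercasing is just an illustrative transformation):

```python
from bs4 import BeautifulSoup

html_cont = """<div>
<p>HTML FILE</p>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, "html.parser")

# Setting .string replaces whatever the tag contained with the new text.
for tag in soup.find_all("p"):
    tag.string = tag.string.lower()

print(soup.find_all("p"))  # [<p>html file</p>, <p>end</p>]
```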


