In Python, BeautifulSoup is a library for parsing and querying HTML and XML documents. It extracts elements from the markup by tag name, and tags can also be selected by id and class. Each match is returned as an object on which several operations can be performed.
* To parse a document, pass it to BeautifulSoup either as an open file or as a string.
#Code
from bs4 import BeautifulSoup
with open("index.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')
soup = BeautifulSoup("<html>data</html>", 'html.parser')
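As a quick check that both routes build the same tree, the sketch below writes a small document to a temporary file, parses it once from the file handle and once from a string, and compares the results. The file name sample.html and the markup are made up for illustration. The parser is named explicitly; when it is left out, BeautifulSoup picks whichever parser is installed and emits a warning.

```python
import os
import tempfile
from bs4 import BeautifulSoup

# Hypothetical markup used only for this comparison.
markup = "<html><body><p>data</p></body></html>"

# Write the markup to a temporary file so the example does not depend on index.html.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.html")
    with open(path, "w", encoding="utf-8") as out:
        out.write(markup)

    # Route 1: pass an open file handle.
    with open(path, encoding="utf-8") as fp:
        soup_from_file = BeautifulSoup(fp, "html.parser")

# Route 2: pass the markup as a string.
soup_from_string = BeautifulSoup(markup, "html.parser")

# Both routes should serialize to the same document.
same_tree = str(soup_from_file) == str(soup_from_string)
print(same_tree)
```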
Operations:
* To find a tag in an HTML or XML document, use the find method:
#Code
from bs4 import BeautifulSoup
soup = BeautifulSoup(your_html_code, 'html.parser')
print(soup.find("tag_to_find"))
#Example
html_cont = """<div>
<p>HTML FILE</p>
<img alt="Image"/>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, 'html.parser')
print(soup.find("p"))
#Output
<p>HTML FILE</p>
* The find method returns only the first matching tag in the HTML code.
* To find every matching tag, call the find_all method instead.
* It returns a list of the matching tags.
#Example
html_cont = """<div>
<p>HTML FILE</p>
<img alt="Image"/>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, 'html.parser')
print(soup.find_all("p"))
#Output
[<p>HTML FILE</p>, <p>END</p>]
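One detail worth knowing about these two methods is what happens when nothing matches. The short sketch below, using a trimmed-down version of the sample document, shows that find gives back None while find_all gives back an empty list, so code that chains calls onto find should check for None first. The variable names are illustrative only.

```python
from bs4 import BeautifulSoup

html_cont = """<div>
<p>HTML FILE</p>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, "html.parser")

missing = soup.find("table")          # there is no <table> in the document
all_missing = soup.find_all("table")  # same search, list form

# find_all results can be looped over like any list.
paragraph_texts = [p.get_text() for p in soup.find_all("p")]

print(missing)          # None
print(all_missing)      # []
print(paragraph_texts)  # ['HTML FILE', 'END']
```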
* A tag may have any number of attributes. They can be accessed by treating the tag as a dictionary.
#Code
from bs4 import BeautifulSoup
soup = BeautifulSoup(your_html_code, 'html.parser')
tag = soup.find("tag_to_find")
print(tag["id"])
#Example
html_cont = """<div>
<p id="boldest">HTML FILE</p>
<img alt="Image"/>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, 'html.parser')
tag = soup.find("p")
print(tag["id"])
#Output
boldest
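Dictionary-style access raises a KeyError when the attribute is absent, which is easy to hit on real pages. A minimal sketch of the safer alternative, the tag's get method, follows; the two-paragraph markup and the "n/a" default are assumptions chosen for the illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p id="boldest">HTML FILE</p><p>END</p>', "html.parser")
first, second = soup.find_all("p")

with_id = first.get("id")           # 'boldest'
without_id = second.get("id")       # None, no exception raised
fallback = second.get("id", "n/a")  # explicit default value

# Square-bracket access on a missing attribute raises KeyError.
try:
    second["id"]
    raised = False
except KeyError:
    raised = True

print(with_id, without_id, fallback, raised)
```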
* To get all the attributes of a tag at once, use its attrs property:
#Code
from bs4 import BeautifulSoup
soup = BeautifulSoup(your_html_code, 'html.parser')
tag = soup.find("tag_to_find")
print(tag.attrs)
#Example
html_cont = """<div>
<p id="boldest" class="bold-class">HTML FILE</p>
<img alt="Image"/>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, 'html.parser')
tag = soup.find("p")
print(tag.attrs)
#Output
{'id': 'boldest', 'class': ['bold-class']}
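Note that class behaves differently from id: because an element may carry several classes, BeautifulSoup returns its value as a list. The sketch below illustrates this with a second, made-up class name ("wide") added to the sample paragraph.

```python
from bs4 import BeautifulSoup

# "wide" is a hypothetical extra class added for this illustration.
soup = BeautifulSoup('<p id="boldest" class="bold-class wide">HTML FILE</p>', "html.parser")
tag = soup.find("p")

classes = tag["class"]  # list of class names
ident = tag["id"]       # plain string

# Membership tests work on the list of classes.
has_wide = "wide" in tag.get("class", [])

print(classes)   # ['bold-class', 'wide']
print(ident)     # boldest
print(has_wide)  # True
```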
* To add a new attribute to a tag, assign to it as you would to a dictionary key:
#Code
from bs4 import BeautifulSoup
soup = BeautifulSoup(your_html_code, 'html.parser')
tag = soup.find("tag_to_find")
tag["your_attribute"] = "attribute_value"
#Example
html_cont = """<div>
<p class="bold-class" id="boldest">HTML FILE</p>
<img alt="Image"/>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, 'html.parser')
tag = soup.find("p")
tag["new_attribute"] = "1"
print(tag)
#Output
<p class="bold-class" id="boldest" new_attribute="1">HTML FILE</p>
* To remove an attribute from a tag, delete it with del, again treating the tag as a dictionary:
#Code
from bs4 import BeautifulSoup
soup = BeautifulSoup(your_html_code, 'html.parser')
tag = soup.find("tag_to_find")
del tag["your_attribute"]
#Example
html_cont = """<div>
<p id="boldest" class="bold-class">HTML FILE</p>
<img alt="Image"/>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, 'html.parser')
tag = soup.find("p")
del tag["id"]
print(tag)
#Output
<p class="bold-class">HTML FILE</p>
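To confirm that an attribute was really added or removed without printing the whole tag, the tag's has_attr method can be used. The following is a small sketch that adds and then deletes attributes on the sample paragraph; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<p class="bold-class" id="boldest">HTML FILE</p>', "html.parser")
tag = soup.find("p")

# Add an attribute, then check that it is present.
tag["new_attribute"] = "1"
had_new = tag.has_attr("new_attribute")

# Delete an attribute, then check that it is gone.
del tag["id"]
has_id = tag.has_attr("id")

# Names of the attributes still on the tag, sorted for stable output.
remaining = sorted(tag.attrs)

print(had_new, has_id, remaining)
```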
* Tags can be found not only by tag name but also by id and class. Note that the keyword is class_ with a trailing underscore, because class is a reserved word in Python.
#Code
from bs4 import BeautifulSoup
soup = BeautifulSoup(your_html_code, 'html.parser')
print(soup.find(id="id_to_find", class_="class_to_find"))
#Example
html_cont = """<div>
<p class="p_class">HTML FILE</p>
<img alt="Image" id="img"/>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, 'html.parser')
print(soup.find(id="img"))
print(soup.find(class_="p_class"))
#Output
<img alt="Image" id="img"/>
<p class="p_class">HTML FILE</p>
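The tag name and the class filter can also be combined in one call, which helps when several kinds of element share a class. In the sketch below a span with the same class has been added to the sample markup purely for illustration.

```python
from bs4 import BeautifulSoup

# The <span> is a hypothetical addition so that two elements share the class.
html_cont = """<div>
<p class="p_class">HTML FILE</p>
<span class="p_class">NOTE</span>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, "html.parser")

# Every element carrying the class, whatever its tag name.
by_class = soup.find_all(class_="p_class")

# Only <p> elements carrying the class.
by_name_and_class = soup.find_all("p", class_="p_class")

print([t.name for t in by_class])                # ['p', 'span']
print([t.get_text() for t in by_name_and_class])  # ['HTML FILE']
```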
* Sometimes we need all the text in a document without the markup. BeautifulSoup makes this easy with the get_text method:
#Code
from bs4 import BeautifulSoup
soup = BeautifulSoup(your_html_code, 'html.parser')
print(soup.get_text())
#Example
html_cont = """<div>
<p>HTML FILE</p>
<p></p>
<img alt="Image"/>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, 'html.parser')
print(soup.get_text())
#Output (blank lines omitted)
HTML FILE
END
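By default get_text keeps the newlines that sit between tags, so the raw result contains blank lines. A minimal sketch of tidying this up with the method's separator and strip arguments is shown below, using a shortened version of the sample document.

```python
from bs4 import BeautifulSoup

html_cont = """<div>
<p>HTML FILE</p>
<p></p>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, "html.parser")

# Default call: whitespace between the tags is kept as-is.
raw_text = soup.get_text()

# strip=True trims each piece of text and drops empty ones;
# separator controls what is placed between the remaining pieces.
clean_text = soup.get_text(separator="\n", strip=True)

print(repr(raw_text))
print(repr(clean_text))  # 'HTML FILE\nEND'
```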
* To change the name of a tag in an HTML document, assign a new value to its name attribute:
#Code
from bs4 import BeautifulSoup
soup = BeautifulSoup(your_html_code, 'html.parser')
tag = soup.p
tag.name = "h1"
#Example
html_cont = """<div>
<p>HTML FILE</p>
<p></p>
<img alt="Image"/>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, 'html.parser')
tag = soup.p
tag.name = "h1"
print(tag)
#Output
<h1>HTML FILE</h1>
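The rename is applied to the parsed tree itself, not to a copy, so it shows up when the whole document is serialized again. A short sketch on a compact version of the sample markup:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><p>HTML FILE</p><p>END</p></div>", "html.parser")

# soup.p is the first <p>; renaming it changes the tree in place.
soup.p.name = "h1"

renamed = str(soup)
remaining_p = len(soup.find_all("p"))

print(renamed)      # <div><h1>HTML FILE</h1><p>END</p></div>
print(remaining_p)  # 1
```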
* The text inside a tag can be read through its string attribute:
#Code
from bs4 import BeautifulSoup
soup = BeautifulSoup(your_html_code, 'html.parser')
print(soup.find("p").string)
#Example
html_cont = """<div>
<p>HTML FILE</p>
<p></p>
<img alt="Image"/>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, 'html.parser')
print(soup.find("p").string)
#Output
HTML FILE
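The string attribute only has a value when the tag holds a single piece of text (or a single child that does). For a tag with several children it is None, and get_text is the usual fallback. A minimal sketch, assuming a compact version of the sample markup:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><p>HTML FILE</p><p>END</p></div>", "html.parser")

single = soup.find("p").string           # the <p> holds exactly one string
multi = soup.find("div").string          # the <div> has two children, so None
joined = soup.find("div").get_text(" ")  # collect all text, space separated

print(single)  # HTML FILE
print(multi)   # None
print(joined)  # HTML FILE END
```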
* The string inside a tag can't be edited in place, but it can be replaced with another string by calling replace_with on the string itself. Calling replace_with on the tag would replace the whole tag, not just its text.
#Code
from bs4 import BeautifulSoup
soup = BeautifulSoup(your_html_code, 'html.parser')
soup.find("p").string.replace_with("new_string")
#Example
html_cont = """<div>
<p>HTML FILE</p>
<p></p>
<img alt="Image"/>
<p>END</p>
</div>"""
soup = BeautifulSoup(html_cont, 'html.parser')
soup.find("p").string.replace_with("edited file")
print(soup.find("p"))
#Output
<p>edited file</p>
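The difference between replacing a tag's string and replacing the tag itself is easy to miss, so here is a small side-by-side sketch on a compact version of the sample markup. The replacement texts are arbitrary.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><p>HTML FILE</p><p>END</p></div>", "html.parser")
first = soup.find("p")

# Replace only the text node: the <p> tag stays in the tree.
first.string.replace_with("edited file")
after_string_swap = str(soup)

# Replace the tag itself: the <p> element is removed and bare text takes its place.
first.replace_with("plain text")
after_tag_swap = str(soup)

print(after_string_swap)  # <div><p>edited file</p><p>END</p></div>
print(after_tag_swap)     # <div>plain text<p>END</p></div>
```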