REST API (Webscraping) (3)

by goatlab 2022. 5. 11.

Webscraping

 

1.

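from bs4 import BeautifulSoup  # the 'html5lib' parser is a separate install: pip install html5lib

# Triple quotes let the HTML string span several lines
html = """<!DOCTYPE html>
<html>
<head> <title>Page Title</title> </head>
<body>
<h3><b id='boldest'> Lionel Messi</b></h3> <p> Salary: $ 100,000,000 </p>
<h3> Christiano Ronaldo</h3> <p> Salary: $ 150,000,000 </p>
<h3> Neymar Junior</h3> <p> Salary: $ 85,000,000</p>
</body>
</html>"""

soup = BeautifulSoup(html, 'html5lib')
tag_object = soup.title          # the <title> tag
tag_object = soup.h3             # the first <h3> tag in the document
tag_child = tag_object.b         # the <b> tag nested inside that <h3>
parent_tag = tag_child.parent    # back up to the enclosing <h3>

# next_sibling is the very next node, which can be a whitespace string rather than a tag;
# tag_object.find_next_sibling('p') skips ahead to the salary paragraph
sibling_1 = tag_object.next_sibling

tag_child.attrs     # dictionary of attributes: {'id': 'boldest'}
tag_child.string    # text inside the tag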
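
The same navigation calls can be combined to pair each player with a salary. This is only a minimal sketch: it assumes the built-in "html.parser" instead of html5lib, uses a trimmed copy of the HTML above, and the `players` dictionary is a name made up for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the sample HTML from section 1
html = """<body>
<h3><b id='boldest'> Lionel Messi</b></h3> <p> Salary: $ 100,000,000 </p>
<h3> Christiano Ronaldo</h3> <p> Salary: $ 150,000,000 </p>
<h3> Neymar Junior</h3> <p> Salary: $ 85,000,000</p>
</body>"""

soup = BeautifulSoup(html, "html.parser")

players = {}  # hypothetical name: maps player name to salary text
for heading in soup.find_all("h3"):
    # find_next_sibling skips whitespace nodes and returns the next <p> tag
    paragraph = heading.find_next_sibling("p")
    if paragraph is not None:
        players[heading.get_text(strip=True)] = paragraph.get_text(strip=True)

for name, salary in players.items():
    print(name, "|", salary)
```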

 

2.

from bs4 import BeautifulSoup

# Triple quotes let the HTML string span several lines
html = """<table>
<tr> <td>Pizza Place</td> <td>Orders</td> <td>Slices</td> </tr>
<tr> <td>Domino Pizza</td> <td>10</td> <td>100</td> </tr>
<tr> <td>Pizza hut</td> <td>12</td> <td>144</td> </tr>
</table>"""

table = BeautifulSoup(html, 'html5lib')

# find_all returns every matching tag, here the three <tr> rows
table_rows = table.find_all(name='tr')
# [<tr> <td>Pizza Place</td> <td>Orders</td> <td>Slices</td> </tr>,
#  <tr> <td>Domino Pizza</td> <td>10</td> <td>100</td> </tr>,
#  <tr> <td>Pizza hut</td> <td>12</td> <td>144</td> </tr>]

first_row = table_rows[0]
# <tr> <td>Pizza Place</td> <td>Orders</td> <td>Slices</td> </tr>

# walk every row, then every cell in that row
for i, row in enumerate(table_rows):
    print("row", i)
    cells = row.find_all("td")

    for j, cell in enumerate(cells):
        print("column", j, "cell", cell)
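
The loop above prints whole `<td>` tags. If only the cell text is needed, the rows can be turned into dictionaries keyed by the header row. This is a rough sketch: `table_to_records` is a hypothetical helper, not part of BeautifulSoup, and it assumes the first row holds the column names and uses the built-in "html.parser".

```python
from bs4 import BeautifulSoup

html = """<table>
<tr> <td>Pizza Place</td> <td>Orders</td> <td>Slices</td> </tr>
<tr> <td>Domino Pizza</td> <td>10</td> <td>100</td> </tr>
<tr> <td>Pizza hut</td> <td>12</td> <td>144</td> </tr>
</table>"""


def table_to_records(html_text):
    """Hypothetical helper: one dict per data row, keyed by the first row's cells."""
    soup = BeautifulSoup(html_text, "html.parser")
    # Text of every <td> in every <tr>, with surrounding whitespace stripped
    grid = [
        [cell.get_text(strip=True) for cell in row.find_all("td")]
        for row in soup.find_all("tr")
    ]
    if not grid:
        return []
    header, data = grid[0], grid[1:]
    return [dict(zip(header, values)) for values in data]


records = table_to_records(html)
for record in records:
    print(record)
```

The values stay as strings; converting "Orders" and "Slices" to integers would be a separate step.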

 

3.

import requests
from bs4 import BeautifulSoup

# replace the placeholder with the address of the page to scrape
page = requests.get("http://EnterWebsiteURL...").text

# creates a BeautifulSoup object
soup = BeautifulSoup(page, "html.parser")

# pulls all instances of the <a> tag
artists = soup.find_all('a')

# prints the text and the link target of each <a> tag
for artist in artists:
    names = artist.get_text()        # text only, also safe for an empty <a>
    fullLink = artist.get('href')    # None when the tag has no href attribute
    print(names)
    print(fullLink)
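
An `href` value is often a relative path rather than a full link. One possible way to handle that, sketched here without a live request, is to resolve each value against the page address with `urljoin` from the standard library. `extract_links`, the sample markup, and the example.com address are all made up for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(page, base_url):
    """Hypothetical helper: (text, absolute URL) pairs for every <a> with an href."""
    soup = BeautifulSoup(page, "html.parser")
    links = []
    # href=True skips <a> tags that carry no href attribute
    for anchor in soup.find_all("a", href=True):
        text = anchor.get_text(strip=True)
        links.append((text, urljoin(base_url, anchor["href"])))
    return links


# Made-up markup standing in for the downloaded page
sample = (
    '<a href="/artists/a">Artist A</a> '
    '<a name="top">no link</a> '
    '<a href="https://example.com/b">Artist B</a>'
)

links = extract_links(sample, "https://example.com/index.html")
for text, url in links:
    print(text, url)
```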