python-urllib.parse

2020/06/29 15:00

Preface

While writing interface automation test cases recently, I needed to replace some parameters in a GET request URL with preset data, and to replace the time-limited auth value in the URL with the return value of the auth generation method. After some research, I settled on the parse module of Python's urllib library.

The urllib.parse module provides a series of functions for manipulating URLs and their components, either splitting a URL into its parts or assembling parts back into a URL.

Introduction to urllib.parse functions

Parsing:

1.urlparse()

urlparse() returns a ParseResult object, which behaves like a tuple of six elements.

 urllib_parse_urlparse.py
from urllib.parse import urlparse

url = 'http://test.dis.e.sogou/adlist?offset=0&auth=69CF80EA062863279B72612FA5443B6F&requestId=0025500016111592878436805&count=5&model=2&terminal=3&network=1'
parsed = urlparse(url)
print(parsed)

The six parts of the URL, available by tuple index or by attribute name, are: scheme, network location, path, path segment parameters (separated from the path by a semicolon), query string, and fragment.

 $ python3 urllib_parse_urlparse.py
ParseResult(scheme='http', netloc='test.dis.e.sogou', path='/adlist', params='', query='offset=0&auth=69CF80EA062863279B72612FA5443B6F&requestId=0025500016111592878436805&count=5&model=2&terminal=3&network=1', fragment='')
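As a small illustration of the six parts, the sketch below reads the same ParseResult both by index and by attribute. The URL is a made-up example (not from the test framework), chosen so that every part is non-empty.

```python
from urllib.parse import urlparse

# Hypothetical URL, used only so that all six parts have a value.
url = 'http://example.com:8080/docs/page;v=2?q=1#top'
parsed = urlparse(url)

# Index access and attribute access refer to the same element.
print(parsed[0], parsed.scheme)        # http http
print(parsed.netloc)                   # example.com:8080
print(parsed.path)                     # /docs/page
print(parsed.params)                   # v=2
print(parsed.query)                    # q=1
print(parsed.fragment)                 # top

# Convenience attributes derived from netloc.
print(parsed.hostname, parsed.port)    # example.com 8080
```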

2.urlsplit()

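The urlsplit() function can be used as an alternative to urlparse(), but it does not split the path parameters from the path. It returns a SplitResult with five elements instead of six, because there is no params element.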
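The difference is easiest to see on a URL that actually carries path parameters. The following is a minimal sketch with a made-up URL:

```python
from urllib.parse import urlparse, urlsplit

# Hypothetical URL with a ';v=2' path parameter.
url = 'http://example.com/docs/page;v=2?q=1#top'

six = urlparse(url)
five = urlsplit(url)

print(len(six), six.path, six.params)   # 6 /docs/page v=2
print(len(five), five.path)             # 5 /docs/page;v=2
```

With urlsplit() the ";v=2" part stays inside the path, so any code that needs it has to split it off itself.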

Unparsing:

1.geturl()

There is more than one way to reassemble the parts of a split URL into a complete URL string. The parsed URL object has a geturl() method.
 urllib_parse_geturl.py
from urllib.parse import urlparse

original = 'http://test.dis.e.sogou/adlist?offset=0&auth=69CF80EA062863279B72612FA5443B6F&requestId=0025500016111592878436805&count=5&model=2&terminal=3&network=1'
print('ORIG  :', original)
parsed = urlparse(original)
print('PARSED:', parsed.geturl())

 $ python3 urllib_parse_geturl.py
ORIG  : http://test.dis.e.sogou/adlist?offset=0&auth=69CF80EA062863279B72612FA5443B6F&requestId=0025500016111592878436805&count=5&model=2&terminal=3&network=1
PARSED: http://test.dis.e.sogou/adlist?offset=0&auth=69CF80EA062863279B72612FA5443B6F&requestId=0025500016111592878436805&count=5&model=2&terminal=3&network=1

geturl() is available only on the objects returned by urlparse() or urlsplit().
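A short sketch of that restriction, using a made-up URL: both result types offer geturl(), while a plain tuple built from the same parts does not.

```python
from urllib.parse import urlparse, urlsplit

# Hypothetical URL for illustration.
url = 'http://www.example.com/path/file.html?q=1#top'

from_parse = urlparse(url).geturl()
from_split = urlsplit(url).geturl()
print(from_parse == from_split == url)   # True

# Converting the result to a plain tuple loses the method.
plain = tuple(urlparse(url))
has_geturl = hasattr(plain, 'geturl')
print(has_geturl)                        # False
```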

2.urlunparse()

You can use urlunparse() to assemble an ordinary tuple of six strings into a URL.
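For example, a hand-written tuple can be turned into a URL as follows. The values are made up for illustration; the six positions follow the same order as ParseResult.

```python
from urllib.parse import urlparse, urlunparse

# (scheme, netloc, path, params, query, fragment), all hypothetical values.
parts = ('http', 'www.example.com', '/path/file.html', '', 'q=1', 'top')
url = urlunparse(parts)
print(url)   # http://www.example.com/path/file.html?q=1#top

# A ParseResult converted to a plain tuple can be reassembled the same way.
round_trip = urlunparse(tuple(urlparse(url)))
print(round_trip == url)   # True
```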

Joining:

1.urljoin()

In addition to urlparse() for parsing URLs, the urllib.parse module also contains urljoin(), which creates absolute URLs from relative fragments.

 urllib_parse_urljoin.py
from urllib.parse import urljoin

print(urljoin('http://www.example.com/path/file.html', 'anotherfile.html'))
print(urljoin('http://www.example.com/path/file.html', '../anotherfile.html'))

In this example, the relative path component ("../") is taken into account when the second URL is joined.

 $ python3 urllib_parse_urljoin.py
http://www.example.com/path/anotherfile.html
http://www.example.com/anotherfile.html

Non-relative paths are handled in the same way as by os.path.join().

 urllib_parse_urljoin_with_path.py
from urllib.parse import urljoin

print(urljoin('http://www.example.com/path/', '/subpath/file.html'))
print(urljoin('http://www.example.com/path/', 'subpath/file.html'))

If the path being joined starts with a slash (/), it replaces the URL's path starting from the top level. Otherwise, it is resolved relative to the URL's current path; here, because that path ends with a slash, it is simply appended to the end.

 $ python3 urllib_parse_urljoin_with_path.py
http://www.example.com/subpath/file.html
http://www.example.com/path/subpath/file.html
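One detail that is easy to miss is that the trailing slash of the base URL matters. The sketch below uses the same example.com base with and without the slash; the absolute URL in the last call is a made-up value.

```python
from urllib.parse import urljoin

# Without a trailing slash, 'path' is treated as a file name and is replaced.
joined_no_slash = urljoin('http://www.example.com/path', 'subpath/file.html')
print(joined_no_slash)   # http://www.example.com/subpath/file.html

# With a trailing slash, 'path/' is treated as a directory and is kept.
joined_slash = urljoin('http://www.example.com/path/', 'subpath/file.html')
print(joined_slash)      # http://www.example.com/path/subpath/file.html

# An absolute URL as the second argument replaces the base entirely.
joined_abs = urljoin('http://www.example.com/path/', 'http://other.example.org/x')
print(joined_abs)        # http://other.example.org/x
```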

Encoding Query Parameters

1.urlencode()

Query parameters must be encoded before they are added to a URL.

 urllib_parse_urlencode.py
from urllib.parse import urlencode

query_args = {
    'q': 'query string',
    'foo': 'bar',
}
encoded_args = urlencode(query_args)
print('Encoded:', encoded_args)

The encoding replaces special characters such as spaces, so that the query string passed to the server is in a standard format.

 $ python3 urllib_parse_urlencode.py
Encoded: q=query+string&foo=bar

To pass a sequence of values for one variable so that each value appears as a separate occurrence in the query string, set doseq to True when calling urlencode().
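A minimal sketch of the effect of doseq, using the same 'foo' key as the parse_qs() output further down:

```python
from urllib.parse import urlencode

query_args = {'foo': ['foo1', 'foo2']}

# Without doseq, the whole list is converted with str() and then quoted,
# which is rarely what the server expects.
single = urlencode(query_args)
print('Single  :', single)

# With doseq=True, each value in the sequence gets its own 'foo=' entry.
multiple = urlencode(query_args, doseq=True)
print('Sequence:', multiple)   # Sequence: foo=foo1&foo=foo2
```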

2.parse_qs()

parse_qs() returns a dictionary that maps each query name to a list of its (one or more) values, while parse_qsl() returns a list of tuples, each tuple being one pair of query name and query value.

 $ python3 urllib_parse_parse_qs.py
parse_qs : {'foo': ['foo1', 'foo2']}
parse_qsl: [('foo', 'foo1'), ('foo', 'foo2')]
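The script that produced this output is not shown. A minimal sketch that prints the same two lines, assuming the input query string is foo=foo1&foo=foo2, could look like this:

```python
from urllib.parse import parse_qs, parse_qsl

# Assumed input: the same key appears twice in the query string.
encoded = 'foo=foo1&foo=foo2'

as_dict = parse_qs(encoded)
as_list = parse_qsl(encoded)

print('parse_qs :', as_dict)
print('parse_qsl:', as_list)
```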


Using urllib.parse in the framework

 test_dispatcher_adlist.py
from urllib import parse

url = 'http://test.dis.e.sogou/adlist?offset=0&auth=69CF80EA062863279B72612FA5443B6F&requestId=0025500016111592878436805&count=3&model=ios&terminal=1&version=2&network=1'

# Get request_id and auth
# (generate_requestId, generate_auth and expect are defined elsewhere in the test framework and not shown here)
request_id = generate_requestId(expect['platformId'], expect['posId'])
auth = generate_auth(request_id, expect['token'])

# Modify the parameters in the URL and replace request_id and auth
# Parse the URL
url_parsed = parse.urlparse(url)
bits = list(url_parsed)
print('bits:', bits)
qs = parse.parse_qs(bits[4])
print('qs:', qs)

# Replace the interface input parameters in qs
qs['requestId'] = request_id
qs['auth'] = auth
qs['offset'] = expect['offset']
qs['count'] = expect['count']
qs['model'] = expect['model']
qs['terminal'] = expect['terminal']
qs['version'] = expect['version']
qs['network'] = expect['network']

# Re-encode the query parameters
bits[4] = parse.urlencode(qs)
print('bits[4]:', bits[4])

# Reassemble the URL
url_new = parse.urlunparse(bits)
print(url_new)

For better understanding, the script prints the result of each step.

 $ python3 test_dispatcher_adlist.py
bits: ['http', 'test.dis.e.sogou', '/adlist', '', 'offset=0&auth=69CF80EA062863279B72612FA5443B6F&requestId=0025500016111592878436805&count=3&model=ios&terminal=1&version=2&network=1', '']
qs: {'offset': ['0'], 'auth': ['69CF80EA062863279B72612FA5443B6F'], 'requestId': ['0025500016111592878436805'], 'count': ['3'], 'model': ['ios'], 'terminal': ['1'], 'version': ['2'], 'network': ['1']}
bits[4]: offset=0&auth=8215f55af287a62a29efe7a70fd3ba0d&requestId=0025500016111593405114583&count=1&model=eee&terminal=1&version=eee&network=1
http://test.dis.e.sogou/adlist?offset=0&auth=8215f55af287a62a29efe7a70fd3ba0d&requestId=0025500016111593405114583&count=1&model=eee&terminal=1&version=eee&network=1
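The parse, replace, encode, and reassemble steps used in the test script can be wrapped in a small helper. The sketch below is not part of the framework: the name replace_query_params and the test.example URL are hypothetical. Unlike the script, it passes doseq=True, on the assumption that some parameters may be left untouched; parse_qs() stores those as lists, and without doseq they would be encoded as the text of a Python list.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def replace_query_params(url, overrides):
    """Return url with the given query parameters replaced or added.

    Hypothetical helper for illustration; not part of the test framework.
    """
    bits = list(urlparse(url))
    # Index 4 is the query string; keep parameters that have empty values.
    qs = parse_qs(bits[4], keep_blank_values=True)
    qs.update(overrides)
    # doseq=True expands the one-element lists created by parse_qs().
    bits[4] = urlencode(qs, doseq=True)
    return urlunparse(bits)


# Made-up URL and values, for illustration only.
url = 'http://test.example/adlist?offset=0&auth=OLD&count=3'
new_url = replace_query_params(url, {'auth': 'NEW', 'count': 1})
print(new_url)   # http://test.example/adlist?offset=0&auth=NEW&count=1
```

In the test script every parameter is overwritten with a scalar value, which is why plain urlencode(qs) works there; the helper does not rely on that.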



Sogou test WeChat ID: Qa_xiaoming

Sogou test QQ fan group: 459645679




This article is shared from the WeChat official account Sogou QA.