Regular expression range matching

2020/03/17 16:00


Preface

While building an evaluation corpus recently, I had to filter and extract some fairly complex strings, for example finding the words that appear in a sentence with a specific structure. This kind of task can be solved with loops and if/else statements, but that is tedious; regular expressions are much more convenient.

No.1

Regular expression definition

A regular expression (regex or RE for short) uses a single string to describe and match a set of strings that conform to a certain syntax rule. Many text editors use regular expressions to search for and replace text that matches a pattern.

No.2

Regular expression use

For example, suppose a crawler has fetched some HTML source code that contains the following:

 str1 = r"<html><body><h1>hello.world<h1></body></html><h1>hello:world<h1><h1>helloAworld<h1>"

At this point, if you want to extract every hello(x)world, where (x) stands for any single character, you would normally have to split the string. But if the second <h1> of a pair becomes </h1>, splitting gets complicated. The following regular expression extracts every hello(x)world directly:

 import re
 p1 = r"hello.world"
 pattern = re.compile(p1)
 print(re.findall(pattern, str1))

Here p1 is the regular expression string. The "." between hello and world is a metacharacter (introduced later) that matches any single character except a newline. pattern is the regular expression object obtained by compiling p1; compiling makes it easy to reuse the pattern in later matching. The code above prints ['hello.world', 'hello:world', 'helloAworld'].
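As a minimal sketch of the reuse mentioned above, a compiled pattern object can be applied to several strings through its own findall method. The sample strings below are invented for illustration and do not come from the original data.

```python
import re

# Compile once, then reuse the same pattern object on several inputs.
pattern = re.compile(r"hello.world")

# Hypothetical sample inputs, made up for this illustration.
samples = [
    "<h1>hello.world<h1>",
    "<h1>hello:world</h1>",
    "<p>hello world and helloAworld</p>",
    "<p>helloworld</p>",  # nothing between the two words, so "." has nothing to match
]

results = [pattern.findall(s) for s in samples]
for s, found in zip(samples, results):
    print(s, "->", found)
```

Calling pattern.findall(s) is equivalent to re.findall(pattern, s); the compiled form simply avoids passing the pattern around each time.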

No.3

Regular expression matching method

In addition to the findall method described above, the other common matching methods are match and search. The three methods differ as follows:

match: Matches the regular expression only at the start of the string. If it matches, a match object is returned; otherwise it returns None.

search: Scans the whole string and returns a match object for the first successful match. If the scan finishes without a match, it returns None.

findall: Searches the entire string and returns a list of all the substrings that match the regular expression.
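The difference between the three methods can be sketched as follows. The input string is a made-up example, chosen so that the first match does not sit at the start of the string.

```python
import re

pattern = re.compile(r"hello.world")

# Hypothetical input: the first match is not at position 0.
text = "say hello.world, then hello:world"

m = pattern.match(text)    # anchored at the start of the string
s = pattern.search(text)   # first match anywhere in the string
f = pattern.findall(text)  # every non-overlapping match, as a list

print(m)                    # None, because text starts with "say"
print(s.group(), s.span())  # matched text and its (start, end) indices
print(f)

# match succeeds only when the pattern fits at the very beginning.
m2 = pattern.match("hello.world comes first")
print(m2.group())
```

Note that match and search return match objects (or None), so the matched text is read with group(), while findall returns plain strings.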

No.4

Regular expression range matching

Within a string, we can match characters that fall inside, or outside, a given range. Take the following string:

 str2 = r"lap &ap nap rap xap xap pap"

Suppose we want every ordinary string in str2, that is, everything except &ap. The following expression does this:

 p2 = r"[lnrxp]ap"  # any one of l, n, r, x, p followed by "ap"
 pattern = re.compile(p2)
 print(re.findall(pattern, str2))

The code above prints ['lap', 'nap', 'rap', 'xap', 'xap', 'pap'].

Listing the letters one by one works here because str2 contains only a few kinds of (x)ap. It is obviously impractical when any of the 26 letters, in either case, may appear. Regular expressions provide ranges and shorthand character classes so that you do not have to write out every wanted character during extraction. Common ones are:

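[0-9] any one of the digits 0123456789
[a-z] any one lowercase letter
[A-Z] any one uppercase letter
\d is equivalent to [0-9]
\D is equivalent to [^0-9], that is, it matches any non-digit
\w is equivalent to [a-zA-Z0-9_], that is, it matches uppercase and lowercase letters, digits, and the underscore
\W is the negation of the previous item, that is, [^a-zA-Z0-9_]
(These equivalences hold for ASCII text. In Python 3, \d and \w in str patterns also match non-ASCII digits and letters unless the re.ASCII flag is set.)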

Therefore, for the string str2 above, r"\wap" and r"[a-z]ap" give the same result as p2. Besides the "." metacharacter introduced earlier, the "[]" in p2 is also a metacharacter: it matches any one of the characters listed inside the brackets. Python supports many more metacharacters, which make regular expressions more concise. Readers can find their definitions at: https://www.runoob.com/regexp/regexp-metachar.html
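The equivalence claimed above can be checked with a short sketch. str2 is the string from this section; the string used for the digit classes is a made-up example.

```python
import re

str2 = "lap &ap nap rap xap xap pap"

explicit = re.findall(r"[lnrxp]ap", str2)  # letters listed one by one
by_range = re.findall(r"[a-z]ap", str2)    # any lowercase letter
by_word = re.findall(r"\wap", str2)        # any word character
non_word = re.findall(r"\Wap", str2)       # any NON-word character before "ap"

print(explicit)
print(by_range == explicit, by_word == explicit)
print(non_word)

# Hypothetical string for the digit classes.
sample = "id 108"
digits = re.findall(r"\d", sample)      # each digit separately
non_digits = re.findall(r"\D", sample)  # everything that is not a digit
print(digits, non_digits)
```

Since "&" is not a word character, \wap skips &ap while \Wap picks out exactly that item.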

No.5

Greedy and lazy matching in regular expressions

Assume the following string:

 str3 = r"sogoutest@sogou-inc.com.cn"

If we want to match the content of str3 from "@" up to a ".", we can write:

 p3 = r"@.+\."
 pattern = re.compile(p3)
 print(re.findall(pattern, str3))

Here "+" is also a metacharacter: it matches the preceding element one or more times. By default the expression matches as much as possible, so the match extends to the "." after com and the result is ['@sogou-inc.com.']. This is greedy mode. If you only want to match up to the first ".", add "?" after the "+" to switch to lazy mode:

 p3 = r"@.+?\."
 pattern = re.compile(p3)
 print(re.findall(pattern, str3))

Now the result is ['@sogou-inc.'], which ends at the "." after inc.
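For comparison, the sketch below runs the greedy and lazy patterns side by side on the same address. The third pattern, which uses a negated character class, is not from the original article; it is shown only as one alternative way to stop at the first ".".

```python
import re

str3 = "sogoutest@sogou-inc.com.cn"

# Greedy: ".+" takes as much as it can, then backs off to the LAST "." that still fits.
greedy = re.findall(r"@.+\.", str3)

# Lazy: ".+?" takes as little as it can, so the match stops at the FIRST ".".
lazy = re.findall(r"@.+?\.", str3)

# Alternative (an assumption, not in the article): "[^.]+" can never cross a ".",
# so greedy matching still stops at the first one.
negated = re.findall(r"@[^.]+\.", str3)

print(greedy)
print(lazy)
print(negated)
```

On this input the lazy pattern and the negated class agree; the negated class simply states the stopping condition inside the pattern itself.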

References:

1. https://www.py.cn/spider/guide/14488.html

2. https://www.py.cn/faq/python/12021.html

3. https://www.runoob.com/regexp/regexp-metachar.html

Epilogue

2020/03/17

Regular expressions play a very important role in string processing. Combined with the metacharacters that Python supports, they can express far more complex patterns. The same problem often has several solutions, and choosing among them is a skill that comes with everyday practice.

This article is shared from the WeChat official account Sogou QA.
