Use Pandas to solve data processing problems

This Sunday, a friend called out of the blue asking for help with a data processing problem. The task sounds simple: from data with the columns [name, date, amount], keep only the record with the latest date for each person.

At first I used Excel. With help from Wenxin Yiyan, I arrived at the following array formula:
{=MAX(IF(name column=current name, date column))}, which gives the correct result.

I also tried Pandas. Python really is powerful: a few simple statements solve the problem.
import pandas as pd
# Read the Excel file
df = pd.read_excel('data.xlsx')
# Sort by name and date
df = df.sort_values(by=['name', 'date'])
# Group by name and keep the last (latest) entry in each group
df = df.groupby('name').last().reset_index()
# Save the results to an Excel file
df.to_excel('result.xlsx', index=False)
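
The same idea can be tried without any files. The following is a minimal sketch on made-up in-memory data (the names, dates, and amounts are hypothetical, not from the friend's spreadsheet). It uses drop_duplicates with keep="last" instead of groupby(...).last(), because last() returns the last non-null value of each column, which may mix values from different rows when cells are empty.

```python
import pandas as pd

# Hypothetical sample data standing in for data.xlsx
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Alice", "Bob"],
    "date": pd.to_datetime(
        ["2023-01-05", "2023-02-01", "2023-03-10", "2023-01-20"]
    ),
    "amount": [100, 200, 150, 50],
})

# Sort so that each person's latest date comes last,
# then keep only the last row for each name.
latest = (
    df.sort_values(["name", "date"])
      .drop_duplicates(subset="name", keep="last")
      .reset_index(drop=True)
)

print(latest)
```

Either variant works when the data has no empty cells; the row-based one is simply the safer default.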

Some concepts of WEB3.0

A concept that has become very popular recently is WEB3.0, which, as the name implies, is the third generation of the Internet. The first generation consisted mainly of content publishing websites, where visitors could only read what the site provided. The second generation refers to social networking websites such as Facebook and Twitter, where anyone can publish information and opinions. The content of such websites is generated by users, and the amount of data is huge.

The third generation of the Internet is closely tied to the blockchain and virtual currency technologies that have become popular in recent years. It emphasizes users' control over their own data and information on websites. It is a decentralized Internet where everyone can publish information.

Data on the elderly in China in 2021

At the beginning of the year, the State Council Information Office announced China's population data for 2021. The total population was 1,412.60 million, a net increase of 480,000 compared with the end of 2020, and 10.62 million babies were born during the year. By age group, the working-age population aged 16-59 was 882.22 million, accounting for 62.5% of the national population. There were 267.36 million people aged 60 and above, accounting for 18.9% of the national population, including 200.56 million people aged 65 and above, accounting for 14.2%.

According to the classification standard of the United Nations, when the proportion of the population aged 60 and above exceeds 10%, or the proportion aged 65 and above exceeds 7%, a country is considered to have entered an "aging" society. When these two indicators double (that is, the proportion aged 60 and above exceeds 20%, or the proportion aged 65 and above exceeds 14%), the country is considered to have entered an "aged" society, also called a "moderately aging" society.
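
To see where the 2021 figures fall, the thresholds above can be written as a small check. This is only a sketch: the function name and the returned labels are my own, and the thresholds are the ones quoted in this article.

```python
def classify_aging(share_60_plus, share_65_plus):
    """Classify a society by the thresholds quoted above (shares in percent)."""
    # Doubled thresholds: over 20% aged 60+ or over 14% aged 65+
    if share_60_plus > 20 or share_65_plus > 14:
        return "aged"
    # Base thresholds: over 10% aged 60+ or over 7% aged 65+
    if share_60_plus > 10 or share_65_plus > 7:
        return "aging"
    return "not aging"


# 2021 figures from the article: 18.9% aged 60+, 14.2% aged 65+
china_2021 = classify_aging(18.9, 14.2)
print(china_2021)
```

With the 2021 data, the share aged 60 and above is still under 20%, but the share aged 65 and above has passed 14%, so the second threshold is already met.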

Reflections on Old Horses' Wisdom

Marxist philosophy says that direct experience and indirect experience are the two ways people acquire knowledge, and that their relationship is that of "source" and "stream". Much of what people learn comes from indirect experience, which is built on the direct experience of others. A person's time and energy are limited, and no one can gain all knowledge through direct practice. In many cases we still need to learn from others and obtain useful information from them. Guan Zhong and Xi Peng were very smart people, yet when they lost their way they relied on an old horse that knew the road, and when they could not find water they found it by observing ants. When we meet problems we do not understand, we must likewise learn from others.

Allusion: Guan Zhong and Xi Peng followed Duke Huan of Qi on a campaign against the state of Guzhu. They set out in spring and returned in winter, and lost their way. Guan Zhong said, "The wisdom of an old horse can be used." They let the old horses go and followed them, and so found the road. When they reached the mountains, there was no water to drink. Xi Peng said, "Ants live on the south side of the mountain in winter and on the north side in summer. Where an ant mound stands an inch high, there is water eight feet underground." So they dug and found water. With all of Guan Zhong's wisdom and Xi Peng's cleverness, when they met something they did not know, they did not hesitate to learn from old horses and ants. Yet people today, foolish as they are, do not know to learn from the wisdom of the sages. Is that not a mistake?

On Ziyu's Recommendation of Confucius

It is said that when Duke Ping of Jin was in power, he asked Qi Huangyang to recommend a magistrate for Nanyang County, and Qi Huangyang recommended his enemy, Xie Hu. Later Duke Ping asked him to recommend a lieutenant, and Qi Huangyang recommended his own son, Qi Wu. Both Xie Hu and Qi Wu did a good job, and Qi Huangyang has been held up through the ages as a model of recommending on merit, avoiding neither enemies nor relatives.

Recommending talent is a complex matter. It bears on whether the person recommended will do a good job, and also on the recommender's own vital interests. Acting entirely in the public interest, as Qi Huangyang did, requires a high level of moral awareness. More often, the recommender considers his own interests to some degree. "Han Feizi · Shuo Lin Shang" tells the story of Ziyu's recommendation of Confucius. In the end the recommendation was abandoned, because Confucius was so outstanding that he would have overshadowed the man recommending him.

[Original] Ziyu presented Confucius to the Taizai of Shang (that is, Song). Confucius went out; Ziyu went in and asked about the guest. The Taizai said, "Now that I have seen Confucius, I look on you as no more than a tiny flea or louse. I will now present him to the ruler." Ziyu feared that Confucius would be valued more highly by the ruler, so he said to the Taizai, "Once the ruler has seen Confucius, he will look on you too as a flea or louse." So the Taizai did not present him again.

[Translation] Ziyu introduced Confucius to the Taizai (chief minister) of Song. After Confucius left, Ziyu came in and asked the Taizai what he thought of Confucius. The Taizai said, "Now that I have met Confucius, you look like a tiny flea or louse beside him. I am going to introduce him to the ruler." Ziyu was afraid that Confucius would win the ruler's favor, so he told the Taizai, "Once the ruler has met Confucius, he will regard you as a flea or louse too." So the Taizai gave up the idea of introducing Confucius to the ruler of Song.

Personal Pension System Learning Notes

To promote the construction of a multi-level, multi-pillar pension insurance system, support its sustainable development, and meet the people's growing need for diversified pension insurance, the General Office of the State Council issued 《Opinions on Promoting the Development of Personal Pensions》 (Guo Ban Fa [2022] No. 7). Participation in the personal pension system is voluntary. Participants purchase qualified financial products to preserve and increase the value of their pension savings. The system supplements the basic pension system and the enterprise pension system.

Scope of participation: Workers in China who participate in the basic pension insurance for urban employees or for urban and rural residents can join the personal pension system.

Institutional model: First, the personal pension uses an individual account system, in which contributions are borne entirely by the participant and are fully accumulated. Second, personal pension accounts enjoy preferential tax policies. Third, participants can use their personal pension funds to purchase financial products from qualifying financial institutions, or through sales channels those institutions have lawfully entrusted, and bear the corresponding risks. Fourth, the pension account is closed and funds cannot be withdrawn early.

Payment level: The maximum personal pension contribution per participant is 12,000 yuan per year.

How to expand when the Windows system C disk is full

My computer has always run Windows 7. Recently the C disk filled up. Whatever application I opened, even Word or Excel, I was told that the disk was full and that space needed to be cleared. I wonder whether Windows simply takes up more and more space over time. I remember that the C partition on my first laptop was only 10 GB, and running XP on it was no problem. Now, with Windows 7, a 40 GB C partition is not enough.

There was really no software left to uninstall, so I thought of expanding the C disk. First attempt: use the disk manager that comes with the system to delete the D partition and assign its space to C. This failed, because the option to extend the C disk was grayed out. Second attempt: use a third-party partitioning tool. Through a search I found a tool called "AOMEI Partition Assistant". First, shrink the D partition. Then expand the C partition into the space freed from D.

With "AOMEI Partition Assistant", the repartitioning completed successfully. After the operation ran, the computer restarted and the C disk had been expanded to 100 GB. I no longer need to worry about the C disk.

An Attempt to Import CSV Double Character Separator Data into Database

CSV is a comma-separated file format: each column of data is separated by a comma. It is highly portable and can be processed with Notepad, Excel, various text processing tools, or databases. I once needed to import a CSV file into a database for processing, but the separator in the file was not a comma. It was the two-character sequence ||. I tried several approaches and eventually imported the data successfully. The methods are recorded here for future use.

1. Characteristics of the raw CSV data

The CSV data I received had been exported from another database. Each column is separated by the double character ||, and the first line is the header. However, the export settings seem to have been wrong. For example, an empty column carries no marker at all, so the || separator simply appears twice in a row (||||), which is not standard.

The first attempt was therefore to open the data in Excel to get an overall view and, if that worked, to import the Excel data directly into the database. This method failed. Excel's text-to-columns feature has the option "treat consecutive delimiters as one". Because empty columns carry no marker, that option is clearly not workable here: runs of 4, 6, or 8 consecutive delimiter characters would each be collapsed into a single delimiter, and the empty columns would be lost.

2. Replace double delimiters with single characters

Next I tried replacing the double-character delimiter with a single character, so that the file could be imported into the database directly. I chose the comma as the replacement. First, search the entire CSV file to confirm that it contains no commas. After the replacement, the data becomes an ordinary comma-delimited file. This method succeeded.
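
The check-then-replace step can be sketched in a few lines of Python. The function name and the sample text are hypothetical. The point is that the replacement is refused if the data already contains the new delimiter, and that empty columns survive as empty fields.

```python
def replace_delimiter(text, old="||", new=","):
    """Replace a two-character delimiter with a single character.

    Refuses to run if the new delimiter already occurs in the data,
    because the result would then be ambiguous.
    """
    if new in text:
        raise ValueError(f"data already contains {new!r}")
    return text.replace(old, new)


# Hypothetical sample: the second row has an empty middle column (||||)
sample = "id||name||city\n1||||Beijing\n2||Li||\n"
converted = replace_delimiter(sample)
print(converted)
```

A plain text editor's find-and-replace does the same job; the script form only makes the comma check explicit.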

3. Create table structure before importing

The database is MariaDB, and the client is the bundled HeidiSQL. When importing CSV data, it can identify the header and the data automatically. In fact, while trying the method above, I had already tried importing directly with || entered as the separator, but for some reason there were always errors. With the method above, the data becomes a comma-separated file and can be imported into the database directly without any problem.

But what if the data volume is too large, or no common single character is safe to use as the replacement? An ultimate solution is still needed. After another attempt, this worked: first create the table structure in the database, then import the CSV data with || entered as the separator. Note that the earlier direct import, which relied on automatic header detection, failed. This time it succeeded because the table structure had been created in advance.
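
For a file too large to edit by hand, or one whose fields may contain commas, another option is to convert it line by line. This is not the HeidiSQL route described above, only an alternative sketch using Python's standard csv module; the function name and the sample rows are hypothetical. Each line is split on ||, and csv.writer quotes any field that contains a comma.

```python
import csv
import io


def convert_stream(src, dst, delimiter="||"):
    """Read delimiter-separated lines from src, write standard CSV to dst.

    Works one line at a time, so the whole file never sits in memory.
    Returns the number of rows written.
    """
    writer = csv.writer(dst, lineterminator="\n")
    count = 0
    for line in src:
        # Splitting on the full delimiter keeps empty columns as ''
        fields = line.rstrip("\r\n").split(delimiter)
        writer.writerow(fields)
        count += 1
    return count


# In-memory stand-ins for the input and output files
src = io.StringIO("id||name||note\n1||||a, b\n")
dst = io.StringIO()
rows = convert_stream(src, dst)
print(dst.getvalue())
```

With real files, src and dst would be objects returned by open(), with newline="" passed for the output file as the csv module documentation recommends.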

An experience of large-scale data processing Excel version

Some time ago I wrote an article, 《A large-scale data processing experience PYTHON version》, in which I tried to use PYTHON and PANDAS to solve a calculation problem on large-scale data. Excel is relatively slow when processing nearly a million rows, and complex calculations also require VBA programs, which is why PYTHON was used last time. However, the advantages of Excel cannot be ignored. It is simple and intuitive, it can filter and summarize quickly, and in the end the reports have to be produced in Excel anyway. This time, the plan is to process the data again in Excel and find suitable ways to work around its shortcomings, so that Excel can process large-scale data quickly.

To continue with the previous problem: the data has millions of rows, is in CSV format, and needs to be calculated. Depending on the values in the first three columns, one of four groups of formulas applies. Each resembles the tiered pricing scheme used for electricity charges, though the specific standards differ. [Question 1] The final value must be calculated. [Question 2] Later, an exploratory algorithm was proposed that adds a "times" condition to the calculation formula, so that the formula is different each time.

[Question 1] To solve this problem, I initially used nested IF formulas. The nesting turned out to be complicated, and the machine froze badly when the formulas were copied down, even with automatic calculation turned off. So I decided to write a VBA program instead, divided into two parts: a function that performs the tiered calculation, and a main routine that calls it. The idea is to calculate the data in VBA so that the results written to the sheet are static values with no formulas, which avoids the freezing problem.
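
The two-part structure described above (a tiered calculation function plus a main routine that calls it) can be sketched as follows. The sketch is in Python rather than the VBA used in the article, and every group name, tier boundary, and rate in it is made up for illustration; the real standards are not given here.

```python
# Hypothetical tier tables, one per group.
# Each entry is (upper bound of the tier, rate); None means "no upper bound".
TIERS = {
    "group_a": [(100, 0.5), (200, 0.75), (None, 1.0)],
    "group_b": [(50, 0.25), (None, 0.5)],
}


def tiered_amount(quantity, tiers):
    """Part one: charge each slice of quantity at the rate of its tier."""
    total = 0.0
    lower = 0
    for upper, rate in tiers:
        if upper is None or quantity < upper:
            # The quantity ends inside this tier
            total += (quantity - lower) * rate
            break
        # The whole tier is used up; move on to the next one
        total += (upper - lower) * rate
        lower = upper
    return total


def calculate_all(rows):
    """Part two: main routine, returns plain values with no formulas."""
    return [tiered_amount(qty, TIERS[group]) for group, qty in rows]


results = calculate_all([("group_a", 250), ("group_b", 80), ("group_a", 100)])
print(results)
```

As in the VBA approach, the output is a list of static numbers, so nothing has to be recalculated when the results are pasted into a sheet.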

Collection of underwriting policies of local Huimin Insurance

"Huimin Insurance" is a very popular type of insurance product in recent years. It is characterized by low premiums, high coverage, no age limit, and no health declaration. Ordinary health insurance requires a health declaration, so people with existing health problems cannot obtain coverage. "Huimin Insurance" meets the insurance needs of these people, and in that sense it has an inclusive character. The premium standard and underwriting plan of "Huimin Insurance" differ from place to place, so this article collects the various local "Huimin Insurance" policies.

1. "Huimengbao"

Underwriting company: Inner Mongolia Branch of PICC Property and Casualty Insurance Co., Ltd;
Technical support: Beijing Yuanxin Huibao Technology Co., Ltd;
Brokerage agency: Kunpeng Insurance Brokerage (Hainan) Co., Ltd;
Premium standard: 86 yuan/year;
Insured group: Anyone whose basic medical insurance in the Inner Mongolia Autonomous Region, or in any of the leagues and cities under its jurisdiction, is currently active can participate. This includes both the basic medical insurance for urban employees and the basic medical insurance for urban and rural residents.
Insurance conditions: No age limit, no occupation limit, no health limit, no waiting period; participation is voluntary.

Underwriting plan: three coverages of 1 million yuan each

First, liability for self-paid hospitalization medical expenses within the scope of basic medical insurance:
During the insurance period, if the insured is diagnosed and hospitalized for treatment and examination at a designated medical institution because of accident or disease (excluding outpatient chronic diseases), the necessary and reasonable medical expenses that fall within the scope of basic medical insurance are covered. The amount remaining after basic medical insurance compensation, less a deductible of 15,000 yuan, is reimbursed at 80%, up to a maximum of 1 million yuan.

Second, liability for self-paid hospitalization medical expenses beyond the scope of basic medical insurance:
During the insurance period, if the insured is diagnosed and hospitalized for treatment and examination at a designated medical institution because of accident or disease, the medical expenses outside the scope of basic medical insurance (including Class B self-paid expenses, Class C self-paid expenses, and other self-paid expenses) are covered. After a deductible of 15,000 yuan, they are reimbursed at 40%, up to a maximum of 1 million yuan.

Third, liability for medical expenses for specific drugs:
If the insured is diagnosed and prescribed by a specialist at a designated medical institution, and purchases and uses specific drugs (30 domestic drugs and 20 foreign drugs) at an agreed hospital or pharmacy, the necessary and reasonable expenses for those drugs are reimbursed at 80% (20% for pre-existing conditions), up to a maximum of 1 million yuan.
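
The two hospitalization liabilities share one calculation: subtract the deductible, apply the reimbursement percentage, and cap the result. A minimal sketch follows, using the figures quoted above as defaults. The function name and the 50,000 yuan expense are hypothetical, and the order of the steps is my reading of the policy wording rather than the insurer's official formula.

```python
def reimbursement(expense, deductible=15_000, percent=80, cap=1_000_000):
    """Estimate the payout for one liability, in yuan.

    expense: self-paid amount eligible under the liability
    deductible, percent, cap: terms quoted in the article
    """
    # Nothing is paid until the expense exceeds the deductible
    payable = max(0, expense - deductible) * percent / 100
    return min(payable, cap)


# Hypothetical 50,000 yuan of eligible self-paid expenses
within_scope = reimbursement(50_000)               # first liability, 80%
beyond_scope = reimbursement(50_000, percent=40)   # second liability, 40%
print(within_scope, beyond_scope)
```

Under these assumptions the same 50,000 yuan yields 28,000 yuan within the scope of basic medical insurance and 14,000 yuan beyond it, and an expense below 15,000 yuan yields nothing.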

Follow your conscience and you will be able to be peaceful inside and invincible outside.