Lucky Grass Blog

DFA (Deterministic Finite Automaton) multi keyword search

Original address: DFA Algorithm for Filtering Sensitive Words in Games

In a game with a chat feature, we want the chat system to check what the player types. If the input contains sensitive words, we either refuse to send the message or replace the sensitive words with *.

Why use the DFA algorithm
If we already have a sensitive-word list (obtained from the relevant authorities or found online), the simplest way to filter sensitive words is:
Go through the whole list word by word and check whether the player's input contains the current word. If it does, replace the characters of that word with *.

But this approach has to scan the player's input once for every word in the list. A sensitive-word list usually contains thousands of entries, while a chat message is generally only 20 to 30 characters long.
Almost all of those scans find nothing, so this method is very inefficient and is not suitable for real development.
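
For comparison, the naive approach described above might look like the following minimal Java sketch. The class name NaiveWordFilter, the method filter, and the sample words are assumptions made for illustration; they do not come from the original post.

```java
import java.util.Arrays;
import java.util.List;

final class NaiveWordFilter {

    /** Replaces every occurrence of every listed word with '*' characters. */
    static String filter(String input, List<String> sensitiveWords) {
        StringBuilder sb = new StringBuilder(input);
        for (String word : sensitiveWords) {      // one scan of the input per word
            if (word.isEmpty()) {
                continue;
            }
            int from = sb.indexOf(word);
            while (from >= 0) {
                for (int k = 0; k < word.length(); k++) {
                    sb.setCharAt(from + k, '*');
                }
                from = sb.indexOf(word, from + word.length());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("badword", "spam");
        // prints "this is **** and a *******"
        System.out.println(filter("this is spam and a badword", words));
    }
}
```

The work grows with the number of words in the list, even though most of them never occur in a short chat message.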

The DFA algorithm filters sensitive words efficiently: with it, we can find and replace every sensitive word by traversing the player's input only once.

DFA algorithm principle
The DFA algorithm builds a tree-like search structure in advance (strictly speaking, a forest) and then searches that structure very efficiently according to the input.

Suppose we have a sensitive-word list that contains the following words:
I love you!
I love him
I love her
I love you
I love him
I love her
I love her

Then we can construct such a tree structure:

Let the string entered by the player be: "Baiju, I love you ah, hahaha". It consists of nine characters, written below as 'white', 'chrysanthemum', 'I', 'love', 'you', 'ah', 'ha', 'ha', 'ha'.

We traverse the player's string str and let the pointer i start at the root node of the tree structure, that is, the leftmost blank node:
str[0] = 'white': tree[i] has no child node whose value is 'white', so the matching condition is not met; continue traversing.
str[1] = 'chrysanthemum': the matching condition is not met either; continue traversing.
str[2] = 'I': tree[i] has a path to the 'I' node, so the matching condition is met. i moves to the 'I' node; continue traversing.
str[3] = 'love': tree[i] has a path to the 'love' node, so i moves to 'love'; continue traversing.
str[4] = 'you': there is a path again, so i moves to 'you'; continue traversing.
str[5] = 'ah': there is a path again, so i moves to 'ah'.
At this point the pointer i has reached the end of a branch of the tree structure, which means that one sensitive word has been matched. We can use variables to record the index in the player's string where the match began and the index where it ended, and then loop over that range to replace the characters with *.
After a match, we point i back to the root node of the tree structure.
The player's string has not been fully traversed yet, so we continue:
str[6] = 'ha': the matching condition is not met; continue traversing.
str[7] = 'ha': not met.
str[8] = 'ha': not met; the traversal ends.

As we can see, a single pass over the player's string is enough to find the sensitive words in it.

At the start of this section I said that the structure built by the DFA algorithm is really a forest. That is because, for a more complete sensitive-word list, the structure looks like this:

If you ignore the root node of the structure, that is, the blank node, what remains is a forest made up of several trees.

Now that we understand how the DFA algorithm matches sensitive words, we can discuss how to build such a forest from the sensitive-word list in code.

Construction of forest structure for DFA algorithm
Both trees and forests are made up of nodes, so let us first discuss what information a node in this structure should store.

In an ordinary tree, a node stores its own value and pointers to its child nodes.

In the DFA structure, however, the number of child nodes is not known in advance. We could store the pointers to all child nodes in a List, but then every matching step would have to traverse the whole List to find the path, which is relatively slow.

To get O(1) lookups, we can store the pointers to the child nodes in a hash table instead.

We can also use a hash table directly as the entry node of the forest:

This hash table stores a series of key-value pairs. Each Key is the first character of some sensitive word, and each Value is the node that represents that character.

And because a hash table can store values of different types (as long as they inherit from Object), we can also store a key-value pair whose Key is "IsEnd" and whose Value is 0 or 1. A value of 0 means that the current node is not the end of a sensitive word, and a value of 1 means that it is.

Every other node in the structure can be built as a hash table too. The character that a node represents is not stored in the node itself; it is stored as the Key of a key-value pair in its parent node. (The structure has a blank root node whose keys are the first characters of the sensitive words and whose values are hash tables, namely its child nodes.)

Each node, that is, each hash table, therefore stores one key-value pair whose Key is "IsEnd" and whose Value is 0 or 1, plus a series of key-value pairs in which the Key is the character of a child node and the Value is that child node (a hash table).

Let's give another specific example:

The structure is as follows:

The structure starts with its blank root node, which is a hash table. We call it map.

Then, for the sensitive word "I love you", the lookup is:
map['I']['love']['you']['IsEnd'] == 1
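
The lookup chain can be tried out with plain nested HashMaps. The sketch below builds the path for a single three-character word by hand and then follows it. The word "abc" stands in for "I love you", and the class name NestedMapLookup and the helper child are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class NestedMapLookup {

    /** Returns the child node stored under key c, creating it (with IsEnd = 0) if absent. */
    @SuppressWarnings("unchecked")
    static Map<Object, Object> child(Map<Object, Object> node, char c) {
        Map<Object, Object> next = (Map<Object, Object>) node.get(c);
        if (next == null) {
            next = new HashMap<>();
            next.put("IsEnd", 0);
            node.put(c, next);
        }
        return next;
    }

    /** Builds root -> 'a' -> 'b' -> 'c' and marks the last node as the end of a word. */
    static Map<Object, Object> buildAbc() {
        Map<Object, Object> map = new HashMap<>();   // blank root node
        Map<Object, Object> last = child(child(child(map, 'a'), 'b'), 'c');
        last.put("IsEnd", 1);
        return map;
    }

    public static void main(String[] args) {
        Map<Object, Object> map = buildAbc();
        // Equivalent of map['a']['b']['c']['IsEnd'] == 1
        Map<Object, Object> end = child(child(child(map, 'a'), 'b'), 'c');
        System.out.println(end.get("IsEnd"));              // prints 1
        System.out.println(child(map, 'a').get("IsEnd"));  // prints 0
    }
}
```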

From the analysis above, we get the general procedure for building the structure in code:

1. Create a hash table as the blank root node of the structure.

2. Traverse the sensitive-word list to get one sensitive-word string.

3. Traverse that string to get the current character.

4. Check whether the current character is already present in the tree structure. If it is, move directly to the existing node and continue with the next character.

The lookup goes as follows.

For the first character of the sensitive word:

indexMap = map // indexMap acts as a pointer to a node of the tree structure

if (indexMap.ContainsKey('c')) indexMap = indexMap['c']

After this, indexMap points to the node that already exists in the tree structure for this character.

The same applies to each following character:

if (indexMap.ContainsKey('c')) indexMap = indexMap['c']

If the tree structure is still empty, or if none of the child nodes of the node that indexMap points to represents the current character, we need to create a child node, that is, add a key-value pair whose Key is the current character and whose Value is a new hash table.

5. Check whether the current character is the last character of the current word. If it is, store a key-value pair with Key "IsEnd" and Value 1 on its node; otherwise the key-value pair stored is "IsEnd" with Value 0.

This concludes the discussion of how the DFA structure is built. The construction code (implemented in Java) is given next.

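Initialization code for the DFA structure

import java.util.HashMap;
import java.util.List;

/** Blank root node of the sensitive-word tree. */
private static HashMap<Object, Object> map;

/**
 * Build the sensitive-word tree.
 *
 * @param words the sensitive words
 */
@SuppressWarnings("unchecked")
private static void InitFilter(List<String> words) {
    map = new HashMap<>(words.size());
    for (int i = 0; i < words.size(); i++) {
        String word = words.get(i);
        HashMap<Object, Object> indexMap = map;
        for (int j = 0; j < word.length(); j++) {
            char c = word.charAt(j);
            if (indexMap.containsKey(c)) {
                // The character already has a node: move down to it.
                indexMap = (HashMap<Object, Object>) indexMap.get(c);
            } else {
                // Otherwise create a child node that is not (yet) the end of a word.
                HashMap<Object, Object> newMap = new HashMap<>();
                newMap.put("IsEnd", 0);
                indexMap.put(c, newMap);
                indexMap = newMap;
            }
            // The last character of the word marks its node as the end of a word.
            if (j == word.length() - 1) {
                indexMap.put("IsEnd", 1);
            }
        }
    }
}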

DFA algorithm search process
The principle of the search has already been discussed above, with an example. The search is in fact quite similar to the process of initializing the structure, so the code is given directly without further explanation.

Code implementation of the DFA search

/**
 * Search procedure.
 *
 * @param txt        the text to check
 * @param beginIndex index at which matching starts
 * @return length of the longest sensitive word starting at beginIndex, or 0 if there is none
 */
@SuppressWarnings("unchecked")
private static int CheckFilterWord(String txt, int beginIndex) {
    int matchLen = 0; // length of the longest complete word matched so far
    int len = 0;      // number of characters matched along the current path
    HashMap<Object, Object> curMap = map;
    for (int i = beginIndex; i < txt.length(); i++) {
        char c = txt.charAt(i);
        HashMap<Object, Object> temp = (HashMap<Object, Object>) curMap.get(c);
        if (temp == null) {
            break; // no path for this character
        }
        len++;
        if ((int) temp.get("IsEnd") == 1) {
            matchLen = len; // a complete sensitive word ends here
        }
        // Always move down; staying on the parent after a match would let
        // later characters be matched against the wrong node.
        curMap = temp;
    }
    return matchLen;
}

/**
 * Search and replace.
 *
 * @param txt the text to filter
 * @return the text with every sensitive word replaced by '*' characters
 */
public static String SerachFilterWordAndReplace(String txt) {
    int i = 0;
    StringBuilder sb = new StringBuilder(txt);
    while (i < txt.length()) {
        int len = CheckFilterWord(txt, i);
        if (len > 0) {
            for (int j = 0; j < len; j++) {
                sb.setCharAt(i + j, '*');
            }
            i += len;
        } else {
            ++i;
        }
    }
    return sb.toString();
}
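
As a point of comparison, the same build-and-search idea can be written with a small typed node class instead of untyped HashMaps that hold an "IsEnd" entry. This is only a sketch under that assumption: TrieWordFilter, Node, addWord, matchLength and filter are names chosen for illustration, and the longest match at each position is the one that gets replaced.

```java
import java.util.HashMap;
import java.util.Map;

final class TrieWordFilter {

    /** One node of the structure: child nodes by character, plus an end-of-word flag. */
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isEnd;
    }

    private final Node root = new Node();   // blank root node

    void addWord(String word) {
        Node cur = root;
        for (int i = 0; i < word.length(); i++) {
            cur = cur.children.computeIfAbsent(word.charAt(i), k -> new Node());
        }
        if (cur != root) {
            cur.isEnd = true;   // the empty string is never marked as a word
        }
    }

    /** Length of the longest word starting at index begin, or 0 if none starts there. */
    int matchLength(String txt, int begin) {
        Node cur = root;
        int matched = 0;
        for (int i = begin; i < txt.length(); i++) {
            cur = cur.children.get(txt.charAt(i));
            if (cur == null) {
                break;
            }
            if (cur.isEnd) {
                matched = i - begin + 1;
            }
        }
        return matched;
    }

    String filter(String txt) {
        StringBuilder sb = new StringBuilder(txt);
        int i = 0;
        while (i < txt.length()) {
            int len = matchLength(txt, i);
            if (len == 0) {
                i++;
                continue;
            }
            for (int j = 0; j < len; j++) {
                sb.setCharAt(i + j, '*');
            }
            i += len;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TrieWordFilter f = new TrieWordFilter();
        f.addWord("ab");
        f.addWord("abc");
        System.out.println(f.filter("xxabcabyab"));   // prints "xx*****y**"
    }
}
```

A typed node avoids the casts and the mixed key types of the HashMap version, at the cost of one extra class.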


Principle and method of RSA encryption, decryption, signature and signature verification

Original address: Principle and method of RSA encryption, decryption, signature and signature verification

1. Introduction to RSA encryption
RSA is an asymmetric encryption algorithm: decryption can be carried out without ever transmitting the decryption key. This keeps the information secure and avoids the risk of the key being intercepted in transit. Encryption and decryption are performed with a pair of keys, called the public key and the private key. The two are mathematically related, and the security of the algorithm rests on the difficulty of factoring a very large integer. The private key is usually kept by its owner, while the public key is public (many people may hold it at the same time).

2. Difference between RSA encryption and signature
Both encryption and signing serve security, but they are slightly different. People often ask whether encryption and signing use the private key or the public key. That question comes from confusing the purposes of the two. In short, encryption prevents information from being leaked, while signing prevents it from being tampered with. Here are two examples.

The first scenario: On the battlefield, B will send A a message containing an order.

The RSA encryption process is as follows:

(1) A generates a pair of keys (public key and private key). The private key is not public and A keeps it. The public key is public and can be obtained by anyone.

(2) A passes its own public key to B, and B encrypts the message with A's public key.

(3) A receives the encrypted message from B and decrypts the message using A's own private key.

This process involves only two transmissions: first A sends the public key to B, and then B sends the encrypted message to A. Even if the enemy intercepts both, there is no danger, because only A's private key can decrypt the message. The content of the message is therefore not disclosed.

The second scenario: After receiving the message sent by B, A needs to reply "Received".

The RSA signature process is as follows:

(1) A generates a pair of keys (public key and private key). The private key is not public and A keeps it. The public key is public and can be obtained by anyone.

(2) A signs the message with its private key to produce a signature, and passes the signature together with the message itself to B.

(3) After receiving the message, B obtains A's public key and verifies the signature. If the result of the verification is consistent with the message itself, this proves that the reply really came from A.

This process also involves only two transmissions: first A sends the signature and the message itself to B, and then B obtains A's public key. Even if the enemy intercepts both, there is no danger, because only A's private key can sign the message. Knowing the content of the message is not enough to forge a signed reply to B, so the message cannot be tampered with.

Taken together, however, the two scenarios each leave a gap. In the first scenario the intercepted message is not disclosed, but the enemy can use the intercepted public key to encrypt a false order and pass it to A. In the second scenario the intercepted message cannot be tampered with, but its content travels alongside the signature and can be checked by anyone holding the public key, so nothing prevents leakage. In practice, therefore, encryption and signing are used together as the situation requires. For example, A and B each have their own public and private keys. When A wants to send a message to B, A first encrypts the message with B's public key and then signs the encrypted message with A's private key. The message can then be neither read nor tampered with, which protects it better.

Summary: encrypt with the public key, decrypt with the private key; sign with the private key, verify the signature with the public key.
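
The combined scheme (encrypt with the receiver's public key, then sign the ciphertext with the sender's private key) can be sketched with the JDK's own classes. This is a minimal illustration rather than a production design: the class EncryptThenSignSketch and its methods are hypothetical names, and the 2048-bit key size and the SHA256withRSA algorithm are assumptions picked for the sketch.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.Signature;
import javax.crypto.Cipher;

final class EncryptThenSignSketch {

    static KeyPair newKeyPair() throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        return generator.generateKeyPair();
    }

    /** Runs the whole exchange for a short message; returns what B reads, or null if verification fails. */
    static String roundTrip(String message) {
        try {
            KeyPair a = newKeyPair();   // sender A
            KeyPair b = newKeyPair();   // receiver B

            // A: encrypt with B's public key ...
            Cipher enc = Cipher.getInstance("RSA/ECB/PKCS1Padding");
            enc.init(Cipher.ENCRYPT_MODE, b.getPublic());
            byte[] cipherText = enc.doFinal(message.getBytes(StandardCharsets.UTF_8));

            // ... then sign the ciphertext with A's private key.
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(a.getPrivate());
            signer.update(cipherText);
            byte[] signature = signer.sign();

            // B: verify with A's public key first ...
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(a.getPublic());
            verifier.update(cipherText);
            if (!verifier.verify(signature)) {
                return null;
            }

            // ... then decrypt with B's own private key.
            Cipher dec = Cipher.getInstance("RSA/ECB/PKCS1Padding");
            dec.init(Cipher.DECRYPT_MODE, b.getPrivate());
            return new String(dec.doFinal(cipherText), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("Received"));   // prints "Received"
    }
}
```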

3. Code example of RSA encryption and signing:

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.KeyFactory;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.security.spec.PKCS8EncodedKeySpec;
import java.security.spec.X509EncodedKeySpec;
import javax.crypto.Cipher;
import org.apache.commons.codec.binary.Base64;

public class TestRSA {

    /** Maximum plaintext block size for RSA encryption (1024-bit key: 128 - 11 bytes). */
    private static final int MAX_ENCRYPT_BLOCK = 117;

    /** Maximum ciphertext block size for RSA decryption (1024-bit key: 128 bytes). */
    private static final int MAX_DECRYPT_BLOCK = 128;

    /**
     * Get a key pair.
     *
     * @return key pair
     */
    public static KeyPair getKeyPair() throws Exception {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        // Key length in bits. 1024 bits and MD5withRSA are kept from the original
        // example; both are considered weak today.
        generator.initialize(1024);
        return generator.generateKeyPair();
    }

    /**
     * Get the private key.
     *
     * @param privateKey Base64-encoded private key string
     * @return private key
     */
    public static PrivateKey getPrivateKey(String privateKey) throws Exception {
        KeyFactory keyFactory = KeyFactory.getInstance("RSA");
        byte[] decodedKey = Base64.decodeBase64(privateKey);
        PKCS8EncodedKeySpec keySpec = new PKCS8EncodedKeySpec(decodedKey);
        return keyFactory.generatePrivate(keySpec);
    }

    /**
     * Get the public key.
     *
     * @param publicKey Base64-encoded public key string
     * @return public key
     */
    public static PublicKey getPublicKey(String publicKey) throws Exception {
        KeyFactory keyFactory = KeyFactory.getInstance("RSA");
        byte[] decodedKey = Base64.decodeBase64(publicKey);
        X509EncodedKeySpec keySpec = new X509EncodedKeySpec(decodedKey);
        return keyFactory.generatePublic(keySpec);
    }

    /**
     * RSA encryption.
     *
     * @param data      text to be encrypted
     * @param publicKey public key
     * @return Base64-encoded ciphertext
     */
    public static String encrypt(String data, PublicKey publicKey) throws Exception {
        Cipher cipher = Cipher.getInstance("RSA");
        cipher.init(Cipher.ENCRYPT_MODE, publicKey);
        byte[] dataBytes = data.getBytes(StandardCharsets.UTF_8);
        int inputLen = dataBytes.length;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int offset = 0;
        // Encrypt the data block by block
        while (inputLen - offset > 0) {
            int blockLen = Math.min(inputLen - offset, MAX_ENCRYPT_BLOCK);
            byte[] cache = cipher.doFinal(dataBytes, offset, blockLen);
            out.write(cache, 0, cache.length);
            offset += blockLen;
        }
        // Base64-encode the ciphertext so that it can be handled as a string
        return Base64.encodeBase64String(out.toByteArray());
    }

    /**
     * RSA decryption.
     *
     * @param data       Base64-encoded ciphertext
     * @param privateKey private key
     * @return decrypted text
     */
    public static String decrypt(String data, PrivateKey privateKey) throws Exception {
        Cipher cipher = Cipher.getInstance("RSA");
        cipher.init(Cipher.DECRYPT_MODE, privateKey);
        byte[] dataBytes = Base64.decodeBase64(data);
        int inputLen = dataBytes.length;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int offset = 0;
        // Decrypt the data block by block
        while (inputLen - offset > 0) {
            int blockLen = Math.min(inputLen - offset, MAX_DECRYPT_BLOCK);
            byte[] cache = cipher.doFinal(dataBytes, offset, blockLen);
            out.write(cache, 0, cache.length);
            offset += blockLen;
        }
        // Content after decryption
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /**
     * Signature.
     *
     * @param data       data to be signed
     * @param privateKey private key
     * @return Base64-encoded signature
     */
    public static String sign(String data, PrivateKey privateKey) throws Exception {
        Signature signature = Signature.getInstance("MD5withRSA");
        signature.initSign(privateKey);
        signature.update(data.getBytes(StandardCharsets.UTF_8));
        return Base64.encodeBase64String(signature.sign());
    }

    /**
     * Signature verification.
     *
     * @param srcData   original string
     * @param publicKey public key
     * @param sign      Base64-encoded signature
     * @return whether the signature verification passed
     */
    public static boolean verify(String srcData, PublicKey publicKey, String sign) throws Exception {
        Signature signature = Signature.getInstance("MD5withRSA");
        signature.initVerify(publicKey);
        signature.update(srcData.getBytes(StandardCharsets.UTF_8));
        return signature.verify(Base64.decodeBase64(sign));
    }

    public static void main(String[] args) {
        try {
            // Generate key pair
            KeyPair keyPair = getKeyPair();
            String privateKey = Base64.encodeBase64String(keyPair.getPrivate().getEncoded());
            String publicKey = Base64.encodeBase64String(keyPair.getPublic().getEncoded());
            System.out.println("private key:" + privateKey);
            System.out.println("public key:" + publicKey);
            // RSA encryption
            String data = "Text content to be encrypted";
            String encryptData = encrypt(data, getPublicKey(publicKey));
            System.out.println("Encrypted content:" + encryptData);
            // RSA decryption
            String decryptData = decrypt(encryptData, getPrivateKey(privateKey));
            System.out.println("decrypted content:" + decryptData);
            // RSA signature
            String sign = sign(data, getPrivateKey(privateKey));
            // RSA signature verification
            boolean result = verify(data, getPublicKey(publicKey), sign);
            System.out.print("signature verification result:" + result);
        } catch (Exception e) {
            e.printStackTrace();
            System.out.print("encryption and decryption exception");
        }
    }
}

PS: RSA limits the length of the plaintext: the maximum plaintext length is the key length minus 11 (in bytes). Longer data therefore has to be encrypted and decrypted in blocks. The key in this example is 1024 bits, so the limit is 1024 / 8 - 11 = 128 - 11 = 117 bytes. The maximum plaintext block before encryption is thus 117 bytes, and the maximum ciphertext block for decryption is 128 bytes. Why the 11-byte difference? RSA encryption here uses a padding mode (PKCS#1 v1.5 padding, which needs at least 11 bytes per block): content shorter than 117 bytes is padded automatically, and the padding bytes take part in the encryption as well.
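
The arithmetic can be checked for other key sizes with a few lines of Java. This is a small sketch that assumes PKCS#1 v1.5 padding with its 11-byte minimum overhead; the class RsaBlockSizes and its method names are hypothetical.

```java
final class RsaBlockSizes {

    /** PKCS#1 v1.5 encryption padding takes at least 11 bytes of every block. */
    static final int PKCS1_PADDING_OVERHEAD = 11;

    /** A ciphertext block is as long as the key (modulus) in bytes. */
    static int maxDecryptBlock(int keyBits) {
        return keyBits / 8;
    }

    /** The plaintext block is the key length in bytes minus the padding overhead. */
    static int maxEncryptBlock(int keyBits) {
        return keyBits / 8 - PKCS1_PADDING_OVERHEAD;
    }

    /** Number of ciphertext blocks needed for plaintextBytes bytes of input. */
    static int blockCount(int plaintextBytes, int keyBits) {
        int block = maxEncryptBlock(keyBits);
        return (plaintextBytes + block - 1) / block;   // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(maxEncryptBlock(1024));   // 117
        System.out.println(maxDecryptBlock(1024));   // 128
        System.out.println(maxEncryptBlock(2048));   // 245
        System.out.println(blockCount(300, 1024));   // 3
    }
}
```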

The key length is set by generator.initialize(1024) in getKeyPair() in the example above and can be adjusted as needed; if it is changed, MAX_ENCRYPT_BLOCK and MAX_DECRYPT_BLOCK must be changed to match. As the key grows, asymmetric encryption becomes more secure but also slower.


Every day without dancing is a betrayal of life!

"Every day without dancing is a betrayal of life" is a classic line from the famous philosopher Nietzsche. It does not mean that everyone should sing and dance every day as if life were carefree. It means that we should put our hearts into living a wonderful life, and that the life most worth living is one in which each day is lived well and fully. No matter who we are, where we are in life, what we are doing now, what we want to do in the future, or how far we are from our goals, we should stay enthusiastic about life. Whether our daily life dances depends on our own will; our state of life is entirely up to us.

But what does our life actually look like right now?

Every day you go to work, get off work, and go home, shuttling along the same line between three points: home, the subway, and the office. You play various roles in the workplace. When you get home, you play the good daughter in front of your parents and the good wife in front of your husband, and in front of everyone else you play whatever role shows them your best side.

At get-togethers with friends after work, you listen to them talk about what so-and-so is doing, how good the pay is, and how easy the work is. Watching a TV drama in which the hero and heroine meet, you wonder whether you could have such a romantic love. Seeing a creative advertisement in the street, you think how stylish, inventive, and good at marketing that company is, and you feel envious. So you fall in love with this feeling and envy the lives other people lead, without realizing that their success did not come easily.

So you live like this day after day. You don't want to travel, take risks, achieve this year's goals, get in touch with the outside world, or try a different life. All day long it is your phone, WeChat, QQ, Douban, Weibo, Korean dramas, Taobao, and online games. If you already do what retired people do, what is the point of being young? How can your life dance when you are letting your youth down and wasting your time?

So, my friends, please don't choose stability at the age when you should be striving hardest. Cherish every day of your youth, because once it is gone it cannot be relived.

Take a look at the plans and lists you made last year, and review what you actually accomplished over the past year.

If you completed 90% of your goal list, congratulations. You are something of an expert at carrying out plans, and you lived up to last year: you made it full and meaningful.

But if you are still asking "What is my plan? What will I accomplish this year? What is my goal?", it doesn't matter. In life, what you don't do now you will not do later either; but once you want to do something and make up your mind to do it, the world will make way for you. Today is the first day of March, and three quarters of the year is still ahead. Please make good use of the remaining time and do something you want to do to enrich yourself, so that every day from now on can dance.

My friends, please don't be a spectator of your own life, and don't let any day of it down. Every today is tomorrow's memory. Do what you should do, meet the people you want to meet, let go of your troubles, live each day your own way, and make every day tomorrow's best memory.

Time is too short and the world is too big. Live in the present, and cherish it!

Let's be the masters of our time and let our daily life dance!


Install commands in the docker container

Original address: https://blog.csdn.net/qq_22211217/article/details/80637971

apt-get update    # update the package index

# vi
apt install vim
# wget
apt install wget
# yum
apt install yum
# ifconfig
apt install net-tools
# ping
apt install iputils-ping


Configuring the CentOS 7 network in a VMware 16 virtual machine

Original address: https://blog.csdn.net/u014650004/article/details/108865912

1) Use bridge mode

2) View local computer configuration

3) Configure virtual machine network

 1. Open the virtual machine's network configuration file: vim /etc/sysconfig/network-scripts/ifcfg-ens33

 2. The configuration is as follows

 3. Restart the network service (for example, systemctl restart network)

4) Test whether the Internet can be accessed successfully


Network (I) Network layering

Original address: https://segmentfault.com/a/1190000014767181

I. Network layering

There are two reference models:

OSI reference model (seven layers): the model is too idealized and was never widely adopted on the Internet.
TCP/IP reference model (also called the TCP/IP protocol suite, four layers): the de facto international standard.

How to encapsulate and split data in different layers:

In 90% of cases, Java code works at the application layer and only needs to talk to the transport layer. The other 10% of the time it works at the transport layer and talks to the application layer or the Internet layer.

Internet layer

In the OSI model, the Internet layer goes by the more general name of network layer. A network layer protocol defines how bits and bytes of data are organized into larger groups called packets, and also defines an addressing scheme by which different computers can find each other. The Internet Protocol (IP) is the most widely used network layer protocol in the world and the only network layer protocol that Java understands. It is in fact two protocols: IPv4 and IPv6. In both IPv4 and IPv6, data is sent across the Internet layer in packets called datagrams.

Besides routing and addressing, the second purpose of the Internet layer is to let different types of underlying networks talk to each other. Internet routers translate between WiFi and Ethernet, Ethernet and DSL, DSL and fiber-optic backhaul protocols, and so on. Without the Internet layer or something like it, each computer could only talk to other computers on the same type of network. The Internet layer is responsible for connecting heterogeneous networks to each other using homogeneous protocols.

Transport layer

Raw datagrams have some drawbacks. Most notably, delivery is not guaranteed, and even if a datagram is delivered, it may have been corrupted in transit. The header checksum can only detect corruption in the header, not in the data portion of the datagram. Finally, even if datagrams arrive uncorrupted, they do not necessarily arrive in the order in which they were sent. Individual datagrams may follow different routes from source to destination, so the fact that datagram A was sent before datagram B does not mean that A will arrive before B.

The transport layer is responsible for ensuring that packets are received in the order they were sent and that no data is lost or corrupted. If a packet is lost, the transport layer can ask the sender to retransmit it. To achieve this, IP networks add an additional header to each datagram that contains more information. There are two main protocols at this layer. The first, the Transmission Control Protocol (TCP), is a high-overhead protocol that supports retransmission of lost or corrupted data and delivery in the order in which the data was sent. The second, the User Datagram Protocol (UDP), lets the receiver detect corrupted packets but does not guarantee that packets are delivered in the correct order (or at all). However, UDP is usually faster than TCP. TCP is called a reliable protocol; UDP is an unreliable protocol. As we will see later, unreliable protocols are much more useful than they sound.

Application layer

The layer that delivers data to the user is called the application layer. The three layers below it define how data is transferred from one computer to another. The application layer decides what to do with the data after it has been transferred. For example, an application layer protocol such as HTTP (for the World Wide Web) makes sure that a web browser displays an image as an image instead of a long string of numbers. The network-related parts of your programs spend most of their time at the application layer. In addition to HTTP for the Web, there are SMTP, POP, and IMAP for e-mail; FTP, FSP, and TFTP for file transfer; NFS for file access; Gnutella and BitTorrent for file sharing; and the Session Initiation Protocol (SIP) and Skype for voice communication. In addition, your program can define its own application layer protocol when necessary.

II. IP and port

IP address: InetAddress

Uniquely identifies a computer on the Internet
Local loopback address (hostAddress): 127.0.0.1; host name (hostName): localhost
Hard to remember
Port number: identifies a process (program) running on the computer
Different processes have different port numbers
A port number is a 16-bit integer in the range 0~65535. Ports 0~1023 are reserved for predefined services (for example, HTTP uses port 80 and FTP uses port 21). Unless we need to access one of these services, we should pick a port in the range 1024~65535 to avoid port conflicts, bearing in mind that some of those are commonly taken as well (MySQL, for example, uses port 3306).
The combination of an IP address and a port number gives a network socket. For example, QQ can locate two QQ clients precisely through their IP addresses and port numbers.
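
In Java these notions map onto java.net.InetAddress and java.net.InetSocketAddress. The following is a minimal sketch; the class AddressAndPort, its helper methods, and the port 8080 are assumptions chosen for illustration.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

final class AddressAndPort {

    /** Parses a literal IP address; no DNS lookup is needed for a literal. */
    static InetAddress literal(String ip) {
        try {
            return InetAddress.getByName(ip);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** A port number is a 16-bit unsigned value. */
    static boolean isValidPort(int port) {
        return port >= 0 && port <= 65535;
    }

    public static void main(String[] args) {
        InetAddress loopback = literal("127.0.0.1");
        System.out.println(loopback.getHostAddress());      // 127.0.0.1
        System.out.println(loopback.isLoopbackAddress());   // true

        // IP address + port number together identify one endpoint.
        InetSocketAddress endpoint = new InetSocketAddress(loopback, 8080);
        System.out.println(endpoint.getPort());             // 8080
        System.out.println(isValidPort(65536));             // false
    }
}
```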

III. TCP, UDP and Socket

Network communication protocol: communication in a computer network requires a set of agreements, that is, a communication protocol, which sets standards for transmission rate, transmission code, code structure, transmission control steps, error control, and so on.
The idea of layering communication protocols:
Because the connections between nodes are very complex, a protocol is designed by breaking the complex whole down into simpler components and then combining them. The most common way of combining them is layering: peers at the same layer communicate with each other, and each layer may call the layer directly below it but has nothing to do with the layers further down. The layers do not affect one another, which makes the system easier to develop and extend.

1. TCP/UDP protocol

The transport layer has two very important protocols:

Transmission Control Protocol (TCP)
User Datagram Protocol (UDP)

1. TCP protocol:

A TCP connection must be established before TCP is used, forming a channel for transmitting data
Before transmission, a "three-way handshake" is performed, which makes the connection reliable
TCP communication involves two application processes: the client and the server
Large amounts of data can be transferred over the connection
After transmission, the established connection has to be released, so efficiency is lower

2. UDP protocol:

Data, source, and destination are encapsulated into a datagram; no connection needs to be established
The size of each datagram is limited to 64 KB
Unreliable, because no connection is established
No connection has to be released when sending ends, so it is fast
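
The connectionless style of UDP can be seen in a short Java sketch that sends a single datagram over the loopback interface. The class UdpLoopbackSketch, the method sendAndReceive, and the two-second timeout are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

final class UdpLoopbackSketch {

    /** Sends one datagram to a receiver on the loopback interface and returns what arrived. */
    static String sendAndReceive(String text) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(0, loopback);   // port 0: any free port
             DatagramSocket sender = new DatagramSocket()) {
            receiver.setSoTimeout(2000);   // UDP gives no delivery guarantee, so do not wait forever

            // No connection is set up: the packet itself carries the destination address and port.
            byte[] payload = text.getBytes(StandardCharsets.UTF_8);
            sender.send(new DatagramPacket(payload, payload.length, loopback, receiver.getLocalPort()));

            byte[] buffer = new byte[1024];
            DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
            receiver.receive(packet);
            return new String(packet.getData(), packet.getOffset(), packet.getLength(),
                    StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sendAndReceive("hello over UDP"));
    }
}
```

On the loopback interface the datagram normally arrives; across a real network the receive call could just as well time out, which is exactly the unreliability described above.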

2.Socket

Sockets have long been widely used to develop network applications, to the point of becoming a de facto standard.
Both ends of a communication must have a socket. Network communication between two machines is in fact communication between their sockets.
A socket lets the program treat the network connection as a stream; data is transmitted between the two sockets through IO.
Generally, the application that actively initiates communication is the client, and the one that waits for communication requests is the server.
Socket is an intermediate software abstraction layer between the application layer and the TCP/IP protocol family. It is a group of interfaces. In design pattern terms, Socket is a facade: it hides the complex TCP/IP protocol family behind the Socket interface, so the user only sees a small set of simple interfaces and lets the socket organize the data to conform to the specified protocol.
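
To make the client and server roles concrete, here is a small sketch of a TCP echo exchange over the loopback interface, assuming Python's standard socket and threading modules. The names echo_once and tcp_echo are hypothetical; the operating system performs the connection setup and teardown behind connect(), accept() and close().

```python
import socket
import threading

def echo_once(server_sock):
    """Server side: accept one client, read until it stops sending, echo everything back."""
    conn, _addr = server_sock.accept()
    with conn:
        chunks = []
        while True:
            chunk = conn.recv(1024)
            if not chunk:  # empty read: the client has finished sending
                break
            chunks.append(chunk)
        conn.sendall(b"".join(chunks))

def tcp_echo(message):
    """Client side: connect, send the message, and return the echoed reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    server.settimeout(2.0)
    port = server.getsockname()[1]
    worker = threading.Thread(target=echo_once, args=(server,))
    worker.start()
    try:
        # create_connection() returns once the connection is established.
        with socket.create_connection(("127.0.0.1", port), timeout=2.0) as client:
            client.sendall(message)
            client.shutdown(socket.SHUT_WR)  # done sending, but still able to receive
            reply = b""
            while True:
                chunk = client.recv(1024)
                if not chunk:
                    break
                reply += chunk
    finally:
        worker.join()
        server.close()
    return reply

if __name__ == "__main__":
    print(tcp_echo(b"hello over TCP"))
```

Both sides read in a loop because a TCP socket delivers a byte stream, not individual messages; a single recv() call may return only part of what was sent.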

Difference between socket and http

- HTTP protocol: Hypertext Transfer Protocol, corresponding to the application layer; it is based on a TCP connection
- TCP protocol: corresponds to the transport layer
- IP protocol: corresponds to the network layer
1. TCP/IP works at the transport and network layers and mainly solves how data is transmitted across the network; HTTP is an application layer protocol and mainly solves how data is packaged.
2. Socket is an encapsulation of the TCP/IP protocol. Socket itself is not a protocol but a calling interface (API) through which applications use TCP/IP.
3. HTTP connection: an HTTP connection is a so-called short connection, that is, the client sends a request to the server, and the connection is closed after the server responds (HTTP/1.1 can keep the connection open for later requests).

Socket connection: a socket connection is a so-called long connection. In theory, once the connection between the client and the server is established, it is not closed automatically. In practice, various environmental factors can break it: the server or client host goes down, the network fails, or no data is transmitted for a long time and a network firewall drops the connection to release network resources.


TCP/IP protocol, differences and characteristics between TCP and UDP

Original address: https://blog.csdn.net/zzfightingy/article/details/88383635

This blog post mainly records my personal understanding of TCP/IP and UDP. If anything is wrong, please point it out.

Some basic knowledge

IP address: identifies the address of a communication entity in the network. Communication entities can be computers, routers, and so on. For example, every server on the Internet must have its own IP address, and every computer on a LAN must also be configured with an IP address in order to communicate. The mainstream IP address format today is IPv4, but as networks keep growing, IPv4 addresses are running out, which is why IPv6 was introduced.
IPv4: a 32-bit address, split into four 8-bit parts and written in dotted decimal, such as 192.168.0.1. Since an 8-bit binary number ranges from 00000000 to 11111111, which corresponds to 0-255 in decimal, an address such as -4.278.4.1 is not a valid IPv4 address.
IPv6: 128 bits (16 bytes), written as eight 16-bit unsigned integers. Each integer is written as four hexadecimal digits, and the integers are separated by colons (:), such as 3ffe:3201:1401:1280:c8ff:fe4d:db39:1984
Port: From the previous items we know that an IP address identifies a computer, but one computer may provide several network applications. How do we tell these programs apart? This is what ports are for. A port is a virtual concept; it does not refer to physical ports on the host. Ports let multiple network applications run on one host. A port is a 16-bit binary integer, corresponding to 0-65535 in decimal. Oracle, MySQL, Tomcat, QQ, MSN, Xunlei, eDonkey, 360 and other network programs each have their own ports.
URL: On the WWW, every information resource has a single, unique address, called the URL (Uniform Resource Locator). A URL consists of four parts: the protocol (http://), the host domain name (www.google.com), the port number (can be omitted, such as 80), and the resource file name (index.html), optionally followed by other parameters such as query parameters and an anchor (?xxx=xxx&xxx#xxx). If the port number is not specified, the protocol's default port is used; for example, the default port of the http protocol is 80.
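
The address formats above can be checked with Python's standard library. This is a small sketch using the ipaddress module and urllib.parse.urlsplit; the helper names and the DEFAULT_PORTS table are assumptions made for this illustration, not part of any library.

```python
import ipaddress
from urllib.parse import urlsplit

# Assumed lookup table for this sketch; only the http entry comes from the text above.
DEFAULT_PORTS = {"http": 80, "https": 443}

def is_valid_ipv4(text):
    """True if text is dotted decimal with four parts, each in 0-255."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False

def ipv4_to_int(text):
    """Pack the four 8-bit parts of a valid address into one 32-bit integer."""
    value = 0
    for part in text.split("."):
        value = (value << 8) | int(part)
    return value

def ipv6_groups(text):
    """Return the eight 16-bit groups of an IPv6 address as 4-digit hex strings."""
    return ipaddress.IPv6Address(text).exploded.split(":")

def describe_url(url):
    """Split a URL into the parts named above, filling in the default port."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": port,
        "path": parts.path,
        "query": parts.query,
        "anchor": parts.fragment,
    }

if __name__ == "__main__":
    print(is_valid_ipv4("192.168.0.1"), is_valid_ipv4("-4.278.4.1"))
    print(ipv4_to_int("192.168.0.1"))
    print(ipv6_groups("3ffe:3201:1401:1280:c8ff:fe4d:db39:1984"))
    print(describe_url("http://www.google.com/index.html?xxx=xxx#xxx"))
```

Note that urlsplit reports the port as None when the URL omits it, so the fallback to the protocol's default port has to be done by the caller.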

Definition

TCP/IP is the general name for the family of Internet-related protocols; TCP, UDP, IP, FTP, HTTP and others all belong to the TCP/IP family. (This is not the only explanation.)

The TCP/IP reference model is mainly divided into four layers: application layer, transport layer, network layer and data link layer. (A physical layer is sometimes added, because whatever the protocol or reference model, data is ultimately transmitted over physical media as binary data.)

Application layer
Provides users with a group of commonly used applications, such as e-mail and file transfer. The application layer generates and processes application data and hands it to the transport layer for transmission.
Transport layer
The transport layer is responsible for communication between applications: it formats and encapsulates the information flow from the application layer, provides reliable transmission, and realizes direct process-to-process communication. To ensure reliable transmission, the transport layer protocol (TCP) stipulates that the receiver must send back an acknowledgement, and that lost information must be sent again. See the "three-way handshake" process for how a connection is set up. The main transport layer protocols are the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP).
Network layer (also called IP layer)
The network layer packs the formatted information flow from the transport layer into IP packets, which are passed to the data link layer for processing, to achieve communication between different hosts. Its functions mainly cover three aspects.
1. When sending data, handle the sending request from the transport layer: after receiving the request, load the data into an IP packet and send the packet to the appropriate network interface.
2. Process incoming packets received from other hosts.
3. Handle problems such as routing, flow control and congestion.
Data link layer
The data link layer receives IP packets from the network layer, packages them into frames, and sends them over the network; in the other direction, it receives physical frames from the network, extracts the IP packets, and delivers them to the network layer.

(The data link layer mainly solves three problems: https://blog.csdn.net/SouthWind0/article/details/80038014)
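
The way each layer wraps the data handed down from the layer above can be sketched as follows. The text headers here are invented purely for illustration; real TCP, IP and frame headers are binary structures with many fields.

```python
# Toy headers, made up for this sketch. Order: the layers the data passes through when sending.
LAYERS = [
    ("transport", b"[TCP]"),
    ("network", b"[IP]"),
    ("data link", b"[FRAME]"),
]

def encapsulate(app_data):
    """Sending side: each lower layer prepends its own header."""
    unit = app_data
    for _name, header in LAYERS:
        unit = header + unit
    return unit

def decapsulate(frame):
    """Receiving side: strip the headers in the opposite order."""
    unit = frame
    for name, header in reversed(LAYERS):
        if not unit.startswith(header):
            raise ValueError("missing " + name + " header")
        unit = unit[len(header):]
    return unit

if __name__ == "__main__":
    frame = encapsulate(b"hello")
    print(frame)
    print(decapsulate(frame))
```

Each function only touches its own header, which mirrors the point that the layers do not need to know about one another's contents.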

TCP

TCP (Transmission Control Protocol) is a connection-oriented, reliable, byte-stream-based transport layer communication protocol.

To ensure reliable transmission, TCP gives each packet a sequence number, which also lets the receiving host put the packets back in order. The receiving host then sends back an acknowledgement (ACK) for each packet it has successfully received. If the sending host does not receive the ACK within a reasonable round trip time (RTT), the corresponding packet is considered lost and is retransmitted.

Establish TCP connection (three-way handshake)
When a connection based on the TCP transport layer protocol is established, the sender and receiver exchange a total of three packets, which is why connection setup is called the three-way handshake.

The sending end sends a SYN connection request (SYN: Synchronize Sequence Numbers) and waits for the receiving end to reply with SYN and ACK. After receiving that reply, the sending end acknowledges the other side's SYN with an ACK. Establishing the connection this way prevents erroneous connections. The flow control protocol used by TCP is the variable-size sliding window protocol. As shown in the figure below:
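
The acknowledge-and-retransmit idea can be sketched as a small simulation. This is a deliberately simplified stop-and-wait model, not how a real TCP stack works: real TCP numbers bytes rather than packets and keeps several packets in flight. The function name and the lost_attempts parameter are assumptions made for this illustration.

```python
def send_reliably(packets, lost_attempts):
    """Deliver packets one at a time, retransmitting until each is acknowledged.

    lost_attempts is a set of (seq, attempt) pairs that the simulated network drops.
    Returns the data as seen by the receiver, plus a log of what happened.
    """
    received = {}
    log = []
    for seq, payload in enumerate(packets):
        attempt = 0
        while True:
            attempt += 1
            if (seq, attempt) in lost_attempts:
                # No ACK arrives within the timeout, so the sender tries again.
                log.append("seq=%d attempt=%d: no ACK, retransmit" % (seq, attempt))
                continue
            received[seq] = payload  # receiver stores the packet under its sequence number
            log.append("seq=%d attempt=%d: ACK received" % (seq, attempt))
            break
    in_order = [received[seq] for seq in sorted(received)]
    return in_order, log

if __name__ == "__main__":
    data, log = send_reliably(["a", "b", "c"], {(1, 1)})
    print(data)
    for line in log:
        print(line)
```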

 1. The first handshake: the client sends a TCP packet with the SYN flag set to 1, indicating the server port it intends to connect to, together with an initial sequence number Seq=X, stored in the sequence number field of the packet header. The client enters the SYN_SENT state.
 2. The second handshake: the server sends back an acknowledgement packet (SYN+ACK), in which both the SYN flag and the ACK flag are 1. It sets the acknowledgement number to the client's Seq plus 1, that is, X+1. The packet therefore carries SYN (Seq=Y) and ACK (ACK=X+1). The server enters the SYN_RECV state.
 3. The third handshake: after receiving the SYN+ACK, the client replies with an ACK packet (SYN flag 0, ACK flag 1), putting the server's Seq plus 1 (that is, ACK=Y+1) in the acknowledgement field. The client enters the ESTABLISHED state, and so does the server once it receives this ACK.

After the three-way handshake is completed, the TCP client and server have successfully established a connection, and data transmission can begin.

Terminate TCP connection (four waves)
Terminating a TCP connection requires sending four packets, so it is called the four-way handshake (four waves). This is caused by TCP's half-close. Either the client or the server can actively initiate the close. In socket programming, calling the close() method on either end triggers it.
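
The sequence and acknowledgement numbers exchanged during connection setup can be written out as a short sketch. The function and field names are made up for this illustration; X and Y are the initial sequence numbers chosen by the client and the server.

```python
def three_way_handshake(x, y):
    """Return the three segments exchanged, given initial sequence numbers x and y."""
    return [
        # 1. Client asks to connect and announces its initial sequence number.
        {"sender": "client", "SYN": 1, "ACK": 0, "seq": x, "ack": None},
        # 2. Server acknowledges x (ack = x + 1) and announces its own number.
        {"sender": "server", "SYN": 1, "ACK": 1, "seq": y, "ack": x + 1},
        # 3. Client acknowledges y (ack = y + 1); its own SYN used up one sequence number.
        {"sender": "client", "SYN": 0, "ACK": 1, "seq": x + 1, "ack": y + 1},
    ]

if __name__ == "__main__":
    for segment in three_way_handshake(100, 300):
        print(segment)
```

Each side acknowledges the other's number plus one, which is how both ends confirm that their initial sequence numbers were received.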

 1. An application process first calls close(); this end is said to perform the "active close". The TCP of the active end then sends a FIN segment, indicating that its data transmission is complete. Its sequence number is Seq=X (equal to the sequence number of the last byte of previously transmitted data plus 1). The active end enters the FIN-WAIT-1 (termination wait 1) state.
 2. The passive end receives the connection termination message from the active end and sends back an acknowledgement, ACK=X+1, carrying its own sequence number Seq=Z. The passive end enters the CLOSE-WAIT state.
 Note: at this point the passive end's TCP notifies its application layer that the direction from the active end to the passive end has been released. The connection is now half-closed: the active end has no more data to send, but if the passive end sends data, the active end still receives it. This half-closed state lasts for the whole duration of the CLOSE-WAIT state.
 3. After receiving the acknowledgement from the passive end, the active end enters the FIN-WAIT-2 (termination wait 2) state and waits for the passive end to send its connection termination message (until then, the active end must still accept any remaining data sent by the passive end). After the passive end has sent its last data, it sends the connection termination message to the active end, with FIN=1 and ACK=X+1. Since the passive end may have sent some data while half-closed, assume its sequence number is now Seq=Y. The passive end enters the LAST-ACK (last acknowledgement) state and waits for the active end's confirmation.
 4. After receiving the connection termination message from the passive end, the active end must send a confirmation, ACK=Y+1, with its own sequence number Seq=X+1. The active end enters the TIME-WAIT state.
Note that the TCP connection has not been terminated yet: the active end must wait 2 × MSL (Maximum Segment Lifetime) before entering the CLOSED state. The passive end enters the CLOSED state as soon as it receives the confirmation from the active end. As you can see, the passive end of a TCP connection closes earlier than the active end.
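
The states each end passes through while closing can be expressed as a small transition table. This is only a sketch of the sequence described above; the event labels are made up for this illustration, and a real TCP state machine has more states and transitions (simultaneous close, resets, and so on).

```python
# (current state, event) -> next state
TRANSITIONS = {
    # Active end: the side that calls close() first.
    ("ESTABLISHED", "close()"): "FIN-WAIT-1",    # sends FIN
    ("FIN-WAIT-1", "recv ACK"): "FIN-WAIT-2",
    ("FIN-WAIT-2", "recv FIN"): "TIME-WAIT",     # sends the final ACK
    ("TIME-WAIT", "2*MSL elapsed"): "CLOSED",
    # Passive end: the side that receives the first FIN.
    ("ESTABLISHED", "recv FIN"): "CLOSE-WAIT",   # sends ACK, may still send data
    ("CLOSE-WAIT", "close()"): "LAST-ACK",       # sends its own FIN
    ("LAST-ACK", "recv ACK"): "CLOSED",
}

def run(events, state="ESTABLISHED"):
    """Apply the events in order and return every state visited."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        path.append(state)
    return path

if __name__ == "__main__":
    print(run(["close()", "recv ACK", "recv FIN", "2*MSL elapsed"]))  # active end
    print(run(["recv FIN", "close()", "recv ACK"]))                   # passive end
```

The passive end needs three events to reach CLOSED while the active end needs four, the last being the 2 × MSL wait, which matches the observation that the passive end closes first.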

This process is vividly shown in the animated diagram (source: https://blog.csdn.net/qzcsu/article/details/72861891)

UDP

UDP (User Datagram Protocol) is a connectionless transport layer protocol in the OSI (Open System Interconnection) reference model. It provides a simple, unreliable, transaction-oriented message delivery service. Like TCP, UDP handles data packets, but it is connectionless and does not guarantee reliability. Its drawback is that it does not provide packet grouping, assembly or ordering, so once a message is sent there is no way to know whether it arrived safely and completely. UDP supports network applications that need to transfer data between computers.

Differences between TCP and UDP

TCP is connection-oriented (like dialing to establish a connection before a phone call); UDP is connectionless, that is, no connection needs to be established before sending data.
TCP provides a reliable service: data transmitted over a TCP connection arrives error-free, without loss or duplication, and in order. TCP achieves this through checksums, retransmission control, sequence numbers, the sliding window, acknowledgements and other mechanisms; for example, it retransmits lost packets and puts out-of-order packets back in sequence. UDP only makes a best effort, that is, it does not guarantee reliable delivery.
UDP has better real-time performance and higher efficiency than TCP. It suits communication, including broadcast, that needs high-speed transmission and real-time performance.
Each TCP connection can only be point-to-point; UDP supports one-to-one, one-to-many, many-to-one and many-to-many communication.
TCP requires more system resources than UDP.


MySQL: small table drives large table

Original address: https://blog.csdn.net/codejas/article/details/78632883

1. Optimization principle

Small tables drive large tables, that is, small datasets drive large datasets. Before looking at what this means, let's first understand two query keywords, IN and EXISTS, through two query statements. I have created two tables, an employee table and a department table. The employee table has a department id attribute that associates the two tables.

First, we use IN to query data:

 SELECT *
 FROM t_emp
 WHERE dept_id IN (SELECT dept_id FROM t_dept)
 LIMIT 5;

Query result: since there is a lot of employee data, I only query 5 rows here.

 +-------------+----------+------------+--------------+---------+
 | emp_id      | emp_name | emp_gender | emp_email    | dept_id |
 +-------------+----------+------------+--------------+---------+
 | 00000000177 | 41d80    | m          | 41d80@zc.com |       1 |
 | 00000000178 | a74b8    | m          | a74b8@zc.com |       1 |
 | 00000000179 | 661ca    | m          | 661ca@zc.com |       1 |
 | 00000000180 | 9413d    | m          | 9413d@zc.com |       1 |
 | 00000000181 | 7d577    | m          | 7d577@zc.com |       1 |
 +-------------+----------+------------+--------------+---------+

Next, use EXISTS to query data:

 SELECT *
 FROM t_emp
 WHERE EXISTS
   (SELECT 1
    FROM t_dept
    WHERE t_dept.dept_id = t_emp.dept_id)
 LIMIT 5;

Query Result: Same as above.

 +-------------+----------+------------+--------------+---------+
 | emp_id      | emp_name | emp_gender | emp_email    | dept_id |
 +-------------+----------+------------+--------------+---------+
 | 00000000177 | 41d80    | m          | 41d80@zc.com |       1 |
 | 00000000178 | a74b8    | m          | a74b8@zc.com |       1 |
 | 00000000179 | 661ca    | m          | 661ca@zc.com |       1 |
 | 00000000180 | 9413d    | m          | 9413d@zc.com |       1 |
 | 00000000181 | 7d577    | m          | 7d577@zc.com |       1 |
 +-------------+----------+------------+--------------+---------+
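
That the two forms return the same rows can be checked without a MySQL server. The sketch below uses Python's built-in sqlite3 module with an in-memory database, so it says nothing about MySQL's execution plan. The table and column names t_emp, t_dept, emp_id, emp_name and dept_id follow the article; the dept_name column and all sample rows are hypothetical.

```python
import sqlite3

def compare_in_and_exists():
    """Run the IN form and the EXISTS form on the same sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t_dept (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
        CREATE TABLE t_emp (emp_id INTEGER PRIMARY KEY, emp_name TEXT, dept_id INTEGER);
        INSERT INTO t_dept VALUES (1, 'dev'), (2, 'ops');
        -- Employee 4 points at a department that does not exist.
        INSERT INTO t_emp VALUES (1, 'ann', 1), (2, 'bob', 1), (3, 'cat', 2), (4, 'dan', 9);
    """)
    in_rows = conn.execute("""
        SELECT * FROM t_emp
        WHERE dept_id IN (SELECT dept_id FROM t_dept)
        ORDER BY emp_id
    """).fetchall()
    exists_rows = conn.execute("""
        SELECT * FROM t_emp
        WHERE EXISTS (SELECT 1 FROM t_dept WHERE t_dept.dept_id = t_emp.dept_id)
        ORDER BY emp_id
    """).fetchall()
    conn.close()
    return in_rows, exists_rows

if __name__ == "__main__":
    in_rows, exists_rows = compare_in_and_exists()
    print(in_rows == exists_rows, in_rows)
```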

Since both IN and EXISTS can be used to query data, what's the difference between them?

 SELECT *
 FROM t_emp
 WHERE dept_id IN
   (SELECT dept_id
    FROM t_dept);
 -- This SQL statement is equivalent to:
 -- for SELECT dept_id FROM t_dept
 --     for SELECT * FROM t_emp WHERE t_emp.dept_id = t_dept.dept_id

Although the SQL statement reads as a main query for employee information with a subquery for department IDs, MySQL executes the subquery first and then the main query to obtain the data we want.

 SELECT *
 FROM t_emp
 WHERE EXISTS
   (SELECT 1
    FROM t_dept
    WHERE t_dept.dept_id = t_emp.dept_id);
 -- This SQL statement is equivalent to:
 -- for SELECT * FROM t_emp
 --     for SELECT * FROM t_dept WHERE t_dept.dept_id = t_emp.dept_id

We can understand the EXISTS syntax as follows: each row of the main query is put into the subquery for conditional verification, and the TRUE or FALSE result decides whether that row is kept. The EXISTS subquery only returns TRUE or FALSE, so the SELECT * in the subquery can just as well be SELECT 1 or anything else. The MySQL documentation says the SELECT list is ignored during actual execution, so there is no difference. When the EXISTS subquery is actually executed, MySQL applies some optimizations rather than comparing every row.
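
The two loop orders can be modelled in a few lines of Python to see which table drives the query. This is only the mental model used in this article, with a dictionary and a set standing in for the indexes on dept_id; a real MySQL optimizer may rewrite either query into a different plan. All names here are hypothetical.

```python
def in_style(dept_ids, emps):
    """IN: the outer loop walks the subquery result (t_dept), then looks up t_emp."""
    emp_index = {}  # stands in for an index on t_emp.dept_id
    for emp in emps:
        emp_index.setdefault(emp["dept_id"], []).append(emp)
    matched, outer_steps = [], 0
    for dept_id in dept_ids:  # driven by the department table
        outer_steps += 1
        matched.extend(emp_index.get(dept_id, []))
    return matched, outer_steps

def exists_style(dept_ids, emps):
    """EXISTS: the outer loop walks t_emp and checks t_dept for each row."""
    dept_index = set(dept_ids)  # stands in for an index on t_dept.dept_id
    matched, outer_steps = [], 0
    for emp in emps:  # driven by the employee table
        outer_steps += 1
        if emp["dept_id"] in dept_index:
            matched.append(emp)
    return matched, outer_steps

if __name__ == "__main__":
    emps = [{"emp_id": i, "dept_id": i % 4} for i in range(1000)]
    dept_ids = [0, 1]
    rows_in, steps_in = in_style(dept_ids, emps)
    rows_exists, steps_exists = exists_style(dept_ids, emps)
    print(len(rows_in), steps_in)
    print(len(rows_exists), steps_exists)
```

Both functions find the same employees, but the number of outer iterations equals the size of whichever table drives the loop.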

2. Summary

In practice, we need to create an index on dept_id in both tables. At the beginning we mentioned an optimization principle: small tables drive large tables. When we use IN for the association query, following the execution order described above, we first query the department table and then query employee information using the IDs found there. The employee table usually holds a lot of data, while the department table generally holds very little. We query the department table first and use its result to look up employees, so the small table (t_dept) drives the large table (t_emp). This query method is efficient and worth recommending.

When we use EXISTS, however, we first query the employee table and then decide whether to keep each employee row according to the TRUE or FALSE returned by the condition on the department table. Isn't that using the large table (t_emp) to drive the small table (t_dept)? Although this method also finds the data we want, it is not recommended here.

When t_emp has more data than t_dept, we use IN rather than EXISTS. When t_dept has more data than t_emp (just an assumption here), EXISTS is better than IN. So whether to use IN or EXISTS depends on the data. If the two tables hold about the same amount of data, there is little difference between IN and EXISTS.

IN runs the subquery first and then the parent query. The subquery is on the small table and the parent query is on the large table, so the small table drives the large table, which follows the principle.
EXISTS queries the large table first and then the small table (to decide whether each row of the large table should be kept), so the large table drives the small table, which does not follow the principle.


Set operations (intersection, union, difference, complement, symmetric difference)


Differences among Docker, Docker Compose, Docker Swarm, and Kubernetes

Original address: https://blog.csdn.net/notsaltedfish/article/details/80959913

Recently I was learning about Docker containers and came across some related technologies, such as Kubernetes, Docker Compose and Docker Swarm. At first I could not tell them apart, so I studied them specifically and am sharing the result here; it is suitable for beginners learning about containers.

Docker

The role of Docker is easy to understand: it is a container engine. In other words, our containers are ultimately created by Docker and run in Docker. The other container technologies discussed here are built on Docker, which is the core of everything else we use.

Docker Compose

Docker Compose is used to manage your containers, a bit like a container steward. Imagine that hundreds of containers in your Docker need to be started; starting them one by one takes a lot of time. With Docker Compose, you only need to write one file that declares the containers to start and configures some parameters, then execute it, and Docker Compose starts all the containers according to your declared configuration. However, Docker Compose can only manage Docker on the current host; it cannot start Docker containers on other hosts.

Docker Swarm

Docker Swarm is a tool for managing Docker containers on multiple hosts. It can start containers for you and monitor their status. If a container's status is abnormal, it starts a new container to keep providing the service, and it also provides load balancing between services. Docker Compose cannot do these things.

Kubernetes

Kubernetes plays the same role as Docker Swarm; that is, the two are responsible for the same work in the container field, although each has its own characteristics, much like Eclipse and IDEA. Kubernetes is also a cross-host container management platform. It was developed by Google based on its years of operation and maintenance experience, while Docker Swarm is developed by the Docker company.

Since these two do the same job, we face a choice: which one should we learn? In fact, over the past two years Kubernetes has become the default container management technology at many large companies, and Docker Swarm has gradually lost ground to it. The container management field is now largely dominated by Kubernetes. So when choosing what to learn, it is worth considering how widely a technology is used in the industry.

Note that although Docker Swarm lost the competition with Kubernetes, this has nothing to do with the Docker container engine, which remains the cornerstone of the whole container field. Kubernetes is nothing without it.

Summary

Docker is the core and foundation of container technology. Docker Compose is a single-host container orchestration tool based on Docker. Its functions are not as rich as those of Docker Swarm and Kubernetes, which are cross-host container management platforms based on Docker.
