In the last chapter I covered the principles of search results page optimization (or at least my personal take on them): how the results page brings in and converts traffic, and the operating methods and metrics for optimizing each module. This chapter looks at on-site search itself: how a user's search term is turned into the search results page in front of us.
To make this easier to follow, here is a simple flow chart that gives the overall picture, and I will walk through the process in that order. Without further ado, here it is! (I simplified it heavily so that it reads less like a product spec and is easier to understand.)
Walking through a process step by step in the abstract is dull, so we will go through it with an example, which should be enough to get the idea. To be clear, what I say is not necessarily right, so don't follow it blindly.
Let's start with Chinese. The keyword is "men's printed T-shirt".
First the keyword enters preprocessing. This stage removes the useless parts of the keyword, such as stop words. Here the particle "de" (the possessive marker, the "'s" in our example) is the part we want to cut. Preprocessing also strips meaningless whitespace on the left and right. How are stop words determined? In Chinese almost any word can be useful, but in a specific context many words and phrases are treated as stop words, for example obscene words and extreme or sensitive terms (see the advertising law).
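The preprocessing step can be sketched roughly as follows. This is a minimal illustration, not how any particular site does it: the stop-word set, the function name `preprocess`, and the Chinese query string are all assumptions made for the example, and naive substring removal would be too blunt for production use.

```python
import re

# Hypothetical stop-word list; a real one is curated per site and per context.
STOP_WORDS = {"的"}


def preprocess(query: str) -> str:
    """Trim outer whitespace, drop stop words, and collapse inner whitespace."""
    cleaned = query.strip()
    for word in STOP_WORDS:
        # Naive substring removal: good enough for a sketch only.
        cleaned = cleaned.replace(word, "")
    return re.sub(r"\s+", " ", cleaned).strip()
```

With these assumptions, `preprocess("  男士的印花T恤  ")` returns `"男士印花T恤"`.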
After preprocessing comes intelligent error correction or manual rewriting. Here the system checks whether the keyword contains typos (using an algorithm or a manually maintained lexicon) and whether it hits the manual rewrite lexicon (an override feature).
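One way to picture this step is a manual rewrite table that is checked first, followed by a fuzzy lookup against a known vocabulary. The sketch below uses Python's standard `difflib` as a stand-in for a real spelling model; `REWRITE_TABLE`, `VOCABULARY`, and the 0.8 cutoff are made-up values for illustration.

```python
import difflib

# Hypothetical manual rewrite lexicon: it wins over everything else.
REWRITE_TABLE = {"tee": "t-shirt"}

# Hypothetical vocabulary of known-good terms.
VOCABULARY = ["men", "women", "print", "t-shirt", "dress"]


def correct_token(token: str) -> str:
    """Apply the manual rewrite first, then fall back to fuzzy correction."""
    if token in REWRITE_TABLE:
        return REWRITE_TABLE[token]
    if token in VOCABULARY:
        return token
    # Pick the closest known term, if any is similar enough.
    close = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=0.8)
    return close[0] if close else token


def correct_query(query: str) -> str:
    """Correct each whitespace-separated token of the query."""
    return " ".join(correct_token(t) for t in query.lower().split())
```

Checking the manual table before the algorithm mirrors the idea that a manual rewrite acts as an override.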
Next is language identification. Domestic e-commerce sites may also support an English environment, so language identification decides which language environment the keyword should be searched in. Some e-commerce companies skip this step because they simply do not need it. It is necessary for export-oriented cross-border e-commerce such as AliExpress, Shopee, and Amazon.
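As a toy illustration of language identification, the sketch below only distinguishes Chinese from everything else by checking for characters in the CJK Unified Ideographs block. Real systems typically use trained classifiers and many more languages; the function name and the two labels are assumptions.

```python
def detect_language(query: str) -> str:
    """Return "zh" if the query contains a CJK ideograph, otherwise "en"."""
    for ch in query:
        # U+4E00..U+9FFF is the basic CJK Unified Ideographs block.
        if "\u4e00" <= ch <= "\u9fff":
            return "zh"
    return "en"
```

The result would then decide which segmentation and lexicons the rest of the pipeline uses.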
Then comes lemmatization (word form restoration). As the name implies, this covers singular/plural restoration, tense restoration, stemming, and so on, and it mainly applies to English. The goal is to identify the backbone of the keyword. For "men's printed T-shirt" ("de" was already removed in preprocessing), the whole keyword is the backbone.
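For English, singular/plural restoration can be sketched with a couple of suffix rules. This is deliberately naive and only an assumption about how such a step might look: irregular forms and tenses need a dictionary or a proper lemmatizer, and the function name `normalize` is invented for the example.

```python
def normalize(token: str) -> str:
    """Lowercase a token and strip a regular English plural ending."""
    token = token.lower()
    # "dresses" -> "dress", "boxes" -> "box"
    if token.endswith("es") and token[:-2].endswith(("s", "x", "ch", "sh")):
        return token[:-2]
    # "shirts" -> "shirt", but leave words like "dress" alone
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token
```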
Then comes word segmentation. The segmentation system now splits "men's printed T-shirt". Generally, Chinese is segmented with n-grams at multiple granularities. The result looks like this (glossed in English, from single characters up to phrases): man / gentleman / stamp / flower / T / T-shirt / men / print / T-shirt / men's print / print T-shirt / men's T-shirt.
Don't worry if n-grams are unfamiliar; they are covered in a later chapter dedicated to algorithms. For phonetic (alphabetic) languages such as English, French, and Indonesian, segmentation is based on the spaces between words, as in "women dress".
Why is this different from Chinese? In fact English also uses multi-granularity segmentation, and Chinese segmentation relies on which character combinations are valid words in the dictionary. Still, there are real differences between Chinese and phonetic languages.
A digression: phonetic languages and structural (logographic) languages differ sharply in how much meaning a single word carries and how precisely it carries it. Meaning capacity is the range of meaning a single word can express. Precision is how narrowly a single word pins down what it describes: the narrower the range, the higher the precision.
Structural languages originate in pictographs, where characters are built from the shapes of objects. Setting literary expression aside, basic characters usually have to be combined to express a complete and precise meaning; a single character has a broad meaning and lacks precision.
Phonetic languages and scripts originate in combinations of letters. A small number of letter combinations form roots, which serve as the basis for extending the language. A few roots are expanded into many more words, and the vocabulary branches out in a hierarchy. Words grow outward from variations on the root: the smaller the variation, the closer the meaning is to the root, and the larger the variation, the further away it is.
So we arrive at a tentative conclusion:
Phonetic scripts: a single word has a small meaning capacity and high precision;
Chinese (pictographic in origin): a single character has a large meaning capacity and low precision.
Multi-granularity phrase segmentation in Chinese search follows largely from this: individual characters carry so much meaning that on their own they are imprecise, so several characters are needed together to pin down what the search term actually means.
Query (English): Men Print T-Shirt. Segmentation result: men / print / t-shirt / men t-shirt / print t-shirt.
The principles are roughly the same for both. I mainly want you to know that segmentation differs somewhat between languages, and that one trick does not work everywhere.
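To make multi-granularity segmentation concrete for a space-separated language, here is a small sketch that splits on whitespace and then emits contiguous n-grams up to a chosen size. The function name and the default `max_n=2` are assumptions. Note that strict n-grams are contiguous, so this sketch yields "men print" but not the non-adjacent pair "men t-shirt" listed above; a real segmenter may combine words differently.

```python
def multi_granularity(query: str, max_n: int = 2) -> list[str]:
    """Split on spaces, then return all contiguous n-grams for n = 1..max_n."""
    tokens = query.lower().split()
    grams = []
    for n in range(1, max_n + 1):
        # Slide a window of n tokens across the query.
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams
```

For `"Men Print T-Shirt"` this returns `["men", "print", "t-shirt", "men print", "print t-shirt"]`.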
After segmentation the system enters synonym expansion. Dictionaries and a manually maintained synonym lexicon are used to expand the segmented keywords. For example, "printing" and "dyeing" are synonyms, and "men" is synonymous with "boys" and "man". These synonyms are added to the segmentation result and go into matching recall together.
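Synonym expansion can be pictured as a lookup that attaches alternatives to each segmented token. The lexicon below is a hypothetical stand-in loosely based on the examples in the text, and `expand` is an invented name.

```python
# Hypothetical manually maintained synonym lexicon.
SYNONYMS = {
    "print": ["dyeing"],
    "men": ["boys", "man"],
}


def expand(tokens: list[str]) -> dict[str, list[str]]:
    """Map each token to itself plus any synonyms found in the lexicon."""
    return {token: [token] + SYNONYMS.get(token, []) for token in tokens}
```

Keeping the original token first in each list makes it easy for later stages to prefer an exact match over a synonym match if they want to.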
Next is matching recall. Take a look at this picture first. As before, I am using a screenshot from a former colleague's slides; I am tired of his example screenshots, which seem not to have changed in a thousand years. You can see that whole-term matching is used for recall.
What does that mean?
Every part of the segmentation result for "men's printed T-shirt" must match the product name or attribute description at the same granularity before the product is recalled. Not a single match may be missing.
In addition, multi-word granularity carries more weight than single-word granularity, which means phrase matches take precedence over word matches.
Only when the phrase does not match does the system fall back to matching individual words. In Chinese, of course, matching single characters is meaningless, so Chinese queries are essentially matched at the phrase level. (Some of my Chinese segmentation examples are not quite apt.)
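The "every term must match" rule can be sketched as an all-of-any check: each query term must be satisfied by the term itself or one of its synonyms, looked up in the product's title and attributes. The product records and field names are invented for the example, and the sketch ignores the phrase-over-word weighting described above.

```python
# Hypothetical product records.
PRODUCTS = [
    {"id": 1, "title": "Men print t-shirt cotton", "attributes": "short sleeve"},
    {"id": 2, "title": "Men t-shirt", "attributes": "plain"},
    {"id": 3, "title": "Boys t-shirt", "attributes": "print pattern"},
]


def recall(term_groups: list[list[str]], products: list[dict]) -> list[int]:
    """Return ids of products that match every group of equivalent terms."""
    hits = []
    for product in products:
        text = product["title"] + " " + product["attributes"]
        words = set(text.lower().split())
        # Every group needs at least one of its alternatives to be present.
        if all(any(term in words for term in group) for group in term_groups):
            hits.append(product["id"])
    return hits
```

With `[["men", "boys"], ["print"], ["t-shirt"]]` as the term groups, products 1 and 3 are recalled; product 2 is dropped because nothing in it matches "print".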
After matching recall comes the "head count" step, that is, confirming whether the query is a "no results" or "few results" case. No results means the keyword finds no products. Few results means the keyword finds 8 products or fewer; some e-commerce companies set the threshold at 4 or 12 instead. You get the idea.
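The check itself is just a comparison against a configurable threshold. A minimal sketch, with invented label strings and the threshold of 8 from the text as the default:

```python
def classify_result_count(count: int, few_threshold: int = 8) -> str:
    """Label a query by how many products were recalled."""
    if count == 0:
        return "no_results"
    if count <= few_threshold:
        return "few_results"
    return "normal"
```

A site that prefers a different cutoff would simply pass another `few_threshold`, such as 4 or 12.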
Once the results are counted, the coarse-grained sorting stage begins: category sorting.
We call this step category prediction: the categories most relevant to the keyword are placed first. (Note that the products from these categories must still match the whole keyword; it does not mean every product in the category is placed first.)
Category prediction is generally done by algorithm, supplemented by manual intervention. At this point the range of filter parameters to display (that is, the parameters under the predicted category) is also determined, as is whether the top category display is triggered.
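The combination of algorithm plus manual intervention might look like the sketch below. The "algorithm" here is only a frequency count over the categories of the recalled products, which is a placeholder assumption and not the method a real system would necessarily use; the override table and names are also invented.

```python
from collections import Counter

# Hypothetical manual overrides: query -> forced category.
MANUAL_CATEGORY = {"iphone": "phones"}


def predict_categories(query: str, recalled_categories: list[str], top_k: int = 1) -> list[str]:
    """Return the predicted categories, letting manual overrides win."""
    if query in MANUAL_CATEGORY:
        return [MANUAL_CATEGORY[query]]
    # Placeholder heuristic: the most common categories among recalled products.
    counts = Counter(recalled_categories)
    return [category for category, _ in counts.most_common(top_k)]
```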
After category prediction, product sorting begins. Products in predicted categories are sorted separately from those in non-predicted categories. There are various sorting algorithms, based on user behavior data and composite product scores. Finally, after web rendering, we see the search results page.
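Sorting predicted and non-predicted categories separately can be expressed with a two-part sort key: first whether the product falls outside the predicted categories, then its score. The `score` field stands in for whatever composite score a real system computes; all names here are assumptions.

```python
def sort_products(products: list[dict], predicted_categories: list[str]) -> list[dict]:
    """Put products from predicted categories first, each group by score descending."""
    predicted = set(predicted_categories)
    return sorted(
        products,
        # False sorts before True, so predicted categories come first.
        key=lambda p: (p["category"] not in predicted, -p["score"]),
    )


# Example data with made-up scores.
example = [
    {"id": 1, "category": "shoes", "score": 0.9},
    {"id": 2, "category": "t-shirts", "score": 0.5},
    {"id": 3, "category": "t-shirts", "score": 0.7},
]
ordered_ids = [p["id"] for p in sort_products(example, ["t-shirts"])]
```

Here `ordered_ids` is `[3, 2, 1]`: the high-scoring shoes still rank below both t-shirts because their category was not predicted.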
See, it's simple. That's all for today.
Next chapter preview: a comprehensive analysis of category prediction in on-site search.
#Columnist#
Author: Wang Huan, WeChat: wanghuan314400, a small piece of operation ash.
This article is reproduced from Wang Huan. It does not represent the position of Love Operation; for reprints, please contact the original source. If there are any copyright problems with the content or pictures, please contact Love Operation.