Comprehensive Guide to On-Site Search in E-commerce Operations (V)

In the last chapter, I covered the principles of search results page optimization (or rather, my personal take on them): how the results page drives and converts traffic, and the operating methods and metrics for optimizing each module. This chapter looks at on-site search itself: the process by which a user's search term turns into the search results page in front of us.

To make this easier to follow, here is a simple flow chart. It gives the overall structure, and I will walk through it in that order. Without further ado, here it is! (I simplified it a lot so that it reads less like a product spec and is easier to understand.)


A dry step-by-step breakdown is dull, so we will walk through the process with an example, which should be enough to get the idea. To be clear up front: what I say is not necessarily correct, so don't follow it blindly.

Let's start with a Chinese query. The keyword is "men's printed T-shirt".

First comes preprocessing. This phase removes the useless parts of the keyword, such as stop words. In our keyword, the particle "de" (的, the possessive marker that shows up as "'s" in the English rendering) is the part to cut. Preprocessing also strips useless spaces on the left and right of the query. How are stop words determined? In Chinese, any word may be useful, but in a specific context many terms are treated as stop words, such as obscene words and highly sensitive words (see the advertising law).
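Here is a minimal sketch of the preprocessing step. The stop-word list and the function name `preprocess` are made up for illustration, and the query is assumed to be already split by spaces, which a real Chinese query would not be.

```python
# Hypothetical stop-word list; a real one is curated per language and per site.
STOP_WORDS = {"de", "the", "a", "of"}


def preprocess(query):
    """Trim surrounding spaces, collapse repeated spaces, and drop stop words."""
    tokens = query.strip().split()
    kept = [t for t in tokens if t.lower() not in STOP_WORDS]
    return " ".join(kept)


print(preprocess("  men de printed T-shirt  "))
```

The same function is also where a blocklist of obscene or sensitive words could be applied, since it works the same way as a stop-word list.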

After preprocessing comes intelligent error correction or manual rewriting. The system determines whether the keyword contains typos (using an algorithm or a manually maintained lexicon) and whether it hits the manual rewrite lexicon (a heavy-handed override feature).
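As a rough sketch of this step, the snippet below checks a manual rewrite table first and then falls back to fuzzy matching against a known vocabulary. The table, the vocabulary, the cutoff value, and the "manual rewrite wins" ordering are all assumptions for illustration; real correction models are much more involved.

```python
import difflib

# Hypothetical manually maintained rewrite table (the override function).
REWRITE_TABLE = {"tee": "t-shirt"}

# Hypothetical vocabulary of known good search terms.
VOCABULARY = ["men", "women", "printed", "t-shirt", "dress"]


def correct_token(token):
    """Apply a manual rewrite if one exists, otherwise try typo correction."""
    token = token.lower()
    if token in REWRITE_TABLE:
        return REWRITE_TABLE[token]
    if token in VOCABULARY:
        return token
    # Pick the closest known term, but only if it is similar enough.
    matches = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=0.8)
    return matches[0] if matches else token


print(correct_token("tee"))
print(correct_token("printd"))
```

Tokens that match nothing are passed through unchanged, so an unknown brand name is not "corrected" into something else.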

Next comes language identification. Domestic e-commerce sites may also support English, so the keyword goes through language identification to confirm which language environment it should be searched in. Some e-commerce companies skip this step because they genuinely don't need it. It is necessary for export-oriented cross-border e-commerce platforms such as AliExpress, Shopee, and Amazon.
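A toy version of language identification can be written by looking at which script the characters belong to. This sketch only tells Chinese from "everything else" and is not how a production system would do it (those typically use statistical classifiers); the function name and the two language codes are assumptions.

```python
def detect_language(query):
    """Return 'zh' if the query contains a CJK ideograph, otherwise 'en'."""
    for ch in query:
        # U+4E00..U+9FFF is the main CJK Unified Ideographs block.
        if "\u4e00" <= ch <= "\u9fff":
            return "zh"
    return "en"


print(detect_language("women dress"))
print(detect_language("T恤"))
```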

Then comes lemmatization (word-form restoration). As the name implies, it covers singular/plural restoration, tense restoration, stem extraction, and so on, which mainly concern languages such as English. For our example keyword, the step comes down to identifying the backbone of the keyword: for "men's printed T-shirt" ("de" was already removed in preprocessing), the whole phrase is the backbone.
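To give a feel for singular/plural restoration in English, here is a deliberately naive suffix-stripping sketch. The rules and the name `restore_singular` are assumptions for illustration; real systems rely on proper stemmers or lemmatizers and handle tense and irregular forms as well.

```python
def restore_singular(token):
    """Strip common English plural endings with a few naive rules."""
    t = token.lower()
    if t.endswith("ies") and len(t) > 4:
        return t[:-3] + "y"            # babies -> baby
    if t.endswith("es") and t[:-2].endswith(("s", "x", "ch", "sh")):
        return t[:-2]                  # dresses -> dress
    if t.endswith("s") and not t.endswith("ss") and len(t) > 3:
        return t[:-1]                  # shirts -> shirt
    return t


print([restore_singular(w) for w in ["dresses", "shirts", "dress"]])
```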

Then comes word segmentation. The segmentation system splits "men's printed T-shirt" into pieces. Generally, Chinese queries get n-gram, multi-granularity segmentation. Rendered in English, the result runs from single characters up to phrases: man/gentleman/print/flower/T/shirt/men's/printed/T-shirt/men's printed/printed T-shirt/men's T-shirt.

Don't worry if n-grams are unfamiliar; they will be covered in a later chapter on algorithms. For alphabetic languages such as English, French, and Indonesian, segmentation is space-based: the query is split on the spaces between words, as in "women dress".

Why is this different from Chinese? In fact, English also gets multi-granularity segmentation, and Chinese segmentation likewise relies on which word combinations the dictionary considers valid. But there are some differences between Chinese and alphabetic languages.

A short digression: alphabetic (phonetic) languages and character-based languages differ markedly in the meaning capacity and the precision of their words. Meaning capacity is the range of meanings a single word can express. Precision is the range a single word pins down: the narrower that range, the higher the precision.

Character-based writing originated in pictographs: characters were formed from the shapes of objects, leaving literary expression aside. Basic content has to be expressed by combining several characters into a complete, exact meaning. A single character on its own has a broad meaning and lacks precision.

Alphabetic writing originated in combinations of letters. A small number of letter combinations form roots, which serve as the basis for extending the language. More words are derived from a few roots, and the vocabulary branches out in a hierarchy. Words spread outward from the root: the smaller the change in form, the closer the meaning is to the root; the larger the change, the farther the meaning drifts from the root.

So we arrive at a hypothetical conclusion:

  • Alphabetic languages: words have a small meaning capacity and high precision;
  • Chinese: pictographic in origin, words have a large meaning capacity and low precision.

Multi-granularity phrase segmentation in Chinese search largely follows from this: Chinese words have a large meaning capacity and therefore low precision, so several words are needed together to pin down what the search term means.

Let's see it in action:

  • Query (Chinese): men's printed T-shirt. Segments: man/gentleman/print/flower/T/shirt/men's/printed/T-shirt/men's printed/printed T-shirt/men's T-shirt;
  • Query (English): Men Print T-Shirt. Segments: men/print/t-shirt/men t-shirt/print t-shirt.
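The English case can be sketched as space-based splitting followed by word-level n-grams. This is only an approximation under my own assumptions: it produces contiguous n-grams, so it yields "men print" rather than the "men t-shirt" combination in the list above, and the function name and the `max_n` parameter are made up.

```python
def multi_granularity(query, max_n=2):
    """Split on spaces, then emit every contiguous n-gram up to max_n words."""
    tokens = query.lower().split()
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


print(multi_granularity("Men Print T-Shirt"))
# ['men', 'print', 't-shirt', 'men print', 'print t-shirt']
```

For Chinese, the same loop would run over characters or dictionary words instead of space-separated tokens, which is where the dictionary comes in.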

The principles are roughly the same in both cases. I just want you to know that word segmentation differs somewhat between languages, and no single trick works everywhere.

After word segmentation comes synonym expansion. Dictionaries and a manually maintained synonym lexicon are used to expand the segmented keywords. For example, "printing" and "dyeing" are treated as synonyms, and "men" is treated as synonymous with other words for men, such as "boys". These synonyms are added to the segments, and everything goes into matching and recall together.
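Synonym expansion can be sketched as a dictionary lookup per segment. The synonym table below is hypothetical, and grouping each segment with its synonyms into a set (so that any member of the group can satisfy the match later) is my own design choice, not something the process above prescribes.

```python
# Hypothetical, manually maintained synonym lexicon.
SYNONYMS = {
    "men": ["boys"],
    "print": ["printed"],
}


def expand(segments):
    """Return one set per segment: the segment itself plus its synonyms."""
    return [{seg, *SYNONYMS.get(seg, [])} for seg in segments]


print(expand(["men", "print", "t-shirt"]))
```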

Now comes matching and recall. Take a look at this picture first. Once again I am using a screenshot from a former colleague's slides; I'm tired of his examples, which never seem to change. You can see that whole-word matching is used for recall.

What does that mean?

Every segment of "men's printed T-shirt" must match the product name or attribute description at the same granularity before the product is recalled. Not a single segment may be missing.

In addition, multi-word granularity carries more weight than single-word granularity, which means phrase matching takes precedence over word matching.

When the phrase does not match, the system falls back to matching single words. In Chinese, of course, matching single characters is meaningless, so Chinese matching is basically phrase-based. (Some of my Chinese segmentation examples are not very apt.)
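The matching rules above can be sketched roughly like this: every term must be present in the product text, phrases are tried first, and single words are the fallback. The product data and function names are hypothetical, and a real engine would express "phrases take precedence" through weights in an inverted index rather than a strict fallback.

```python
# Hypothetical product texts (name plus attribute description).
PRODUCTS = [
    {"id": 1, "text": "men print t-shirt cotton"},
    {"id": 2, "text": "men t-shirt plain"},
    {"id": 3, "text": "print t-shirt for men"},
]


def matches_all(terms, text):
    """True only if every term appears in the text as whole words."""
    padded = f" {text.lower()} "
    return all(f" {term} " in padded for term in terms)


def recall(phrases, words, products):
    """Try phrase-level matching first; fall back to word-level matching."""
    hits = [p for p in products if matches_all(phrases, p["text"])]
    if hits:
        return hits
    return [p for p in products if matches_all(words, p["text"])]


hits = recall(["men print", "print t-shirt"], ["men", "print", "t-shirt"], PRODUCTS)
print([p["id"] for p in hits])
```

Here only product 1 contains both phrases, so the word-level fallback never runs for this query.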


After matching and recall comes the "head count" step, which confirms whether the query is a "no result" or "few results" case. No result means the keyword finds no products. Few results means the keyword finds 8 products or fewer. Some e-commerce companies set the threshold at 4 or 12 instead. Either way, you get the idea.
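The head count is simple enough to write down directly. The threshold of 8 comes from the text above; the function name and the status labels are made up for illustration.

```python
def result_status(count, few_threshold=8):
    """Classify a query by how many products it recalled."""
    if count == 0:
        return "no_result"
    if count <= few_threshold:
        return "few_results"
    return "normal"


print(result_status(0), result_status(8), result_status(9))
```

A site that prefers 4 or 12 as the cutoff only needs to pass a different `few_threshold`.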

After the head count comes the coarse-grained sorting stage: category sorting.

We call this step category prediction: the categories most relevant to the keyword are placed first. (Note that the products from these categories must still match the whole keyword; it does not mean that every product in the category is placed first.)

Category prediction is generally done by algorithms, supplemented by manual intervention. At this point the display range of the filter parameters (that is, the parameters under the predicted category) is also settled, along with whether the top category gets a prominent display.
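As one possible illustration of "algorithm plus manual intervention", the sketch below ranks categories by how often past searchers of the keyword clicked products in each category, and lets a manual override table win when present. Using click counts as the signal is purely my assumption, and the log, the override table, and the names are all hypothetical.

```python
from collections import Counter

# Hypothetical click log: (keyword, category of the clicked product).
CLICK_LOG = [
    ("t-shirt", "Men's Tops"),
    ("t-shirt", "Men's Tops"),
    ("t-shirt", "Men's Tops"),
    ("t-shirt", "Women's Tops"),
    ("dress", "Women's Dresses"),
]

# Hypothetical manual overrides maintained by operations staff.
MANUAL_OVERRIDES = {"dress": ["Women's Dresses", "Girls' Dresses"]}


def predict_categories(keyword, click_log, overrides, top_k=2):
    """Return the most relevant categories, manual overrides first."""
    if keyword in overrides:
        return overrides[keyword]
    counts = Counter(cat for kw, cat in click_log if kw == keyword)
    return [cat for cat, _ in counts.most_common(top_k)]


print(predict_categories("t-shirt", CLICK_LOG, MANUAL_OVERRIDES))
```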

After category prediction, product sorting begins. Products in predicted categories are sorted separately from those in non-predicted categories. There are many sorting algorithms, typically built on user behavior data and a composite product score. Finally, after web rendering, we see the search results page.
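Sorting the two groups separately can be sketched with a two-part sort key: predicted-category products come first, and each group is ordered by score. The `score` field stands in for whatever composite product score a site computes and is an assumption, as are the sample products.

```python
def sort_products(products, predicted_categories):
    """Predicted-category products first, then by descending score."""
    return sorted(
        products,
        key=lambda p: (p["category"] not in predicted_categories, -p["score"]),
    )


# Hypothetical recalled products with a precomputed composite score.
recalled = [
    {"id": 1, "category": "Women's Tops", "score": 0.9},
    {"id": 2, "category": "Men's Tops", "score": 0.4},
    {"id": 3, "category": "Men's Tops", "score": 0.7},
]

ranked = sort_products(recalled, {"Men's Tops"})
print([p["id"] for p in ranked])
# [3, 2, 1]
```

Product 1 has the highest score but sits outside the predicted category, so it lands after both "Men's Tops" products.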

See, it's simple. That's all for today.

Preview of the next chapter: a comprehensive analysis of category prediction in on-site search.

#Columnist#

Author: Wang Huan, WeChat: wanghuan314400, a small fry in operations.


This article is reproduced from Wang Huan and does not represent the position of Love Operation. Please contact the original source for reprinting. If there are any copyright problems with the content or pictures, please contact Love Operation.
