MySQL full-text indexes: stopwords

Tags: mysql, index, stopword | Published: 2014-10-20 21:23 | Author: pxczy
Source: http://www.iteye.com

This article from IT技术学习网 explains what a stopword is in MySQL full-text indexing (in Chinese the term is translated as 停止词, or sometimes 停止字).

stopword

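In full-text indexing, a word that is considered too common or too trivial to be useful is called a stopword. Stopwords are left out of the search index and ignored in search queries. InnoDB and MyISAM each have their own set of settings that control their stopwords.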
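The effect can be seen with a small experiment. This is a minimal sketch, assuming an InnoDB table with the default stopword list; the table name `stopword_demo` and its rows are hypothetical and only serve as an illustration.

```sql
-- Hypothetical demo table with a FULLTEXT index on the body column.
CREATE TABLE stopword_demo (
  id   INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  body TEXT,
  FULLTEXT KEY ft_body (body)
) ENGINE=InnoDB;

INSERT INTO stopword_demo (body) VALUES
  ('The quick brown fox'),
  ('MySQL stores the index on disk');

-- 'the' is in the default InnoDB stopword list, so it is not indexed
-- and this search is expected to return no rows.
SELECT id, body FROM stopword_demo
WHERE MATCH(body) AGAINST('the' IN NATURAL LANGUAGE MODE);

-- 'index' is not a stopword, so this search is expected to find the second row.
SELECT id, body FROM stopword_demo
WHERE MATCH(body) AGAINST('index' IN NATURAL LANGUAGE MODE);
```

Both rows contain "the", yet the first query finds nothing, because the word was never put into the index.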

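For full-text queries, the stopword list is loaded and searched using the server character set and collation (the character_set_server and collation_server system variables). If the stopword file, or the columns used for full-text indexing and searching, use a different character set or collation, stopword lookups may produce false hits or misses.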

Whether stopword lookups are case-sensitive depends on the collation. For example, lookups are case-insensitive under latin1_swedish_ci, and case-sensitive under latin1_general_cs or latin1_bin.
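The following sketch shows how to check the server settings involved, and uses plain string comparisons to illustrate how each collation treats case. The comparisons only illustrate collation behavior; they do not query the stopword list itself.

```sql
-- Character set and collation the server uses for stopword lookups.
SHOW VARIABLES LIKE 'character_set_server';
SHOW VARIABLES LIKE 'collation_server';

-- Compare 'The' with 'the' under three latin1 collations.
-- Expected: ci_equal = 1, cs_equal = 0, bin_equal = 0.
SELECT
  _latin1'The' = _latin1'the' COLLATE latin1_swedish_ci AS ci_equal,
  _latin1'The' = _latin1'the' COLLATE latin1_general_cs AS cs_equal,
  _latin1'The' = _latin1'the' COLLATE latin1_bin        AS bin_equal;
```

Under a case-sensitive or binary collation, a stopword list entry `the` would therefore not match the token `The`.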

Stopwords for InnoDB indexes

The default InnoDB stopword list is short. To view it, query the INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD table.

      mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;
      +-------+
      | value |
      +-------+
      | a     |
      | about |
      | an    |
      | are   |
      | as    |
      | at    |
      | be    |
      | by    |
      | com   |
      | de    |
      | en    |
      | for   |
      | from  |
      | how   |
      | i     |
      | in    |
      | is    |
      | it    |
      | la    |
      | of    |
      | on    |
      | or    |
      | that  |
      | the   |
      | this  |
      | to    |
      | was   |
      | what  |
      | when  |
      | where |
      | who   |
      | will  |
      | with  |
      | und   |
      | the   |
      | www   |
      +-------+
      36 rows in set (0.00 sec)
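The original text does not show how to change the InnoDB list. One way is the innodb_ft_server_stopword_table system variable, sketched below. The database name `test`, the tables `my_stopwords` and `notes`, and the stopword `mysql` are hypothetical. The stopword table has to be an InnoDB table with a single VARCHAR column named `value`, and it should be set before the FULLTEXT index is created.

```sql
-- Hypothetical custom stopword table: InnoDB, one VARCHAR column named `value`.
CREATE TABLE test.my_stopwords (value VARCHAR(30)) ENGINE=InnoDB;
INSERT INTO test.my_stopwords (value) VALUES ('mysql');

-- Point InnoDB at the custom list, using the 'db_name/table_name' format.
-- SET GLOBAL requires sufficient privileges.
SET GLOBAL innodb_ft_server_stopword_table = 'test/my_stopwords';

-- Indexes created from now on use the custom list in place of the default one.
CREATE TABLE test.notes (
  id   INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  body TEXT
) ENGINE=InnoDB;
CREATE FULLTEXT INDEX ft_body ON test.notes (body);

-- Confirm the setting.
SHOW VARIABLES LIKE 'innodb_ft_server_stopword_table';
```

Existing FULLTEXT indexes keep the list they were built with, so they need to be dropped and recreated to pick up a new one.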

Stopwords for MyISAM indexes

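The MyISAM stopword list differs from the InnoDB one. The default MyISAM list is compiled into the MySQL source code. To override it, set the ft_stopword_file system variable to the path of a stopword file.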
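A rough sketch of that workflow follows. The file path and the table name `articles_myisam` are hypothetical. ft_stopword_file is read at server startup, so it goes into the option file and cannot be changed with SET at runtime.

```sql
-- Show the current setting. With the compiled-in list in use,
-- the value is typically reported as '(built-in)'.
SHOW VARIABLES LIKE 'ft_stopword_file';

-- In the server option file (hypothetical path), under [mysqld]:
--   ft_stopword_file = /etc/mysql/my_stopwords.txt
-- Then restart the server so the new file is loaded.

-- After the restart, rebuild the FULLTEXT indexes of existing MyISAM tables
-- so that they reflect the new stopword list.
REPAIR TABLE articles_myisam QUICK;
```

Without the rebuild, existing indexes still reflect the old list, and searches may return inconsistent results.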

You can find the default MyISAM stopword list in the file storage/myisam/ft_static.c of the MySQL source tree:

      a's able about above according
      accordingly across actually after afterwards
      again against ain't all allow
      allows almost alone along already
      also although always am among
      amongst an and another any
      anybody anyhow anyone anything anyway
      anyways anywhere apart appear appreciate
      appropriate are aren't around as
      aside ask asking associated at
      available away awfully be became
      because become becomes becoming been
      before beforehand behind being believe
      below beside besides best better
      between beyond both brief but
      by c'mon c's came can
      can't cannot cant cause causes
      certain certainly changes clearly co
      com come comes concerning consequently
      consider considering contain containing contains
      corresponding could couldn't course currently
      definitely described despite did didn't
      different do does doesn't doing
      don't done down downwards during
      each edu eg eight either
      else elsewhere enough entirely especially
      et etc even ever every
      everybody everyone everything everywhere ex
      exactly example except far few
      fifth first five followed following
      follows for former formerly forth
      four from further furthermore get
      gets getting given gives go
      goes going gone got gotten
      greetings had hadn't happens hardly
      has hasn't have haven't having
      he he's hello help hence
      her here here's hereafter hereby
      herein hereupon hers herself hi
      him himself his hither hopefully
      how howbeit however i'd i'll
      i'm i've ie if ignored
      immediate in inasmuch inc indeed
      indicate indicated indicates inner insofar
      instead into inward is isn't
      it it'd it'll it's its
      itself just keep keeps kept
      know known knows last lately
      later latter latterly least less
      lest let let's like liked
      likely little look looking looks
      ltd mainly many may maybe
      me mean meanwhile merely might
      more moreover most mostly much
      must my myself name namely
      nd near nearly necessary need
      needs neither never nevertheless new
      next nine no nobody non
      none noone nor normally not
      nothing novel now nowhere obviously
      of off often oh ok
      okay old on once one
      ones only onto or other
      others otherwise ought our ours
      ourselves out outside over overall
      own particular particularly per perhaps
      placed please plus possible presumably
      probably provides que quite qv
      rather rd re really reasonably
      regarding regardless regards relatively respectively
      right said same saw say
      saying says second secondly see
      seeing seem seemed seeming seems
      seen self selves sensible sent
      serious seriously seven several shall
      she should shouldn't since six
      so some somebody somehow someone
      something sometime sometimes somewhat somewhere
      soon sorry specified specify specifying
      still sub such sup sure
      t's take taken tell tends
      th than thank thanks thanx
      that that's thats the their
      theirs them themselves then thence
      there there's thereafter thereby therefore
      therein theres thereupon these they
      they'd they'll they're they've think
      third this thorough thoroughly those
      though three through throughout thru
      thus to together too took
      toward towards tried tries truly
      try trying twice two un
      under unfortunately unless unlikely until
      unto up upon us use
      used useful uses using usually
      value various very via viz
      vs want wants was wasn't
      way we we'd we'll we're
      we've welcome well went were
      weren't what what's whatever when
      whence whenever where where's whereafter
      whereas whereby wherein whereupon wherever
      whether which while whither who
      who's whoever whole whom whose
      why will willing wish with
      within without won't wonder would
      wouldn't yes yet you you'd
      you'll you're you've your yours
      yourself yourselves zero




