Installing and Using Coreseek for Chinese Full-Text Search
Published: 2016-11-30 18:48:02
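While developing a policy library system recently, I needed full-text search over policy documents and used Coreseek for it.
Coreseek is essentially the Chinese edition of Sphinx (Sphinx plus MMSeg Chinese word segmentation); see http://www.coreseek.cn for details.
Below are some notes from installing and using it.
The system already had basic search in place, built on MySQL and PHP.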
1. Compile and install Coreseek. For the various problems you may run into along the way, see the FAQ on the website :)
a>$ wget http://www.coreseek.cn/uploads/csft/4.0/coreseek-4.1-beta.tar.gz
b>$ tar xzvf coreseek-4.1-beta.tar.gz
c>$ cd coreseek-4.1-beta
## Install mmseg
d>$ cd mmseg-3.2.14
e>$ ./bootstrap
f>$ ./configure --prefix=/usr/local/mmseg
g>$ make && make install
h>$ cd ..
## Install coreseek
i>$ cd csft-4.1/
j>$ sh buildconf.sh
k>$ ./configure --prefix=/usr/local/coreseek --without-python --with-mysql=/usr/local/mysql --with-mmseg=/usr/local/mmseg --with-mmseg-includes=/usr/local/mmseg/include/mmseg/ --with-mmseg-libs=/usr/local/mmseg/lib/
l>$ make && make install
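After make install, a quick way to confirm that the three binaries this post relies on later (indexer, searchd and search) actually landed under the install prefix is a check like the one below. It is only a sketch: check_coreseek_install is a hypothetical helper name, and it assumes nothing beyond the --prefix=/usr/local/coreseek used above.

```shell
#!/bin/bash
# Hypothetical helper (not part of Coreseek): verify that the binaries used
# in this post exist and are executable under the given install prefix.
check_coreseek_install() {
  local prefix="${1:-/usr/local/coreseek}"
  local missing=0
  local bin
  for bin in indexer searchd search; do
    if [ ! -x "$prefix/bin/$bin" ]; then
      echo "missing: $prefix/bin/$bin" >&2
      missing=1
    fi
  done
  # 0 when all three are present, 1 otherwise
  return "$missing"
}
```

Usage: check_coreseek_install /usr/local/coreseek && echo "install looks OK"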
2. Configure Coreseek
a>$ cp /usr/local/coreseek/etc/sphinx.conf.dist /usr/local/coreseek/etc/sphinx.conf
b>$ vim /usr/local/coreseek/etc/sphinx.conf

source content
{
    type                = mysql
    sql_host            = localhost
    sql_user            = DB_USER
    sql_pass            = DB_PASSWORD
    sql_db              = DB_NAME
    sql_port            = 3306
    sql_query_pre       = SET NAMES utf8
    # The first column must be the unique integer document ID.
    # No $id macro here: $id is only substituted in sql_query_info.
    sql_query           = \
        SELECT a.id, group_id, date_added, a.title, b.content \
        FROM `news` a INNER JOIN `newscontent` b ON a.id = b.id
    sql_attr_uint       = group_id
    # Attribute names must match columns returned by sql_query.
    # Timestamp attributes hold Unix timestamps; if date_added is a DATETIME
    # column, select UNIX_TIMESTAMP(date_added) AS date_added instead.
    sql_attr_timestamp  = date_added
    # Used by the command-line search tool to show document details.
    sql_query_info      = SELECT * FROM `news` WHERE id=$id
}

index content
{
    source              = content
    path                = /usr/local/coreseek/var/data/content
    docinfo             = extern
    # Coreseek-specific: MMSeg dictionary directory and Chinese UTF-8 charset
    charset_dictpath    = /usr/local/mmseg/etc/
    charset_type        = zh_cn.utf-8
    ngram_len           = 0
}

indexer
{
    mem_limit           = 32M
}

searchd
{
    port                = 9312
    log                 = /usr/local/coreseek/var/log/searchd.log
    query_log           = /usr/local/coreseek/var/log/query.log
    read_timeout        = 5
    max_children        = 30
    pid_file            = /usr/local/coreseek/var/log/searchd.pid
    max_matches         = 1000
    seamless_rotate     = 1
    preopen_indexes     = 1
    unlink_old          = 1
}

c> MySQL's default character set must also be utf8; add this to my.cnf, under the [mysqld] section:
character_set_server=utf8
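Before start.sh is run for the first time, the index files under /usr/local/coreseek/var/data/ do not exist yet. The --rotate option (used by build.sh in step 3) builds a new copy next to the live index and then signals the running searchd to switch to it, so on a fresh install the very first build is usually done without it; in my understanding, searchd of this generation also refuses to start when it has no index to serve. A minimal sketch of that one-off step, assuming the paths configured above; first_build and the CORESEEK_HOME override are hypothetical names for illustration.

```shell
#!/bin/bash
# Hypothetical helper: one-off initial index build, run before searchd starts.
# CORESEEK_HOME is an illustrative override, not a Coreseek setting.
CORESEEK_HOME="${CORESEEK_HOME:-/usr/local/coreseek}"

first_build() {
  local conf="$CORESEEK_HOME/etc/sphinx.conf"
  # No --rotate: searchd is not running yet, so there is nothing to signal.
  "$CORESEEK_HOME/bin/indexer" -c "$conf" --all
}
```

Run first_build once, then start.sh; from then on build.sh with --rotate takes over.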
3. Update the index with a scheduled job
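A. In the Coreseek directory (/usr/local/coreseek), create three shell scripts for convenience:

a> stop.sh    ## stop the service
#!/bin/bash
/usr/local/coreseek/bin/searchd -c /usr/local/coreseek/etc/sphinx.conf --stop

b> build.sh   ## build the indexes
#!/bin/bash
/usr/local/coreseek/bin/indexer -c /usr/local/coreseek/etc/sphinx.conf --all --rotate

c> start.sh   ## start the service
#!/bin/bash
/usr/local/coreseek/bin/searchd -c /usr/local/coreseek/etc/sphinx.conf

B. Make them executable:
chmod +x start.sh
chmod +x stop.sh
chmod +x build.sh

C. Add the cron job (rebuilds all indexes every day at 02:00):
$ crontab -e
0 2 * * * sh /usr/local/coreseek/build.sh >/dev/null 2>&1

D. Once start.sh is running, cron executes build.sh on schedule to update the index (--rotate builds the new index alongside the old one and tells the running searchd to switch over). After the index has been updated, you can run:
$ /usr/local/coreseek/bin/search -c /usr/local/coreseek/etc/sphinx.conf -a 国家标准化管理委员会
to see the results (-a matches any of the query words; the sample query is "Standardization Administration of China").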
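The crontab entry above throws all output away, so a failing index build goes unnoticed, and nothing stops a manual build.sh run from overlapping the scheduled one. A variant of build.sh that appends to a log file and skips the run while another build holds a lock might look like this. It is a sketch assuming flock(1) from util-linux is installed; rebuild_indexes, the build.log location and the lock file path are hypothetical choices, not Coreseek settings.

```shell
#!/bin/bash
# Sketch of an alternative build.sh: log each run and skip it if another
# indexer run still holds the lock. All three variables are illustrative.
CORESEEK_HOME="${CORESEEK_HOME:-/usr/local/coreseek}"
BUILD_LOG="${BUILD_LOG:-$CORESEEK_HOME/var/log/build.log}"
LOCK_FILE="${LOCK_FILE:-/tmp/coreseek-build.lock}"

rebuild_indexes() {
  (
    # Exclusive, non-blocking lock on fd 9, held until the subshell exits.
    if ! flock -n 9; then
      echo "$(date '+%F %T') another build is running, skipping" >>"$BUILD_LOG"
      exit 0
    fi
    echo "$(date '+%F %T') build started" >>"$BUILD_LOG"
    "$CORESEEK_HOME/bin/indexer" -c "$CORESEEK_HOME/etc/sphinx.conf" \
      --all --rotate >>"$BUILD_LOG" 2>&1
    status=$?
    echo "$(date '+%F %T') build finished with status $status" >>"$BUILD_LOG"
    exit "$status"
  ) 9>"$LOCK_FILE"
}
```

To use it as build.sh, end the file with a call to rebuild_indexes; the crontab line stays the same, and failures now show up in build.log.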
4. Update the search code
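The directory /usr/local/src/coreseek-4.1-beta/csft-4.1/api contains the PHP API file sphinxapi.php, which defines the SphinxClient class. Copy it into your web directory; the PHP code is as follows ($page is assumed to start at 1):

require("sphinxapi.php");

$s = new SphinxClient;
$s->SetServer("localhost", 9312);
// The @field syntax below needs the extended matching mode
$s->SetMatchMode(SPH_MATCH_EXTENDED2);
// SetLimits() takes an offset, not a page number
$s->SetLimits(($page - 1) * $pageSize, $pageSize);
// 测试 ("test") in the title field, 网络 ("network") in the content field; "*" = all indexes
$result = $s->Query('@title (测试) @content (网络)', "*");
if ($result === false) {
    die("Query failed: " . $s->GetLastError());
}

/*** $result['matches'] holds the matches keyed by document ID; $result['total'] is the number of matches retrievable (capped by max_matches), $result['total_found'] the total number found ***/
if (!empty($result['matches'])) {
    echo '<pre>';
    print_r($result['matches']);
    echo '</pre>';

    $resultid = array();
    foreach ($result['matches'] as $k => $v) {
        array_push($resultid, $k);
    }
    $resultid = implode(',', $resultid);
    /*** Run the SQL below to fetch the matching rows; ORDER BY FIELD keeps Sphinx's relevance order ***/
    $sql = "SELECT * FROM `news` WHERE id IN ({$resultid}) ORDER BY FIELD(id, {$resultid})";
    //...
}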
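When Query() returns false, SphinxClient's GetLastError() usually explains why, and a common cause is simply that searchd is not listening on port 9312. A quick shell-side check is sketched below. It relies on bash's /dev/tcp pseudo-device (a bash feature, not POSIX sh) and coreutils timeout; searchd_listening is a hypothetical helper name, and it only tests that something accepts TCP connections on that port, not that it is searchd.

```shell
#!/bin/bash
# Hypothetical helper: succeed if something accepts TCP connections on
# host:port (default 127.0.0.1:9312, the searchd port from sphinx.conf).
searchd_listening() {
  local host="${1:-127.0.0.1}"
  local port="${2:-9312}"
  # bash opens /dev/tcp/<host>/<port> as a socket; timeout caps the wait at 3s.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

Usage: searchd_listening || echo "searchd is not running, try start.sh"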