2008-05-03

Hibernate Search: Introduction and Getting-Started Example

Keywords: hibernate search, introduction, getting started, example
Official introduction:

Hibernate Search brings the power of full text search engines to the persistence domain model and Hibernate experience, through transparent configuration (Hibernate Annotations) and a common API.

Full text search engines like Apache Lucene(tm) allow applications to execute free-text search queries. However, it becomes increasingly more difficult to index a more complex object domain model - keeping the index up to date, dealing with the mismatch between the index structure and the domain model, querying mismatches, and so on.

Hibernate Search abstracts you from these problems by solving:

  • The structural mismatch: Hibernate Search takes care of the object/index translation
  • The duplication mismatch: Hibernate Search manages the index, keeps changes synchronized with your database, and optimizes the index access transparently
  • The API mismatch: Hibernate Search lets you query the index and retrieve managed objects as any regular Hibernate query would do
Hibernate Search is using Apache Lucene(tm) internally, and always provides the ability to fallback to the native Lucene APIs.

Depending on application needs, Hibernate Search works well in non-clustered and clustered mode, provides synchronous index updates and asynchronous index updates, letting you choose between response time, throughput and index update.



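The main features of the Hibernate Search project are:

  • Lucene integration: Lucene's reputation as a powerful, efficient search engine has long been proven;
  • Automatic inserts and updates: when an object is added or updated through Hibernate, its index entry is updated transparently as well;
  • Support for many complex search types: fast wildcard searches, multi-word text searches, approximation/synonym searches, and ranking results by relevance;
  • Search clustering: Hibernate Search provides a built-in search-clustering solution, including a JMS-based asynchronous query and indexing system;
  • Direct access to the Lucene API: for particularly complex problems, you can call Lucene's API directly in your queries;
  • Automatic Lucene management: Hibernate Search manages and optimizes the Lucene indexes and uses the Lucene API very efficiently.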


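Hibernate Search has three main annotations:
  • @Indexed marks an entity that should be indexed.
      Attribute: index specifies the index name; with a file-system index this becomes the index directory under indexBase.
  • @DocumentId marks the entity's unique property, which is stored in the index so that each indexed entity can be told apart during full-text search. Usually this is the entity's primary-key property.
  • @Field is placed on a getter (or a field) and marks a property to be indexed as a Lucene Field.
      Attributes:
        index: whether (and how) the property is indexed, same as in Lucene
        store: whether the value is stored in the index, same as in Lucene
        name: the Field name; defaults to the property name
        analyzer: the analyzer to use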

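In addition, @IndexedEmbedded and @ContainedIn handle indexing across associated classes. @ContainedIn goes on the other side of the association, so that a change to the associated object also updates the index entry of the entity that embeds it.
@IndexedEmbedded has two attributes: prefix, the prefix given to the associated object's fields, and depth, the maximum association depth to follow.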
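The prefix and depth attributes are easiest to see on a cycle, such as a hypothetical Book/Author association. The stdlib-only Java sketch below is a toy model, not Hibernate Search code. The Node class and the flatten method are made up for illustration. The model uses a single global depth, whereas the real annotation sets depth per association. It only lists which field names a Book document would end up with, assuming the default prefix (the property name followed by a dot).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model, not Hibernate Search code: how @IndexedEmbedded-style flattening turns an
// association into prefixed field names, with a depth limit that stops cyclic associations.
final class EmbeddedFieldNames {

    // Hypothetical description of an indexed class: its own fields plus embedded associations.
    static final class Node {
        final List<String> fields = new ArrayList<>();
        final Map<String, Node> embedded = new LinkedHashMap<>(); // prefix -> associated class

        Node field(String name) {
            fields.add(name);
            return this;
        }

        Node embed(String prefix, Node target) {
            embedded.put(prefix, target);
            return this;
        }
    }

    // Collects the field names a document would get; each embedded level adds its prefix
    // and consumes one level of depth (simplification: one global depth, not per annotation).
    static List<String> flatten(Node node, String prefix, int depth) {
        List<String> out = new ArrayList<>();
        for (String f : node.fields) {
            out.add(prefix + f);
        }
        if (depth > 0) {
            for (Map.Entry<String, Node> e : node.embedded.entrySet()) {
                out.addAll(flatten(e.getValue(), prefix + e.getKey(), depth - 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical Book <-> Author cycle; "author." mimics the default prefix (property name + ".").
        Node book = new Node().field("title");
        Node author = new Node().field("name");
        book.embed("author.", author);
        author.embed("books.", book);
        System.out.println(flatten(book, "", 1)); // [title, author.name]
        System.out.println(flatten(book, "", 2)); // [title, author.name, author.books.title]
    }
}
```

In this model, the depth is what stops the Book -> Author -> Book cycle. Each extra level of depth adds one more ring of prefixed field names.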



Step-by-step example:



1. Add the following three dependencies to Maven's pom.xml:

 .......

<dependency>
   <groupId>org.hibernate</groupId>
   <artifactId>hibernate-search</artifactId>
   <version>3.0.1.GA</version>
</dependency>
<dependency>
   <groupId>org.hibernate</groupId>
   <artifactId>hibernate-annotations</artifactId>
   <version>3.3.0.ga</version>
</dependency>
<dependency>
   <groupId>org.hibernate</groupId>
   <artifactId>hibernate-entitymanager</artifactId>
   <version>3.3.1.ga</version>
</dependency>
.......



2. Hibernate configuration file



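Essentially you add two properties. hibernate.search.default.directory_provider selects the directory provider, i.e. whether the index files are kept on disk (org.hibernate.search.store.FSDirectoryProvider) or in memory (org.hibernate.search.store.RAMDirectoryProvider). For an on-disk index, hibernate.search.default.indexBase sets the directory where the indexes are stored.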

......

    <!-- use a file system based index -->
    <prop
     key="hibernate.search.default.directory_provider">
     org.hibernate.search.store.FSDirectoryProvider
    </prop>
    <!-- directory where the indexes will be stored -->
    <prop key="hibernate.search.default.indexBase">
     D:/index
    </prop>

 .........
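The same two settings can also be assembled in code, for example before handing them to Hibernate's Configuration.addProperties(...). This is a minimal sketch. SearchDirectoryConfig, directorySettings and indexDirectory are hypothetical names; only the property keys and provider class names come from the configuration above. The indexDirectory helper reflects where the file-system provider keeps each index: a subdirectory of indexBase named after @Indexed(index = ...), or after the entity's fully qualified class name by default. The F:/hibernate-search/indexbj/com.mytest.model.Vobject path in the comments is an example of the default.

```java
import java.io.File;
import java.util.Properties;

// Sketch: the two directory settings from step 2 as java.util.Properties.
// The class and method names are hypothetical; only the keys and values come from the article.
final class SearchDirectoryConfig {

    static final String PROVIDER_KEY = "hibernate.search.default.directory_provider";
    static final String INDEX_BASE_KEY = "hibernate.search.default.indexBase";

    static Properties directorySettings(boolean inMemory, String indexBase) {
        Properties p = new Properties();
        if (inMemory) {
            // in-memory index: nothing is written to disk, so indexBase is not needed
            p.setProperty(PROVIDER_KEY, "org.hibernate.search.store.RAMDirectoryProvider");
        } else {
            p.setProperty(PROVIDER_KEY, "org.hibernate.search.store.FSDirectoryProvider");
            p.setProperty(INDEX_BASE_KEY, indexBase);
        }
        return p;
    }

    // With the file-system provider, each index lives in a subdirectory of indexBase named
    // after @Indexed(index = ...) or, by default, the entity's fully qualified class name.
    static File indexDirectory(String indexBase, String indexName) {
        return new File(indexBase, indexName);
    }

    public static void main(String[] args) {
        Properties p = directorySettings(false, "D:/index");
        System.out.println(p.getProperty(PROVIDER_KEY));
        System.out.println(indexDirectory("D:/index", "promotionCase").getName()); // promotionCase
    }
}
```

With @Indexed(index = "promotionCase") on the PromotionCase entity in step 3, the index files would therefore end up under D:/index/promotionCase.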




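If you use Hibernate Annotations or EntityManager 3.2.x (which ships with JBoss AS 4.2.GA), you also need to register the corresponding event listeners. In persistence.xml form (for EntityManager), the settings look like this: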


<!-- use a file system based index -->
<property name="hibernate.search.default.directory_provider"
          value="org.hibernate.search.store.FSDirectoryProvider"/>
<!-- directory where the indexes will be stored -->
<property name="hibernate.search.default.indexBase"
          value="D:/index"/>

<!-- event listeners -->
<property name="hibernate.ejb.event.post-insert"
          value="org.hibernate.search.event.FullTextIndexEventListener"/>
<property name="hibernate.ejb.event.post-update"
          value="org.hibernate.search.event.FullTextIndexEventListener"/>
<property name="hibernate.ejb.event.post-delete"
          value="org.hibernate.search.event.FullTextIndexEventListener"/>

 


3. POJO



PromotionCase.java
import java.io.Serializable;
import java.util.Date;
import java.util.List;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.JoinTable;
import javax.persistence.ManyToMany;
import javax.persistence.Table;
import javax.persistence.Temporal;
import javax.persistence.TemporalType;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.hibernate.search.annotations.Analyzer;
import org.hibernate.search.annotations.DocumentId;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.Index;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.Store;

/**
 * @author david.wei
 * 
 */

@Entity()
@Table(name = "promotioncase")
@Indexed(index = "promotionCase")
public class PromotionCase implements Serializable {

 

 private Long id;

 private String name; 

 private String content; 

 private Short type; 

 private Short state;

 private Date beginTime;

 private Date endTime;

 // basic-->common
 private List<PromotionCase> promotionCases;

 @Id
 @GeneratedValue
 @DocumentId
 public Long getId() {
  return id;
 }

 public void setId(Long id) {
  this.id = id;
 }

 @Column(name = "name", length = 128)
 @Field(name = "name", index = Index.TOKENIZED, store = Store.YES, analyzer = @Analyzer(impl = StandardAnalyzer.class))
 public String getName() {
  return name;
 }

 public void setName(String name) {
  this.name = name;
 }

 @Column(name = "state", length = 50)
 @Field(name = "state", index = Index.TOKENIZED, store = Store.YES, analyzer = @Analyzer(impl = StandardAnalyzer.class))
 public Short getState() {
  return state;
 }

 public void setState(Short state) {
  this.state = state;
 }

 @Column(name = "type", length = 50)
 public Short getType() {
  return type;
 }

 public void setType(Short type) {
  this.type = type;
 }

 @Column(name = "content", length = 256)
 @Field(name = "content", index = Index.TOKENIZED, store = Store.YES, analyzer = @Analyzer(impl = StandardAnalyzer.class))
 public String getContent() {
  return content;
 }

 public void setContent(String content) {
  this.content = content;
 }

 @Column(name = "begin_time", length = 256)
 @Temporal(TemporalType.TIMESTAMP)
 public Date getBeginTime() {
  return beginTime;
 }

 public void setBeginTime(Date beginTime) {
  this.beginTime = beginTime;
 }

 @Column(name = "end_time", length = 256)
 @Temporal(TemporalType.TIMESTAMP)
 public Date getEndTime() {
  return endTime;
 }

 public void setEndTime(Date endTime) {
  this.endTime = endTime;
 }

 @ManyToMany(fetch = FetchType.LAZY, targetEntity = PromotionCase.class)
 @JoinTable(name = "promotionCase_ref", joinColumns = @JoinColumn(name = "normal_id", referencedColumnName = "id"), inverseJoinColumns = @JoinColumn(name = "basic_id", referencedColumnName = "id"))
 public List<PromotionCase> getPromotionCases() {
  return promotionCases;
 }

 public void setPromotionCases(List<PromotionCase> promotionCases) {
  this.promotionCases = promotionCases;
 }

}



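Here the name and content properties are indexed for full-text search (state also carries @Field, so it is indexed too).

With these steps in place, whenever an object is inserted, deleted or updated, Hibernate creates, deletes or updates the corresponding index entry.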
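Conceptually, the event listeners keep a term-to-document mapping in step with the database. The toy in-memory inverted index below is stdlib-only Java; ToyIndex and its methods are hypothetical and much simpler than Lucene. It mirrors the three events:
  • insert adds the entity's terms;
  • delete removes them;
  • update is a delete followed by an insert, because Lucene replaces documents rather than editing them in place.
Its tokenizer is a crude stand-in for StandardAnalyzer.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy inverted index (not Hibernate Search's implementation) kept in sync on insert/update/delete.
final class ToyIndex {
    private final Map<String, Set<Long>> postings = new HashMap<>();   // term -> entity ids
    private final Map<Long, Set<String>> termsById = new HashMap<>();  // entity id -> its terms

    // Crude tokenizer: lower-case, split on anything that is not a letter or digit.
    static Set<String> tokenize(String text) {
        Set<String> terms = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    void onInsert(long id, String text) {
        Set<String> terms = tokenize(text);
        termsById.put(id, terms);
        for (String t : terms) {
            postings.computeIfAbsent(t, k -> new HashSet<>()).add(id);
        }
    }

    void onDelete(long id) {
        Set<String> terms = termsById.remove(id);
        if (terms == null) {
            return; // nothing indexed for this id
        }
        for (String t : terms) {
            Set<Long> ids = postings.get(t);
            ids.remove(id);
            if (ids.isEmpty()) {
                postings.remove(t);
            }
        }
    }

    void onUpdate(long id, String text) {
        // replace the whole document: drop the old terms, then index the new text
        onDelete(id);
        onInsert(id, text);
    }

    Set<Long> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.onInsert(1L, "Summer sale");
        index.onUpdate(1L, "Winter sale");
        System.out.println(index.search("summer")); // []
        System.out.println(index.search("winter")); // [1]
    }
}
```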



4. Searching

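import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.hibernate.search.FullTextQuery;
import org.hibernate.search.FullTextSession;
import org.hibernate.search.Search;

// inside a DAO/service method; getPromotionCaseDao() and indexString come from the surrounding code
FullTextSession fullTextSession = Search.createFullTextSession(
        getPromotionCaseDao().getSessionFactory().openSession());
List result;
try {
    // search "name" and "content" with the same analyzer that was used at index time
    MultiFieldQueryParser parser = new MultiFieldQueryParser(
            new String[] { "name", "content" }, new StandardAnalyzer());
    org.apache.lucene.search.Query luceneQuery = parser.parse(indexString);
    FullTextQuery query = fullTextSession.createFullTextQuery(luceneQuery, PromotionCase.class);

    // pagination: skip the first 3 hits (zero-based offset), return at most 80
    query.setFirstResult(3);
    query.setMaxResults(80);

    // sort by name; Lucene sorts reliably only on untokenized fields, so sorting on the
    // tokenized "name" may fail or give odd orderings (an untokenized copy would be safer)
    query.setSort(new Sort(new SortField("name")));
    result = query.list();
} finally {
    // the session was opened here, so close it here
    fullTextSession.close();
}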
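setFirstResult takes a zero-based offset, so setFirstResult(3) above skips the first three hits and returns at most hits 4 through 83. By the same rule, a first page requested with setFirstResult(1) would silently skip the top hit. Here is a small sketch for turning a 1-based page number into that offset. Paging, firstResult and pageCount are hypothetical helper names.

```java
// Hypothetical helpers for turning a 1-based page number into the zero-based offset
// that Hibernate's setFirstResult expects.
final class Paging {

    static int firstResult(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        return (page - 1) * pageSize;
    }

    // number of pages needed to show totalHits results, rounding up
    static int pageCount(int totalHits, int pageSize) {
        return (totalHits + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        System.out.println(firstResult(1, 10)); // 0: the first page starts at the first hit
        System.out.println(firstResult(3, 10)); // 20
        System.out.println(pageCount(52, 10));  // 6
    }
}
```

Usage would then be query.setFirstResult(Paging.firstResult(page, pageSize)) followed by query.setMaxResults(pageSize).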



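That is all there is to a simple use case.

The project has started and I am rather busy these days, so this is only a quick write-up. It is not complete and does not go very deep; once the project is finished I will find time to analyze and organize it properly.
Comments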
weidewei 2008-05-17
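I have been very busy lately and haven't had time to come here. To the poster above: creating the index that way is actually fine, but before creating it you must make sure that the data returned by the query
List list = factory.getCurrentSession().createQuery("from Vobject").list();

is not garbled.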
test_root 2008-05-07
I created my index files manually:
SessionFactory factory = HibernateUtil.getSessionFactory();
FullTextSession fullTextSession = Search.createFullTextSession(factory.getCurrentSession());
Transaction tx = fullTextSession.beginTransaction();
List list = factory.getCurrentSession().createQuery("from Vobject").list();
for (Iterator iter = list.iterator(); iter.hasNext();) {
    Vobject element = (Vobject) iter.next();
    fullTextSession.index(element);
}
tx.commit();
fullTextSession.close();

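If I switch the database to MS SQL Server with a Chinese character set (it used to be Oracle with an English character set)
and keep the same indexing and search code shown above,
the search does return results, and they are correct.
Why is that?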
weidewei 2008-05-06
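The problem lies in how the index was created; I have run into it too. How did you create your index? An index created by hand cannot be found when searching through hibernate-search.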
test_root 2008-05-06
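For example, my database character set is ISO-8859-1.
When I create the index files, the POJO uses the StandardAnalyzer.
Searching uses the same analyzer, and Chinese keywords are also converted to the database character set before searching, yet no content comes back. There is a total hit count, though. Why?
POJO: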
@Indexed
@Analyzer(impl = StandardAnalyzer.class)
public class Vobject {
	@DocumentId
	@FieldBridge(impl = ZtBridge.class)
	private ZchTableName id;

	@Field(store = Store.NO, index = Index.TOKENIZED)
	private String qymc;

	@Field(store = Store.NO, index = Index.TOKENIZED)
	private String dbr;
	private String djqx;
	private String zs;
	private String jyfw;

         // getters/setters 
}

Identifier component class:
public class ZchTableName implements Serializable {
	private String zch;
	private String tableName;
	
	public String getTableName() {
		return tableName;
	}
	public void setTableName(String tableName) {
		this.tableName = tableName;
	}
	public String getZch() {
		return zch;
	}
	public void setZch(String zch) {
		this.zch = zch;
	}
	@Override
	public int hashCode() {
		final int PRIME = 31;
		int result = 1;
		result = PRIME * result + ((tableName == null) ? 0 : tableName.hashCode());
		result = PRIME * result + ((zch == null) ? 0 : zch.hashCode());
		return result;
	}
	@Override
	public boolean equals(Object obj) {
		if (this == obj)
			return true;
		if (obj == null)
			return false;
		if (getClass() != obj.getClass())
			return false;
		final ZchTableName other = (ZchTableName) obj;
		if (tableName == null) {
			if (other.tableName != null)
				return false;
		} else if (!tableName.equals(other.tableName))
			return false;
		if (zch == null) {
			if (other.zch != null)
				return false;
		} else if (!zch.equals(other.zch))
			return false;
		return true;
	}
	@Override
	public String toString() {
		return this.getZch()+","+this.getTableName();
	}
}

Field Bridge:
public class ZtBridge implements TwoWayStringBridge {

	public String objectToString(Object arg0) {
		ZchTableName zt = (ZchTableName)arg0;
		return zt.toString();
	}

	public Object stringToObject(String arg0) {
		String[] s = arg0.split(",");
		ZchTableName zt = new ZchTableName();
		zt.setZch(s[0]);
		zt.setTableName(s[1]);
		return zt;
	}

}
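ZtBridge joins the two key parts with a comma and splits on it. That breaks in two cases:
  • zch or tableName itself contains a comma;
  • tableName is empty: "0001,".split(",") drops the trailing empty string and returns a one-element array, so s[1] throws.
Below is a minimal escape-safe encoding, sketched in plain Java. IdCodec and its methods are hypothetical; ZtBridge's objectToString/stringToObject could delegate to them instead of toString() and split(",").

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical codec for a two-part key (zch, tableName) stored as one index string.
// The separating comma is plain; commas and backslashes inside a part are backslash-escaped.
final class IdCodec {

    static String encode(String zch, String tableName) {
        // assumes both parts are non-null, as they form the primary key
        return escape(zch) + "," + escape(tableName);
    }

    static String[] decode(String s) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                current.append(s.charAt(++i)); // escaped character: take it literally
            } else if (c == ',') {
                parts.add(current.toString()); // unescaped comma: end of a part
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        parts.add(current.toString()); // last part, possibly empty
        if (parts.size() != 2) {
            throw new IllegalArgumentException("expected exactly 2 parts in: " + s);
        }
        return parts.toArray(new String[0]);
    }

    private static String escape(String part) {
        // escape backslashes first so the escapes added for commas stay unambiguous
        return part.replace("\\", "\\\\").replace(",", "\\,");
    }

    public static void main(String[] args) {
        String encoded = encode("0001", "");
        String[] parts = decode(encoded);
        System.out.println(encoded + " -> [" + parts[0] + "] [" + parts[1] + "]"); // 0001, -> [0001] []
    }
}
```

Changing the encoding changes the id strings stored in the index, so an existing index would need to be rebuilt afterwards.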

Search code:
QueryParser parser = new QueryParser("qymc", new StandardAnalyzer());
Query luceneQuery = parser.parse(new String("饭店".getBytes("gbk"), "iso-8859-1"));

Searcher searcher = new IndexSearcher("F:/hibernate-search/indexbj/com.mytest.model.Vobject");
Hits hits = searcher.search(luceneQuery);  // the total hit count here is 52
result.setRowCount(hits.length());

org.hibernate.Query query = fullTextSession.createFullTextQuery(luceneQuery, Vobject.class);
query.setFirstResult(1);
query.setMaxResults(10);

List list = query.list();  // but the resulting list is empty?

The console prints the following statement:
Hibernate: select this_.zch as zch0_0_, this_.c00 as c2_0_0_, this_.qymc as qymc0_0_, this_.qyzt as qyzt0_0_, this_.qydl as qydl0_0_, this_.dbr as dbr0_0_, this_.gxqx as gxqx0_0_, this_.jyfw as jyfw0_0_, this_.zs as zs0_0_ from td_vobject this_ where ((this_.zch, this_.c00) in ((?, ?), (?, ?), (?, ?), (?, ?), (?, ?), (?, ?), (?, ?), (?, ?), (?, ?), (?, ?)))
Why do I get a total count of 52, while
query.list(); // returns an empty list?
weidewei 2008-05-05
I think your understanding is a bit off. First of all, searching does not query the database; it reads the content directly from the index files.

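As for the second problem, it is quite normal that you find nothing: your database is now Chinese, but the data was garbled when your index files were created, so the content in the index is garbled too, and of course a search finds nothing.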

Actually you don't need ChineseAnalyzer at all: since Lucene 2.0, StandardAnalyzer also supports tokenized Chinese search well (it splits Chinese text into single characters).
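The encoding trick in this thread relies on ISO-8859-1 mapping every byte to exactly one char. That means GBK bytes stored in an ISO-8859-1 database can be turned back into Chinese text before indexing, so the index holds real text rather than garbled characters. Below is a stdlib-only sketch of that round trip. MojibakeDemo and its methods are hypothetical, and the literal \u996d\u5e97 is 饭店. The sketch assumes:
  • the column really holds the raw GBK bytes (if characters were replaced with '?' on the way in, nothing can be recovered);
  • the JVM provides the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helpers for text whose GBK bytes were stored in an ISO-8859-1 database.
final class MojibakeDemo {

    // assumes the JVM provides the GBK charset (standard JDKs do)
    static final Charset GBK = Charset.forName("GBK");

    // What the search code in the comments does to its keyword: GBK bytes read back as ISO-8859-1 chars.
    static String toLatin1Mojibake(String text) {
        return new String(text.getBytes(GBK), StandardCharsets.ISO_8859_1);
    }

    // The reverse: ISO-8859-1 maps each char back to exactly one byte, recovering the GBK bytes.
    static String fromLatin1Mojibake(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), GBK);
    }

    public static void main(String[] args) {
        String keyword = "\u996d\u5e97"; // the keyword used in the comments' search code
        String garbled = toLatin1Mojibake(keyword);
        System.out.println(garbled.length());                            // 4: two bytes per char in GBK
        System.out.println(fromLatin1Mojibake(garbled).equals(keyword)); // true
    }
}
```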
test_root 2008-05-05
int count = query.list().size()
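After this call, Hibernate does not send a count statement to the database. When the total reaches tens of thousands of records, this statement seriously hurts performance and the program nearly hangs, so this approach does not seem workable.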
org.apache.lucene.search.Query luceneQuery = parser.parse(indexString);
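My understanding is that once this statement has run, Lucene has already obtained from the index files the primary keys of all matching POJOs, which gives the total count. But luceneQuery has no method for it.
query.list()
then issues a database query based on these primary keys and returns the objects.
Is this understanding correct?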

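As for the second problem: the Chinese text is encoded into the database character set before being saved, and after reading it back it can be converted to Chinese again by re-encoding.
QueryParser parser = new QueryParser("content", new ChineseAnalyzer());
parser.parse("中文");
Done this way, the query finds nothing.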
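On the total-count question: the index already knows how many documents match. Hibernate Search 3.0's FullTextQuery exposes that number through getResultSize(), so query.list().size(), which loads every matching entity, is not needed. The toy model below shows the split between counting and loading; it is stdlib-only, and PagedSearchModel and Page are hypothetical. The count comes from the full list of matching ids, while only the requested slice would be loaded from the database.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical names, not Hibernate Search code) of a paged full-text query:
// the index returns every matching id, which is cheap to count, while only the ids of
// the requested page would be turned into entities by a database query.
final class PagedSearchModel {

    static final class Page<T> {
        final int totalHits;   // what FullTextQuery.getResultSize() reports in the real API
        final List<T> items;   // what query.list() would load for this page

        Page(int totalHits, List<T> items) {
            this.totalHits = totalHits;
            this.items = items;
        }
    }

    static <T> Page<T> page(List<T> matchingIds, int firstResult, int maxResults) {
        int from = Math.min(firstResult, matchingIds.size());
        int to = from + Math.min(maxResults, matchingIds.size() - from); // avoids int overflow
        return new Page<>(matchingIds.size(), new ArrayList<>(matchingIds.subList(from, to)));
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= 52; i++) {
            ids.add(i); // 52 matching ids, as in the question above
        }
        Page<Integer> last = page(ids, 50, 10);
        System.out.println(last.totalHits + " hits, page holds " + last.items); // 52 hits, page holds [51, 52]
    }
}
```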
weidewei 2008-05-04
Before adding pagination, add
int count = query.list().size();

and you get the total number of records.

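I haven't looked into the second problem yet. It is strange, though, that the database data is garbled. I think the garbled-data problem should be solvable by changing the database settings.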
test_root 2008-05-04
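Hi, I'd like to ask the following:
1. When adding paginated queries, how do I get the total number of records?

2. My database uses an English character set, so Chinese content becomes garbled once it is saved.
How should I instantiate org.apache.lucene.queryParser.QueryParser, and how should I call
parser.parse()?
Thanks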
weidewei