自己做的一个操作索引的例子
2005-04-29 16:09
387 查看
建立的字段包括:ID=digitalID,title,contents几个字段,可以根据需要增加新的字段,其中digitaiID做了唯一性处理,在添加之前进行删除更新,ID是作为path存储的,不能索引,只能使用,在这里这样做没有什么实际的价值,其他字段都是字符串,由于所有的数据都是取自数据库的,所以这样做比较好,例子的最后有一段是建立文件索引的,直接引用别人的代码了。
上述方法的查询时候不能直接使用一些特殊的符号:+-[](){}&&||"等等,如果要是用需要进行转义,查询的给定条件需要自己定义:比如查询title和contents同时满足的需要这样构造:
field:contents
text:title:text AND text
或的关系
field:contents
text:title:text OR text 或 text:title:text text
/**
* 建立指定的编码的图片的索引
* 1.索引的内容包含图的标题,描述关键字
* 2.上述内容用字符串的形式存储
* 3.给定的建立索引的目录必须是空目录或者存在索引文件的目录
* @param astrID String 给定的数字对象的编码
* @param astrTitle String 给定的关键
* @param astrContent String 给定的索引内容(包括标题,描述)
* @throws java.lang.Exception 数据处理错误
*/
private vo
4000
id addIndex(String astrID, String astrTitle, String astrContent) throws
Exception {
boolean build = false;
//是否要建立索引文件
String tmpPath = mstrIndexPath + "/segments";
File exist = new File(tmpPath);
if (!exist.exists() || exist.isDirectory()) { //给定的文件不存在,需要建立新的索引文件
build = true;
}
//用指定的语言分析器构造一个索引
IndexWriter writer = new IndexWriter(mstrIndexPath,
new StandardAnalyzer(), build);
//writer.mergeFactor = 10;
//mergeFactor是一个针对批量索引的优化参数,控制多少条处理完多少条记录(Document)后,写入一次索引,写入频率越高,内存使用越少,但索引速度越慢,所以在大批量数据导入时需要增大文件写入的间隔,多让索引在内存中操作。
org.apache.lucene.document.Document doc = new org.apache.lucene.
document.Document();
doc.add(Field.UnIndexed("ID", astrID));
doc.add(Field.Keyword("digitalID", astrID));
doc.add(Field.Text("title", astrTitle));
doc.add(Field.Text("contents", astrContent));
writer.addDocument(doc);
writer.optimize();
writer.close();
}
/**
* 修改指定索引编码的索引记录:
* 修改的方法是在原有索引的基础上删除相同的索引号,添加新的索引号
* @param astrID String 指定索引的索引编码
* @param astrTitle 指定的关键字
* @param astrContent String 指定的内容
* @throws Exception 可能的异常
*/
public synchronized void modIndex(String astrID, String astrTitle,
String astrContent) throws
Exception {
File file = new File(mstrIndexPath + "/segments");
if (file.exists() && !file.isDirectory()) {
//删除存在的ID
Directory directory = FSDirectory.getDirectory(mstrIndexPath, false);
IndexReader reader = IndexReader.open(directory);
reader.unlock(directory);
//删除指定的索引
Term term = new Term(INDEX_ID, astrID);
int delete = reader.delete(term);
/*for (int i = 0; i < reader.maxDoc(); i++) {
if (!reader.isDeleted(i)) {
Document doc = reader.document(i); //取索引的一行记录
String fValue = doc.get("digitalID"); //取指定的字段值
if (fValue.equals(astrID)) {
reader.delete(i);
}
}
}*/
reader.close();
directory.close();
}
//添加一条记录
addIndex(astrID, astrTitle, astrContent);
}
/**
* 删除指定字段的索引
* @param field String 指定的字段
* @param txt String 指定字段的值
* @throws Exception 可能的异常
*/
public synchronized void delIndex(String field, String txt) throws
Exception {
//删除存在的ID
Directory directory = FSDirectory.getDirectory(mstrIndexPath, false);
IndexReader reader = IndexReader.open(directory);
reader.unlock(directory);
//删除指定的索引
Term term = new Term(field, txt);
int delete = reader.delete(term);
reader.close();
directory.close();
}
/**
* 查找给定串的数字对象,返回所有的数字对象编码集合
* @param astrField String 给定的域
* @param astrKey String 给定的查询字符串
* @return String[] 编码ID数组
* @throws Exception
*/
public String[] search(String astrField, String astrKey) throws Exception {
Analyzer analyzer = new StandardAnalyzer();
String[] did = new String[0];
//指向索引目录的搜索器
try {
Searcher searcher = new IndexSearcher(mstrIndexPath);
Term term = new Term(astrField, astrKey);
Query query = QueryParser.parse(astrKey, astrField, analyzer);
System.out.println("Searching for: " + query.toString("contents"));
Hits hits = searcher.search(query);
int size = hits.length();
did = new String[size];
for (int i = 0; i < size; i++) {
did[i] = hits.doc(i).get("digitalID");
System.out.println(":" + hits.doc(i).get("contents"));
}
} catch (ParseException e) {
//e.printStackTrace();
System.out.println(e.getMessage());
}
return did;
}
----------------------------------------------------------------------------------------------------------------------------------------------
class IndexFiles {
public static void main(String[] args) throws IOException {
String usage = "java " + IndexFiles.class + " <root_directory>";
if (args.length == 0) {
System.err.println("Usage: " + usage);
System.exit(1);
}
Date start = new Date();
try {
IndexWriter writer = new IndexWriter("D://index//tt", new StandardAnalyzer(), true);
indexDocs(writer, new File("D://temp"));
writer.optimize();
writer.close();
Date end = new Date();
System.out.print(end.getTime() - start.getTime());
System.out.println(" total milliseconds");
} catch (IOException e) {
System.out.println(" caught a " + e.getClass() +
"/n with message: " + e.getMessage());
}
}
public static void indexDocs(IndexWriter writer, File file)
throws IOException {
// do not try to index files that cannot be read
if (file.canRead()) {
if (file.isDirectory()) {
String[] files = file.list();
// an IO error could occur
if (files != null) {
for (int i = 0; i < files.length; i++) {
indexDocs(writer, new File(file, files[i]));
}
}
} else {
System.out.println("adding " + file);
try {
writer.addDocument(FileDocument.Document(file));
}
// at least on windows, some temporary files raise this exception with an "access denied" message
// checking if the file can be read doesn't help
catch (FileNotFoundException fnfe) {
;
}
}
}
}
}
public class FileDocument {
/** Makes a document for a File.
<p>
The document has three fields:
<ul>
<li><code>path</code>--containing the pathname of the file, as a stored,
tokenized field;
<li><code>modified</code>--containing the last modified date of the file as
a keyword field as encoded by <a
href="lucene.document.DateField.html">DateField</a>; and
<li><code>contents</code>--containing the full contents of the file, as a
Reader field;
*/
public static Document Document(File f)
throws java.io.FileNotFoundException {
// make a new, empty document
Document doc = new Document();
// Add the path of the file as a field named "path". Use a Text field, so
// that the index stores the path, and so that the path is searchable
doc.add(Field.Text("path", f.getPath()));
// Add the last modified date of the file a field named "modified". Use a
// Keyword field, so that it's searchable, but so that no attempt is made
// to tokenize the field into words.
doc.add(Field.Keyword("modified",
DateField.timeToString(f.lastModified())));
System.out.println(f.lastModified()+" "+System.currentTimeMillis());
// Add the contents of the file a field named "contents". Use a Text
// field, specifying a Reader, so that the text of the file is tokenized.
// ?? why doesn't FileReader work here ??
FileInputStream is = new FileInputStream(f);
Reader reader = new BufferedReader(new InputStreamReader(is));
doc.add(Field.Text("contents", reader));
// return the document
return doc;
}
private FileDocument() {}
}
class SearchFiles {
public static void main(String[] args) {
try {
Searcher searcher = new IndexSearcher("D://index//tt");
Analyzer analyzer = new StandardAnalyzer();
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
while (true) {
System.out.print("Query: ");
String line = in.readLine();
if (line.length() == -1)
break;
//Query query = QueryParser.parse(line, "contents", analyzer);
Query query = QueryParser.parse("产党", "contents", analyzer);
System.out.println("Searching for: " + query.toString("contents"));
Hits hits = searcher.search(query);
System.out.println(hits.length() + " total matching documents");
final int HITS_PER_PAGE = 10;
for (int start = 0; start < hits.length(); start += HITS_PER_PAGE) {
int end = Math.min(hits.length(), start + HITS_PER_PAGE);
for (int i = start; i < end; i++) {
Document doc = hits.doc(i);
String path = doc.get("path");
if (path != null) {
//System.out.println(i + ". " + path+" "+doc.get("modified"));
} else {
String url = doc.get("url");
if (url != null) {
//System.out.println(i + ". " + url);
//System.out.println(" - " + doc.get("title"));
} else {
System.out.println(i + ". " + "No path nor URL for this document");
}
}
}
if (hits.length() > end) {
System.out.print("more (y/n) ? ");
line = in.readLine();
if (line.length() == 0 || line.charAt(0) == 'n')
break;
}
}
}
searcher.close();
} catch (Exception e) {
System.out.println(" caught a " + e.getClass() +
"/n with message: " + e.getMessage());
}
}
}
上述方法的查询时候不能直接使用一些特殊的符号:+-[](){}&&||"等等,如果要是用需要进行转义,查询的给定条件需要自己定义:比如查询title和contents同时满足的需要这样构造:
field:contents
text:title:text AND text
或的关系
field:contents
text:title:text OR text 或 text:title:text text
/**
* 建立指定的编码的图片的索引
* 1.索引的内容包含图的标题,描述关键字
* 2.上述内容用字符串的形式存储
* 3.给定的建立索引的目录必须是空目录或者存在索引文件的目录
* @param astrID String 给定的数字对象的编码
* @param astrTitle String 给定的关键
* @param astrContent String 给定的索引内容(包括标题,描述)
* @throws java.lang.Exception 数据处理错误
*/
private vo
4000
id addIndex(String astrID, String astrTitle, String astrContent) throws
Exception {
boolean build = false;
//是否要建立索引文件
String tmpPath = mstrIndexPath + "/segments";
File exist = new File(tmpPath);
if (!exist.exists() || exist.isDirectory()) { //给定的文件不存在,需要建立新的索引文件
build = true;
}
//用指定的语言分析器构造一个索引
IndexWriter writer = new IndexWriter(mstrIndexPath,
new StandardAnalyzer(), build);
//writer.mergeFactor = 10;
//mergeFactor是一个针对批量索引的优化参数,控制多少条处理完多少条记录(Document)后,写入一次索引,写入频率越高,内存使用越少,但索引速度越慢,所以在大批量数据导入时需要增大文件写入的间隔,多让索引在内存中操作。
org.apache.lucene.document.Document doc = new org.apache.lucene.
document.Document();
doc.add(Field.UnIndexed("ID", astrID));
doc.add(Field.Keyword("digitalID", astrID));
doc.add(Field.Text("title", astrTitle));
doc.add(Field.Text("contents", astrContent));
writer.addDocument(doc);
writer.optimize();
writer.close();
}
/**
* 修改指定索引编码的索引记录:
* 修改的方法是在原有索引的基础上删除相同的索引号,添加新的索引号
* @param astrID String 指定索引的索引编码
* @param astrTitle 指定的关键字
* @param astrContent String 指定的内容
* @throws Exception 可能的异常
*/
public synchronized void modIndex(String astrID, String astrTitle,
String astrContent) throws
Exception {
File file = new File(mstrIndexPath + "/segments");
if (file.exists() && !file.isDirectory()) {
//删除存在的ID
Directory directory = FSDirectory.getDirectory(mstrIndexPath, false);
IndexReader reader = IndexReader.open(directory);
reader.unlock(directory);
//删除指定的索引
Term term = new Term(INDEX_ID, astrID);
int delete = reader.delete(term);
/*for (int i = 0; i < reader.maxDoc(); i++) {
if (!reader.isDeleted(i)) {
Document doc = reader.document(i); //取索引的一行记录
String fValue = doc.get("digitalID"); //取指定的字段值
if (fValue.equals(astrID)) {
reader.delete(i);
}
}
}*/
reader.close();
directory.close();
}
//添加一条记录
addIndex(astrID, astrTitle, astrContent);
}
/**
* 删除指定字段的索引
* @param field String 指定的字段
* @param txt String 指定字段的值
* @throws Exception 可能的异常
*/
public synchronized void delIndex(String field, String txt) throws
Exception {
//删除存在的ID
Directory directory = FSDirectory.getDirectory(mstrIndexPath, false);
IndexReader reader = IndexReader.open(directory);
reader.unlock(directory);
//删除指定的索引
Term term = new Term(field, txt);
int delete = reader.delete(term);
reader.close();
directory.close();
}
/**
* 查找给定串的数字对象,返回所有的数字对象编码集合
* @param astrField String 给定的域
* @param astrKey String 给定的查询字符串
* @return String[] 编码ID数组
* @throws Exception
*/
public String[] search(String astrField, String astrKey) throws Exception {
Analyzer analyzer = new StandardAnalyzer();
String[] did = new String[0];
//指向索引目录的搜索器
try {
Searcher searcher = new IndexSearcher(mstrIndexPath);
Term term = new Term(astrField, astrKey);
Query query = QueryParser.parse(astrKey, astrField, analyzer);
System.out.println("Searching for: " + query.toString("contents"));
Hits hits = searcher.search(query);
int size = hits.length();
did = new String[size];
for (int i = 0; i < size; i++) {
did[i] = hits.doc(i).get("digitalID");
System.out.println(":" + hits.doc(i).get("contents"));
}
} catch (ParseException e) {
//e.printStackTrace();
System.out.println(e.getMessage());
}
return did;
}
----------------------------------------------------------------------------------------------------------------------------------------------
class IndexFiles {
public static void main(String[] args) throws IOException {
String usage = "java " + IndexFiles.class + " <root_directory>";
if (args.length == 0) {
System.err.println("Usage: " + usage);
System.exit(1);
}
Date start = new Date();
try {
IndexWriter writer = new IndexWriter("D://index//tt", new StandardAnalyzer(), true);
indexDocs(writer, new File("D://temp"));
writer.optimize();
writer.close();
Date end = new Date();
System.out.print(end.getTime() - start.getTime());
System.out.println(" total milliseconds");
} catch (IOException e) {
System.out.println(" caught a " + e.getClass() +
"/n with message: " + e.getMessage());
}
}
public static void indexDocs(IndexWriter writer, File file)
throws IOException {
// do not try to index files that cannot be read
if (file.canRead()) {
if (file.isDirectory()) {
String[] files = file.list();
// an IO error could occur
if (files != null) {
for (int i = 0; i < files.length; i++) {
indexDocs(writer, new File(file, files[i]));
}
}
} else {
System.out.println("adding " + file);
try {
writer.addDocument(FileDocument.Document(file));
}
// at least on windows, some temporary files raise this exception with an "access denied" message
// checking if the file can be read doesn't help
catch (FileNotFoundException fnfe) {
;
}
}
}
}
}
public class FileDocument {
/** Makes a document for a File.
<p>
The document has three fields:
<ul>
<li><code>path</code>--containing the pathname of the file, as a stored,
tokenized field;
<li><code>modified</code>--containing the last modified date of the file as
a keyword field as encoded by <a
href="lucene.document.DateField.html">DateField</a>; and
<li><code>contents</code>--containing the full contents of the file, as a
Reader field;
*/
public static Document Document(File f)
throws java.io.FileNotFoundException {
// make a new, empty document
Document doc = new Document();
// Add the path of the file as a field named "path". Use a Text field, so
// that the index stores the path, and so that the path is searchable
doc.add(Field.Text("path", f.getPath()));
// Add the last modified date of the file a field named "modified". Use a
// Keyword field, so that it's searchable, but so that no attempt is made
// to tokenize the field into words.
doc.add(Field.Keyword("modified",
DateField.timeToString(f.lastModified())));
System.out.println(f.lastModified()+" "+System.currentTimeMillis());
// Add the contents of the file a field named "contents". Use a Text
// field, specifying a Reader, so that the text of the file is tokenized.
// ?? why doesn't FileReader work here ??
FileInputStream is = new FileInputStream(f);
Reader reader = new BufferedReader(new InputStreamReader(is));
doc.add(Field.Text("contents", reader));
// return the document
return doc;
}
private FileDocument() {}
}
class SearchFiles {
public static void main(String[] args) {
try {
Searcher searcher = new IndexSearcher("D://index//tt");
Analyzer analyzer = new StandardAnalyzer();
BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
while (true) {
System.out.print("Query: ");
String line = in.readLine();
if (line.length() == -1)
break;
//Query query = QueryParser.parse(line, "contents", analyzer);
Query query = QueryParser.parse("产党", "contents", analyzer);
System.out.println("Searching for: " + query.toString("contents"));
Hits hits = searcher.search(query);
System.out.println(hits.length() + " total matching documents");
final int HITS_PER_PAGE = 10;
for (int start = 0; start < hits.length(); start += HITS_PER_PAGE) {
int end = Math.min(hits.length(), start + HITS_PER_PAGE);
for (int i = start; i < end; i++) {
Document doc = hits.doc(i);
String path = doc.get("path");
if (path != null) {
//System.out.println(i + ". " + path+" "+doc.get("modified"));
} else {
String url = doc.get("url");
if (url != null) {
//System.out.println(i + ". " + url);
//System.out.println(" - " + doc.get("title"));
} else {
System.out.println(i + ". " + "No path nor URL for this document");
}
}
}
if (hits.length() > end) {
System.out.print("more (y/n) ? ");
line = in.readLine();
if (line.length() == 0 || line.charAt(0) == 'n')
break;
}
}
}
searcher.close();
} catch (Exception e) {
System.out.println(" caught a " + e.getClass() +
"/n with message: " + e.getMessage());
}
}
}
相关文章推荐
- 自己写的一个vc操作word的例子
- 自己写的一个使用ADO操作ACCESS数据库的例子
- 写一个程序,统计自己C语言共写了多少行代码,Github基本操作
- [ZZ]一个通过BackgroundWorker实现WinForm异步操作的例子
- 一个通过BackgroundWorker实现WinForm异步操作的例子
- 一个小例子-----读取sql server然后写到xml文件--------jechy你自己看看那些资料吧.我有点冷了,要上楼去了
- 一个MSSQL全文索引的例子
- 自己写了一个通过smarty与php相互结合的例子,动态生成静态程序,写的很烂
- 一个操作datagrid绑定xml文件的例子程序
- 一个自己封装的日期时间操作的用户对象
- 【写一个自己的js库】 4.完善跨浏览器事件操作
- Redis的安装、使用以及php中扩展redis并实现php操作redis的一个例子
- PHP操作excel的一个例子
- 一个MSSQL全文索引的例子
- 一个比较全的日期操作例子Calendar
- 一个数组,索引,泛型,迭代的例子
- 自己实现一个SQL解析引擎 功能:将用户输入的SQL语句序列转换为一个可执行的操作序列,并返回查询的结果集。 SQL的解析引擎包括查询编译与查询优化和查询的运行,主要包括3个步骤: 查询分析
- 一个用C#操作OpenLDAP的例子
- 【转载】关于:读写xls文件,提供下载的一个小例子(jxl 的简单运用) - 高级操作
- C# 一个自己写的树结构代码(2)-Array,HashTable,List,String数据结构操作封装