
UVa123 Searching Quickly

2013-08-05 08:54


 Searching Quickly 

Background

Searching and sorting are part of the theory and practice of computer science. For example, binary search provides a good example of an easy-to-understand algorithm with sub-linear complexity. Quicksort is an efficient O(n log n) [average case] comparison based sort.

KWIC-indexing is an indexing method that permits efficient "human search" of, for example, a list of titles.

The Problem

Given a list of titles and a list of "words to ignore", you are to write a program that generates a KWIC (Key Word In Context) index of the titles. In a KWIC-index, a title is listed once for each keyword that occurs in the title. The KWIC-index is alphabetized by keyword.

Any word that is not one of the "words to ignore" is a potential keyword.

For example, if words to ignore are "the, of, and, as, a" and the list of titles is:

Descent of Man
The Ascent of Man
The Old Man and The Sea
A Portrait of The Artist As a Young Man

A KWIC-index of these titles might be given by:

a portrait of the ARTIST as a young man
the ASCENT of man
DESCENT of man
descent of MAN
the ascent of MAN
the old MAN and the sea
a portrait of the artist as a young MAN
the OLD man and the sea
a PORTRAIT of the artist as a young man
the old man and the SEA
a portrait of the artist as a YOUNG man


The Input

The input is a sequence of lines; the string "::" is used to separate the list of words to ignore from the list of titles. Each of the words to ignore appears in lower-case letters on a line by itself and is no more than 10 characters in length. Each title appears on a line by itself and may consist of mixed-case (upper and lower) letters. Words in a title are separated by whitespace. No title contains more than 15 words.

There will be no more than 50 words to ignore, no more than 200 titles, and no more than 10,000 characters in the titles and words to ignore combined. No characters other than 'a'-'z', 'A'-'Z', and whitespace will appear in the input.
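The input format described above can be read in two phases: words to ignore until the "::" line, then one title per line. The following is only a minimal sketch of that idea. The names KwicInput, readKwicInput and parseKwicInput are hypothetical, and skipping blank lines and stripping a trailing carriage return are assumptions that the problem statement does not spell out.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the two parts of the input.
struct KwicInput {
    std::vector<std::string> ignored;  // words to ignore (lower-case in the input)
    std::vector<std::string> titles;   // raw title lines, original case preserved
};

// Reads words to ignore up to the "::" separator, then one title per line.
KwicInput readKwicInput(std::istream& in) {
    KwicInput result;
    std::string line;
    bool inTitles = false;
    while (std::getline(in, line)) {
        // Assumption: tolerate Windows line endings by dropping a trailing '\r'.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) continue;  // assumption: blank lines carry no data
        if (!inTitles && line == "::") {
            inTitles = true;         // everything after this line is a title
        } else if (inTitles) {
            result.titles.push_back(line);
        } else {
            result.ignored.push_back(line);
        }
    }
    // These asserts only document the limits given in the problem statement.
    assert(result.ignored.size() <= 50);
    assert(result.titles.size() <= 200);
    return result;
}

// Convenience wrapper for in-memory text, handy when experimenting.
KwicInput parseKwicInput(const std::string& text) {
    std::istringstream in(text);
    return readKwicInput(in);
}
```

Calling readKwicInput(std::cin) would read the same format from standard input.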

The Output

The output should be a KWIC-index of the titles, with each title appearing once for each keyword in the title, and with the KWIC-index alphabetized by keyword. If a word appears more than once in a title, each instance is a potential keyword.

The keyword should appear in all upper-case letters. All other words in a title should be in lower-case letters. Titles in the KWIC-index with the same keyword should appear in the same order as they appeared in the input file. In the case where multiple instances of a word are keywords in the same title, the keywords should be capitalized in left-to-right order.

Case (upper or lower) is irrelevant when determining if a word is to be ignored.

The titles in the KWIC-index need NOT be justified or aligned by keyword; all titles may be listed left-justified.
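The capitalization rule for a single index line can be illustrated with a small helper. This is a sketch under assumptions: the names lowered, uppered and renderEntry are hypothetical, the title is taken to be already split into words, and the words are joined with single spaces, which is a simplification that relies on the titles not needing to be aligned.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Returns a lower-case copy of `word`.
std::string lowered(std::string word) {
    for (char& c : word)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return word;
}

// Returns an upper-case copy of `word`.
std::string uppered(std::string word) {
    for (char& c : word)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return word;
}

// Builds one index line: the word at `keywordIndex` in upper case,
// every other word in lower case, joined by single spaces (an assumption).
std::string renderEntry(const std::vector<std::string>& words,
                        std::size_t keywordIndex) {
    assert(keywordIndex < words.size());
    std::string line;
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (i > 0) line += ' ';
        line += (i == keywordIndex) ? uppered(words[i]) : lowered(words[i]);
    }
    return line;
}
```

Because the keyword is selected by position rather than by spelling, a title that contains the same keyword twice produces two distinct lines, one per occurrence.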

Sample Input

is
the
of
and
as
a
but
::
Descent of Man
The Ascent of Man
The Old Man and The Sea
A Portrait of The Artist As a Young Man
A Man is a Man but Bubblesort IS A DOG


Sample Output

a portrait of the ARTIST as a young man
the ASCENT of man
a man is a man but BUBBLESORT is a dog
DESCENT of man
a man is a man but bubblesort is a DOG
descent of MAN
the ascent of MAN
the old MAN and the sea
a portrait of the artist as a young MAN
a MAN is a man but bubblesort is a dog
a man is a MAN but bubblesort is a dog
the OLD man and the sea
a PORTRAIT of the artist as a young man
the old man and the SEA
a portrait of the artist as a YOUNG man

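The gist of this problem is finding keywords. The input first gives several lines of words that are not keywords; then, in the n title lines that follow, we look for the words that are keywords, that is, words not on the ignore list. The titles are printed in dictionary order of their keywords, with the keyword in upper case and everything else in lower case. The solution mainly relies on a struct that stores each keyword together with its location: the index i of the title and the position j inside that title. While reading the titles, each line is split into words and every word is compared against the ignore list; if it is a keyword, the keyword and its coordinates are stored in the struct. The output is then written in segments: when the keyword's position is reached, its letters are converted from lower case to upper case for the length of the keyword, and the finished line is printed.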

#include <iostream>
#include <cstring>
#include <cctype>
#include <algorithm>
using namespace std;

// Named ignored rather than ignore: with "using namespace std" the
// latter can clash with std::ignore on some compilers.
char ignored[60][20];
char title[210][10010];
struct Keyword {
    char keyword[20]; // lower-case keyword; assumes words shorter than 20 characters
    int origin_1;     // index of the title that contains the keyword
    int origin_2;     // offset of the keyword's first character in that title
} k[10010];

bool IsIgnored(const char *tmp, char ignored[][20], int t) {
    for (int i = 0; i < t; i++)
        if (!strcmp(tmp, ignored[i]))
            return true;
    return false;
}

// Order by keyword, then by title index, then by position in the title.
bool cmp(const Keyword &a, const Keyword &b) {
    int c = strcmp(a.keyword, b.keyword);
    if (c != 0)
        return c < 0;
    if (a.origin_1 != b.origin_1)
        return a.origin_1 < b.origin_1;
    return a.origin_2 < b.origin_2;
}

int main() {
    int t_1 = 0; // the number of ignored words
    while (cin >> ignored[t_1]) {
        if (ignored[t_1][0] == ':' && ignored[t_1][1] == ':')
            break;
        t_1++;
    }

    int t_2 = 0;       // the number of titles
    char tmp_1[10010]; // temporary title
    cin.getline(tmp_1, sizeof(tmp_1)); // skip the rest of the "::" line
    // cin.getline replaces gets(), which has been removed from the standard library.
    while (cin.getline(tmp_1, sizeof(tmp_1))) {
        int len_1 = strlen(tmp_1);
        for (int i = 0; i < len_1; i++)
            tmp_1[i] = tolower(tmp_1[i]);
        strcpy(title[t_2], tmp_1);
        t_2++;
    }

    char tmp_2[20]; // temporary keyword
    int len_2 = 0;  // the length of the temporary keyword
    int t_3 = 0;    // the number of keywords
    memset(tmp_2, 0, sizeof(tmp_2));
    for (int i = 0; i < t_2; i++) {
        int len_3 = strlen(title[i]);
        // Scan one position past the end so the terminating '\0' closes the last word.
        for (int j = 0; j <= len_3; j++) {
            if (isalpha(title[i][j])) {
                tmp_2[len_2] = title[i][j];
                len_2++;
            }
            else if (len_2 > 0) { // a word just ended; leading whitespace is skipped
                if (!IsIgnored(tmp_2, ignored, t_1)) {
                    strcpy(k[t_3].keyword, tmp_2);
                    k[t_3].origin_1 = i;
                    k[t_3].origin_2 = j - len_2;
                    t_3++;
                }
                memset(tmp_2, 0, sizeof(tmp_2));
                len_2 = 0;
            }
        }
    }

    sort(k, k + t_3, cmp);

    for (int i = 0; i < t_3; i++) {
        const char *line = title[k[i].origin_1];
        int begin = k[i].origin_2;
        int end = begin + strlen(k[i].keyword);
        // Print the title, upper-casing only the characters of this keyword.
        for (int j = 0; line[j] != '\0'; j++)
            cout << (char)(begin <= j && j < end ? toupper(line[j]) : line[j]);
        cout << '\n';
    }
    return 0;
}
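For comparison, the same bookkeeping (keyword plus title index plus position) can be sketched with std::string and std::vector. This is not the listing above rewritten: the names Entry, toLower and buildIndex are hypothetical, positions are word indices rather than character offsets, and the three-key comparator is swapped for std::stable_sort on the keyword alone, which keeps ties in input order because the entries are collected in that order.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// One index entry: a keyword and where it was found.
struct Entry {
    std::string keyword;  // lower-cased keyword
    std::size_t title;    // index of the title in input order
    std::size_t word;     // index of the word inside that title
};

// Returns a lower-case copy of `s`.
std::string toLower(std::string s) {
    for (char& c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Collects every non-ignored word of every title, then orders by keyword.
std::vector<Entry> buildIndex(const std::vector<std::string>& titles,
                              const std::set<std::string>& ignored) {
    // The lookup below relies on the ignore words being lower-case,
    // which the problem statement guarantees.
    for (const std::string& w : ignored) assert(w == toLower(w));

    std::vector<Entry> entries;
    for (std::size_t t = 0; t < titles.size(); ++t) {
        std::istringstream words(titles[t]);
        std::string w;
        for (std::size_t i = 0; words >> w; ++i) {
            std::string key = toLower(w);
            if (ignored.count(key) == 0) entries.push_back({key, t, i});
        }
    }
    // Entries are already in title order and left-to-right within a title,
    // so a stable sort on the keyword alone preserves that order among ties.
    std::stable_sort(entries.begin(), entries.end(),
                     [](const Entry& a, const Entry& b) {
                         return a.keyword < b.keyword;
                     });
    return entries;
}
```

Each Entry then identifies one output line: the title at index title, printed in lower case with the word at index word in upper case.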