NLMISC::CWordsDictionary Class Reference

#include <words_dictionary.h>


Detailed Description

Words dictionary: searches for keys and words in _words_<language>.txt Unicode files. All searches are case-insensitive.

Author:
Olivier Cado

Nevrax France

Date:
2003

Definition at line 44 of file words_dictionary.h.

Public Member Functions

 CWordsDictionary ()
 Constructor.

void exactLookupByKey (const CSString &key, CVectorSString &resultVec)
 Set the result vector with the word(s) corresponding to the key.

const CVectorSString & getKeys ()
const CVectorSString & getWords ()
CSString getWordsKey (const CSString &resultStr)
 Return the key contained in the provided string returned by lookup() (without extension).

bool init (const std::string &configFileName="words_dic.cfg")
void lookup (const CSString &inputStr, CVectorSString &resultVec)

Protected Member Functions

CSString makeResult (const CSString key, const CSString word)
 Make a result string.


Private Attributes

CVectorSString _Keys
 Keys (same indices as in _Words).

CVectorSString _Words
 Words (same indices as in _Keys).


Constructor & Destructor Documentation

NLMISC::CWordsDictionary::CWordsDictionary ()
 

Constructor.

Definition at line 41 of file words_dictionary.cpp.

00042 {
00043 }


Member Function Documentation

void NLMISC::CWordsDictionary::exactLookupByKey (const CSString & key, CVectorSString & resultVec)
 

Set the result vector with the word(s) corresponding to the key.

Definition at line 190 of file words_dictionary.cpp.

References _Words.

00191 {
00192         // Search
00193         for ( CVectorSString::const_iterator ivs=_Keys.begin(); ivs!=_Keys.end(); ++ivs )
00194         {
00195                 if ( key == *ivs )
00196                         resultVec.push_back( _Words[ivs-_Keys.begin()] );
00197         }
00198 }
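
The parallel-vector lookup above can be sketched without the NeL headers. The following is a minimal, standalone illustration, not the library's code: ParallelDictionary and demoExactLookup are hypothetical names, std::string replaces CSString, and comparisons are plain case-sensitive std::string comparisons, so it does not attempt to reproduce whatever case handling CSString provides. As in the listing, matches are appended to the result vector rather than replacing its contents.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for CWordsDictionary: two parallel vectors,
// where words[i] is the word stored for keys[i].
struct ParallelDictionary
{
    std::vector<std::string> keys;
    std::vector<std::string> words;

    // Append every word whose key equals `key` exactly (case-sensitive here).
    // Existing contents of `result` are kept, as in the original listing.
    void exactLookupByKey(const std::string& key,
                          std::vector<std::string>& result) const
    {
        for (std::size_t i = 0; i < keys.size(); ++i)
        {
            if (keys[i] == key)
                result.push_back(words[i]);
        }
    }
};

// Usage example with made-up data: one key may map to several words.
inline void demoExactLookup()
{
    ParallelDictionary dict;
    dict.keys  = {"sword", "shield", "sword"};
    dict.words = {"Sword", "Shield", "Blade"};

    std::vector<std::string> result;
    dict.exactLookupByKey("sword", result);
    assert(result.size() == 2);
    assert(result[0] == "Sword" && result[1] == "Blade");
}
```

Because the result vector is only appended to, a caller who wants a fresh result set should clear the vector before each call.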

const CVectorSString& NLMISC::CWordsDictionary::getKeys () [inline]
 

Definition at line 77 of file words_dictionary.h.

00077 { return _Keys; }

const CVectorSString& NLMISC::CWordsDictionary::getWords () [inline]
 

Definition at line 78 of file words_dictionary.h.

References _Words.

00078 { return _Words; }

CSString NLMISC::CWordsDictionary::getWordsKey (const CSString & resultStr)
 

Return the key contained in the provided string returned by lookup() (without extension).

Definition at line 214 of file words_dictionary.cpp.

References NLMISC::CSString::splitTo().

00215 {
00216         return resultStr.splitTo( ':' );
00217 }
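
Since getWordsKey() undoes the "key: word" formatting produced by makeResult(), the round trip can be illustrated with plain std::string. This is a sketch under assumptions, not the NeL implementation: the free functions below only borrow the member names, and the behaviour when no ':' is present (returning the input unchanged) is a choice made here, which CSString::splitTo() may not share.

```cpp
#include <cassert>
#include <string>

// Build a result string in the "key: word" form used by makeResult().
inline std::string makeResult(const std::string& key, const std::string& word)
{
    return key + ": " + word;
}

// Return everything before the first ':' (the key part).
// Assumption: if there is no ':', the whole string is returned unchanged;
// the real CSString::splitTo() may behave differently in that case.
inline std::string getWordsKey(const std::string& resultStr)
{
    const std::string::size_type pos = resultStr.find(':');
    return (pos == std::string::npos) ? resultStr : resultStr.substr(0, pos);
}

// Usage example with a made-up key and word.
inline void demoWordsKey()
{
    const std::string result = makeResult("item_042", "Iron Sword");
    assert(result == "item_042: Iron Sword");
    assert(getWordsKey(result) == "item_042");
}
```

Splitting at the first ':' means a key that itself contains a colon would be truncated, so this scheme presumably relies on keys never containing that character.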

bool NLMISC::CWordsDictionary::init (const std::string & configFileName = "words_dic.cfg")
 

Load the config file and the related words files. Return false in case of failure. Config file variables:

  • WordsPath: directory containing the *_words_<language>.txt files
  • LanguageCode: language code (e.g. en for English); defaults to en if not set
  • Utf8: if set to 1, results are returned as UTF-8 strings; otherwise as ANSI strings

Definition at line 52 of file words_dictionary.cpp.

References _Words, STRING_MANAGER::TWorksheet::begin(), STRING_MANAGER::TWorksheet::end(), STRING_MANAGER::TWorksheet::findCol(), STRING_MANAGER::TWorksheet::findId(), NLMISC::CConfigFile::getVarPtr(), NLMISC::CConfigFile::load(), STRING_MANAGER::loadExcelSheet(), nldebug, nlwarning, NLMISC::toString(), STRING_MANAGER::TWorksheet::TRow, uint, and v.

00053 {
00054         // Read config file
00055         bool cfFound = false;
00056         CConfigFile cf;
00057         try
00058         {
00059                 cf.load( configFileName );
00060                 cfFound = true;
00061         }
00062         catch ( EConfigFile& e )
00063         {
00064                 nlwarning( "WD: %s", e.what() );
00065         }
00066         string wordsPath, languageCode;
00067         bool utf8 = false;
00068         if ( cfFound )
00069         {
00070                 CConfigFile::CVar *v = cf.getVarPtr( "WordsPath" );
00071                 if ( v )
00072                 {
00073                         wordsPath = v->asString();
00074                         /*if ( (!wordsPath.empty()) && (wordsPath[wordsPath.size()-1]!='/') )
00075                                 wordsPath += '/';*/
00076                 }
00077                 v = cf.getVarPtr( "LanguageCode" );
00078                 if ( v )
00079                         languageCode = v->asString();
00080                 v = cf.getVarPtr( "Utf8" );
00081                 if ( v )
00082                         utf8 = (v->asInt() == 1);
00083         }
00084         if ( languageCode.empty() )
00085                 languageCode = "en";
00086 
00087         // Load all found words files
00088         const string ext = ".txt";
00089         vector<string> fileList;
00090         CPath::getPathContent( wordsPath, false, false, true, fileList );
00091         for ( vector<string>::const_iterator ifl=fileList.begin(); ifl!=fileList.end(); ++ifl )
00092         {
00093                 const string& filename = (*ifl);
00094                 string::size_type p;
00095                 if ( (p = filename.find( string("_words_") + languageCode + ext )) != string::npos )
00096                 {
00097                         nldebug( "WD: Loading %s", filename.c_str() );
00098                         string::size_type origSize = filename.size() - ext.size();
00099                         const string truncFilename = CFile::getFilenameWithoutExtension( filename );
00100                         const string wordType = truncFilename.substr( 0, p - (origSize - truncFilename.size()) );
00101 
00102                         // Load Unicode Excel words file
00103                         STRING_MANAGER::TWorksheet worksheet;
00104                         STRING_MANAGER::loadExcelSheet( filename, worksheet );
00105                         uint ck, cw;
00106                         if ( worksheet.findId( ck ) && worksheet.findCol( ucstring("name"), cw ) ) // => 
00107                         {
00108                                 for ( std::vector<STRING_MANAGER::TWorksheet::TRow>::iterator ip = worksheet.begin(); ip!=worksheet.end(); ++ip )
00109                                 {
00110                                         if ( ip == worksheet.begin() ) // skip first row
00111                                                 continue;
00112                                         STRING_MANAGER::TWorksheet::TRow& row = *ip;
00113                                         _Keys.push_back( row[ck].toString() );
00114                                         string word = utf8 ? row[cw].toUtf8() : row[cw].toString();
00115                                         _Words.push_back( word );
00116                                 }
00117                         }
00118                         else
00119                                 nlwarning( "WD: %s ID or name not found in %s", wordType.c_str(), filename.c_str() );
00120                 }
00121         }
00122 
00123         if ( _Keys.empty() )
00124         {
00125                 if ( wordsPath.empty() )
00126                         nlwarning( "WD: WordsPath missing in config file %s", configFileName.c_str() );
00127                 nlwarning( "WD: *_words_%s.txt not found", languageCode.c_str() );
00128                 return false;
00129         }
00130         else
00131                 return true;
00132 }
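
The file-selection step in init() (keep only files whose name contains _words_<LanguageCode>.txt, then derive the word type from the part of the file name before that marker) can be sketched in isolation. The helper below is hypothetical and uses only the standard library; unlike the listing, it locates the directory separator itself instead of calling CFile::getFilenameWithoutExtension(), and it rejects a marker that appears in a directory name.

```cpp
#include <cassert>
#include <string>

// Return true and set `wordType` when `path` names a words file for
// `languageCode`, i.e. its name contains "_words_<languageCode>.txt".
// `wordType` receives the part of the base name before "_words_".
// Hypothetical helper, not part of NeL.
inline bool parseWordsFilename(const std::string& path,
                               const std::string& languageCode,
                               std::string& wordType)
{
    const std::string marker = "_words_" + languageCode + ".txt";
    const std::string::size_type p = path.find(marker);
    if (p == std::string::npos)
        return false;

    // Skip the directory part, if any (both separator styles accepted).
    const std::string::size_type slash = path.find_last_of("/\\");
    const std::string::size_type start =
        (slash == std::string::npos) ? 0 : slash + 1;
    if (p < start)
        return false; // marker found in a directory name, not the file name

    wordType = path.substr(start, p - start);
    return true;
}

// Usage example with made-up paths.
inline void demoParseWordsFilename()
{
    std::string type;
    assert(parseWordsFilename("data/item_words_en.txt", "en", type));
    assert(type == "item");
    assert(!parseWordsFilename("data/item_words_fr.txt", "en", type));
}
```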

void NLMISC::CWordsDictionary::lookup (const CSString & inputStr, CVectorSString & resultVec)
 

Fill the result vector with strings corresponding to the input string:

  • If inputStr is partially or completely found in the keys, all the matching <key, word> pairs are returned;
  • If inputStr is partially or completely found in the words, all the matching <key, word> pairs are returned.

The following tags can modify the behaviour of the search algorithm:

  • ^mystring matches mystring only if it is at the beginning of a key or word
  • mystring$ matches mystring only if it is at the end of a key or word

All returned words are UTF-8 or ANSI strings, depending on the config file.

Definition at line 144 of file words_dictionary.cpp.

References _Words, NLMISC::CSString::find(), makeResult(), and NLMISC::CSString::rightCrop().

00145 {
00146         // Prepare search string
00147         if ( inputStr.empty() )
00148                 return;
00149 
00150         CSString searchStr = inputStr;
00151         bool findAtBeginning = false, findAtEnd = false;
00152         if ( searchStr[0] == '^' )
00153         {
00154                 searchStr = searchStr.substr( 1 );
00155                 findAtBeginning = true;
00156         }
00157         if ( searchStr[searchStr.size()-1] == '$' )
00158         {
00159                 searchStr = searchStr.rightCrop( 1 );
00160                 findAtEnd = true;
00161         }
00162 
00163         // Search
00164         for ( CVectorSString::const_iterator ivs=_Keys.begin(); ivs!=_Keys.end(); ++ivs )
00165         {
00166                 const CSString& key = *ivs;
00167                 string::size_type p;
00168                 if ( (p = key.find( searchStr.c_str() )) != string::npos )
00169                 {
00170                         if ( ((!findAtBeginning) || (p==0)) && ((!findAtEnd) || (p==key.size()-searchStr.size())) )
00171                                 resultVec.push_back( makeResult( key, _Words[ivs-_Keys.begin()] ) );
00172                 }
00173         }
00174         for ( CVectorSString::const_iterator ivs=_Words.begin(); ivs!=_Words.end(); ++ivs )
00175         {
00176                 const CSString& word = *ivs;
00177                 string::size_type p;
00178                 if ( (p = word.find( searchStr.c_str() )) != string::npos )
00179                 {
00180                         if ( ((!findAtBeginning) || (p==0)) && ((!findAtEnd) || (p==word.size()-searchStr.size())) )
00181                                 resultVec.push_back( makeResult( _Keys[ivs-_Words.begin()], word ) );
00182                 }
00183         }
00184 }
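
The ^ and $ tags can be illustrated with a small standalone matcher. This is a sketch of the documented behaviour, not the listing itself: SearchPattern, parsePattern and matches are hypothetical names, std::string replaces CSString, and matching is case-sensitive. It also differs from the listing in two deliberate ways: it guards against the search text becoming empty after a tag is stripped, and for $ it compares the suffix directly instead of testing the position returned by find(). Assuming find() reports the first occurrence, as std::string::find does, the position test would miss a key such as "aba" when searching for "a$".

```cpp
#include <cassert>
#include <string>

// Parsed form of a lookup() input such as "^foo", "foo$" or "^foo$".
struct SearchPattern
{
    std::string text;         // search text with the tags removed
    bool atBeginning = false; // '^' prefix was present
    bool atEnd = false;       // '$' suffix was present
};

// Strip the optional '^' prefix and '$' suffix from the input.
inline SearchPattern parsePattern(const std::string& input)
{
    SearchPattern pat;
    pat.text = input;
    if (!pat.text.empty() && pat.text.front() == '^')
    {
        pat.text.erase(0, 1);
        pat.atBeginning = true;
    }
    if (!pat.text.empty() && pat.text.back() == '$')
    {
        pat.text.pop_back();
        pat.atEnd = true;
    }
    return pat;
}

// Case-sensitive match of `pat` against one key or word.
// An empty search text never matches (a choice made for this sketch).
inline bool matches(const SearchPattern& pat, const std::string& candidate)
{
    const std::string& s = pat.text;
    if (s.empty() || s.size() > candidate.size())
        return false;
    if (pat.atBeginning && pat.atEnd)
        return candidate == s;
    if (pat.atBeginning)
        return candidate.compare(0, s.size(), s) == 0;
    if (pat.atEnd)
        return candidate.compare(candidate.size() - s.size(), s.size(), s) == 0;
    return candidate.find(s) != std::string::npos;
}

// Usage example with made-up words.
inline void demoPattern()
{
    const SearchPattern p = parsePattern("^iron");
    assert(p.text == "iron" && p.atBeginning && !p.atEnd);
    assert(matches(p, "iron sword"));
    assert(!matches(p, "cast iron"));
    assert(matches(parsePattern("iron$"), "cast iron"));
    assert(matches(parsePattern("a$"), "aba"));
}
```

A full lookup would apply matches() first to every key and then to every word, pushing one "key: word" string per hit, which is why an entry whose key and word both match appears twice in the results.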

CSString NLMISC::CWordsDictionary::makeResult (const CSString key, const CSString word) [inline, protected]
 

Make a result string.

Definition at line 204 of file words_dictionary.cpp.

References res.

Referenced by lookup().

00205 {
00206         CSString res = key + CSString(": ") + word;
00207         return res;
00208 }


Field Documentation

CVectorSString NLMISC::CWordsDictionary::_Keys [private]
 

Keys (same indices as in _Words).

Definition at line 88 of file words_dictionary.h.

CVectorSString NLMISC::CWordsDictionary::_Words [private]
 

Words (same indices as in _Keys).

Definition at line 91 of file words_dictionary.h.

Referenced by exactLookupByKey(), getWords(), init(), and lookup().


The documentation for this class was generated from the following files:

  • words_dictionary.h
  • words_dictionary.cpp

Generated on Tue Mar 16 13:43:30 2004 for NeL by doxygen 1.3.6