Simple search engine
# Installing
It is recommended to use [virtualenv](https://virtualenv.pypa.io).
```
pip install -r requirements.txt
```
## Testing
If you just want to test and don't want to install a PostgreSQL database
but have Docker installed, just use the `docker-compose.yml` file.
It is meant for testing only; do not use the docker-compose file in
production!
## Sphinx-search / Manticore-search
You must use [Manticore-search](https://manticoresearch.com/) because the
searx engines rely on its JSON search API.
You can use [Sphinx-search](http://sphinxsearch.com/) instead if you don't
need the JSON search API. Note that, as of January 2019, the latest
versions of Sphinx-search (3.x) are distributed as closed source rather than
open source.
# Configuration
## Database
The database used for this project is PostgreSQL. You can update the login
information in the `config.py` file.
## Manticore-search
The configuration for this is in the `sphinx_search.conf` file. To update this
file, please refer to the [Manticore-search documentation](https://docs.manticoresearch.com).
Keep in mind that the `config.py` file must be kept in sync with
the `sphinx_search.conf` file.
# Crawling
For now there is an example spider for the neodarz website.
To launch all the crawlers, use the following command:
```
python app.py crawl
```
# Indexing
Before launching the indexing or searching commands, you must verify that the
folder given in the `path` option exists on your system (Warning: the last
component of the `path` option is the value of the `source` option; don't
create this folder, only its parent folder).
Example with the configuration for the indexer `datas`:
```
index neodarznet {
source = neodarznet
path = /tmp/data/neodarznet
}
```
Here the folder to create is `/tmp/data/`.
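The parent-folder rule can be sketched in Python. This is only an illustration: the helper names `index_parent` and `ensure_index_parent` are hypothetical and not part of this project, and the `path` value still has to be copied by hand from `sphinx_search.conf`.

```python
import os


def index_parent(index_path: str) -> str:
    """Return the parent folder of an index `path` value (hypothetical helper)."""
    # Drop a trailing slash so that dirname() removes the last component.
    return os.path.dirname(index_path.rstrip("/"))


def ensure_index_parent(index_path: str) -> str:
    """Create the parent folder if it is missing and return it."""
    parent = index_parent(index_path)
    # Only the parent is created; the last component is left to the indexer.
    os.makedirs(parent, exist_ok=True)
    return parent
```

For the configuration above, `ensure_index_parent("/tmp/data/neodarznet")` would create `/tmp/data` if needed and leave `neodarznet` alone.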
The command for indexing is:
```
indexer --config sphinx_search.conf --all
```
Don't forget to launch the crawling command before this ;)
# Searching
Before you can search, you must launch the search server:
```
searchd -c sphinx_search.conf
```
## Enjoy
You can now launch the server!
```
python app.py
```
To start searching, send a `POST` request using the Manticore-search JSON API,
for example:
```
http POST 'http://localhost:8080/json/search' < mysearch.json
```
This is the content of `mysearch.json`:
```
{
  "index": "neodarznet",
  "query": { "match": { "content": "Livet" } },
  "highlight":
  {
    "fields":
    {
      "content": {},
      "url": {},
      "title": {}
    },
    "pre_tags": "_",
    "post_tags": "_"
  }
}
```
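The same request can also be sent from Python. The sketch below uses only the standard library and reuses the URL and query from the example; the function names `build_query` and `search` are illustrative and not part of this project, and the exact shape of the response is not assumed here.

```python
import json
import urllib.request

# Same endpoint as in the example request (assumes the server runs locally).
SEARCH_URL = "http://localhost:8080/json/search"


def build_query(index: str, text: str) -> dict:
    """Build the same search body as the example `mysearch.json`."""
    return {
        "index": index,
        "query": {"match": {"content": text}},
        "highlight": {
            "fields": {"content": {}, "url": {}, "title": {}},
            "pre_tags": "_",
            "post_tags": "_",
        },
    }


def search(index: str, text: str, url: str = SEARCH_URL) -> dict:
    """POST the query as JSON and return the decoded JSON response."""
    body = json.dumps(build_query(index, text)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `search("neodarznet", "Livet")` should return the same data as the command-line request.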
You can find more information about the available HTTP search API in the
[Manticore-search documentation](https://docs.manticoresearch.com/latest/html/httpapi_reference.html).
Results are in JSON format. If you want to know which websites are indexed,
look in the file [sphinx_search.conf](https://git.khaganat.net/neodarz/khanindexer/blob/master/sphinx_search.conf)
for all the lines that start with `index`.
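Looking for the `index` lines can be automated with a few lines of Python. This is a rough sketch, assuming each index is declared on its own line as `index <name> {`, like in the indexing example; the function name `list_indexes` is hypothetical.

```python
import re

# Matches lines such as "index neodarznet {" but not "indexer {".
INDEX_LINE = re.compile(r"^\s*index\s+([^\s{:]+)")


def list_indexes(conf_text: str) -> list:
    """Return the index names declared in a Sphinx/Manticore config text."""
    names = []
    for line in conf_text.splitlines():
        match = INDEX_LINE.match(line)
        if match:
            names.append(match.group(1))
    return names
```

To use it, read the configuration file and pass its content, for example `list_indexes(open("sphinx_search.conf").read())`.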