Simple search engine

# Installing

It is recommended to use [virtualenv](https://virtualenv.pypa.io).

```
pip install -r requirements.txt
```

## Testing

If you just want to test and have Docker installed, you can use the provided
`docker-compose.yml` instead of installing a PostgreSQL database.

This is for testing only: never use the docker-compose file in production!

## Sphinx-search / Manticore-search

You can use [Sphinx-search](http://sphinxsearch.com/), but
[Manticore-search](https://manticoresearch.com/) is recommended, since the
latest version of Sphinx-search (3.x) is distributed as closed source instead
of open source.

For the moment, all explanations target Manticore-search, but the term
`sphinx` appears in many places in the code because Manticore-search aims to
stay compatible with Sphinx-search.

# Configuration

## Database

This project uses a PostgreSQL database. You can update the login
information in the `config.py` file.

## Sphinx-search / Manticore-search

The configuration lives in the `sphinx_search.conf` file. To update this
file, please refer to the documentation of
[Sphinx-search](http://sphinxsearch.com/docs/manual-2.3.2.html) or
[Manticore-search](https://docs.manticoresearch.com).
Keep in mind that `config.py` must be kept in sync with the
`sphinx_search.conf` file.
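
One way to check that the two files agree is to read the ports out of `sphinx_search.conf` and compare them with what `config.py` uses. The following is only a sketch: the helper name `listen_ports` is hypothetical, and it assumes the `searchd` section declares its ports with `listen` lines (IPv4 or port-only forms).

```python
import re
from typing import List


def listen_ports(conf_text: str) -> List[int]:
    """Collect TCP ports declared by `listen` lines in the config text.

    Hypothetical helper, not part of the project.
    """
    ports = []
    # Match `listen = <value>` at the start of a line, ignoring trailing comments.
    for match in re.finditer(r"^\s*listen\s*=\s*([^#\n]+)", conf_text, re.MULTILINE):
        # A value may look like `9312`, `9306:mysql41` or `127.0.0.1:9306:mysql41`.
        # Unix socket paths have no purely numeric field and are skipped.
        for field in match.group(1).strip().split(":"):
            if field.isdigit():
                ports.append(int(field))
                break
    return ports
```

The returned list can then be compared by hand with the port set in `config.py`.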

# Crawling

For now there is one example spider, for the neodarz website.
To launch all the crawlers, use the following command:

```
python app.py crawl
```

# Indexing

Before launching the indexing or searching commands, you must verify that the
folder of the `path` option exists on your system. Warning: the last component
of the `path` option is the value of the `source` option; don't create that
one, only its parent folder.

Example with the configuration for the index `datas`:

```
index datas {
    source = datas
    path = /tmp/data/datas
}
```
Here the folder to create is `/tmp/data/`.
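
Creating the parent folder can also be scripted. The following is a minimal sketch, not part of the project: the helper name `ensure_index_parent` is hypothetical, and it assumes the `path` value is copied by hand from `sphinx_search.conf`.

```python
from pathlib import Path


def ensure_index_parent(index_path: str) -> Path:
    """Create the parent folder of a `path` value and return it.

    The last component of `index_path` is left alone; only its parent
    folder is created (hypothetical helper, not part of the project).
    """
    parent = Path(index_path).parent
    # parents=True creates intermediate folders as needed;
    # exist_ok=True makes the call safe to repeat.
    parent.mkdir(parents=True, exist_ok=True)
    return parent
```

With the example above, `ensure_index_parent("/tmp/data/datas")` creates `/tmp/data/` if it is missing and does not create `/tmp/data/datas` itself.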

The command for indexing is:
```
indexer --config sphinx_search.conf --all
```

Don't forget to launch the crawling command before this ;)

# Searching

Before you can search, you must launch the search server:
```
searchd -c sphinx_search.conf
```

## Enjoy

You can now launch the server!

```
python app.py
```

To start searching, send a `GET` request to the following address (without the
`<` and `>`):
```
127.0.0.1:5000/?search=<search terms>
```

Results are returned in JSON format.
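
As an illustration, the request can be sent from Python using only the standard library. This is a sketch under assumptions: the helper names `build_search_url` and `search` are hypothetical, the server is assumed to be running on the address shown above, and nothing is assumed about the structure of the returned JSON.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(terms: str, base: str = "http://127.0.0.1:5000/") -> str:
    """Return the search URL with `terms` encoded as the `search` parameter."""
    return base + "?" + urlencode({"search": terms})


def search(terms: str, base: str = "http://127.0.0.1:5000/"):
    """Send the GET request and decode the JSON body.

    Requires the server started with `python app.py` to be running.
    """
    with urlopen(build_search_url(terms, base)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Encoding the terms with `urlencode` keeps spaces and special characters from breaking the query string.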