*********
Internals
*********

How does it work?
=================

Scheme
------
.. image:: _static/autodep_arch.png

Format of network messages
--------------------------


1. Format of messages sent to the file access registrar::

   <time of event: sec since 1970>
   <event type: open, read, write>
   <name of file>
   <building stage: stagename or unknown>
   <result: OK, ERR/errno, ASKING, DENIED>

2. Format of the answer to an ASKING packet from the registrar::

   <ALLOW | DENY>

*Notes:*

* All sockets are of type ``SOCK_SEQPACKET``.
* All fields are delimited by the character with code 0 (the NUL byte).
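
The message format above can be sketched in Python as follows. This assumes the five fields are sent in the listed order, separated by NUL bytes with no trailing NUL, and that the answer to an ASKING packet is the bare word ``ALLOW`` or ``DENY``. The helpers ``encode_event`` and ``decode_event`` are hypothetical names for illustration, not part of autodep.

.. code-block:: python

   import os
   import socket
   import time

   SEP = b"\0"  # fields are separated by the byte with code 0


   def encode_event(event_type, filename, stage="unknown", result="OK", when=None):
       """Build one message for the registrar (hypothetical helper).

       Assumes the field order shown above and no trailing NUL byte.
       """
       if when is None:
           when = int(time.time())  # seconds since 1970
       return SEP.join([
           str(when).encode(),
           event_type.encode(),
           os.fsencode(filename),  # file names are arbitrary bytes on Linux
           stage.encode(),
           result.encode(),
       ])


   def decode_event(packet):
       """Split a registrar message back into its five fields."""
       when, event_type, filename, stage, result = packet.split(SEP)
       return (int(when), event_type.decode(), os.fsdecode(filename),
               stage.decode(), result.decode())


   def demo():
       # SOCK_SEQPACKET keeps message boundaries: one recv() returns one message.
       hook, registrar = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
       with hook, registrar:
           hook.send(encode_event("open", "/etc/passwd", "compile", "ASKING"))
           print(decode_event(registrar.recv(65536)))
           registrar.send(b"ALLOW")  # answer to the ASKING packet
           print(hook.recv(16))


   if __name__ == "__main__":
       demo()

Because ``SOCK_SEQPACKET`` preserves record boundaries, neither side needs a length prefix or an end-of-message marker; the NUL bytes only separate fields inside one packet.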


How does the Hooklib approach work?
===================================

The main idea of the Hooklib approach is to load a hooking shared library
**before** any other library (including the C runtime, libc.so).
As a result, calls to functions such as open, read and write are resolved to
this library instead of libc.so.

The Hooklib module changes the behavior of the Linux dynamic linker by setting
the LD_PRELOAD environment variable (see
`man 8 ld-linux <http://linux.die.net/man/8/ld-linux>`_ for details).
The module also protects LD_PRELOAD from being changed later by the program.
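
A minimal Python sketch of launching a program with a hook library preloaded. The library path ``/path/to/hooklib.so`` and the ``LOG_SOCKET`` variable used to tell the hook where the registrar listens are illustrative assumptions, not necessarily the names autodep uses. The dynamic linker accepts LD_PRELOAD entries separated by spaces or colons; the sketch puts the hook library first and keeps any value that was already set.

.. code-block:: python

   import os
   import subprocess


   def hooked_env(hook_lib, socket_path, base=None):
       """Return an environment that preloads hook_lib (hypothetical helper)."""
       env = dict(os.environ if base is None else base)
       old = env.get("LD_PRELOAD")
       # Put the hook library first so its open/read/write are found
       # before those of libc.so; keep anything that was already preloaded.
       env["LD_PRELOAD"] = f"{hook_lib} {old}" if old else hook_lib
       # Hypothetical variable telling the hook where the registrar listens.
       env["LOG_SOCKET"] = socket_path
       return env


   def run_hooked(cmd, hook_lib, socket_path):
       """Start cmd with the hook library preloaded and wait for it."""
       return subprocess.run(cmd, env=hooked_env(hook_lib, socket_path)).returncode

For example, ``run_hooked(["make"], "/path/to/hooklib.so", "/tmp/registrar.sock")`` would start ``make`` under the hook, assuming such a library and a listening registrar exist.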

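When the Hooklib module is loaded, it connects to the file access registrar via
a Unix domain socket. If the program forks, the child process gets its own copy
of the library; programs started with exec() load it again, since LD_PRELOAD is
inherited. New threads share the copy already loaded in their process.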

When the program calls open(...), read(...) or write(...), the library sends
information about the call to the registrar. The registrar can either allow or
block the event. If the registrar allows it, the original function is called;
otherwise a "file not found" error is returned.
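
The decision logic can be modelled in Python as below. This only simulates the control flow: the real hook intercepts libc calls inside the traced process, whereas here ``guarded_open`` is a hypothetical wrapper and ``ask`` stands for whatever sends the ASKING packet and reads back ``ALLOW`` or ``DENY``. Reporting a denial as ``FileNotFoundError`` with ``errno.ENOENT`` is an assumption about how "file not found" is surfaced.

.. code-block:: python

   import errno
   import os


   def guarded_open(path, flags, ask):
       """Open path only if the registrar allows it (hypothetical wrapper).

       ask(event_type, path) is assumed to return b"ALLOW" or b"DENY".
       """
       if ask("open", path) == b"ALLOW":
           return os.open(path, flags)  # call the original function
       # Denied: behave as if the file does not exist.
       raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), path)


   def make_policy(blocked_prefixes):
       """Example registrar policy: deny any path under a blocked prefix."""
       prefixes = tuple(blocked_prefixes)

       def ask(event_type, path):
           return b"DENY" if path.startswith(prefixes) else b"ALLOW"

       return ask

Keeping the policy behind a single ``ask`` callable mirrors the protocol: the hook only needs a yes/no answer and never sees how the registrar decided.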

How does the Fusefs approach work?
==================================

The main idea of the Fusefs approach is to create a logging filesystem in
userspace and chroot the program into it.

Before the program is launched, the registrar prepares the mounts. It usually
does the following:

1. mount -o bind / /mnt/rootfs/
2. mount /dev/, /dev/pts/, /dev/shm/, /proc/ and /sys/ the same way
3. mount /lib64/, /lib32/ and /var/tmp/portage/ the same way to increase
   performance at the cost of accuracy
4. launch FUSE over /mnt/rootfs/
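
Steps 1 to 3 can be written out as a dry-run plan in Python. A plain bind mount of ``/`` does not carry its submounts along, which is why /dev/, /proc/ and the rest have to be bind-mounted separately; the order matters, since /dev/ must be mounted before /dev/pts/ and /dev/shm/. The function name ``mount_plan`` is hypothetical, and step 4 is left out because the FUSE launch command is specific to autodep.

.. code-block:: python

   import shlex

   ROOT = "/mnt/rootfs"

   # Directories bind-mounted on top of the root bind mount (steps 2 and 3).
   SYSTEM_DIRS = ["/dev", "/dev/pts", "/dev/shm", "/proc", "/sys"]
   FAST_DIRS = ["/lib64", "/lib32", "/var/tmp/portage"]


   def mount_plan(root=ROOT):
       """Return the mount commands of steps 1-3, in order."""
       root = root.rstrip("/")
       plan = [["mount", "-o", "bind", "/", root]]  # step 1
       for d in SYSTEM_DIRS + FAST_DIRS:            # steps 2 and 3
           plan.append(["mount", "-o", "bind", d, root + d])
       # Step 4 (starting the FUSE filesystem over root) is not reproduced here.
       return plan


   if __name__ == "__main__":
       for cmd in mount_plan():
           print(shlex.join(cmd))  # dry run; executing these needs root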

While the program runs, the FUSE module blocks all external access to
/mnt/rootfs.

The FUSE module also asks the registrar whether each event is allowed.

*Notes:*

* Checking whether an event is allowed takes a lot of time.

Further analysis of file access events
======================================

After the file access analyser receives the list of events, it maps them to a
list of packages.
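
One way to do this mapping on Gentoo is to read the ``CONTENTS`` files of the installed-package database under ``/var/db/pkg``. The sketch below assumes that layout and handles only ``obj``, ``sym`` and ``dir`` lines; it illustrates the idea and is not necessarily how the analyser itself does it. All function names are hypothetical.

.. code-block:: python

   import os
   from collections import defaultdict

   VDB = "/var/db/pkg"


   def contents_path(line):
       """Return the path recorded on one CONTENTS line, or None."""
       kind, _, rest = line.rstrip("\n").partition(" ")
       if kind == "obj":        # obj <path> <md5> <mtime>
           return rest.rsplit(" ", 2)[0]
       if kind == "sym":        # sym <path> -> <target> <mtime>
           return rest.split(" -> ", 1)[0]
       if kind == "dir":        # dir <path>
           return rest
       return None


   def build_file_map(vdb=VDB):
       """Map every recorded path to the set of packages that own it."""
       owners = defaultdict(set)
       for category in os.listdir(vdb):
           cat_dir = os.path.join(vdb, category)
           if not os.path.isdir(cat_dir):
               continue
           for pkg in os.listdir(cat_dir):
               contents = os.path.join(cat_dir, pkg, "CONTENTS")
               if not os.path.isfile(contents):
                   continue
               with open(contents, encoding="utf-8", errors="surrogateescape") as f:
                   for line in f:
                       path = contents_path(line)
                       if path:
                           owners[path].add(f"{category}/{pkg}")
       return owners


   def events_to_packages(filenames, owners):
       """Group accessed file names by owning package; unowned files are skipped."""
       result = defaultdict(set)
       for name in filenames:
           for pkg in owners.get(name, ()):
               result[pkg].add(name)
       return dict(result)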

Then the analyser builds a list of dependencies for the installed packages and
compares it with the list it got from the registrar. The analyser assumes that
packages from the system profile are implicit dependencies of every package in
the system.
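
The comparison reduces to set arithmetic. In this minimal sketch all inputs are assumed to be sets of ``category/name`` keys with versions already stripped, so that accessed packages and declared dependencies can be compared directly; ``unexpected_packages`` is a hypothetical name.

.. code-block:: python

   def unexpected_packages(accessed, declared_deps, system_packages):
       """Packages the build touched that are neither declared dependencies
       nor part of the system profile (treated as implicit dependencies)."""
       expected = set(declared_deps) | set(system_packages)
       return set(accessed) - expected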

If a dependency reported by the registrar is unexpected, simple heuristics are
used to filter out packages that are not useful.

Rules of heuristics
-------------------

1. *A package is not useful if all of its accessed files are .desktop, .xml or .m4 files*.
   The aclocal utility tries to read all .m4 files in the /usr/share/aclocal directory.
   Files ending in .desktop and .xml are often read during the postrm phase.
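
Rule 1 can be sketched as a filter over the unexpected packages. This assumes the analyser keeps, for each package, the list of its files that were accessed; ``is_useful`` and ``filter_unexpected`` are hypothetical names.

.. code-block:: python

   # Extensions from rule 1: files read for reasons unrelated to the build
   # (aclocal scanning /usr/share/aclocal, postrm hooks).
   NOT_USEFUL_EXT = (".desktop", ".xml", ".m4")


   def is_useful(accessed_files):
       """Rule 1: a package is not useful if every accessed file from it is a
       .desktop, .xml or .m4 file. An empty list counts as not useful."""
       return not all(f.endswith(NOT_USEFUL_EXT) for f in accessed_files)


   def filter_unexpected(unexpected, files_by_pkg):
       """Drop unexpected packages that rule 1 marks as not useful."""
       return {p for p in unexpected if is_useful(files_by_pkg.get(p, ()))}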