aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
Diffstat (limited to 'pypy/doc/parser.rst')
-rw-r--r--pypy/doc/parser.rst28
1 files changed, 18 insertions, 10 deletions
diff --git a/pypy/doc/parser.rst b/pypy/doc/parser.rst
index 9e155046ad..d2a6bb54ed 100644
--- a/pypy/doc/parser.rst
+++ b/pypy/doc/parser.rst
@@ -1,13 +1,12 @@
-
-===========
PyPy Parser
===========
Overview
-========
+--------
The PyPy parser includes a tokenizer and a recursive descent parser.
+
Tokenizer
---------
@@ -15,13 +14,15 @@ At the moment, the tokenizer is implemented as a single function
(``generate_tokens`` in :source:`pypy/interpreter/pyparser/pytokenizer.py`) that builds
a list of tokens. The tokens are then fed to the parser.
+
Parser
------
The parser is a simple LL(1) parser that is similar to CPython's.
+
Building the Python grammar
-***************************
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
The python grammar is built at startup from the pristine CPython grammar file
(see :source:`pypy/interpreter/pyparser/metaparser.py`). The grammar builder first
@@ -34,8 +35,9 @@ parsing. Finally, the assigns the grammar builder assigns each DFA state a
number and packs them into a list for the parser to use. The final product is
an instance of the ``Grammar`` class in :source:`pypy/interpreter/pyparser/parser.py`.
+
Parser implementation
-*********************
+~~~~~~~~~~~~~~~~~~~~~
The workhorse of the parser is the ``add_token`` method of the ``Parser`` class.
It tries to find a transition from the current state to another state based on
@@ -44,8 +46,9 @@ if the current state is accepting. If it's not, a ``ParseError`` is
raised. When parsing is done without error, the parser has built a tree of
``Node``.
+
Parsing Python
-**************
+~~~~~~~~~~~~~~
The glue code between the tokenizer and the parser as well as extra Python
specific code is in :source:`pypy/interpreter/pyparser/pyparse.py`. The
@@ -54,14 +57,16 @@ tree. It also detects the coding cookie if there is one and decodes the source.
Note that __future__ imports are handled before the parser is invoked by
manually parsing the source in :source:`pypy/interpreter/pyparser/future.py`.
+
Compiler
--------
The next step in generating Python bytecode is converting the parse tree into an
Abstract Syntax Tree (AST).
+
Building AST
-************
+~~~~~~~~~~~~
Python's AST is described in :source:`pypy/interpreter/astcompiler/tools/Python.asdl`.
From this definition, :source:`pypy/interpreter/astcompiler/tools/asdl_py.py` generates
@@ -74,15 +79,17 @@ extensions to the AST classes are in
parse trees into AST. It walks down the parse tree building nodes as it goes.
The result is a toplevel ``mod`` node.
+
AST Optimization
-****************
+~~~~~~~~~~~~~~~~
:source:`pypy/interpreter/astcompiler/optimize.py` contains the AST optimizer. It does
constant folding of expressions, and other simple transformations like making a
load of the name "None" into a constant.
+
Symbol analysis
-***************
+~~~~~~~~~~~~~~~
Before writing bytecode, a symbol table is built in
:source:`pypy/interpreter/astcompiler/symtable.py`. It determines if every name in the
@@ -90,8 +97,9 @@ source is local, implicitly global (no global declaration), explicitly global
(there's a global declaration of the name in the scope), a cell (the name in
used in nested scopes), or free (it's used in a nested function).
+
Bytecode generation
-*******************
+~~~~~~~~~~~~~~~~~~~
Bytecode is emitted in :source:`pypy/interpreter/astcompiler/codegen.py`. Each
bytecode is represented temporarily by the ``Instruction`` class in