Part I: Coq
Resources
- Guide to tactics.
Basic tutorials
    For each tutorial, you must remove all uses of TODO
    or Admitted.
    
- Programming in Coq: Coq / HTML / Vidéo
- Propositional proofs: Coq / HTML / Vidéo
- Arithmetic: Coq / HTML
The first three tutorials are due before
    Monday, February 22th.
    The completed .v files should simply be sent by email,
    possibly with comments on tricky parts, encountered problems, etc.
Advanced tutorials
The three advanced tutorials are due before
    Monday, March 29th.
    To get all the points you will have to complete all bonuses
    from the acc.v tutorial.
Part II.1: Solving problems sing SATs-solvers
Parts II.1 and II.2 are due by Wednesday, June 2 at 23:59. You must submit your solutions on ecampus: instructions are detailed there.
  This project comes with some
  code skeleton.
  The provided OCaml module Dimacs will be useful
  for all tasks, and the Hex module will be useful
  to encode the penguins' problem. The documentation of these
  modules can be generated in html/
  by running make doc; it is also
  available here.
  
We will consider several problems. Each problem will be encoded
  to SAT, and solved using minisat.
  More precisely, each instance I of the problem must be encoded
  in polynomial time
  to a SAT instance S such that the problem I
  is feasible iff S has a model.
  Further, you will have to be able to extract (again in polynomial time)
  a solution for I from a model of S.
  It is not required that all solutions can be obtained in this way.
  
Your encodings will be judged first on their correctness. If correct, points will be granted depending on their performance: more details here.
Latin square
  File src/latin.ml shows how to encode
  latin square
  to SAT (the input N is the size of the square).
  It can be tested using, for example, make N=10 test_latin.
  However, the encoding can be improved: try to modify the file, adding
  some clauses to help the SAT solver.
  
 
  Greco-latin square
  Adapt src/latin.ml into src/greek.ml to
  encode the
  graeco-latin
    square problem.
 
  Wang tiling
Jeandel and Rao have found a set of 11 Wang tiles that is aperiodic and is minimal for that property. The tiles are shown below on the left picture. On the right, we show a vertical pattern of five tiles (please ignore the number in the center of the tiles in this picture).
 
   
  
  You must create the binary jr
  which encodes the tiling problem for the Jeandel-Rao set of 11 tiles
  on a square of dimension N×N,
  with a constraint: the five-tile pattern shown above
  must be forced on tiles of column N/2 between lines N/2-1 and N/2+2.
  You should be able to find a tiling for N=70 but not N=80.
  
  You may use this file to convert your tilings
  to SVG for visualization. It contains instructions on how to represent
  tilings in ASCII, using symbols A, B, C… K to represent the tiles
  read from left to right and top to bottom in the above picture.
  This program depends on the OCaml library cairo2 which
  you may install using OPAM.
  I do not believe that the visualization itself will be very useful to debug
  your encodings, expect for the fact that it will check your solutions
  and display warnings on color mismatches.
  
Pingouins
  In Hey,
    That's My Fish! several penguins move on hexagonal ice floats
  to eat as much fish as possible. We consider the simplified problem
  where a single penguin has to eat some number of fish, when there
  is exactly one fish per float.
  Several instances of this problem are provided in problems/,
  organized in subdirectories indicating how many floats can be left
  unvisited (this number is tight but your encoding does not have to verify
  it).
  
 
  
  Write a SAT encoder src/pingouins.ml, using the provided
  OCaml module Hex to parse input files and handle
  moves on the hexagonal grid.
  Your pingouins binary
  should use the environment variable PENALTY to
  know how many floats can be left unvisited.
  This is required to be able to use the provided make targets,
  e.g. make PROBLEM=problems/1/flake1 PENALTY=1 test_pingouins
  to test on a given problem
  and make PENALTY=1 tests_pingouins to test on all
  problems of problems/1/.
  
Part II.2: Implementing a SAT solver
Replay the introductory video explaining DPLL and (at 30:00) Two-Watched Literals.
  The most naive way to solve SAT is to guess the value of all variables,
  one by one. This is however too inefficient: after guessing, for example,
  that some variable X is false, there are typically several
  variables Y whose value is forced by the clauses. This is
  the main idea behind the
  DPLL
  algorithm, which is the core of modern SAT solvers.
  
  A basic implementation of DPLL is provided in src/naive.ml.
  It alternates between choices and unit propagation — we will not
  consider pure literal propagations. Backtracking is used to consider
  alternative choices. When encountering a conflict (i.e. realizing that
  the current assignment violates a clause) the function simply returns,
  and it raises SAT m when a model m is found.
  The partial assignment, containing values resulting
  from both choices and propagations, is maintained as an association list,
  of type (literal * bool) list.
  
When TRACE is set in the environment, the execution of the naive SAT solver produces a trace of the successive partial assignments. It is required that this functionality is preserved in your modified versions of the solver. The trace produced after your modifications must remain unchanged.
Imperative partial assignments
In order to improve our SAT solver, we will first adopt a more efficient data structure for the partial assignment: instead of the association list, we will use two arrays, a first one indicating for each variable whether it is assigned, and a second one giving the actual values of variables. This implementation allows constant-time querying and modification of the partial assignment.
  The downside is that the data structure is no more persitent, but mutable.
  Thanks to its persistent partial assignment, the DPLL function of
  src/naive.ml simply takes an assignment as argument,
  and passes its modified versions to its two recursive calls,
  corresponding to the exploration of the two alternative values for
  the chosen variable.
  With the new mutable data structure, something has to be done to cope
  with backtracking. There are two options:
  
- 
      We could still take the assignment as argument of dpll, and useArray.copyto keep its initial value, and be able to backtrack. These copies would however be very inefficient, costing time and memory allocations.
- 
      A better approach is to maintain a single partial assignment
      (two arrays) to be used by all recursive calls of dpll, and maintain a stack of modifications performed on the partial assignment, to be able to undo them. Whenever a variableXis assigned, a new itemXshould be added to the stack. A special "checkpoint" value, for example0, can also be pushed to the stack, to be able to undo modifications back to this point: this simply consists in marking as unassigned all variables found on the stack before the checkpoint (changing their value is unnecessary, as the value of unassigned variables is irrelevant).
  Modify src/naive.ml into src/arrays.ml
  to use partial assignments as arrays, using the second approach.
  You should witness a significant speedup. To check for correctness
  first compare the traces on simple examples, then you can run
  make PROVER=./arrays test for a thorough test.
  This will also display the performance ratio between your prover
  and minisat, which is about 850 in my case.
  
  It is mandatory that you precisely state
  (1) how your data structures represent a partial assignement and
  (2) what invariant/pre-condition/post-condition holds about your stack
  of modifications through the recursive calls of the dpll
  function.
  
Two watched literals
Two fundamental ideas have brought impressive performance gains for SAT solvers. The first one is conflict-driven clause learning: it is purely logical, and beautiful, but we will not consider it in this project. The second one is two watched literals: it is an algorithmic improvement of the data structure used to store clauses, allowing more efficient unit propagation.
A DPLL solver spends most of its time propagating literals, it is thus important to optimize this step. Unit propagation requires to iterate over unit clauses wrt. the current assignment. A key observation is that, since a clause becomes a unit clause when all but one of its literals are assigned to false, it suffices to "watch" two of its literals to detect when this happens, maintaining the following invariant:
⊢ In a clause that is not satisfied in the current partial assignment, the two watched literals are unassigned.
Then, propagation takes place as follows when a literal L
  becomes false:
- 
      There is no need to consider clauses where Lis not watched. Indeed, these clauses have at least two other unassigned literals: their watched literals.
- 
      We also do not need to consider clauses that are satisfied,
      i.e. containing a literal assigned to true. This includes
      clauses containing the negation of L, watched or not.
- 
      If an unsatisfied clause where Lis watched has only one unassigned literal, it must be its other watched literal, and we can propagate it. If two such propagations are contradictory, we have a conflict.
- 
      If a clause where Lis watched is not a unit clause, then it must have an unassigned literalL'that is not currently watched. We maintain the invariant by changing the watched literals of the clause, replacingLbyL'.
More details, and illustrative examples, may be found in this or that paper.
A key aspect of this technique is that, since the choice of watched literals is arbitrary (as long as the invariant is respected), there is no need to undo the changes of watched literals when backtracking. Thus, we can use an efficient mutable data structure organizing clauses by watched literals, without any cost when backtracking. In practice, we can associate to each literal a doubly-linked list of clauses where the literal is watched, to obtain constant time removal and insertion of clauses.
  Adapt src/arrays.ml
  into src/twl.ml implementing this technique.
  You should notice a significant speedup, although the resulting
  prover should still be around 250 times slower than
  minisat, as indicated by make test.
  
Literal selection heuristics
To further improve performance, you can explore the impact of different literal selection heuristics. You might consider random literal selection, as well as some of the classic heuristics such as MAXO, MOMS, MAMS, etc. They are defined, for example, here.
For this part (and this part only) you will obtain different traces.
Coding tips for part II
Keep your solvers simple
  Do not implement more than what I ask: this means more work but also
  more opportunities for errors.
  For your information, I have 124 lines for my arrays
  solver, and 291 for twl. Both files are self-contained,
  in particular this line count includes the doubly-linked lists for
  twl.
  
Do one thing at a time, then test
It will be easier to debug your code if you change one thing at a time, and test each modification, including a verification that the traces did not change on some well-chosen examples. This will notably prevent unwanted changes of selection strategies. This way you can compare the two provers and check that nothing changes besides performance.
Do not debug using make
  If you encounter a problem, running for example
  make PROBLEM=problems/1/flake4hv PENALTY=1 test_pingouins,
  first identify which program triggers the problem (make runs several
  program one after the other in such targets).
  If it's minisat it means that your problem encoder
  produces an ill-formed output. Otherwise, it's one of your programs:
  focus on debugging its execution; but don't forget to run make
  each time you run it again to test modifications
  (make && ... is your friend).
How to debug when you get an exception
  A backtrace will help a lot.
  Run again the OCaml program with OCAMLRUNPARAM=b in the
  environment. You may also need to recompile with a modified Makefile:
  change the compiler definition to
  OCAMLOPT = ocamlopt -I src -g, then make clean
  and recompile everything.
  When aiming for performance, you may set different compiler options
  in this way.