v0.33.0 - cleaning and documenting
arendsee committed Nov 3, 2020
2 parents 4406775 + 81984d9 commit ffc0006
Showing 30 changed files with 425 additions and 122 deletions.
14 changes: 10 additions & 4 deletions ChangeLog.md
@@ -16,15 +16,21 @@ handling for several very different languages (proofs-of-concept).
- [ ] language support (Python3, R, C++, Java, Haskell, Scheme, Prolog)
- [ ] well tested core libraries

0.34.0 [202x.xx.xx]
-------------------

- [ ] Remove extra space printed at the end of R JSON
- [ ] Remove semicolon requirement

0.33.0 [2020.11.08]
0.33.0 [2020.11.03]
-------------------

First hackage release

- [ ] Remove extra space printed at the end of R JSON
- [ ] Haddock documentation
- [ ] Update README
* Haddock documentation
* Update README
* In help statements write universal, not concrete, types
* Make default containers non-existential (probably a bad decision?)

0.32.0 [2020.11.01]
-------------------
163 changes: 83 additions & 80 deletions README.md
@@ -5,37 +5,85 @@
[![DOI](https://zenodo.org/badge/75355860.svg)](https://zenodo.org/badge/latestdoi/75355860)

`morloc` is a functional programming language where functions are imported from
foreign languages and unified through a common type system. The compiler
foreign languages and unified under a common type system. The compiler
generates the code needed to compose functions across languages and also to
direct automation of mundane tasks such as data validation, type/format
conversions, data caching, distributed computing, and file reading/writing. In
the far future, I hope to develop `morloc` into a query language that returns
optimized programs from an infinite "library" of functions and compositions of
functions.
conversions, data caching, distributed computing, and file reading/writing. The
endgame is to develop `morloc` into a query language that returns optimized
programs from an infinite library of functions and compositions of functions.

See [the manual](https://morloc-project.github.io/docs) for more information.

## Status

This project is under active development and may change radically without
warning. There are lots of bugs. You may peruse the issues page for a small
sample.
If you want to get straight to playing with code, go through the steps in the
installation section and then go to the project in `demo/01_sequence_analysis`.

Pull requests and issue reports are very welcome.
## Status

All development and testing is done in Linux.
This project is under active development with no stability guarantees until the
v1.0 release. Pull requests, issue reports, and private messages are very
welcome.

## Installation

Compile and install the package as so (requires the Haskell utility `stack`):
Compile and install the package (requires the Haskell utility `stack`):

```sh
git clone https://github.com/morloc-project/morloc
cd morloc
stack install --fast
```

## Hello world!
`morloc` also depends on the `JSON::XS` Perl module from CPAN, which can be
installed as follows:

```sh
export PERL_MM_USE_DEFAULT=1
export PERL_CANARY_STABILITY_NOPROMPT=1
sudo perl -MCPAN -e 'install JSON::XS'
```

For Python support, install the `pymorlocinternals` library from PyPI:

```sh
pip install pymorlocinternals
# or on Mac:
pip3 install pymorlocinternals
```

For R support, install the `rmorlocinternals` library from GitHub by running
the following in an R session:

```r
R> install.packages("devtools")
R> devtools::install_github("morloc-project/rmorlocinternals")
```

C++ support currently requires a GNU compiler that supports C++11.

`morloc` modules can be installed from the `morloc`
[library](https://github.com/morloclib) with commands such as:

```sh
morloc install cppbase
morloc install pybase
morloc install rbase
morloc install math
```

The `morloc install` commands will install the modules in the
`$HOME/.morloc/lib` folder.

Finally, if you use Vim, you can install `morloc` syntax highlighting as follows:

``` sh
mkdir -p ~/.vim/syntax/
mkdir -p ~/.vim/ftdetect/
cp vim-syntax/loc.vim ~/.vim/syntax/
echo 'au BufRead,BufNewFile *.loc set filetype=loc' > ~/.vim/ftdetect/loc.vim
```

## Getting Started

```
export hello
@@ -62,19 +110,20 @@ message:
```
$ ./nexus.pl -h
The following commands are exported:
hello [0]
hello
return: Str
```

The `[0]` states the number of arguments that the command `hello` takes.
The `return: Str` phrase states that `hello` returns a string value.

The command is called as so:
The command `hello` can be called as shown below:

```
$ ./nexus.pl hello
Hello World
```

## Simple example using math functions from C++
## Composing C++ Functions

The following code uses only C++ functions (`fold`, `map`, `add` and `mul`).

@@ -97,27 +146,33 @@ morloc install cppbase
morloc make example-1.loc
```

The `install` command clones the `cppbase` repo from github into the
local directory `~/.morloc/lib`. The `make` command will generate a file named
The `install` command clones the `cppbase`
[repo](https://github.com/morloclib/cppbase) from GitHub into the local
directory `~/.morloc/lib`. The `morloc make` command will generate a file named
`nexus.pl`, which is an executable interface to the exported functions.

You can see the exported functions and the number of arguments they take:
You can see typed usage information for the exported functions with the `-h` flag:

```sh
$ ./nexus.pl
The following commands are exported
square [1]
rms [1]
$ ./nexus.pl -h
The following commands are exported:
square
param 1: Num
return: Num
sumOfSquares
param 1: [Num]
return: Num
```

Then you can call the exported functions:
Then you can call the exported functions (arguments are in JSON format):

```sh
$ ./nexus.pl sumOfSquares [1,2,3]
$ ./nexus.pl sumOfSquares '[1,2,3]'
14
```
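To make the composition concrete, here is a plain C++ sketch of what `square` and `sumOfSquares` compute. The body of `example-1.loc` is collapsed in this diff, so these definitions are an assumption about how the two exports are built from `fold`, `map`, `add` and `mul`. The `example` namespace and every function body below are hypothetical stand-ins, not the `cppbase` implementations.

```cpp
#include <functional>
#include <vector>

namespace example {

// Hypothetical stand-ins for the cppbase functions named in the README.
double add(double x, double y) { return x + y; }
double mul(double x, double y) { return x * y; }

// map f xs: apply f to every element of xs.
std::vector<double> map(const std::function<double(double)>& f,
                        const std::vector<double>& xs) {
    std::vector<double> out;
    out.reserve(xs.size());
    for (double x : xs) out.push_back(f(x));
    return out;
}

// fold f init xs: left fold, combining the accumulator with each element.
double fold(const std::function<double(double, double)>& f, double init,
            const std::vector<double>& xs) {
    double acc = init;
    for (double x : xs) acc = f(acc, x);
    return acc;
}

// square x = mul x x   (assumed definition)
double square(double x) { return mul(x, x); }

// sumOfSquares xs = fold add 0 (map square xs)   (assumed definition)
double sumOfSquares(const std::vector<double>& xs) {
    return fold(add, 0.0, map(square, xs));
}

}  // namespace example
```

With this sketch, `example::sumOfSquares({1.0, 2.0, 3.0})` evaluates to 1 + 4 + 9 = 14, which matches the `nexus.pl` output shown above.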

The `nexus.pl` executable dispatches the command to the compiled C++ program, `pool-cpp.out`.
The `nexus.pl` executable dispatches the command to the compiled C++ program,
`pool-cpp.out`.


## Language interop
@@ -226,55 +281,3 @@ The concrete type of `mul` is currently written as a binary function of
doubles. Ideally this function should accept any numbers (e.g., an `int` and a
`double`). I intend to add this functionality eventually, perhaps with a
Haskell-style typeclass system.


## The next level

System F is a solid foundation, but the ultimate goal is to be able to express
deep knowledge about the world. To this end, I am exploring the use of
description logic and ontologies for specifying the relationships between
types. This is the *semantic* layer of the type system.

![The user enters the Morloc script (A), which casts a string (AT1G30270.1) as a TairID and feeds it to the composition function. This function has the type signature (B) and expects input of type BioSeq. The required conversions are automatically performed following the type ontology (C). The conversions are performed by functions with the signatures shown in (D), where ?TairID indicates possible failure. These functions are given the convert role in (E). Since ProteinSeq is a BioSeq, any function of a BioSeq works automatically with ProteinSeq. The Morloc compiler](./figures/case-study.png)

One relation that can be defined between types is `a maps_to b`, which states
that any variable of type `a` can be uniquely converted to a variable of type
`b`, for example, `Int maps_to Double`. Some languages, such as Perl and
JavaScript, do extensive automatic conversions. Perl will happily evaluate the
term `"42" + 1` to 43, for example. In Morloc, these sorts of automatic
conversions are defined in ontologies that can be customized by the programmer.
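The effect resembles a user-extensible table of allowed implicit conversions. As a rough C++ analogy, a trait can record which `maps_to` relations exist and reject undeclared conversions at compile time. This is not how `morloc` implements its ontologies. The names `maps_to` and `convert` are hypothetical.

```cpp
#include <type_traits>

// Hypothetical encoding of the relation `a maps_to b`:
// by default, no conversion is declared between two types.
template <typename A, typename B>
struct maps_to : std::false_type {};

// Declare `Int maps_to Double`, together with the function that performs it.
template <>
struct maps_to<int, double> : std::true_type {
    static double convert(int x) { return static_cast<double>(x); }
};

// Convert only when the relation has been declared; otherwise the
// static_assert rejects the program at compile time.
template <typename B, typename A>
B convert(const A& x) {
    static_assert(maps_to<A, B>::value, "no maps_to relation declared");
    return maps_to<A, B>::convert(x);
}
```

For example, `convert<double>(42) + 1.0` yields `43.0`. Converting in the other direction, with `convert<int>(1.5)`, fails to compile because no `maps_to<double, int>` relation is declared. This loosely mirrors conversions being opt-in rather than built into the language.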

Types can also be specialized with constraints, for example:

```
Count :: x:Int where ( x > 0 )
```
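A constrained type can be pictured as a wrapper that checks its constraint at construction, so every existing value is known to satisfy it. The C++ class below is a hypothetical translation of `Count`. The exception type is an assumption, and this is not what the `morloc` compiler generates.

```cpp
#include <stdexcept>

// Count :: x:Int where ( x > 0 )
// The constraint is checked once, when a Count is constructed.
class Count {
public:
    explicit Count(int x) : value_(x) {
        if (!(x > 0)) {
            throw std::invalid_argument("Count requires x > 0");
        }
    }
    int value() const { return value_; }

private:
    int value_;
};
```

Here `Count(3)` succeeds, while `Count(0)` throws `std::invalid_argument`.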

This can also be used to place constraints on functions. A function is
a compound type that is composed of the types of its inputs, outputs, and
a list of constraints. Here is a signature for a function that generates *n*
random numbers between *a* and *b*.

```
rand :: n:Int -> a:Num -> b:Num -> xs:[c:Num] where (
    n > 0
  , len xs == n
  , c >= a
  , c <= b
);
```

The constraints are optional, and `rand` could instead just be written as:

```
rand :: Int -> Num -> Num -> [Num]
```

Adding the constraints allows:

* Static analysis of the correctness of the program
* Runtime checks of input (if desired; this will be a compiler flag)
* Formal documentation of the behavior of the function
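To illustrate the runtime-check option, here is a hypothetical C++ version of `rand` that enforces the constraints from its signature. It is named `rand_between` to avoid clashing with the C library's `rand`. The `a <= b` check and the fixed default seed are extra assumptions that the signature does not state. `std::uniform_real_distribution` is one reasonable choice of generator.

```cpp
#include <cstddef>
#include <random>
#include <stdexcept>
#include <vector>

// rand :: n:Int -> a:Num -> b:Num -> xs:[c:Num]
//   where (n > 0, len xs == n, c >= a, c <= b)
std::vector<double> rand_between(int n, double a, double b, unsigned seed = 42) {
    // Input constraint from the signature.
    if (!(n > 0)) throw std::invalid_argument("rand: requires n > 0");
    // Extra assumption: the distribution needs a <= b.
    if (a > b) throw std::invalid_argument("rand: requires a <= b");

    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(a, b);
    std::vector<double> xs;
    xs.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; i++) xs.push_back(dist(gen));

    // Output constraints from the signature, checked before returning.
    if (xs.size() != static_cast<std::size_t>(n)) {
        throw std::logic_error("rand: len xs != n");
    }
    for (double c : xs) {
        if (c < a || c > b) throw std::logic_error("rand: element outside [a, b]");
    }
    return xs;
}
```

The `n > 0` check guards the caller's input. The checks on `xs` verify the function's own output against its documented contract, which is the part that static analysis could, in principle, discharge ahead of time.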

The type system is essential for specifying how data is passed between
languages.
2 changes: 1 addition & 1 deletion USAGE
@@ -1,4 +1,4 @@
morloc version 0.32.0
morloc version 0.33.0

Usage:
morloc make [--expression] [--config=<config>] [--vanilla] <script>
6 changes: 6 additions & 0 deletions demos/01_sequence_analysis/Makefile
@@ -0,0 +1,6 @@
all:
	morloc make main.loc

.PHONY: clean
clean:
	rm -rf nexus.pl pool.cpp *out */nexus.pl */*out */pool.cpp
46 changes: 46 additions & 0 deletions demos/01_sequence_analysis/README.md
@@ -0,0 +1,46 @@
# `morloc` DNA sequence example

Before running this example, be sure you have followed the installation
instructions in the top-level README. Then install the `cppbase` morloc module
with `morloc install cppbase`.

First open the `main.loc` file to see the top-level `morloc` program that we
will build. Then build the project as follows:

``` sh
morloc make main.loc
```

This should generate the following files:
* pool.cpp - the generated C++ source code for (de)serialization and function composition
* pool-cpp.out - the compiled code
* nexus.pl - the user interface

Feel free to skim the `pool.cpp` file to see what the morlocks are up to
underground. You can also read the `nexus.pl` script (which is just a Perl
script). Both of these files are generated based on the `morloc` template
`main.loc`.

To access the usage statement, run `./nexus.pl -h`. This will list all exported
commands and the types of their input and output. Currently the help statement
is pretty minimal, but I'll remedy this in the near future.

Commands can be called as follows:

``` sh
$ ./nexus.pl fasta_revcom '"test.fasta"'
">Unicorn
TGTATCTGTATCTGTATCTGTATC
>Dragon
TGTATCTGTATCTGTATCTGTATCTGTATCTGTATCTGTATCTGTATC
"
```

Why the weird quoting? All inputs to a morloc program are (for now) raw JSON
data. Raw string inputs need two levels of quotation since one level is removed
by Bash (hence, '"test.fasta"'). The returned value is also a JSON string, so
it is quoted. The `write_fasta` function could alternatively be written to
print directly to STDOUT instead of returning a string.
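After Bash strips the single quotes, the program receives the JSON text `"test.fasta"`, with the double quotes still in place, and has to decode it into a plain string. The C++ sketch below shows that decoding step for string arguments. `decode_json_string` is a hypothetical, simplified helper that is not part of `morloc`, and it deliberately rejects `\uXXXX` escapes.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Decode a JSON string literal such as "test.fasta" (quotes included)
// into its raw contents. Handles the common escapes, but not \uXXXX.
std::string decode_json_string(const std::string& json) {
    if (json.size() < 2 || json.front() != '"' || json.back() != '"') {
        throw std::invalid_argument("expected a quoted JSON string");
    }
    std::string out;
    // Walk the characters between the opening and closing quotes.
    for (std::size_t i = 1; i + 1 < json.size(); i++) {
        char c = json[i];
        if (c != '\\') { out += c; continue; }
        // A backslash right before the closing quote escapes that quote.
        if (i + 2 >= json.size()) throw std::invalid_argument("dangling escape");
        char e = json[++i];
        switch (e) {
            case '"':  out += '"';  break;
            case '\\': out += '\\'; break;
            case '/':  out += '/';  break;
            case 'n':  out += '\n'; break;
            case 't':  out += '\t'; break;
            case 'r':  out += '\r'; break;
            case 'b':  out += '\b'; break;
            case 'f':  out += '\f'; break;
            default: throw std::invalid_argument("unsupported escape");
        }
    }
    return out;
}
```

If the outer JSON quotes are missing, as happens when only one level of quoting is used on the command line, the helper rejects the argument instead of guessing.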

To learn more about module construction, visit the `bio` and `fasta` modules in
this folder.
31 changes: 31 additions & 0 deletions demos/01_sequence_analysis/bio/bio.hpp
@@ -0,0 +1,31 @@
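#ifndef BIO_HPP
#define BIO_HPP

#include <cstddef>
#include <string>

// Return the reverse complement of a DNA sequence: the sequence is read
// backwards and each base is replaced by its complement (A<->T, G<->C).
// Any other character (e.g., 'N' or a lowercase base) is copied unchanged.
// Marked inline because the function is defined in a header.
inline std::string revcom(const std::string& seq){
    std::size_t N = seq.size();
    std::string revSeq(N, '*');
    for (std::size_t i = 0; i < N; i++){
        // base i of the input lands at position N - i - 1 of the output
        switch(seq[i]){
            case 'A':
                revSeq[N - i - 1] = 'T';
                break;
            case 'T':
                revSeq[N - i - 1] = 'A';
                break;
            case 'G':
                revSeq[N - i - 1] = 'C';
                break;
            case 'C':
                revSeq[N - i - 1] = 'G';
                break;
            default:
                revSeq[N - i - 1] = seq[i];
                break;
        }
    }
    return revSeq;
}

#endif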
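For comparison, the same transformation can be written with standard algorithms: reverse the string, then complement each base. This is an alternative sketch, not the code in this commit. `complement` and `revcom_std` are hypothetical names, and the function matches `revcom` in passing unrecognized characters through unchanged.

```cpp
#include <algorithm>
#include <string>

// Complement a single base; characters other than A, T, G, C pass through.
inline char complement(char base) {
    switch (base) {
        case 'A': return 'T';
        case 'T': return 'A';
        case 'G': return 'C';
        case 'C': return 'G';
        default:  return base;
    }
}

// Reverse the sequence, then complement every base in place.
inline std::string revcom_std(std::string seq) {
    std::reverse(seq.begin(), seq.end());
    std::transform(seq.begin(), seq.end(), seq.begin(), complement);
    return seq;
}
```

Because the argument is taken by value, both steps operate on a copy and the caller's string is left untouched. For example, `revcom_std("GATACA")` gives `"TGTATC"`.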
11 changes: 11 additions & 0 deletions demos/01_sequence_analysis/bio/main.loc
@@ -0,0 +1,11 @@
module bio {

export revcom

source Cpp from "bio.hpp" ("revcom")

-- Take the reverse complement of a DNA sequence
revcom :: Str -> Str;
revcom Cpp :: "std::string" -> "std::string";

}
11 changes: 11 additions & 0 deletions demos/01_sequence_analysis/bio/package.yaml
@@ -0,0 +1,11 @@
name: bio
version: 0.0.0
homepage: null
synopsis: bioinformatics module
description: Basic bioinformatics functions for genome analysis and all that.
category: bioinformatics
license: MIT
author: "Zebulun Arendsee"
maintainer: "[email protected]"
github: null
bug-reports: null