Advanced Scala: cats study notes (part 1)

Underscore.io has open-sourced their books, including "Advanced Scala with Cats", "Essential Slick", "Essential Lift" and "Essential Play". I am taking the opportunity to study the cats library and broaden my horizons 😂.

So what is cats? It is a functional programming library for Scala 😈.

Type Class

Type classes are a programming pattern originating in Haskell. They allow us to extend existing libraries with new functionality, without using traditional inheritance and without altering the original library source code.

Anatomy of a Type Class

There are three important components to the type class pattern:

  1. the type class itself
  2. instances for particular types
  3. the interface methods that we expose to users


The Type Class

A type class is an interface or API that represents some functionality we want to implement. In Cats a type class is represented by a trait with at least one type parameter (note: a type parameter, not a value parameter). For example, we can define a very simple JSON AST and represent generic “serialize to JSON” behaviour as follows:

// Define a very simple JSON AST (abstract syntax tree)
sealed trait Json
final case class JsObject(get: Map[String, Json]) extends Json
final case class JsString(get: String) extends Json
final case class JsNumber(get: Double) extends Json
// "serialize to JSON" behaviour
// This trait is the actual type class
trait JsonWriter[A] {
  def write(value: A): Json
}
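The trait above is only the first of the three components. A minimal sketch of the other two (instances for concrete types, and an interface exposed to users) follows, written with plain Scala implicits rather than any Cats API. The Person case class, the object names JsonWriterInstances and JsonSyntax, and the sample values are hypothetical choices for illustration.

```scala
// JSON AST and type class, repeated so this block stands alone
sealed trait Json
final case class JsObject(get: Map[String, Json]) extends Json
final case class JsString(get: String) extends Json
final case class JsNumber(get: Double) extends Json

trait JsonWriter[A] {
  def write(value: A): Json
}

// Hypothetical domain type used only for this example
final case class Person(name: String, email: String)

// Component 2: type class instances for concrete types
object JsonWriterInstances {
  implicit val stringWriter: JsonWriter[String] =
    new JsonWriter[String] {
      def write(value: String): Json = JsString(value)
    }

  implicit val personWriter: JsonWriter[Person] =
    new JsonWriter[Person] {
      def write(value: Person): Json =
        JsObject(Map(
          "name"  -> JsString(value.name),
          "email" -> JsString(value.email)
        ))
    }
}

// Component 3a: interface object, the instance is found by implicit resolution
object Json {
  def toJson[A](value: A)(implicit w: JsonWriter[A]): Json = w.write(value)
}

// Component 3b: interface syntax, an extension method added via an implicit class
object JsonSyntax {
  implicit class JsonWriterOps[A](value: A) {
    def toJson(implicit w: JsonWriter[A]): Json = w.write(value)
  }
}

import JsonWriterInstances._
import JsonSyntax._

val dave = Person("Dave", "dave@example.com")
val viaObject: Json = Json.toJson(dave) // interface object
val viaSyntax: Json = dave.toJson       // extension-method syntax
```

Both call styles pick up the same implicit instance. Serializing a type with no JsonWriter in scope fails at compile time rather than at run time.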

Warts of the Scala Programming Language (Chinese translation)

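Translated from lihaoyi's article (a VPN may be needed to access it from mainland China).
Original translation; please contact the translator before reposting.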

    
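    Scala is currently my favorite general-purpose programming language. However, it has its flaws. Some of its design decisions are carefully weighed trade-offs and others are experimental, but for some of them the frustration caused by silly problems far outweighs their benefits: these are the warts (the author compares the language's failures to warts; in Chinese the word 糟粕, "dross", captures the idea better). This post describes what I consider to be Scala's warts, in the hope of raising awareness of these problems and of rallying the wider community to fix them.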

About the Author: Haoyi is a software engineer, an early contributor to Scala.js, and the author of many open-source Scala tools such as the Ammonite REPL and FastParse.
If you’ve enjoyed this blog, or enjoyed using Haoyi’s other open source libraries, please chip in (or get your company to chip in!) via Patreon so he can continue his open-source work.

Algorithms in Scala: List, Stream, and Scala solutions to some problems from 剑指Offer (Coding Interviews)

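Stream (immutable)

Stream is a lazy list. Its implementation relies on techniques such as lazy evaluation and by-name parameters (for details see the Wikipedia article on evaluation strategies).
List and Stream are Scala's two representative immutable functional data structures for strict and non-strict evaluation, respectively.
Consider string concatenation: "foo"+"bar" yields "foobar", and the empty string "" is the identity element of this operation (also called the neutral or unit element in mathematics), meaning that both s+"" and ""+s equal s. When concatenating three strings s+t+r, the operation is associative, so ((s+t)+r) and (s+(t+r)) give the same result.
In functional programming, this kind of algebra is called a monoid. The associativity and identity laws are called the monoid laws.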
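The two monoid laws can be written down directly. This is a minimal sketch assuming a hand-rolled Monoid trait in the style of Functional Programming in Scala, with members named op and zero (Cats' own Monoid names them combine and empty). The helper names associativityHolds and identityHolds are hypothetical.

```scala
// A hand-rolled monoid: an associative binary operation plus an identity element
trait Monoid[A] {
  def op(a1: A, a2: A): A
  def zero: A
}

// String concatenation, with "" as the identity
val stringMonoid: Monoid[String] = new Monoid[String] {
  def op(a1: String, a2: String): String = a1 + a2
  val zero: String = ""
}

// Integer addition, with 0 as the identity
val intAddition: Monoid[Int] = new Monoid[Int] {
  def op(a1: Int, a2: Int): Int = a1 + a2
  val zero: Int = 0
}

// Associativity law on sample values: (x op y) op z == x op (y op z)
def associativityHolds[A](m: Monoid[A])(x: A, y: A, z: A): Boolean =
  m.op(m.op(x, y), z) == m.op(x, m.op(y, z))

// Identity law on a sample value: x op zero == x and zero op x == x
def identityHolds[A](m: Monoid[A])(x: A): Boolean =
  m.op(x, m.zero) == x && m.op(m.zero, x) == x
```

These functions only check the particular values passed in. A property-based testing library would check the laws against many generated inputs.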

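Foldable data structures

List, Stream, Tree, Vector and similar structures are all foldable data structures. Monoids are closely related to fold operations.
For example, given words = List("aa","bb","cc"), the two folds expand as follows: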

words.foldRight("")((a,b)=>a+b) == "aa"+("bb"+("cc"+""))
words.foldLeft("")((a,b)=>a+b) == ((""+"aa")+"bb")+"cc"
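String concatenation is a monoid, so folding from either side with "" as the start value gives the same string. The short sketch below redefines a hand-rolled Monoid (op/zero) so that it stands alone. foldMap is a hypothetical helper in the style of Functional Programming in Scala: it maps each element into a monoid and then combines the results.

```scala
trait Monoid[A] {
  def op(a1: A, a2: A): A
  def zero: A
}

val stringMonoid: Monoid[String] = new Monoid[String] {
  def op(a1: String, a2: String): String = a1 + a2
  val zero: String = ""
}

val intAddition: Monoid[Int] = new Monoid[Int] {
  def op(a1: Int, a2: Int): Int = a1 + a2
  val zero: Int = 0
}

// Map every element into the monoid B, then combine left to right starting from zero
def foldMap[A, B](as: List[A], m: Monoid[B])(f: A => B): B =
  as.foldLeft(m.zero)((acc, a) => m.op(acc, f(a)))

val words = List("aa", "bb", "cc")
// Associativity + identity make the fold direction irrelevant to the result
val left  = words.foldLeft(stringMonoid.zero)(stringMonoid.op)
val right = words.foldRight(stringMonoid.zero)(stringMonoid.op)
// Total number of characters, computed with the Int-addition monoid
val lengths = foldMap(words, intAddition)(_.length)
```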

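First implement Stream.fold, then use fold to implement higher-order functions such as map, filter and flatMap. There is no real need to write every other function using fold alone, much as there is no need to restrict yourself to an unmarked straightedge and compass in geometric constructions. This is done purely for fun; there is No Silver Bullet. What are map and flatMap? A Functor is a generalization of map, and a Monad is a generalization of flatMap (for the related concepts see Functional Programming in Scala).
The fold implementation (tail-recursive):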

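// Assumed context: a member of `sealed trait Stream[+A] { self => ... }`
// with cases Empty and Cons(h: () => A, t: () => Stream[A])
import scala.annotation.tailrec

def fold[B](z: => B)(f: (A, B) => B): B = {
  // The accumulator is strict: a by-name accumulator would build a chain of
  // unevaluated thunks that could overflow the stack when finally forced
  @tailrec
  def loop(stream: Stream[A], acc: B): B = stream match {
    case Empty      => acc
    case Cons(h, t) => loop(t(), f(h(), acc)) // force head and tail, fold head-first
  }
  loop(self, z)
}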
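A left fold like the one above visits elements head-first. Building a new stream with it would therefore reverse the order and force the whole stream. A common alternative, swapped in here, is the lazy foldRight from Functional Programming in Scala. Its combining function receives the rest of the fold by name, so map, filter and flatMap can all be written with it while staying lazy. The type and constructor names (LazyStream, Empty, Cons, cons) are hypothetical and chosen to avoid clashing with the standard library.

```scala
sealed trait LazyStream[+A] {
  // Lazy right fold: `f` gets the rest of the fold by name and may never force it
  def foldRight[B](z: => B)(f: (A, => B) => B): B = this match {
    case Cons(h, t) => f(h(), t().foldRight(z)(f))
    case Empty      => z
  }

  def map[B](g: A => B): LazyStream[B] =
    foldRight(LazyStream.empty[B])((a, acc) => LazyStream.cons(g(a), acc))

  def filter(p: A => Boolean): LazyStream[A] =
    foldRight(LazyStream.empty[A])((a, acc) => if (p(a)) LazyStream.cons(a, acc) else acc)

  def append[B >: A](other: => LazyStream[B]): LazyStream[B] =
    foldRight(other)((a, acc) => LazyStream.cons(a, acc))

  def flatMap[B](g: A => LazyStream[B]): LazyStream[B] =
    foldRight(LazyStream.empty[B])((a, acc) => g(a).append(acc))

  // Forces every element
  def toList: List[A] = foldRight(List.empty[A])((a, acc) => a :: acc)
}
case object Empty extends LazyStream[Nothing]
final case class Cons[+A](h: () => A, t: () => LazyStream[A]) extends LazyStream[A]

object LazyStream {
  // Smart constructor: lazy vals memoize head and tail so each is computed at most once
  def cons[A](hd: => A, tl: => LazyStream[A]): LazyStream[A] = {
    lazy val head = hd
    lazy val tail = tl
    Cons(() => head, () => tail)
  }
  def empty[A]: LazyStream[A] = Empty
  def apply[A](as: A*): LazyStream[A] =
    if (as.isEmpty) empty else cons(as.head, apply(as.tail: _*))
}

val s = LazyStream(1, 2, 3, 4)
val doubled = s.map(_ * 2).toList // List(2, 4, 6, 8)
```

Because cons takes its arguments by name, calling map alone does not run the mapping function. The function runs only when an element is demanded, for example by toList.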

Possibly the simplest and most elegant ORM framework ever: Quill (Scala-based)

Quill provides a Quoted Domain Specific Language (QDSL) to express queries in Scala and execute them in a target language. The library’s core is designed to support multiple target languages, currently featuring specializations for Structured Query Language (SQL) and Cassandra Query Language (CQL).

Admittedly the title of this post is a bit of clickbait, but Quill really is excellent.

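The above is the official introduction. Quill provides a very elegant and comfortable DSL that is quick to pick up. The database used in this post is MySQL, and the data comes from 《SQL学习指南》 (the Chinese edition of Learning SQL); the database script is here. MyBatis and Hibernate are not discussed further in this post.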

Writing blog posts is also a learning process.

scala - Future and Promise

First, let's clarify synchronous vs. asynchronous and blocking vs. non-blocking:
Asynchronous vs. Synchronous

A method call is considered synchronous if the caller cannot make progress until the method returns a value or throws an exception. On the other hand, an asynchronous call allows the caller to progress after a finite number of steps, and the completion of the method may be signalled via some additional mechanism (it might be a registered callback, a Future, or a message).
A synchronous API may use blocking to implement synchrony, but this is not a necessity. A very CPU intensive task might give a similar behavior as blocking. In general, it is preferred to use asynchronous APIs, as they guarantee that the system is able to progress.

Non-blocking vs. Blocking

We talk about blocking if the delay of one thread can indefinitely delay some of the other threads. A good example is a resource which can be used exclusively by one thread using mutual exclusion. If a thread holds on to the resource indefinitely (for example accidentally running an infinite loop) other threads waiting on the resource can not progress. In contrast, non-blocking means that no thread is able to indefinitely delay others.
Non-blocking operations are preferred to blocking ones, as the overall progress of the system is not trivially guaranteed when it contains blocking operations.

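The passages above are taken from the Akka documentation. A method is called synchronous because its caller cannot make progress until the method returns a value or throws an exception. An asynchronous call, by contrast, lets the caller continue after a finite number of steps, and the method's completion is signalled through an additional mechanism such as a callback, a Future or a message. If the delay of one thread can indefinitely delay some other threads, we have blocking. One example is a resource held exclusively by one thread: the other threads must wait for that thread to release the resource before they can continue.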

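In Scala, Future and Promise execute non-blockingly. A result can be obtained through a callback, but it can also be retrieved sequentially by blocking (for example with Await.result).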
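Both styles appear in the small standard-library sketch below. It registers a callback with onComplete (non-blocking), completes a Promise from another task, and composes futures in a for-comprehension without blocking. Finally it calls Await.result, which blocks the current thread. The values and the 5-second timeout are arbitrary choices for illustration.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Success}

// Future: the computation runs asynchronously on the global execution context
val f: Future[Int] = Future { 21 * 2 }

// Non-blocking: register a callback; the caller continues immediately
f.onComplete {
  case Success(v) => println(s"callback got $v")
  case Failure(e) => println(s"failed: ${e.getMessage}")
}

// Promise: a writable, single-assignment container; p.future is its read-only view
val p = Promise[String]()
val pf: Future[String] = p.future
Future { p.success("done") } // completed from another task

// Composition stays non-blocking: a new Future is built from the two
val combined: Future[String] = for {
  n <- f
  s <- pf
} yield s"$s: $n"

// Blocking: Await.result parks the current thread until the result arrives (or times out)
val result: String = Await.result(combined, 5.seconds)
```

Await.result is convenient in scripts and tests. In application code, the callback and composition style is generally preferred, in line with the Akka guidance quoted above. A Promise can be completed only once, so a later trySuccess returns false.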

K-means algorithm

The code is rather poorly written and verbose -_-#:

import java.awt.Color
import scala.util.Random
import breeze.plot._

object Main extends App {
  type Point = (Int, Int)

  // Squared Euclidean distance (enough to decide which center is nearest)
  def computeDistance(item: Point, mu: Point): Int =
    (item._1 - mu._1) * (item._1 - mu._1) + (item._2 - mu._2) * (item._2 - mu._2)

  // 100 random points with coordinates in [0, 100)
  val data: List[Point] = List.fill(100)((Random.nextInt(100), Random.nextInt(100)))

  // Initialize the three cluster centers from (randomly generated) data points
  var centroids: Vector[Point] = Vector(data(30), data(60), data(90))
  var clusters: Vector[List[Point]] = Vector.fill(3)(Nil)

  // Index of the nearest center; ties go to the lower index, so no point is dropped
  def nearest(p: Point): Int =
    centroids.indices.minBy(i => computeDistance(p, centroids(i)))

  for (_ <- 1 to 10) {
    // Assignment step: reassign every point to its nearest center
    clusters = Vector.tabulate(3)(k => data.filter(p => nearest(p) == k))
    // Update step: move each center to the integer mean (geometric center) of its cluster;
    // an empty cluster keeps its old center instead of dividing by zero
    centroids = clusters.zip(centroids).map { case (pts, old) =>
      if (pts.isEmpty) old
      else (pts.map(_._1).sum / pts.length, pts.map(_._2).sum / pts.length)
    }
  }

  // Plot the final clusters, one color per cluster
  val f = Figure()
  val p = f.subplot(0)
  for ((pts, color) <- clusters.zip(Seq(Color.BLACK, Color.RED, Color.YELLOW)))
    p += scatter(pts.map(_._1), pts.map(_._2), size = _ => 1.0, colors = _ => color)
  f.saveas("ha.png")
}
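The K-means program always runs 10 iterations and mixes the algorithm with plotting. The sketch below is a pure, Breeze-free version of the same two steps (assign, then update), i.e. Lloyd's algorithm. Instead of a fixed iteration count, it stops once the centers no longer move, with maxIter as a safety cap. It uses Double coordinates so the means are exact. The names sqDist, step and kMeans are hypothetical.

```scala
type Point = (Double, Double)

// Squared Euclidean distance
def sqDist(a: Point, b: Point): Double = {
  val dx = a._1 - b._1
  val dy = a._2 - b._2
  dx * dx + dy * dy
}

// One iteration: assign each point to its nearest center, then move each center to its cluster mean
def step(data: Seq[Point], centroids: Vector[Point]): Vector[Point] = {
  val groups = data.groupBy(p => centroids.indices.minBy(i => sqDist(p, centroids(i))))
  centroids.indices.toVector.map { i =>
    groups.get(i) match {
      case Some(pts) => (pts.map(_._1).sum / pts.size, pts.map(_._2).sum / pts.size)
      case None      => centroids(i) // empty cluster: keep the old center
    }
  }
}

// Repeat until the centers stop moving or maxIter iterations have run
@annotation.tailrec
def kMeans(data: Seq[Point], centroids: Vector[Point], maxIter: Int): Vector[Point] = {
  val next = step(data, centroids)
  if (next == centroids || maxIter <= 1) next else kMeans(data, next, maxIter - 1)
}

// Two obvious groups of points
val data = Seq((0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0))
val centers = kMeans(data, Vector((0.0, 0.0), (10.0, 10.0)), maxIter = 100)
```

Keeping step pure makes it easy to test, and the random data generation and plotting can stay in Main.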

The clusters plotted by Main (saved as ha.png) are shown in the figure below:

Logistic regression for classification problems

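Although logistic regression has "regression" in its name, it is actually a classification method, used for binary classification problems (i.e. problems with two possible outputs). We need a prediction function whose output can be mapped to two values (one per class), so we use the logistic (sigmoid) function

$$g(z) = \frac{1}{1 + e^{-z}}$$

whose graph is an S-shaped curve rising from 0 to 1 and passing through g(0) = 0.5.

For a linear decision boundary:

$$\theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n = \sum_{i=0}^{n} \theta_i x_i = \theta^T x \qquad (x_0 = 1)$$

Construct the prediction function, whose value is interpreted as the probability that y = 1 (predict class 1 when it is at least 0.5):

$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$

Construct the cost function:

$$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log\left(h_\theta(x)\right) & \text{if } y = 1 \\ -\log\left(1 - h_\theta(x)\right) & \text{if } y = 0 \end{cases}$$

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$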
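Once the cost function is defined, θ can be fitted by minimizing it. The sketch below assumes plain batch gradient descent, since the post does not say which optimizer it uses, and it uses Scala Vectors instead of Breeze. The function names, the learning rate and the toy data set are hypothetical. The update θ_j := θ_j - α·(1/m)·Σ(h_θ(x) - y)·x_j is the standard gradient step for the cross-entropy cost of logistic regression.

```scala
// Sigmoid g(z) = 1 / (1 + e^(-z))
def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

// h_theta(x) = g(theta^T x); every x carries a leading 1.0 for the intercept theta_0
def hypothesis(theta: Vector[Double], x: Vector[Double]): Double =
  sigmoid(theta.zip(x).map { case (t, xi) => t * xi }.sum)

// J(theta) = -(1/m) * sum(y log h + (1 - y) log(1 - h))
def cost(theta: Vector[Double], xs: Seq[Vector[Double]], ys: Seq[Double]): Double = {
  val m = xs.size
  -xs.zip(ys).map { case (x, y) =>
    val h = hypothesis(theta, x)
    y * math.log(h) + (1 - y) * math.log(1 - h)
  }.sum / m
}

// Batch gradient descent starting from theta = 0
def train(xs: Seq[Vector[Double]], ys: Seq[Double], alpha: Double, iters: Int): Vector[Double] = {
  val m = xs.size
  val n = xs.head.size
  (1 to iters).foldLeft(Vector.fill(n)(0.0)) { (theta, _) =>
    val errors = xs.zip(ys).map { case (x, y) => hypothesis(theta, x) - y }
    Vector.tabulate(n)(j => theta(j) - alpha * xs.zip(errors).map { case (x, e) => e * x(j) }.sum / m)
  }
}

// Class 1 when the predicted probability is at least 0.5
def predict(theta: Vector[Double], x: Vector[Double]): Int =
  if (hypothesis(theta, x) >= 0.5) 1 else 0

// Toy 1-D data (plus intercept column): negatives are class 0, positives class 1
val xs = Seq(-3.0, -2.0, -1.0, 1.0, 2.0, 3.0).map(x => Vector(1.0, x))
val ys = Seq(0.0, 0.0, 0.0, 1.0, 1.0, 1.0)
val theta = train(xs, ys, alpha = 0.1, iters = 1000)
```

On linearly separable data like this toy set, the weights keep growing as training continues. In practice a regularization term or an iteration cap (as here) keeps them bounded.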

Scala matrix library: learning and using breeze

import breeze.linalg._
import breeze.numerics._
import breeze.stats._
import breeze.plot._

Breeze is very handy: it covers linear algebra, data visualization and more.

Plotting Data

Line plots and scatter plots

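import scala.util.Random
import java.awt.Color

val f = Figure()
val p2 = f.subplot(0)
// line plot: x = 1..100, y = random integers in [0, 2x)
p2 += plot(1 to 100, (1 to 100).map(i => Random.nextInt(i * 2)))
// scatter plot
// the third argument (size) maps each point index to its size; 0.5 for every point here
// colors maps each point index to its color
p2 += scatter(1 to 100, (1 to 100).map(i => Random.nextInt(i)), _ => 0.5, colors = _ => Color.BLACK)
// curve y = 15 cos(x) + 50 sampled every 0.1 on [1, 100];
// Double ranges such as `1.0 to 100 by 0.1` are deprecated, so x is built from an Int range
val xs = (10 to 1000).map(_ / 10.0)
p2 += plot(xs, xs.map(x => 15 * math.cos(x) + 50))
f.saveas("hah.png")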