Added Masking
Added masking to the Linformer. However, this is still a WIP, since masking cannot be done in the traditional sense, like what is done in the attention is all you need paper, because there is an overhead of adding another (n,n)
matrix, which is infeasable.