This repository has been archived by the owner on Oct 16, 2024. It is now read-only.

DLFS - A FileSystem API wrapper over dlog API #227

Closed
wants to merge 7 commits into from

sijie
Member

@sijie sijie commented Oct 24, 2017

Descriptions of the changes in this PR:

  • FileSystem API wrapper built over dlog API

(This is based on initial implementation from @gerritsundaram at #43)

Features supported:

  • create and append files
  • open files for reading
  • input stream and output stream for reading and writing data
  • list files
  • get file status
  • rename
  • mkdir

Features not supported:

  • truncate
  • there is currently no clear distinction between files and directories
  • delete is supported only in recursive mode

(This change includes small changes for #224, #225, and #226.)

@sijie sijie added this to the 0.6.0 milestone Oct 24, 2017
@sijie sijie self-assigned this Oct 24, 2017
@sijie sijie requested review from mgodave and jiazhai October 24, 2017 10:16
jiazhai pushed a commit that referenced this pull request Oct 24, 2017
Descriptions of the changes in this PR:

reuse the methods used by `rename` to create missing path components.

(the test is covered by #227)

Author: Sijie Guo <[email protected]>

Reviewers: Jia Zhai <None>

This closes #228 from sijie/fix_create_log_pr
jiazhai pushed a commit that referenced this pull request Oct 24, 2017
Descriptions of the changes in this PR:

exclude `<default>` from listing logs

(the tests are covered by #227)

Author: Sijie Guo <[email protected]>

Reviewers: Jia Zhai <None>

This closes #229 from sijie/fix_listing_log_pr, closes #224

@ivankelly ivankelly left a comment

This is very cool. A few minor comments.

confLocal.addConfiguration(dlConf);
confLocal.setEnsembleSize(replication);
confLocal.setWriteQuorumSize(replication);
confLocal.setAckQuorumSize(replication);

ack quorum == write quorum?

Member Author

The HDFS API has only one replication factor, so I am making ack quorum == write quorum to match the behavior of HDFS.
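As a rough illustration of that mapping, the sketch below derives all three settings from a single replication factor. `QuorumSettings` and its methods are hypothetical names used only for this example; they are not part of the dlog or BookKeeper APIs.

```java
/** Hypothetical holder for the three quorum settings discussed above. */
final class QuorumSettings {
    final int ensembleSize;
    final int writeQuorumSize;
    final int ackQuorumSize;

    private QuorumSettings(int ensembleSize, int writeQuorumSize, int ackQuorumSize) {
        this.ensembleSize = ensembleSize;
        this.writeQuorumSize = writeQuorumSize;
        this.ackQuorumSize = ackQuorumSize;
    }

    /** HDFS-style mapping: one replication factor drives all three settings. */
    static QuorumSettings fromReplication(short replication) {
        if (replication < 1) {
            throw new IllegalArgumentException("replication must be >= 1, got " + replication);
        }
        return new QuorumSettings(replication, replication, replication);
    }

    /** Replicas in the write quorum that may lag while a write is still acknowledged. */
    int replicasNotNeededForAck() {
        return writeQuorumSize - ackQuorumSize;
    }
}
```

With ack quorum equal to write quorum, `replicasNotNeededForAck()` is always 0: every replica in the write quorum has to acknowledge an entry before the write completes, which is the trade-off the question above points at.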

Progressable progressable) throws IOException {
// for overwrite, delete the existing file first.
if (overwrite) {
delete(path, false);

what happens in the case that create fails but you managed to delete the file?

Member Author

The current approach doesn't guarantee any atomicity. This can be improved if the dlog API supports opening a log stream in overwrite mode.
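To make the failure window concrete, here is a minimal in-memory sketch of the delete-then-create sequence. `OverwriteSketch` is a hypothetical stand-in for the real file system, and `failNextCreate` is a made-up hook that simulates a create failure; neither exists in the PR.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of a non-atomic overwrite. */
final class OverwriteSketch {
    private final Map<String, byte[]> files = new HashMap<>();
    private boolean failNextCreate; // hook that simulates a create failure

    void put(String path, byte[] data) {
        files.put(path, data);
    }

    boolean exists(String path) {
        return files.containsKey(path);
    }

    void failNextCreate() {
        failNextCreate = true;
    }

    /** Mirrors the reviewed code path: delete first, then create. */
    void create(String path, boolean overwrite) throws IOException {
        if (overwrite) {
            files.remove(path); // step 1: the old file is gone from here on
        }
        if (failNextCreate) {
            failNextCreate = false;
            throw new IOException("simulated create failure for " + path);
        }
        if (files.containsKey(path)) {
            throw new IOException("file already exists: " + path);
        }
        files.put(path, new byte[0]); // step 2: the new, empty file
    }
}
```

If step 2 fails after step 1 has succeeded, the caller gets an exception and the path no longer exists at all, which is the case raised in the question above.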

Member Author

Created an issue for this: #231

}

@Override
public boolean truncate(Path f, long newLength) throws IOException {

How can this even be supported without reading the whole log and rewriting? Is this used much in any case? Seems to me like a map reduce specific thing.

Member Author

truncate(f, newLength) is a POSIX filesystem semantic. I see it used in multiple places: when people want to delete potentially corrupted data from a file (e.g. due to application crashes), they use truncate.

In dlog, we support truncating the head. In a filesystem, truncate removes the tail. It is actually easy to support: whenever a truncate is issued, it should close the active segment (ledger) and, based on the length, find which segments can be fully deleted and which can be partially deleted. For segments that can be fully deleted, just delete the ledgers; for segments that can be partially deleted, update the end position (DLSN) in the segments' metadata.

All the metadata statuses and mechanisms are sort of ready in dlog, since it supports truncating the head. Adding tail truncation can reuse those statuses.
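The segment selection described above could look roughly like the sketch below. It is a simplification under stated assumptions: segments are modeled with byte offsets instead of DLSNs, and `TailTruncateSketch`, `Segment`, and `Plan` are hypothetical names, not dlog classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical planner for tail truncation over closed log segments. */
final class TailTruncateSketch {

    /** A closed segment covering bytes [startOffset, startOffset + length). */
    static final class Segment {
        final long id;
        final long startOffset;
        final long length;

        Segment(long id, long startOffset, long length) {
            this.id = id;
            this.startOffset = startOffset;
            this.length = length;
        }
    }

    /** Result of planning: segments to drop and the one segment to shorten. */
    static final class Plan {
        final List<Long> deleteFully = new ArrayList<>();
        long partialSegmentId = -1;  // -1 means no segment is shortened
        long partialNewLength = -1;  // new length of the shortened segment
    }

    static Plan plan(List<Segment> segments, long newLength) {
        if (newLength < 0) {
            throw new IllegalArgumentException("newLength must be >= 0");
        }
        Plan plan = new Plan();
        for (Segment s : segments) {
            long end = s.startOffset + s.length;
            if (s.startOffset >= newLength) {
                // Entirely past the new end of file: delete the whole ledger.
                plan.deleteFully.add(s.id);
            } else if (end > newLength) {
                // Straddles the new end: only the end position in metadata changes.
                plan.partialSegmentId = s.id;
                plan.partialNewLength = newLength - s.startOffset;
            }
            // Segments ending at or before newLength are left untouched.
        }
        return plan;
    }
}
```

The planner assumes the active segment has already been closed, as the comment above requires, so every segment has a known length.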

Member Author

Created an issue for this: #232

@Override
public void flush() throws IOException {
try {

empty line looks like you wanted to say more here.

Member Author

Ah, those empty lines were left over because I abstracted the logic into the writeControlRecord method. I will remove them.

@sijie
Member Author

sijie commented Oct 26, 2017

@jiazhai can you review this again?

}
}

//
Member

@jiazhai jiazhai Oct 27, 2017

Seems only truncate is not supported now?

Member Author

nice catch. I will update the comments.

Member

@jiazhai jiazhai left a comment

+1, LGTM.


writerOutputBufferSize=131072

numWorkerThreads=1
Member

What is the reason to set this as "1" instead of the default value?

Member Author

Oh, this configuration is used only for testing, so I set it to 1 thread.

@sijie
Member Author

sijie commented Oct 27, 2017

addressed the comments.
