Home
This wiki is a place where anyone can share links, recipes, and guides with other FieldDB users. How to edit the wiki
FieldDB is a free, open source project developed collectively by field linguists and software developers to use the latest BigData and Cloud technology to make a modular, user-friendly app which can be used to collect, search and share your data.
- FieldDB is a Chrome app, which means it works on Windows, Mac, Linux, Android, iPad, and also offline.
- Multiple collaborators can add to the same corpus, and you can encrypt any piece of data, keep it private within your corpus, or make it public to share with the community and other researchers.
FieldDB uses machine learning and computational linguistics to adapt to the way you already organize the data you import, and to predict how to gloss it. FieldDB already supports import and export of many common formats, including ELAN, Praat, Toolbox, FileMaker Pro, LaTeX, XML, CSV and more. If you have another format you'd like to import or export, Contact Us.
We designed FieldDB from the ground up to be user-friendly, but also to conform to EMELD and DataOne best practices on formatting, archiving, open access, and security. For more information, see the data management sections of our white paper. We vow never to use your private data: you can find out more in our Privacy Policy.
FieldDB/LingSync is a free, open source project that only improves when people are working on it! We need your help growing the community of people working on LingSync and enabling them to devote more time to it. Find out how to do this in How to Help.
For more information, see our list of frequently asked questions. Let us know if you think of a question that should be added to the FAQ.
To find out more about the features and to install it, visit its website at http://lingsync.org
- M.E. Cathcart (U Delaware)
- Gina Cook (iLanguage Lab Ltd)
- Theresa Deering (iLanguage Lab Ltd)
- Yuliya Manyakina (Stony Brook)
- Elise McClay (McGill)
- Hisako Noguchi (Concordia)
- Similar Software for Fieldlinguistics
- Similar Software for sharing data
- Software which can be used for the Auto Glosser Module
We have some small funding TBA
This project is released under the Apache 2.0 license, a very permissive open source license which basically says you can adapt the code to any use you see fit.
Functional requirements, put simply, are buttons and features that do something for the user.
- The system shall allow for data entry like all other field linguistics databases.
- The system shall allow dragging and dropping audio files to attach them to your data.
- The system shall support inserting IPA and special symbols.
- The system shall support hotkeys and keyboard shortcuts for common tasks.
- The system shall import .csv files.
- The system shall import ELAN files.
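As a rough illustration of the .csv import requirement above, here is a minimal Python sketch. The function name `import_csv`, the column names (`utterance`, `gloss`, `translation`) and the sample rows are assumptions made for this example; they are not FieldDB's actual schema or API.

```python
import csv
import io


def import_csv(text):
    """Parse CSV text into a list of datum dictionaries.

    Hypothetical helper for illustration only. The header row supplies
    the field names, so no data categories are fixed in advance.
    """
    reader = csv.DictReader(io.StringIO(text, newline=""))
    data = []
    for row in reader:
        # Keep only named columns that have a non-empty value.
        datum = {
            key.strip(): value.strip()
            for key, value in row.items()
            if key and value and value.strip()
        }
        if datum:
            data.append(datum)
    return data


# Illustrative input: the second row has no gloss yet.
sample = (
    "utterance,gloss,translation\n"
    "los gatos,the.PL cat.PL,the cats\n"
    "casa,,house\n"
)
for datum in import_csv(sample):
    print(datum)
```

Empty cells are dropped rather than stored as empty strings, so a datum only carries the fields the user actually filled in.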
Non-functional requirements, put simply, are overarching considerations which affect the software architecture. Non-functional requirements are where most existing field linguistics and corpus linguistics database applications fall short.
- The system shall be user friendly.
- The system shall be cute so that the users feel like entering data, instead of procrastinating.
- The system shall be designed for search, as this is one of the most important tasks that a field linguist needs to do. Search needs go far beyond traditional string matches and database indexes: we need to see data in context, and the search must attempt to make this possible.
- The system shall be OpenData enabled. Corpora often contain sensitive information, informant stories and other information which must be kept confidential. Having confidential data in plain text in a corpus forces the entire corpus to be kept confidential. The system shall encrypt confidential data and store the data in the corpus encrypted. To access the plain text the user must log in and use their password to decrypt the data. This design has important ramifications for exporting data, and for editing the data outside the app.
- The system shall be Non-Proprietary. The data must be exportable in non-proprietary formats such as XML or JSON to be sure that users can migrate their data to other systems and do as they see fit.
- The system shall be trustable. The system shall not provide access to private data to anyone, not even the developers of the software. Your data is your data.
- The system shall work offline. Running a web app offline has important consequences for how data is stored and retrieved, and for how much data can be used while offline. Most browsers limit the amount of data a web app can store offline. Delivering a version of the app as a Chrome App (which has permission to use unlimited storage) lets users keep a significant portion of their data at their fingertips, whether in the metro, in the field, in a park, or in a country where wifi is not common, which is often the case where we do our field work.
- The system shall run on Mac, Linux, and Windows computers. Providing the app as an installable Chrome Extension lets it run on all of these platforms, as well as on Chromebooks.
- The system shall be OpenSource. Being OpenSource allows departments to install and customize the database app for their needs without worry that the company behind the software will disappear or stop maintaining the software. In addition, OpenSourcing the software on GitHub allows linguists with scripting/programming experience to contribute back to the software to make it more customized to their needs and their language typologies or linguistics research areas.
- The system shall store data in Unicode. Encoding problems and losing data should be behind us in the days of Unicode. However, many existing field linguistics databases were built in programming languages that didn't support Unicode, so their Unicode support is dangerously fragile. We want 100% Unicode.
- The system shall be simple. The system is not designed to replace advanced field linguistics databases or corpus linguistics databases. It is designed to replace Word or LaTeX documents, which are a very common way for field linguists to store data because they require no training, don't require a complicated set-up for data categories, and take no time to add new categories.
- The system shall be theory free. The system will not include categories, linguistic frameworks, or theoretical constructs that must be tied to the data. However, it will make it possible for a team to set their own categories and thus tie the specific theoretical constructs they are investigating to their data. This differs from many corpus databases, which are motivated by typologies and language counting and place a large data-entry burden on the user, rather than letting users gradually constrain their data entry conventions over time as they get more data and as their analysis of the data changes.
- The system shall be collaborative. The system shall have users and teams, and permissions for corpora which will ensure that data can be safely shared and edited by multiple users. The corpus will be versioned so that users can track changes and revert mistakes.
- The system shall run on touch devices. Touch tablets are one of the easiest tools to carry to the field: they have a long battery life, they can play videos or show images to the language consultant to elicit complicated contexts (such as quantification and scope), and they permit recording audio and video and publishing directly to YouTube or other services. Android tablets make it particularly easy to program the app and to integrate the microphone directly into the database.
- The system shall be smart. The system shall try to guess what users do most often, and automate the process for them. Most importantly, predictable glossing information should be automated as much as possible.
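To make the last requirement concrete, the sketch below shows one very simple way predictable glosses could be suggested: remember which gloss each morpheme received in earlier data and propose the most frequent one. This is a plain frequency baseline chosen for illustration, not the machine learning approach FieldDB itself uses. The class name `GlossPredictor`, the hyphen-separated input convention and the sample data are all assumptions.

```python
from collections import Counter, defaultdict


class GlossPredictor:
    """Suggest glosses from previously glossed data (illustrative baseline)."""

    def __init__(self):
        # Maps each morpheme to a Counter of the glosses seen with it.
        self._counts = defaultdict(Counter)

    def learn(self, morphemes, glosses):
        """Record one glossed word; both arguments are hyphen-separated."""
        m_parts = morphemes.split("-")
        g_parts = glosses.split("-")
        if len(m_parts) != len(g_parts):
            raise ValueError("morphemes and glosses must align one to one")
        for morpheme, gloss in zip(m_parts, g_parts):
            self._counts[morpheme][gloss] += 1

    def predict(self, morphemes, unknown="?"):
        """Return the most frequent gloss per morpheme, or `unknown` if unseen."""
        guesses = []
        for morpheme in morphemes.split("-"):
            seen = self._counts.get(morpheme)
            guesses.append(seen.most_common(1)[0][0] if seen else unknown)
        return "-".join(guesses)


# Illustrative data: two glossed words, then a new word to gloss.
predictor = GlossPredictor()
predictor.learn("gat-o-s", "cat-M-PL")
predictor.learn("perr-o-s", "dog-M-PL")
print(predictor.predict("gat-a-s"))  # cat-?-PL
```

Unseen morphemes are marked with a placeholder instead of being guessed, so the user can see which parts of the suggestion still need attention.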