What does HackerNews think of TileDB?

The Universal Storage Engine

Language: C++

Stavros from TileDB here (Founder and CEO). I'd like to ask the community for feedback on this blog post. It was only natural for a multi-dimensional array database like TileDB to offer vector (i.e., 1D array) search capabilities. But the team managed to do it very well and the results surprised us. We are just getting started in this domain and a lot of new algorithms and features are coming, but the sooner we get feedback the better.

TileDB-Vector-Search Github repo: https://github.com/TileDB-Inc/TileDB-Vector-Search

TileDB-Embedded (core array engine) Github repo: https://github.com/TileDB-Inc/TileDB

TileDB 101: Vector Search (blog to get kickstarted): https://tiledb.com/blog/tiledb-101-vector-search/

TileDB, Inc. | Full-Time | REMOTE | USA | Greece | https://tiledb.com

TileDB is the database for complex data, allowing data scientists, researchers, and analysts to access, analyze, and share any data with any tool at global scale. All data — tables, genomics, images, videos, location, time-series — across multiple domains is captured as multi-dimensional arrays. TileDB offers extreme interoperability via numerous APIs and tool integrations across the data science ecosystem, eliminating the hassles and inefficiencies of data conversion. TileDB Cloud implements a totally serverless infrastructure and delivers access control, easier data and code sharing and distributed computing at global scale, eliminating cluster management, minimizing TCO and promoting scientific collaboration and reproducibility.

TileDB, Inc. was spun out of MIT and Intel Labs in May 2017 and is backed by Two Bear Capital, Nexus Venture Partners, Uncorrelated Ventures, Intel Capital and Big Pi.

Recent press-release for our Single-Cell API 1.0 release: https://www.businesswire.com/news/home/20230328005380/en/Til...

Website: https://tiledb.com

GitHub: https://github.com/TileDB-Inc/TileDB

Docs: https://docs.tiledb.com

Blog: https://tiledb.com/blog

Our headquarters are located in Cambridge, MA and we have a subsidiary in Athens, Greece. We offer the ability to work remotely for anyone with legal residence in the US or Greece.

We have several open positions aimed at increasing TileDB’s feature set, growth and adoption. You will have the opportunity to work on innovative technology that creates impact on challenging and exciting problems in Genomics, Geospatial, Time Series, and more. Immediate features on the roadmap for TileDB Cloud include advanced distributed computations, advanced computation pushdown, improved multi-cloud deployments and more.

We are actively seeking:

- Infrastructure Engineer / Site Reliability Engineer

- Frontend Engineer - UI and React

- JavaScript Library Engineer

- Senior Software Engineer: Backend (Golang, CGo, k8s, Terraform, MySQL/MariaDB)

- Senior Software Engineer: Python API (pybind11, Python, C++, CMake, scikit-build, conda)

- Senior Software Engineer: Python Data Science Tooling (Jupyter plugins)

- Senior Software Engineer: Build (CMake, C++, conda)

- Senior+ Software Engineer: Database Internals (C++, database query planning/execution, distributed execution)

Apply today at https://tiledb.workable.com !

TileDB, Inc. | Full-Time | REMOTE | USA | Greece | https://tiledb.com

TileDB transforms the lives of analytics professionals and data scientists with a universal database, allowing them to access, analyze, and share any data with any tool at global scale. TileDB unifies the way we think about data, delivering superior performance and foundational data management capabilities. All data — tables, genomics, images, videos, location, time-series — across multiple domains is captured as multi-dimensional arrays. TileDB offers extreme interoperability via numerous APIs and tool integrations across the data science ecosystem, eliminating the hassles and inefficiencies of data conversion. TileDB Cloud implements a totally serverless infrastructure and delivers access control, easier data and code sharing and distributed computing at global scale, eliminating cluster management, minimizing TCO and promoting scientific collaboration and reproducibility.

TileDB, Inc. was spun out of MIT and Intel Labs in May 2017 and is backed by Two Bear Capital, Nexus Venture Partners, Uncorrelated Ventures, Intel Capital and Big Pi.

Recent HN article: https://news.ycombinator.com/item?id=23896131

Website: https://tiledb.com

GitHub: https://github.com/TileDB-Inc/TileDB

Docs: https://docs.tiledb.com

Blog: https://tiledb.com/blog

Our headquarters are located in Cambridge, MA and we have a subsidiary in Athens, Greece. We offer the ability to work remotely. If you are located outside of the USA and Greece we have options to accommodate this, don't hesitate to apply!

We have several open positions aimed at increasing TileDB’s feature set, growth and adoption. You will have the opportunity to work on innovative technology that creates impact on challenging and exciting problems in Genomics, Geospatial, Time Series, and more. Immediate features on the roadmap for TileDB Cloud include advanced distributed computations, advanced computation pushdown, improved multi-cloud deployments and more.

We are actively seeking:

- Senior Golang Engineer

- Senior Python Engineer

- Site Reliability Engineer

- React Frontend Engineer

Apply today at https://tiledb.workable.com !

TileDB, Inc. | Full-Time | REMOTE | USA | Greece | https://tiledb.com

TileDB transforms the lives of analytics professionals and data scientists with a universal database, allowing them to access, analyze, and share any data with any tool at global scale. TileDB unifies the way we think about data, delivering superior performance and foundational data management capabilities. All data — tables, genomics, images, videos, location, time-series — across multiple domains is captured as multi-dimensional arrays. TileDB offers extreme interoperability via numerous APIs and tool integrations across the data science ecosystem, eliminating the hassles and inefficiencies of data conversion. TileDB Cloud implements a totally serverless infrastructure and delivers access control, easier data and code sharing and distributed computing at global scale, eliminating cluster management, minimizing TCO and promoting scientific collaboration and reproducibility.

TileDB, Inc. was spun out of MIT and Intel Labs in May 2017 and is backed by Two Bear Capital, Nexus Venture Partners, Uncorrelated Ventures, Intel Capital and Big Pi.

Recent HN article: https://news.ycombinator.com/item?id=23896131

Website: https://tiledb.com

GitHub: https://github.com/TileDB-Inc/TileDB

Docs: https://docs.tiledb.com

Blog: https://tiledb.com/blog

Our headquarters are located in Cambridge, MA and we have a subsidiary in Athens, Greece. We offer the ability to work remotely. If you are located outside of the USA and Greece we have options to accommodate this, don't hesitate to apply!

We have several open positions aimed at increasing TileDB’s feature set, growth and adoption. You will have the opportunity to work on innovative technology that creates impact on challenging and exciting problems in Genomics, Geospatial, Time Series, and more. Immediate features on the roadmap for TileDB Cloud include expanded marketplace functionality, advanced computation pushdown, improved multi-cloud deployments and more.

We are actively seeking:

- Senior Golang Engineer

- Senior C# Engineer

- Site Reliability Engineer

Apply today at https://tiledb.workable.com !

Hi folks, Stavros from TileDB here. Here are my two cents on tabular data. TileDB (Embedded) is a very serious competitor to Parquet, the only other sane choice IMO when it comes to storing large volumes of tabular data (especially when combined with Arrow). Admittedly, we haven’t been advertising TileDB’s tabular capabilities, but that’s only because we were busy with much more challenging applications, such as genomics (population and single-cell), LiDAR, imaging and other very convoluted (from a data format perspective) domains.

Similar to Parquet:

* TileDB is columnar and comes with a lot of compressors, checksum and encryption filters.

* TileDB is built in C++ with multi-threading and vectorization in mind

* TileDB integrates with Arrow, using zero-copy techniques

* TileDB has numerous optimized APIs (C, C++, C#, Python, R, Java, Go)

* TileDB pushes compute down to storage, similar to what Arrow does

Better than Parquet:

* TileDB is multi-dimensional, allowing rapid multi-column conditions

* TileDB builds versioning and time-traveling into the format (no need for Delta Lake, Iceberg, etc)

* TileDB allows for lock-free parallel writes / parallel reads with ACID properties (no need for Delta Lake, Iceberg, etc)

* TileDB can handle more than tables, for example n-dimensional dense arrays (e.g., for imaging, video, etc)

Useful links:

* Github repo (https://github.com/TileDB-Inc/TileDB)

* TileDB Embedded overview (https://tiledb.com/products/tiledb-embedded/)

* Docs (https://docs.tiledb.com/)

* Webinar on why arrays as a universal data model (https://tiledb.com/blog/why-arrays-as-a-universal-data-model)

Happy to hear everyone’s thoughts.

TileDB, Inc. | Full-Time | REMOTE | USA | Greece | https://tiledb.com

TileDB, Inc. is the company behind TileDB, the first universal data engine. TileDB allows analytics professionals and data scientists to access, analyze, and share complex data sets with any tool at extreme scale. TileDB overcomes the constraints of columnar tables, flat files, and SQL-only tools, handling all data with a multi-dimensional array engine and extreme interoperability across the data science ecosystem. TileDB Cloud is a totally serverless offering of TileDB, which delivers access control and enables distributed computing at planet-scale, eliminating all cluster management and minimizing cost. TileDB, Inc. was spun out of MIT and Intel Labs in May 2017 and closed a $15M Series A in July 2020, following a previous $4M Seed Round.

Recent HN article: https://news.ycombinator.com/item?id=23896131

Website: https://tiledb.com

GitHub: https://github.com/TileDB-Inc/TileDB

Docs: https://docs.tiledb.com

Blog: https://tiledb.com/blog

Our headquarters are located in Cambridge, MA and we have a subsidiary in Athens, Greece. We offer the ability to work remotely, but the candidates must reside either in the US or in Greece. US candidates must be US citizens, whereas Greek candidates must be Greek or EU citizens.

We have several open positions aimed at increasing TileDB’s feature set, growth and adoption. You will have the opportunity to work on innovative technology that creates impact on challenging and exciting problems in Genomics, Geospatial, Time Series, and more. Immediate features on the roadmap for TileDB Cloud include optimizing our serverless framework, improving integration with JupyterLab, and integrating with Machine Learning frameworks.

We are primarily seeking:

- Engineering Project Manager

- Developer Advocate

- Senior Golang Engineer

- And many more!

Apply today at https://tiledb.workable.com !

TileDB, Inc. | Full-Time | REMOTE | USA | Greece | https://tiledb.com

TileDB, Inc. is the company behind TileDB, the first universal data engine. TileDB allows analytics professionals and data scientists to access, analyze, and share complex data sets with any tool at extreme scale. TileDB overcomes the constraints of columnar tables, flat files, and SQL-only tools, handling all data with a multi-dimensional array engine and extreme interoperability across the data science ecosystem. TileDB Cloud is a totally serverless offering of TileDB, which delivers access control and enables distributed computing at planet-scale, eliminating all cluster management and minimizing cost. TileDB, Inc. was spun out of MIT and Intel Labs in May 2017 and closed a $15M Series A in July 2020, following a previous $4M Seed Round.

Recent HN article: https://news.ycombinator.com/item?id=23896131

Website: https://tiledb.com

GitHub: https://github.com/TileDB-Inc/TileDB

Docs: https://docs.tiledb.com

Blog: https://tiledb.com/blog

Our headquarters are located in Cambridge, MA and we have a subsidiary in Athens, Greece. We offer the ability to work remotely, but the candidates must reside either in the US or in Greece. US candidates must be US citizens, whereas Greek candidates must be Greek or EU citizens.

We have several open positions aimed at increasing TileDB’s feature set, growth and adoption. You will have the opportunity to work on innovative technology that creates impact on challenging and exciting problems in Genomics, Geospatial, Time Series, and more. Immediate features on the roadmap for TileDB Cloud include optimizing our serverless framework, improving integration with JupyterLab, and integrating with Machine Learning frameworks.

We are primarily seeking:

- Engineering Project Manager

- Developer Advocate

- Genomics C++ Engineer

- And many more!

Apply today at https://tiledb.workable.com !

TileDB, Inc. | Full-Time | REMOTE | USA | Greece | https://tiledb.com

TileDB, Inc. is the company behind TileDB, the first universal data engine. TileDB allows analytics professionals and data scientists to access, analyze, and share complex data sets with any tool at extreme scale. TileDB overcomes the constraints of columnar tables, flat files, and SQL-only tools, handling all data with a multi-dimensional array engine and extreme interoperability across the data science ecosystem. TileDB Cloud is a totally serverless offering of TileDB, which delivers access control and enables distributed computing at planet-scale, eliminating all cluster management and minimizing cost. TileDB, Inc. was spun out of MIT and Intel Labs in May 2017 and closed a $15M Series A in July 2020, following a previous $4M Seed Round.

Recent HN article: https://news.ycombinator.com/item?id=23896131

Website: https://tiledb.com

GitHub: https://github.com/TileDB-Inc/TileDB

Docs: https://docs.tiledb.com

Blog: https://tiledb.com/blog

Our headquarters are located in Cambridge, MA and we have a subsidiary in Athens, Greece. We offer the ability to work remotely, but the candidates must reside either in the US or in Greece. US candidates must be US citizens, whereas Greek candidates must be Greek or EU citizens.

We have several open positions aimed at increasing TileDB’s feature set, growth and adoption. You will have the opportunity to work on innovative technology that creates impact on challenging and exciting problems in Genomics, Geospatial, Time Series, and more. Immediate features on the roadmap for TileDB Cloud include optimizing our serverless framework, improving integration with JupyterLab, and expanding our marketplace functionality.

We are primarily seeking:

- Senior Golang Engineer

- Senior C++ Engineer

- Senior QA Engineer

- And many more!

Apply today at https://tiledb.workable.com !

TileDB, Inc. | Full-Time | REMOTE | USA | Greece | https://tiledb.com

TileDB, Inc. is the company behind TileDB, the first universal data engine. TileDB allows analytics professionals and data scientists to access, analyze, and share complex data sets with any tool at extreme scale. TileDB overcomes the constraints of columnar tables, flat files, and SQL-only tools, handling all data with a multi-dimensional array engine and extreme interoperability across the data science ecosystem. TileDB Cloud is a totally serverless offering of TileDB, which delivers access control and enables distributed computing at planet-scale, eliminating all cluster management and minimizing cost. TileDB, Inc. was spun out of MIT and Intel Labs in May 2017 and closed a $15M Series A in July 2020, following a previous $4M Seed Round.

Recent HN article: https://news.ycombinator.com/item?id=23896131

Website: https://tiledb.com

GitHub: https://github.com/TileDB-Inc/TileDB

Docs: https://docs.tiledb.com

Blog: https://tiledb.com/blog

Our headquarters are located in Cambridge, MA and we have a subsidiary in Athens, Greece. We offer the ability to work remotely, but the candidates must reside either in the US or in Greece. US candidates must be US citizens, whereas Greek candidates must be Greek or EU citizens.

We have several open positions aimed at increasing TileDB’s feature set, growth and adoption. You will have the opportunity to work on innovative technology that creates impact on challenging and exciting problems in Genomics, Geospatial, Time Series, and more. Immediate features on the roadmap for TileDB Cloud include optimizing our serverless framework, improving integration with JupyterLab, and expanding our marketplace functionality.

We are primarily seeking:

- Senior Golang Engineer

- Senior C++ Engineer

- Developer Advocate

- Senior QA Engineer

- And many more!

Apply today at https://tiledb.workable.com !

TileDB, Inc. | Full-Time | REMOTE | USA | Greece | https://tiledb.com

TileDB, Inc. is the company behind TileDB, the first universal data engine. TileDB allows analytics professionals and data scientists to access, analyze, and share complex data sets with any tool at extreme scale. TileDB overcomes the constraints of columnar tables, flat files, and SQL-only tools, handling all data with a multi-dimensional array engine and extreme interoperability across the data science ecosystem. TileDB Cloud is a totally serverless offering of TileDB, which delivers access control and enables distributed computing at planet-scale, eliminating all cluster management and minimizing cost. TileDB, Inc. was spun out of MIT and Intel Labs in May 2017 and closed a $15M Series A in July 2020, following a previous $4M Seed Round.

Recent HN article: https://news.ycombinator.com/item?id=23896131

Website: https://tiledb.com

GitHub: https://github.com/TileDB-Inc/TileDB

Docs: https://docs.tiledb.com

Blog: https://tiledb.com/blog

Our headquarters are located in Cambridge, MA and we have a subsidiary in Athens, Greece. We offer the ability to work remotely, but the candidates must reside either in the US or in Greece. US candidates must be US citizens, whereas Greek candidates must be Greek or EU citizens.

We have several open positions aimed at increasing TileDB’s feature set, growth and adoption. You will have the opportunity to work on innovative technology that creates impact on challenging and exciting problems in Genomics, Geospatial, Time Series, and more. Immediate features on the roadmap for TileDB Cloud include optimizing our serverless framework, improving integration with JupyterLab, and expanding our marketplace functionality.

We are primarily seeking:

- Senior Golang Engineer

- Senior C++ Engineer

- Senior UI/UX Web Designer

- Developer Advocate

- Senior QA Engineer

- And many more!

Apply today at https://tiledb.workable.com !

TileDB, Inc. | Full-Time | REMOTE | USA | Greece | https://tiledb.com

TileDB, Inc. is the company behind TileDB, the first universal data engine. TileDB allows analytics professionals and data scientists to access, analyze, and share complex data sets with any tool at extreme scale. TileDB overcomes the constraints of columnar tables, flat files, and SQL-only tools, handling all data with a multi-dimensional array engine and extreme interoperability across the data science ecosystem. TileDB Cloud is a totally serverless offering of TileDB, which delivers access control and enables distributed computing at planet-scale, eliminating all cluster management and minimizing cost. TileDB, Inc. was spun out of MIT and Intel Labs in May 2017 and closed a $15M Series A in July 2020, following a previous $4M Seed Round.

Recent HN article: https://news.ycombinator.com/item?id=23896131

Website: https://tiledb.com

GitHub: https://github.com/TileDB-Inc/TileDB

Docs: https://docs.tiledb.com

Blog: https://tiledb.com/blog

Our headquarters are located in Cambridge, MA and we have a subsidiary in Athens, Greece. We offer the ability to work remotely, but the candidates must reside either in the US or in Greece. US candidates must be US citizens, whereas Greek candidates must be Greek or EU citizens.

We have several open positions aimed at increasing TileDB’s feature set, growth and adoption. You will have the opportunity to work on innovative technology that creates impact on challenging and exciting problems in Genomics, Geospatial, Time Series, and more. Immediate features on the roadmap for TileDB Cloud include optimizing our serverless framework, improving integration with JupyterLab, and expanding our marketplace functionality.

We are primarily seeking:

- Senior Golang Engineer

- Senior C++ Engineer

- Senior UI/UX Web Designer

- Developer Advocate

- Senior QA Engineer

- Technical Support Engineer

- And many more!

Apply today at https://tiledb.workable.com !

Disclaimer: I'm a member of the TileDB team.

> embedded, immutable, syncable relational database.

TileDB Embedded (https://github.com/TileDB-Inc/TileDB) checks all four of these boxes. It is a universal storage engine based on dense and sparse multi-dimensional arrays. I explain each point below.

> embedded

TileDB Embedded [1] is an open source (MIT licensed) embeddable C++ library which exposes C and C++ APIs. We also built APIs for Python, R, Java and Go, and integrations with MariaDB, PrestoDB, Spark, GDAL, PDAL and more. Via MariaDB we even have embeddable SQL [2]. TileDB abstracts the storage backends behind an extensible VFS class, and currently supports S3, GCS, Azure, HDFS, and local disk.

> immutable

TileDB is designed around immutable objects. Every write creates a new "fragment" using an MVCC model [3]. No file is ever updated in place. TileDB is designed to naturally handle the eventual consistency and constraints of cloud object stores. This allows for features like multi-reader/multi-writer support, time traveling, and update support, all within the storage engine and without any external orchestration.
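To make the append-only idea concrete, here is a minimal in-memory sketch of how time traveling can fall out of immutable, timestamped fragments. It is not TileDB's implementation or API: `Fragment`, `ToyArray`, and the integer timestamps are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Fragment:
    """One immutable batch of writes (hypothetical stand-in for a fragment)."""
    timestamp: int
    cells: Dict[int, str]  # cell coordinate -> value


class ToyArray:
    def __init__(self) -> None:
        self._fragments: List[Fragment] = []

    def write(self, timestamp: int, cells: Dict[int, str]) -> None:
        # A write never touches earlier fragments; it only appends a new one.
        self._fragments.append(Fragment(timestamp, dict(cells)))

    def read(self, at: Optional[int] = None) -> Dict[int, str]:
        # Replay fragments in timestamp order; later fragments win on overlap.
        result: Dict[int, str] = {}
        for frag in sorted(self._fragments, key=lambda f: f.timestamp):
            if at is None or frag.timestamp <= at:
                result.update(frag.cells)
        return result


arr = ToyArray()
arr.write(1, {0: "a", 1: "b"})
arr.write(2, {1: "B", 2: "c"})   # an "update" is just a newer fragment
latest = arr.read()              # view that includes every fragment
as_of_1 = arr.read(at=1)         # view as of timestamp 1
```

Reading "as of" an earlier timestamp simply skips newer fragments, which is why no external versioning layer is needed in this toy model.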

> syncable

TileDB's MVCC approach to handling the eventual consistency of cloud object stores leads directly to its syncability. Every write operation in TileDB creates a new "fragment" [3]. A fragment is an immutable folder (or prefix on object stores) that contains all the data from a single write session. Fragments are created with a timestamp and a UUID to ensure uniqueness. The fragment is ignored until a special `.ok` file is available in the listing (each write is atomic). Incomplete fragments (without the `.ok` file) are gracefully ignored.

This means that every fragment is self-contained and syncable. The only requirement is that the special `.ok` file must show up only after the complete fragment folder is synced. With cloud object stores this is not a problem with the read-after-write consistency guarantees. For other systems the synchronization can be managed to handle this behavior.
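The marker-file protocol described above can be sketched on a local filesystem as follows. This is an illustration under stated assumptions, not TileDB's actual on-disk layout: the folder naming scheme, the `data.bin` file, and the helper names `write_fragment` and `visible_fragments` are all hypothetical.

```python
import os
import tempfile
import time
import uuid


def write_fragment(array_dir: str, payload: bytes, commit: bool = True) -> str:
    """Write one fragment folder, then publish it with a marker file."""
    # Timestamp plus UUID keeps concurrent writers from colliding.
    name = f"__{int(time.time() * 1000)}_{uuid.uuid4().hex}"
    frag_dir = os.path.join(array_dir, name)
    os.makedirs(frag_dir)
    with open(os.path.join(frag_dir, "data.bin"), "wb") as f:
        f.write(payload)
    if commit:
        # The marker is created last, so readers never see a half-written fragment.
        open(frag_dir + ".ok", "w").close()
    return name


def visible_fragments(array_dir: str) -> list:
    """Return only the fragments whose `.ok` marker is present."""
    entries = set(os.listdir(array_dir))
    return sorted(
        e for e in entries
        if os.path.isdir(os.path.join(array_dir, e)) and e + ".ok" in entries
    )


with tempfile.TemporaryDirectory() as array_dir:
    done = write_fragment(array_dir, b"complete write")
    pending = write_fragment(array_dir, b"interrupted write", commit=False)
    visible = visible_fragments(array_dir)  # only the committed fragment
```

A sync tool that copies each fragment folder before its marker preserves the same guarantee on the destination side.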

> relational database

Tables can be easily modeled as multi-dimensional sparse arrays [4]. Through TileDB's integrations with MariaDB and PrestoDB (and Spark), you gain a full SQL interface, while still being able to query the data directly via the language APIs without using SQL. For more robust features, such as foreign key enforcement, it is easy to use embeddable MariaDB, while we work to push such features into the storage engine itself. We are always seeking feedback on which features are most used and requested, so that we can push common operations down to the storage engine.
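As a rough illustration of the table-as-sparse-array idea, the sketch below treats two columns as dimensions and a third as an attribute, then answers a two-column range condition as a slice over both dimensions. The column names and the `slice_table` helper are hypothetical, and the linear scan stands in for the indexing a real array engine would use.

```python
from typing import Dict, List, Tuple

# Hypothetical table: (customer_id, day) are the dimensions, amount is the attribute.
Coord = Tuple[int, int]
cells: Dict[Coord, float] = {
    (1, 3): 9.5,
    (1, 7): 20.0,
    (2, 3): 4.25,
    (5, 9): 12.0,
}


def slice_table(
    cells: Dict[Coord, float], customers: range, days: range
) -> List[Tuple[int, int, float]]:
    """Return rows whose coordinates fall inside both dimension ranges."""
    return sorted(
        (c, d, amount)
        for (c, d), amount in cells.items()
        if c in customers and d in days
    )


# Roughly: WHERE customer_id BETWEEN 1 AND 2 AND day BETWEEN 0 AND 4
rows = slice_table(cells, customers=range(1, 3), days=range(0, 5))
```

Only non-empty cells are stored, which is what makes the array sparse: most (customer_id, day) combinations have no row at all.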

[1] https://tiledb.com/embedded

[2] https://docs.tiledb.com/main/api-usage/embedded-sql

[3] https://docs.tiledb.com/main/basic-concepts/physical-storage

[4] https://docs.tiledb.com/main/handling-dataframes

Shouldn't this have (2014) in the title? The story has also appeared on HN several times before.

Very interesting concept, but I now think an application file format built on TileDB would be much better, since it can also support sparse data [1].

[1] https://github.com/TileDB-Inc/TileDB

Another option to consider here is storing your data in TileDB [1], which allows you to access it via Python/Pandas, MariaDB for SQL (embeddable or standalone), Spark, R and more. With embedded MariaDB [2] you can query directly into pandas with minimal overhead. TileDB is similar to Parquet in that it can capture sparse dataframe usage, but it is more general in that it supports multi-dimensional datasets and dense arrays. Other major features built directly into TileDB include handling updates, time traveling and partitioning at the library level, removing the need for extra services like Delta Lake to deal with the numerous Parquet files you may create. TileDB also has native support for S3, and in the next release we'll have native Azure Blob Storage support.

[1] https://github.com/TileDB-Inc/TileDB

[2] https://docs.tiledb.com/developer/api-usage/embedded-sql

Disclosure: I am a member of the TileDB, Inc. team

Hi all, we'd love to hear your thoughts about TileDB (https://github.com/TileDB-Inc/TileDB), which offers the key advantages of both HDF5 and Zarr, such as chunked dense arrays, numerous compression filters and native cloud support. But TileDB goes further: it also supports sparse arrays and integrates with more languages, databases and data science tools. If you'd like your data to be readable by multiple languages and tools, or need array versioning, it is worth taking a look.

Docs: https://docs.tiledb.com/developer/

Website: https://tiledb.com/

Earlier HN post: https://news.ycombinator.com/item?id=15547749

Disclosure: I am a member of the TileDB team.

TileDB, Inc. | Senior Software Engineer | Cambridge, MA or REMOTE (US) | tiledb.io

TileDB, Inc. leads the development of the open source TileDB array data management software. The company closed a $1M seed in May 2017 led by Intel Capital and Nexus Venture Partners (http://www.businesswire.com/news/home/20171019005449/en), and is looking to raise a Series A round in the upcoming months.

TileDB has been featured on HN: https://news.ycombinator.com/item?id=15547749.

TileDB GitHub repo: https://github.com/TileDB-Inc/TileDB

Array data volumes are increasing in genomics, earth science, imaging, and other sensing applications, and TileDB is meeting the challenge head-on. We are a small distributed team looking to aggressively adapt TileDB to better take advantage of distributed storage and compute backends in the hybrid local-and-cloud domain.

We are primarily looking for someone to help us transition scientific data storage and analysis to the cloud. Anyone who has experience in the following areas is welcome to apply:

  - Cloud object storage and compute
  - Spark / Arrow integration
  - Scalable REST server / services and APIs
  - SaaS services around access control, data sharing, and encryption

Additionally, we would be interested in a candidate with experience in one or more of the following areas: Scientific data storage / analysis, modern C++ (C++11 and later), parallel and/or distributed programming, compute or I/O performance optimization, genomic data formats / libraries (such as bam, vcf, htslib, bcftools, etc.), encryption / security.

Our headquarters are located in Cambridge, MA. To cope efficiently with the different time zones and hiring processes, priority will be given to candidates who are located in the US and are US citizens or permanent residents.

Apply at https://tiledb.workable.com

Contact us at [email protected] with questions.

TileDB, Inc. | Senior Software Engineer | Cambridge, MA or REMOTE (US) | tiledb.io

TileDB, Inc. leads the development of the open source TileDB array data management software. The company closed a $1M seed in May 2017 led by Intel Capital and Nexus Venture Partners (http://www.businesswire.com/news/home/20171019005449/en), and is looking to raise a Series A round in the upcoming months.

TileDB has been featured on HN: https://news.ycombinator.com/item?id=15547749

TileDB GitHub repo: https://github.com/TileDB-Inc/TileDB

We are a small distributed team looking to aggressively adapt TileDB to better take advantage of distributed storage and compute backends in the hybrid local-and-cloud domain.

We are primarily looking for someone to build access control / security into our product; however, anyone with experience in the following areas is welcome to apply:

  - S3 object storage / AWS Batch / AWS Lambda
  - Azure blob storage / Azure Functions
  - Google Cloud Storage / Google Cloud Functions
  - Spark / Arrow integration
  - Scalable REST server / service and API
  - SaaS services around access control and encryption

Additionally, experience in any of the following would be a plus: scientific data storage / analysis, modern C++ (C++11 and later), parallel and/or distributed programming, compute or I/O performance optimization, scalable object storage, Java / Spark ecosystem, encryption / secure systems.

Our headquarters are located in Cambridge, MA. To cope efficiently with the different time zones and hiring processes, priority will be given to candidates who are located in the US and are US citizens or permanent residents.

Apply at https://tiledb.workable.com

Contact us at [email protected] with questions.

TileDB, Inc. | Senior Software Engineer | FULL-TIME | Cambridge, MA or REMOTE (US) | tiledb.io

TileDB, Inc. leads the development of the open source TileDB array data management software. The company closed a $1M seed in May 2017 led by Intel Capital and Nexus Venture Partners (http://www.businesswire.com/news/home/20171019005449/en), and is looking to raise a Series A round in the upcoming months.

TileDB has been featured on HN: https://news.ycombinator.com/item?id=15547749.

TileDB GitHub repo: https://github.com/TileDB-Inc/TileDB

Array data volumes are increasing in genomics, earth science, imaging, and other sensing applications, and TileDB is meeting the challenge head-on. We are a small distributed team looking to aggressively adapt TileDB to better take advantage of distributed storage and compute backends in the hybrid local-and-cloud domain. We are currently looking to expand our team with someone experienced in one of the following areas:

  - S3 object storage / AWS Batch / AWS Lambda
  - Azure blob storage / Azure Functions
  - Google Cloud Storage / Google Cloud Functions
  - Spark / Arrow integration
  - Scalable REST server / service and API
  - SaaS services around access control and encryption

Additionally, experience in any of the following would be a plus:

  - Scientific data storage / analysis
  - Modern C++ (C++11 and later)
  - Parallel and/or distributed programming
  - Compute or I/O performance optimization
  - Scalable object storage
  - Java / Spark ecosystem
  - Encryption / secure systems

Our headquarters are located in Cambridge, MA. To cope efficiently with the different time zones and hiring processes, priority will be given to candidates who are located in the US and are US citizens or permanent residents.

Apply at https://tiledb.workable.com

Contact us at [email protected] with questions.