[epic] MarkdownDB Index and Library v1 #3

rufuspollock · 2023-03-12T06:04:46Z

A database of markdown files so that you can quickly access the metadata and content you want.

All metadata including frontmatter, links, tags, tasks etc
Auto-reloading
Super simple javascript API

Bonus

Can generate sqlite so you get full sql access (if you want)

Non-features

Does not index the full-text content

Re Flowershow: Use this to replace contentlayer.dev.

See https://datahub.io/notes/markdowndb

Acceptance aka Roadmap

POC covering basic extraction etc [epic] MarkdownDB v0.1 #6
Research Obsidian dataview approach to a markdown db #5
[epic] MarkdownDB plugin system #2 - specifically parser plugins

Feature list

Marketing

[epic] MarkdownDB site (landing page) and "launch" #1

Features

Index a folder of files - create a "DB" index from a folder of markdown files (and other files including images)

Index a folder and get JS/TS objects
Index a folder and get json output
BONUS Index multiple folders (with support for configuration, e.g. prefixing in some way, for the case where I have all my blog files in a separate folder)
Command line tool for indexing: Create a markdowndb (index) on the command line
Index a folder and get SQLite

Extract structured data like:

Frontmatter metadata: Extract markdown frontmatter and add in a metadata field
Tags: Extracts tags in markdown pages
- Extract tags in frontmatter
- Extract tags in body like #abc Tags extraction from body #49
Links: links between files like [hello](abc.md) or wikilink style [[xyz]] so we can compute backlinks or deadlinks etc (see [parse] Extract Links #4)
Tasks: extract tasks like this - [ ] this is a task (See obsidian data view) #60
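
The body-level extraction listed above (tags, links, tasks) could be sketched roughly as follows. This is a regex-based illustration only, not MarkdownDB's actual implementation: the names `extractFromBody`, `Extracted` and `allMatches` are hypothetical, and a real parser would more likely walk a markdown AST so that code blocks are skipped.

```typescript
// Hypothetical shapes and names for illustration; not MarkdownDB's real API.
interface Task {
  description: string;
  checked: boolean;
}

interface Extracted {
  tags: string[];
  links: string[];
  tasks: Task[];
}

// Collect every match of a global regex (the regex must carry the "g" flag).
function allMatches(re: RegExp, text: string): RegExpExecArray[] {
  const found: RegExpExecArray[] = [];
  re.lastIndex = 0;
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    found.push(m);
  }
  return found;
}

function extractFromBody(body: string): Extracted {
  // Tags: "#abc" at the start of the text or after whitespace.
  // "# Heading" and "## Heading" do not match because a letter must follow "#".
  const tags = allMatches(/(?:^|\s)#([A-Za-z][\w/-]*)/g, body).map((m) => m[1]);

  // Markdown links: [hello](abc.md). Image syntax ![alt](x.png) also matches here.
  const mdLinks = allMatches(/\[[^\]]*\]\(([^)\s]+)\)/g, body).map((m) => m[1]);

  // Wikilinks: [[xyz]] or [[xyz|alias]]; only the target is kept.
  const wikiLinks = allMatches(/\[\[([^\]|]+)(?:\|[^\]]*)?\]\]/g, body).map((m) =>
    m[1].trim()
  );

  // Tasks: "- [ ] open" or "- [x] done" list items.
  const tasks = allMatches(/^\s*[-*] \[( |x|X)\] (.+)$/gm, body).map((m) => ({
    description: m[2].trim(),
    checked: m[1] !== " ",
  }));

  // De-duplicate tags while keeping first-seen order.
  const uniqueTags = tags.filter((tag, i) => tags.indexOf(tag) === i);

  return { tags: uniqueTags, links: mdLinks.concat(wikiLinks), tasks };
}
```

Backlinks and dead links could then be computed by comparing each file's `links` against the set of indexed paths.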

Data types, data enhancement and validation

Computed fields: add new metadata properties based on existing metadata e.g. a slug field computed from title field; or, adding a title based on the first h1 heading in a doc; or, a type field based on the folder of the file (e.g. these are blog posts). cf https://www.contentlayer.dev/docs/reference/source-files/define-document-type#computedfields. Computed metadata fields #54
Data validation and Document Types: validate metadata against a schema/type so that I know the data in the database is "valid" (Meta)Data Validation and Document Types #55
- deal with casting types e.g. string, number so that we can query in useful ways e.g. find me all blog posts before date X
- BYOT (bring your own types): I want to create my own types ... so that when I get an object out it is cast to the right TypeScript type
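
To make the computed-fields idea above concrete, here is a minimal sketch covering the three examples mentioned: a title taken from the first h1, a slug derived from the title, and a type derived from the folder. Everything here (`FileRecord`, `computedFields`, `applyComputedFields`, the `blog/` folder rule) is an assumed shape for illustration, not an existing MarkdownDB or contentlayer API.

```typescript
// Hypothetical record shape; the real index may store files differently.
interface FileRecord {
  path: string; // e.g. "blog/my-first-post.md"
  body: string;
  metadata: Record<string, unknown>;
}

type ComputedField = (file: FileRecord) => unknown;

// Text of the first level-1 heading, if the body has one.
function firstH1(body: string): string | undefined {
  const m = /^# +(.+)$/m.exec(body);
  return m ? m[1].trim() : undefined;
}

function slugify(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// An ordered list, so that later fields can build on earlier ones (slug uses title).
const computedFields: Array<[string, ComputedField]> = [
  ["title", (f) => f.metadata.title ?? firstH1(f.body)],
  ["slug", (f) => (typeof f.metadata.title === "string" ? slugify(f.metadata.title) : undefined)],
  // Assumed rule: anything under "blog/" is a blog post, everything else a page.
  ["type", (f) => f.metadata.type ?? (f.path.startsWith("blog/") ? "blog" : "page")],
];

// Returns a new record; the input is left untouched.
function applyComputedFields(file: FileRecord): FileRecord {
  let current = file;
  for (const [name, compute] of computedFields) {
    const value = compute(current);
    if (value !== undefined) {
      current = { ...current, metadata: { ...current.metadata, [name]: value } };
    }
  }
  return current;
}
```

Explicit frontmatter values win over computed ones in this sketch, which seems the least surprising default, though the opposite choice is equally possible.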

Inbox

Marketing

Sections on front page about major features

Have a section on front page about links feature
Have a section for tags
etc

💤

Refactor: improve our interfaces, do something similar to CachedMetadata and CachedFile
"multi-thread" support for fast indexing

Misc

➕ 2023-03-15 Add layout e.g. layout: blog as a rule in markdown db loading rather than in getStaticPaths for rendering blogs (follow up to work in datopian/datahub-next#51) ⛔2023-03-17 on having markdowndb support for rules

Rufus random notes

how can we get type stuff like contentlayer has e.g. a given type in markdown frontmatter leads to use of X typescript type/interface
check out astro-build - how do they do type stuff?

Notes

Questions

Notes on obsidian dataview API

blacksmithgu/obsidian-dataview#1811

How to handle document types 2023-03-09

I'm not sure how we want to handle types. Having the type as a frontmatter field is probably not ideal: if we had a blog folder, we would have to add the type metadata to every file individually.

On contentlayer.dev it uses a filePathPattern for that:

const Blog = defineDocumentType(() => ({
  name: "Blog",
  filePathPattern: `${siteConfig.blogDir}/!(index)*.md*`,
  contentType: "mdx",
  fields: {
  ...

I believe that's a good way of handling this. The caveat is that a file's path now determines its type, so folders with mixed types are impossible, although we could work around that with a pattern such as *.blog.md*.

The use case I'm imagining is something like this (there are probably better examples than blog):

blogs
  my-first-post.blog.mdx    // Blog type
  my-second-post.blog.mdx     // Blog type 
  index.mdx    // Generic page type 
  about-our-authors.mdx    // Generic page type
  write-for-us.contact.mdx    // Generic contact type
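
A rough sketch of how the `*.blog.md*` naming convention above could be resolved to a document type. The `typeBySuffix` map and the `documentTypeFor` function are hypothetical names introduced for illustration, and the fallback to a generic "Page" type is an assumption based on the folder listing.

```typescript
// Assumed mapping from filename infix to document type.
const typeBySuffix: Record<string, string> = {
  blog: "Blog",
  contact: "Contact",
};

function documentTypeFor(filePath: string): string {
  const fileName = filePath.split("/").pop() ?? filePath;
  // Matches "<name>.<type>.md" or "<name>.<type>.mdx" and captures <type>.
  const m = /^.+\.([a-z0-9-]+)\.mdx?$/i.exec(fileName);
  const suffix = m ? m[1].toLowerCase() : undefined;
  if (suffix !== undefined && Object.prototype.hasOwnProperty.call(typeBySuffix, suffix)) {
    return typeBySuffix[suffix];
  }
  // No recognised infix: treat the file as a generic page.
  return "Page";
}
```

With this approach, a folder can mix types freely, at the cost of encoding the type in every filename.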

How could we index frontmatter into our db? 2023-03-09

My idea is to have another table for frontmatter, something like:

file_id	field	value	(maybe) type: array or string
d9fc09	title	My new post	string

file_id should be a foreign key pointing to file._id.

To increase performance, since we are going to have many more rows now, we can create a DB index on this table (using the file_id field)
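
As an illustration of the table described above, the sketch below flattens one file's frontmatter into `(file_id, field, value, type)` rows and shows what the SQLite schema and index might look like. The table name `file_frontmatter`, the index name, and the helper `toFrontmatterRows` are assumptions; only the column names and the `file._id` foreign key come from the note itself.

```typescript
// One row per frontmatter field, mirroring the proposed table columns.
interface FrontmatterRow {
  file_id: string;
  field: string;
  value: string;
  type: "array" | "string";
}

// Assumed SQLite DDL; table and index names are placeholders.
const FRONTMATTER_SCHEMA = `
CREATE TABLE IF NOT EXISTS file_frontmatter (
  file_id TEXT NOT NULL REFERENCES file(_id),
  field   TEXT NOT NULL,
  value   TEXT,
  type    TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_file_frontmatter_file_id
  ON file_frontmatter (file_id);
`;

function toFrontmatterRows(
  fileId: string,
  frontmatter: Record<string, unknown>
): FrontmatterRow[] {
  return Object.keys(frontmatter).map((field): FrontmatterRow => {
    const raw = frontmatter[field];
    const isArray = Array.isArray(raw);
    return {
      file_id: fileId,
      field,
      // Arrays are stored as JSON text so they can be decoded on read.
      value: isArray ? JSON.stringify(raw) : String(raw),
      type: isArray ? "array" : "string",
    };
  });
}
```

If most queries filter on a field name rather than on a file, an additional index on `(field, value)` may matter more than the one on `file_id`.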

If done this way, we will be able to query mdx files by frontmatter fields, e.g. (the API may not be exactly this):

MyMdDb.query({ tags: ['economy'], frontmatter: { author: 'João' } })
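
To show the intended semantics of such a query, here is a small in-memory sketch. It is not the real MarkdownDB API: `IndexedFile`, `Query` and `queryFiles` are hypothetical, and an actual implementation would presumably translate the same filter into SQL against the frontmatter table.

```typescript
// Hypothetical in-memory shapes for illustration.
interface IndexedFile {
  _id: string;
  path: string;
  tags: string[];
  frontmatter: Record<string, unknown>;
}

interface Query {
  tags?: string[];
  frontmatter?: Record<string, unknown>;
}

// A file matches when it carries every requested tag and every
// requested frontmatter field is strictly equal to the file's value.
function queryFiles(files: IndexedFile[], q: Query): IndexedFile[] {
  const wantedTags = q.tags ?? [];
  const wantedFields = q.frontmatter ?? {};
  return files.filter((file) => {
    const hasTags = wantedTags.every((tag) => file.tags.indexOf(tag) !== -1);
    const hasFields = Object.keys(wantedFields).every(
      (key) => file.frontmatter[key] === wantedFields[key]
    );
    return hasTags && hasFields;
  });
}

const exampleFiles: IndexedFile[] = [
  { _id: "d9fc09", path: "blog/a.mdx", tags: ["economy"], frontmatter: { author: "João" } },
  { _id: "a1b2c3", path: "blog/b.mdx", tags: ["economy"], frontmatter: { author: "Ana" } },
  { _id: "e4f5a6", path: "blog/c.mdx", tags: ["culture"], frontmatter: { author: "João" } },
];

const matches = queryFiles(exampleFiles, {
  tags: ["economy"],
  frontmatter: { author: "João" },
});
```

Here `matches` contains only the first file, since it is the only one with both the `economy` tag and the requested author.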


rufuspollock mentioned this issue Apr 28, 2023

[epic] MarkdownDB v0.1 #6

Closed

11 tasks

rufuspollock assigned rufuspollock and demenech Mar 12, 2023

rufuspollock mentioned this issue Mar 12, 2023

Research how we can build a richer metadata database to build site from datopian/flowershow#5

Closed

2 tasks

rufuspollock changed the title ~~[epic] MarkdownDB~~ [epic] MarkdownDB Index v0.1 Apr 28, 2023

rufuspollock transferred this issue from another repository Apr 28, 2023

rufuspollock changed the title ~~[epic] MarkdownDB Index v0.1~~ [epic] MarkdownDB Index and Library v1 Apr 28, 2023

rufuspollock assigned olayway Sep 23, 2023

rufuspollock pinned this issue Sep 23, 2023

rufuspollock added the Roadmap label Sep 29, 2023

rufuspollock unassigned demenech Nov 9, 2023

rufuspollock mentioned this issue Nov 22, 2023

Move job story and feature content from https://datahub.io/notes/markdowndb #50

Closed

3 tasks
