Deal with casting types, e.g. string, number, so that we can query in useful ways, e.g. find me all blog posts before date X
BYOT (bring your own types): I want to create my own types ... so that when I get an object out it is cast to the right TypeScript type
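To make the two type-related items above more concrete, here is a minimal sketch of casting raw frontmatter values and handing the result back as a caller-supplied TypeScript type. Everything in it (`FieldType`, `Schema`, `castValue`, `castFrontmatter`, `BlogPost`, the sample posts) is hypothetical and not part of any existing MarkdownDB API. The final cast to `BlogPost` is an unchecked assertion, so a real version would probably want validation as well.

```typescript
// Hypothetical sketch: none of these names come from MarkdownDB itself.
type FieldType = "string" | "number" | "date";

// A schema maps frontmatter field names to the type they should be cast to.
type Schema = Record<string, FieldType>;

// Cast one raw frontmatter value (often a string) to its declared type.
function castValue(raw: unknown, type: FieldType): string | number | Date {
  switch (type) {
    case "number": {
      const n = Number(raw);
      if (Number.isNaN(n)) throw new Error(`not a number: ${String(raw)}`);
      return n;
    }
    case "date": {
      const d = new Date(String(raw));
      if (Number.isNaN(d.getTime())) throw new Error(`not a date: ${String(raw)}`);
      return d;
    }
    default:
      return String(raw);
  }
}

// Cast every field named in the schema; other fields pass through unchanged.
function castFrontmatter(
  raw: Record<string, unknown>,
  schema: Schema
): Record<string, unknown> {
  const out: Record<string, unknown> = { ...raw };
  for (const [field, type] of Object.entries(schema)) {
    if (field in raw) out[field] = castValue(raw[field], type);
  }
  return out;
}

// BYOT: the caller supplies the TypeScript type the result should have.
interface BlogPost {
  title: string;
  date: Date;
}

const blogSchema: Schema = { title: "string", date: "date" };

// Sample data, invented for illustration.
const posts = [
  { title: "Old post", date: "2023-01-10" },
  { title: "New post", date: "2023-03-09" },
].map((fm) => castFrontmatter(fm, blogSchema) as unknown as BlogPost);

// "Find me all blog posts before date X" becomes a real date comparison.
const cutoff = new Date("2023-03-01");
const before = posts.filter((p) => p.date.getTime() < cutoff.getTime());
```

Because `date` is cast to a `Date` rather than left as a string, the comparison does not depend on how the date happened to be written in the frontmatter.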
Inbox
Marketing
Sections on front page about major features
Have a section on front page about links feature
Have a section for tags
etc
💤
Refactor: improve our interfaces, do something similar to CachedMetadata and CachedFile
"multi-thread" support for fast indexing
Misc
➕ 2023-03-15 Add layout e.g. layout: blog as a rule in markdown db loading rather than in getStaticPaths for rendering blogs (follow up to work in datopian/datahub-next#51) ⛔2023-03-17 on having markdowndb support for rules
Rufus random notes
how can we get type stuff like contentlayer has e.g. a given type in markdown frontmatter leads to use of X typescript type/interface
check out astro-build - how do they do type stuff?
Notes
Questions
What is a ContentBase / ContentDB? ✅2023-03-07 a database (index) of content e.g. of text files on disk, images etc. DB need not store content of files but it "indexes" them i.e. has a list of them, with associated metadata etc.
Why do we need one? ✅2023-03-07 a) to replace this (basic) functionality in ContentLayer.dev so we can replace ContentLayer.dev b) so we can do richer things like get files with all tags etc
What contentlayer.dev API calls do we need to replace? ✅2023-03-07 ~8 of them. Quite simple. See below.
What is the difference between a Content Layer (API) and a ContentBase?
What are the key technical components of a ContentBase? ✅2023-03-07 see diagram
What is MarkdownDB? ✅2023-03-07 It is a ContentBase whose text files are in markdown format
What information do we index about markdown files in ContentBase? ✅2023-03-07
frontmatter
list of all blocks and their types?
tags?
What is the unique identifier for files?
What are the job stories that the MarkdownDB needs to support? 🔥
What about assets other than markdown files? e.g. images and pngs? ✅2023-03-07 these should also get processed.
Does something like this already exist and how does it work?
How big will the sqlite db get (i.e. per 1k documents indexed)? NB: we aren't storing the text ... (though perhaps we could ...) 🚧2023-03-07 Guess: metadata is ~1 kB per file, so 1k files = 1 MB and 100k files = 100 MB, which seems OK for memory
What happens if the sqlite file gets really big? ✅2023-03-07 we'd probably have to store it somewhere in the cloud etc
What DB should we use e.g. IndexedDB or sqlite? ✅2023-03-07 propose sqlite3 b/c you get sql etc and now pretty much supported in browser if we ever need that
How do we handle the indexing of remote files, such as files in GitHub repos? ✅2023-03-07 ❌ kind of invalid question. we can index the remote files easily and then cache that locally. We aren't indexing on the fly.
Do we just store a reference to that file?
What's a minimal viable API? 🚧2023-03-08 see section below
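The size estimate in the questions above is plain arithmetic, sketched here so the assumption is easy to change. The ~1 kB per file figure is the guess from the notes, not a measurement, and `estimateIndexSizeMB` is a hypothetical helper name.

```typescript
// Assumption from the notes: roughly 1 kB of metadata per indexed file.
const BYTES_PER_FILE = 1000;

// Estimated index size in MB (1 MB = 1,000,000 bytes) for a given file count.
function estimateIndexSizeMB(fileCount: number, bytesPerFile: number = BYTES_PER_FILE): number {
  return (fileCount * bytesPerFile) / 1000000;
}

const smallSiteMB = estimateIndexSizeMB(1000); // 1k files
const largeSiteMB = estimateIndexSizeMB(100000); // 100k files
```

If the text of each file were stored as well, `bytesPerFile` would need to be raised accordingly.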
I'm not sure how we want to handle types, since having it as a frontmatter field might not be the most ideal way because if we had a blog folder we'd have to add the type metadata to all the files individually.
On contentlayer.dev it uses a filePathPattern for that:
I believe that's a good way of handling this. The caveat is that the path of a file is now determining its type and therefore folders with mixed types are impossible, although we could apply the pattern as something like *.blog.md*.
The use case I'm imagining is something like (there are probably better examples than blog):
blogs/
    my-first-post.blog.mdx // Blog type
    my-second-post.blog.mdx // Blog type
    index.mdx // Generic page type
    about-our-authors.mdx // Generic page type
    write-for-us.contact.mdx // Generic contact type
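As a rough illustration of the `*.blog.md*` idea, the sketch below derives a document type from the second-to-last part of the file name and falls back to a generic page type. The names `typeBySuffix`, `TYPED_FILE` and `documentTypeFor`, and the type labels, are assumptions made up for this example. It is not how contentlayer.dev implements `filePathPattern`.

```typescript
// Hypothetical mapping from a file name infix to a document type.
const typeBySuffix: Record<string, string> = {
  blog: "Blog",
  contact: "Contact",
};

// Matches names like "my-first-post.blog.mdx" (or ".md") and captures "blog".
const TYPED_FILE = /\.([a-z0-9-]+)\.mdx?$/i;

// Return the document type for a path, or the fallback for untyped files.
function documentTypeFor(filePath: string, fallback: string = "Page"): string {
  const fileName = filePath.split("/").pop() ?? filePath;
  const match = TYPED_FILE.exec(fileName);
  if (!match) return fallback;
  return typeBySuffix[match[1].toLowerCase()] ?? fallback;
}

const exampleTypes = [
  "blogs/my-first-post.blog.mdx",
  "blogs/index.mdx",
  "blogs/write-for-us.contact.mdx",
].map((p) => documentTypeFor(p));
```

With this approach a folder can hold mixed types, at the cost of encoding the type in every file name.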
How could we index frontmatter into our db? 2023-03-09
My idea is to have another table for frontmatter, something like:
| file_id | field | value | (maybe) type: array or string |
| --- | --- | --- | --- |
| d9fc09 | title | My new post | string |
file_id should be a foreign key pointing to file._id.
To increase performance, since we are going to have many more rows now, we can create a DB index on this table (using the file_id field)
If done this way we are going to be able to query mdx files using frontmatter fields. E.g: (may not be exactly this)
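A minimal sketch of that idea, using an in-memory array as a stand-in for the sqlite table so it runs without a database. The column names follow the proposal above. The helper names (`toRows`, `findFileIds`), the second file id `a1b2c3`, the `tags` field, and the choice to store one row per array item are assumptions for illustration only.

```typescript
// One row of the proposed frontmatter table.
interface FrontmatterRow {
  file_id: string; // would be a foreign key to file._id
  field: string;
  value: string;
  type: "array" | "string";
}

// Flatten one file's frontmatter into rows.
// Assumption: an array value produces one row per item.
function toRows(
  fileId: string,
  frontmatter: Record<string, string | string[]>
): FrontmatterRow[] {
  const rows: FrontmatterRow[] = [];
  for (const [field, value] of Object.entries(frontmatter)) {
    if (Array.isArray(value)) {
      for (const item of value) {
        rows.push({ file_id: fileId, field, value: item, type: "array" });
      }
    } else {
      rows.push({ file_id: fileId, field, value, type: "string" });
    }
  }
  return rows;
}

// In-memory equivalent of:
//   SELECT DISTINCT file_id FROM frontmatter WHERE field = ? AND value = ?
function findFileIds(rows: FrontmatterRow[], field: string, value: string): string[] {
  const ids = rows
    .filter((r) => r.field === field && r.value === value)
    .map((r) => r.file_id);
  return Array.from(new Set(ids));
}

const table: FrontmatterRow[] = [
  ...toRows("d9fc09", { title: "My new post", tags: ["news", "db"] }),
  ...toRows("a1b2c3", { title: "Another post", tags: ["db"] }),
];

const taggedDb = findFileIds(table, "tags", "db");
```

In sqlite the same lookup would benefit from the index on `file_id` mentioned above when joining back to the files table, and possibly from an additional index on `field` and `value`.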
A database of markdown files so that you can quickly access the metadata and content you want.
Bonus
Non-features
Re Flowershow: Use this to replace contentlayer.dev.
See https://datahub.io/notes/markdowndb
Acceptance aka Roadmap
Feature list
Marketing
Features
Index a folder of files - create a "DB" index from a folder of markdown files (and other files including images)
Extract structured data like:
tags, e.g. #abc (Tags extraction from body #49)
links, e.g. [hello](abc.md) or wikilink style [[xyz]], so we can compute backlinks or deadlinks etc (see [parse] Extract Links #4)
tasks, e.g. - [ ] this is a task (See obsidian data view) #60
Data types, data enhancement and validation
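The structured-data extraction listed under Features (tags, links, wikilinks, tasks) could be sketched with a few regular expressions, as below. This is a simplified illustration, not the parser MarkdownDB uses: a real implementation would more likely walk a markdown AST, and these patterns ignore edge cases such as tags inside code blocks. All names (`Extracted`, `captureAll`, `extractStructure`) are hypothetical.

```typescript
interface Extracted {
  tags: string[];
  links: string[];
  wikilinks: string[];
  tasks: { text: string; done: boolean }[];
}

// Collect the first capture group of every match; the pattern must use the g flag.
function captureAll(text: string, pattern: RegExp): string[] {
  const out: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(text)) !== null) out.push(m[1]);
  return out;
}

function extractStructure(body: string): Extracted {
  // #abc preceded by start of text or whitespace (so "# Heading" and URL fragments are skipped).
  const tags = captureAll(body, /(?:^|\s)#([A-Za-z][\w\/-]*)/g);
  // [hello](abc.md) captures the link target.
  const links = captureAll(body, /\[[^\]\[]*\]\(([^)\s]+)\)/g);
  // [[xyz]] or [[xyz|alias]] captures the page name.
  const wikilinks = captureAll(body, /\[\[([^\]|]+)(?:\|[^\]]*)?\]\]/g);
  // "- [ ] text" or "- [x] text" at the start of a line.
  const tasks: { text: string; done: boolean }[] = [];
  const taskPattern = /^\s*[-*] \[([ xX])\] (.+)$/gm;
  let t: RegExpExecArray | null;
  while ((t = taskPattern.exec(body)) !== null) {
    tasks.push({ text: t[2].trim(), done: t[1] !== " " });
  }
  return { tags, links, wikilinks, tasks };
}

const sampleBody = [
  "Intro with #abc tag.",
  "See [hello](abc.md) or [[xyz]].",
  "- [ ] this is a task",
  "- [x] done task",
].join("\n");

const extracted = extractStructure(sampleBody);
```

Once links are stored per file, backlinks are the inverse of that mapping and dead links are targets that match no indexed file.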
Notes on obsidian dataview API
blacksmithgu/obsidian-dataview#1811
How to handle document types 2023-03-09