{:one 1 :two 2 :array [a b c] :yes true}
cf. https://github.com/edn-format/edn
Likewise, commas are considered whitespace. They are sometimes added to make lengthy maps easier to read.
It is:
- Unambiguous
- Streamable
- Extensible
- Whitespace-insensitive, but there are formatting conventions for readability
If you want a "language" for expressing data (like configuration data), you might be interested in having a look at EDN. https://github.com/edn-format/edn
It is:
- Streamable
- Extensible
- Whitespace-insensitive, but there are formatting conventions for readability
How about something better, like EDN? https://github.com/edn-format/edn
`{num 5, val 4}` looks fine to me, but we can do even better! We already know map entries always come in pairs, so we don't really need that comma either. Just do `{num 5 val 4}` and we save yet another unnecessary character.
Of course, I didn't come up with this format myself. What I actually want JSON to be is EDN (https://github.com/edn-format/edn), which is a standalone format but is also used directly in Clojure, so it already exists inside a programming language and works very well. There the keys are strings though, so your example would end up being `{"num" 5 "val" 5 "person" var}`, where commas are optional.
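To make the "commas are just whitespace" point concrete, here is a minimal sketch of a reader for a small EDN subset (maps, vectors, strings, integers, booleans, keywords/symbols). It is an illustration only, not the reference EDN reader: `tokenize`, `read_form` and `read_edn` are names made up for this example, keywords and symbols are kept as plain strings, and map keys must be hashable.

```python
import re

# A token is a bracket, a double-quoted string, or a run of non-delimiters.
# Commas sit next to \s in the skip class, so they are ignored like whitespace.
TOKEN = re.compile(r'[\s,]*([\[\]{}]|"(?:\\.|[^"\\])*"|[^\s,\[\]{}"]+)')


def tokenize(text):
    pos, tokens = 0, []
    text = text.rstrip(" \t\r\n,")
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def read_form(tokens):
    tok = tokens.pop(0)
    if tok == "[":
        items = []
        while tokens[0] != "]":
            items.append(read_form(tokens))
        tokens.pop(0)  # drop the closing "]"
        return items
    if tok == "{":
        pairs = {}
        while tokens[0] != "}":
            key = read_form(tokens)  # entries are read pairwise: key, then value
            pairs[key] = read_form(tokens)
        tokens.pop(0)  # drop the closing "}"
        return pairs
    if tok.startswith('"'):
        return tok[1:-1]  # escape sequences are left unprocessed in this sketch
    if tok in ("true", "false"):
        return tok == "true"
    if re.fullmatch(r"[+-]?\d+", tok):
        return int(tok)
    return tok  # keywords and symbols stay as plain strings, e.g. ":num"


def read_edn(text):
    tokens = tokenize(text)
    value = read_form(tokens)
    if tokens:
        raise ValueError("trailing tokens after the first form")
    return value


with_commas = read_edn('{"num" 5, "val" 4}')
without_commas = read_edn('{"num" 5 "val" 4}')
print(with_commas == without_commas)
```

Because the map reader always consumes a key and then a value, the separator between entries carries no information, which is why dropping it is safe.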
I’ll take edn over any of ’em. https://github.com/edn-format/edn
Comments and time stamps allowed, arbitrary nesting of data structures, make your own tagged literals if you need them. And commas are whitespace, mostly unnecessary.
* no enclosing element (i.e., can be streamed)
* maps, lists, even sets
* tags (like "Person". UUIDs and timestamps are built-in tags)
* floating point numbers
* integers
* comments
* UTF-8
* true booleans
* no need to worry about too many or too few commas in the right or wrong place
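The "no enclosing element" point is what makes streaming work: a consumer can hand off each top-level value as soon as it is complete. A rough sketch, assuming every top-level value is a bracketed collection and ignoring `;` comments and character literals (`top_level_forms` is a made-up helper, not part of any EDN library):

```python
def top_level_forms(chunks):
    """Yield each complete top-level form as soon as its closing bracket arrives."""
    buf, depth, in_string, escaped = [], 0, False, False
    for chunk in chunks:
        for ch in chunk:
            buf.append(ch)
            if in_string:
                # Brackets inside strings must not change the nesting depth.
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch in "([{":
                depth += 1
            elif ch in ")]}":
                depth -= 1
                if depth == 0:
                    # Commas count as whitespace, so strip them as well.
                    yield "".join(buf).strip(" \t\r\n,")
                    buf = []


# The chunk boundaries are arbitrary, as they would be on a socket.
incoming = ['{:id 1 :tags [a', ' b]} {:id 2', ' :note "a } in a string"}']
forms = list(top_level_forms(incoming))
print(forms)
```

With JSON you would typically need an outer array (or a line-delimited convention) to get the same effect.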
Implementations in almost every language under the sun [1].
The format is simple enough that it's easy to implement, verify, and test. No strange string interpretation craziness (see YAML and the "Norway problem"), no ambiguity between FP and integers (see JSON), and it has comments. And if your editor has rainbow parentheses support, reading is actually a pleasant experience.
Spec: https://github.com/edn-format/edn
Example (linter config): https://github.com/clj-kondo/clj-kondo/blob/634294183a0aa2ca...
Yes you could. I agree that YAML is hard to parse, but a moment's reflection reveals that "steps" is an array of things, and "node-version" is a property of "with", which is a property of the second element of "steps" (along with "uses").
The thing that has tripped me up about YAML the most is the question of when I have to indent, and by how much.
Generating configuration for a system is a pain in the ass and carries all the ills of a build system, but a loud comment saying "this is generated from X!" and a Makefile do well enough most of the time.
Some ideas for configuration formats:
- .js file that evaluates to a JSON-compatible value
- [EDN][1] and other s-expressions
- a file system tree
There are two extremes. On the "no code" side there are gigantic reams of JSON/XML/YAML whose overall structure is all but impossible to understand. On the "all code" side there is a bespoke program that concisely produces the configuration but that can't be understood without already knowing the output.
I aim for the middle.
> Extends the code-as-data paradigm to maps and vectors
Basically think of it as a better JSON with none of the issues you brought up.
See: https://github.com/edn-format/edn
An example:
> #myapp/Person {:first "Fred" :last "Mertz"}
I suspect it was inspired by the whole "data is code" philosophy of lisp languages, but it seemed like a well thought out pattern for encoding and decoding data in relatively safe ways. It had a way of tagging fields to indicate that they required processing to derive the decoded value, e.g.
#inst "1985-04-12T23:20:50.52Z"
Would be interpreted as a Java date object (a `java.util.Date` by default in Clojure), but one could just as easily read the raw data without respecting those tags if one didn't trust the safety of the data being read. In effect the format split the work of parsing the data from decoding the data, which is a distinction I haven't seen in many other data encoding mechanisms.
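The parse/decode split described above can be sketched roughly like this. The `Tagged` class, the `decode` function and the handler table are all made up for illustration; they are not the API of any real EDN library, and `decode_inst` only handles the exact timestamp shape used in the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class Tagged:
    """What a parser might emit for `#tag value` before any decoding."""
    tag: str
    value: Any


def decode_inst(text):
    # Only the fractional-seconds "Z" form from the example is supported here.
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def decode(node, handlers):
    """Walk parsed data and apply a handler wherever one is registered for a tag."""
    if isinstance(node, Tagged):
        inner = decode(node.value, handlers)
        handler = handlers.get(node.tag)
        # Unknown tags are kept as inert data instead of failing.
        return handler(inner) if handler else Tagged(node.tag, inner)
    if isinstance(node, list):
        return [decode(item, handlers) for item in node]
    if isinstance(node, dict):
        return {key: decode(val, handlers) for key, val in node.items()}
    return node


# Pretend this came out of a parser; keywords are shown as plain strings.
parsed = {
    ":born": Tagged("inst", "1985-04-12T23:20:50.52Z"),
    ":owner": Tagged("myapp/Person", {":first": "Fred", ":last": "Mertz"}),
}

trusted = decode(parsed, {"inst": decode_inst})
untrusted = decode(parsed, {})  # no handlers at all: every tag stays raw

print(trusted[":born"])
print(untrusted[":born"])
```

The parser never has to run code chosen by the sender; the reader decides which tags it is willing to interpret.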
The Learn X in Y Minutes: https://learnxinyminutes.com/docs/edn/
A related talk by Rich Hickey that I think you'd find interesting: https://www.youtube.com/watch?v=ROor6_NGIWU
For a schema, I'd start with what CUE has done. The idea of types that constrain down as a lattice + a separate default path really resonates with me. https://cuelang.org/
YAML's problem is that whitespace is significant. TOML could be superior to it if it weren't for the fact that they forgot to forbid indentation. And now indented TOML is everywhere, including its Wikipedia page.
If we have to make a change, why not finally bite the bullet and go to the form that has existed for decades and is obviously superior to all of these formats? S-expressions. There's even been a standard for data notation brewing for some time: https://github.com/edn-format/edn
Then we can actually forego http://xkcd.com/927 and do something useful with our significantly saved mental energy.
Edit: I see that I'm not at all alone in wanting edn to replace all this crap. So some action points on how to actually make that happen, in order of preference:
- write or improve robust edn parsers for your ecosystem
- write or improve robust x => edn converters for your ecosystems (x=yaml,json,toml,whateverpoisontheyuserightnow)
- use edn in your projects
- advocate the use of edn
Still, I prefer Crockford's choice: that JSON numbers are defined to be numbers. Infinity and the flavors of NaN are... not numbers.
In an extensible data interchange format, like [edn][1], people could define conventions about more specific interpretations of numbers, e.g.
#ieee754/b64 45.6653 ; this is a double
We could build such a format on top of JSON (there are probably multiple), but I again agree with Crockford that this sort of thing does not belong in JSON.
Makes for a bunch of headaches, though, for sure.
One example is a data scientist I used to work with. He was working with lots of machine learning libraries that liked to use NaN to mean "nothing to see here." A fellow developer ended up writing code that used some sort of convention to work around it, e.g. number := decimal | {"magic-uuid": "NaN"}. I can see why some people are of the opinion "this is stupid, just allow NaNs." I disagree.
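For what it's worth, the workaround described above might look something like this sketch. The marker object is the one from the comment; the function names are invented here, and infinities are deliberately not handled.

```python
import json
import math

# Sentinel object standing in for NaN, following number := decimal | {"magic-uuid": "NaN"}.
NAN_MARKER = {"magic-uuid": "NaN"}


def encode_number(x):
    return dict(NAN_MARKER) if math.isnan(x) else x


def decode_number(v):
    return math.nan if v == NAN_MARKER else v


def dumps_numbers(values):
    # allow_nan=False makes json.dumps raise rather than emit the bare token
    # NaN, which is not valid JSON.
    return json.dumps([encode_number(v) for v in values], allow_nan=False)


def loads_numbers(text):
    return [decode_number(v) for v in json.loads(text)]


wire = dumps_numbers([45.6653, float("nan")])
back = loads_numbers(wire)
print(wire)
```

Every consumer has to know the convention, which is exactly the kind of out-of-band agreement a tagged literal would make explicit.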
Hm, not sure that's true. S-expressions would only define the "shape" of how you're defining something, not the semantics. EDN (https://github.com/edn-format/edn) is, for all intents and purposes, S-expressions, and it has support for custom literals and more, to avoid "the trouble with data types from JSON".
Some of the neat features: custom literals / tagged elements whose support can be added at runtime or compile time (dates can be represented, parsed and turned into proper dates in your language). Being able to namespace data inside of it also makes things a bit easier to manage without having to resort to nesting or other hacks. Very human friendly, plus machine friendly.
Biggest drawback so far seems to be parsing performance, although I'm not sure if that's actually about the format itself or about its small adoption, which means not many parsers focused on speed have been written.
I felt this way as a young programmer really getting my teeth in. I was also self-taught, so I didn't have familiarity with some things that would probably be considered basics/fundamentals.
My advice regardless is: when you get this unsettling mind-expanding feeling go research prior art. Go find out how other people solve problems like it. Even if you come up wanting more/better, at least you have a lay of the land. And learn the terminology used describing the problem space to expand your hunt. You'll be amazed what you turn up!
Edit: since this is on the topic of JSON (de)serialization, while I’d love to tout the very good pattern I see in my usual stack (TypeScript) where I’m working on an offering in the space, I’d actually recommend looking at prior art in a very different stack with very different goals:
- Transit[1] which standardizes type metadata within JSON (but leaves type resolution up to producers/consumers).
- EDN[2], which is the philosophical basis for Transit, written in Clojure syntax. It’s demonstrably worse for performance but syntactically a nicer format/DX if you have tooling to deal with it, and it’s nearly tooling-free if you use the stack.
A lot of efforts to standardize rich data type representation in JSON unfortunately do it very haphazardly, so I wanted to include examples that come from the “pattern recognition/solution mapping” side as an example. Both have downsides, but they’re exceptionally well designed for what they are and deserve to be part of this discussion.
; Maps are denoted as key-value pairs inside curly brackets:
{:title "Hello, World"}
; Vectors are denoted by square brackets:
[:bookmarks 12 15 188 1234]
; Or you can use a map with a keyword that maps to a vector:
{:bookmarks [12 15 188 1234]}
; Maps are collections of key-value pairs:
{:author "Peter Parker"
:email "[email protected]"
:active? true}
; Data structures are heterogeneous and nest:
[:contents
[:section "First section"
[:p "This is the first paragraph"]
[:p "This is the second paragraph"]]]
; ^ Hiccup actually renders these to HTML.
; Everything is an expression, so strings just work:
"This text is the value of an anonymous node!"
; A matrix is just a vector:
[1 0 0
0 1 0
0 0 1]
; Or you could partition it into rows:
[[1 0 0]
[0 1 0]
[0 0 1]]
https://github.com/edn-format/edn
[0] https://github.com/edn-format/edn
[1] https://github.com/cognitect/transit-format
[2] https://github.com/Datomic/fressian/wiki
[3] https://www.youtube.com/watch?v=JArZqMqsaB0&ab_channel=Cloju...
2. At the moment you can also use cloze deletions. I do have plans for things like typing in an answer or drawing with a touch screen. I have no plans for something like multiple choice, though I could be convinced otherwise. I'm also open to suggestions.
3. Yeah, this is probably the best part of Anki IMO. I decided not to include it in the initial version of Mochi because I thought plain markdown documents would be easier for new users to "grok". I still plan to add this kind of templating thing in the future, but I still need some time to let the idea "bake".
4. You can actually get this behavior in Anki, but it's not the default. The initial inspiration for this change came from this blog post[2], but it is roughly equivalent to the Leitner System[3]. This other blog [4] also provided a lot of influence in some of the design of the SRS system.
[0] https://github.com/edn-format/edn
[1] https://mochi.cards/faq.html#how-can-i-create-my-own-mochi-f...
[2] https://eshapard.github.io/anki/anki-new-interval-after-a-la...
[3] https://en.wikipedia.org/wiki/Leitner_system
[4] https://massimmersionapproach.com/table-of-contents/anki/
https://github.com/edn-format/edn
It addresses most of the author's problems while _also having s-expressions_, e.g.:
(foo bar baz) ;; this is valid EDN
{:foo [bar baz]} ;; this is also valid EDN
There are parsers for many popular languages, and a language already entirely based on it: Clojure.
[:div "The" [:a {:href "https://www.json.org/"} "JSON format"]
" was invented by " [:em "Douglas Crockford"] "."]
The fact that EDN[3] supports keywords makes it a bit easier to parse. Representing HTML in EDN this way was first done in a library called Hiccup[4], so it’s usually called “Hiccup” even when encountered outside of the original library.
1: https://holmsand.github.io/reagent/
2: https://github.com/Day8/re-frame
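To show why the Hiccup shape is so easy to work with, here is a small sketch that walks the same kind of structure and produces an HTML string. It is written in Python with plain strings standing in for keywords, and `render` is a made-up function, not how the Hiccup library itself is implemented.

```python
from html import escape


def render(node):
    """Render a Hiccup-style nested list to an HTML string."""
    if isinstance(node, str):
        return escape(node)
    tag, *rest = node
    attrs = {}
    # An optional attribute map may follow the tag.
    if rest and isinstance(rest[0], dict):
        attrs, *rest = rest
    attr_text = "".join(
        f' {name}="{escape(str(value))}"' for name, value in attrs.items()
    )
    children = "".join(render(child) for child in rest)
    return f"<{tag}{attr_text}>{children}</{tag}>"


# Same document as the EDN example; a trailing space is added after "The"
# here so the words do not run together in the output.
doc = ["div", "The ",
       ["a", {"href": "https://www.json.org/"}, "JSON format"],
       " was invented by ", ["em", "Douglas Crockford"], "."]

html = render(doc)
print(html)
```

The whole renderer is one recursive function because the data already has the shape of the tree it describes.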
Here's a Java parser: https://github.com/bpsm/edn-java
E.g., send an article from the server, formatted in EDN/Hiccup[2][3]. Insert it into a component in the frontend, and it's converted to VDOM nodes. No further logic or conversion required.
[1]: https://github.com/Day8/re-frame
it has no comma issues (commas are whitespace), it has comments, it's very structured, it has a better set of primitive types and literals for them, it has native support for tagged values for encoding more complex types, it has first-class support for encoding computation by virtue of being a subset of clojure (lists can represent function calls, the usual lisp shenanigans), and it has the readability of yaml with none of its drawbacks.
it's a shame EDN is not more widely used.
So I can't see why JSON would be a bad choice either.
There are some things in EDN that are missing in JSON: mostly richer types, user extensibility, and support for comments. I'm not sure if not having those would be a deal breaker though.
I've used YAML for some things, and I haven't found it better. In fact, it's annoying to have a different syntax for your config than you do for your code. And the fact that whitespace matters has definitely been a source of frustration.
* https://github.com/edn-format/edn
* https://learnxinyminutes.com/docs/edn/
* https://www.compoundtheory.com/clojure-edn-walkthrough/
I also rather liked thorough extensibility. Namespaces were the right idea, despite clunky syntax. Today you can see Clojure doing something similar in Spec.
And while we're on the subject of XML, XSLT and Clojure; I feel like this is the best solution for readable serialization of tree-like data, and an associated ecosystem of tools (to validate, transform etc). Note some nice features for humans, like the ability to comment out a specific node, in addition to the usual line-oriented comments.
You can extend it, convert it to JSON if necessary, and it is easy to read.
- The EDN (https://github.com/edn-format/edn) data format is a further development of the code-is-data idea. 'Tagged Literals' are an interesting feature and are used in Clojure to share code between different hosts, mostly JS/JVM.
- Transducers are an interesting new feature that is fairly unique to Clojure (https://clojure.org/reference/transducers)
- Reducers (https://clojure.org/reference/reducers)
- Clojure Multimethods are different from CLOS in CL, and one interesting feature is standalone hierarchies. Some people would maybe call this a step back from CLOS. https://clojure.org/reference/multimethods
- Concurrency primitives like Agents and Refs (featuring Software Transactional Memory) are fairly unique to Clojure.
- Metadata. In Clojure most types can have metadata attached, meaning data that flows through the program with your data but does not affect things like equality or size.
- Protocols are a dynamic single-dispatch system
- spec. Clojure Spec is a core part of Clojure now and is used internally as well. It's a kind of dynamic specification system, unlike most systems of this type (https://clojure.org/guides/spec)
Some more information here: https://clojure.org/reference/lisps
"A [REST] (REBL) client enters a [REST application] (REBL browser) through a [simple fixed URL] (initial EDN [2] data). [All] future actions the client may take are discovered within [resource representations] (metadata) returned [from the server]. The media types used for these representations, and the link relations they may contain, are standardized. The client transitions through application states by selecting from [the links within a representation] (the annotated data) or by manipulating the representation in other ways afforded by its media type. In this way, [RESTful] (REBL) interaction is driven by [hypermedia] (metadata), rather than out-of-band information.
1: https://en.wikipedia.org/wiki/HATEOAS , the one thing that makes REST REST but nobody implements in their "RESTful APIs".
I recently saw a talk by Rich Hickey about Effective Programs [1]. The talk explains why Hickey favors dynamic types, with EDN [2] and transit [3] proposed as an alternative to more statically typed data exchange formats like protobufs and less structured ones like JSON (the talk explains the reasons, near the end I think).
1: https://www.youtube.com/watch?v=2V1FtfBDsLU 2: https://github.com/edn-format/edn 3: https://github.com/cognitect/transit-format
Rich data structure literals in Clojure for sets #{"foo"}, maps {:a 1, :b 2}, lists (1,2,3), vectors [1, 2, 3]
The serialization format EDN is basically JSON on steroids.
edn is an extensible data notation. A superset of edn is used by Clojure to represent programs, and it is used by Datomic and other applications as a data transfer format. This spec describes edn in isolation from those and other specific use cases, to help facilitate implementation of readers and writers in other languages, and for other uses.
yeah, i get it, "but it has native support in all browsers" is a valid argument, i just wish it wasn't.
https://github.com/edn-format/edn https://learnxinyminutes.com/docs/edn/
Now, I don't know whether commas allow for faster parsers in some way, but edn[1] seems to be doing just fine without them.
Once you get used to optional commas, it really becomes a nuisance having to type them, especially in basic data type lists. The only place where I find commas visually helpful is C-style argument lists (with type and value pairs), which JSON doesn't even use.
https://github.com/edn-format/edn
Example:
https://github.com/milikicn/activity-stream-example/blob/4db...
Not S-expression-based, though.
There are data interchange formats that use S-Expressions, namely EDN[1]. But JSON remains the most popular format for its widespread support, and its few data types map to most languages.
Something like EDN for JSON would be cool: https://github.com/edn-format/edn
It is really a pleasure to use compared to JSON and XML. While it may not be as compact as ProtoBuffers, Thrift, or Avro, it is human readable and also valid Clojure code. Libraries are readily available to convert it to JSON.
I always thought that seemed like a nice alternative data format to JSON. Anyone using it in the wild?
var config = HJSON.parse(fs.readFileSync('config.hjson', 'utf8'))
Another more obscure, and more powerful serialization format that does away with commas (commas are treated as whitespace!), has a date type, and much more is Rich Hickey's (of Clojure fame) EDN. https://github.com/edn-format/edn
{:a 1, "foo" :bar, [1 2 3] four}
XML is actually IMO not that bad at human readability, it's pretty good. It's terrible at human writability. Conversely S-exps are lovely to work with.
See also: https://github.com/edn-format/edn
I believe Clojure's EDN format offers all of the above https://github.com/edn-format/edn
Spec - https://github.com/edn-format/edn
Walkthrough - http://www.compoundtheory.com/clojure-edn-walkthrough/
It's a relatively new data format designed by Rich Hickey that has versioning and backward-compatibility baked in from the start.
EDN stands for "Extensible Data Notation". It has an extensible type system that enables you to define custom types on top of its built-in primitives, and there's no schema.
To define a type, you simply use a custom prefix/tag inline:
#wolf/pack {:alpha "Greybeard" :betas ["Frostpaw" "Blackwind" "Bloodjaw"]}
While you can register custom handlers for specific tags, properly implemented readers can read unknown types without requiring custom extensions.
The motivating use case behind EDN was enabling the exchange of native data structures between Clojure and ClojureScript, but it's not Clojure specific -- implementations are starting to pop up in a growing number of languages (https://github.com/edn-format/edn/wiki/Implementations).
Here's the InfoQ video and a few threads from when it was announced:
https://news.ycombinator.com/item?id=4487462, https://groups.google.com/forum/#!topic/clojure/aRUEIlAHguU, http://www.infoq.com/interviews/hickey-clojure-reader
> So the same argument as (say) YAML, Lua tables or TNetStrings. Or, if you include binary representations, stuff like Thrift, XDR and ASN.1.
> You can embed a JVM in PostgreSQL and write the functions yourself (or use PL/Scheme I suppose), but I would be basically amazed if anyone decides to do it for you.
> JSON gets the nod because it's understood by billions of systems. Other formats are going to struggle.
While I do not agree with his position, it is shared by many people.
https://github.com/edn-format/edn
I'd like to see a little bit more love for edn... and if you're gonna pick something incompatible with plain old JSON, why not edn?
Now if you want to physically separate the view logic and the corresponding markup generation, that's more debatable: they're extremely strongly coupled (and in fairly small chunks ideally) so you often can't trivially change one without the other, and thus keeping them together makes logical sense. See Pete Hunt's presentation which lumpypua linked, it tries to make that point fairly nicely.
[0] https://github.com/swannodette/om
[1] https://github.com/edn-format/edn
Is it too late for edn? https://github.com/edn-format/edn
Symbols that aren't "strings" are kinda neat too, and you get downright attached to arbitrary key-value mappings once you have them.
If you, too, think this would be awesome, check out extensible data notation: https://github.com/edn-format/edn.