Wednesday, September 21, 2022

From What to How: what is seen from here is not seen from there.

This is a well-known quote from Ariel Sharon about Israel leaving Gaza. The move looks controversial, considering that he had led the troops that conquered it and Sinai.

I'm not in a position to judge how true this is in politics. In programming it is 200% true.

No matter how good your plan is, it will change dramatically once you start to implement it. And the cause is exactly this: you now see the thing from a different point of view.

Amendments to the previous plan:

Functionality to implement

  • Abilities to train: learning to read and listen will not be successful without learning to write and speak. Need to add 2 more flows:
    • Listen & Repeat - sentence by sentence or even word by word, slowly at first and then faster and faster
    • Dictation
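
A minimal sketch of how a Listen & Repeat session could be planned. The function names, the sentence-splitting rule and the speech rates (0.6, 0.8, 1.0) are my assumptions for illustration, nothing here is decided yet.

```python
import re
from typing import List, Sequence, Tuple


def split_units(text: str, by: str = "sentence") -> List[str]:
    """Split a text into practice units: sentences (default) or single words."""
    if by == "word":
        return text.split()
    # Naive rule (assumption): a sentence ends with '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def listen_and_repeat_plan(
    units: Sequence[str], rates: Sequence[float] = (0.6, 0.8, 1.0)
) -> List[Tuple[str, float]]:
    """Return (unit, speech_rate) pairs: all units at the slowest rate first, then faster passes."""
    return [(unit, rate) for rate in rates for unit in units]


plan = listen_and_repeat_plan(split_units("Good morning. How are you?"))
for unit, rate in plan:
    print(f"{rate:.1f}x  {unit}")
```

Passing by="word" to split_units gives the word-by-word variant of the same flow.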

Data

  • User profile data:
    • UI language
    • Subjects - those being learned and those available
    • Current subject and lessons available for the next learning session
  • A very useful flow - the user adds a text to the First Read and the system builds a new lesson out of it on the fly.
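
The profile data and the "lesson on the fly" flow could be sketched roughly like this. All class and field names are hypothetical, and the tokenizer is deliberately naive (it splits words at apostrophes, for example).

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UserProfile:
    ui_language: str
    subjects_in_learning: List[str] = field(default_factory=list)
    subjects_available: List[str] = field(default_factory=list)
    current_subject: Optional[str] = None
    next_lessons: List[str] = field(default_factory=list)  # lesson titles for the next session


@dataclass
class Lesson:
    title: str
    sentences: List[str]
    vocabulary: List[str]  # distinct lowercase words, sorted


def build_lesson(text: str, title: str = "First Read") -> Lesson:
    """Build a lesson from a user-supplied text: split it into sentences and collect its vocabulary."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    return Lesson(title=title, sentences=sentences, vocabulary=sorted(set(words)))


profile = UserProfile(ui_language="en", current_subject="English")
lesson = build_lesson("The cat sat. The cat ran!")
profile.next_lessons.append(lesson.title)
print(lesson.vocabulary)
```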

Technology to use

  • It looks like Firebase is a good starting point for PWA hosting
  • Store the various relations between the data in a Neo4j graph database
    • A graph database is designed specifically for storing relations
    • The query language looks very similar to MongoDB's
    • What will happen to query performance when we need to query 2 different servers while collecting data for a lesson?
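
One partial answer to the performance question, as a sketch only: if the two queries do not depend on each other, they can be issued concurrently, so the waiting time is roughly that of the slower server rather than the sum of both. The two fetch functions below are stand-ins that simulate latency with asyncio.sleep; they are not real MongoDB or Neo4j calls.

```python
import asyncio
from typing import Any, Dict, List, Tuple


async def fetch_texts(lesson_id: str) -> List[str]:
    """Stand-in for a query to the document store (simulated latency)."""
    await asyncio.sleep(0.05)
    return ["text-1", "text-2"]


async def fetch_relations(lesson_id: str) -> List[Tuple[str, str, str]]:
    """Stand-in for a query to the graph database (simulated latency)."""
    await asyncio.sleep(0.05)
    return [("text-1", "USES", "word-a")]


async def collect_lesson(lesson_id: str) -> Dict[str, Any]:
    # Both requests are in flight at the same time.
    texts, relations = await asyncio.gather(
        fetch_texts(lesson_id), fetch_relations(lesson_id)
    )
    return {"texts": texts, "relations": relations}


lesson_data = asyncio.run(collect_lesson("demo"))
print(lesson_data)
```

If the second query needs the results of the first one, this trick does not apply and the two latencies add up.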

To be continued ...

Tuesday, August 9, 2022

From What to How: language learning app - initial analysis

This project is really huge.
Now I'm going to analyze its first release, with very limited functionality.

Functionality to implement

  • Abilities to train:
    • Read
    • Listen
  • Flow:
    • Show a text and read it to the user, highlighting the portion already read with one color and the word to be read next with another color
    • Prompt the user to select unknown words
    • Show explanations for each word from publicly available dictionaries, one at a time
    • Select the next texts as further uses of the unknown words and add them to the reading queue
    • For a new user there are a number of simple texts in the queue.
    • If the queue becomes empty (the user knows every word in the text), present a list of topics to select from.
    • Ideally, a new text should contain no more than 10% new words.
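
A rough sketch of the text selection step. Two assumptions of mine: the 10% is counted over distinct words, and the unknown words being practiced do not count as new. The function names are hypothetical.

```python
import re
from typing import Iterable, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase words of a text (runs of letters only, a naive rule)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def pick_next_texts(
    candidates: Iterable[str],
    known_words: Set[str],
    unknown_words: Set[str],
    max_new_ratio: float = 0.10,
) -> List[str]:
    """Keep texts that use at least one unknown word and have few never-seen words."""
    picked = []
    for text in candidates:
        words = set(tokenize(text))
        if not words & unknown_words:
            continue  # the text does not practice any unknown word
        new_words = words - known_words - unknown_words
        if len(new_words) / len(words) <= max_new_ratio:
            picked.append(text)
    return picked


known = {"the", "cat", "sat", "on", "a", "dog", "ran", "to"}
queue = pick_next_texts(
    ["The cat sat on a mat.", "The dog ran to the zoo.", "A quantum mat."],
    known_words=known,
    unknown_words={"mat"},
)
print(queue)
```

Here the second text is skipped because it does not contain "mat", and the third is rejected because one of its three distinct words has never been seen.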

Data

  • Corpus of short texts indexed by:
    • Words
    • Morphemes (when possible)
    • Idioms (when possible)
  • Word explanations collected on demand
  • Data about user's performance:
    • Frequency, time spent and payload of lessons
    • User's dictionary: words that were never marked as unknown or stopped being marked as such (with a count of both states or a list of docs)
    • Attention: how many words are marked as unknown for the first time after a number of appearances
    • Learning effectiveness: how many times is a word marked as unknown?
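
The performance data above could be kept per word, for example like this. It is only a sketch: the field names and the exact definitions of "dictionary" and "attention" are my working assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass
class WordStats:
    seen: int = 0             # number of lessons in which the word appeared
    marked_unknown: int = 0   # how many times it was marked as unknown
    first_marked_at: int = 0  # appearance number of the first mark (0 = never marked)
    last_marked: bool = False # was it marked as unknown the last time it appeared?


def record_lesson(stats: Dict[str, WordStats], words: Iterable[str], unknown: Set[str]) -> None:
    """Update per-word statistics after one lesson."""
    for word in set(words):
        s = stats.setdefault(word, WordStats())
        s.seen += 1
        s.last_marked = word in unknown
        if s.last_marked:
            s.marked_unknown += 1
            if s.first_marked_at == 0:
                s.first_marked_at = s.seen


def user_dictionary(stats: Dict[str, WordStats]) -> Set[str]:
    """Words never marked as unknown, or no longer marked as such."""
    return {w for w, s in stats.items() if not s.last_marked}


def late_marks(stats: Dict[str, WordStats], after: int = 1) -> Set[str]:
    """Attention signal: words first marked as unknown only after `after` appearances."""
    return {w for w, s in stats.items() if s.first_marked_at > after}


stats: Dict[str, WordStats] = {}
record_lesson(stats, ["cat", "mat"], unknown={"mat"})
record_lesson(stats, ["cat", "mat"], unknown={"cat"})
print(user_dictionary(stats), late_marks(stats))
```

The marked_unknown counter is the raw number behind the learning effectiveness question.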

Technology to use

  • A progressive web app (PWA) storing the user's data on the user's machine.
    • same code for desktop/tablet/mobile
    • no installation and upgrades
    • 100% private for the user
    • user == user agent (browser or mobile device)
  • Store corpus and word explanations in hosted MongoDB
    • MongoDB supports regular expressions as a data type.
    • I know how to do it in SQL. Now it's time to learn NoSQL 😉.
    • Hosted (not self-hosted) is a necessity. I'm not ready to run my own web server yet.
  • Use clustering for morpheme and idiom extraction
    • It should be able to accept hints from users (with a grain of salt, of course)
  • Use GraphQL for client-server interaction
    • Easier to grow with the project than REST
    • Steeper learning curve
  • Use Text-to-Speech (TTS) functionality available in browser
    • No need to store audio
    • Easy synchronization with text highlight
  • To begin with, implement the backend in Python
    • Python is the best-equipped language for ML
    • It supports everything I need in the backend now

Need to learn

  • Asynchronous programming in Python
  • GraphQL
  • PyMongo
  • Motor?
  • Graphene or Ariadne?
  • Tornado?

Ready snippets

To be continued ...